Abstract
Government and financial institutions impose strict confidentiality requirements on internal documents. When files stored on removable media are desensitized with traditional methods, the large volume of content and the diversity of data types lead to low efficiency and incomplete desensitization. To address these problems, this paper proposes a hybrid-data-type file desensitization system that combines the SM4 block cipher with the FF1 format-preserving encryption method. The system segments file content and desensitizes each segment separately, which lets it process data of any type and improves the coverage, accuracy, and efficiency of file desensitization. To further reduce the memory consumed while the desensitization code runs, a Chinese-character dictionary index conversion algorithm is also proposed. It establishes a relative index relationship between the plaintext to be detected and a Chinese character encoding library, replacing the hash-table key-value mapping that traditional desensitization systems rely on. In a desensitization test on 1000 randomly generated test files, the unrecognizability rate of mixed-type text reached 99.8%, and the accuracy of both desensitization and content recovery reached 99.9%. On 10 randomly generated test files totaling about 10 MB, the average desensitization rate for plain-text files reached 2500 characters/s.
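The two mechanisms described in the abstract, content segmentation and relative-index conversion feeding a format-preserving cipher, can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the character library is taken to be the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), decimal digits are treated as a second domain, and every other character passes through unchanged. The cipher is an FF1-style balanced Feistel network (10 rounds of modular addition on base-radix numerals) whose round function is HMAC-SHA256 rather than SM4, because the Python standard library has no SM4; it is not a conformant FF1 implementation. The names `DOMAINS`, `domain_of`, `fpe_encrypt`, `fpe_decrypt`, and `desensitize` are hypothetical.

```python
import hashlib
import hmac
from itertools import groupby
from typing import List, Optional

# Hypothetical character domains. Each is a contiguous code-point range, so a
# character's index is simply (code point - base) and no lookup table is stored.
# Assumption: the "Chinese character library" is U+4E00..U+9FFF; the paper's may differ.
DOMAINS = {
    "han": (0x4E00, 0x9FFF - 0x4E00 + 1),  # (base code point, radix = 20992)
    "digit": (ord("0"), 10),
}
ROUNDS = 10  # FF1 also uses 10 Feistel rounds


def domain_of(ch: str) -> Optional[str]:
    """Return the domain name of ch, or None if ch is passed through unchanged."""
    for name, (base, radix) in DOMAINS.items():
        if base <= ord(ch) < base + radix:
            return name
    return None


def _num(digits: List[int], radix: int) -> int:
    """Interpret a list of indices as a big-endian base-`radix` integer."""
    x = 0
    for d in digits:
        x = x * radix + d
    return x


def _digits(x: int, m: int, radix: int) -> List[int]:
    """Inverse of _num: write x (< radix**m) as exactly m base-`radix` digits."""
    out = [0] * m
    for k in range(m - 1, -1, -1):
        x, out[k] = divmod(x, radix)
    return out


def _round_value(key: bytes, tweak: bytes, rnd: int, half: List[int], m: int, radix: int) -> int:
    """Keyed pseudo-random integer from HMAC-SHA256 (stand-in for an SM4-based PRF)."""
    need = (radix ** m).bit_length() // 8 + 8  # extra bytes reduce modulo bias
    msg = tweak + bytes([rnd]) + len(half).to_bytes(4, "big")
    msg += b"".join(d.to_bytes(2, "big") for d in half)  # assumes radix <= 65536
    stream, counter = b"", 0
    while len(stream) < need:
        stream += hmac.new(key, msg + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return int.from_bytes(stream[:need], "big")


def fpe_encrypt(key: bytes, tweak: bytes, x: List[int], radix: int) -> List[int]:
    """Feistel over base-`radix` digits; output keeps the length and the alphabet."""
    u = len(x) // 2
    v = len(x) - u
    a, b = list(x[:u]), list(x[u:])
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v
        c = (_num(a, radix) + _round_value(key, tweak, i, b, m, radix)) % radix ** m
        a, b = b, _digits(c, m, radix)
    return a + b


def fpe_decrypt(key: bytes, tweak: bytes, y: List[int], radix: int) -> List[int]:
    """Undo fpe_encrypt by running the rounds in reverse order."""
    u = len(y) // 2
    v = len(y) - u
    a, b = list(y[:u]), list(y[u:])
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        c = (_num(b, radix) - _round_value(key, tweak, i, a, m, radix)) % radix ** m
        a, b = _digits(c, m, radix), a
    return a + b


def desensitize(text: str, key: bytes, tweak: bytes = b"", decrypt: bool = False) -> str:
    """Split text into same-domain runs and transform each run independently."""
    cipher = fpe_decrypt if decrypt else fpe_encrypt
    out = []
    for name, chars in groupby(text, key=domain_of):
        run = "".join(chars)
        if name is None:
            out.append(run)  # outside every domain: left as-is in this sketch
            continue
        base, radix = DOMAINS[name]
        idx = [ord(ch) - base for ch in run]  # relative index, no hash table
        out.append("".join(chr(base + i) for i in cipher(key, tweak, idx, radix)))
    return "".join(out)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    record = "张三的账号是6222021234567890"
    masked = desensitize(record, key)
    print(masked)
    assert desensitize(masked, key, decrypt=True) == record
```

Because each index is computed as `ord(ch) - base`, no character-to-index dictionary has to be built or kept in memory, which is presumably the kind of saving the index conversion algorithm targets. Every ciphertext character stays inside its own domain, so re-segmenting the masked text yields the same runs and decryption restores the original. A real system would also have to respect FF1's minimum-length requirement, because single-character or short digit runs receive only weak protection in this sketch.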
Authors
HUANG Jun, LIU Jiafu, CAO Zhiwei
(Data Security Technology Research and Development Center, The Third Institute of the Ministry of Public Security, Shanghai 201204, China; Faculty of Medical Imaging, Naval Medical University, Shanghai 200433, China)
Source
Computer Measurement & Control
2024, No. 11, pp. 315-321 (7 pages)
Funding
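Supported by the National Key Research and Development Program of the Ministry of Science and Technology of China (2021YFB3102002).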
Keywords
SM4 national cryptographic algorithm
format-preserving encryption
data desensitization
national cryptographic algorithms
file desensitization system