Abstract
Classifying and grading massive volumes of data quickly and accurately, and quickly identifying abnormal user behavior, are important research topics in data security today. In data classification and grading, natural language processing has improved accuracy, but key breakthroughs are still urgently needed on three problems: mixed Chinese language styles, the low accuracy of unsupervised learning, and the heavy sample-labeling workload of supervised learning. This paper proposes a multivariate Chinese language model, together with sample construction based on unsupervised algorithms, to address these problems. In user abnormal behavior analysis, excessive dependence on samples leads to low recognition accuracy; this paper proposes using outlier detection to build an abnormal behavior sample library, which removes that dependence. To verify the feasibility of the methods, an experimental system is built and experiments are carried out. The results show that the proposed methods can significantly improve the accuracy of both data classification and grading and abnormal behavior analysis.
Authors
喻波
王志海
孙亚东
谢福进
安鹏
YU Bo; WANG Zhihai; SUN Yadong; XIE Fujin; AN Peng (Beijing Wondersoft Technology Co., Ltd., Beijing 100876, China)
Source
《智能系统学报》
CSCD
Peking University Core Journal
2021, No. 5, pp. 931-939 (9 pages)
CAAI Transactions on Intelligent Systems
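Funding
National Torch Program of China (2011GH010018)
National Electronics Development Fund Project (Ministry of Industry and Information Technology, Cai [2014] No. 425).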