Abstract
Traditional time series classification methods recognize mouse trajectories poorly because they mine trajectory features insufficiently and must cope with unbalanced data and few labeled samples. This paper proposes a mouse trajectory recognition method that combines feature group hierarchy with semi-supervised learning. Hierarchical mouse trajectory feature groups are first constructed from different perspectives. Then, following the idea of semi-supervised learning, multiple random forest models pseudo-label the unlabeled samples, and a subset of the samples whose predicted labels are consistent across models and whose confidence is high is added to the training set. Finally, a random forest model is trained on the expanded training set using the basic and auxiliary feature groups to perform human-machine recognition of mouse trajectories, that is, to distinguish human-generated trajectories from machine-generated ones. Experimental results show that the method identifies mouse trajectories effectively, with precision, recall, and harmonic mean reaching 97.83%, 94.72%, and 96.56%, respectively.
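The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. It assumes each model has already produced class probabilities for the unlabeled samples (for instance, from a random forest's probability output), and it keeps only the samples on which every model predicts the same label with confidence at or above a threshold. The function name select_pseudo_labels, the 0.9 threshold, and the toy probabilities are hypothetical; the abstract does not state the number of forests, the confidence threshold, or the exact agreement rule the authors used.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    model_probs: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return (sample_index, label) pairs that all models agree on confidently.

    model_probs[m][i][c] is model m's predicted probability that
    unlabeled sample i belongs to class c.
    """
    if not model_probs:
        return []
    selected: List[Tuple[int, int]] = []
    for i in range(len(model_probs[0])):
        labels = []
        confidences = []
        for probs in model_probs:
            row = probs[i]
            # Predicted label is the class with the highest probability.
            label = max(range(len(row)), key=row.__getitem__)
            labels.append(label)
            confidences.append(row[label])
        # Keep the sample only if every model predicts the same label
        # and even the least confident model reaches the threshold.
        if len(set(labels)) == 1 and min(confidences) >= threshold:
            selected.append((i, labels[0]))
    return selected


# Toy probabilities for four unlabeled samples and two classes
# (class 0 and class 1 are placeholders, e.g. human and machine).
model_a = [[0.95, 0.05], [0.40, 0.60], [0.08, 0.92], [0.97, 0.03]]
model_b = [[0.93, 0.07], [0.70, 0.30], [0.04, 0.96], [0.85, 0.15]]

print(select_pseudo_labels([model_a, model_b], threshold=0.9))
# [(0, 0), (2, 1)]
```

In this toy run, sample 1 is dropped because the two models disagree, and sample 3 is dropped because one model's confidence (0.85) falls below the threshold. The selected pairs would then be appended to the labeled training set before the final model is trained.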
Authors
康璐璐
范兴容
王茜竹
杨晓雅
明蕊
KANG Lulu; FAN Xingrong; WANG Qianzhu; YANG Xiaoya; MING Rui (School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; School of Computer Science and Information Engineering, Chongqing Technology and Business University, Chongqing 400067, China; Electronic Information and Networking Research Institute, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
Source
《计算机工程》
CAS
CSCD
Peking University Core Journals
2021, Issue 4, pp. 277-284 (8 pages)
Computer Engineering
Funding
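Natural Science Foundation of Chongqing (cstc2018jcyjAX0587)
Chongqing Major Science and Technology Theme Special Key Demonstration Project (cstc2018jszx-cyztzxX0035)
China Mobile Research Fund Project (MCM20170203)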
Keywords
mouse trajectory recognition
feature group hierarchy
semi-supervised learning
random forest model
unbalanced data