
Research on Internet Sensitive Speeches Recognition Combining Features of Characters and Words (融合字词特征的互联网敏感言论识别研究) Cited by: 2
Abstract Sensitive speech on the Internet differs markedly from ordinary speech: to evade filtering rules, it tends to be semantically obscure, highly polysemous, and irregular in form. To identify sensitive speech efficiently and classify it accurately, this work improves the text convolutional neural network in light of the characteristics of sensitive speech and the shortcomings of existing models. Combining the strengths of the ALBERT (A Lite BERT) dynamic character-level encoding model, the text convolutional neural network, the multi-head self-attention mechanism, and the gating mechanism, it proposes ALBERT-CCMHSAG, a dual-channel classification model that fuses character and word features. The model extracts and fuses the character-level and word-level semantic information, local key features, and contextual semantics of the text to improve the classification of sensitive speech. ALBERT-CCMHSAG achieves the best results on the sensitive speech dataset, the noisy sensitive speech dataset, and the small-sample sensitive speech dataset, indicating stronger recognition and classification ability as well as greater robustness to noisy data and to insufficient training data. It also outperforms the comparison models on a hotel review dataset, suggesting that it is likely to perform well on other corpora.
Authors YAN Shangyi (闫尚义); WANG Jingya (王靖亚); ZHU Shaowu (朱少武); CUI Yumeng (崔雨萌); TAO Zhizhong (陶知众) (School of Information Network Security, People's Public Security University of China, Beijing 100045, China)
Source Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2023, Issue 13, pp. 129-138 (10 pages)
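Funding National Social Science Fund of China (20AZD114); CCF-NSFOCUS "Kunpeng" Research Fund (CCF-NSFOCUS 2020011); Open Project Fund of the Public Security Behavioral Science Laboratory, People's Public Security University of China (2020sys08).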
Keywords sensitive speech recognition; character features; word features; multi-head self-attention mechanism; gating mechanism
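
The abstract names a gating mechanism for fusing the character-level and word-level channels but gives no formula. The sketch below shows one common form of gated fusion, under the assumption of an element-wise gate g = sigmoid(w_char * c + w_word * w + b) with output g * c + (1 - g) * w. The function name `gated_fusion` and the parameters `w_char`, `w_word`, and `bias` are hypothetical and are not taken from the paper; the actual ALBERT-CCMHSAG gate may differ.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(char_vec, word_vec, w_char, w_word, bias):
    """Fuse two equally sized feature vectors with an element-wise gate.

    Assumed form (not confirmed by the paper):
        g_i   = sigmoid(w_char_i * c_i + w_word_i * w_i + bias_i)
        out_i = g_i * c_i + (1 - g_i) * w_i
    """
    sizes = {len(char_vec), len(word_vec), len(w_char), len(w_word), len(bias)}
    if len(sizes) != 1:
        raise ValueError("all inputs must have the same length")
    fused = []
    for c, w, wc, ww, b in zip(char_vec, word_vec, w_char, w_word, bias):
        g = sigmoid(wc * c + ww * w + b)  # share taken from the character channel
        fused.append(g * c + (1.0 - g) * w)
    return fused


# Illustrative values only: with all-zero gate parameters the gate is 0.5,
# so the output is the element-wise mean of the two channels.
char_features = [1.0, 2.0]
word_features = [3.0, 4.0]
zeros = [0.0, 0.0]
mean_fused = gated_fusion(char_features, word_features, zeros, zeros, zeros)
```

In a trained model the gate parameters would be learned, letting each feature dimension lean toward whichever channel is more informative; a large positive bias, for instance, pushes the output toward the character-level features.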

