
Named entity recognition based on local adversarial training (cited by: 4)
Abstract In named entity recognition (NER) research, datasets commonly contain boundary samples that are easily confused, both between entities and non-entities and among the entity categories themselves, which substantially degrades the performance of NER methods. This paper proposes an NER method that takes BiLSTM-CRF as the baseline model and combines hard sample selection with targeted-attack adversarial training. The method selects hard samples, which contain many boundary samples, and exploits the tendency of boundary samples to be perturbed away from their correct category: a targeted attack that follows the error probability distribution of the confusion matrix generates adversarial samples, which are mixed with the original data for adversarial training to strengthen the model's ability to recognize confusable boundary samples. To verify the method, global and local adversarial training with non-targeted attacks and global adversarial training with targeted attacks are designed as comparison experiments. The experimental results show that the method improves the quality of adversarial samples while retaining the advantages of adversarial training, raising F1 by 1.34%, 6.03%, and 3.65% on the JNLPBA, MalwareTextDB, and Drugbank datasets, respectively.
Authors LI Jing; CHENG Peng-Sen; XU Li-Dan; LIU Jia-Yong (College of Cybersecurity, Sichuan University, Chengdu 610065, China)
Source Journal of Sichuan University (Natural Science Edition), indexed in CAS, CSCD, and Peking University Core Journals, 2021, No. 2, pp. 107-114 (8 pages)
Funding Sichuan Province Key R&D Program (2020YFG0076); Sichuan University Fund (2020SCUNG205); National Natural Science Foundation of China (U2066203, 61473197)
Keywords Named entity recognition; Adversarial training; Hard samples; Target attack
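
The abstract describes a targeted attack whose target labels follow the error probability distribution of the confusion matrix. The record gives no implementation details, so the following is only a minimal sketch of one way that sampling step could look, not the authors' code. The function names `error_distribution` and `sample_attack_targets`, the toy three-label confusion matrix, and the uniform fallback for classes with no recorded errors are all assumptions introduced for illustration.

```python
import numpy as np


def error_distribution(confusion: np.ndarray) -> np.ndarray:
    """Turn a confusion matrix (rows: true label, columns: predicted label)
    into per-class probabilities over the *wrong* labels only."""
    n = confusion.shape[0]  # assumes at least two labels
    errors = confusion.astype(float).copy()
    np.fill_diagonal(errors, 0.0)  # discard correct predictions
    row_sums = errors.sum(axis=1, keepdims=True)
    # Assumption: a class that was never misclassified falls back to a
    # uniform choice over all other labels.
    uniform = (np.ones((n, n)) - np.eye(n)) / (n - 1)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    return np.where(row_sums > 0, errors / safe_sums, uniform)


def sample_attack_targets(true_labels, confusion, rng):
    """Sample one attack target per token, in proportion to how often the
    model confused the token's true label with each other label."""
    probs = error_distribution(confusion)
    n = confusion.shape[0]
    return np.array([rng.choice(n, p=probs[y]) for y in true_labels])


# Hypothetical label set: 0 = O, 1 = B-protein, 2 = B-DNA (toy counts).
confusion = np.array([
    [90, 6, 4],
    [8, 80, 12],
    [0, 0, 50],
])
true_labels = [0, 1, 2, 1]
rng = np.random.default_rng(0)
targets = sample_attack_targets(true_labels, confusion, rng)
```

Because the diagonal is zeroed before normalization, a sampled target never equals the true label, so every perturbation pushes a token toward a category the model already tends to confuse it with. In a full pipeline these targets would feed a gradient-based perturbation of the embeddings of the selected hard samples, which this sketch deliberately leaves out.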