期刊文献+

基于自主学习的专业领域文本DBLC分词模型 被引量:2

DBLC Model for Word Segmentation Based on Autonomous Learning
原文传递
导出
摘要 【目的】提高对专业术语、名词占比较高的专业领域文本的分词准确度。【方法】提出将词典、统计、深度学习三者有机结合的DBLC模型,并编程实现。获取中国管理案例库中的部分案例作为专业领域语料,将其他几种已有分词模型作为对比对象进行实验与分析。【结果】通过实验得到各模型在实验语料上的分词效果,DBLC模型在各评价指标上均优于其他模型,分词准确率达到96.3%。【局限】未对原词典词与新词做区别处理,没有考虑词典的存储结构问题,模型计算时间复杂度较高。【结论】本文提出的DBLC模型提高了专业领域文本的分词准确度,且该模型分词准确率与词典规模正相关。 [Objective] This paper tries to improve the accuracy of word segmentation for literature with lots of scientific terms. [Methods] First, we programed the DBLC model, which combined the methods of dictionary, statistics and deep learning. Then, we retrieved articles from the Chinese Management Case Center to build the experimental corpus. Finally, we compared the performance of this new model with the existing ones. [Results] The performance of the DBLC model was better than others. Its word segmentation accuracy was up to 96.3%. [Limitations] We did not separate the words of the original dictionary from the new words. We did not re-design the storage structure of the dictionary, which prolonged the computing time of our model. [Conclusions] The proposed DBLC model improves the accuracy of word segmentation, which is also positively co-related to the dictionary size.
作者 冯国明 张晓冬 刘素辉 Feng Guoming;Zhang Xiaodong;Liu Suhui(School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China)
出处 《数据分析与知识发现》 CSSCI CSCD 北大核心 2018年第5期40-47,共8页 Data Analysis and Knowledge Discovery
关键词 中文分词 序列标注 BI-LSTM-CRF 自主学习 基于词典的分词 Chinese Word Segmentation Sequence Labeling BI-LSTM-CRF Autonomous Learning Word Segmentation Based on Dictionary
  • 相关文献

参考文献6

二级参考文献40

  • 1黄昌宁,赵海.中文分词十年回顾[J].中文信息学报,2007,21(3):8-19. 被引量:249
  • 2罗智勇,宋柔.基于多特征的自适应新词识别[J].北京工业大学学报,2007,33(7):718-725. 被引量:14
  • 3H Y Tan. Chinese place automatic recognition research. In: C N Huang, Z D Dong, eds. Proc of Computational Language.Beijing: Tsinghua University Press, 1999 被引量:1
  • 4Zhang Huaping, Liu Qun, Zhang Hao, et al. Automatic recognition of Chinese unknown words recognition. First SIGHAN Workshop Attached with the 19th COLING, Taipei, 2002 被引量:1
  • 5S R Ye, T S Chua, J M Liu. An agent-based approach to Chinese named entity recognition. The 19th Int'l Conf on Computational Linguistics, Taipei, 2002 被引量:1
  • 6J Sun, J F Gao, L Zhang, et al. Chinese named entity identification using class-based language model. The 19th Int'l Conf on Computational Linguistics, Taipei, 2002 被引量:1
  • 7Lawrence R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc of IEEE, 1989,77(2): 257~286 被引量:1
  • 8Shai Fine, Yoram Singer, Naftali Tishby. The hierarchical hidden Markov model: Analysis and applications. Machine Learning,1998, 32(1): 41~62 被引量:1
  • 9Richard Sproat, Thomas Emerson. The first international Chinese word segmentation bakeoff. The First SIGHAN Workshop Attached with the ACL2003, Sapporo, Japan, 2003. 133~143 被引量:1
  • 10J Hockenmaier, C Brew. Error-driven learning of Chinese word segmentation. In: J Guo, K T Lua, J Xu, eds. The 12th Pacific Conf on Language and Information, Singapore, 1998 被引量:1

共引文献250

同被引文献23

引证文献2

二级引证文献47

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部