
A Domain Word Segmentation Method Based on a Neural Network Corrector (cited by: 1)

Domain Adaptation for Chinese Word Segmentation Based on Neural Network
Abstract: This paper proposes a neural-network-based method for Chinese word segmentation that improves the adaptability and flexibility of a segmentation system when it is transferred to a new domain. The method revises the output of an existing segmenter. This two-stage, correction-based approach is decoupled from the segmentation model, so it depends neither on source-domain corpora nor on how the segmenter was built. Existing correction-based methods, however, rely on feature engineering and cannot adapt automatically to different domains. The paper models the corrector with a neural network, which achieves domain adaptation without hand-crafted features. Experiments show that, compared with current methods, the proposed method achieves better segmentation performance and robustness on domain text, with a particularly marked improvement in out-of-vocabulary (OOV) recall.
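The decoupling described in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the base segmenter is treated as a black box, its output is re-expressed as per-character B/M/E/S tags, and a hypothetical `corrector` callable revises those tags. The `toy_corrector` below is a fixed lookup standing in for the neural model, which the abstract does not specify.

```python
from typing import Callable, List

def words_to_tags(words: List[str]) -> List[str]:
    """Convert a word sequence to per-character B/M/E/S tags."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags

def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and B/M/E/S tags (split after E or S)."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # flush a trailing unfinished word
        words.append(current)
    return words

# Hypothetical interfaces: neither signature comes from the paper.
Segmenter = Callable[[str], List[str]]
Corrector = Callable[[str, List[str]], List[str]]

def segment_with_correction(sentence: str,
                            base_segmenter: Segmenter,
                            corrector: Corrector) -> List[str]:
    """Stage 1: run the black-box segmenter. Stage 2: revise its tags."""
    base_tags = words_to_tags(base_segmenter(sentence))
    revised_tags = corrector(sentence, base_tags)
    return tags_to_words(sentence, revised_tags)

def toy_corrector(chars: str, base_tags: List[str]) -> List[str]:
    """Stand-in for a learned corrector: re-tags one known domain term."""
    tags = list(base_tags)
    term = "神经网络"
    start = chars.find(term)
    if start != -1:
        tags[start:start + len(term)] = ["B"] + ["M"] * (len(term) - 2) + ["E"]
    return tags

# `list` acts as a deliberately poor base segmenter: one word per character.
result = segment_with_correction("神经网络分词", list, toy_corrector)
```

Because the second stage sees only the sentence and the first stage's output, the base segmenter can be swapped without touching the corrector, which is the kind of independence the abstract claims for the two-stage design.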
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2017, No. 6, pp. 41-49 (9 pages)
Funding: National Natural Science Foundation of China (61472436, 61532001)
Keywords: Chinese word segmentation; domain adaptation; neural network
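The abstract reports its largest gain on OOV recall. As a rough sketch of how that metric is commonly computed (the paper's exact scoring procedure is not given here, so this is an assumption), a gold word counts as OOV when it is absent from a training vocabulary, and it counts as recalled when the predicted segmentation contains a word with the same character span. The names `to_spans`, `oov_recall`, and `vocab` are illustrative.

```python
from typing import List, Set, Tuple

def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a word sequence to the set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans

def oov_recall(gold_sents: List[List[str]],
               pred_sents: List[List[str]],
               vocab: Set[str]) -> float:
    """Fraction of gold OOV words whose exact span appears in the prediction."""
    total = hit = 0
    for gold, pred in zip(gold_sents, pred_sents):
        pred_spans = to_spans(pred)
        start = 0
        for w in gold:
            span = (start, start + len(w))
            start += len(w)
            if w not in vocab:      # word unseen in the training vocabulary
                total += 1
                hit += span in pred_spans
    return hit / total if total else 0.0

gold = [["神经网络", "分词", "方法"]]
pred = [["神经", "网络", "分词", "方法"]]
vocab = {"方法"}                     # assumed training vocabulary
score = oov_recall(gold, pred, vocab)  # 1 of 2 OOV words recovered
```

Matching on character spans rather than word strings avoids crediting a correct word that was found at the wrong position in the sentence.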