Abstract
This paper proposes a neural network based method for Chinese word segmentation that improves the adaptability and flexibility of a segmentation system when it is transferred to a new domain. The method revises the output of an existing segmenter. This two-stage, correction-based approach is decoupled from the segmentation model, so it depends on neither the source-domain corpus nor the way the segmenter was built. Existing correction-based methods, however, rely on feature engineering and cannot adapt automatically to different domains. We model the corrector with a neural network, which achieves domain adaptation without any hand-crafted features. Experimental results show that, compared with the state-of-the-art approach, the proposed method achieves better segmentation performance and higher robustness on domain text, with a particularly marked improvement in the recall of out-of-vocabulary (OOV) words.
Source
《中文信息学报》
CSCD
Peking University Core Journal
2017, Issue 6, pp. 41-49 (9 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (61472436, 61532001)
Keywords
Chinese word segmentation
domain adaptation
neural network