Abstract
Existing work on Chinese sentence grouping lacks support from computational-linguistics data and ignores discourse cohesive markers, and current discourse analysis rarely treats the sentence group as a grammatical unit. To address these problems, this paper proposes an automatic Chinese sentence grouping method. Guided by Chinese sentence group theory, an annotated evaluation corpus for sentence grouping is constructed, and a set of evaluation functions J is designed on the basis of Multiple Discriminant Analysis (MDA) to perform the grouping automatically. Experimental results show that taking the length of segmented units and discourse cohesive markers into account improves grouping performance, and that the Skip-Gram model outperforms the traditional Vector Space Model (VSM): the correct segmentation rate Pμ reaches 85.37% and the segmentation error rate WindowDiff falls to 24.08%. The proposed method is also better suited to the sentence grouping task and achieves better grouping results than the traditional MDA method.
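The abstract reports WindowDiff as its segmentation error rate. For readers unfamiliar with the metric, the following is a minimal sketch of the standard WindowDiff definition from the text segmentation literature, not the paper's own evaluation code. The function name `window_diff`, the 0/1 boundary encoding (one flag per gap between adjacent sentences), and the window size `k` are assumptions made for illustration.

```python
def window_diff(reference, hypothesis, k):
    """Fraction of length-k windows in which the two segmentations
    disagree on the number of boundaries (lower is better).

    reference, hypothesis: equal-length lists of 0/1 flags, one per gap
    between adjacent sentences; 1 marks a sentence-group boundary.
    k: window size in gaps (often half the mean reference segment length).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must have the same length")
    n = len(reference)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= number of gaps")

    windows = n - k + 1
    errors = 0
    for i in range(windows):
        # Compare boundary counts inside the same window of both segmentations.
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k]):
            errors += 1
    return errors / windows


# Hypothetical example: the second boundary is correct, the first is shifted by one gap.
gold = [0, 1, 0, 0, 1, 0]
predicted = [0, 0, 1, 0, 1, 0]
score = window_diff(gold, predicted, k=2)
```

Unlike a strict boundary-matching score, a near miss such as the shifted boundary above is penalized only in the windows that straddle it, which is why the metric is commonly preferred for segmentation tasks.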
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journals (北大核心)
2015, Issue 5, pp. 1314-1319 (6 pages)
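Funding
National Natural Science Foundation of China (61202281, 61103101)
Ministry of Education Humanities and Social Sciences Research Youth Fund Project (10YJCZH052, 12YJCZH201)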
Keywords
Chinese sentence grouping
Multiple Discriminant Analysis (MDA)
discourse analysis
Skip-Gram model
discourse coherence