
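Automatic Chinese Sentence Group Segmentation Based on Multiple Discriminant Analysis (cited by: 4)

English title as published: Automatic Chinese sentences group method based on multiple discriminant analysis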
Abstract: Current work on sentence group segmentation lacks support from computational-linguistic data and ignores discourse connectives, and current discourse analysis rarely treats the sentence group as a grammatical unit. To address these problems, this paper proposes an automatic segmentation method for Chinese sentence groups. Guided by Chinese sentence group theory, the method first builds an annotated evaluation corpus for Chinese sentence group segmentation and then designs a set of evaluation functions J based on Multiple Discriminant Analysis (MDA) to segment sentence groups automatically. Experimental results show that adding the length of the segmented units and discourse connectives as factors improves segmentation performance, and that the Skip-Gram model outperforms the traditional Vector Space Model (VSM): the correct-segmentation rate Pμ reaches 85.37% and the error rate WindowDiff drops to 24.08%. The method is also better suited to the sentence grouping task and segments sentence groups more accurately than the traditional MDA method.
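The abstract does not spell out the evaluation function J, the segment-length factor, or how discourse connectives enter the model. The sketch below illustrates only the general MDA idea, under stated assumptions: each sentence is already a vector (for example VSM term weights or averaged Skip-Gram embeddings), J is taken to be a Fisher-style ratio tr(S_B)/tr(S_W) over contiguous groups, and the number of groups k is fixed in advance. It is not the authors' J and omits their length and connective factors. It also computes WindowDiff (Pevzner and Hearst's windowed boundary-count error) to score a hypothesis against a reference; Pμ is not reproduced because its definition is not given here. All function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def within_scatter(X: Sequence[Vector], i: int, j: int) -> float:
    """Contribution of segment X[i:j] to tr(S_W): the sum of squared
    Euclidean distances from each sentence vector to the segment mean."""
    seg = X[i:j]
    dim = len(seg[0])
    mean = [sum(v[d] for v in seg) / len(seg) for d in range(dim)]
    return sum((v[d] - mean[d]) ** 2 for v in seg for d in range(dim))


def mda_segment(X: Sequence[Vector], k: int) -> List[int]:
    """Split X into k contiguous, non-empty groups maximising the assumed
    J = tr(S_B) / tr(S_W). Since tr(S_B) + tr(S_W) = tr(S_T) is fixed for a
    given X, this equals minimising tr(S_W), which is additive over
    segments and is solved exactly by dynamic programming.
    Returns the indices at which groups 2..k start."""
    n = len(X)
    inf = float("inf")
    # cost[m][j]: least tr(S_W) for X[:j] split into m groups
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + within_scatter(X, i, j)
                if c < cost[m][j]:
                    cost[m][j], back[m][j] = c, i
    # Walk the back-pointers to recover where each group starts.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return sorted(starts)[1:]  # drop the first group's start (0)


def mda_criterion(X: Sequence[Vector], starts: List[int]) -> float:
    """Assumed J = tr(S_B) / tr(S_W) for the segmentation given by `starts`."""
    edges = [0] + list(starts) + [len(X)]
    s_w = sum(within_scatter(X, a, b) for a, b in zip(edges, edges[1:]))
    s_t = within_scatter(X, 0, len(X))
    return float("inf") if s_w == 0 else (s_t - s_w) / s_w


def window_diff(ref: List[int], hyp: List[int], n: int, w: int = 0) -> float:
    """WindowDiff between reference and hypothesis group starts over n
    sentences: the share of length-w windows whose boundary counts differ."""
    def boundaries(starts: List[int]) -> List[int]:
        b = [0] * (n - 1)  # b[t] = 1 if a group starts at sentence t + 1
        for s in starts:
            b[s - 1] = 1
        return b

    r, h = boundaries(ref), boundaries(hyp)
    if w <= 0:
        # Usual convention: half the mean reference group length.
        w = max(1, round(n / (len(ref) + 1) / 2))
    windows = n - w
    errors = sum(sum(r[i:i + w]) != sum(h[i:i + w]) for i in range(windows))
    return errors / windows


# Toy sentence vectors: three well-separated "topics" of four sentences each.
X = [[5.0, 0.1], [5.1, 0.0], [4.9, -0.1], [5.0, 0.0],
     [0.0, 5.0], [0.1, 5.1], [-0.1, 4.9], [0.0, 5.0],
     [-5.0, -5.0], [-5.1, -4.9], [-4.9, -5.1], [-5.0, -5.0]]
hyp = mda_segment(X, k=3)
print(hyp)                                    # [4, 8]
print(window_diff([4, 8], hyp, n=len(X)))     # 0.0
print(window_diff([4, 8], [5, 8], n=len(X)))  # 0.2
```

Fixing k keeps the search exact but is the main simplification: without a fixed k or an extra penalty, minimising tr(S_W) would put every sentence in its own group. The abstract does not say how the paper chooses the number of groups.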
Source: Journal of Computer Applications (计算机应用; indexed in CSCD and the Peking University core journal list), 2015, no. 5, pp. 1314-1319 (6 pages)
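Funding: National Natural Science Foundation of China (61202281, 61103101); Youth Fund of the Humanities and Social Sciences Research Program of the Ministry of Education (10YJCZH052, 12YJCZH201)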
Keywords: Chinese sentence grouping; Multiple Discriminant Analysis (MDA); discourse analysis; Skip-Gram model; discourse cohesion

