Extraction of Chinese term based on Chi-square test (基于卡方检验的汉语术语抽取), cited 14 times
Abstract: Term discovery plays an important role in Chinese information processing and language learning. This paper proposes a method for extracting Chinese terms based on the chi-square test. A corpus is first built from documents downloaded from the Web. Prime strings with a simple structure are then extracted using F-MI, an improved mutual-information measure. Building on these prime strings, combined strings with a complex structure are extracted by applying the chi-square test together with prime-substring decomposition. Experimental results show that the algorithm effectively improves the precision of Chinese term extraction.
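The first step scores adjacent characters by how strongly they stick together. The abstract does not give the F-MI formula, so the sketch below substitutes plain pointwise mutual information, PMI(x, y) = log2(p(xy) / (p(x) p(y))), over adjacent character pairs. The function names and the `threshold` and `min_count` parameters are hypothetical illustrations, not the paper's F-MI parameter.

```python
import math
from collections import Counter


def bigram_pmi(corpus: list[str]) -> dict[str, float]:
    """Plain PMI for every adjacent character pair (a stand-in for F-MI)."""
    char_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in corpus:
        char_counts.update(sentence)
        pair_counts.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    n_chars = sum(char_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for pair, c_xy in pair_counts.items():
        p_xy = c_xy / n_pairs
        p_x = char_counts[pair[0]] / n_chars
        p_y = char_counts[pair[1]] / n_chars
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores


def candidate_prime_strings(corpus: list[str], threshold: float = 1.0,
                            min_count: int = 2) -> list[str]:
    """Hypothetical filter: keep frequent pairs whose PMI clears a threshold,
    ordered from most to least cohesive."""
    pair_counts = Counter(
        s[i:i + 2] for s in corpus for i in range(len(s) - 1)
    )
    scores = bigram_pmi(corpus)
    kept = [p for p, c in pair_counts.items()
            if c >= min_count and scores[p] >= threshold]
    return sorted(kept, key=lambda p: scores[p], reverse=True)


if __name__ == "__main__":
    docs = ["术语抽取", "术语抽取方法", "汉语术语"]
    print(candidate_prime_strings(docs))
```

On a corpus this small, PMI also rewards accidental neighbours such as 语抽, which is one reason the paper modifies the measure and adds a second, test-based stage.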
Source: Journal of Computer Applications (《计算机应用》), 2007, No. 12, pp. 3019-3020, 3025 (3 pages). Indexed in CSCD; Peking University core journal.
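Funding: National Natural Science Foundation of China (60673040); National Social Science Fund of China (06BYY029); Key Project of Scientific and Technological Research of the Ministry of Education (105117).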
Keywords: Chi-square test; decomposition of prime string; mutual information
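The second step decides whether two adjacent units belong together as a combined string. A common way to do this with a chi-square test is a 2x2 contingency table over adjacent pairs: O11 counts A followed by B, O12 counts A followed by something else, O21 counts something else followed by B, and O22 counts all remaining pairs. The statistic is then χ² = N(O11·O22 − O12·O21)² / ((O11+O12)(O11+O21)(O12+O22)(O21+O22)). The sketch below assumes the corpus is already split into prime strings, uses the standard 3.841 critical value (p = 0.05, one degree of freedom), and does not attempt the paper's prime-substring decomposition, which the abstract does not describe. All names are hypothetical.

```python
def adjacency_table(token_seqs: list[list[str]], a: str, b: str) -> tuple[int, int, int, int]:
    """Count (O11, O12, O21, O22) over all adjacent token pairs (x, y)."""
    o11 = o12 = o21 = o22 = 0
    for seq in token_seqs:
        for x, y in zip(seq, seq[1:]):
            if x == a and y == b:
                o11 += 1
            elif x == a:
                o12 += 1
            elif y == b:
                o21 += 1
            else:
                o22 += 1
    return o11, o12, o21, o22


def chi_square_2x2(o11: int, o12: int, o21: int, o22: int) -> float:
    """Pearson chi-square statistic for a 2x2 table (0.0 if a margin is empty)."""
    n = o11 + o12 + o21 + o22
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denominator == 0:
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / denominator


def should_merge(token_seqs: list[list[str]], a: str, b: str,
                 critical: float = 3.841) -> bool:
    """Hypothetical merge rule: join A and B when their adjacency is significant."""
    return chi_square_2x2(*adjacency_table(token_seqs, a, b)) >= critical


if __name__ == "__main__":
    seqs = [["汉语", "术语", "抽取"], ["术语", "抽取", "方法"], ["中文", "信息", "处理"]]
    print(should_merge(seqs, "术语", "抽取"))
```

With very small expected counts the chi-square approximation becomes unreliable, so a real extractor would also require a minimum frequency before testing a pair.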