Abstract
Term discovery plays an important role in Chinese information processing and language learning. This paper proposes a method for extracting Chinese terms based on the Chi-square test. A corpus is first built from documents downloaded from the Web. Structurally simple prime strings are then extracted using F-MI, an improved mutual information parameter, and on that basis structurally complex combined strings are extracted using the Chi-square test together with prime-substring decomposition. Experimental results show that the algorithm effectively improves the precision of Chinese term extraction.
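The two association statistics named in the abstract can be illustrated with a minimal sketch. The abstract does not define F-MI or the prime-substring decomposition, so the code below uses standard pointwise mutual information and the standard Pearson Chi-square statistic on a 2x2 contingency table as stand-ins. The function names and the count-based interface are assumptions made for illustration, not the authors' implementation.

```python
import math


def pmi(n_xy: int, n_x: int, n_y: int, n: int) -> float:
    """Standard pointwise mutual information, log2(P(xy) / (P(x) * P(y))).

    n_xy: count of the adjacent pair "xy"; n_x, n_y: counts of x and y;
    n: total number of observed pairs. This is plain PMI, NOT the paper's
    improved F-MI parameter, which the abstract does not specify.
    """
    return math.log2((n_xy * n) / (n_x * n_y))


def chi_square(n_xy: int, n_x: int, n_y: int, n: int) -> float:
    """Pearson Chi-square statistic for the 2x2 table of x versus y."""
    a = n_xy                      # x followed by y
    b = n_x - n_xy                # x followed by something else
    c = n_y - n_xy                # y preceded by something else
    d = n - n_x - n_y + n_xy      # neither x nor y
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom
```

Under these assumptions, a candidate string whose parts co-occur far more often than chance predicts receives a high score from both functions, while a pair that co-occurs exactly as often as independence predicts receives a Chi-square value of 0.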
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2007, No. 12, pp. 3019-3020, 3025 (3 pages)
Journal of Computer Applications
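Funding
National Natural Science Foundation of China (60673040)
National Social Science Foundation of China (06BYY029)
Key Project of Science and Technology Research, Ministry of Education (105117)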
Keywords
Chi-square test
decomposition of prime string
mutual information