Abstract
Collocations play an important role in linguistics, and in recent years they have become one of the key research directions in natural language processing. To support the automatic extraction of collocations, this paper defines two types of collocation, semantic collocations and syntactic (sentence-pattern) collocations, and presents a quintuple-based method for extracting both types. Extraction experiments using several statistical measures show that the method facilitates automatic collocation extraction. Among the measures compared, mutual information performs best, reaching an accuracy of up to 80%.
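The abstract reports that mutual information gave the best results but does not state the formula used. As a hedged illustration only, the sketch below ranks adjacent word pairs by pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), the form commonly used for collocation extraction. It does not implement the paper's quintuple representation, which the abstract does not describe. The function name `pmi_scores`, the co-occurrence `window`, and the `min_count` cutoff are hypothetical choices made for this example.

```python
import math
from collections import Counter


def pmi_scores(tokens, window=1, min_count=2):
    """Rank word pairs by pointwise mutual information (PMI).

    Hypothetical setup: candidate pairs are (w_i, w_{i+k}) for 1 <= k <= window.
    This is a generic textbook formulation, not the paper's quintuple method.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        for k in range(1, window + 1):
            if i + k < len(tokens):
                pair_counts[(w, tokens[i + k])] += 1

    n_words = len(tokens)
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (x, y), c_xy in pair_counts.items():
        if c_xy < min_count:
            continue  # very rare pairs get unreliable, inflated PMI values
        p_xy = c_xy / n_pairs            # joint probability of the pair
        p_x = word_counts[x] / n_words   # marginal probability of x
        p_y = word_counts[y] / n_words   # marginal probability of y
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))

    # Highest PMI first: pairs that co-occur far more often than chance
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "the strong tea the cup the strong tea the pot the hot tea".split()
for pair, score in pmi_scores(tokens):
    print(pair, round(score, 3))
```

On this toy corpus, ("strong", "tea") ranks first. Pairs involving the frequent word "the" score lower because its high marginal probability inflates the denominator. This is the usual reason PMI favors genuine collocations over pairs built from common function words.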
Source
Electronic Design Engineering (《电子设计工程》)
2015, No. 19, pp. 75-78 (4 pages)
Keywords
collocation
statistics
quintuple
evaluation metric
mutual information