Abstract
Existing methods for automatic Chinese term extraction focus mainly on the high-frequency characteristics and unithood indicators of terms, while low-frequency terms and termhood indicators lack effective treatment. To address these problems, this paper introduces a background corpus into the C-value method and proposes the concepts of word domain distribution degree and effective word frequency. Terms are then extracted automatically by computing the EC-value (Effective C-value) of each candidate term, and the extraction of low-frequency terms is improved by combining this with term cluster recognition and mining. A term extraction experiment in the computer domain shows that the proposed improved method (the EC-value method) measures the termhood of terms more effectively and improves extraction performance on low-frequency terms.
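For orientation, the baseline C-value score that the abstract builds on can be sketched as follows. This is only a minimal illustration of the standard C-value formula; it is not the paper's EC-value, because the abstract does not define the domain distribution degree or the effective word frequency. The function names, the representation of a term as a tuple of words, and the use of word count as term length are assumptions made for this sketch.

```python
import math
from typing import Dict, Tuple

# Assumption: a candidate term is a tuple of segmented words.
Term = Tuple[str, ...]


def is_nested(short: Term, long: Term) -> bool:
    """Return True if `short` is a contiguous sub-sequence of the strictly longer `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq: Dict[Term, int]) -> Dict[Term, float]:
    """Baseline C-value for each candidate term.

    C-value(a) = log2|a| * f(a)                       if a is not nested
               = log2|a| * (f(a) - mean of f(b))      over longer candidates b containing a
    Note that log2|a| is 0 for single-word candidates, so this form
    only ranks multi-word terms.
    """
    scores: Dict[Term, float] = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if is_nested(a, b)]
        if containers:
            # Discount occurrences that are explained by longer candidates.
            adjusted = f_a - sum(freq[b] for b in containers) / len(containers)
        else:
            adjusted = float(f_a)
        scores[a] = math.log2(len(a)) * adjusted
    return scores
```

Because the score is proportional to raw frequency, a candidate that occurs only a few times scores low regardless of how domain-specific it is, which is the weakness on low-frequency terms that the abstract describes.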
Source
《现代图书情报技术》
CSSCI
Peking University Core Journal
2013, No. 9, pp. 54-59 (6 pages)
New Technology of Library and Information Service
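Funding
National Natural Science Foundation of China project "Research on a Collaborative Cross-Analysis Model and Algorithms for the 'Structure and Semantics' of Argumentative Discourse" (Grant No. 61240036)
Ministry of Education Humanities and Social Sciences Fund project "Research on Collaborative Analysis Methods for the 'Structure and Semantics' of Argumentative Discourse" (Grant No. 11YJC740157)
Natural Science Foundation of Jiangxi Province project "Research on a Collaborative Cross-Analysis Model for the 'Structure and Semantics' of Web Page Text Oriented to Semantic Understanding" (Grant No. 20114BAB201027). This work is one of the research outputs of these projects.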