Abstract
This article introduces two approaches to word-form normalization, stemming and lemmatization, together with their rule-based and dictionary-based implementations. It explains the basic principles of constructing synonym sets (synsets) through term normalization, and analyzes the basic processing steps of term normalization, the construction of synsets by sorting the words of a term alphabetically, and the issues of word ordering and part of speech that arise during normalization. Finally, it points out that abbreviations and acronyms require special handling and, in view of the shortcomings of the normalization approach, argues that it needs to be supplemented by human judgment and other methods of acquiring synonyms.
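The idea summarized above, normalizing each word of a term and then sorting the words alphabetically so that variant forms collapse to one key, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the suffix rules, the function names (normalize_word, term_key, build_synsets) and the sample terms are all hypothetical, and the toy rules stand in for a real stemmer or dictionary-based lemmatizer. Part of speech and abbreviations, which the article says need separate treatment, are ignored here.

```python
import re
from collections import defaultdict

# Toy rule-based suffix rules (hypothetical, for illustration only).
# The ("ss", "ss") and ("us", "us") entries are no-op guards so that words
# like "class" or "thesaurus" are not truncated by the plain "s" rule.
SUFFIX_RULES = [("ies", "y"), ("sses", "ss"), ("ss", "ss"), ("us", "us"), ("s", "")]


def normalize_word(word: str) -> str:
    """Reduce a single word to a rough base form using the toy rules."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Require a few leading characters so very short words stay intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word


def term_key(term: str) -> str:
    """Normalize every word of a term, then sort the words alphabetically."""
    words = [w for w in re.split(r"[^a-z0-9]+", term.lower()) if w]
    return " ".join(sorted(normalize_word(w) for w in words))


def build_synsets(terms):
    """Group terms that share the same normalized, sorted key."""
    groups = defaultdict(list)
    for term in terms:
        groups[term_key(term)].append(term)
    return dict(groups)


if __name__ == "__main__":
    sample = ["digital libraries", "Digital library", "library, digital",
              "information retrieval"]
    for key, members in build_synsets(sample).items():
        print(key, "->", members)
```

Groups produced this way are only candidates. As the abstract notes, word order and part of speech can change meaning, so each group would still need human review before being accepted as a synset.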
Source
《情报杂志》
CSSCI
PKU Core Journal (北大核心)
2014, Issue 7, pp. 171-175 (5 pages)
Journal of Intelligence
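Funding
A research output of the National Social Science Fund of China project "Research on Compilation Models and Application Methods of Thesauri in the Network Environment" (No. 10BTQ048)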