Abstract
This paper introduces the concept of the inter-class frequency of a word across text classes, defines the degree of distinction of a word in text categorization, and discusses the properties of this measure. On that basis it proposes a new method for selecting feature words, defines the weights of the selected feature words, and establishes a set of weighted-distance categorization rules for the vector space model (VSM). Experimental results show that the method is effective and useful.
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2002, No. 3, pp. 15-19 (5 pages)
Journal of Chinese Information Processing