Abstract
Text classification has become an active research topic in Web text mining, and work on the automatic classification of Chinese text is attracting growing attention. Considering the particular characteristics of news text, this paper builds on the classic vector space model used in text classification and proposes an improved four-dimensional vector space model together with an adaptive tracking strategy, thereby improving the classification of news text. Experimental results show that the algorithm raises the classification performance of the traditional vector space model from 81.5% to 92.49%, which demonstrates that the algorithm is effective.
Source
Microcomputer Applications (《微计算机应用》)
2010, No. 3, pp. 58-62 (5 pages)
Keywords
text mining, text classification, vector space model, four-dimensional vector space model