Abstract
Short text classification is difficult because keyword features are sparse and the number of samples is large. To address these problems, a semantics-based KNN short text classification algorithm is proposed. The algorithm extracts feature words from short texts using a character-based word segmentation strategy, maps the keywords to concepts using CNKI to improve the semantic expression of the short texts, and, in view of the characteristics of short texts, improves the KNN classification algorithm by applying LSA dimensionality reduction. Experimental results show that the algorithm can effectively improve short text classification performance.
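The role of concept mapping can be illustrated with a minimal sketch. It is not the paper's implementation: the `CONCEPTS` dictionary, the `TRAIN` samples, and the function names are hypothetical, whitespace tokenization of English text stands in for character-based segmentation of Chinese, a toy dictionary stands in for the lexicon lookup, and the LSA dimensionality reduction step is omitted. It only shows how adding shared concept features lets cosine-similarity KNN match short texts that have no words in common.

```python
from collections import Counter
import math

# Hypothetical concept dictionary (assumption): maps surface words to a
# shared concept label, standing in for a real semantic lexicon lookup.
CONCEPTS = {
    "football": "sport", "soccer": "sport", "tennis": "sport",
    "stock": "finance", "bond": "finance", "bank": "finance",
}

def features(text):
    """Bag of tokens, expanded with the concept of each known token."""
    tokens = text.lower().split()
    bag = Counter(tokens)
    for tok in tokens:
        concept = CONCEPTS.get(tok)
        if concept:
            bag["concept:" + concept] += 1
    return bag

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(text, training, k=3):
    """Label a text by similarity-weighted vote of its k nearest samples."""
    query = features(text)
    scored = sorted(
        ((cosine(query, features(t)), label) for t, label in training),
        reverse=True,
    )
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]

# Toy training set (assumption, for illustration only).
TRAIN = [
    ("football match tonight", "sports"),
    ("tennis final result", "sports"),
    ("stock prices fall", "finance"),
    ("bank raises rates", "finance"),
]

print(knn_classify("soccer score", TRAIN))  # sports
```

The query "soccer score" shares no word with any training sample, so a plain bag-of-words vector would have zero similarity to all of them. The added `concept:sport` feature is what links it to the sports samples.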
Source
《郑州轻工业学院学报(自然科学版)》
CAS
2012, No. 6, pp. 1-4 (4 pages)
Journal of Zhengzhou University of Light Industry: Natural Science
Funding
Zhengzhou Municipal Key Science and Technology Program project (0910SGYG23259-3)
Keywords
short text
text classification
semantic expansion
KNN classification algorithm