Abstract
The traditional three-dimensional document vector model (3DVM) does not use the time factor of news reports, so its representation of a report is inaccurate, which in turn degrades the clustering of news reports. This study adds a time factor to the 3DVM and proposes a four-dimensional document vector model (4DVM) for representing news reports. The k-means algorithm is then used for unsupervised clustering of the reports. Validation on example data shows that the clustering algorithm combining 4DVM with k-means outperforms both 3DVM combined with k-means and the vector space model (VSM) combined with k-means.
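As a rough illustration of the idea, the sketch below clusters a few toy reports with plain k-means after adding a time coordinate to each vector. The abstract does not say how the 4DVM encodes time, so everything here is an assumption made for illustration: the helper `add_time_feature`, its `weight` parameter, the min-max scaling of publication days, and the sample data. It is not the authors' model.

```python
from math import sqrt

def add_time_feature(term_vector, day, day_min, day_max, weight=1.0):
    """Append a min-max scaled, weighted time coordinate (hypothetical encoding)."""
    span = (day_max - day_min) or 1.0
    return list(term_vector) + [weight * (day - day_min) / span]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=100):
    """Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data (made up): (term weights, publication day). The wording of the
# reports is nearly identical, but they were published around two dates.
reports = [
    ([1.0, 0.0], 1),
    ([0.9, 0.1], 100),
    ([0.9, 0.1], 2),
    ([1.0, 0.0], 99),
]
days = [day for _, day in reports]

term_only = [vec for vec, _ in reports]
with_time = [add_time_feature(vec, day, min(days), max(days)) for vec, day in reports]

labels_term_only, _ = kmeans(term_only, k=2)
labels_with_time, _ = kmeans(with_time, k=2)
print("term vectors only:", labels_term_only)
print("with time feature:", labels_with_time)
```

In this toy setting, the term-only vectors are grouped purely by wording, while the time-extended vectors are grouped by publication date. A larger `weight` lets time dominate the distance; a smaller one lets the term weights dominate.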
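Funding
Interim result of the 2011 Tibet Autonomous Region Undergraduate Innovative Experiment Training Program project "Design and Implementation of a Tibetan Text Sentiment Orientation Analysis System Based on the Vector Space Model"
Project No.: 2011CX051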