Abstract
As data accumulates on the Internet, effectively extracting the required information from massive text collections has become an important problem in text mining. This paper studies the application of two clustering algorithms, K-means and K-medoids, to text mining. Experiments using manually judged evaluation criteria compare the two algorithms on the accuracy and recall of the clustered documents. The results show that K-medoids exceeds K-means by more than 5 percentage points in both accuracy and recall, and that K-medoids is more robust to outliers and noisy data.
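The robustness claim can be illustrated with a minimal sketch of the center-update step alone, not the paper's experimental setup. K-means represents a cluster by the mean of its members, while K-medoids picks the member with the smallest total distance to the others. The 1-D values, the injected outlier, and the helper name `medoid` are assumptions made purely for illustration.

```python
from statistics import mean


def medoid(points):
    """Return the member of `points` with the smallest total distance to all members."""
    return min(points, key=lambda p: sum(abs(p - q) for q in points))


# Hypothetical 1-D feature values for one cluster; 100.0 is an injected outlier.
cluster = [1.0, 2.0, 3.0, 4.0, 100.0]

mean_center = mean(cluster)      # K-means style center: pulled toward the outlier
medoid_center = medoid(cluster)  # K-medoids style center: stays on a typical member

print(mean_center, medoid_center)  # prints: 22.0 3.0
```

The mean lands far from every ordinary member, whereas the medoid remains an actual data point inside the dense part of the cluster, which is the usual intuition behind K-medoids tolerating noisy documents better.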
Source
《微计算机信息》 (Microcomputer Information)
2011, Issue 2, pp. 168-169, 65 (3 pages in total)
Control & Automation