Web search result clustering based on K-center and information gain (cited by: 1)
Abstract: Building on the K-center problem and information gain, this paper applies a modified furthest-point-first (FPF) algorithm to Web search result clustering and proposes a cluster labeling method that makes the resulting clusters easier for users to understand. It also gives a model for evaluating clustering quality, using NMI as a simple and intuitive criterion. The approach is evaluated on results returned by an actual Web search engine and compared with Lingo and K-means. The results indicate that the algorithm strikes a good balance between clustering quality and speed, making it better suited to Web search result clustering.
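Authors: DING Zhenguo (丁振国), MENG Xing (孟星)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list; 2008, No. 10, pp. 3125-3127 (3 pages)
Funding: National "863" Program of China (2004AA1Z2520); Research Project on Military Network Interconnection and Information Security Strategy (2006QB1069)
Keywords: Web document; clustering; cluster labeling; K-center; information gain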
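
The abstract names the furthest-point-first (FPF) heuristic for the K-center problem but does not describe it. As a rough illustration only, the classic FPF procedure (reference 5) can be sketched as below. This is not the modified version proposed in the paper, whose details are not given in this record. The function names, the Euclidean distance, and the toy 2-D points are assumptions made for the example; a real snippet clusterer would presumably use a text-oriented distance over term vectors instead.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (illustrative choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fpf_centers(points, k, dist=euclidean, first=0):
    """Classic furthest-point-first heuristic for K-center (hypothetical helper).

    Returns (centers, assign): `centers` lists the indices of the chosen
    points, and `assign[i]` is the position in `centers` of the center
    closest to point i.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    centers = [first]
    # nearest[i] = distance from point i to its closest center so far
    nearest = [dist(p, points[first]) for p in points]
    assign = [0] * len(points)

    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        far = max(range(len(points)), key=nearest.__getitem__)
        centers.append(far)
        # Reassign only the points that are closer to the new center.
        for i, p in enumerate(points):
            d = dist(p, points[far])
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = len(centers) - 1
    return centers, assign


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 9)]
    print(fpf_centers(pts, 3))
```

Each round costs one pass over the points, so choosing k centers among n points takes O(nk) distance computations. That low cost is consistent with the abstract's emphasis on clustering speed.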

References (10)

  • 1. CNNIC. The 19th Statistical Report on Internet Development in China [R]. 2007.
  • 2. ZAMIR O, ETZIONI O. Web document clustering: a feasibility demonstration [C]//Proc of the 19th International ACM SIGIR Conference on Research and Development of Information Retrieval. 1998: 46-54.
  • 3. OSINSKI S. An algorithm for clustering of Web search result [D]. Poland: Poznan University of Technology, 2003.
  • 4. OSINSKI S, WEISS D. Conceptual clustering using Lingo algorithm: evaluation on open directory project data [C]//Proc of the 5th Conference on Intelligent Information Processing and Web Mining. 2004: 369-377.
  • 5. GONZALEZ T F. Clustering to minimize the maximum intercluster distance [J]. Theoretical Computer Science, 1985, 38(2/3): 293-306.
  • 6. COVER T M, THOMAS J A. Elements of information theory [M]. New York: Wiley, 1991.
  • 7. GERACI F. A scalable algorithm for high quality clustering of Web snippets [C]//Proc of the 21st ACM Symposium on Applied Computing. New York: ACM Press, 2006: 1058-1062.
  • 8. FEDER T, GREENE D. Optimal algorithms for approximate clustering [C]//Proc of the 20th ACM Symposium on Theory of Computing. New York: ACM Press, 1988: 434-444.
  • 9. GONZALEZ T F. Clustering to minimize the maximum intercluster distance [J]. Theoretical Computer Science, 1985, 38(2/3): 293-306.
  • 10. ODP [EB/OL]. http://www.dmoz.org/.
