
A Hierarchical Search Result Clustering Method

Cited by: 15
Abstract: Search result clustering helps users quickly browse the documents returned by a search engine. Traditional clustering techniques are ill-suited to this task because they cannot generate clusters with meaningful, readable labels. To improve hierarchical search result clustering and help users locate relevant documents quickly, a label-based clustering method is adopted. A multi-feature integrated model that combines document frequency (DF), query log, and query context features is developed to extract base-cluster labels. The extracted labels are used to build base clusters, from which a base-cluster relation graph is constructed. A hierarchical cluster structure is then generated from this graph using the graph-based cluster algorithm (GBCA). A test bed is set up for evaluation, with P@N and F-Measure used to assess the extracted labels and the distribution of documents across clusters. The experiments show that the integrated label-extraction model is effective: the more features are used, the higher the P@N. GBCA also outperforms the STC and Snaket clustering methods in both cluster label extraction and F-Measure.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2008, No. 3, pp. 542-547 (6 pages)
Funding: National Basic Research Program of China (973 Program) (2004CB318109, 2007CB311100)
Keywords: information retrieval; search result clustering; hierarchical clustering; text clustering; clustering
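
The abstract names P@N and F-Measure as its evaluation measures but does not give formulas. As a rough illustration only, the sketch below uses the common textbook definitions: P@N as the fraction of the top N extracted labels judged relevant, and a class-size-weighted best-match F-measure for clusters. The function names, the data layout, and the exact F-Measure variant are assumptions, not taken from the paper.

```python
def precision_at_n(ranked_labels, relevant_labels, n):
    """P@N: fraction of the top-n extracted labels that were judged relevant.

    ranked_labels: list of labels, best first (hypothetical extractor output).
    relevant_labels: set of labels a human judge marked as relevant.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for label in ranked_labels[:n] if label in relevant_labels)
    return hits / n


def clustering_f_measure(clusters, classes):
    """Class-size-weighted F-measure (assumed variant, not the paper's own).

    clusters: dict mapping cluster label -> set of document ids.
    classes:  dict mapping reference class -> set of document ids.
    For each reference class, take the best F1 over all clusters,
    then average those best scores weighted by class size.
    """
    total = sum(len(docs) for docs in classes.values())
    if total == 0:
        return 0.0
    score = 0.0
    for class_docs in classes.values():
        best = 0.0
        for cluster_docs in clusters.values():
            overlap = len(class_docs & cluster_docs)
            if overlap == 0:
                continue  # F1 is zero when nothing is shared
            precision = overlap / len(cluster_docs)
            recall = overlap / len(class_docs)
            best = max(best, 2 * precision * recall / (precision + recall))
        score += len(class_docs) / total * best
    return score


if __name__ == "__main__":
    # Toy data, invented for illustration.
    labels = ["jaguar car", "jaguar animal", "click here"]
    judged = {"jaguar car", "jaguar animal"}
    print(precision_at_n(labels, judged, 3))

    clusters = {"car": {1, 2, 3}, "animal": {4}}
    classes = {"car": {1, 2}, "animal": {3, 4}}
    print(clustering_f_measure(clusters, classes))
```

Because label-based clusters may overlap, the sketch weights by reference class size rather than by cluster size; other weightings are equally plausible.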

References (10)

  • [1] Hiroyuki Toda, Ryoji Kataoka. A search result clustering method using informatively named entities [C]. In: Proc of the ACM Workshop on Web Information and Data Management. New York: ACM Press, 2005. 81-86.
  • [2] M A Hearst, J O Pedersen. Reexamining the cluster hypothesis: Scatter/gather on retrieval results [C]. In: Proc of the ACM Special Interest Group on Information Retrieval Conf. New York: ACM Press, 1996. 76-84.
  • [3] F Giannotti, M Nanni, D Pedreschi. Webcat: Automatic categorization of Web search results [C]. In: Proc of the 11th Italian Symp on Advanced Database Systems. Italy: Rubbettino Editore, 2003. 507-518.
  • [4] Wang Zhimei, Zhang Junlin, Li Qiushan. Research and implementation of a fast clustering method for Web search results [J]. Computer Engineering and Design, 2004, 25(12): 2231-2233.
  • [5] Oren Zamir, Oren Etzioni. Web document clustering: A feasibility demonstration [C]. In: Proc of the ACM Special Interest Group on Information Retrieval Conf. New York: ACM Press, 1998. 46-54.
  • [6] Florian Beil, Martin Ester, Xiaowei Xu. Frequent term-based text clustering [C]. In: Proc of the 8th ACM Int'l Conf on Knowledge Discovery and Data Mining. New York: ACM Press, 2002. 436-442.
  • [7] H Zeng, Q He, Z Chen, et al. Learning to cluster Web search results [C]. In: Proc of the ACM Special Interest Group on Information Retrieval Conf. New York: ACM Press, 2004. 210-217.
  • [8] Paolo Ferragina, Antonio Gulli. A personalized search engine based on Web-Snippet hierarchical clustering [C]. In: Proc of the 14th Int'l Conf on World Wide Web. New York: ACM Press, 2005. 801-810.
  • [9] X He, H Zha, C Ding, et al. Web document clustering using hyperlink structures [R]. Department of Computer Science and Engineering, Pennsylvania State University, Tech Rep: CSE-01-006, 2001.
  • [10] Jianbo Shi, Jitendra Malik. Normalized cuts and image segmentation [J]. IEEE Trans on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888-905.

