
Suffix Tree Based Label Generation Method for Web Search Results Clustering
Abstract: Clustering web search results helps users find the information they need quickly. Many clustering methods and systems are in wide use, but most fall short of expectations because the cluster labels they generate are not readable or descriptive enough for users to identify the right cluster quickly. This paper proposes a different approach: generate good labels before clustering, and then cluster the search results on the basis of those labels. Given the ranked list of snippets returned by a web search engine for a query, we first build a single suffix tree over all the snippets. We then score every phrase in the tree by leveraging its statistical and syntactic information, rank the phrases in descending order of score, and select the top k phrases as the cluster labels. Finally, we form the clusters by assigning each snippet to the category of every label it contains. Experimental results show that the method works well for clustering web search results.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2009, No. 2, pp. 83-88 (6 pages)
Funding: National Eleventh Five-Year Plan project (2006BAH02A10); National 863 Program project (2008AA01Z421)
Keywords: computer application; Chinese information processing; search results clustering; cluster label generation; suffix tree
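
The label-first pipeline described in the abstract (index all snippets in one suffix structure, score the phrases, keep the top k, assign snippets to the labels they contain) can be illustrated with a minimal sketch. Several things here are assumptions rather than the paper's method: the code uses an uncompressed, depth-limited word-level suffix trie in place of a true suffix tree, and the `score` function (document frequency times phrase length) is a hypothetical stand-in, since the abstract does not give the actual statistical and syntactic scoring formula. The names `build_suffix_trie`, `collect_phrases`, `top_labels`, `max_len`, and `min_docs` are likewise illustrative.

```python
class Node:
    """One node of a word-level suffix trie."""
    def __init__(self):
        self.children = {}   # word -> Node
        self.docs = set()    # ids of snippets containing the phrase ending here


def build_suffix_trie(snippets, max_len=4):
    """Insert every word-level suffix of every snippet, truncated to max_len words."""
    root = Node()
    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_len]:
                node = node.children.setdefault(word, Node())
                node.docs.add(doc_id)
    return root


def collect_phrases(node, prefix=()):
    """Yield (phrase, snippet ids) for every node below the given one."""
    for word, child in node.children.items():
        phrase = prefix + (word,)
        yield phrase, child.docs
        yield from collect_phrases(child, phrase)


def score(phrase, docs):
    # Hypothetical scoring: favors phrases that are both frequent and long.
    # The paper's real formula also uses syntactic information.
    return len(docs) * len(phrase)


def top_labels(snippets, k=2, min_docs=2):
    """Return the k best-scoring phrases, each with the snippets assigned to it."""
    root = build_suffix_trie(snippets)
    candidates = [
        (score(phrase, docs), phrase, docs)
        for phrase, docs in collect_phrases(root)
        if len(docs) >= min_docs
    ]
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [(" ".join(phrase), sorted(docs)) for _, phrase, docs in candidates[:k]]
```

Because each trie node already records which snippets contain its phrase, the final clustering step needs no second pass: a label's cluster is simply the snippet set stored at its node, so a snippet can belong to more than one cluster. This sketch does not remove overlapping labels such as a phrase and its own sub-phrase, which a practical system would need to handle.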

