Abstract
To address the mismatch between query keywords and the terms used in documents in information retrieval, this paper proposes a query expansion algorithm that combines association rules with a clustering algorithm. In the first stage, the algorithm mines association rules from the top N documents of the initial query result, keeps the rules that contain initial query terms to build a rule base, and selects from it the K words with the highest association degree to the query terms as expansion words. These words are combined with the initial query to form a new query, which is then executed again. In the second stage, the algorithm clusters the results of the new query, computes a final relevance score for each document in the results, and re-ranks the documents by that score. Experimental results show that the algorithm achieves better retrieval performance than either the association rule algorithm or the clustering algorithm used alone.
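The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the retrieval model, the rule-mining parameters, the definition of "association degree", the clustering method, or the final-relevance formula. The sketch therefore assumes a toy overlap score (`score`), mines only single-term rules `q -> t` (a simplification of full Apriori-style mining), takes a term's association degree to be its highest rule confidence, clusters with single-pass Jaccard clustering, and blends each document's own score with its cluster's mean score via a weight `alpha`. All function names, thresholds, and the toy collection are hypothetical.

```python
from collections import Counter


def score(doc, query):
    """Hypothetical retrieval score: fraction of query terms the document contains."""
    return len(doc & query) / len(query)


def retrieve(collection, query):
    """Ids of documents with a non-zero score, best first (ties broken by id)."""
    scores = {i: score(d, query) for i, d in enumerate(collection)}
    return sorted((i for i in scores if scores[i] > 0), key=lambda i: (-scores[i], i))


def mine_rules(docs, query, min_support=0.3, min_conf=0.5):
    """Stage 1: mine rules q -> t (q in the query, t outside it) from the top-N docs.

    Only single-term antecedents and consequents are mined, a simplification
    of full association rule mining. Returns {(q, t): confidence}.
    """
    n = len(docs)
    df = Counter(t for d in docs for t in d)
    rules = {}
    for q in query:
        if df[q] == 0:
            continue
        co = Counter(t for d in docs if q in d for t in d if t not in query)
        for t, c in co.items():
            if c / n >= min_support and c / df[q] >= min_conf:
                rules[(q, t)] = c / df[q]
    return rules


def pick_expansion_terms(rules, k):
    """Pick the K terms with the highest association degree (assumed: max confidence)."""
    degree = {}
    for (_, t), conf in rules.items():
        degree[t] = max(degree.get(t, 0.0), conf)
    return sorted(degree, key=lambda t: (-degree[t], t))[:k]


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_pass_clusters(ids, collection, threshold):
    """Join the first cluster whose seed document is similar enough, else start a new one."""
    clusters = []
    for i in ids:
        for c in clusters:
            if jaccard(collection[c[0]], collection[i]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def expand_and_rerank(collection, query, n=3, k=2, alpha=0.7, threshold=0.3):
    # Stage 1: expand the query using rules mined from the top-N initial results.
    top_n = [collection[i] for i in retrieve(collection, query)[:n]]
    expansion = pick_expansion_terms(mine_rules(top_n, query), k)
    new_query = set(query) | set(expansion)
    # Stage 2: cluster the new results and compute a final relevance per document.
    hits = retrieve(collection, new_query)
    final = {}
    for c in single_pass_clusters(hits, collection, threshold):
        cluster_score = sum(score(collection[i], new_query) for i in c) / len(c)
        for i in c:
            # Hypothetical blend of the document's own score and its cluster's mean score.
            final[i] = alpha * score(collection[i], new_query) + (1 - alpha) * cluster_score
    ranking = sorted(final, key=lambda i: (-final[i], i))
    return expansion, ranking, final


# Toy collection: each document is a set of index terms (illustrative data only).
docs = [
    {"query", "expansion", "retrieval"},
    {"query", "expansion", "feedback"},
    {"query", "retrieval", "index"},
    {"cluster", "kmeans", "centroid"},
    {"kmeans", "centroid", "expansion"},
]
expansion, ranking, final = expand_and_rerank(docs, {"query"})
print(expansion, ranking)  # prints ['expansion', 'retrieval'] [0, 1, 2, 4]
```

Here the initial query `{"query"}` retrieves documents 0 to 2; the rules `query -> expansion` and `query -> retrieval` both pass the thresholds, so those two words are added. Document 4 is then retrieved through the expansion word "expansion" but falls into its own cluster, while documents 0 to 2 share a cluster and receive a higher cluster score. Swapping in a different clustering method or final-relevance formula only requires changing `single_pass_clusters` or the blend inside `expand_and_rerank`.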
Source
Computer Engineering (计算机工程)
Indexed in: CAS, CSCD, Peking University Core Journals
2009, No. 6, pp. 44-46 (3 pages)
Funding
National Natural Science Foundation of China (Grant No. 60702056)
Keywords
information retrieval
query expansion
association rules
clustering algorithm