Abstract
Information retrieval systems need to consider not only the relevance of retrieved documents but also their diversity and novelty. To address the problem of search result diversification, this paper examines the applicability of data fusion methods to that task. For the linear combination method in particular, the weight assignment strategy for component systems is re-examined. Taking into account both the effectiveness of each component retrieval system and the dissimilarity among component systems, a simple and convenient method based on set coverage rate is proposed for calculating dissimilarity, so that a linear combination using the resulting weights can improve the diversity of the fused results. Experiments were carried out on three groups of top-ranked results submitted to the TREC Web diversity task. The results show that, in terms of diversity, the proposed data fusion methods consistently improve retrieval performance and outperform the best component retrieval system, indicating that data fusion remains a useful approach to improving performance for diversity, as it has been for relevance.
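The weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything specific here is an assumption made for illustration: the names `overlap_rate`, `dissimilarity`, and `fuse` are hypothetical, dissimilarity is taken as one minus the mean fraction of a system's documents also returned by each other system, the weight is effectiveness multiplied by dissimilarity, and scores are assumed to be already normalized to [0, 1]. It is not the paper's method, only one plausible instance of a weighted linear combination.

```python
from typing import Dict, List

# One run: document id -> normalized score in [0, 1] (assumed pre-normalized).
Run = Dict[str, float]


def overlap_rate(a: Run, b: Run) -> float:
    """Fraction of the documents in run `a` that also appear in run `b`."""
    if not a:
        return 0.0
    return len(a.keys() & b.keys()) / len(a)


def dissimilarity(i: int, runs: List[Run]) -> float:
    """Assumed formula: 1 minus the mean overlap of run i with every other run."""
    others = [r for j, r in enumerate(runs) if j != i]
    if not others:
        return 1.0
    mean_overlap = sum(overlap_rate(runs[i], r) for r in others) / len(others)
    return 1.0 - mean_overlap


def fuse(runs: List[Run], effectiveness: List[float]) -> List[str]:
    """Weighted linear combination: fused(d) = sum_i w_i * score_i(d).

    The weight w_i = effectiveness_i * dissimilarity_i is an illustrative
    assumption, not the weighting published in the paper.
    """
    weights = [effectiveness[i] * dissimilarity(i, runs) for i in range(len(runs))]
    fused: Dict[str, float] = {}
    for w, run in zip(weights, runs):
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + w * score
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    example_runs = [
        {"d1": 1.0, "d2": 0.5},
        {"d1": 0.9, "d3": 0.6},
        {"d4": 1.0, "d5": 0.6},
    ]
    # Hypothetical per-system effectiveness values (e.g. from training queries).
    print(fuse(example_runs, [0.4, 0.5, 0.3]))
```

In this sketch a system that retrieves documents no other system returns keeps its full effectiveness as its weight, while systems that largely duplicate each other are down-weighted, which is one way to favor diversity in the fused list.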
Source
《山东大学学报(理学版)》
CAS
CSCD
Peking University Core Journals
2015, Issue 1, pp. 31-36 (6 pages)
Journal of Shandong University(Natural Science)
Funding
Jiangsu Specially-Appointed Professor Program (1221170037, 1221170038)
Jiangsu University Specially-Appointed Professor Start-up Fund (1281170024, 1281170025)
Keywords
data fusion
search result diversification
linear combination
weight assignment