Research on Robust Retrieval Ranking Based on Query Performance Prediction

Robust Ranking via Query Performance Prediction

Abstract: Information retrieval aims to satisfy users' information needs from massive information resources. In recent years, many techniques have improved average effectiveness over traditional simple models while often ignoring robustness: they degrade the results of many queries, which significantly hurts user satisfaction. This paper proposes a query performance prediction method based on learning to rank to obtain robust ranking results. For each query, the performance of the ranked lists generated by several models is predicted, and the list with the best predicted performance is shown to the user. A series of comparative experiments is conducted on three standard LETOR benchmark datasets: OHSUMED, MQ2008, and MSLR-WEB10K. The results show that, with the classic BM25 model as the baseline and compared with LambdaMART, one of the state-of-the-art retrieval models, the proposed method significantly reduces the number of queries whose performance is hurt while achieving nearly the same improvement in average effectiveness, and thus shows good robustness.
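The selection step and the robustness comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper trains its predictor with learning to rank, whereas here the predictor is just a callable supplied by the caller, and `select_ranking`, `robustness_counts`, `toy_predict`, and the scores in `TOY_PREDICTIONS` are hypothetical names and made-up values used only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Ranking = List[str]  # document IDs, best first

# Hypothetical predictor signature: (query, model name, ranked list) -> predicted quality.
Predictor = Callable[[str, str, Ranking], float]


def select_ranking(
    query: str, candidates: Dict[str, Ranking], predict: Predictor
) -> Tuple[str, Ranking]:
    """Return the model name and ranked list with the highest predicted quality."""
    best = max(candidates, key=lambda name: predict(query, name, candidates[name]))
    return best, candidates[best]


def robustness_counts(
    baseline_scores: Sequence[float], system_scores: Sequence[float]
) -> Tuple[int, int]:
    """Count queries hurt and improved relative to the baseline.

    Both sequences hold one evaluation score per query (for example NDCG),
    in the same query order.
    """
    if len(baseline_scores) != len(system_scores):
        raise ValueError("score lists must cover the same queries")
    pairs = list(zip(baseline_scores, system_scores))
    hurt = sum(s < b for b, s in pairs)
    improved = sum(s > b for b, s in pairs)
    return hurt, improved


# Made-up predicted scores standing in for a trained prediction model.
TOY_PREDICTIONS = {
    ("q1", "BM25"): 0.42,
    ("q1", "LambdaMART"): 0.61,
    ("q2", "BM25"): 0.55,
    ("q2", "LambdaMART"): 0.30,
}


def toy_predict(query: str, model: str, ranking: Ranking) -> float:
    """Look up a made-up predicted score; the ranked list itself is ignored here."""
    return TOY_PREDICTIONS[(query, model)]


if __name__ == "__main__":
    candidates = {"BM25": ["d1", "d2"], "LambdaMART": ["d2", "d1"]}
    for q in ("q1", "q2"):
        print(q, select_ranking(q, candidates, toy_predict))
    print(robustness_counts([0.5, 0.4, 0.3], [0.6, 0.4, 0.1]))
```

In this sketch a query counts as hurt when its score falls below the baseline score for the same query; the paper's exact criterion may differ.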

Authors: Xue Yuanhai (薛源海), Yu Xiaoming (俞晓明), Liu Yue (刘悦), Guan Feng (关峰), Cheng Xueqi (程学旗)

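Affiliations: CAS Key Laboratory of Network Data Science and Technology; Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences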

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 5, pp. 169-175, 186 (8 pages)

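Funding: National Natural Science Foundation of China (61232010, 61173008); National "863" High-Tech Research and Development Program (2012AA011003, 2013AA01A213); National "973" Key Basic Research Development Program (2012CB316303, 2013CB329602); Ministry of Science and Technology "Eleventh Five-Year" Science and Technology Program (2012BAH39B02, 2012BAH46B04)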

Keywords: query performance prediction; learning to rank; robust ranking

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
