Multi-Engine Fusion Based on Mixture Model (Chinese title: 基于混合模型的多搜索引擎融合)

Cited by: 1
Abstract: To improve the performance of combined retrieval systems, this paper proposes a multi-search-engine fusion method based on a mixture model. The method models the relevance-score distributions of relevant and non-relevant documents with a Gaussian and an exponential density function, respectively. A mixture-model-based algorithm then normalizes the relevance scores, estimates the relevance scores of non-relevant documents, and merges the scores. The approach therefore accounts for both the difference between the score distributions of relevant and non-relevant documents and users' ratings of each member search engine's performance. Experiments show that the method's average precision is, on average, 37.8% higher than that of the member search engines, and clearly higher than that of three commonly used fusion methods: Sum-CombSUM, Sum-CombMNZ, and Standard-CombSUM.
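
The modelling idea in the abstract can be illustrated with a minimal Python sketch. Only the model form (a Gaussian for relevant scores, an exponential for non-relevant scores) comes from the abstract. Everything else here is an assumption made for illustration and is not taken from the paper: fitting by a plain EM loop, the crude initialization, using the posterior probability of relevance as the normalized score, the weighted-sum merge, and the names `fit_score_mixture`, `relevance_posterior`, `fuse_engines`, and `engine_weights` (a hypothetical stand-in for the user ratings of each engine). Scores are assumed to be non-negative.

```python
import math


def _normal_pdf(s, mu, sigma):
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def relevance_posterior(s, params):
    """P(relevant | score s) under w*Normal(mu, sigma) + (1-w)*Exp(lam)."""
    w, mu, sigma, lam = params
    g = w * _normal_pdf(s, mu, sigma)             # relevant component
    e = (1.0 - w) * lam * math.exp(-lam * s)      # non-relevant component
    return g / (g + e) if g + e > 0.0 else 0.0


def fit_score_mixture(scores, iters=200):
    """Fit the two-component mixture with EM (illustrative, not the paper's code)."""
    n = len(scores)
    # Assumed initialization: treat the top quarter of scores as relevant.
    top = sorted(scores)[-max(2, n // 4):]
    mu = sum(top) / len(top)
    sigma = max(math.sqrt(sum((s - mu) ** 2 for s in top) / len(top)), 1e-3)
    lam = 1.0 / max(sum(scores) / n, 1e-9)
    w = 0.25
    for _ in range(iters):
        # E-step: responsibility of the Gaussian (relevant) component.
        resp = [relevance_posterior(s, (w, mu, sigma, lam)) for s in scores]
        r_sum = sum(resp)
        if r_sum < 1e-9 or n - r_sum < 1e-9:
            break  # one component has vanished; keep the last estimate
        # M-step: weighted maximum-likelihood updates.
        w = r_sum / n
        mu = sum(r * s for r, s in zip(resp, scores)) / r_sum
        var = sum(r * (s - mu) ** 2 for r, s in zip(resp, scores)) / r_sum
        sigma = max(math.sqrt(var), 1e-3)
        lam = (n - r_sum) / max(
            sum((1.0 - r) * s for r, s in zip(resp, scores)), 1e-9)
    return w, mu, sigma, lam


def fuse_engines(engine_results, engine_weights):
    """Merge {engine: {doc_id: raw_score}} into one ranked list of doc ids.

    Each engine's raw scores are mapped to posteriors of relevance, then
    summed with a per-engine weight (an assumed fusion rule). A document an
    engine did not return contributes nothing for that engine.
    """
    fused = {}
    for engine, results in engine_results.items():
        params = fit_score_mixture(list(results.values()))
        for doc, score in results.items():
            fused[doc] = fused.get(doc, 0.0) + (
                engine_weights[engine] * relevance_posterior(score, params))
    return sorted(fused, key=fused.get, reverse=True)
```

Because each engine's scores are converted to probabilities before merging, engines that report scores on different scales become comparable. One known property of this model form is that the Gaussian tail decays faster than the exponential tail, so the posterior can drop again for scores far above the Gaussian mean; a practical implementation would need to handle that case.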
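Authors: Huo Hua (霍华), Feng Boqin (冯博琴)
Source: Journal of Xi'an Jiaotong University (《西安交通大学学报》), 2005, No. 4, pp. 356-359 (4 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: National High Technology Research and Development Program of China (Grant No. 2003AA1Z2610).
Keywords: relevance score; mixture model; search engine fusion; score combination; computer simulation; iterative methods; maximum likelihood estimation; normal distribution; parameter estimation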

References (7)

  • 1. Xiang Rihua, Wang Runsheng. A range image segmentation algorithm based on Gaussian mixture models [J]. Journal of Software, 2003, 14(7): 1250-1257. Cited by: 54
  • 2. Savoy J. Combining multiple strategies for effective monolingual and cross-language retrieval [J]. Information Retrieval, 2004, 7(1): 121-148. Cited by: 1
  • 3. Montague M, Aslam J. Relevance score normalization for metasearch [A]. The ACM Tenth International Conference on Information and Knowledge Management, Atlanta, USA, 2001. Cited by: 1
  • 4. Sever H, Tolun M R. Comparison of normalization techniques for metasearch [A]. Advances in Information Systems (ADVIS), Izmir, Turkey, 2002. Cited by: 1
  • 5. Manmatha R, Rath T, Feng F. Modeling score distributions for combining the outputs of search engines [A]. 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, USA, 2001. Cited by: 1
  • 6. McLachlan G, Peel D. Finite mixture models [M]. New York: John Wiley and Sons Inc, 2001. 40-51. Cited by: 1
  • 7. Arampatzis A, van Hameren A. Maximum likelihood estimation for filtering thresholds [A]. 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, USA, 2001. Cited by: 1

Second-level references (9)

  • 1. Jiang XY, Bunke H. Edge detection in range images based on scan line approximation. Computer Vision and Image Understanding, 1999, 73(2): 183-199. Cited by: 1
  • 2. Hoover A, Jean-Baptiste G, Jiang XY, Flynn PJ, Bunke H, Goldgof DB, Bowyer K, Eggert DW, Fitzgibbon A, Fisher RB. An experimental comparison of range image segmentation algorithms. IEEE Transactions on PAMI, 1996, 18(7): 673-689. Cited by: 1
  • 3. Hoffman R, Jain AK. Segmentation and classification of range images. IEEE Transactions on PAMI, 1987, 9(5): 608-620. Cited by: 1
  • 4. Bilmes JA. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. 1998. http://ssli.ee.washington.edu/people/bihnes/mypapers/em.ps.gz. Cited by: 1
  • 5. Redner RA, Walker HF. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 1984, 26(2): 195-239. Cited by: 1
  • 6. Hoover A, Powell MW. Range image segmentation comparison project. Department of Computer Science and Engineering, University of South Florida, 1996. http://marathon.csee.usf.edu/range/seg-comp/SegComp.html. Cited by: 1
  • 7. Raftery AE. Approximate Bayes factors and accounting for model uncertainty in generalized linear models. Technical Report, 1993. http://www.stat.washington.edu/www/research/reports/1993/tr255.ps. Cited by: 1
  • 8. Fraley C, Raftery AE. How many clusters? Which clustering method? Answers via model-based cluster analysis. Technical Report, 1998. http://www.stat.washington.edu/www/research/reports/1998/tr329.ps. Cited by: 1
  • 9. Buhmann JM. Data clustering and learning. 2002. http://www-dbv.cs.uni-bonn.de,/pdf/buhmann.hobtann02.pdf. Cited by: 1

Co-citing documents: 53

Co-cited documents (10)

  • 1. Cacheda F, Plachouras V, Ounis I. A case study of distributed information retrieval architectures to index one terabyte of text [J]. Information Processing & Management, 2005, 41(5): 1141-1161. Cited by: 1
  • 2. Croft W B. Combining approaches to information retrieval [M] // Croft W B. Advances in Information Retrieval. [S.l.]: Kluwer Academic Publishers, 2002: 1-36. Cited by: 1
  • 3. Montague M, Aslam J. Relevance score normalization for metasearch [C] // Proc of the ACM Tenth International Conference on Information and Knowledge Management, 2001, 11: 427-433. Cited by: 1
  • 4. Manmatha R, Rath T, Feng F. Modeling score distributions for combining the outputs of search engines [C] // Proc of the 24th ACM SIGIR Conf on Research and Development in Information Retrieval, 2001, 9: 267-275. Cited by: 1
  • 5. Sever H, Tolun M R. Comparison of normalization techniques for metasearch [C] // Yakhno T. LNCS 2457: ADVIS 2002: 133-143. Cited by: 1
  • 6. McLachlan G, Peel D. Finite mixture models [M]. New York: John Wiley & Sons, Inc, 2001: 40-51. Cited by: 1
  • 7. Dankmar B, Seidel W, Garel B. Advances in mixture models [J]. Computational Statistics & Data Analysis, 2006, 11: 151-159. Cited by: 1
  • 8. Arampatzis A, van Hameren A. Maximum likelihood estimation for filtering thresholds [C] // Proc of the 24th ACM SIGIR Conf on Research and Development in Information Retrieval, Sept 2001: 185-293. Cited by: 1
  • 9. Si L, Callan J. A semisupervised learning method to merge search engine results [J]. ACM Transactions on Information Systems, 2003, 21(4): 457-491. Cited by: 1
  • 10. Xiang Rihua, Wang Runsheng. A range image segmentation algorithm based on Gaussian mixture models [J]. Journal of Software, 2003, 14(7): 1250-1257. Cited by: 54

Citing documents: 1
