Abstract
To improve the performance of combined retrieval systems, a multi-search-engine fusion method based on a mixture model is proposed. The method models the relevance-score distributions of relevant and non-relevant documents with a Gaussian and an exponential density function, respectively. A mixture-model-based algorithm normalizes the relevance scores and estimates the scores of non-relevant documents, after which the scores are combined. The approach thus accounts both for the difference in score distribution between relevant and non-relevant documents and for users' evaluations of the member search engines' retrieval performance. Experimental results show that the method's average precision is 37.8% higher on average than that of the member search engines, and clearly higher than that of three commonly used fusion methods: Sum-CombSUM, Sum-CombMNZ, and Standard-CombSUM.
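The abstract names the ingredients (a Gaussian density for relevant documents, an exponential density for non-relevant ones, score normalization, and score combination) but not the algorithm's details. The following is a minimal sketch under stated assumptions, not the authors' procedure: it assumes the mixture is fitted by expectation-maximization, that the normalized score is the posterior probability of relevance, and that fusion is a weighted CombSUM in which the weights stand in for user-rated engine performance. The names `posterior`, `fit_mixture`, and `comb_sum` are hypothetical.

```python
import math
import random


def posterior(s, w, mu, sigma, lam):
    """Posterior probability that score s came from the Gaussian (relevant) component."""
    g = w * math.exp(-0.5 * ((s - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    e = (1.0 - w) * lam * math.exp(-lam * s)
    return g / (g + e) if g + e > 0 else 0.0


def fit_mixture(scores, iters=200):
    """EM fit of p(s) = w * Normal(mu, sigma) + (1 - w) * Exponential(lam).

    Assumption: scores are non-negative and relevant documents tend to score high.
    """
    n = len(scores)
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    # Crude initialization: Gaussian near the top scores, exponential over the bulk.
    w, mu, sigma, lam = 0.5, max(scores), max(std, 1e-3), 1.0 / max(mean, 1e-9)
    for _ in range(iters):
        # E-step: responsibility of the Gaussian component for each score.
        resp = [posterior(s, w, mu, sigma, lam) for s in scores]
        total = max(sum(resp), 1e-9)
        # M-step: closed-form maximum-likelihood updates.
        w = total / n
        mu = sum(r * s for r, s in zip(resp, scores)) / total
        var = sum(r * (s - mu) ** 2 for r, s in zip(resp, scores)) / total
        sigma = max(math.sqrt(var), 1e-3)
        lam = max(n - total, 1e-9) / max(sum((1 - r) * s for r, s in zip(resp, scores)), 1e-9)
    return w, mu, sigma, lam


def comb_sum(runs, weights):
    """Weighted CombSUM: sum each document's normalized scores across engines.

    runs: one dict per engine mapping doc id -> normalized score.
    weights: one weight per engine (assumed here to encode user-rated performance).
    A document missing from a run contributes 0 for that engine (an assumption).
    """
    fused = {}
    for run, weight in zip(runs, weights):
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic raw scores: 900 non-relevant (exponential), 100 relevant (Gaussian).
    raw = [random.expovariate(10.0) for _ in range(900)]
    raw += [random.gauss(0.7, 0.1) for _ in range(100)]
    params = fit_mixture(raw)
    print("fitted (w, mu, sigma, lam):", params)
    print("normalized score for raw 0.8:", posterior(0.8, *params))
```

In this sketch each engine's raw scores would be fitted separately, mapped through `posterior` to a common [0, 1] scale, and only then passed to `comb_sum`, so that engines with different raw score ranges become comparable.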
Source
《西安交通大学学报》
EI
CAS
CSCD
PKU Core (北大核心)
2005, Issue 4, pp. 356-359 (4 pages)
Journal of Xi'an Jiaotong University
Funding
Supported by the National High Technology Research and Development Program of China (2003AA1Z2610).
Keywords
relevance score
mixture model
search engine fusion
score combination
Computer simulation
Iterative methods
Maximum likelihood estimation
Normal distribution
Parameter estimation