Abstract
Web page ranking is one of the core technologies of a search engine. A campus network search engine (CNSE) is a search engine whose content is the web pages within a single campus network. Because a campus network differs in character from both the Internet and an intranet, this study focuses on how various heuristics affect the ranking of campus network pages and on the role of rank aggregation in a CNSE. The experiments show that the impact of each heuristic depends on the data set, and that the recall obtained after rank aggregation varies widely across combinations of heuristics, from 2% to 48%. Every combination with recall above 35% includes at least four heuristics; that is, ranking in a CNSE needs to combine the rankings produced by several heuristics, chosen according to the data set. Rank aggregation is therefore one of the techniques necessary for a CNSE to achieve good recall and robust results. A ranking module based on rank aggregation has been deployed in the Tsinghua University campus network search engine.
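As a rough illustration of what rank aggregation over several heuristics can look like, the sketch below fuses per-heuristic rankings and measures recall on the fused list. The abstract does not say which aggregation algorithm the authors used; Borda count is swapped in here only because it is a simple, widely known method. The function names, the three example heuristics, and the page IDs are all hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Fuse several ranked lists of page IDs into one list (Borda count).

    Assumption: this is NOT necessarily the paper's algorithm, only a
    common stand-in for rank aggregation.
    """
    # Candidate set: every page that appears in at least one ranking.
    pages = set()
    for ranking in rankings:
        pages.update(ranking)
    n = len(pages)

    scores = defaultdict(int)
    for ranking in rankings:
        for position, page in enumerate(ranking):
            # The top page earns n points, the next n - 1, and so on.
            # Pages absent from this ranking earn nothing from it.
            scores[page] += n - position

    # Highest score first; ties broken by page ID for determinism.
    return sorted(pages, key=lambda p: (-scores[p], p))


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant pages found in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for page in ranked[:k] if page in relevant)
    return hits / len(relevant)


# Hypothetical rankings produced by three heuristics for one query.
by_title = ["a", "b", "c"]
by_anchor_text = ["b", "a", "d"]
by_link_count = ["b", "c", "a"]

fused = borda_aggregate([by_title, by_anchor_text, by_link_count])
recall = recall_at_k(fused, relevant={"b", "c"}, k=2)
```

Comparing `recall_at_k` across different subsets of the input rankings is one way to reproduce the kind of per-combination comparison the abstract reports, under the assumptions stated above.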
Source
《大连理工大学学报》
EI
CAS
CSCD
PKU Core (北大核心)
2005, Issue z1, pp. 257-260 (4 pages)
Journal of Dalian University of Technology
Funding
Supported by the National Natural Science Foundation of China (90104002)