Abstract
Top-k query processing is one of the most widely used techniques in search engines. It returns the k results that best match the user's need from massive data, and during execution it avoids scoring most of the irrelevant documents. Although Top-k query processing greatly improves query performance, its slow-start problem has not been effectively resolved. To address this, the paper first extracts static Top-k information from the inverted index and then dynamically computes an initial threshold for the specific query terms. On this basis, and in combination with the MaxScore and WAND algorithms, it proposes a fast-start Top-k query processing algorithm. Experimental results show that the method effectively resolves the slow-start problem and performs well.
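The abstract does not spell out how the initial threshold is derived, so the following is only a minimal sketch of one common way to do it, not the paper's actual method. It assumes non-negative, additive per-term scores; under that assumption the k-th highest score in any single query term's posting list is a safe lower bound on the k-th best total score, so a pruning algorithm such as MaxScore or WAND can start from it instead of from zero. The function names and the toy index are hypothetical.

```python
import heapq

# Toy inverted index (hypothetical data): term -> {doc_id: per-term score}.
# Assumption: scores are non-negative and a document's total score is the
# sum of its per-term scores.

def kth_highest_scores(index, k):
    """Static step: for each term, record the k-th highest posting score.

    Terms with fewer than k postings contribute no usable bound (0.0).
    """
    static = {}
    for term, postings in index.items():
        top = heapq.nlargest(k, postings.values())
        static[term] = top[-1] if len(top) == k else 0.0
    return static

def initial_threshold(static, query_terms):
    """Dynamic step: the largest per-term bound among the query terms.

    A pruning algorithm seeded with this value must still admit documents
    whose score equals the threshold, otherwise ties could be lost.
    """
    return max((static.get(t, 0.0) for t in query_terms), default=0.0)

def exhaustive_top_k(index, query_terms, k):
    """Reference scorer without pruning, used only to check the bound."""
    totals = {}
    for term in query_terms:
        for doc, score in index.get(term, {}).items():
            totals[doc] = totals.get(doc, 0.0) + score
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])

if __name__ == "__main__":
    index = {
        "top": {1: 2.0, 2: 1.5, 3: 0.4, 4: 0.3},
        "query": {2: 1.0, 3: 0.9, 5: 0.2},
    }
    k = 2
    static = kth_highest_scores(index, k)
    theta = initial_threshold(static, ["top", "query"])
    results = exhaustive_top_k(index, ["top", "query"], k)
    # The seeded threshold never exceeds the true k-th best score.
    assert theta <= results[-1][1]
    print(theta, results)
```

The per-term bounds depend only on the index, so they can be computed once offline; only the maximum over the query terms is taken at query time.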
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal (北大核心)
2017, Issue 5, pp. 163-170 (8 pages)
Funding
Natural Science Foundation of Hunan Province (2016JJ2007)