Abstract
Effective analysis of query structure in Web search engines helps to better understand users' query intent and to improve retrieval effectiveness. This paper proposes a simple and efficient method for query structure analysis based on pointwise mutual information (PMI). The method comprises an offline training algorithm based on MapReduce and a bottom-up online algorithm for building the query tree. Experiments show that the approach achieves high segmentation speed while maintaining segmentation quality comparable to that of other approaches. Experiments on the TREC WT10g dataset further validate the method and show that it improves retrieval results in terms of MAP, P@5, and P@10.
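The abstract does not give the algorithm's details, so the following is only a minimal sketch of how PMI-driven, bottom-up query tree construction could look. Everything in it is an assumption made for illustration: the names `train_counts`, `pmi`, and `build_query_tree` are hypothetical, the counting runs on a single machine as a stand-in for the MapReduce training job, the score between two adjacent subtrees is taken to be the PMI of the two words at their shared boundary, and the toy query log is invented.

```python
import math
from collections import Counter

# Invented toy query log, used only to illustrate the sketch.
TOY_LOG = [
    "new york hotels",
    "new york times",
    "cheap hotels",
    "cheap flights new york",
    "times square hotels",
]

def train_counts(queries):
    """Count words and adjacent word pairs (single-machine stand-in for offline training)."""
    unigrams, bigrams = Counter(), Counter()
    for query in queries:
        tokens = query.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def pmi(left, right, unigrams, bigrams):
    """Pointwise mutual information log(p(x, y) / (p(x) * p(y))) of an adjacent word pair."""
    joint = bigrams[(left, right)]
    if joint == 0:
        return float("-inf")  # unseen pair: weakest possible association
    p_xy = joint / sum(bigrams.values())
    p_x = unigrams[left] / sum(unigrams.values())
    p_y = unigrams[right] / sum(unigrams.values())
    return math.log(p_xy / (p_x * p_y))

def first_word(node):
    """Leftmost word of a subtree (a leaf is a plain string)."""
    return node if isinstance(node, str) else first_word(node[0])

def last_word(node):
    """Rightmost word of a subtree."""
    return node if isinstance(node, str) else last_word(node[1])

def build_query_tree(query, unigrams, bigrams):
    """Greedily merge the adjacent pair with the highest boundary PMI until one tree remains."""
    nodes = query.split()
    if not nodes:
        return None
    while len(nodes) > 1:
        scores = [
            pmi(last_word(nodes[i]), first_word(nodes[i + 1]), unigrams, bigrams)
            for i in range(len(nodes) - 1)
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        nodes[best:best + 2] = [(nodes[best], nodes[best + 1])]
    return nodes[0]

unigrams, bigrams = train_counts(TOY_LOG)
print(build_query_tree("cheap new york hotels", unigrams, bigrams))
```

On this toy log the pair "new york" has the highest PMI and is merged first, so it ends up as the innermost node of the tree. A real system would likely need smoothing for unseen pairs and a different scoring rule for multi-word subtrees; the sketch leaves both out.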
Source
《中文信息学报》
CSCD
Peking University Core Journals
2012, No. 5, pp. 33-39 (7 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (60903139, 60873243, 60933005)
National 863 Program key projects (2010AA012502, 2010AA012503)