Abstract
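In recent years, microblog search has become a research hotspot in information retrieval. Related studies show that microblog search is time-sensitive. Starting from different time-sensitivity assumptions, for example that newer documents are more relevant, or that documents closer to a peak moment are more relevant, existing work has derived a variety of retrieval models, all of which improve retrieval effectiveness to some extent. However, these assumptions come mainly from observation: they are intuitive simplifications that capture only one aspect of how time affects microblog ranking. This paper verifies that microblog search has complex time-sensitive characteristics that such intuitive, simplified assumptions cannot describe accurately. On this basis, it proposes a learning to rank model for time-sensitive microblog search (TLTR), built with machine learning from the temporal and textual features of microblog posts. For temporal features, it examines both query-dependent global temporal features and local temporal features of query-document pairs. Experimental results on the TREC Microblog Track 2011 and 2012 data sets show that TLTR outperforms other existing time-sensitive microblog ranking methods.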
Microblog search has become an active research topic in information retrieval in recent years. Related work shows that most queries in microblog search are time-sensitive. To address this, many existing methods have been proposed based on different time-sensitive assumptions, such as "the newer a document is, the more important it is" or "the closer a document is to the peak point, the more important it is". All of these methods improve retrieval effectiveness to some extent. However, it is hard to reduce the role of time in microblog ranking to a single straightforward assumption such as these. In this paper, our study of the temporal distributions of relevant documents across different queries shows that the role of time in ranking is complex, so simple, straightforward assumptions are not accurate. To tackle this problem, we propose a time-sensitive learning to rank model (TLTR) trained on the temporal and entity evidence of query-document pairs. For temporal features, we extract both global features of the query and local features of each query-document pair. Experimental results on the TREC Microblog Track 2011-2012 data sets show that TLTR significantly improves retrieval effectiveness over existing time-aware ranking models.
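The abstract contrasts two fixed temporal assumptions (recency and peak proximity) with a learned model over query-level (global) and query-document (local) temporal features. The sketch below only illustrates that distinction; the paper's actual TLTR feature set is not given here. It assumes timestamps in hours, an exponential decay for "newer is more relevant", and a Gaussian kernel for "closer to the peak is more relevant". All function names, feature names (`peak_time`, `peak_ratio`, `spread`, local density) and parameters (`rate`, `sigma`, `bin_hours`, `window`) are hypothetical.

```python
import math
from collections import Counter
from statistics import pstdev

# Hypothetical illustration only; this is NOT the TLTR feature set from the paper.
# All timestamps are assumed to be in hours.

def recency_score(doc_time, query_time, rate=0.01):
    """'Newer is more relevant': exponential decay in the post's age."""
    age = max(0.0, query_time - doc_time)
    return math.exp(-rate * age)

def peak_score(doc_time, peak_time, sigma=12.0):
    """'Closer to the peak is more relevant': Gaussian kernel around the peak."""
    return math.exp(-((doc_time - peak_time) ** 2) / (2 * sigma ** 2))

def global_features(times, bin_hours=6.0):
    """Query-level features computed from the timestamps of an initial result set."""
    bins = Counter(int(t // bin_hours) for t in times)
    # Busiest time bin; ties are broken toward the later bin.
    peak_bin, peak_count = max(bins.items(), key=lambda kv: (kv[1], kv[0]))
    return {
        "peak_time": (peak_bin + 0.5) * bin_hours,  # centre of the busiest bin
        "peak_ratio": peak_count / len(times),      # how bursty the query is
        "spread": pstdev(times),                    # overall temporal dispersion
    }

def local_features(doc_time, query_time, g, times, window=6.0):
    """Query-document feature vector, combining local scores with the query's global ones."""
    nearby = sum(1 for t in times if abs(t - doc_time) <= window)
    return [
        recency_score(doc_time, query_time),
        peak_score(doc_time, g["peak_time"]),
        nearby / len(times),  # density of retrieved posts around this post
        g["peak_ratio"],
        g["spread"],
    ]

if __name__ == "__main__":
    times = [1.0, 2.0, 30.0, 31.0, 31.5, 33.0]  # toy timestamps of retrieved posts
    g = global_features(times)
    print(g["peak_time"])                        # 33.0
    print(local_features(31.0, 48.0, g, times))
```

Under a fixed assumption, either `recency_score` or `peak_score` alone would decide the ranking. In a learning to rank setup, both become features, and the query-level features give the learner a way to weight them differently for bursty and non-bursty queries. The resulting vectors could be passed to any pairwise or listwise learner.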
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University core journal (北大核心)
2015, No. 4, pp. 175-182 (8 pages)
Funding
Strategic Priority Research Program of the Chinese Academy of Sciences, project XDA06030200
Keywords
time-sensitive
learning to rank
microblog search