Abstract
The TREC 2004 Robust track introduced a new requirement: the retrieval topics must be ranked from easiest to hardest. To address this requirement, this paper proposes a topic difficulty model based on word sense ambiguity. A word sense distribution dictionary is built from WordNet and the Brown corpus that accompanies it. The words in each topic are then assigned to seven classes according to their degree of ambiguity, and topic difficulty is measured by the average word easiness. Experimental results show that the model has some predictive power. Finally, the model is used to predict the relative difficulty of the 250 topics of the TREC 2004 Robust track.
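The scoring idea summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the seven class boundaries, the easiness values, or the dictionary format, so everything concrete here is an assumption made for illustration: the toy dictionary `SENSE_DIST`, the equal-width binning on dominant-sense share in `ambiguity_class`, the linear easiness scale, and the treatment of unknown words are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical sense-distribution dictionary: word -> relative frequency of
# each sense. The paper derives its dictionary from WordNet and the Brown
# corpus; the numbers below are invented for illustration only.
SENSE_DIST: Dict[str, List[float]] = {
    "piracy": [1.0],
    "bank": [0.6, 0.3, 0.1],
    "run": [0.2, 0.2, 0.2, 0.2, 0.2],
}

NUM_CLASSES = 7  # the abstract states seven ambiguity classes


def ambiguity_class(word: str) -> int:
    """Return an ambiguity class in 0..6 (0 = least ambiguous).

    Assumption: classes are equal-width bins over 1 - (share of the
    dominant sense). The paper's actual class definitions are not given
    in the abstract.
    """
    dist = SENSE_DIST.get(word)
    if not dist:
        return 0  # assumption: unknown words are treated as unambiguous
    ambiguity = 1.0 - max(dist) / sum(dist)
    return min(NUM_CLASSES - 1, int(ambiguity * NUM_CLASSES))


def word_easiness(word: str) -> float:
    """Map the class to an easiness score in [0, 1] (assumed linear scale)."""
    return 1.0 - ambiguity_class(word) / (NUM_CLASSES - 1)


def topic_easiness(words: List[str]) -> float:
    """Average word easiness over the words of one topic."""
    if not words:
        raise ValueError("a topic must contain at least one word")
    return sum(word_easiness(w) for w in words) / len(words)


def rank_topics(topics: Dict[str, List[str]]) -> List[str]:
    """Order topic ids from easiest to hardest by average word easiness."""
    return sorted(topics, key=lambda t: topic_easiness(topics[t]), reverse=True)
```

Under these assumptions, a topic made only of single-sense words receives the highest easiness, and a topic whose words have evenly spread senses is ranked as harder.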
Source
Journal of Tsinghua University (Science and Technology) [清华大学学报(自然科学版)]
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2005, Issue S1, pp. 1833-1837 (5 pages)
Funding
National High-Tech R&D Program of China (863 Program) (2002AA117010-8)
National Natural Science Foundation of China (60203022)
Keywords
information retrieval
Text REtrieval Conference (TREC)
robust track
topic difficulty
sense distribution