Abstract
Relevance evaluation models are an important research topic in information retrieval. The basic models are the Boolean model, the vector space model, and the probabilistic model. The latter two are used in many retrieval systems, but neither takes into account where a query term occurs in a document when measuring similarity. Some studies have used information such as HTML tags, but their schemes for assigning the weighting coefficients are not ideal. To address these problems, this paper proposes a relevance evaluation model based on weighted term frequency, the Weighted Term Frequency Model (WTFM), in which the weighting coefficients are learned by a simulated annealing algorithm. Experimental results show that introducing the weighting coefficients improves the quality of the system's relevance evaluation.
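The abstract does not give the WTFM formula, so the following is only a minimal sketch of the general idea, under stated assumptions: each occurrence of a query term is weighted by the document field it appears in, and the field weights are tuned by simulated annealing against a ranking objective. The field names (`title`, `heading`, `body`), the mean-reciprocal-rank objective, the cooling schedule, and every function name here are hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical document fields; the paper's actual position categories are not stated in the abstract.
FIELDS = ("title", "heading", "body")


def weighted_tf(term, doc, weights):
    """Sum the term's count in each field, scaled by that field's weight."""
    return sum(weights[f] * doc.get(f, "").lower().split().count(term) for f in FIELDS)


def score(query, doc, weights):
    """Score a document as the sum of weighted term frequencies over the query terms."""
    return sum(weighted_tf(t, doc, weights) for t in query.lower().split())


def ranking_quality(weights, queries):
    """Mean reciprocal rank of the one relevant document per query (assumed objective).

    Ties are resolved pessimistically: every other document that scores at
    least as high as the relevant one is counted as ranked above it.
    """
    total = 0.0
    for query, docs, relevant_idx in queries:
        scores = [score(query, d, weights) for d in docs]
        target = scores[relevant_idx]
        rank = 1 + sum(1 for i, s in enumerate(scores) if i != relevant_idx and s >= target)
        total += 1.0 / rank
    return total / len(queries)


def anneal(queries, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for field weights with simulated annealing (maximizing ranking quality)."""
    rng = random.Random(seed)
    current = {f: 1.0 for f in FIELDS}  # start from plain, unweighted term frequency
    current_q = ranking_quality(current, queries)
    best, best_q = dict(current), current_q
    temp = t0
    for _ in range(steps):
        # Propose a neighbor by perturbing one weight; weights are kept non-negative.
        candidate = dict(current)
        field = rng.choice(FIELDS)
        candidate[field] = max(0.0, candidate[field] + rng.uniform(-0.5, 0.5))
        q = ranking_quality(candidate, queries)
        # Always accept improvements; accept worse moves with a probability that shrinks as temp falls.
        if q >= current_q or rng.random() < math.exp((q - current_q) / temp):
            current, current_q = candidate, q
            if current_q > best_q:
                best, best_q = dict(current), current_q
        temp *= cooling
    return best, best_q


# Toy data (invented for illustration): the relevant document mentions the
# query term once in its title, the other one repeats it in its body.
TOY_QUERIES = [
    (
        "annealing",
        [
            {"title": "simulated annealing", "body": "a search method"},
            {"title": "cooking", "body": "annealing annealing glass"},
        ],
        0,
    )
]

if __name__ == "__main__":
    learned_weights, quality = anneal(TOY_QUERIES)
    print(learned_weights, quality)
```

With uniform weights the toy query ranks the body-heavy document first; the search can only improve on that baseline because the best weights seen so far are always kept.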
Source
《计算机仿真》
CSCD
2008, No. 1, pp. 134-137, 239 (5 pages in total)
Computer Simulation
Funding
National Natural Science Foundation of China (60672056)
Microsoft Research Asia funded project (06120809)
Keywords
Information retrieval
Relevance evaluation
Simulated annealing algorithm