Abstract
This paper proposes an information retrieval algorithm based on classification and key phrase extraction. By using text classification and information extraction to assist retrieval, the algorithm avoids the high time complexity and low precision of vector space model algorithms. Because traditional information retrieval performance measures cannot effectively assess how well the retrieved documents are ranked, the paper also introduces a ranking error rate to evaluate the ranking of retrieval results. Experimental results show that, compared with the TF*IDF algorithm and a classification-based interactive retrieval algorithm, the proposed algorithm achieves faster queries, higher precision, and a lower ranking error rate.
Source
Journal of System Simulation (《系统仿真学报》)
CAS
CSCD
2004, No. 5, pp. 1009-1012, 1016 (5 pages)
Funding
National Natural Science Foundation of China (60272051)