Abstract
Lucene is a high-performance, scalable full-text information retrieval library written entirely in Java that can easily be embedded in applications to add indexing and search capabilities. This paper analyzes the structure of Lucene's index files and its search ranking algorithm, and discusses the vector space model (VSM) that Lucene uses to compute the relevance between documents and query terms. Finally, an experiment is used to verify the index-building process and to examine how indexing performance can be improved.
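As a rough illustration of the vector space model mentioned in the abstract, the sketch below represents a query and two documents as term-frequency vectors and ranks the documents by cosine similarity. It is a minimal example under stated assumptions, not the paper's method or Lucene's actual scoring code: the class `VsmSketch` and its methods are hypothetical names, the weights are raw term counts only, and Lucene's own ranking applies further factors that are not shown here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it does not use or mirror the Lucene API.
public class VsmSketch {

    // Build a sparse term-frequency vector: term -> number of occurrences.
    public static Map<String, Double> termFrequencies(List<String> tokens) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1.0, Double::sum);
        }
        return tf;
    }

    // Euclidean length of a sparse vector.
    public static double norm(Map<String, Double> v) {
        double sumOfSquares = 0.0;
        for (double w : v.values()) {
            sumOfSquares += w * w;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine of the angle between two sparse vectors; 0 if either is empty.
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        // Toy, made-up token lists used only to exercise the functions.
        Map<String, Double> query = termFrequencies(Arrays.asList("lucene", "index"));
        Map<String, Double> doc1 = termFrequencies(Arrays.asList("lucene", "index", "index"));
        Map<String, Double> doc2 = termFrequencies(Arrays.asList("java", "search"));

        System.out.println("doc1 score: " + cosine(query, doc1));
        System.out.println("doc2 score: " + cosine(query, doc2));
    }
}
```

In this toy setup a document that shares terms with the query scores higher than one that shares none, which is the basic ranking intuition behind the VSM.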
Source
《河南工程学院学报(自然科学版)》
2008, No. 4, pp. 40-43 (4 pages)
Journal of Henan University of Engineering: Natural Science Edition