Abstract
With the rapid development of the Internet and information technology, retrieving the required information efficiently and quickly from massive volumes of data has become an important problem. The wide adoption of search engines based on Lucene offers one possible way to achieve efficient search. This paper studies the system architecture and retrieval principles of Lucene, proposes a method for parsing the text of PDF documents, and implements the extraction of text data from PDF documents.
Source
《信息与电脑(理论版)》
2009, No. 11, p. 66 (1 page in total)
China Computer & Communication