Abstract
With the rapid growth of online information resources, traditional retrieval systems struggle to provide efficient and reliable service when handling massive data sets. To address this, the paper designs and implements a distributed full-text retrieval system based on Solr. The system uses a web crawler to fetch web pages and stores the fetched content as text files. It then uses the Solr indexing module to build indexes in parallel on multiple computer nodes, which effectively speeds up index construction. The cluster is managed through ZooKeeper, and the search module is designed to be distributed, which effectively improves retrieval performance. Finally, a user-friendly interface is provided. The system currently runs stably at million-scale data volumes and has strong practical value.
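The two distributed steps described in the abstract, building indexes in parallel across nodes and searching across them, can be illustrated with a small sketch. This is not the paper's implementation and does not use the Solr or ZooKeeper APIs: `Shard`, `route`, `build_index`, and `distributed_search` are hypothetical names, threads stand in for separate machines, raw term frequency stands in for Solr's ranking, and cluster coordination is not modeled at all.

```python
import heapq
import zlib
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Hypothetical index node: an inverted index over its share of the documents."""

    def __init__(self):
        # term -> Counter mapping doc_id to term frequency
        self.postings = defaultdict(Counter)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term][doc_id] += 1

    def search(self, query, k):
        # Score = summed term frequency (a stand-in for a real ranking function).
        scores = Counter()
        for term in query.lower().split():
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        return heapq.nlargest(k, ((score, doc_id) for doc_id, score in scores.items()))


def route(doc_id, num_shards):
    """Assign a document to a shard with a stable hash (assumed routing rule)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_index(docs, num_shards):
    """Partition documents by shard, then index each partition concurrently."""
    shards = [Shard() for _ in range(num_shards)]
    batches = [[] for _ in range(num_shards)]
    for doc_id, text in docs.items():
        batches[route(doc_id, num_shards)].append((doc_id, text))

    def index_batch(pair):
        shard, batch = pair
        for doc_id, text in batch:
            shard.add(doc_id, text)

    # Each worker writes only to its own shard, so no locking is needed.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        list(pool.map(index_batch, zip(shards, batches)))
    return shards


def distributed_search(shards, query, k=3):
    """Scatter the query to every shard, then merge the per-shard top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: shard.search(query, k), shards)
        merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [(doc_id, score) for score, doc_id in merged]
```

Because every document lives in exactly one shard, asking each shard for its own top `k` and merging those lists yields the same top `k` as a single global index would under this scoring rule.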
Source
《计算机与现代化》
2012, No. 11, pp. 171-176 (6 pages)
Computer and Modernization