Abstract
To locate required information quickly in massive data sets and to overcome the retrieval difficulties caused by differences between structured and semi-structured data, this paper proposes a full-text search architecture based on Lucene. Following the design principles of distributed parallel computing, search tasks are dispatched to the child-node servers, which carry out the search in parallel, and the root-node server then aggregates their results. Each child-node server is itself designed for parallel processing. Verification experiments show that the proposed architecture reduces search time by more than 55% compared with a traditional full-text search architecture.
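The dispatch-and-aggregate flow described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation and does not use Lucene: the in-memory `SHARDS`, the term-count scoring, and the function names `search_shard` and `root_search` are all hypothetical stand-ins, and threads in one process stand in for separate child-node servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards standing in for the child-node indexes.
SHARDS = [
    {"doc1": "lucene full-text search", "doc2": "parallel computing"},
    {"doc3": "distributed full-text search with lucene", "doc4": "root node"},
]


def search_shard(shard, query):
    """Child node (assumed behavior): score documents by query-term counts."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in shard.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append((score, doc_id))
    return hits


def root_search(shards, query, top_k=3):
    """Root node: send the query to every shard in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, query), shards))
    merged = [hit for partial in partials for hit in partial]
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(merged, key=lambda hit: (-hit[0], hit[1]))[:top_k]
```

Calling `root_search(SHARDS, "lucene search")` returns the matching documents from both shards in one ranked list. In a real deployment each shard would be a separate server holding its own index, and the merge step would need scores that are comparable across shards.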
Source
《南京理工大学学报》
EI
CAS
CSCD
PKU Core (北大核心)
2015, Issue 6, pp. 692-697 (6 pages)
Journal of Nanjing University of Science and Technology
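Funding
National Natural Science Foundation of China (61272419)
Jiangsu Province Future Network Prospective Research Project (BY2013095-3-02)
Jiangsu Province Industry-University-Research Prospective Project (BY2014089, BY2013039, BY2013037)
Lianyungang International Cooperation Project (CH1304)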
Keywords
full-text search
distributed parallel computing
child-node servers
root-node servers