Abstract
Traditional general-purpose search engines suffer from low recall and low precision. To address this, the normal crawler in such an engine can be replaced with a theme crawler to build a theme search engine, which provides information retrieval services and meets user information needs that keep expanding as information grows more diverse. This paper studies key technologies in theme crawler design, including correlativity (relevance) analysis, concept analysis, and link analysis, and presents a series of solutions through experiments. The results show that the theme crawler achieves higher precision than the normal crawler, is feasible and practical, and is helpful for designing theme search engines and collecting theme-specific information.
Source
《华南理工大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2004, Issue z1, pp. 137-141 (5 pages)
Journal of South China University of Technology(Natural Science Edition)
Keywords
search engine
theme crawler
correlativity analysis
concept analysis
link analysis
information retrieval