
A Resolution Scheme of the Theme Crawler (主题爬虫的解决方案). Cited by: 10
Abstract: Traditional general-purpose search engines suffer from low recall and low precision. To address this, a theme crawler can replace the ordinary crawler to build a theme search engine that provides information retrieval services and meets users' information needs, which keep expanding as information becomes more diverse. This paper studies key techniques in theme crawler design, including correlativity (relevance) analysis, concept analysis, and link analysis, and presents a series of solutions derived through experiments. The results show that the theme crawler achieves higher precision than the ordinary crawler, is feasible and practical, and is helpful for designing theme search engines and collecting theme-specific information.
Source: Journal of South China University of Technology (Natural Science Edition) (《华南理工大学学报(自然科学版)》), 2004, Issue z1, pp. 137-141 (5 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心).
Keywords: search engine; theme crawler; correlativity (relevance) analysis; concept analysis; link analysis; information collection (given as "information retrieval" in the English keywords)

