
A Deep Web Sources Focused Crawler's Crawling Strategy (一种Deep Web聚焦爬虫爬行策略)

Cited by: 2
Abstract: Large-scale integration of Deep Web data sources is an effective way to make Deep Web information accessible to users, and the Deep Web crawler is a key component of that integration. This paper proposes a crawling strategy for a focused crawler targeting the structured Deep Web. The topical relevance of a Deep Web data source is judged by analyzing the features of its query interface. When evaluating the importance of a link, the strategy considers both the topical relevance of the page content and the information associated with the link itself. Experiments show that the method is effective.
Source: Microelectronics & Computer (《微电子学与计算机》), CSCD, Peking University Core Journal, 2009, No. 8, pp. 117-120 (4 pages)
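Funding: National Natural Science Foundation of China (60673092); 2008 Jiangsu Province Major Science and Technology Support and Independent Innovation Project (BE2008044); Open Fund of the Jiangsu Province Engineering Technology R&D Center for Modern Enterprise Informatization Application Support Software (SX200904)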
Keywords: structured Deep Web data sources; focused crawler; decision tree classifier
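
The abstract states that link importance combines the topical relevance of the page content with information about the link, but it does not give the formula. The following is only a minimal sketch of that idea under stated assumptions: the topic vocabulary `TOPIC_TERMS`, the term-overlap `relevance` measure, the weight `alpha`, and the `Frontier` class are all hypothetical and are not taken from the paper.

```python
import heapq
import re

# Hypothetical topic vocabulary (assumption: a "books" domain).
TOPIC_TERMS = {"book", "author", "isbn", "title", "publisher"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(text, topic_terms=TOPIC_TERMS):
    """Fraction of tokens that are topic terms; 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in topic_terms for t in tokens) / len(tokens)


def link_score(page_text, anchor_text, url, alpha=0.5):
    """Weighted mix of page relevance and link relevance.

    alpha is an assumed weight, not a value from the paper.
    """
    link_info = anchor_text + " " + url
    return alpha * relevance(page_text) + (1 - alpha) * relevance(link_info)


class Frontier:
    """Crawl frontier that returns the highest-scoring unseen URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated score.
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score
```

A crawler built this way would score each outgoing link when a page is fetched, push it onto the frontier, and always visit the highest-scoring link next. The paper's decision-tree classification of query interfaces is a separate step that this sketch does not cover.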
