
Research on Web Forum Data Source Incremental Crawler (cited by: 5)
Abstract: Web forum sites have complex structures and rapidly changing content. To address this, the paper proposes an incremental information collection algorithm for forums that combines site map reconstruction with an estimate of how frequently each page is updated. The crawler uses the site map to select effective links and uses each page's update frequency to set how often that page is crawled. Experimental results indicate that the method is effective.
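Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2010, No. 9, pp. 285-287 (3 pages)
Funding: National Natural Science Foundation of China (60673092); 2008 Jiangsu Province Major Science and Technology Support and Independent Innovation Fund (BE2008044)
Keywords: Web forum; incremental crawler; site map; Poisson model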
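
The abstract says that crawl frequency follows page update frequency, and the keywords name a Poisson model, but this record does not give the paper's formulas. The sketch below is therefore only an illustration under stated assumptions: it uses the bias-reduced change-rate estimator from Cho and Garcia-Molina, "Estimating Frequency of Change" (reference 4 in the list below), and a hypothetical scheduling rule that revisits a page once the probability that it has changed reaches a chosen threshold. The function names, the threshold, and the clamping bounds are all made up for this example.

```python
import math


def estimate_change_rate(n_visits: int, n_changed: int, interval_hours: float) -> float:
    """Estimate a page's Poisson change rate (changes per hour).

    Assumes the page was visited n_visits times at a fixed interval and
    that n_changed of those visits detected a change. Uses the estimator
    -log((unchanged + 0.5) / (n_visits + 0.5)) / interval, which stays
    finite even when every visit detected a change.
    """
    if n_visits <= 0 or not 0 <= n_changed <= n_visits:
        raise ValueError("need n_visits > 0 and 0 <= n_changed <= n_visits")
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    unchanged = n_visits - n_changed
    ratio = -math.log((unchanged + 0.5) / (n_visits + 0.5))
    return ratio / interval_hours


def revisit_interval(rate: float, change_prob: float = 0.5,
                     min_hours: float = 0.25, max_hours: float = 168.0) -> float:
    """Hours to wait before the next crawl (hypothetical scheduling rule).

    Under a Poisson model, P(changed within t) = 1 - exp(-rate * t).
    Solving for t at P = change_prob gives t = -ln(1 - change_prob) / rate.
    The result is clamped to [min_hours, max_hours].
    """
    if not 0.0 < change_prob < 1.0:
        raise ValueError("change_prob must be strictly between 0 and 1")
    if rate <= 0.0:
        return max_hours  # no observed changes: fall back to the slowest schedule
    hours = -math.log(1.0 - change_prob) / rate
    return min(max(hours, min_hours), max_hours)


if __name__ == "__main__":
    # A busy thread list (changed on 18 of 20 hourly visits) versus
    # an old thread (changed on 1 of 20 hourly visits).
    for name, changed in (("board index", 18), ("old thread", 1)):
        rate = estimate_change_rate(20, changed, interval_hours=1.0)
        print(name, round(rate, 3), round(revisit_interval(rate), 2))
```

With this rule, pages that change on most visits are scheduled close to the minimum interval, while pages that rarely change drift toward the maximum, which matches the behaviour the abstract describes in general terms.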

References (7)

  • 1. Cai Rui, Yang Jiangming, Lai Wei, et al. iRobot: An Intelligent Crawler for Web Forums[C]//Proc. of the 17th International World Wide Web Conference. Beijing, China: [s.n.], 2008. (cited by: 1)
  • 2. Li Kui, Cheng Xueqi, Guo Yan, Zhang Kai. Dynamic Web Page Collection in WWW Forums[J]. Computer Engineering (计算机工程), 2007, 33(6): 80-82. (in Chinese; cited by: 11)
  • 3. Cho J, Garcia-Molina H. The Evolution of the Web and Implications for an Incremental Crawler[C]//Proc. of the 26th Int'l Conf. on Very Large Data Bases. Cairo, Egypt: [s.n.], 2000. (cited by: 1)
  • 4. Cho J, Garcia-Molina H. Estimating Frequency of Change[J]. ACM Trans. on Internet Technology, 2003, 3(3): 256-290. (cited by: 1)
  • 5. Brewington B, Cybenko G. Keeping up with the Changing Web[J]. IEEE Computer, 2000, 33(5): 52-58. (cited by: 1)
  • 6. Zheng Shuyi. Joint Optimization of Wrapper Generation and Template Detection[C]//Proc. of the 13th ACM Int'l Conf. on Knowledge Discovery and Data Mining. San Jose, CA, USA: [s.n.], 2007. (cited by: 1)
  • 7. Cho J, Garcia-Molina H. Synchronizing a Database to Improve Freshness[C]//Proc. of 2000 ACM SIGMOD International Conference on Management of Data. Dallas, Texas, USA: [s.n.], 2000. (cited by: 1)

Second-level references (5)

  • 1. Cho J, Garcia-Molina H, Page L. Efficient Crawling Through URL Ordering[C]//Proceedings of the 7th International World Wide Web Conference. 1998: 161-172. (cited by: 1)
  • 2. Najork M, Wiener J L. Breadth-first Crawling Yields High-quality Pages[C]//Proceedings of the 10th International World Wide Web Conference. 2001: 114-118. (cited by: 1)
  • 3. Li Jun, Furuse K, Yamaguchi K. Focused Crawling by Exploiting Anchor Text Using Decision Tree[C]//Proceedings of the 14th International World Wide Web Conference. 2005: 1190-1191. (cited by: 1)
  • 4. Castillo C. Effective Web Crawling[D]. University of Chile, 2004. (cited by: 1)
  • 5. Brin S, Page L. The Anatomy of a Large-scale Hypertextual Web Search Engine[J]. Computer Networks and ISDN Systems, 1998, 30(1-7): 107-117. (cited by: 1)

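Documents sharing references with this paper: 10

Documents co-cited with this paper: 67

Citing documents: 5

Second-level citing documents: 23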
