A Webpage Extraction Method Supporting Visual Configuration of DOM Template
Cited by: 4
Abstract: To improve the efficiency and precision of web page acquisition, this paper proposes an extraction method that supports visual template configuration. The user clicks on elements in the target page, and the method automatically generates an extraction template based on their DOM paths. The paper describes the principle of DOM-path-based extraction in detail, examines the key techniques behind visual template configuration, and applies the method to a news acquisition system to test its practical effect.
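
The abstract does not show the template format, so the following is only a minimal sketch of the general idea of DOM-path extraction, not the authors' implementation. It assumes well-formed XHTML input (real pages would typically need a tolerant HTML parser), and the names `dom_path`, `extract`, `SAMPLE_PAGE`, and `OTHER_PAGE` are hypothetical. The "click" is simulated by picking an element from the parsed tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages sharing one layout; real news pages are rarely valid XML.
SAMPLE_PAGE = """<html><body>
<div>Navigation</div>
<div><h1>First headline</h1><p>By A</p><p>Body one</p></div>
</body></html>"""

OTHER_PAGE = """<html><body>
<div>Navigation</div>
<div><h1>Second headline</h1><p>By B</p><p>Body two</p></div>
</body></html>"""


def dom_path(root, target):
    """Build a positional path from root to target, e.g. 'body[1]/div[2]/h1[1]'."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        # Position is counted among siblings that share the same tag (1-based).
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "/".join(reversed(steps))


def extract(root, template):
    """Apply a {field: path} template to a page and return {field: text}."""
    result = {}
    for field, path in template.items():
        node = root.find(path)
        result[field] = "".join(node.itertext()).strip() if node is not None else None
    return result


# "Configure" the template on one page by selecting elements (simulated clicks).
sample_root = ET.fromstring(SAMPLE_PAGE)
clicked_title = sample_root.find(".//h1")
clicked_body = sample_root.findall(".//p")[1]
template = {
    "title": dom_path(sample_root, clicked_title),
    "body": dom_path(sample_root, clicked_body),
}

# Reuse the template on another page with the same layout.
other_root = ET.fromstring(OTHER_PAGE)
extracted = extract(other_root, template)
print(template)
print(extracted)
```

A purely positional path like this is brittle when a site changes its layout; a practical template would likely also record attributes such as `id` or `class`, which this sketch leaves out.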
Authors: LI Jian; MA Yan-zhou (Basic Department of Luoyang Campus, the PLA Information Engineering University, Luoyang 471003)
Source: Modern Computer (《现代计算机》), 2018, No. 7, pp. 56-60 (5 pages)
Funding: Major Program of the National Natural Science Foundation of China (No. 11590771)
Keywords: Web Crawler; Webpage Extraction; DOM Template; Visual Configuration
