Research on the Major Technologies of Fully Automatic Wrapper Generation for Web Information Extraction (cited 4 times)
Abstract: There are many methods for generating wrappers for Web information extraction. By degree of automation, they fall into three categories: manual, semi-automatic, and fully automatic. This paper studies the main techniques for fully automatic wrapper generation. First, a corresponding classification system is constructed. Second, 15 mainstream wrapper generation techniques from recent years are analyzed qualitatively and compared by category. Finally, five development trends are proposed.
Source: Information Studies: Theory & Application (《情报理论与实践》), indexed in CSSCI and the Peking University Core Journals list, 2010, No. 1, pp. 100-104 (5 pages)
Keywords: information extraction; wrapper; information technology; deep Web