Abstract
Wrappers for Web information extraction can be generated in many ways; by degree of automation, the methods fall into three categories: manual, semi-automatic, and fully automatic. This paper studies the main techniques for fully automatic wrapper generation for Web information extraction. First, a corresponding classification system is constructed. Second, 15 mainstream wrapper generation techniques from recent years are analyzed qualitatively and compared by category. Finally, five development trends are proposed.
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
Peking University Core Journal (北大核心)
2010, No. 1, pp. 100-104 (5 pages)
Keywords
information extraction
wrapper
information technology
deep Web