
News Extraction System Based on Web (cited by 2)
Abstract: With the rapid growth of the Internet, the Web has become a huge, distributed, and shared repository of information. However, most Web data is published as HTML, so applications cannot make direct use of this mass of information. Web information extraction technology emerged to address this problem. This paper discusses information extraction techniques and, on that basis, implements a Web-based news extraction system. Users write extraction rules as regular expressions, and the system applies those rules to extract information from target Web pages quickly and accurately, saving the results in a local database for internal use or for publication on the Internet.
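The abstract describes the core workflow: a user-written regular-expression rule is applied to a fetched page, and the captured fields are stored in a local database. The sketch below illustrates that workflow under stated assumptions. The rule format (one regex per field, each with a single capture group), the field names `title`, `date`, and `body`, the HTML markup, and the `news` table schema are all hypothetical and are not taken from the paper. SQLite stands in for whatever database the original system used, and the page is an inline string rather than a live download.

```python
import re
import sqlite3

# Hypothetical extraction rule: one regular expression per field, each with
# exactly one capture group. A real rule would be written per target site.
RULE = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "date": r'<span class="date">(\d{4}-\d{2}-\d{2})</span>',
    "body": r'<div class="content">(.*?)</div>',
}


def extract(html, rule):
    """Apply each field's regex to the page; fields that do not match become None."""
    record = {}
    for field, pattern in rule.items():
        # re.S lets '.' span newlines; re.I tolerates upper-case tag names.
        m = re.search(pattern, html, re.S | re.I)
        record[field] = m.group(1).strip() if m else None
    return record


def save(conn, record):
    """Store one extracted record in a local table (schema is an assumption)."""
    conn.execute("CREATE TABLE IF NOT EXISTS news (title TEXT, date TEXT, body TEXT)")
    conn.execute(
        "INSERT INTO news (title, date, body) VALUES (:title, :date, :body)",
        record,
    )
    conn.commit()


if __name__ == "__main__":
    # Illustrative page only; a crawler would supply real HTML here.
    page = """<html><body>
    <h1>Sample headline</h1>
    <span class="date">2009-03-01</span>
    <div class="content">Body text of the article.</div>
    </body></html>"""
    conn = sqlite3.connect(":memory:")
    save(conn, extract(page, RULE))
    print(conn.execute("SELECT title, date FROM news").fetchall())
```

Keeping the rule as plain data (a field-to-pattern mapping) means a new target site only needs a new set of patterns, not new code, which matches the abstract's idea of user-authored extraction rules. Non-greedy groups such as `(.*?)` matter here: a greedy `.*` would run to the last closing tag on the page.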
Authors: HU Jing-fang (胡静芳), SHEN Ya-bin (沈亚斌) (1. School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen 333403, China; 2. China Helicopter Research and Development Institute, Jingdezhen 333001, China)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2009, No. 7, pp. 5111-5113 (3 pages)
Keywords: Web information extraction; regular expressions; extraction rules
