An Internet-Based Agricultural Information Resource Collection System

Published English title: The Agriculture Information Collection System Based on Internet
Abstract: With the rapid growth of the Internet, the volume of information available online has exploded, but its sprawling and disorganized nature makes that information hard to use well. Web information resources therefore need to be collected promptly, processed efficiently, and organized systematically. To this end, this paper proposes a new method for collecting specific information from web pages, based on HTML structure analysis and feature-word matching, and uses the method to design an agricultural information resource collection system. Practice shows that the method is practical and flexible in Web information collection systems.
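Source: Journal of Agricultural Mechanization Research (《农机化研究》), Peking University Core Journal, 2008, No. 10, pp. 139-141 (3 pages)
Funding: Hebei Agricultural University Undergraduate Science and Technology Innovation Fund (07-KJ-026)
Keywords: information collection; Internet; HTML; feature words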
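
The abstract names two ingredients, HTML structure analysis and feature-word matching, but does not give the algorithm itself. The following is only a minimal sketch of how those two ideas might be combined, not the paper's implementation. The choice of block-level tags, the `BlockCollector` class, the `extract_blocks` function, and the `min_hits` threshold are all hypothetical names and assumptions introduced for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags mark block boundaries during structure analysis.
BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3"}
# Text inside these tags is never page content, so it is ignored.
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished text blocks, in document order
        self._buf = []        # text fragments of the block being built
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def _flush(self):
        # Join fragments and collapse runs of whitespace.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def extract_blocks(html, feature_words, min_hits=1):
    """Return (hit_count, text) for blocks containing enough feature words.

    Matching is a case-insensitive substring count, a deliberate
    simplification; the paper's matching rule is not stated in the abstract.
    """
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    results = []
    for block in parser.blocks:
        lowered = block.lower()
        hits = sum(lowered.count(word.lower()) for word in feature_words)
        if hits >= min_hits:
            results.append((hits, block))
    return results
```

Calling `extract_blocks(page_html, ["wheat", "price"])` on a fetched page would keep only the blocks that mention those words, which is one simple way to separate target content from navigation and advertising text. A real collector would likely need per-site rules and a weighted feature-word list.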