Abstract
With the rapid development of the Internet, the volume of information available online has grown explosively, but its sheer quantity and disorder make it harder for people to use that information well. Web information resources therefore need to be collected promptly, processed efficiently, and organized systematically. To this end, this paper proposes a new method for collecting specific information from web pages based on HTML structure analysis and characteristic-word matching, and uses the method to design an agricultural information resource collection system. Practice shows that the method is practical and flexible in Web information collection systems.
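The abstract names the method's two ingredients, HTML structure analysis and characteristic-word matching, but does not give the algorithm. The sketch below shows one way they could be combined, under stated assumptions. It splits a page into text blocks at block-level tags, records each block's tag path, ignores `<script>`/`<style>` text, and keeps the blocks that contain enough occurrences of the feature words. The tag sets, the `min_hits` threshold, and the names `BlockTextParser` and `match_blocks` are hypothetical illustrations, not the authors' design. Matching uses plain substring counts, which also works for Chinese text that has no spaces between words. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Hypothetical tag sets: the paper does not say which tags it analyses.
BLOCK_TAGS = {"p", "div", "td", "li", "title", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}                          # never content
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # have no end tag


class BlockTextParser(HTMLParser):
    """Splits an HTML page into (tag path, text) blocks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # collected (path, text) pairs
        self._stack = []   # currently open tags, outermost first
        self._buf = []     # text gathered for the current block
        self._skip = 0     # > 0 while inside <script> or <style>

    def _flush(self):
        # Collapse whitespace; store the block only if it has visible text.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(("/".join(self._stack), text))
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        if tag in BLOCK_TAGS:
            self._flush()  # text so far belongs to the enclosing block
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()  # text so far belongs to this block
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1
        if tag in self._stack:
            # Pop back to the matching open tag, tolerating unclosed children.
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def match_blocks(html, feature_words, min_hits=2):
    """Return (path, text, hits) for blocks with >= min_hits feature-word hits."""
    parser = BlockTextParser()
    parser.feed(html)
    parser.close()
    results = []
    for path, text in parser.blocks:
        lowered = text.lower()
        hits = sum(lowered.count(w.lower()) for w in feature_words)
        if hits >= min_hits:
            results.append((path, text, hits))
    return results


if __name__ == "__main__":
    page = """<html><head><title>Market news</title>
    <script>var x = "wheat wheat";</script></head>
    <body><div>Login | Register</div>
    <p>Wheat prices rose this week as wheat stocks fell.</p>
    <p>Weather forecast for the weekend.</p></body></html>"""
    for path, text, hits in match_blocks(page, ["wheat", "prices"]):
        print(hits, path, text)
```

On the sample page only the first `<p>` block qualifies (three hits). The navigation `div`, the weather paragraph, and the script text are discarded. In a real collection system the tag path could also be used to restrict matching to the page region where a given site places its articles.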
Source
Journal of Agricultural Mechanization Research (农机化研究)
Peking University core journal
2008, No. 10, pp. 139-141 (3 pages)
Funding
Undergraduate Science and Technology Innovation Fund of Hebei Agricultural University (07-KJ-026)
Keywords
information collection
Internet
HTML
characteristic words