Abstract
Inspired by learning crawlers, this paper derives two new focused-crawler variants that combine Web page content and link information to estimate a page's relevance to a given topic. The new crawlers emphasize not only the ability to learn the content of relevant pages but also the ability to learn the paths that lead to relevant pages. As a result, their ability to find pages on specific topics improves qualitatively.
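As a rough illustration of what combining content and link information can look like, the sketch below scores each outgoing link by blending the relevance of the parent page's text with the relevance of the link's anchor text, then keeps links in a priority queue. It is not the paper's method: the abstract does not give the scoring formula, so the cosine measure, the `alpha` weight, and the names `cosine`, `link_priority`, and `Frontier` are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists (bag-of-words counts)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_priority(topic, parent_text, anchor_text, alpha=0.5):
    """Blend content relevance and link relevance into one score.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    content_score = cosine(topic, parent_text)  # relevance of the linking page
    link_score = cosine(topic, anchor_text)     # relevance hinted by the link itself
    return alpha * content_score + (1 - alpha) * link_score


class Frontier:
    """Crawl frontier that returns the highest-scoring unseen URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated score.
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score


topic = ["focused", "crawler"]
parent = "a focused crawler downloads pages".split()

frontier = Frontier()
frontier.push("http://example.org/survey",
              link_priority(topic, parent, "focused crawler survey".split()))
frontier.push("http://example.org/recipes",
              link_priority(topic, parent, "cooking recipes".split()))

best_url, best_score = frontier.pop()
```

A learning crawler would replace the fixed `alpha` and the cosine measure with a model trained on previously crawled pages; the fixed blend here only shows where the two signals enter the ranking.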
Source
Microcomputer & Its Applications (《微型机与应用》)
2011, No. 5, pp. 72-74, 80 (4 pages)
Funding
Guangdong Province Soft Science Research Project (2009B070300052)
Keywords
focused crawler
learning crawler
learning focused crawler