Abstract
To collect up-to-date information from Web pages, a search engine should dynamically adjust its crawl frequency according to each website and how quickly that site changes. This paper models the Web page update process and examines how to adjust a search engine's collection frequency dynamically on the basis of relevance. First, a Poisson process is used to describe page updates, and on that basis we analyze how a search engine can collect information effectively. Second, relevance derived from page affiliation detection and content analysis is used to adjust this process, so that collection and data refresh are better targeted. Experimental results demonstrate the effectiveness of the method.
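As a rough illustration of the idea, the sketch below treats page updates as a Poisson process with rate λ, so the probability that a page has changed after time t is 1 - exp(-λt). The abstract does not give the paper's actual formulas, so everything beyond that standard Poisson property is an assumption made here for illustration: the function names, the `max_change_prob` threshold, and the rule that shortens the revisit interval by a factor of `1 + weight * relevance` are hypothetical and not the authors' method.

```python
import math


def change_probability(rate: float, elapsed: float) -> float:
    """Probability that a page changed at least once within `elapsed`
    time units, assuming updates follow a Poisson process with `rate`."""
    return 1.0 - math.exp(-rate * elapsed)


def revisit_interval(rate: float, relevance: float,
                     max_change_prob: float = 0.5,
                     weight: float = 1.0) -> float:
    """Hypothetical refresh rule (an assumption, not taken from the paper).

    1. Find the interval at which the Poisson change probability
       reaches `max_change_prob`: t = -ln(1 - p) / rate.
    2. Shorten it for more relevant pages by dividing by
       (1 + weight * relevance), with relevance expected in [0, 1].
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not 0.0 < max_change_prob < 1.0:
        raise ValueError("max_change_prob must lie strictly between 0 and 1")
    base = -math.log(1.0 - max_change_prob) / rate
    return base / (1.0 + weight * relevance)


if __name__ == "__main__":
    # A page that changes on average once every 10 days (rate = 0.1 per day).
    for rel in (0.0, 0.5, 1.0):
        days = revisit_interval(rate=0.1, relevance=rel)
        print(f"relevance={rel:.1f} -> revisit every {days:.2f} days")
```

Under this assumed rule, a page with higher relevance is revisited more often than an equally volatile page with lower relevance, which is the qualitative behavior the abstract describes.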
Source
《电子学报》
EI
CAS
CSCD
PKU Core (北大核心)
2007, No. 10, pp. 1984-1988 (5 pages)
Acta Electronica Sinica