Abstract
With the rapid development of Internet technology and people's growing demand for social interaction, web crawler technology is now mature and widely applied in major search engines and information retrieval. This paper addresses the task allocation problem in distributed crawler systems and proposes a concrete crawling task allocation algorithm. The algorithm establishes a multi-dimensional computer resource model, uses a priority-matching heuristic to allocate crawling tasks statically, and solves an objective function to minimize the cost of the whole system. Experiments show that, once the system requirements are fixed, the algorithm minimizes the total cost while still meeting those requirements.
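The abstract does not give the details of the algorithm, so the following is only an illustrative sketch of what a priority-matching heuristic over a multi-dimensional resource model might look like, not the paper's actual method. It assumes each task has a demand vector and each machine has a capacity vector plus a hypothetical per-unit price for every resource dimension. The names Machine, Task, assignment_cost and allocate, the "largest demand first" priority rule, and the linear cost model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Machine:
    name: str
    capacity: List[float]    # capacity per resource dimension (e.g. CPU, memory)
    unit_price: List[float]  # hypothetical price per unit of each resource


@dataclass
class Task:
    name: str
    demand: List[float]      # demand per resource dimension


def assignment_cost(task: Task, machine: Machine) -> float:
    """Assumed linear cost: demand in each dimension times that machine's unit price."""
    return sum(d * p for d, p in zip(task.demand, machine.unit_price))


def allocate(tasks: List[Task], machines: List[Machine]) -> Tuple[Dict[str, str], float]:
    """Greedy static allocation: handle the largest tasks first and place each
    one on the cheapest machine that still has room in every dimension."""
    remaining = {m.name: list(m.capacity) for m in machines}
    plan: Dict[str, str] = {}
    total = 0.0
    # Assumed priority rule: tasks with the largest total demand go first.
    for task in sorted(tasks, key=lambda t: sum(t.demand), reverse=True):
        feasible = [
            m for m in machines
            if all(d <= r for d, r in zip(task.demand, remaining[m.name]))
        ]
        if not feasible:
            raise ValueError(f"no machine can host task {task.name}")
        best = min(feasible, key=lambda m: assignment_cost(task, m))
        # Reserve the resources on the chosen machine.
        remaining[best.name] = [
            r - d for r, d in zip(remaining[best.name], task.demand)
        ]
        plan[task.name] = best.name
        total += assignment_cost(task, best)
    return plan, total
```

A greedy rule of this kind is fast and keeps every capacity constraint satisfied, but it does not guarantee a globally minimal cost; the paper's own objective function and matching rule may differ.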
Author
Liu Feng (刘凤), Hebei GEO University, Shijiazhuang 050000, China
Source
Wireless Internet Technology (《无线互联科技》)
2020, No. 4, pp. 129-130 (2 pages)
Funding
Project supported by the Hebei Provincial Department of Science and Technology
Project No. 16210347.
Keywords
resource allocation
heuristic matching
Web crawler