Abstract
Government procurement generates a large volume of bidding data, most of which is published as Web text. Structured data is therefore hard to obtain, which seriously limits the public's ability to stay informed about, analyze, and supervise the procurement process. This paper proposes an engineering-oriented data collection scheme for government procurement data based on Web mining, and builds a pipeline that turns publicly released procurement data into structured data. First, an engineering-grade data crawling platform built on the Scrapy crawler framework is designed from an analysis of the sources and structure of bidding information. Second, a dedicated information extractor is designed that combines rule-based and statistics-based extraction. Finally, a staged data cleaning center is established according to the characteristics of the domain; it filters the data layer by layer and outputs structured data that can be used for analysis and mining. Experimental results on the system demonstrate the feasibility and superiority of the scheme, which provides strong technical support for the supervisory and guiding role of government procurement information disclosure.
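To make the extraction and cleaning stages concrete, the following is a minimal sketch in plain Python, not the system described in the paper. It covers only the rule-based half of the extractor (the statistics-based half is omitted) and a staged cleaning pass in which each stage either passes a record on or discards it. The field labels, the names `FIELD_RULES`, `extract_fields`, `clean`, and the three cleaning stages are all hypothetical, and the crawling step is left out so the example runs without Scrapy.

```python
import re
from typing import Callable, Optional

# Hypothetical label patterns; real notices would need rules derived from
# the actual page templates, which the abstract does not specify.
FIELD_RULES = {
    "project_name": re.compile(r"Project name[::]\s*(.+)"),
    "supplier": re.compile(r"Winning supplier[::]\s*(.+)"),
    "amount": re.compile(r"Award amount[::]\s*([\d,]+(?:\.\d+)?)"),
}


def extract_fields(text: str) -> dict:
    """Rule-based extraction: one regular expression per target field."""
    record = {}
    for field, pattern in FIELD_RULES.items():
        match = pattern.search(text)
        record[field] = match.group(1).strip() if match else None
    return record


def drop_incomplete(record: dict) -> Optional[dict]:
    """Stage 1: discard records with any missing field."""
    return record if all(record.get(k) for k in FIELD_RULES) else None


def normalize_amount(record: dict) -> Optional[dict]:
    """Stage 2: convert the amount string to a float, discarding bad values."""
    try:
        amount = float(record["amount"].replace(",", ""))
    except ValueError:
        return None
    return {**record, "amount": amount}


def drop_nonpositive(record: dict) -> Optional[dict]:
    """Stage 3: discard records whose amount is not positive."""
    return record if record["amount"] > 0 else None


CLEANING_STAGES: list[Callable[[dict], Optional[dict]]] = [
    drop_incomplete,
    normalize_amount,
    drop_nonpositive,
]


def clean(record: dict) -> Optional[dict]:
    """Run the record through each stage in order; None means filtered out."""
    for stage in CLEANING_STAGES:
        record = stage(record)
        if record is None:
            return None
    return record


if __name__ == "__main__":
    notice = (
        "Project name: Office computers\n"
        "Winning supplier: Example Co.\n"
        "Award amount: 1,250,000.50 yuan\n"
    )
    print(clean(extract_fields(notice)))
```

Keeping each cleaning stage as a separate function mirrors the layered filtering the abstract describes: stages can be reordered or extended without touching the extractor.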
Authors
王宏
夏禹
常静静
WANG Hong; XIA Yu; CHANG Jingjing (College of Computer Science, Xi'an Shiyou University, Xi'an 710065, China)
Source
《智能计算机与应用》
2020, Issue 7, pp. 170-175 (6 pages)
Intelligent Computer and Applications
Funding
Ministry of Education Industry-University Cooperation Collaborative Education Project (201802224022)