Design of engineering collection scheme for government procurement data

Cited by: 2
Abstract: The large volume of bidding data generated during government procurement is mostly published as Web text, which makes structured data hard to obtain and seriously limits the public's ability to stay informed about, analyze, and supervise the procurement process. This paper proposes an engineering collection scheme for government procurement data based on Web mining and builds a system for producing structured data from publicly released procurement information. First, an engineering data-crawling platform based on the Scrapy crawler framework is designed by analyzing the sources and structure of bidding information. Second, a dedicated information extractor is designed by combining rule-based and statistics-based extraction methods. Finally, a staged data cleaning center is established according to the characteristics of the domain; it filters the data layer by layer and outputs structured data that can be used for analysis and mining. System experiments demonstrate the feasibility and superiority of the scheme, which provides strong technical support for the supervisory and guiding role of government procurement information disclosure.
Authors: WANG Hong (王宏), XIA Yu (夏禹), CHANG Jingjing (常静静) (College of Computer Science, Xi'an Shiyou University, Xi'an 710065, China)
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2020, No. 7, pp. 170-175 (6 pages)
Funding: Ministry of Education Industry-University Cooperation Collaborative Education Project (201802224022)
Keywords: government procurement; Web mining; Scrapy crawler; information extraction; data cleaning
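
The abstract names rule-based extraction and layered data cleaning but does not give the rules themselves. The following is a minimal sketch of how those two stages might fit together, not the authors' implementation. The field names, regular expressions, Chinese notice labels, and the two cleaning layers are all assumptions chosen for illustration, and the statistics-based extractor is not covered.

```python
import re
from typing import Optional

# Hypothetical rules: one regex per field, keyed on labels that commonly
# appear in Chinese procurement notices (project name, budget, purchaser).
# The paper's actual rule set is not given in the abstract.
FIELD_RULES = {
    "project_name": re.compile(r"项目名称[::]\s*(.+)"),
    "budget": re.compile(r"预算金额[::]\s*([\d,.]+)\s*(万元|元)"),
    "purchaser": re.compile(r"采购人[::]\s*(.+)"),
}


def extract_fields(text: str) -> dict:
    """Rule-based extraction: apply each field's regex to the notice text."""
    record = {}
    for field, pattern in FIELD_RULES.items():
        match = pattern.search(text)
        if match is None:
            record[field] = None
        elif field == "budget":
            # Keep the amount and its unit together for the cleaning stage.
            record[field] = (match.group(1), match.group(2))
        else:
            record[field] = match.group(1).strip()
    return record


def normalize_budget(record: dict) -> dict:
    """Cleaning layer 1 (assumed): convert the budget to a float in yuan."""
    cleaned = dict(record)
    raw = record.get("budget")
    if raw is not None:
        amount, unit = raw
        value = float(amount.replace(",", ""))
        # "万元" means units of ten thousand yuan.
        cleaned["budget"] = value * 10000 if unit == "万元" else value
    return cleaned


def drop_incomplete(record: dict) -> Optional[dict]:
    """Cleaning layer 2 (assumed): reject records missing a required field."""
    required = ("project_name", "budget")
    if any(record.get(name) is None for name in required):
        return None
    return record


def clean(record: dict) -> Optional[dict]:
    """Run the layers in order; None means the record was filtered out."""
    return drop_incomplete(normalize_budget(record))
```

In a Scrapy-based crawler, logic of this kind would typically run after the page text has been downloaded, for example inside an item pipeline, so that each cleaning layer can be tested independently of the crawling code.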