Abstract
Collecting, processing, and analyzing publicly available Internet data is a growing trend worldwide and has been adopted by many government departments and industries. This paper proposes the basic concept of social big data and analyzes the problems and challenges of acquiring and processing social big data in the natural resources field. It designs and implements a natural resources social big data monitoring and analysis system with a four-layer architecture: data acquisition, analysis and processing, distributed storage, and system application. It also studies the key technologies behind the implementation, including distributed web crawlers, integrated storage of massive text data, and semantic mining based on pre-trained language models. In practice, the system has been used for routine monitoring and analysis of public opinion in the natural resources field. It provides technical support for obtaining timely public feedback on natural resources policies, quickly understanding local natural resources management work, monitoring the supply and transactions of the national residential land market accurately and in real time, and analyzing and assessing the real estate market situation.
Authors
XIAO Fei (肖飞)
LIU Wenchao (刘文超)
ZENG Jianying (曾建鹰)
ZHANG Yuhan (张玉韩)
WANG Naping (王娜萍)
Affiliations: Technology Innovation Center for Territorial & Spatial Big Data, MNR, Beijing 100812, China; Information Center of Ministry of Natural Resources, Beijing 100812, China
Source
Natural Resources Informatization (《自然资源信息化》)
2022, No. 5, pp. 106-113 (8 pages)
Funding
Ministry of Natural Resources departmental budget project "Natural Resources Big Data Application and Services" (121101000000190008).
Keywords
social big data
natural resources
monitoring and analysis
semantic mining
natural language processing