Abstract
Mining social media big data is a new and feasible way to obtain urban rainfall disaster information and to carry out disaster risk assessment. However, the volume of Internet data is large, and processing it effectively is the main difficulty. To address this, the paper proposes a rainfall disaster document classification model built on social media data, using professional rainfall vocabulary, the language characteristics of the Guangzhou area (Cantonese), and the support vector machine algorithm. An online rainfall disaster detection system for Guangzhou is then designed around a workflow of data collection and preprocessing, rainfall disaster document classification, disaster weight grading, and hot spot analysis. The system adopts a B/S (browser/server) architecture and uses Web and GIS technology to provide disaster application management, risk alerting, data classification, data filtering, and data acquisition. Operational results indicate that the machine learning algorithm resolves the low efficiency of processing large volumes of data, and that combining disaster hot spot analysis with weather radar and automatic weather station observations further improves the accuracy of disaster extraction. Automatically detecting and assessing rainfall and disaster conditions in this way is feasible, and the approach has some reference value for operational disaster information collection.
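The classification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary terms, the training posts, the binary term features, and the hyperparameters below are all hypothetical, chosen only to show how a linear support vector machine could separate rainfall disaster posts from unrelated ones using a domain vocabulary that includes Cantonese expressions.

```python
# Hypothetical vocabulary: rainfall terms plus Cantonese expressions (illustrative only).
VOCAB = ["暴雨", "积水", "内涝", "水浸", "落雨"]

def featurize(text):
    """Binary bag-of-terms vector: 1.0 if the vocabulary term occurs in the text."""
    return [1.0 if term in text else 0.0 for term in VOCAB]

def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss; labels are +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            violated = margin < 1
            for i in range(len(w)):
                # Regularization always shrinks w; the hinge term acts only on violations.
                grad = lam * w[i] - (y * x[i] if violated else 0.0)
                w[i] -= lr * grad
            if violated:
                b += lr * y
    return w, b

def predict(w, b, text):
    """Return +1 for a rainfall disaster post, -1 otherwise."""
    score = sum(wi * xi for wi, xi in zip(w, featurize(text))) + b
    return 1 if score > 0 else -1

# Made-up training posts: three disaster reports, three unrelated posts.
posts = [
    "天河区暴雨,路面积水严重",
    "落雨好大,岗顶水浸街",
    "地铁口内涝,车辆被困",
    "今天天气真好,去公园散步",
    "新开的茶餐厅很好吃",
    "周末一起看电影",
]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm([featurize(p) for p in posts], labels)
print(predict(w, b, "今晚暴雨,楼下水浸"))
print(predict(w, b, "周末去爬白云山"))
```

A production system would more plausibly use word segmentation, weighted term features, and a library SVM; the hand-written trainer here only keeps the sketch free of dependencies.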
Authors
黎洁仪
梁之彦
范绍佳
梁家鸿
LI Jie-yi; LIANG Zhi-yan; FAN Shao-jia; LIANG Jia-hong (Guangzhou Emergency Early Warning Release Center, Guangzhou 511430, China; Guangdong Provincial Observation and Research Station for Climate Environment and Air Quality Change in the Pearl River Estuary, Guangzhou 510275, China; Guangzhou Meteorological Observatory, Guangzhou 511430, China; School of Atmospheric Science, Sun Yat-sen University, Guangzhou 510275, China)
Source
Computer Technology and Development (《计算机技术与发展》)
2022, No. 8, pp. 191-196 (6 pages)
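Funding
Guangdong Provincial Science and Technology Plan Project (Science and Technology Innovation Platform category) (2019B121201002)
Guangzhou Science and Technology Plan Project (201803030014)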
Keywords
data mining
machine learning
disaster information extraction
text categorization
social media