Abstract
To help regulators verify the authenticity, validity and objectivity of monitoring data reported by pollutant-discharging enterprises, we propose a risk assessment method for the falsification of enterprise pollution emission monitoring data that combines Benford's law with a limit value model. First, Benford's law is used to build a multi-level structure for detecting whether data have been falsified, and this structure is used to generate a data set of non-falsified positive samples. Second, a Limit Value Fake Model (LVFM) is proposed to model human falsification of environmental monitoring data and to generate a data set of falsified negative samples. Finally, the expanded data set is fed to a multi-classifier learner for prediction and identification. Simulation results show that data produced by the LVFM are well concealed. The synthetic falsified data reach a basic similarity of 81% with real falsified data, which indicates that the model can imitate falsification to a certain degree. In the combined analysis of the classifiers' predictions on positive and negative samples, Random Forests (RF) achieved an accuracy of 81.35% and an F1 score of 71.33%. The results can serve as a reference for intelligent management in support of ecological safety supervision and decision-making.
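As a rough illustration of the Benford's law screening idea mentioned in the abstract, the following Python sketch compares the first-digit frequencies of a data set with the Benford distribution P(d) = log10(1 + 1/d). It is not the paper's multi-level detection structure or its LVFM. The use of KL divergence as the distance, the synthetic log-uniform "readings", the limit of 500, and the tampering rule (readings above the limit are replaced by values just below it) are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def benford_expected():
    """Benford first-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading non-zero digit of a non-zero number, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])


def first_digit_freq(values):
    """Observed relative frequency of each leading digit (zeros are skipped)."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def kl_divergence(observed, expected):
    """KL(observed || expected); digits with zero observed frequency contribute 0."""
    return sum(p * math.log(p / expected[d]) for d, p in observed.items() if p > 0)


rng = random.Random(0)

# Hypothetical "natural" readings: log-uniform over four decades, which follow Benford's law.
natural = [10 ** rng.uniform(0, 4) for _ in range(5000)]

# Hypothetical tampering rule (an assumption, not the paper's LVFM):
# any reading above the limit is replaced by a value just below the limit.
LIMIT = 500.0
tampered = [v if v <= LIMIT else rng.uniform(0.8 * LIMIT, LIMIT) for v in natural]

expected = benford_expected()
kl_natural = kl_divergence(first_digit_freq(natural), expected)
kl_tampered = kl_divergence(first_digit_freq(tampered), expected)

print(f"KL divergence, natural readings:  {kl_natural:.4f}")
print(f"KL divergence, tampered readings: {kl_tampered:.4f}")
```

In this toy setting the tampered series piles up on the leading digit 4, so its divergence from the Benford distribution is much larger than that of the untouched series. Any decision threshold would have to be calibrated on real monitoring data.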
Authors
TANG Hai-tao (唐海涛)
CHEN Di-san (陈迪三)
FAN Guang-yi (范广义)
Affiliations: School of Science, Guilin University of Aerospace Technology, Guilin, Guangxi 541004, China; Research Center for Big Data Technology Application in GUAT, Guilin, Guangxi 541004, China; Nanjing High-Speed & Accurate Gear Group Co., Ltd., Nanjing, Jiangsu 210000, China
Source
Computer Simulation (《计算机仿真》)
2024, Issue 8, pp. 549-556 (8 pages)
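Funding
National Natural Science Foundation of China (62001134)
Guangxi Higher Education Undergraduate Teaching Reform Project (2020JGZ157)
Guangxi Universities Project for Improving the Basic Research Ability of Young and Middle-aged Teachers (2019KY0992)
2020 Guilin University of Aerospace Technology Teaching Team Construction Project (2020JXTD16)
Guilin University of Aerospace Technology University-level Fund Project (XJ21KT28)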
Keywords
Limit Value Fake Model (LVFM)
Divergence
Multi-classifier