Abstract
To advance the protection of personal information in the course of data opening, this paper analyzes in depth the current state of personal information disclosure in open government data. First, datasets are obtained from relevant platforms and preprocessed, and those containing personal information are screened out based on features such as field names and table names. Second, sensitive information identification methods are applied to identify the various types of personal information in the data and map them to individuals, so that the number of individuals can be counted and their associated data detected. Finally, data visualization is used to present the state of personal information disclosure intuitively. Although some public data open platforms have applied measures such as grading and classification of public data and de-identification, the data already opened still contain a large amount of directly displayed personal information. Improvements are needed in standardized data grading and classification, sensitive information identification, and sensitive information desensitization.
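The screening, identification, and mapping steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword list, the two regular expressions, and the function names (`has_personal_fields`, `identify`, `map_to_individuals`) are all hypothetical, and using a name field as the key for an individual is a simplifying assumption, since names are not unique in practice.

```python
import re
from collections import defaultdict

# Hypothetical keyword hints for screening column names; not taken from the paper.
PERSONAL_FIELD_HINTS = ("姓名", "name", "电话", "phone", "身份证", "id_card")

# Simplified patterns (assumptions): 11-digit mainland mobile numbers and
# 18-character resident ID numbers. Real data would need stricter validation.
PATTERNS = {
    "phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def has_personal_fields(columns):
    """Step 1: flag a table whose column names hint at personal information."""
    lowered = [col.lower() for col in columns]
    return any(hint in col for col in lowered for hint in PERSONAL_FIELD_HINTS)


def identify(value):
    """Step 2: return (kind, match) pairs for sensitive values found in a string."""
    found = []
    for kind, pattern in PATTERNS.items():
        found.extend((kind, match) for match in pattern.findall(value))
    return found


def map_to_individuals(rows, key_field):
    """Step 3: group identified values by the field assumed to name an individual."""
    individuals = defaultdict(set)
    for row in rows:
        key = row.get(key_field)
        if not key:
            continue
        for field, value in row.items():
            if field == key_field:
                continue
            individuals[key].update(identify(str(value)))
    return individuals
```

Calling `map_to_individuals(rows, "姓名")` on a list of row dictionaries gives one entry per distinct name, so `len(...)` approximates the number of individuals and each entry's set holds the data associated with that person.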
Authors
Haisu CHEN (陈海粟)
Jiachun LIAO (廖佳纯)
Sicheng YAO (姚思诚)
Research Center of Big Data Technology, Nanhu Laboratory, Jiaxing 314002, Zhejiang, China
Source
Journal of Shandong University (Natural Science)
CAS
CSCD
Peking University Core Journals (北大核心)
2024, No. 3, pp. 95-106 (12 pages)
Funding
Supported by the Nanhu Laboratory Small and Micro Research Project (NSS2023C2002).
Keywords
big data privacy
personal information
open government data
information identification
statistical analysis