Abstract
【Purpose/Significance】Author names in Douban Reading bibliographic records are sparse and non-standardized. This paper aggregates them with the personal name data in the Chinese Name Authority Joint Database to enrich name variant forms and to link heterogeneous data held at different sites.【Methods/Process】The paper first introduces controlled vocabularies and folksonomies and analyzes why the two need to be combined. It then constructs an experimental data set and runs experiments with the LCS, Jaro-Winkler Distance, and edit distance algorithms to confirm that aligning the heterogeneous Douban Reading data with the Chinese Name Authority Joint Database data is feasible. Finally, it builds a Resource Description Framework Schema (RDFS) vocabulary to normalize the attributes of each entity and publishes the result as linked data, so that the data on the local site are linked.【Results/Conclusions】Based on linked data, the paper aggregates Douban Reading author name data with the corresponding work data and with the personal name data in the Chinese name authority database.
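The three string similarity measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that "LCS" means longest common subsequence, and the normalization by the longer string's length, the function names, and the Winkler scaling factor of 0.1 are all illustrative choices that the abstract does not specify.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (assumed meaning of LCS)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the longer length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def edit_similarity(a: str, b: str) -> float:
    """1 minus edit distance over the longer length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return 1 - edit_distance(a, b) / longest if longest else 1.0


if __name__ == "__main__":
    # Hypothetical name pair: a short form and a fuller variant.
    short_form, full_form = "海明威", "欧内斯特·海明威"
    for measure in (lcs_similarity, edit_similarity, jaro_winkler):
        print(measure.__name__, round(measure(short_form, full_form), 4))
```

All three functions operate on Unicode code points, so Chinese and romanized name forms are handled the same way. Any threshold for accepting a pair as the same person would have to be chosen empirically.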
Authors
LI Jie-jia (李捷佳)
JIA Jun-zhi (贾君枝)
LI Jie-jia; JIA Jun-zhi (School of Economics and Management, Shanxi University, Taiyuan 030006, China; School of Information Resource Management, Renmin University of China, Beijing 100872, China)
Source
《情报科学》
CSSCI
Peking University Core Journals (北大核心)
2019, No. 1, pp. 16-21 (6 pages in total)
Information Science
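Funding
National Social Science Fund of China, Key Project "Research on the Semantic Description and Data Aggregation of Chinese Name Authority Files Based on Linked Data" (15ATQ004)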
Keywords
author data (作者数据)
Chinese name authority files (中文名称规范档)
linked data (关联数据)
aggregation (聚合)