Abstract
Data networks rich in semantic knowledge are the cornerstone of big data intelligence. The Resource Description Framework (RDF) is the W3C standard for describing web resources. Converting, storing, and managing RDF triples at scale is an important path toward building linked data networks or semantic knowledge graphs, and toward making data Findable, Accessible, Interoperable, and Reusable (the FAIR principles). This paper selects ten internationally mainstream RDF triple conversion tools and six widely used RDF storage systems, compares them from the perspectives of technical principles, performance characteristics, and application scenarios, and summarizes their existing problems and shortcomings. It proposes that the goal for future large-scale RDF triple conversion and storage management is a streamlined, integrated RDF Extract-Transform-Load (ETL) workflow that supports four typical application scenarios: conversion from non-RDF data to RDF data; bidirectional conversion between different RDF data formats; migration of RDF triples between databases; and dynamic update and evolution management of RDF data.
Authors
李悦
孙坦
赵瑞雪
李娇
黄永文
罗婷婷
鲜国建
LI Yue; SUN Tan; ZHAO RuiXue; LI Jiao; HUANG YongWen; LUO TingTing; XIAN GuoJian (Agricultural Information Institute of CAAS, Beijing 100081; Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081)
Source
《数字图书馆论坛》
CSSCI
2020, Issue 11, pp. 2-12 (11 pages)
Digital Library Forum
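Funding
Supported by the National Social Science Fund of China project "Research on the Construction and Application of a Panoramic Abstract Knowledge Graph for Scientific Papers" (No. 19BTQ061).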