Abstract
Consolidating heterogeneous datasets spread across different systems into a highly integrated query service requires a general-purpose scheme for normalizing and reusing the data. This paper models a typical distribution scenario of application systems, defines a scalable metadata specification, and proposes a scheme that integrates the heterogeneous datasets of the distributed systems, manages the centralized metadata in a unified way, and offers users a single entry point for queries. The scheme requires minimal changes to the original systems, and users can transparently access all of their data resources. It has been applied in a medical and health scientific data sharing project, where it yielded considerable economic benefits.
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2009, No. 7, pp. 1037-1040 (4 pages)
Journal of Tsinghua University (Science and Technology)
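Funding
Major Project of the National Science and Technology Infrastructure Platform Construction under the "Tenth Five-Year Plan" (2005DKA32401-4)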
Keywords
scientific data sharing
metadata
integration of heterogeneous datasets