Abstract
Based on an analysis of the characteristics of semi-structured data, and taking actual purchase return data (BokeDataInfo.xml) as an example, this paper uses DOM objects to extract XML-based semi-structured data, and designs and implements a data warehouse ETL tool for semi-structured data. The tool addresses a shortcoming of commercial ETL tools, which cannot directly extract and load XML files into a data warehouse. It offers a useful exploration of how to extract semi-structured XML data and load it into the current detail level of a data warehouse.
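The extract-and-load flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' tool: the schema of BokeDataInfo.xml is not given here, so the element names (`PurchaseReturns`, `Return`, `Supplier`, `Item`, `Quantity`, `Amount`), the table name `purchase_return_detail`, and the use of SQLite as a stand-in for the warehouse are all assumptions made for illustration. The sketch parses the XML into a DOM tree, walks the record nodes while tolerating missing optional elements (the typical irregularity of semi-structured data), and loads the rows into a detail-level table.

```python
import sqlite3
from xml.dom import minidom

# Hypothetical sample data; the real BokeDataInfo.xml layout is not known.
SAMPLE_XML = """<PurchaseReturns>
  <Return id="R001">
    <Supplier>S-17</Supplier>
    <Item>Widget</Item>
    <Quantity>5</Quantity>
    <Amount>125.50</Amount>
  </Return>
  <Return id="R002">
    <Supplier>S-03</Supplier>
    <Item>Gasket</Item>
    <Quantity>12</Quantity>
  </Return>
</PurchaseReturns>"""


def text_of(parent, tag, default=None):
    """Return the text of the first <tag> descendant of parent, or default."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data.strip()


def extract(xml_text):
    """Extract step: walk the DOM tree and build one tuple per <Return> node."""
    dom = minidom.parseString(xml_text)
    rows = []
    for node in dom.getElementsByTagName("Return"):
        amount = text_of(node, "Amount")  # optional element in this sketch
        rows.append((
            node.getAttribute("id"),
            text_of(node, "Supplier"),
            text_of(node, "Item"),
            int(text_of(node, "Quantity", "0")),
            float(amount) if amount is not None else None,
        ))
    return rows


def load(conn, rows):
    """Load step: insert the extracted rows into a detail-level table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchase_return_detail ("
        "return_id TEXT PRIMARY KEY, supplier TEXT, item TEXT, "
        "quantity INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO purchase_return_detail VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
    load(conn, extract(SAMPLE_XML))
    for row in conn.execute(
        "SELECT * FROM purchase_return_detail ORDER BY return_id"
    ):
        print(row)
```

A missing `Amount` element is stored as NULL rather than rejected, which is one simple way to handle records whose structure varies; a production tool would also need a transform step and error handling for malformed input.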
Source
Computer & Digital Engineering (《计算机与数字工程》)
2014, No. 11, pp. 2198-2201 (4 pages)
Funding
Supported by the Special Scientific Research Program of the Shaanxi Provincial Department of Education (No. 12JK1055)
Keywords
XML (eXtensible Markup Language) data
DOM (Document Object Model) object
semi-structured data
extraction
loading
ETL (Extract-Transform-Load) tool
data warehouse