Abstract
With the development of mass data storage and processing technologies, data centers have accumulated large volumes of formatted historical data. Such data is very large in scale, is queried infrequently, and is queried for content that follows no predictable pattern. Current systems that operate on files mostly query this data by using a distributed computing engine to traverse the entire data set, which leads to long processing times and high system resource consumption. This paper therefore proposes an efficient query method for massive data based on a columnar multilevel index, so that only the nodes holding relevant data take part in the computation during a query, which greatly reduces system resource consumption. Experiments show that, when used for content queries over large-scale historical data, the method is clearly more efficient than mainstream file-system query techniques.
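The abstract does not describe the index layout, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a two-level index: a hypothetical global level that maps each column value to the nodes holding it, and a hypothetical local level on each node that maps column values to row positions. All class, method, and data names here are invented for illustration.

```python
from collections import defaultdict


class Node:
    """A storage node holding rows (dicts) plus a local per-column index."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        # Assumed second-level index: column -> value -> row positions on this node.
        self.local_index = defaultdict(lambda: defaultdict(list))
        for pos, row in enumerate(rows):
            for column, value in row.items():
                self.local_index[column][value].append(pos)

    def lookup(self, column, value):
        """Return matching rows using the local index, without scanning all rows."""
        positions = self.local_index.get(column, {}).get(value, [])
        return [self.rows[p] for p in positions]


class Cluster:
    """A set of nodes plus a global index used to decide which nodes to contact."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        # Assumed first-level index: column -> value -> names of nodes holding it.
        self.global_index = defaultdict(lambda: defaultdict(set))
        for n in nodes:
            for column, values in n.local_index.items():
                for value in values:
                    self.global_index[column][value].add(n.name)

    def query(self, column, value):
        """Indexed query: only nodes listed in the global index take part."""
        touched = sorted(self.global_index.get(column, {}).get(value, set()))
        rows = []
        for name in touched:
            rows.extend(self.nodes[name].lookup(column, value))
        return rows, touched

    def full_scan(self, column, value):
        """Baseline: every node scans every row, as in a global traversal."""
        rows = [r for n in self.nodes.values() for r in n.rows
                if r.get(column) == value]
        return rows, sorted(self.nodes)


cluster = Cluster([
    Node("node-a", [{"user": "u1", "city": "Beijing"},
                    {"user": "u2", "city": "Shanghai"}]),
    Node("node-b", [{"user": "u3", "city": "Shanghai"}]),
    Node("node-c", [{"user": "u4", "city": "Wuhan"}]),
])

rows, touched = cluster.query("city", "Shanghai")
print(touched)  # ['node-a', 'node-b']
```

In this toy setup the indexed query and the full scan return the same rows, but the indexed query contacts two of the three nodes while the full scan contacts all of them. The real method's index structure, storage format, and execution engine may differ substantially from this sketch.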
Source
《软件》
2016, Issue 3, pp. 79-83 (5 pages)
Software