Abstract
To address the problems that big data brings (storage volumes that are hard to accommodate, traditional algorithms that fail at this scale, and the high computational complexity caused by complex data correlations), this paper studies an optimized k-means clustering algorithm for big data. It describes the model mechanism of the MapReduce software architecture, which is suited to big data processing tasks, and, by improving the selection of the initial k-means cluster centers, proposes an optimized k-means clustering algorithm based on the MapReduce model. Finally, the improved algorithm is applied to coal quality analysis; the results show that, compared with the traditional algorithm, the improved algorithm is markedly more efficient, and its accuracy is also improved.
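The abstract does not spell out how a k-means iteration is split into map and reduce steps, so the following is only a minimal single-process sketch of the general idea, not the authors' implementation. The function names (`map_phase`, `reduce_phase`, `kmeans_mapreduce`) are hypothetical, and the initial centers are simply passed in by the caller because the paper's improved selection rule is not given here.

```python
import math

def nearest(point, centers):
    """Return the index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def map_phase(points, centers):
    """Map step: emit (cluster index, (point, 1)) for every input point."""
    for p in points:
        yield nearest(p, centers), (p, 1)

def reduce_phase(pairs, centers):
    """Reduce step: sum the points per cluster key and average them."""
    sums = {}
    for key, (p, n) in pairs:
        acc, cnt = sums.get(key, ([0.0] * len(p), 0))
        sums[key] = ([a + x for a, x in zip(acc, p)], cnt + n)
    new_centers = list(centers)  # assumption: an empty cluster keeps its old center
    for key, (acc, cnt) in sums.items():
        new_centers[key] = tuple(a / cnt for a in acc)
    return new_centers

def kmeans_mapreduce(points, initial_centers, max_iter=100, tol=1e-9):
    """Repeat map + reduce until no center moves by more than tol."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centers), centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift <= tol:
            break
    return centers
```

In an actual MapReduce deployment the map step would run in parallel over data splits and the framework would group the emitted pairs by key before the reduce step; here both run sequentially in one process purely for illustration. Since each iteration is a full pass over the data, a better choice of `initial_centers` reduces the number of passes, which is presumably where the reported efficiency gain comes from.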
Source
Journal of Dalian Jiaotong University (《大连交通大学学报》)
CAS
2015, No. 3, pp. 91-94 (4 pages)
Funding
National Natural Science Foundation of China (61074029)
Dalian Science and Technology Plan Project (2014A11GX006)