Abstract
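Targeting general-purpose data resources, this paper studies methods and techniques for visualizing clustered data, with the aim of finding effective data processing methods that meet the information field's requirements for high-dimensional data processing. Through dimensionality reduction and visual mapping of high-dimensional data, a visualization system model for clustering data mining based on the K-means algorithm is established. The model visualizes three kinds of elements: intermediate cluster results, cluster centers, and the values of the convergence criterion function. The model and its methods are tested on the Iris, Wine, and Seeds data sets from the University of California, Irvine (UCI) repository. The results show that the model clusters the data sets effectively and visualizes the intermediate clusters, cluster centers, and convergence criterion function values effectively in real time, achieving the expected results.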
Visualization methods and techniques provide powerful tools for discovering hidden laws, supporting decision making, and explaining empirical phenomena. The objective of research on visualization methods and techniques for clustering data mining models is to explore effective data processing methods and to meet the need for efficient data processing in the field of information science. This work focuses on visualization technology for clustering data mining, visualization techniques for high-dimensional data via dimension reduction, and visual mapping technology. It studies the K-means algorithm for clustering data and its visualization, and develops methods for visualizing intermediate cluster results, cluster centers, and convergence criterion function values. It investigates a number of visualization methods, such as process-oriented visualization of clustering data, an integrated color ratio method, coordinate transformation, and dimension constraints, with the goal of achieving adequate visualization of clustering data mining and analysis and of establishing a visualization system model for K-means mining. Using the Iris, Wine, and Seeds data sets from the UCI database, the data mining visualization models are systematically tested and verified, and the effects of the visualization models on the clustering results and the convergence criterion are analyzed. The tests show that the desired results have been achieved.
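The three quantities the abstract names as visualization targets (intermediate cluster assignments, cluster centers, and the convergence criterion value) can be captured per iteration with a short sketch like the one below. This is an illustrative assumption, not the authors' system: it uses plain Lloyd-style K-means with the sum of squared errors as the criterion, the function name `kmeans_trace` and its parameters are hypothetical, and the data are synthetic rather than the UCI sets.

```python
import numpy as np

def kmeans_trace(X, k, max_iter=100, tol=1e-6, seed=0):
    """Lloyd-style K-means that records, for every iteration, the labels,
    the centers, and the sum-of-squared-errors criterion (assumed here)."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are k distinct samples drawn at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    trace = []
    prev_sse = np.inf
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every center: (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        sse = float(d2[np.arange(len(X)), labels].sum())
        # Snapshot of the three elements to be drawn for this iteration.
        trace.append({"labels": labels.copy(),
                      "centers": centers.copy(),
                      "sse": sse})
        if prev_sse - sse <= tol:
            break  # criterion has stopped decreasing
        prev_sse = sse
        # Move each center to the mean of its members; keep it if empty.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return trace

# Usage on synthetic data: two well-separated 2-D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
               rng.normal(5.0, 0.3, (30, 2))])
trace = kmeans_trace(X, k=2)
sse_values = [step["sse"] for step in trace]
```

Each entry of `trace` can then be handed to whatever plotting layer is in use, for example drawing `labels` and `centers` frame by frame and plotting `sse_values` as a curve to show convergence.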
Source
《解放军理工大学学报(自然科学版)》
EI
Peking University Core Journals
2015, No. 1, pp. 7-15 (9 pages)
Journal of PLA University of Science and Technology(Natural Science Edition)
Funding
Supported by the Natural Science Foundation of Jiangsu Province (BK2012511)
Keywords
clustering data mining
visualization
parallel coordinates method
K-means algorithm