Abstract
This paper studies how data preprocessing methods affect the results of fuzzy c-means (FCM) clustering. The international standard data set IRIS and actual load data of electricity consumers from different industries served by a power company were preprocessed with different methods and then clustered with the FCM algorithm, and the experimental results were verified and compared. The results show that, for FCM, preprocessing with sum standardization or maximum standardization gives the highest average accuracy; the Max-Min and mean-variance methods give comparatively poor clustering results; and standard deviation standardization gives the worst. Furthermore, a corresponding improvement was made to standard deviation standardization, after which the FCM clustering results improved markedly.
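The preprocessing step compared in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the paper's formulas, so the column-wise definitions below are the common textbook forms and are assumptions; the function names, the fuzzifier `m=2.0`, and the stopping tolerance are likewise hypothetical choices. The mean-variance method and the paper's improved standard deviation standardization are omitted because the abstract does not define them. The `fcm` function is a plain textbook fuzzy c-means loop, not necessarily the authors' implementation.

```python
import numpy as np

def sum_standardize(X):
    # Sum standardization (assumed form): divide each column by its column sum.
    return X / X.sum(axis=0)

def max_standardize(X):
    # Maximum standardization (assumed form): divide each column by its maximum.
    return X / X.max(axis=0)

def min_max_standardize(X):
    # Max-Min standardization (assumed form): rescale each column to [0, 1].
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def std_standardize(X):
    # Standard deviation standardization (assumed form): z-score per column.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    # Textbook fuzzy c-means: alternate center and membership updates.
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                   # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

To compare methods in the spirit of the abstract, one would apply each function to the same data matrix (samples in rows, features in columns), run `fcm` on each result, and score `U.argmax(axis=1)` against the known class labels.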
Source
Electric Power Science and Engineering (《电力科学与工程》)
2011, No. 8, pp. 24-27, 46 (5 pages)
Keywords
data preprocessing
clustering
load characteristics
clustering accuracy