Abstract
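To address the sensitivity of the maximum entropy clustering algorithm MEC (maximum entropy clustering) to outliers and its inability to label them, an improved maximum entropy clustering algorithm, RMEC (robust maximum entropy clustering), is proposed. The basic idea of the algorithm is to reconstruct the objective function by introducing Vapnik's ε-insensitive loss function and weight factors, and to derive new learning formulas using optimization theory. RMEC is not only more robust to outliers than MEC, but can also use the learned weight factors to effectively label the outliers present in the dataset.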
In this paper, a robust maximum entropy clustering algorithm, RMEC, is presented as an improved version of the maximum entropy clustering algorithm MEC. It addresses two drawbacks of MEC: sensitivity to outliers and the inability to label them. By introducing Vapnik's ε-insensitive loss function and weight factors, a new objective function is constructed, and new update rules are derived from it using Lagrangian optimization theory. Compared with MEC, RMEC is more robust to outliers and can effectively label the outliers in a dataset using the learned weight factors. Experimental results demonstrate its improved robustness and its ability to label outliers in the dataset.
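As a small illustration of the ε-insensitive loss function named in the abstract, the sketch below compares it with the squared loss on a few residuals. It shows only the standard definition of Vapnik's loss, max(0, |t| - ε); it is not the RMEC objective function or its update rules, which the abstract does not state. The function names, the residual values, and the choice ε = 0.5 are assumptions made for illustration.

```python
def eps_insensitive_loss(residual, eps):
    """Vapnik's eps-insensitive loss: zero inside the band |residual| <= eps,
    growing linearly with |residual| outside it."""
    return max(0.0, abs(residual) - eps)


def squared_loss(residual):
    """Squared loss, shown only for comparison."""
    return residual * residual


if __name__ == "__main__":
    eps = 0.5  # illustrative band width, not a value taken from the paper
    # Hypothetical residuals; the last one plays the role of an outlier.
    for r in [0.1, 0.4, 1.0, 10.0]:
        print(r, squared_loss(r), eps_insensitive_loss(r, eps))
```

Because the loss grows linearly rather than quadratically, a point far from its cluster center contributes much less to the total cost than it would under a squared loss, which is the usual intuition for why such a loss reduces sensitivity to outliers.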
Source
《中国工程科学》
2004, Issue 9, pp. 38-45 (8 pages)
Strategic Study of CAE
Funding
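National Natural Science Foundation of China (grant 60225015)
Natural Science Foundation of Jiangsu Province (grant BK2003017)
Jiangsu Key Laboratory of Computer Information Technology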
Keywords
entropy
clustering
robustness
outliers
ε-insensitive loss function
weight factors