Abstract
A metric, also called a distance function, is a function on a metric space that satisfies specific conditions; it is generally used to capture important distance relationships between data samples. Because distance strongly affects classification and clustering, metric learning has an important influence on these machine learning problems. Owing to the noise present in real-world data, existing metric learning algorithms often yield low and widely fluctuating classification accuracy. To address this problem, this paper proposes a robust metric learning algorithm based on the maximum correntropy criterion. The core of the maximum correntropy criterion is the Gaussian kernel function, which this paper introduces into metric learning: a loss function built around the Gaussian kernel is constructed and optimized by gradient descent, the parameters are tuned through repeated testing, and the resulting metric matrix is output. A metric matrix learned in this way is more robust and effectively improves classification accuracy on classification problems affected by noise. Validation experiments are carried out on commonly used machine learning datasets (UCI) and on face datasets.
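To illustrate why a Gaussian-kernel loss is expected to tolerate noise, the following is a minimal sketch and not the loss function defined in the paper. It assumes the standard unnormalized Gaussian kernel and the commonly used correntropy-induced loss 1 - exp(-e^2 / (2 sigma^2)); the function names, the kernel width `sigma`, and the sample residuals are hypothetical and chosen only for illustration.

```python
import numpy as np

def mahalanobis_sq(M, x, y):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ M @ d)

def gaussian_kernel(e, sigma=1.0):
    """Unnormalized Gaussian kernel exp(-e^2 / (2 sigma^2))."""
    e = np.asarray(e, dtype=float)
    return np.exp(-e**2 / (2.0 * sigma**2))

def correntropy_loss(e, sigma=1.0):
    """Correntropy-induced loss 1 - kernel(e); never exceeds 1."""
    return 1.0 - gaussian_kernel(e, sigma)

def correntropy_loss_grad(e, sigma=1.0):
    """Derivative of the correntropy-induced loss w.r.t. the residual e."""
    e = np.asarray(e, dtype=float)
    return e / sigma**2 * gaussian_kernel(e, sigma)

# Hypothetical residuals; the last one plays the role of a noisy sample.
residuals = np.array([0.1, 0.5, 1.0, 10.0])

squared = 0.5 * residuals**2                         # grows without bound
robust = correntropy_loss(residuals, sigma=1.0)      # saturates near 1
grads = correntropy_loss_grad(residuals, sigma=1.0)  # vanishes for large residuals

print("squared loss:     ", squared)
print("correntropy loss: ", robust)
print("correntropy grad: ", grads)
```

Under the squared loss the large residual dominates both the objective and its gradient, whereas under the correntropy-induced loss its contribution is capped and its gradient is close to zero, so a gradient descent update of the metric matrix would be driven mainly by the small-residual samples. The kernel width `sigma` controls where this saturation sets in.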
Authors
谢林江
尹东
XIE Lin-Jiang; YIN Dong (School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China)
Source
《计算机系统应用》
2018, Issue 10, pp. 146-153 (8 pages in total)
Computer Systems & Applications
Keywords
metric learning (度量学习)
noise (噪声)
maximum correntropy criterion (最大相关熵准则)
Gaussian kernel function (高斯核函数)
robust (鲁棒)