Credit risk prediction models seek to predict quality factors such as whether an individual will default on a loan (bad applicant) or not (good applicant). This can be treated as a machine learning (ML) problem. Recently, ML algorithms have proven to be of great practical value in solving a variety of risk problems, including credit risk prediction. One of the most active areas of recent ML research has been the use of ensemble (combined) classifiers. Research indicates that combining individual classifiers into an ensemble, in which they vote for the most popular class, leads to a significant improvement in classification performance. This paper explores the predictive behaviour of five classifiers under different types of noise in terms of credit risk prediction accuracy, and how that accuracy could be improved by using pairs of classifier ensembles. Benchmarking results on five credit datasets are presented, together with a comparison against the predictive accuracy of each individual classifier at various attribute noise levels. The experimental evaluation shows that the ensemble-of-classifiers technique has the potential to improve prediction accuracy.
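The voting idea in the credit risk abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `majority_vote`, the toy labels, and the tie-breaking behaviour are assumptions made only for illustration, and the abstract does not say which five classifiers or which combination rule the authors used.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by plurality vote (hypothetical helper).

    predictions: list of lists, one list of predicted labels per classifier,
    all of the same length (one label per applicant).
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")

    voted = []
    # zip(*predictions) yields, for each applicant, the labels from every classifier.
    for votes in zip(*predictions):
        # Pick the most frequent label among the classifiers' votes.
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


# Toy predictions from three hypothetical classifiers on four applicants.
clf_a = ["good", "bad", "good", "bad"]
clf_b = ["good", "good", "good", "bad"]
clf_c = ["bad", "bad", "good", "bad"]

ensemble = majority_vote([clf_a, clf_b, clf_c])
print(ensemble)
```

With an odd number of classifiers and two classes, a strict majority always exists. With a pair of classifiers, as in the paper's setting, disagreements produce ties, so a real pairwise ensemble would need an explicit tie-breaking or confidence-weighting rule that the abstract does not describe.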
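Linear local tangent space alignment (LLTSA) is a dimensionality reduction method that works well for pattern recognition problems. However, because it is unsupervised and uses a single, globally uniform neighborhood parameter during reduction, it cannot exploit the class labels available for some samples, nor adjust the neighborhood parameter to changes in the spatial distribution of the samples, when reducing high-dimensional datasets. To address these problems, a semi-supervised neighborhood self-adaptive LLTSA (SSNA-LLTSA) algorithm is proposed. Building on LLTSA, the algorithm uses partial label information to adjust the distances between sample points, forming a new distance matrix from which neighborhoods are constructed; it also adaptively adjusts the neighborhood parameter according to the probability density of each sample point's neighborhood, yielding better dimensionality reduction. Experimental results on classic three-dimensional manifolds, pattern recognition on typical UCI datasets, and bearing fault diagnosis show that the algorithm overcomes LLTSA's limitations of being unsupervised and using a globally uniform neighborhood parameter, finds the intrinsic low-dimensional manifold of the data more effectively, and improves recognition accuracy, offering certain advantages.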
The traditional Gaussian Mixture Model (GMM) for pattern recognition is an unsupervised learning method. The parameters of the model are derived only from the training samples of one class, without taking into account the sample distributions of other classes; hence, its recognition accuracy is sometimes not ideal. This paper introduces an approach for estimating the GMM parameters in a supervised way. The Supervised Learning Gaussian Mixture Model (SLGMM) improves the recognition accuracy of the GMM. An experimental example shows its effectiveness: the recognition accuracy obtained by the approach is higher than that obtained by the Vector Quantization (VQ) approach, the Radial Basis Function (RBF) network model, the Learning Vector Quantization (LVQ) approach and the GMM. In addition, the training time of the approach is less than that of the Multilayer Perceptron (MLP).
In many machine learning problems, a large amount of data is available but only a small part of it can be labeled easily. This motivates a branch of research on effectively combining unlabeled and labeled data to infer the labels of the unlabeled data, that is, transductive learning. In this article, based on pattern classification via single sphere (SSPC), which seeks a hypersphere that separates the data with the maximum separation ratio, a progressive transductive pattern classification method via single sphere (PTSSPC) is proposed to construct the classifier using both labeled and unlabeled data. PTSSPC utilizes the additional information in the unlabeled samples and obtains better classification performance than SSPC when insufficient labeled data is available. Experimental results show that the algorithm can yield better performance.
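To extract the feature information of partial discharge (PD) signals more comprehensively and improve the recognition rate, statistical feature parameters and moment feature parameters of partial discharge are combined to extract high-dimensional features. PD features are thus extracted from different perspectives by combining two different methods. In addition, supervision information is introduced on top of unsupervised manifold learning, so that the mapping from high to low dimensions preserves certain structures of the manifold while further separating the manifolds of different classes. Supervised Locally Linear Embedding (SLLE) is used to reduce the dimensionality of and optimize the PD features, extracting optimal features with high discriminative power. Comparative experiments on four typical defects of power cable accessories show that the proposed method extracts the optimal features well and yields more accurate recognition results.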
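To make the role of supervision in SLLE more concrete, the sketch below shows one formulation commonly found in the SLLE literature: before neighbors are selected, the distance between two samples from different classes is increased by a fraction `alpha` of the largest pairwise distance. The abstract does not state which variant or parameter values the authors used, so the function name `supervised_distances`, the default `alpha`, and the toy data are assumptions for illustration only.

```python
import math


def supervised_distances(points, labels, alpha=0.5):
    """Return a class-aware pairwise distance matrix (illustrative sketch).

    points: list of equal-length coordinate tuples.
    labels: list of class labels, one per point.
    alpha:  in [0, 1]; 0 gives plain Euclidean distances (unsupervised LLE),
            larger values push different-class samples further apart.
    """
    n = len(points)
    if len(labels) != n:
        raise ValueError("points and labels must have the same length")

    # Plain Euclidean distances between every pair of samples.
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    max_dist = max(max(row) for row in dist)

    # Add a penalty of alpha * max_dist only where the class labels differ.
    return [
        [
            dist[i][j] + (alpha * max_dist if labels[i] != labels[j] else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy example: two samples of class 0 and one sample of class 1.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
labels = [0, 0, 1]
adjusted = supervised_distances(points, labels, alpha=0.5)
```

Neighbors chosen from the adjusted matrix are more likely to share a class, which is how the low-dimensional embedding ends up separating the manifolds of different classes while still preserving local structure within each class.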
Funding: supported by the National Natural Science Foundation of China (60574075, 60705004).