Abstract
Feature selection is an effective approach to dimensionality reduction and data cleaning in machine learning and data mining. Existing correlation measures cannot directly measure the correlation between mixed features (a continuous feature and a discrete feature). To address this, we group the values of the continuous feature according to the values of the discrete feature, so that samples sharing the same discrete value fall into the same group, and we measure the correlation between the two features by the variability of the data before and after grouping. Building on this measure of the correlation between continuous features and the class, we combine it with the category discrimination complementarity method to select features. Experiments on UCI data sets show that the proposed correlation measure for mixed features is effective and feasible. Compared with several classical feature selection methods, the proposed method has advantages in both feature reduction and classification performance.
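The abstract does not give the formula used for the grouped-variability measure. As a rough illustration of the idea only, the sketch below uses the classical correlation ratio (eta squared), which compares the sum of squares within the groups to the total sum of squares before grouping. The function name `correlation_ratio` and the choice of eta squared are assumptions made for this example and may differ from the measure defined in the paper.

```python
from collections import defaultdict


def correlation_ratio(continuous, discrete):
    """Eta squared between a continuous and a discrete feature.

    Illustrative assumption: this is the textbook correlation ratio,
    not necessarily the measure proposed in the paper.
    Returns a value in [0, 1]; 0 means grouping explains none of the
    variability, 1 means grouping explains all of it.
    """
    if len(continuous) != len(discrete):
        raise ValueError("features must have the same number of samples")
    n = len(continuous)
    if n == 0:
        raise ValueError("features must not be empty")

    # Group continuous values by the discrete value of the same sample.
    groups = defaultdict(list)
    for x, d in zip(continuous, discrete):
        groups[d].append(float(x))

    # Variability before grouping: total sum of squares.
    grand_mean = sum(float(x) for x in continuous) / n
    ss_total = sum((float(x) - grand_mean) ** 2 for x in continuous)
    if ss_total == 0.0:
        return 0.0  # a constant feature carries no variability to explain

    # Variability after grouping: sum of squares within each group.
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_within += sum((v - group_mean) ** 2 for v in values)

    return 1.0 - ss_within / ss_total


# Groups separate the values perfectly, so grouping removes all variability.
print(correlation_ratio([1, 1, 5, 5], ["a", "a", "b", "b"]))  # 1.0
# Both groups have the same mean, so grouping removes no variability.
print(correlation_ratio([1, 5, 1, 5], ["a", "a", "b", "b"]))  # 0.0
```

Under this assumption, a continuous feature could be scored against the class label by passing the label column as `discrete`; how such scores are then combined with category discrimination complementarity is not specified in the abstract and is not sketched here.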
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal
2013, No. 8, pp. 1798-1802 (5 pages)
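Funding
Supported by the National Natural Science Foundation of China (61070061, 61202271)
Supported by the Humanities and Social Sciences Youth Project of the Ministry of Education (11YJCZH086)
Supported by the Guangzhou Philosophy and Social Sciences Development "Twelfth Five-Year Plan" Project (11Q20)
Supported by the Guangdong University of Foreign Studies University-Level Youth Project (11Q01)
Supported by the Guangdong Province High-Level Talent Project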
Keywords
feature selection
correlation
category discrimination complementarity
mixed features