Abstract
Most existing feature selection methods optimized with the ant colony algorithm use attribute dependency or information-entropy-based attribute importance as the heuristic factor on the search path. On some decision tables, however, these searches converge prematurely or return feature subsets that still contain redundant features, which markedly lowers selection accuracy. To address this problem, this paper proposes an attribute importance measure based on the proportion of each condition attribute in the discernibility matrix. Using this discernibility-matrix importance as the heuristic factor on the path, it designs a feature subset search method that combines the discernibility matrix with ant colony optimization. The algorithm starts from the feature core: the ants successively add features with high selection probability to the core set, and the algorithm terminates once the minimal feature subset is found. Worked examples and experiments on UCI datasets show that, compared with feature selection methods based on attribute dependency and information-entropy attribute importance, the proposed algorithm in most cases finds the minimal feature subset at lower cost and effectively reduces the computational workload.
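The abstract does not give the exact importance formula or the ant colony parameters, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes that (1) the importance of attribute a is the fraction of non-empty discernibility-matrix entries that contain a; (2) ants choose the next attribute with probability proportional to τ^α·η^β, the standard ACO transition rule; (3) an ant stops once its subset intersects every non-empty entry, i.e. it discerns every pair of objects with different decisions; and (4) pheromone evaporates at rate ρ and is reinforced on the smallest subset found so far. All function names, the toy table `TABLE`, and the parameters `n_ants`, `n_iter`, `alpha`, `beta`, `rho` are hypothetical.

```python
import random
from itertools import combinations


def discernibility_entries(table, cond, dec):
    """Non-empty discernibility-matrix entries: for each pair of objects with
    different decisions, the set of condition attributes on which they differ."""
    entries = []
    for x, y in combinations(table, 2):
        if x[dec] != y[dec]:
            diff = frozenset(a for a in cond if x[a] != y[a])
            if diff:  # skip inconsistent pairs (identical condition values)
                entries.append(diff)
    return entries


def covers(subset, entries):
    """True if the subset intersects every entry, i.e. discerns every pair."""
    return all(e & subset for e in entries)


def matrix_importance(entries, cond):
    """Assumed importance: share of non-empty entries that contain attribute a."""
    n = len(entries)
    return {a: sum(a in e for e in entries) / n for a in cond}


def aco_feature_subset(table, cond, dec, n_ants=10, n_iter=20,
                       alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Hypothetical ACO search for a small discerning subset, starting from the core."""
    rng = random.Random(seed)
    entries = discernibility_entries(table, cond, dec)
    if not entries:
        return set(), set()
    # Core: attributes that are the only difference for some pair of objects.
    core = {next(iter(e)) for e in entries if len(e) == 1}
    eta = matrix_importance(entries, cond)   # heuristic factor on the path
    tau = {a: 1.0 for a in cond}             # pheromone per attribute
    best = set(cond)
    for _ in range(n_iter):
        for _ in range(n_ants):
            subset = set(core)               # every ant starts from the core
            while not covers(subset, entries):
                cand = [a for a in cond if a not in subset]
                weights = [tau[a] ** alpha * eta[a] ** beta for a in cand]
                subset.add(rng.choices(cand, weights=weights)[0])
            if len(subset) < len(best):
                best = subset
        for a in cond:                       # pheromone evaporation
            tau[a] *= 1.0 - rho
        for a in best:                       # reinforce the best subset so far
            tau[a] += rho
    return best, core


# Hypothetical toy decision table: condition attributes a-d, decision D.
TABLE = [
    {"a": 1, "b": 0, "c": 1, "d": 0, "D": 0},
    {"a": 1, "b": 1, "c": 1, "d": 0, "D": 1},
    {"a": 0, "b": 0, "c": 1, "d": 1, "D": 1},
    {"a": 0, "b": 1, "c": 0, "d": 1, "D": 0},
]

if __name__ == "__main__":
    print(aco_feature_subset(TABLE, ["a", "b", "c", "d"], "D"))
```

On this toy table the non-empty entries are {b}, {a, d}, {a, c, d} and {b, c}, so the core is {b}, and a two-attribute subset such as {a, b} or {b, d} already discerns every pair. Because each ant adds attributes only until coverage is reached, an early pick (here c) can still leave a redundant attribute in that ant's subset; the sketch mitigates this only by keeping the smallest subset across all ants.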
Authors
Yang Zhenyu
Ye Jun
Ji Yuxuan
Ao Jiaxin
Wang Lei
(College of Information Engineering, Nanchang Institute of Technology, Nanchang 330000, China; Jiangxi Province Key Laboratory of Water Information Cooperative Sensing & Intelligent Processing, Nanchang 330000, China)
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2022, No. 4, pp. 1118-1123 (6 pages)
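Funding
Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ211920, GJJ170995)
National Natural Science Foundation of China (61562061)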
Keywords
rough set
ant colony optimization
feature selection
discernibility matrix
feature subset