Abstract
Feature selection models for neighborhood information systems require the neighborhood parameter to be set manually. To address this, the distances from each sample to its nearest sample of the same class and to its nearest sample of a different class are computed and used to define the sample's nearest neighbor, which in turn determines the size of the information granule. The nearest-neighbor concept is then extended to Shannon information theory, yielding nearest-neighbor mutual information. On this basis, a feature selection algorithm based on nearest-neighbor mutual information is constructed using a forward greedy search strategy. Experiments are conducted with two different base classifiers on eight UCI data sets. The results show that, compared with several popular algorithms, the proposed algorithm achieves higher classification performance with fewer features.
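The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the exact definitions, so the granule radius (here the mean of the nearest same-class and nearest different-class distances) and the scoring formula (modeled on the usual neighborhood mutual information form) are assumptions, and the function names `nn_mutual_information` and `forward_select` are hypothetical.

```python
import numpy as np

def nn_mutual_information(X, y):
    """Assumed nearest-neighbor mutual information between features X and labels y."""
    n = len(y)
    # Pairwise Euclidean distances in the current feature subspace.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = y[:, None] == y[None, :]
    not_self = ~np.eye(n, dtype=bool)
    # Distance to the nearest same-class sample and nearest different-class sample.
    hit = np.where(same & not_self, d, np.inf).min(axis=1)
    miss = np.where(~same, d, np.inf).min(axis=1)
    hit[~np.isfinite(hit)] = 0.0    # class with a single sample
    miss[~np.isfinite(miss)] = 0.0  # data set with a single class
    # ASSUMPTION: per-sample granule radius, no global neighborhood parameter.
    radius = (hit + miss) / 2.0
    granule = d <= radius[:, None]          # row i is the granule of sample i
    g = granule.sum(axis=1)                 # granule size
    c = same.sum(axis=1)                    # size of the sample's class
    gc = (granule & same).sum(axis=1)       # granule members sharing the label
    return float(-np.mean(np.log2(g * c / (n * gc))))

def forward_select(X, y, max_features=None):
    """Forward greedy search: add the feature that most increases the score."""
    n_features = X.shape[1]
    selected, best = [], -np.inf
    while len(selected) < (max_features or n_features):
        candidates = [f for f in range(n_features) if f not in selected]
        scores = [nn_mutual_information(X[:, selected + [f]], y)
                  for f in candidates]
        top = int(np.argmax(scores))
        if scores[top] <= best:   # stop when no candidate improves the score
            break
        best = scores[top]
        selected.append(candidates[top])
    return selected
```

On a toy data set where one column separates the classes and another interleaves them, the search would be expected to keep only the separating column. The stopping rule (no strict improvement) is also a design choice of this sketch; a threshold on the gain would be an equally plausible alternative.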
Authors
WANG Chenxi (王晨曦)
LIN Yaojin (林耀进)
LIU Jinghua (刘景华)
LIN Menglei (林梦雷)
Affiliations: Department of Computer Engineering, Zhangzhou Institute of Technology, Zhangzhou, Fujian 363000, China; School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journals (北大核心)
2016, Issue 18, pp. 74-78 (5 pages)
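Funding
National Natural Science Foundation of China (No. 61303131)
Natural Science Foundation of Fujian Province (No. 2013J01028)
Science and Technology Project of the Education Department of Fujian Province (No. JA14192, No. JAT60866)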
Keywords
feature selection
nearest-neighbor
mutual information
neighborhood mutual information