Abstract
In the field of rough sets, the generation of granular balls can be regarded as an unsupervised process whose termination condition is that the generated balls reach a purity computed from label information. When the data contain a large number of inconsistencies, the labels of the samples themselves may hinder the generation of granular balls with high purity. Consequently, deriving a reduct based on the granular ball rough set is also time-consuming, because granular balls must be generated repeatedly while searching for the reduct. In view of this, a pseudo-label strategy is first introduced into the process of generating granular balls. Because pseudo labels can also be produced in an unsupervised manner, they fit the aggregation of samples within granular balls more closely, which reduces inconsistencies and improves the efficiency of generating granular balls. Next, a forward greedy search algorithm is designed to derive reducts based on the pseudo-label granular ball rough set. Finally, experimental results on 12 benchmark data sets show that the proposed strategy not only effectively improves the efficiency of deriving reducts, but also yields reducts whose attributes retain comparable classification performance.
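The purity-driven generation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the deterministic two-seed splitting rule (a simplification of a 2-means step), and the toy data are all assumptions made for illustration. The `labels` argument may hold either the original labels or pseudo labels obtained from any unsupervised clustering.

```python
from collections import Counter
from math import dist


def purity(labels):
    """Fraction of samples in a ball that carry the ball's majority label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def split_in_two(indices, points):
    """Split a ball by assigning each sample to the nearer of two seeds.

    Assumption: seeds are the first sample and the sample farthest from it,
    a deterministic stand-in for a 2-means step.
    """
    seed_a = points[indices[0]]
    seed_b = max((points[i] for i in indices), key=lambda p: dist(p, seed_a))
    left = [i for i in indices if dist(points[i], seed_a) <= dist(points[i], seed_b)]
    right = [i for i in indices if dist(points[i], seed_a) > dist(points[i], seed_b)]
    return left, right


def generate_granular_balls(points, labels, purity_threshold=1.0):
    """Return one list of sample indices per granular ball."""
    pending = [list(range(len(points)))]
    balls = []
    while pending:
        ball = pending.pop()
        # Stop splitting once the ball is pure enough under the given labels.
        if purity([labels[i] for i in ball]) >= purity_threshold:
            balls.append(ball)
            continue
        left, right = split_in_two(ball, points)
        if not left or not right:  # all samples coincide, cannot split further
            balls.append(ball)
        else:
            pending.extend([left, right])
    return balls


# Hypothetical toy data: two tight groups of samples.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (5.2, 5.1)]
noisy_labels = [0, 1, 0, 1, 0, 1]   # labels that disagree with the geometry
pseudo_labels = [0, 0, 0, 1, 1, 1]  # labels as a clustering step might assign them

balls_noisy = generate_granular_balls(points, noisy_labels)
balls_pseudo = generate_granular_balls(points, pseudo_labels)
print(len(balls_noisy), len(balls_pseudo))
```

On this toy data, labels that conflict with the spatial grouping force many more splits than labels that follow it, which is the effect the pseudo-label strategy is meant to exploit.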
Authors
陈中华
巴婧
徐泰华
王平心
杨习贝
CHEN Zhong-hua; BA Jing; XU Tai-hua; WANG Ping-xin; YANG Xi-bei (School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China; School of Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2023, Issue 1, pp. 24-29 (6 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (62076111, 62006099, 62006128, 61906078).
Keywords
attribute reduction
granular ball
pseudo label
rough set