Abstract
The categorical data-sequential information bottleneck (CD-sIB) algorithm assumes that every attribute of the data contributes equally to the binary transformation, which lowers the transformation quality. A weighted binary transformation method is proposed to reflect the characteristics of non-co-occurrence data: it highlights the representative attributes and suppresses the non-representative (redundant) ones, so as to obtain the best co-occurrence representation. Two weighting principles for non-co-occurrence data are introduced: applicability to randomly distributed data, and unsupervised computation of the weights. Based on the concept of weighting granularity, the weighted categorical data-sequential information bottleneck (WCD-sIB) algorithm is then constructed. The experimental results show that the weighted binary transformation method generates a good co-occurrence data representation, and that the clustering accuracy of the WCD-sIB algorithm is superior to that of the other algorithms.
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
EI
CSCD
PKU Core Journals (北大核心)
2013, No. 6, pp. 584-591 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (No. 61170223)