Weighting Binary Transformation Algorithm for Non-Co-occurrence Data

Abstract: The categorical data-sequential information bottleneck (CD-sIB) algorithm assumes that all attribute features contribute equally to the binary transformation, which lowers the transformation quality. This paper proposes a weighting binary transformation method that reflects the features of non-co-occurrence data. The method highlights the representative attributes of the data and suppresses the non-representative (redundant) ones, so as to obtain the best co-occurrence representation. Two weighting principles for non-co-occurrence data are introduced: applicability to stochastically distributed data, and unsupervised computation of the weights. Based on the concept of weighting granularity, the weighted categorical data-sequential information bottleneck (WCD-sIB) algorithm is then presented. Experimental results show that the weighting binary transformation method generates good co-occurrence data representations and that WCD-sIB achieves higher clustering accuracy than the compared algorithms.
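The abstract contrasts CD-sIB's equal treatment of attributes with a weighted binary transformation. The sketch below illustrates that contrast under stated assumptions: each categorical record x is encoded as a distribution over its "attribute = value" items y, with p(x) = 1/n and p(y|x) = w_a / Σw (uniform weights reproduce the equal-contribution assumption). The attribute weights are simply supplied by the caller. The paper's unsupervised, granularity-based weighting scheme is not reproduced here, and the function names and toy records are hypothetical. The cost function is the standard hard-limit sequential IB cost of moving x into cluster t, (p(x) + p(t)) · JS_Π(p(y|x), p(y|t)).

```python
import math


def binary_transform(records, weights=None):
    """Encode categorical records as p(x) and sparse p(y|x) (hypothetical helper).

    Items y are (attribute index, value) pairs. p(x) = 1/n and
    p(y|x) = w_a / sum(w) for the value x takes on attribute a.
    weights=None means every attribute contributes equally.
    """
    m = len(records[0])
    w = [1.0] * m if weights is None else [float(v) for v in weights]
    if len(w) != m or min(w) < 0 or sum(w) == 0:
        raise ValueError("expected one non-negative weight per attribute")
    total = sum(w)
    p_x = [1.0 / len(records)] * len(records)
    p_y_given_x = [{(a, v): w[a] / total for a, v in enumerate(rec)}
                   for rec in records]
    return p_x, p_y_given_x


def kl(p, q):
    """KL divergence D(p || q) in nats over sparse dicts (q covers p's support)."""
    return sum(pv * math.log(pv / q[y]) for y, pv in p.items() if pv > 0)


def merge_cost(p_x, p_y_x, p_t, p_y_t):
    """Hard-limit sIB cost of moving x into cluster t:
    (p(x) + p(t)) * JS_pi(p(y|x), p(y|t)), with pi = normalized priors."""
    s = p_x + p_t
    pi_x, pi_t = p_x / s, p_t / s
    items = set(p_y_x) | set(p_y_t)
    mix = {y: pi_x * p_y_x.get(y, 0.0) + pi_t * p_y_t.get(y, 0.0) for y in items}
    js = pi_x * kl(p_y_x, mix) + pi_t * kl(p_y_t, mix)
    return s * js


# Toy data (invented): attribute 0 is color, attribute 1 is size.
records = [("red", "small"), ("red", "large"), ("blue", "small")]
for w in (None, [3, 1]):  # equal weights vs. color weighted 3:1 (illustrative)
    px, pyx = binary_transform(records, w)
    same_color = merge_cost(px[0], pyx[0], px[1], pyx[1])
    same_size = merge_cost(px[0], pyx[0], px[2], pyx[2])
    print(w, round(same_color, 4), round(same_size, 4))
```

With equal weights both pairs of records cost the same to merge. Weighting color 3:1 lowers the cost of merging the two red records and raises the cost of merging records that differ in color. This is the kind of effect the weighted transformation is meant to have on which records the sIB procedure groups together.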

Authors: Ji Bo, Ye Yangdong

Affiliation: Department of Computer Science and Technology, School of Information Engineering, Zhengzhou University

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2013, No. 6, pp. 584-591 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (No. 61170223)

Keywords: Non-Co-occurrence Data; Feature Weighting; Information Bottleneck; Categorical Data-Sequential Information Bottleneck (CD-sIB) Algorithm; Binary Transformation

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
