A New Fast Algorithm for Multi-source Cross-domain Data Classification (cited by: 9)

A New Cross-multidomain Classification Algorithm and Its Fast Version for Large Datasets
Abstract: Cross-domain learning and classification aims to transfer the results of supervised learning on multiple source domains to a target domain, so that unlabeled target-domain data can be classified. Current cross-domain learning methods generally focus on transfer from a single source domain to the target domain and work with small samples. Such methods adapt poorly across domains and struggle with large datasets, which directly limits both the classification accuracy and the efficiency of cross-domain learning. To exploit as much useful data from related domains as possible, this paper proposes a multi-source cross-domain classification (MSCC) algorithm. Building on the logistic regression model and a consensus measure, an approach that many experiments have shown to be effective, MSCC constructs several source-domain classifiers and combines them to guide the classification of target-domain data. To make efficient use of large source-domain samples, MSCC is combined with CDdual (dual coordinate descent method), a recent advance in large-scale logistic regression, to derive a fast version, MSCC-CDdual, which is also analysed theoretically. Experimental results on artificial, text, and image datasets indicate that MSCC-CDdual achieves high classification accuracy, fast running speed, and good domain adaptation on large multi-source datasets.
The work makes three contributions: 1) A novel consensus measure is proposed for multi-source cross-domain classification; it is suitable for combining multiple classifiers and makes it convenient to develop MSCC into its fast version, MSCC-CDdual. 2) The proposed MSCC-CDdual algorithm is shown to be suitable for multi-source cross-domain learning on both small and large datasets. 3) MSCC-CDdual has an additional advantage over other algorithms: it is applicable to high-dimensional datasets, which are "large" from another perspective.
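The general idea of training one logistic regression classifier per source domain and combining their posterior probabilities on target data can be sketched as follows. This is only an illustrative sketch under stated assumptions: the abstract does not define the proposed consensus measure, so a plain average of the source posteriors stands in for it, and batch gradient descent is used instead of CDdual. All function names, the toy data generator, and the hyperparameters are hypothetical and do not come from the paper.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=200, l2=0.01):
    """Fit L2-regularized logistic regression by batch gradient descent.

    Labels are 0/1. This is NOT the CDdual solver named in the abstract;
    it is a simple stand-in to keep the sketch dependency-free.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gwj / n + l2 * wj) for wj, gwj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def posterior(model, x):
    # P(y = 1 | x) under one source-domain classifier.
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def consensus_predict(models, x):
    """Assumed stand-in for the consensus step: average the source posteriors."""
    p = sum(posterior(m, x) for m in models) / len(models)
    return (1 if p >= 0.5 else 0), p


def make_domain(rng, shift, n=100):
    """Hypothetical 2-D toy domain; `shift` moves both classes along axis 0."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        c = 1.0 if label else -1.0
        X.append([c + shift + rng.gauss(0, 0.5), c + rng.gauss(0, 0.5)])
        y.append(label)
    return X, y


rng = random.Random(0)
# Three labeled source domains with slightly different distributions.
sources = [make_domain(rng, s) for s in (-0.3, 0.0, 0.3)]
models = [train_logreg(X, y) for X, y in sources]

# Target domain: labels are used only to score the predictions.
X_tgt, y_tgt = make_domain(rng, 0.1)
preds = [consensus_predict(models, x)[0] for x in X_tgt]
accuracy = sum(p == t for p, t in zip(preds, y_tgt)) / len(y_tgt)
print(f"target accuracy: {accuracy:.2f}")
```

Averaging posteriors is the simplest way to let several source classifiers vote on a target sample; the paper's consensus measure and its CDdual-based solver may differ substantially from this stand-in.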
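Source: Acta Automatica Sinica (《自动化学报》), indexed in EI, CSCD, and the Peking University Core Journals list, 2014, No. 3, pp. 531-547 (17 pages)
Funding: Supported by the National Natural Science Foundation of China (60903100, 60975027)
Keywords: cross-domain, multi-source, logistic regression, posterior probability, classification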