Symptom selection for multi-label data of inquiry diagnosis in traditional Chinese medicine (cited by: 9)
Abstract: In traditional Chinese medicine (TCM) diagnosis, a patient may carry more than one syndrome label at once, so computer-aided TCM diagnosis is a typical application of multi-label learning on high-dimensional data. An inquiry typically yields a large number of symptoms, which degrades the models built by diagnostic algorithms. Feature selection seeks the smallest subset of relevant symptom features that maximizes the model's generalization performance. Feature selection for multi-label data has so far received little study. This paper proposes a hybrid optimization technique for symptom selection on multi-label TCM inquiry data, with models built using four multi-label learning algorithms, including multi-label k-nearest neighbors. The proposed algorithm is compared with popular multi-label dimensionality reduction algorithms such as MEFS (embedded feature selection for multi-label learning) and MDDM (multi-label dimensionality reduction via dependence maximization) on the UCI Yeast gene functional dataset and a coronary heart disease (CHD) inquiry dataset. The experimental results show a clear improvement over these existing algorithms; in particular, the improvement in the classifier's average precision reaches 10.62% and 14.54%. The paper also establishes a TCM syndrome inquiry model for CHD, providing a useful reference for CHD diagnosis and for the analysis of other multi-label data.
Source: Scientia Sinica (Informationis) (《中国科学:信息科学》), CSCD, 2011, No. 11, pp. 1372-1387 (16 pages)
Funding: National Natural Science Foundation of China (Grant Nos. 60873129, 30901897, 61005006); Shanghai Leading Academic Discipline Project (Grant Nos. S30302, B004); Open Project Program of the National Laboratory of Pattern Recognition
Keywords: multi-label learning, feature selection, high dimensionality, traditional Chinese medicine inquiry, coronary heart disease
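
The abstract reports gains in average precision, a ranking-based multi-label metric. As a minimal sketch, assuming the commonly used definition (for each instance, the mean over its relevant labels of the fraction of relevant labels ranked at or above that label), it could be computed as below. The function name `average_precision`, the score matrix, and the label sets are hypothetical illustrations, not the paper's code or data.

```python
from typing import Sequence, Set


def average_precision(scores: Sequence[Sequence[float]],
                      relevant: Sequence[Set[int]]) -> float:
    """Multi-label average precision, averaged over instances.

    scores[i][j] is the predicted score of label j for instance i.
    relevant[i] is the set of true label indices for instance i.
    Instances with no relevant label are skipped (an assumption of this sketch).
    """
    total = 0.0
    counted = 0
    for row, labels in zip(scores, relevant):
        if not labels:
            continue
        # Rank labels by descending score; rank 1 is the top-scored label.
        # Ties are broken by label index because sorted() is stable.
        order = sorted(range(len(row)), key=lambda j: -row[j])
        rank = {label: pos + 1 for pos, label in enumerate(order)}
        inst = 0.0
        for y in labels:
            # Number of relevant labels ranked at or above y.
            at_or_above = sum(1 for y2 in labels if rank[y2] <= rank[y])
            inst += at_or_above / rank[y]
        total += inst / len(labels)
        counted += 1
    return total / counted if counted else 0.0


# Hypothetical toy data: 2 instances, 3 candidate syndrome labels.
toy_scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.4]]
toy_relevant = [{0, 2}, {2}]
print(average_precision(toy_scores, toy_relevant))
```

A value of 1.0 means every relevant label is ranked above every irrelevant one for all instances; lower values indicate that irrelevant labels are ranked ahead of relevant ones.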