Abstract
In traditional Chinese medicine (TCM) diagnosis, a patient may present with more than one syndrome label at the same time, so computer-aided TCM diagnosis is a typical application of multi-label learning on high-dimensional data. A TCM inquiry usually records a large number of symptoms, which degrades the models built by diagnostic algorithms. Feature selection aims to find the smallest subset of relevant symptom features that maximizes the generalization performance of the model. Little work has been done so far on feature selection for multi-label data. This paper applies a hybrid optimization technique to symptom selection for multi-label TCM inquiry data and builds models with four multi-label learning algorithms, including multi-label k-nearest neighbors. The proposed algorithm is compared with several popular dimensionality reduction algorithms for multi-label data, such as MEFS (embedded feature selection for multi-label learning) and MDDM (multi-label dimensionality reduction via dependence maximization), on the UCI Yeast gene functional multi-label data set and a coronary heart disease (CHD) inquiry data set. Experimental results show a clear improvement over these existing algorithms; in particular, the gain in average precision for the classifiers is up to 10.62% and 14.54%. The paper establishes a syndrome inquiry model for CHD in TCM, providing a useful reference for CHD diagnosis and for the analysis of other multi-label data.
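The abstract reports its gains in terms of average precision but does not spell out the metric. The sketch below assumes the ranking-based definition commonly used in multi-label learning: for each instance, labels are ranked by predicted score, and the precision at the rank of every relevant label is averaged. The function name `average_precision` and the toy arrays are illustrative only and do not come from the paper.

```python
import numpy as np

def average_precision(scores, labels):
    """Ranking-based multi-label average precision (assumed definition).

    scores: array of shape (n_samples, n_labels), higher = more confident.
    labels: array of the same shape, 1 for relevant labels, 0 otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    per_instance = []
    for s, y in zip(scores, labels):
        if not y.any():
            continue  # an instance with no relevant label is skipped
        order = np.argsort(-s, kind="stable")   # best-scored label first
        relevant_in_rank_order = y[order]
        hits = np.cumsum(relevant_in_rank_order)  # relevant labels seen so far
        ranks = np.arange(1, len(s) + 1)
        # precision measured at the rank of each relevant label
        precisions = hits[relevant_in_rank_order] / ranks[relevant_in_rank_order]
        per_instance.append(precisions.mean())
    return float(np.mean(per_instance))

# Toy example: two patients, three hypothetical syndrome labels
toy_scores = [[0.9, 0.2, 0.8], [0.1, 0.7, 0.3]]
toy_labels = [[1, 0, 1], [1, 0, 0]]
toy_result = average_precision(toy_scores, toy_labels)
```

In the toy data the first patient has both relevant labels ranked on top, while the second patient's only relevant label is ranked last, so the two per-instance values differ sharply before being averaged.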
Source
Scientia Sinica Informationis (《中国科学:信息科学》)
CSCD
2011, No. 11, pp. 1372-1387 (16 pages)
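Funding
National Natural Science Foundation of China (Grant Nos. 60873129, 30901897, 61005006)
Shanghai Leading Academic Discipline Project (Grant Nos. S30302, B004)
Open Project Program of the National Laboratory of Pattern Recognition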
Keywords
multi-label learning, feature selection, high dimensionality, traditional Chinese medicine inquiry, coronary heart disease