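Abstract
This paper proposes a support vector machine method based on unsupervised discriminative projection feature selection (UDPFS-SVM) for biomarker screening. UDPFS-SVM first applies the unsupervised discriminative projection algorithm (UDPFS), which introduces class prior information and adds constraints such as regularization and penalty functions, to adaptively obtain a sparse discriminative projection matrix. It then derives the corresponding low-dimensional metabolic matrix from this projection matrix, and finally builds a support vector machine (SVM) classification model to find biomarkers. The proposed method performs fuzzy learning and sparse learning at the same time and makes reasonable use of the dependencies between variables. UDPFS-SVM and partial least squares discriminant analysis (PLS-DA) were each used for variable screening on plasma metabolomics data from hyperlipidemic rats, and the screened biomarkers were evaluated by analysis of variance, ROC curves, and linear discriminant analysis (LDA). Both methods identified 8 biomarkers. Analysis of variance showed that all biomarkers obtained by UDPFS-SVM differed significantly, and that their significant difference values were all larger than those of PLS-DA. The ROC result for UDPFS-SVM was 1.00, which is 0.05 higher than that of PLS-DA. LDA showed that the biomarkers obtained by UDPFS-SVM better eliminated within-group metabolic differences and separated between-group metabolic differences in the hyperlipidemia samples. These results indicate that UDPFS-SVM outperforms PLS-DA for hyperlipidemia biomarker discovery and offers a new approach to biomarker discovery.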
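The screening and evaluation steps described in the abstract can be illustrated with a minimal sketch. It does not implement the UDPFS optimization itself: the projection matrix `W` is assumed to be already available, and ranking variables by the L2 norm of each row of `W` is an assumption (a common convention for row-sparse projection matrices), not the weighting confirmed by the paper. The SVM step is omitted. All function names and the toy data are hypothetical.

```python
import math


def rank_features_by_row_norm(W):
    """Rank variables by the L2 norm of their row in a projection matrix.

    W is a list of d rows (one per variable), each with k projection weights.
    Rows that are (near) zero correspond to variables the sparse projection
    ignores. Assumption: row norm is used as the variable weight.
    """
    scores = [math.sqrt(sum(w * w for w in row)) for row in W]
    order = sorted(range(len(W)), key=lambda i: scores[i], reverse=True)
    return order, scores


def select_columns(X, idx):
    """Keep only the selected variables (columns) of the data matrix X."""
    return [[row[j] for j in idx] for row in X]


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy projection matrix: 4 variables, 2 projection directions (made-up values).
W = [[0.9, -0.4], [0.0, 0.0], [0.1, 0.05], [-0.7, 0.6]]
order, weights = rank_features_by_row_norm(W)
top2 = order[:2]  # indices of the two highest-weight variables

# Toy data: 4 samples x 4 variables, with labels 0 = control, 1 = model group.
X = [
    [0.1, 5.0, 2.0, 1.0],
    [0.2, 4.0, 2.0, 0.8],
    [0.9, 5.0, 2.0, 0.1],
    [0.8, 4.0, 2.0, 0.3],
]
y = [0, 0, 1, 1]

X_low = select_columns(X, top2)  # low-dimensional matrix for a classifier
auc_first = roc_auc([row[0] for row in X_low], y)
```

In a fuller pipeline, `X_low` would be passed to an SVM classifier, and the ROC analysis would be applied to each candidate biomarker or to the classifier's decision scores.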
Partial least squares discriminant analysis (PLS-DA) is currently a common method for biomarker screening in metabolomics research. However, because it is a typical linear algorithm, it is often poorly suited to finding biomarkers in biomedicine, where the research objects are complex and non-linear. A support vector machine approach based on unsupervised discriminative projection feature selection (UDPFS-SVM) is therefore proposed in this paper. The method has two steps. The first step obtains the low-dimensional discriminant projection matrix: UDPFS-SVM introduces category prior information, then adds regularization and constraints such as penalty functions to obtain a discriminant projection matrix, which is subsequently filtered by weights to become a low-dimensional discriminant projection matrix. The second step builds a support vector machine classification model based on the projection matrix to find biomarkers. Notably, the method adaptively adjusts the low-dimensional sparse projection matrix. UDPFS-SVM also performs both fuzzy and sparse learning and makes reasonable use of the dependency relationships between variables, so it can handle non-linear research objects well. In this paper, the metabolomic data of hyperlipidemic rats were screened for variables using UDPFS-SVM and PLS-DA, and the biomarkers obtained from the screening were evaluated by variance analysis, ROC curves, and linear discriminant analysis (LDA). The results showed that eight biomarkers were identified by each of the two methods. Variance analysis showed that UDPFS-SVM yielded more significant biomarkers than PLS-DA. Furthermore, the significant difference values obtained by UDPFS-SVM were all larger than those obtained by PLS-DA. The ROC curve results showed that the ROC value of UDPFS-SVM was significantly higher
Authors
王娅妮
杜丽晶
郭拓
肖雪
WANG Ya-ni; DU Li-jing; GUO Tuo; XIAO Xue (School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China; School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China; Institute of Traditional Chinese Medicine, Guangdong Pharmaceutical University, Guangzhou 510006, China)
Source
《分析测试学报》
CAS
CSCD
Peking University Core Journals
2023, Issue 4, pp. 423-431 (9 pages)
Journal of Instrumental Analysis
Funding
Scientific Research Program of the Shaanxi Provincial Department of Education (20JK0532).
Keywords
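variable selection
unsupervised discriminant projection
class prior information
nonlinearity
high-dimensional small-sample data
metabolomics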
variable screening
unsupervised discriminative projection
classified prior information
non-linear
high-dimensional and small samples
metabonomics