Abstract
To address the high-dimensional, small-sample problem, this paper uses the partial least squares (PLS) model to construct a mining algorithm suited to small samples. Dimension reduction and classification learning are both carried out within the unified PLS framework. On the task of classifying colon cancer from gene expression profile data (the Colon dataset), PLS is used to mine the small-sample data and to visualize the results. A comparison with the classical SVM algorithm shows that the PLS algorithm is effective and reliable for mining high-dimensional, small-sample data.
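The abstract does not give the algorithm itself, so the following is only a minimal sketch of how a single PLS model can perform both dimension reduction and classification. It assumes a standard NIPALS PLS1 formulation with binary labels coded as +1/-1 and a sign threshold at 0. The function names (`pls1_fit`, `pls1_scores`, `pls1_predict`) and the synthetic data generator (`make_data`) are hypothetical stand-ins for the Colon data; this is not the authors' implementation, and the SVM comparison is not reproduced.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """NIPALS PLS1 on X (n x p) and y (n,) with labels in {-1, +1}."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centered working copies
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                         # direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                            # latent score vector
        tt = t @ t
        p = Xk.T @ t / tt                     # X loading
        q = (yk @ t) / tt                     # y loading
        Xk = Xk - np.outer(t, p)              # deflate X
        yk = yk - q * t                       # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    R = W @ np.linalg.inv(P.T @ W)            # maps centered X straight to scores
    return {"x_mean": x_mean, "y_mean": y_mean, "R": R, "coef": R @ Q}

def pls1_scores(model, X):
    """Low-dimensional PLS scores (dimension reduction / 2-D visualization)."""
    return (X - model["x_mean"]) @ model["R"]

def pls1_predict(model, X):
    """Classify by the sign of the PLS regression output."""
    y_hat = (X - model["x_mean"]) @ model["coef"] + model["y_mean"]
    return np.where(y_hat >= 0, 1, -1)

# Hypothetical synthetic data: p >> n, only the first k features carry signal.
rng = np.random.default_rng(0)

def make_data(n, p=500, k=20, shift=1.5):
    y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    X = rng.standard_normal((n, p))
    X[:, :k] += shift * y[:, None]
    return X, y

X_train, y_train = make_data(40)
X_test, y_test = make_data(40)
model = pls1_fit(X_train, y_train, n_components=2)
scores = pls1_scores(model, X_train)          # 40 x 2 matrix for a scatter plot
train_acc = np.mean(pls1_predict(model, X_train) == y_train)
test_acc = np.mean(pls1_predict(model, X_test) == y_test)
print(scores.shape, train_acc, test_acc)
```

Using two components lets the same fitted model supply both the 2-D score plot and the classifier, which is one way to read "dimension reduction and classification learning in a unified framework"; in practice the number of components would be chosen by cross-validation.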
Source
Computer Applications and Software (计算机应用与软件)
2017, No. 11, pp. 58-63 (6 pages)
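Funding
National Natural Science Foundation of China (61673327)
Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province (JA13355)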
Keywords
Generalized small-sample
Partial least squares (PLS)
Gene expression profiles