Abstract
Tumor gene expression data are high-dimensional, have small sample sizes, and exhibit imbalanced class distributions, which makes accurate classification with traditional methods challenging. To address this, we propose a gene expression data classification method based on the Weka platform and cost-sensitive feature selection. First, the six tumor gene expression datasets used in the experiments are standardized. Then, features are selected using both the cost-sensitive and the non-cost-sensitive feature selection methods available in Weka. Finally, four classifiers are used to compare classification performance. The experimental results show that the classification method based on cost-sensitive feature selection achieves relatively high classification performance.
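The first two steps of the pipeline (standardization, then cost-sensitive feature selection) can be illustrated with a minimal sketch. It does not reproduce Weka's evaluators or the authors' configuration. It assumes a common way of making a filter cost-sensitive: each sample is weighted by the misclassification cost of its class, and each gene is ranked by its weighted absolute correlation with the class label. The function names, the toy data, and the cost values are all hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(X):
    """Z-score every column (gene) of a samples-by-genes matrix."""
    cols = list(zip(*X))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
        for row in X
    ]


def weighted_abs_corr(x, y, w):
    """Absolute Pearson correlation of x and y under sample weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / sqrt(vx * vy)


def rank_genes(X, y, cost):
    """Rank gene indices, best first, using class costs as sample weights.

    Assumption: cost[label] is the cost of misclassifying a sample of that
    class, so minority-class samples can be given more influence.
    """
    Z = standardize(X)
    w = [cost[label] for label in y]
    scores = [weighted_abs_corr(col, y, w) for col in zip(*Z)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Hypothetical toy data: 5 samples x 3 genes, class 1 is the minority class.
X = [
    [1.0, 5.0, 0.2],
    [1.1, 4.0, 0.9],
    [0.9, 6.0, 0.1],
    [3.0, 5.5, 0.8],
    [3.2, 4.5, 0.3],
]
y = [0, 0, 0, 1, 1]
cost = {0: 1.0, 1: 3.0}  # assumed costs: errors on class 1 are 3x as costly

ranking = rank_genes(X, y, cost)
print(ranking)
```

In this toy example the first gene separates the two classes cleanly, so it is ranked first. In a real experiment the top-ranked genes would then be passed to the classifiers being compared.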
Authors
HAN Lei (韩磊)
HUANG Ruilong (黄瑞龙)
FAN Wenjing (范文静)
YE Mingquan (叶明全)
School of Medical Information, Wannan Medical College, Wuhu, Anhui 241002
Source
Smart Healthcare (《智慧健康》)
2022, Issue 17, pp. 1-4 (4 pages)
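Funding
National College Students' Innovation and Entrepreneurship Training Program project "Research on Cost-Sensitive Feature Selection for Gene Expression Data" (Project No. 202010368063)
National College Students' Innovation and Entrepreneurship Training Program project "Intelligent Management and Monitoring System for Chronic Disease Medication Big Data" (Project No. 202110368062)
Anhui Province "Six Excellences and One Top-Notch" Outstanding Talent Training Innovation Project "Excellent Engineer Training Innovation Project for the Intelligent Medical Engineering Major" (Project No. 2020zyrc159)
Anhui Province Emerging Engineering Education Research and Practice Project "Exploration and Practice of Cultivating Creativity, Innovation, and Entrepreneurship in Interdisciplinary Talent across Emerging Engineering and New Medicine" (Project No. 2020-24)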
Keywords
Gene expression data
Imbalanced distribution
Feature selection
Classification performance