Abstract
Breast cancer gene recognition identifies the genes on a gene chip that are specifically associated with breast cancer, providing a reference for early diagnosis. Traditional classification methods struggle with gene expression profile data, which are high-dimensional, noisy, and large in volume. This paper proposes a breast cancer gene recognition method based on the fuzzy support vector machine (FSVM). The method extracts the characteristic genes of breast cancer patients, discards genes unrelated to the disease, and removes redundant information among the mutated genes. It then constructs an FSVM whose membership function is a K-nearest-neighbor (KNN) geometric mean, embeds a new kernel function, L-KMOD, and assigns a different penalty parameter to each sample point. This mitigates the effect of having few samples with many dimensions and improves the accuracy of breast cancer gene recognition. In recognition experiments on breast cancer gene microarray data downloaded from the shared database of the National Center for Biotechnology Information (NCBI), the average misclassification rate was 3.89% and the highest accuracy reached 98.9%, with training time and recognition time both within a reasonable range.
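The abstract does not give the formula for the KNN geometric mean membership function, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that each sample's membership shrinks as the geometric mean of its distances to its k nearest same-class neighbors grows, and that the per-sample penalty is the base penalty scaled by that membership. The function names, the mapping `1 / (1 + d / d_ref)`, and the toy data are all hypothetical and are not taken from the paper; the L-KMOD kernel is not sketched because its definition is not stated here.

```python
import math


def knn_geometric_mean_membership(samples, labels, k=3, eps=1e-12):
    """Return one fuzzy membership in (0, 1] per sample.

    Assumed form (hypothetical, not the paper's formula): d is the
    geometric mean of the distances from a sample to its k nearest
    same-class neighbors, d_ref is the average d over that class, and
    the membership is 1 / (1 + d / d_ref).
    """
    gmeans = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Distances to every other sample that shares this sample's label.
        dists = sorted(
            math.dist(x, other)
            for j, (other, lab) in enumerate(zip(samples, labels))
            if j != i and lab == y
        )
        nearest = dists[:k]
        if not nearest:
            gmeans.append(0.0)  # a class with a single sample
            continue
        # Geometric mean computed in log space for numerical stability.
        log_mean = sum(math.log(d + eps) for d in nearest) / len(nearest)
        gmeans.append(math.exp(log_mean))

    # Class-wise reference distance: the average geometric mean per label.
    totals, counts = {}, {}
    for g, y in zip(gmeans, labels):
        totals[y] = totals.get(y, 0.0) + g
        counts[y] = counts.get(y, 0) + 1

    memberships = []
    for g, y in zip(gmeans, labels):
        ref = totals[y] / counts[y]
        memberships.append(1.0 if ref <= 0 else 1.0 / (1.0 + g / ref))
    return memberships


def per_sample_penalties(memberships, base_penalty=1.0):
    """Scale the base penalty C by each membership (assumed rule C_i = s_i * C)."""
    return [base_penalty * s for s in memberships]


# Toy data (made up for illustration): a tight class-0 cluster with one
# far-away class-0 point, plus a small class-1 cluster.
samples = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

memberships = knn_geometric_mean_membership(samples, labels, k=2)
penalties = per_sample_penalties(memberships, base_penalty=10.0)
```

In this toy set the isolated class-0 point at (10, 10) receives the smallest membership in its class, so it would contribute least to the training penalty. Such penalties could then be handed to any SVM trainer that accepts per-sample weights, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`.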
Authors
YI Cong-qin (易丛琴)
TIAN Feng (田丰)
ZHOU Ru-yan (周汝雁)
College of Information Technology, Shanghai Ocean University, Shanghai 201306, China; Guizhou Institute of Electronic Science and Technology, Guiyang, Guizhou 550003, China
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal
2018, No. 11, pp. 431-435 (5 pages)
Funding
Shanghai Ocean University Doctoral Fund (A2-0203-00-100351)
Keywords
Fuzzy support vector machine (FSVM)
Gene expression profiling data
Membership function
Gene recognition