Abstract
Breast cancer is one of the most dangerous diseases affecting women worldwide. Its prevalence is rising year by year, and it is a leading cause of death among women. When large numbers of samples must be examined, clinical diagnosis of breast cancer is constrained by the relative shortage of high-quality medical resources, so the diagnostic cycle is long and testing is expensive. Efficient, accurate, and cost-effective diagnostic methods therefore have broad application prospects and are urgently needed in clinical practice. Fluorescence spectroscopy can characterize the combined physical and chemical changes in cells and can therefore be used to distinguish normal from cancerous cells. Machine learning excels at extracting useful information from large amounts of data and is an effective tool for classification and prediction. Previous machine learning studies mostly trained models on databases containing only part of the biochemical information, which easily leads to information loss. Moreover, a fluorescence spectrum is the superposition of the spectra of many substances in the cell, so diagnosing breast cancer from characteristic fluorescence peaks suffers from quantitative uncertainty. This paper therefore proposes a diagnostic method that combines machine learning with the fluorescence spectra of breast cancer samples. Fluorescence spectra of normal and cancerous breast tissue (with pathological diagnoses already made) were collected using a 405 nm laser and used as the data set, and the ability of three algorithms, K-nearest neighbor (KNN), support vector machine (SVM), and random forest (RF), to classify these spectra was compared. The results show that, compared with SVM, KNN and RF achieve higher accuracy, balance recall and precision better, and classify breast cancer fluorescence spectra more effectively: their accuracy, recall, precision, and F1-score are all above 95%, which makes them better suited to breast cancer diagnosis. The classification ability of the weighted K-nearest neighbor (WKNN) algorithm on the same spectra was then examined. WKNN slightly improves on the classification metrics of KNN, is more robust to noise and more adaptable, and remains a simple algorithm. In summary, the proposed method combining machine learning with fluorescence spectroscopy is accurate, fast, and cost-effective; it represents a future direction for breast cancer diagnosis and has important clinical value.
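The comparison of KNN and WKNN described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption and is not taken from the paper: the spectra are synthetic single Gaussian bands (hypothetical peaks near 520 nm and 560 nm), the weighted variant uses inverse-distance votes (the abstract does not say which weighting the authors used), and k = 5 is arbitrary. SVM and RF are omitted so that the example needs only the Python standard library.

```python
import math
import random


def make_spectrum(peak_nm, rng, n_points=60, noise=0.05):
    """Synthetic stand-in for a fluorescence spectrum: one Gaussian band plus noise."""
    wavelengths = [450 + 5 * i for i in range(n_points)]  # hypothetical 450-745 nm grid
    return [math.exp(-((w - peak_nm) / 40.0) ** 2) + rng.gauss(0, noise)
            for w in wavelengths]


def make_dataset(n_per_class, seed):
    """Return shuffled (spectrum, label) pairs; 0 = "normal", 1 = "cancerous"."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append((make_spectrum(rng.gauss(520, 8), rng), 0))  # assumed peak position
        data.append((make_spectrum(rng.gauss(560, 8), rng), 1))  # assumed peak position
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=5, weighted=False):
    """Vote among the k nearest training spectra (Euclidean distance).

    weighted=True gives each neighbor a vote of 1/distance (assumed WKNN weighting).
    """
    nearest = sorted((math.dist(xt, x), yt) for xt, yt in train)[:k]
    votes = {0: 0.0, 1: 0.0}
    for distance, label in nearest:
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1-score, with class 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


data = make_dataset(100, seed=0)
train, test = data[:150], data[150:]  # simple hold-out split, not the paper's protocol
y_true = [label for _, label in test]

results = {}
for name, weighted in (("KNN", False), ("WKNN", True)):
    y_pred = [knn_predict(train, x, k=5, weighted=weighted) for x, _ in test]
    results[name] = evaluate(y_true, y_pred)
    print(name, results[name])
```

Because the synthetic classes are well separated, the scores printed here say nothing about the figures reported in the abstract. The sketch only shows how the four metrics are computed and where distance weighting enters the vote.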
Authors
陈文静
许诺
教召航
尤家华
王赫
齐东丽
冯瑜
CHEN Wen-jing; XU Nuo; JIAO Zhao-hang; YOU Jia-hua; WANG He; QI Dong-li; FENG Yu (School of Science, Shenyang Ligong University, Shenyang 110158, China)
Source
《光谱学与光谱分析》
SCIE
EI
CAS
CSCD
PKU Core Journal
2023, No. 8, pp. 2407-2412 (6 pages)
Spectroscopy and Spectral Analysis
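Funding
National Natural Science Foundation of China, General Program (51974063)
Natural Science Foundation Project of the Education Department of Liaoning Province (LG201910)
Liaoning Revitalization Talents Program (XLYC1902047)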
Keywords
Fluorescence spectrum
Breast cancer
Machine learning
K-nearest Neighbor (KNN)