Abstract
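Objective Computer-based detection and classification of diseases in chest X-ray images currently suffer from high misdiagnosis rates and low accuracy. Building on a pre-trained vision Transformer (ViT) model, this paper uses transfer learning to provide computer-aided diagnosis of chest X-ray images and to improve diagnostic accuracy and efficiency. Method We select a ViT model that contains a convolutional neural network (CNN) and was pre-trained on a very large-scale natural image dataset. We fine-tune the model structure, initialize the backbone network with the pre-trained ViT parameters, and then transfer the model to chest X-ray datasets and retrain it for multi-label disease classification. Result On the IU X-Ray dataset, we compared the mean AUC (area under the ROC curve) of the ViT model before and after transfer learning. The pre-trained ViT model reached a mean AUC of 0.774, 0.208 higher than the same model trained without transfer learning. Ablation experiments on the model structure and data preprocessing, together with visualization of the ViT attention mechanism, further verified the effectiveness of the model. Finally, the fine-tuned ViT model was trained on the Chest X-Ray14 and CheXpert datasets and reached mean AUC scores of 0.839 and 0.806, respectively, improvements of 0.014 to 0.031 over the compared methods. Conclusion Compared with other methods, the ViT model achieves higher multi-label classification accuracy on chest X-ray images. Transfer learning also reduces training cost while improving the classification performance and generalization of the ViT model. The ablation experiments and model visualization show that the ViT model with a CNN structure focuses on meaningful regions and efficiently extracts visual features from chest X-ray images.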
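The results above are reported as a mean AUC over disease labels. The sketch below shows one way to compute that number. It assumes that "mean AUC" means the unweighted average of per-label ROC AUCs, because the abstract does not state how the scores are averaged. The function names are hypothetical, and skipping labels that contain only positives or only negatives is also an assumption. Each per-label AUC is computed with the Mann-Whitney pairwise formulation, in which tied scores count as one half.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for a single label via the Mann-Whitney U statistic.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample scores higher; ties contribute 0.5 (hypothetical helper).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Unweighted mean of per-label AUCs for multi-label predictions.

    y_true[i][k] is 1 if image i has disease k; y_score[i][k] is the model's
    score for that disease. Labels with a single class are skipped (assumption).
    """
    n_labels = len(y_true[0])
    aucs = []
    for k in range(n_labels):
        col_true = [row[k] for row in y_true]
        col_score = [row[k] for row in y_score]
        if 0 < sum(col_true) < len(col_true):
            aucs.append(roc_auc(col_true, col_score))
    if not aucs:
        raise ValueError("no label has both positive and negative samples")
    return sum(aucs) / len(aucs)


# Toy example with 4 images and 2 disease labels (illustrative data only).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
print(mean_auc(y_true, y_score))  # label 0: 1.0, label 1: 0.75, mean 0.875
```

The pairwise loop is O(P x N) per label, which is fine for a sketch. For full test sets, scikit-learn's `roc_auc_score` with `average="macro"` computes the same unweighted mean when every label has both classes.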
Objective Chest X-ray screening and diagnosis are essential in modern radiology. However, the interpretation of most chest X-ray images still depends on clinical experience and is prone to misdiagnosis and missed diagnoses. Computer-based techniques that automatically detect and identify one or more potential diseases in an image can therefore improve diagnostic efficiency and accuracy. Detecting and distinguishing multiple lesions in a single image is harder for chest X-rays than for natural images, because abnormal regions occupy only a small part of the image and have complex appearances. Deep learning models based on convolutional neural networks (CNNs) are widely used in medical imaging. The CNN convolution kernel is sensitive to local detail and can extract rich image features. However, a convolution kernel cannot capture global information, and the extracted features contain redundant information such as background, muscle, and bone. This degrades the model's performance on multi-label classification tasks to some extent. The vision Transformer (ViT) has recently achieved strong results in computer vision tasks. It can capture information from multiple regions of the entire image simultaneously and effectively, but it needs large-scale training data to perform well. Patient privacy and the cost of manual annotation keep chest X-ray datasets limited in size. To reduce the model's dependence on data scale and to improve multi-label classification performance, we apply transfer learning to a CNN-based ViT pre-trained model for computer-aided diagnosis and multi-label classification of chest X-ray images. Method The CNN-based ViT model is pre-trained on a very large-scale natural image dataset to obtain the initial parameters of the model. The model
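The abstract contrasts the local receptive field of a CNN kernel with a ViT's ability to gather information from the whole image at once. The illustrative sketch below shows where that global view comes from. It follows the common hybrid-ViT design in which each spatial position of a CNN feature map becomes one token, and it is not the authors' exact architecture. The following are all simplifying assumptions: learned query/key/value projections, multiple heads, position embeddings and the class token are omitted, so Q = K = V = the tokens. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def flatten_feature_map(fmap: List[List[Vector]]) -> Matrix:
    """Turn an H x W x C CNN feature map into H*W tokens of dimension C
    (row-major order), i.e. 1x1 patches as in a hybrid ViT (assumption)."""
    return [pixel for row in fmap for pixel in row]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over one row of attention scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention with Q = K = V = tokens.

    Returns (outputs, weights). weights[i][j] is how much token i draws on
    token j; every row covers all tokens, so one layer sees the whole image.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[c] for w, v in zip(row, tokens)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights


# Toy 2 x 2 feature map with 2 channels (illustrative values only).
fmap = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.0, 0.0]]]
tokens = flatten_feature_map(fmap)
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Every attention weight is strictly positive, so each output token mixes information from all spatial positions in a single layer. A 3 x 3 convolution, by contrast, only combines a position with its immediate neighbors. In a trained model the learned projections make these weights selective, and the attention visualizations mentioned in the abstract display this kind of weight matrix.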
Authors
邢素霞
鞠子涵
刘子骄
王瑜
范福强
Xing Suxia; Ju Zihan; Liu Zijiao; Wang Yu; Fan Fuqiang (Beijing Technology and Business University, Beijing 100048, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journal
2023, Issue 4, pp. 1186-1197 (12 pages)
Journal of Image and Graphics
Funding
National Natural Science Foundation of China (61671028)
Beijing Natural Science Foundation (KZ202110011015)