Multi-label classification of chest X-ray images with pre-trained vision Transformer model
Cited by: 3
Abstract
Objective: Computer-based detection and classification of diseases in chest X-ray images currently suffers from a high misdiagnosis rate and low accuracy. Building on a pre-trained vision Transformer (ViT) model, this paper uses transfer learning to support computer-aided diagnosis from chest X-ray images and to improve diagnostic accuracy and efficiency.
Method: A ViT model that incorporates a convolutional neural network (CNN), pre-trained on a very large-scale natural image dataset, is selected. The model structure is fine-tuned, the backbone network is initialized with the pre-trained ViT parameters, and the model is then trained again on chest X-ray image datasets to perform multi-label disease classification.
Result: On the IU X-Ray dataset, the average AUC (area under ROC curve) of the ViT model was compared before and after transfer learning. The pre-trained ViT model reached an average AUC of 0.774, which is 0.208 higher than without transfer learning. Ablation experiments on the model structure and data preprocessing, together with visualization of the attention mechanism in ViT, further verified the effectiveness of the model. Finally, the fine-tuned ViT model was trained on the Chest X-Ray14 and CheXpert datasets, reaching average AUC scores of 0.839 and 0.806, improvements of 0.014 to 0.031 over the comparison methods.
Conclusion: Compared with other methods, the ViT model classifies chest X-ray images with higher multi-label accuracy, and transfer learning improves the classification performance and generalization of the ViT model while reducing training cost. The ablation experiments and model visualization show that a ViT model containing a CNN structure focuses on meaningful regions and efficiently extracts the visual features of chest X-ray images.

Extended abstract
Objective: Chest X-ray screening and diagnosis are essential in radiology today. Interpretation of most chest X-ray images still depends on clinical experience and is prone to misdiagnosis and missed diagnoses. Computer-based techniques that automatically detect and identify one or more potential diseases in an image can improve diagnostic efficiency and accuracy. Compared with natural images, multiple lesions in a single chest X-ray image are difficult to detect and distinguish accurately, because abnormal areas occupy a small proportion of the image and have complex representations. Convolutional neural network (CNN) based deep learning models are widely used in medical imaging. The CNN convolution kernel is sensitive to local detail and can extract rich image features. However, the convolution kernel cannot capture global information, and the extracted features contain redundant information related to background, muscles, and bones, which limits the model's performance in multi-label classification tasks to a certain extent. The vision Transformer (ViT) model has recently shown clear advantages in computer vision tasks. ViT can capture information from multiple regions of the entire image simultaneously and effectively. However, it requires training on large-scale datasets to perform well. Because of factors such as patient privacy and manual annotation costs, the size of chest X-ray image datasets is limited. To reduce the model's dependence on data scale and improve multi-label classification performance, we develop a CNN-based ViT pre-trained model that uses transfer learning for computer-aided diagnosis and multi-label classification of chest X-ray images.
Method: The CNN-based ViT model is pre-trained on a very large-scale natural image dataset, which provides the initial parameters of the model. The model
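The average AUC reported in the abstract can be illustrated with a small example. The following is a minimal sketch, not the authors' evaluation code: it assumes the average AUC is the unweighted mean of per-label AUC values, computes each AUC with the pairwise rank formulation, and uses hypothetical function names (auc_score, mean_auc) and made-up toy data.

```python
def auc_score(labels, scores):
    """AUC for a single label: the fraction of (positive, negative) pairs
    in which the positive sample receives the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative samples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(label_matrix, score_matrix):
    """Unweighted mean of the per-label AUC over all label columns.

    Assumption: each row is one image and each column is one disease label.
    """
    num_labels = len(label_matrix[0])
    per_label = []
    for j in range(num_labels):
        col_labels = [row[j] for row in label_matrix]
        col_scores = [row[j] for row in score_matrix]
        per_label.append(auc_score(col_labels, col_scores))
    return sum(per_label) / num_labels, per_label


# Hypothetical toy data: 4 images, 2 disease labels (not from the paper).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_prob = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.4, 0.5]]
average, per_label = mean_auc(y_true, y_prob)
print(per_label, average)
```

In a multi-label setting an image can carry several positive labels at once, so each label is scored as its own binary problem before averaging; whether the paper weights the labels differently is not stated in the abstract.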
Authors: Xing Suxia (邢素霞); Ju Zihan (鞠子涵); Liu Zijiao (刘子骄); Wang Yu (王瑜); Fan Fuqiang (范福强) (Beijing Technology and Business University, Beijing 100048, China)
Affiliation: Beijing Technology and Business University
Source: Journal of Image and Graphics (《中国图象图形学报》), CSCD, Peking University Core Journal, 2023, No. 4, pp. 1186-1197 (12 pages)
Funding: National Natural Science Foundation of China (61671028); Beijing Natural Science Foundation (KZ202110011015).
Keywords: chest X-ray images; multi-label classification; convolutional neural network (CNN); vision Transformer (ViT); transfer learning