Abstract
Traditional methods for classifying medical text ignore context: they treat each word as independent and therefore cannot represent semantic information, which leads to poor text representation and classification performance. Their feature engineering also requires manual intervention, so they generalize poorly. To address the low efficiency and low accuracy of medical text classification, this paper proposes CMNN, a medical text classification model based on Bidirectional Encoder Representations from Transformers (BERT), a convolutional neural network (CNN), and a bidirectional long short-term memory (BiLSTM) network. The model uses BERT to train word vectors and combines CNN and BiLSTM to capture local latent features and contextual information. Finally, CMNN is compared with the conventional deep learning models TextCNN and TextRNN in terms of accuracy, precision, recall, and F1 score. The experimental results show that CMNN outperforms the other models overall on all evaluation metrics, improving accuracy by 1.69% to 5.91%.
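The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say how precision, recall, and F1 are averaged across classes, so macro averaging is an assumption here; the function name `macro_metrics` and the department labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the abstract does not specify it.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    def safe_div(num, den):
        # Define the ratio as 0.0 when a class has no predictions or no samples.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels, for illustration only.
y_true = ["cardiology", "cardiology", "neurology", "neurology", "oncology", "oncology"]
y_pred = ["cardiology", "neurology", "neurology", "neurology", "oncology", "cardiology"]
report = macro_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in report.items()})
```

If the paper uses micro or weighted averaging instead, only the final averaging step would change; the per-class counts stay the same.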
Authors
许浪
李代伟
张海清
唐聃
何磊
于曦
XU Lang; LI Dai-wei; ZHANG Hai-qing; TANG Dan; HE Lei; YU Xi (School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225; Sichuan Province Engineering Technology Research Center of Support Software of Informatization Application, Chengdu 610225; Stirling College, Chengdu University, Chengdu 610106, China)
Source
《计算机工程与科学》
CSCD
Peking University Core Journals (北大核心)
2023, Issue 6, pp. 1116-1122 (7 pages)
Computer Engineering & Science
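Funding
European Union project (598649-EPP-1-2018-1-FR-EPPKA2-CBHE-JP)
National Natural Science Foundation of China (61602604)
Sichuan Provincial Department of Science and Technology projects (2021YFH0107, 2022YFS0544, 2022NSFSC0571).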