Abstract
Pre-trained language models perform well in machine reading comprehension, but compared with English machine reading comprehension, reading comprehension models based on pre-trained language models perform worse when processing Chinese text and learn only shallow semantic matching information. To improve the model's ability to understand Chinese text, a reading comprehension model based on a hybrid attention mechanism is proposed. In the encoding layer, the model uses a pre-trained model to obtain sequence representations and further deepens context interaction through BiLSTM processing. The sequences are then processed by a hybrid attention layer composed of two self-attention variants, which learns deep semantic representations to deepen the understanding of textual semantic information, while the fusion layer combines multiple fusion mechanisms to obtain multi-level representations so that the output sequences carry richer information. Finally, a two-layer BiLSTM processes the sequences and the output layer yields the answer position. Experimental results on the CMRC2018 dataset show that, compared with the reproduced baseline model, the proposed model improves the EM and F1 scores by 2.05 and 0.465 percentage points, respectively, indicating that it can learn deep semantic information of the text and effectively improve on the pre-trained language model.
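To make the layer-by-layer description above concrete, the following is a minimal PyTorch sketch of such a pipeline. It is illustrative throughout, not the authors' implementation: the Hugging Face-style encoder interface, the two nn.MultiheadAttention modules standing in for the two self-attention variants, the single linear layer standing in for the multiple fusion mechanisms, and all dimension choices are assumptions.

```python
import torch
import torch.nn as nn

class HybridAttentionMRC(nn.Module):
    """Sketch of the described pipeline: pre-trained encoder -> BiLSTM ->
    hybrid attention (two self-attention variants) -> fusion -> two-layer
    BiLSTM -> answer-span prediction. Modules and sizes are assumptions."""

    def __init__(self, encoder, hidden=768, heads=8):
        super().__init__()
        self.encoder = encoder  # assumed Hugging Face-style, e.g. BertModel
        # BiLSTM after the encoder to deepen context interaction.
        self.context_lstm = nn.LSTM(hidden, hidden // 2,
                                    batch_first=True, bidirectional=True)
        # Two self-attention modules standing in for the two variants.
        self.attn_a = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # Fusion of [sequence; variant A; variant B] into one representation.
        self.fuse = nn.Linear(3 * hidden, hidden)
        # Two-layer BiLSTM before the output layer.
        self.model_lstm = nn.LSTM(hidden, hidden // 2, num_layers=2,
                                  batch_first=True, bidirectional=True)
        self.span = nn.Linear(hidden, 2)  # start/end logits per token

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids,
                         attention_mask=attention_mask).last_hidden_state
        h, _ = self.context_lstm(h)
        pad = attention_mask == 0  # True where tokens are padding
        a, _ = self.attn_a(h, h, h, key_padding_mask=pad)
        b, _ = self.attn_b(h, h, h, key_padding_mask=pad)
        h = self.fuse(torch.cat([h, a, b], dim=-1))
        h, _ = self.model_lstm(h)
        start_logits, end_logits = self.span(h).split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)
```

Training such a span-extraction model would, as is standard, minimize cross-entropy between the start/end logits and the gold answer positions.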
Authors
刘高军 (LIU Gaojun); 李亚欣 (LI Yaxin); 段建勇 (DUAN Jianyong)
School of Information, North China University of Technology, Beijing 100144, China; CNONIX National Standard Application and Promotion Laboratory, North China University of Technology, Beijing 100144, China
Source
《计算机工程》 (Computer Engineering)
CAS
CSCD
北大核心 (Peking University Core Journals)
2022, No. 10, pp. 67-72, 80 (7 pages)
Funding
National Natural Science Foundation of China (61972003, 61672040).