Abstract
Long-text feature extraction is a research hotspot in semantic understanding and key information extraction, and extracting effective information from long texts and computing the similarity between them has long been one of the main research directions in natural language processing. To address this, this paper proposes a long-text summary similarity model based on a hierarchical gated neural network. The model has two parts: (1) BiLSTM-based summary generation, in which a multi-head attention mechanism is added to the BiLSTM model so that it can extract deeper features; (2) summary-based text similarity computation, in which the traditional similarity classification model is recast as a regression model, and a multi-layer BiLSTM extracts features from the generated summary, with adaptive factors added as gates to control how much information each BiLSTM layer outputs. Experimental results show that the algorithm can extract features from long texts and, based on the extracted features, compare similarity using cosine distance.
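The gating and cosine comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the form of the adaptive factor, so the sketch assumes each layer's feature vector is scaled by the sigmoid of a scalar gate value and the scaled vectors are summed. The names `gated_sum`, `gate_logits`, and `cosine_similarity` are hypothetical, and the per-layer vectors stand in for BiLSTM outputs that a real model would learn.

```python
import math


def sigmoid(x):
    """Logistic function, used here as an assumed form of the adaptive gate."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_sum(layer_outputs, gate_logits):
    """Combine per-layer feature vectors, scaling each by sigmoid(gate logit).

    layer_outputs: list of equal-length vectors, one per (hypothetical) BiLSTM layer.
    gate_logits: one scalar per layer; an assumption, not the paper's formula.
    """
    if len(layer_outputs) != len(gate_logits):
        raise ValueError("need exactly one gate value per layer")
    combined = [0.0] * len(layer_outputs[0])
    for vec, logit in zip(layer_outputs, gate_logits):
        gate = sigmoid(logit)  # in (0, 1): how much of this layer passes through
        for i, value in enumerate(vec):
            combined[i] += gate * value
    return combined


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy per-layer features for two texts (made-up numbers for illustration only).
text_a = gated_sum([[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]], [0.0, 1.0])
text_b = gated_sum([[0.9, 0.1, 1.8], [0.4, 1.2, 0.1]], [0.0, 1.0])
score = cosine_similarity(text_a, text_b)
```

Because the output is a continuous score rather than a class label, it fits the regression framing mentioned in the abstract; a trained model would learn the gate values instead of fixing them by hand.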
Source
《工业控制计算机》
2024, No. 6, pp. 58-60, 62 (4 pages)
Industrial Control Computer