Abstract
Complex sentences are one of the basic units of natural language, and identifying them and recognizing their semantic relations is important for tasks such as syntactic parsing and discourse understanding. This study uses neural network models to identify complex sentences in natural corpora and to determine the relation type of each, and builds a joint model for complex sentence identification and relation recognition to minimize error propagation. For complex sentence identification, a Bi-LSTM captures contextual semantic information, an attention mechanism captures long-distance collocations within a sentence, and a Convolutional Neural Network (CNN) captures local information. For relation recognition, BERT is used to enhance the semantic representation of sentences, and a Tree-LSTM models syntactic structure and constituent labels. Experiments on the Chinese CAMR corpus show that the attention-based identification model reaches an F1 of 91.7% and the Tree-LSTM-based relation recognition model reaches an F1 of 69.15%. In the joint model, the two tasks reach F1 values of 92.15% and 66.25%, respectively, indicating that joint learning allows the tasks to obtain more features and thereby improves model performance.
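The attention step mentioned in the abstract can be pictured with a small sketch. The abstract does not specify the scoring function, so the code below assumes plain dot-product attention against a single query vector; `attention_pool`, `query`, and the toy `hidden_states` are hypothetical names and values chosen for illustration, standing in for Bi-LSTM outputs rather than reproducing the paper's model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Score each token state against `query` (dot product, an assumption),
    normalize the scores with softmax, and return (weights, pooled_vector)."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, pooled


# Toy per-token states for a 4-token sentence (stand-ins for Bi-LSTM outputs).
hidden_states = [
    [1.0, 0.0],
    [0.0, 1.0],
    [2.0, 0.0],
    [0.0, 0.5],
]
query = [1.0, 0.0]  # hypothetical learned query vector

weights, pooled = attention_pool(hidden_states, query)
```

Because the weights depend only on each token's score and not on its position, a token far from the start of the sentence can still dominate the pooled representation, which is the property that makes attention suitable for long-distance collocations.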
Authors
JIA Xunan (贾旭楠)
WEI Tingxin (魏庭新)
QU Weiguang (曲维光)
GU Yanhui (顾彦慧)
ZHOU Junsheng (周俊生)
Affiliations: School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023, China; International College for Chinese Studies, Nanjing Normal University, Nanjing 210097, China; School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2021, Issue 11, pp. 54-61 (8 pages)
Funding
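National Natural Science Foundation of China, "Research on Key Technologies of Chinese Abstract Meaning Representation" (61772278)
Jiangsu Provincial University Philosophy and Social Science Fund, "Research on the Construction of a Chinese Complex Sentence Corpus for Machine Learning" (2019JSA0220)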
Keywords
complex sentence identification
neural network
complex sentence relation recognition
joint model
semantic modeling