Abstract
The authors use a self-built Chinese Discourse Treebank, in which 80% of the relations are implicit, to recognize implicit discourse relations. The corpus organizes discourse relations into three levels. The first level contains four classes: causality, coordination, transition and explanation. On this corpus, a maximum entropy classifier with contextual, lexical and dependency-tree features is used to identify the four first-level classes. Experimental results show an overall accuracy of 62.15%. Coordination is recognized best, with an F1 score of 75.26%.
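The abstract names the classifier (maximum entropy) and the three feature families, but it does not give implementation details. The following is an illustrative sketch under stated assumptions, not the authors' code. A maximum entropy classifier is equivalent to multinomial logistic regression. This sketch trains it with plain stochastic gradient ascent on the log-likelihood, using a small L2 penalty. The feature templates in `extract_features`, the argument format (`tokens`, `root`), the toy examples and all hyperparameters are hypothetical. `evaluate` computes the two kinds of figures the abstract reports: overall accuracy and per-class F1.

```python
import math
from collections import defaultdict

LABELS = ["causality", "coordination", "transition", "explanation"]


def extract_features(arg1, arg2, prev_label=None):
    """Hypothetical feature templates for one (arg1, arg2) pair.

    arg1/arg2: dicts with 'tokens' (list of words) and 'root'
    (head word of the dependency tree). prev_label is the relation
    of the preceding pair, used as a context feature.
    """
    feats = [f"ctx_prev={prev_label}"]  # context feature
    # lexical features: cross-argument word pairs
    feats += [f"pair={a}|{b}" for a in arg1["tokens"] for b in arg2["tokens"]]
    # dependency features: root words of both arguments
    feats += [f"root1={arg1['root']}", f"root2={arg2['root']}"]
    return feats


class MaxEnt:
    """Maximum entropy (multinomial logistic regression) over binary features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # weight for each (feature, label) pair

    def prob(self, feats):
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def fit(self, data, epochs=50, lr=0.5, l2=0.01):
        # Stochastic gradient ascent on the L2-penalized log-likelihood.
        for _ in range(epochs):
            for feats, gold in data:
                p = self.prob(feats)
                for i, y in enumerate(self.labels):
                    # observed minus expected count of label y
                    g = (1.0 if y == gold else 0.0) - p[i]
                    for f in feats:
                        key = (f, y)
                        self.w[key] += lr * (g - l2 * self.w[key])
        return self

    def predict(self, feats):
        p = self.prob(feats)
        return self.labels[max(range(len(p)), key=p.__getitem__)]


def evaluate(gold, pred, labels):
    """Overall accuracy and per-class F1."""
    pairs = list(zip(gold, pred))
    acc = sum(g == p for g, p in pairs) / len(pairs)
    f1 = {}
    for y in labels:
        tp = sum(g == y and p == y for g, p in pairs)
        fp = sum(g != y and p == y for g, p in pairs)
        fn = sum(g == y and p != y for g, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1[y] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, f1


# Usage on made-up toy pairs (not data from the paper).
train = [
    (extract_features({"tokens": ["rain"], "root": "rain"},
                      {"tokens": ["cancel"], "root": "cancel"}), "causality"),
    (extract_features({"tokens": ["sing"], "root": "sing"},
                      {"tokens": ["dance"], "root": "dance"}), "coordination"),
]
model = MaxEnt(LABELS).fit(train)
print(model.predict(train[0][0]))
```

In a real setting, the sparse features would come from segmented and dependency-parsed clause pairs. The hand-written SGD loop would typically be replaced by an off-the-shelf logistic regression or MaxEnt toolkit.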
Source
Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition)
2014, No. 1, pp. 111-117 (7 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
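National High Technology Research and Development Program of China (863 Program) (2012AA011102)
National Natural Science Foundation of China (61273320)
Youth Fund for Humanities and Social Sciences Research of the Ministry of Education of China (13YJC740022)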
Keywords
discourse parsing
discourse relation
implicit relation recognition
Chinese Discourse Treebank