摘要
在给定目标词及其所属框架的条件下,汉语框架语义角色标注可以分为语义角色识别和角色分类两个步骤。该文将此任务通过IOB2标记策略形式化为词序列标注问题,以词为基本标注单元,采用条件随机场模型进行自动标注实验。先对语料使用清华大学的基本块自动分析器进行分析,提取出15个块层面的新特征,并将这些特征标记形式化到词序列上。以文献[20]已有的12个词层面特征以及15个块层面特征共同构成候选特征集,采用正交表方法来选择模型的最优特征模板。在与文献[20]相同的语料上,相同的3组2折交叉验证实验下,语义角色标注的总性能的F1-值比文献[20]的F1-值提高了近1%,且在显著水平0.05的t-检验下显著。实验结果表明:(1)基于词序列模型,新加入的15个块层面特征可以显著提高标注模型的性能,但这类特征主要对角色分类有显著作用,对角色识别作用不显著;(2)基于词序列的标注模型显著好于以基本块为标注单元以及以句法成分为标注单元的标注模型。
Given a predicate word and its frame, semantic role labeling of Chinese FrameNet can be divided into two steps: the boundary identification of semantic roles and the classification of semantic roles. In this paper, these tasks are formalized onto the word sequential labeling problem through lOB2 strategy. We apply conditional random field model to automatic labeling experiment with word as the basic tagging unit. We extract 15 new base chunk fea- tures by applying the base chunk parser of Tsinghua University to automatic parsing on sentences, and the features are formalized onto the word sequence. Experiments show that the Fl-value of the total performance of semantic roles labeling increases by nearly 1% in comparison with the baseline, which is significant under 0.05 significance level of the t-test.
出处
《中文信息学报》
CSCD
北大核心
2014年第3期36-47,共12页
Journal of Chinese Information Processing
基金
国家自然科学基金(60873128)