Abstract
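Chinese functional chunks describe the basic skeleton of a sentence and form an important bridge between syntactic structure and semantic description. This paper proposes two functional chunk parsing models, a boundary detection model and a sequence labeling model, and implements them with different machine learning methods. Merging the outputs of the two models exploits their complementarity and brings automatic recognition performance on the four typical functional chunks of Chinese sentences (subject, predicate, object and adverbial) above 80%. The experimental results show that machine learning algorithms based on local lexical context can accurately identify most functional chunks from different perspectives, and that complex clauses and serial verb constructions are the main recognition difficulties.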
Chinese functional chunks are defined as a series of non-overlapping, non-nested skeleton segments of a sentence, representing the implicit grammatical relations between the sentence-level predicates and their arguments. In this paper, we propose two statistical models for parsing the four main functional chunks in a sentence. In the chunk boundary detection model, we focus on building SVM-based sub-models for detecting SP (subject-predicate) and PO (predicate-object) boundaries. In the sequence labeling model, we formulate the chunking task as a sequence labeling problem and base our model on the CRF algorithm. By introducing some revision rules, we build a combined parsing model that integrates the advantages of both statistical models and achieves best F-scores of 82.93%, 86.58%, 78.46% and 86.64% for subject, predicate, object and adverbial functional chunks, respectively. Experimental results show that complex clauses and serial verb structures are the main recognition difficulties.
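The sequence labeling formulation mentioned above can be illustrated with a minimal sketch. It assumes a BIO-style tag scheme and the chunk labels SUBJ, PRED, OBJ and ADV; the abstract does not state which tag set the paper uses, and the function names and the toy sentence are hypothetical.

```python
def chunks_to_bio(n_tokens, chunks):
    """Turn non-overlapping (start, end, label) chunks into per-token BIO tags.

    `end` is exclusive. Tokens not covered by any chunk are tagged "O".
    """
    tags = ["O"] * n_tokens
    for start, end, label in sorted(chunks):
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"chunk out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError("functional chunks must not overlap")
        tags[start] = "B-" + label          # first token of the chunk
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # remaining tokens of the chunk
    return tags


def bio_to_chunks(tags):
    """Recover (start, end, label) chunks from a well-formed BIO tag sequence."""
    chunks = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last chunk
        if start is not None and not tag.startswith("I-"):
            chunks.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return chunks


# Hypothetical toy sentence: "we / yesterday / finished / (aspect) / experiment"
tokens = ["我们", "昨天", "完成", "了", "实验"]
gold = [(0, 1, "SUBJ"), (1, 2, "ADV"), (2, 4, "PRED"), (4, 5, "OBJ")]
tags = chunks_to_bio(len(tokens), gold)
print(list(zip(tokens, tags)))
```

Because functional chunks are non-overlapping and non-nested, one tag per token is enough to encode them, which is what lets a sequence model such as a CRF be applied to the task.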
Source
《中文信息学报》
CSCD
Peking University Core Journals
2007, No. 5, pp. 18-24 (7 pages)
Journal of Chinese Information Processing
Funding
Supported by the National Natural Science Foundation of China (60573185, 60520130299)
Keywords
computer application
Chinese information processing
Chinese functional chunk
boundary recognition model
sequence labeling model
model merging