Abstract
Based on an analysis of the textual features of Chinese topic-oriented microblogs and the distribution of personal pronouns in them, this paper argues that such microblogs are a special kind of multi-party conversational text. Their personal pronouns have few word forms, are used in very unequal proportions across pronoun categories, are often used irregularly, and frequently carry generic or exophoric reference. In light of these features, the paper proposes several strategies: filtering out spurious instances ("fake removal"), extracting special named entities, building a topic-centered pragmatic sheet, and extracting linguistic information at multiple levels, together with corresponding resolution methods. Experimental results show that the approach achieves fairly good performance.
Source
《海南大学学报(人文社会科学版)》
CSSCI
2014, No. 2, pp. 119-126 (8 pages)
Journal of Hainan University (Humanities & Social Sciences)
Keywords
topic-oriented microblog
personal pronoun
classification of referential function
resolution