Abstract
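Discourse markers, a common discourse phenomenon, have become an important topic in discourse analysis. Because researchers approach them from different angles, there is still considerable disagreement over how discourse markers should be understood and classified. This paper hypothesizes, from the perspective of style, that discourse markers carry certain stylistic features. To describe these features precisely, it introduces the concept of "style degree" (语体度). A quantitative analysis of how the sampled discourse markers are distributed across corpora of different styles confirms that a considerable share of discourse markers have marked stylistic features. Feature vectors were then selected on the basis of this analysis, and the Rocchio classification method was applied in an automatic style classification experiment on open text, reaching an accuracy of 82.9%. The results show that the stylistic features of discourse markers have some reference value for text classification.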
As a common discourse phenomenon, discourse markers have become an important subject in discourse analysis. Because research perspectives vary, substantial differences remain in how discourse markers are understood and classified. From the perspective of style, this paper hypothesizes that discourse markers bear certain stylistic features and proposes the concept of "style degree" to describe them. The distribution of the sampled discourse markers across corpora of different styles shows clear distinctions, and the Rocchio method, using feature vectors based on these markers, classifies open text with an accuracy of 82.9%. It is concluded that the stylistic features of discourse markers are of value in text classification.
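The classification step described in the abstract can be illustrated with a minimal sketch of the centroid-based form of Rocchio classification: each text becomes a vector of discourse-marker frequencies, each style is represented by the mean of its training vectors, and a new text is assigned to the style whose centroid is most similar by cosine similarity. Everything concrete here is an assumption for illustration only: the marker list `MARKERS`, the toy `TRAIN` texts, the style labels, and the plain relative-frequency weighting. The abstract does not give the paper's marker inventory, feature weights, or the definition of "style degree", so this is not the authors' implementation.

```python
import math
import re

# Hypothetical marker inventory; the paper's sampled markers are not
# listed in the abstract.
MARKERS = ["you know", "well", "i mean",
           "in conclusion", "furthermore", "in other words"]


def marker_vector(text):
    """Return the per-word relative frequency of each marker in the text."""
    lowered = text.lower()
    n_words = max(len(lowered.split()), 1)
    return [
        len(re.findall(r"\b" + re.escape(m) + r"\b", lowered)) / n_words
        for m in MARKERS
    ]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_texts):
    """Build one centroid per style label from (text, label) pairs."""
    by_label = {}
    for text, label in labeled_texts:
        by_label.setdefault(label, []).append(marker_vector(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(text, centroids):
    """Assign the label whose centroid is most similar to the text vector."""
    vec = marker_vector(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training data (invented for this sketch, not from the paper's corpus).
TRAIN = [
    ("Well, you know, I mean it was fine, you know.", "spoken"),
    ("Well, I mean, you know how it goes.", "spoken"),
    ("Furthermore, the results are robust. In conclusion, the method works.",
     "written"),
    ("In other words, the effect is small. Furthermore, it varies.",
     "written"),
]

CENTROIDS = train(TRAIN)
```

Calling `classify(some_text, CENTROIDS)` returns one of the training labels. A text containing none of the markers has zero similarity to every centroid, so a practical system would need a fallback rule for that case.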
Source
《中文信息学报》
CSCD
PKU Core (Peking University core journal list)
2009, Issue 4, pp. 34-39 (6 pages)
Journal of Chinese Information Processing
Keywords
computer application (计算机应用)
Chinese information processing (中文信息处理)
discourse marker (话语标记)
stylistic feature (语体特征)
style degree (语体度)
similarity (相似度)
text classification (文本分类)