Abstract
Comparative sentences are a common way of expressing relations between objects, and they are valuable for text mining, especially sentiment analysis (opinion mining). Research on Chinese comparative sentences, which covers both identifying comparative sentences and extracting comparative relations, is still a new topic. For identification, this paper builds on previous work by using an SVM classifier with keywords and class sequential rules (CSR) as features. It additionally uses a CRF algorithm to extract entity objects and adds the entity information as a feature. This significantly improves the precision, recall and F-measure of comparative sentence identification, which reach maxima of 96.55%, 88.63% and 92.43% respectively. For comparative relation extraction, the comparative subject and the comparative object (the standard of comparison) are extracted by applying a set of defined rules to the entities found by the CRF algorithm. This also gives good results, with extraction of the comparative subject performing better than extraction of the comparative object.
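The three reported figures can be related by a short calculation. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall). The abstract does not state which weighting was used, and the quoted maxima may come from different experimental settings, so an exact match is not guaranteed. The function and variable names are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall; the abstract
    does not say which variant of the F-measure was used.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Best figures quoted in the abstract, expressed as fractions.
reported_precision = 0.9655
reported_recall = 0.8863
reported_f = 0.9243

combined = f_measure(reported_precision, reported_recall)
difference = abs(combined - reported_f)

print(f"F1 computed from reported P and R: {combined:.4f}")
print(f"Absolute difference from reported F-measure: {difference:.4f}")
```

Under this assumption the computed value agrees with the reported F-measure to within rounding, which suggests the three figures are mutually consistent.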
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2010, No. 6, pp. 2061-2064 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 60773087)
Keywords
comparative sentence
comparative relation
CRF model
comparative subject
comparative object