Abstract
A Bootstrapping-based method is proposed for extracting product aspect (feature) terms from English product reviews. Starting from a small set of seed extraction patterns, the method discovers new feature terms through an incremental, iterative process. In each iteration, statistical techniques are combined with sentiment-word analysis based on a sentiment lexicon, and the affinity between each candidate feature term and the patterns yields a probability score for that candidate. Candidates are then ranked and filtered by this score, which helps prevent topic drift. After extraction, WordNet is used to compute the similarity between feature terms, and the terms are clustered by similarity score into clusters that correspond to different aspects of the product. Low-scoring clusters are discarded to further remove noise. In addition, seed patterns are used in place of seed feature terms to improve the portability of the system. Experimental results show that the method achieves a precision of 0.799, a recall of 0.779, and an F-measure of 0.789, indicating good extraction performance.
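The iterative procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made only for illustration: the function name `bootstrap`, the regex form of the patterns, the scoring rule (the fraction of current patterns that extract a candidate, standing in for the affinity score), and the way new patterns are induced from the words on either side of an accepted feature. The sentiment-lexicon analysis and the WordNet clustering step are omitted.

```python
import re
from collections import defaultdict


def bootstrap(sentences, seed_patterns, rounds=5, top_k=5):
    """Grow a set of feature terms from seed patterns.

    Each pattern is a regex with exactly one capture group for the feature.
    The scoring rule below is an illustrative assumption, not the paper's.
    """
    sentences = [s.lower() for s in sentences]
    patterns = set(seed_patterns)
    features = set()
    for _ in range(rounds):
        # Step 1: apply every pattern and record which patterns
        # extract each candidate term.
        support = defaultdict(set)
        for p in patterns:
            for s in sentences:
                for m in re.finditer(p, s):
                    support[m.group(1)].add(p)
        # Step 2: score unseen candidates by the fraction of patterns
        # that extract them, rank them, and keep the top_k.
        scored = sorted(
            ((len(ps) / len(patterns), w)
             for w, ps in support.items() if w not in features),
            reverse=True,
        )
        new = [w for _, w in scored[:top_k]]
        if not new:
            break  # nothing new was found, so stop iterating
        features.update(new)
        # Step 3: induce new patterns from the left and right neighbours
        # of each newly accepted feature.
        for w in new:
            for s in sentences:
                tokens = s.split()
                for i in range(1, len(tokens) - 1):
                    if tokens[i] == w:
                        patterns.add(
                            r"\b" + re.escape(tokens[i - 1])
                            + r" (\w+) "
                            + re.escape(tokens[i + 1]) + r"\b"
                        )
    return features


reviews = [
    "The screen is great",
    "The battery is weak",
    "I love the screen of this phone",
    "I love the lens of this phone",
]
found = bootstrap(reviews, [r"\bthe (\w+) is\b"])
print(sorted(found))
```

In this toy run the single seed pattern finds "screen" and "battery" in the first round. The context "the ... of" around "screen" then becomes a new pattern, which picks up "lens" in the second round. A low `top_k` or a score threshold would play the filtering role that the abstract attributes to the ranking step.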
Source
Journal of Shandong University (Natural Science)
CAS
CSCD
PKU Core
2014, Issue 12, pp. 23-29 (7 pages)
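Funding
Programme of Introducing Talents of Discipline to Universities (111 Project) (B08004)
Science and Technology Major Project (2011ZX03002-005-01)
National Natural Science Foundation of China (61273217)
Doctoral Program Fund (20130005110004)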