Abstract
Automatic extraction of product attributes is an important research topic in sentiment analysis. This paper proposes a product attribute extraction method based on feature selection combined with term-frequency and pointwise mutual information (PMI) pruning. First, l1-norm regularization (Lasso), which is commonly used in classification tasks, is introduced to recast attribute extraction as a feature selection problem in classification: because Lasso yields a sparse model, the small number of features retained in the model serve as the candidate set of product attributes. Second, the candidates are ranked by their frequency in the text and pruned. Finally, after further merging and PMI pruning, the final set of product attributes is obtained. Experiments on Chinese product reviews confirm the effectiveness of the proposed method.
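The two pruning stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Lasso stage has already produced a candidate list, counts frequency at the sentence level, and computes PMI between each candidate and a seed word such as the product name. The abstract does not say what the PMI is measured against, so the seed word, the function name `prune_candidates`, the thresholds `min_freq` and `min_pmi`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def prune_candidates(sentences, candidates, seed, min_freq=2, min_pmi=0.0):
    """Rank candidates by frequency, prune rare ones, then prune by PMI.

    sentences  -- list of token lists (assumed already word-segmented)
    candidates -- candidate attributes (assumed output of the Lasso stage)
    seed       -- hypothetical reference word for PMI, e.g. the product name
    """
    n = len(sentences)
    sent_sets = [set(s) for s in sentences]
    # Number of sentences in which each token occurs.
    df = Counter(tok for s in sent_sets for tok in s)

    # Stage 1: keep candidates seen in at least min_freq sentences,
    # ordered from most to least frequent.
    ranked = sorted((c for c in candidates if df[c] >= min_freq),
                    key=lambda c: -df[c])

    # Stage 2: keep candidates whose PMI with the seed exceeds min_pmi.
    kept = []
    for c in ranked:
        joint = sum(1 for s in sent_sets if c in s and seed in s)
        if joint == 0:
            continue  # never co-occurs with the seed: PMI is -infinity
        pmi = math.log2((joint / n) / ((df[c] / n) * (df[seed] / n)))
        if pmi > min_pmi:
            kept.append((c, df[c], pmi))
    return kept


# Toy, made-up review sentences (English tokens stand in for segmented Chinese).
sentences = [
    ["phone", "screen", "clear"],
    ["phone", "battery", "weak"],
    ["phone", "screen", "bright"],
    ["delivery", "slow"],
    ["delivery", "late"],
    ["battery", "phone", "good"],
    ["price", "ok"],
]
candidates = ["screen", "battery", "delivery", "price"]
kept = prune_candidates(sentences, candidates, seed="phone")
for word, freq, pmi in kept:
    print(f"{word}\tfreq={freq}\tPMI={pmi:.3f}")
```

In this toy run "price" is removed by the frequency threshold and "delivery" by the PMI step, because it never co-occurs with the seed word. In practice both thresholds would need tuning on held-out reviews.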
Source
《模式识别与人工智能》
EI
CSCD
Peking University Core Journals
2015, Issue 2, pp. 187-192 (6 pages)
Pattern Recognition and Artificial Intelligence
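Funding
Supported by:
National Natural Science Foundation of China (No. 61003112, 61170181)
National Social Science Foundation of China, Key Project (No. 11AZD121)
Natural Science Foundation of Jiangsu Province (No. BK2011192)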
Keywords
Sentiment Analysis
Product Attribute Extraction
l1-norm Regularization
Pointwise Mutual Information Pruning