摘要
从信息有用性的角度对垃圾商品评论信息进行分析,选择数码领域的相机评论作为研究对象,构建数据集,从评论、评论者和被评论的商品三个方面选择11个特征,使用支持向量机模型中4种常用的核函数进行垃圾商品评论的识别,对识别效果较好的RBF核函数中的参数C和γ进行优化,使得商品评论中的垃圾评论识别的准确率提高到78.16%,召回率提高到72.18%,并选取4种不同特征组合进行对比,证明评论、评论者和被评论的商品三大特征组合的效果最好,最后通过与Logistic回归模型的对比,验证SVM对垃圾评论的识别效果明显优于其他算法。
This paper analyses review spam from the perspective of the usefulness of information, selects digital camera reviews as the research object and builds the data set, then from the three aspects of review, reviewer and product chooses 11 features, uses 4 different kernel functions in SVM model to identify review spam of products, optimizes the parameters C and γ of RBF that has a better identification, which improves accuracy rate of the identification effect of review spain to 78.16% and recall rate to 72.18%. By comparing the selected 4 different combinations of features, the authors find the combination of review, reviewer and product is the best. Finally, it proves that SVM is significantly better than other algo- rithms compared to the Logistic Regression.
出处
《现代图书情报技术》
CSSCI
北大核心
2013年第1期63-68,共6页
New Technology of Library and Information Service
基金
国家自然科学基金项目"基于文本语义挖掘的商品评论信息可信度分析研究"(项目编号:71103085)
教育部人文社会科学研究规划基金项目"基于语义的电子商务产品主/客观信息提取研究"(项目编号:09YJA870015)的研究成果之一