Research on Review Spam Recognition (cited by: 33)
Abstract This paper analyzes review spam from the perspective of information usefulness. Taking camera reviews in the digital products domain as the research object, the authors build a data set and select 11 features covering three aspects: the review, the reviewer, and the reviewed product. Review spam is identified using four commonly used kernel functions of the support vector machine (SVM) model. The parameters C and γ of the RBF kernel, which gave the better identification results, are then optimized, raising the accuracy of review spam identification to 78.16% and the recall to 72.18%. A comparison of four different feature combinations shows that combining review, reviewer, and product features works best. Finally, a comparison with a Logistic Regression model is used to verify that SVM identifies review spam markedly better than other algorithms.
Authors 李霄 (Li Xiao), 丁晟春 (Ding Shengchun)
Source New Technology of Library and Information Service (《现代图书情报技术》), indexed in CSSCI and the PKU Core Journals list, 2013, No. 1, pp. 63-68 (6 pages)
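Funding One of the research outputs of the National Natural Science Foundation of China project "Research on Credibility Analysis of Product Review Information Based on Text Semantic Mining" (No. 71103085) and the Ministry of Education Humanities and Social Sciences Research Planning Fund project "Research on Semantics-Based Extraction of Subjective and Objective Information on E-commerce Products" (No. 09YJA870015).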
Keywords SVM; Review spam; Feature selection; Kernel function; Product review
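The abstract compares four SVM kernel functions and then tunes C and γ for the RBF kernel, but it names only the RBF kernel. As a minimal sketch, the block below assumes the other three are the linear, polynomial, and sigmoid kernels (the set commonly offered by SVM libraries) and assumes a base-2 exponential search grid for (C, γ). The function names, default parameters, and exponent ranges are illustrative choices, not details taken from the paper.

```python
import math
from itertools import product


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, gamma=1.0, coef0=0.0, degree=3):
    """K(x, y) = (gamma * x . y + coef0) ** degree"""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """K(x, y) = tanh(gamma * x . y + coef0)"""
    return math.tanh(gamma * linear_kernel(x, y) + coef0)


def parameter_grid(c_exponents, gamma_exponents):
    """Return every (C, gamma) pair on a base-2 exponential grid.

    Each pair would be scored by cross-validation in a real search;
    the exponent ranges are assumptions, not values from the paper.
    """
    return [(2.0 ** i, 2.0 ** j)
            for i, j in product(c_exponents, gamma_exponents)]


if __name__ == "__main__":
    # Hypothetical toy feature vectors (the paper uses 11 features per review).
    review_a = [0.2, 0.7, 0.1]
    review_b = [0.3, 0.6, 0.0]
    for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
        print(kernel.__name__, kernel(review_a, review_b))
    grid = parameter_grid(range(-5, 16, 2), range(-15, 4, 2))
    print(len(grid), "candidate (C, gamma) pairs")
```

In the RBF kernel, γ controls how quickly similarity decays with distance between two feature vectors, while C (used by the SVM optimizer rather than the kernel) sets the penalty for misclassified training reviews, so the two are usually searched jointly.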