Detection of Fake Reviews Based on Semi-Supervised Active Learning (cited by: 2)
Abstract: Supervised methods for fake review detection are limited by the size of the annotated corpus. To make better use of unlabeled review data and improve classifier accuracy and generalization, this paper proposes a fake review detection method based on semi-supervised active learning. First, review content features and reviewer behavior features are defined and extracted, and the two feature types are combined to detect fake reviews. Next, an entropy-based active learning algorithm selects the review samples most helpful for learning and obtains their class labels; these labeled samples are merged into the training set of a Tri-training-based semi-supervised learning algorithm, which learns from a large amount of unlabeled review data to improve classifier performance. Finally, experiments on domain review datasets show that combining semi-supervised learning with active learning makes more effective use of unlabeled review data and thereby improves fake review detection.
Source: Journal of Kunming University of Science and Technology (Natural Science) (《昆明理工大学学报(自然科学版)》), CAS, 2015, No. 5, pp. 59-65 (7 pages)
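Funding: National Natural Science Foundation of China (61175068, 61462055); Key Project of the Natural Science Foundation of Yunnan Province (2013FA030); Open Fund of the Yunnan Key Laboratory of Software Engineering (2011SE14); Ministry of Education fund for returned overseas scholars; major special project of the Yunnan Provincial Department of Education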
Keywords: fake review; semi-supervised learning; active learning; Tri-training
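
The abstract names two mechanisms without giving their details: entropy-based sample selection and Tri-training. The sketch below is one plausible reading, not the paper's implementation. It assumes a binary classifier that outputs class probabilities, ranks unlabeled reviews by Shannon entropy, and applies the basic Tri-training agreement rule (a sample is pseudo-labeled for one classifier when the other two agree). The function names, the label strings, and the probability values are all hypothetical, and the error-rate conditions of full Tri-training are omitted.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a class-probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def select_most_uncertain(unlabeled_probs, k):
    """Indices of the k unlabeled samples with the highest predictive entropy.

    These are the samples an annotator would be asked to label (assumed setup).
    """
    ranked = sorted(
        range(len(unlabeled_probs)),
        key=lambda i: entropy(unlabeled_probs[i]),
        reverse=True,
    )
    return ranked[:k]

def agreed_pseudo_labels(preds_a, preds_b):
    """Basic Tri-training rule: keep (index, label) where two classifiers agree.

    The agreed samples would be added to the third classifier's training set.
    """
    return [(i, a) for i, (a, b) in enumerate(zip(preds_a, preds_b)) if a == b]

# Hypothetical [P(fake), P(genuine)] outputs for four unlabeled reviews.
probs = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.50, 0.50]]
print(select_most_uncertain(probs, k=2))  # → [3, 1]

# Hypothetical predictions from two of the three classifiers.
print(agreed_pseudo_labels(["fake", "genuine", "fake"], ["fake", "fake", "fake"]))
# → [(0, 'fake'), (2, 'fake')]
```

Reviews 3 and 1 are selected because their predicted probabilities are closest to uniform, which is where a human label is expected to be most informative.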