Abstract
Network public opinion data are large in volume, highly dispersed, and unstructured, which makes it difficult for commonly used text classification algorithms to classify them quickly and accurately. To address this problem, a parallel kNN network public opinion classification algorithm based on the Hadoop platform is proposed. It uses Hadoop's distributed storage together with a purpose-designed parallel kNN MapReduce program to overcome the problems that arise when processing large batches of data. The classification capability and classification efficiency of the parallel kNN algorithm were tested and verified. The experimental results show that, when processing large volumes of network public opinion data, the Hadoop-based parallel kNN algorithm classifies the data quickly, efficiently, and accurately.
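The abstract does not describe how the kNN computation is divided between the map and reduce phases. The following is a minimal single-process sketch of one common decomposition, offered as an assumption rather than as the author's design: each map task scans one split of the training data and emits its k nearest local candidates per test document, and the reduce step merges the candidates and takes a majority vote. The function names, the Euclidean distance, and the toy feature vectors and labels are all hypothetical.

```python
import heapq
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_split(train_split, test_docs, k):
    """Map phase (hypothetical): for one training split, emit
    (test_id, (distance, label)) for the k nearest samples in that split."""
    for test_id, test_vec in test_docs.items():
        candidates = [(euclidean(test_vec, vec), label) for vec, label in train_split]
        for pair in heapq.nsmallest(k, candidates):
            yield test_id, pair


def reduce_neighbors(candidates, k):
    """Reduce phase (hypothetical): keep the k globally nearest candidates
    for one test document and return the majority label."""
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def parallel_knn(train_splits, test_docs, k=3):
    """Simulate map, shuffle, and reduce in a single process."""
    grouped = defaultdict(list)  # stands in for the shuffle step: group by test_id
    for split in train_splits:
        for test_id, pair in map_split(split, test_docs, k):
            grouped[test_id].append(pair)
    return {test_id: reduce_neighbors(c, k) for test_id, c in grouped.items()}


# Toy data: two training splits, as if stored in two separate blocks.
train_splits = [
    [((0.0, 0.0), "sports"), ((0.1, 0.2), "sports"), ((5.0, 5.0), "finance")],
    [((5.1, 4.9), "finance"), ((0.2, 0.1), "sports"), ((4.8, 5.2), "finance")],
]
test_docs = {"doc_a": (0.05, 0.1), "doc_b": (5.0, 5.1)}

predictions = parallel_knn(train_splits, test_docs, k=3)
print(predictions)
```

Emitting only the k nearest candidates from each split keeps the data passed to the reduce step small. The global k nearest neighbors are always among the union of the per-split k nearest, so the merged result matches a single-machine kNN.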
Author
杜少波
DU Shaobo (School of Computer & Information Engineering, Guizhou University of Commerce, Guiyang 550014, China)
Source
Video Engineering (《电视技术》)
2018, No. 3, pp. 58-62 (5 pages)
Funding
Youth Science and Technology Talent Growth Project of the Guizhou Provincial Department of Education (黔教合KY字[2016]235, 黔教合KY字[2016]240)
Teaching Content and Curriculum System Reform Project of the Guizhou Provincial Department of Education (SJ-JXGC-KC-002)
Engineering Research Center of Guizhou Provincial Regular Higher Education Institutions (黔教合KY字[2016]016, 黔教合KY字[2017]022)
Support Program for Top Science and Technology Talents in Guizhou Provincial Regular Higher Education Institutions (黔教合KY字[2016]086)