The sharp increase in the volume of Chinese text data on the Internet has significantly prolonged the time needed to classify it. To address this problem, this paper proposes and implements a parallel naive Bayes algorithm (PNBA) for Chinese text classification based on Spark, a parallel in-memory computing platform for big data. The algorithm parallelizes the entire training and prediction process of the naive Bayes classifier, mainly by adopting the resilient distributed dataset (RDD) programming model. For comparison, a Hadoop-based PNBA is also implemented. Test results show that, in the same computing environment and on the same text sets, the Spark PNBA clearly outperforms the Hadoop PNBA on key indicators such as speedup ratio and scalability. Spark-based parallel algorithms can therefore better meet the requirements of large-scale Chinese text data mining.
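As a rough illustration of why naive Bayes training parallelizes well, the sketch below expresses it as a "map" step that emits count records and a "reduce by key" step that sums them, which is the general shape an RDD-style job would take. It is plain single-machine Python, not Spark code and not the paper's implementation; the helper names (`reduce_by_key`, `train`, `predict`), the add-one smoothing, and the assumption that documents are already segmented into tokens are all illustrative choices.

```python
import math
from collections import defaultdict

def reduce_by_key(pairs):
    """Sum values per key (hypothetical helper mimicking a reduce-by-key step)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def train(docs):
    """docs: list of (label, [tokens]); tokens are assumed to be pre-segmented."""
    # "Map" phase: one record per document and one per token occurrence.
    label_pairs = [(label, 1) for label, _ in docs]
    token_pairs = [((label, tok), 1) for label, toks in docs for tok in toks]
    # "Reduce" phase: aggregate the records by key.
    label_counts = reduce_by_key(label_pairs)
    token_counts = reduce_by_key(token_pairs)
    tokens_per_label = reduce_by_key(
        (label, n) for (label, _), n in token_counts.items()
    )
    vocab = {tok for _, tok in token_counts}
    return label_counts, token_counts, tokens_per_label, vocab

def predict(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing assumed)."""
    label_counts, token_counts, tokens_per_label, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / n_docs)  # log prior
        denom = tokens_per_label.get(label, 0) + len(vocab)
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore tokens never seen in training
            score += math.log((token_counts.get((label, tok), 0) + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy usage with made-up, already-tokenized documents.
toy_docs = [
    ("sports", ["ball", "match"]),
    ("sports", ["match", "champion"]),
    ("tech", ["algorithm", "data"]),
    ("tech", ["data", "platform"]),
]
toy_model = train(toy_docs)
```

Because both phases only emit and sum independent records, the same counting logic can in principle be distributed across partitions, and prediction can be applied to each document independently.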
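Grey Wolf Optimization (GWO) is a recent swarm intelligence optimization algorithm. Like other intelligent optimization algorithms, it suffers from slow convergence and a tendency to become trapped in local minima. To address this, an improved algorithm with an adaptive search strategy is proposed. To improve convergence speed and optimization accuracy, the position of each individual is controlled by its fitness value, and a best-guided search equation is introduced; in addition, to increase the population diversity of GWO, the improved algorithm uses position vector differences to randomly jump out of local optima. Finally, simulation experiments were carried out on 10 standard benchmark functions, and the results were compared with those of 4 other algorithms. Both the statistical results and the Wilcoxon signed-rank test show that the proposed algorithm has a clear advantage in convergence speed and search accuracy.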
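For orientation, the following is a minimal sketch of the standard GWO baseline that the abstract sets out to improve. It does not include the adaptive search strategy, the best-guided search equation, or the position-vector-difference jump described above, since the abstract does not give their formulas. The function name `gwo`, its parameters, and the sphere test function are assumptions made for this example.

```python
import random

def gwo(objective, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Standard Grey Wolf Optimization (minimization); requires n_wolves >= 3."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best, best_val = None, float("inf")
    for t in range(n_iter):
        # Rank the pack; the three best wolves act as alpha, beta, and delta.
        wolves.sort(key=objective)
        if objective(wolves[0]) < best_val:
            best, best_val = wolves[0][:], objective(wolves[0])
        leaders = [w[:] for w in wolves[:3]]
        a = 2.0 * (1 - t / n_iter)  # control parameter, decreases from 2 toward 0
        for i in range(n_wolves):
            new_pos = []
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolves[i][d])
                    estimate += leader[d] - A * D
                # Average the three leader-guided estimates and clamp to bounds.
                new_pos.append(min(hi, max(lo, estimate / 3)))
            wolves[i] = new_pos
    # Account for the final population as well as the best seen so far.
    final = min(wolves, key=objective)
    if objective(final) < best_val:
        best, best_val = final[:], objective(final)
    return best, best_val

def sphere(x):
    """Simple convex benchmark used here only as an illustrative objective."""
    return sum(v * v for v in x)

best_pos, best_fit = gwo(sphere, dim=5, bounds=(-10.0, 10.0))
```

In this baseline every wolf moves toward the average of three leader-guided estimates, which is one reason the population can lose diversity and stall in local optima on multimodal functions.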
Funding: Project (KC18071) supported by the Application Foundation Research Program of Xuzhou, China; Projects (2017YFC0804401, 2017YFC0804409) supported by the National Key R&D Program of China.