Abstract
With the explosive growth of big data, improving the execution efficiency of classification algorithms has become a research focus in big data classification. Spark is a distributed parallel computing framework that supports iterative data flows. In this paper, the naive Bayes text classification algorithm is adapted for parallel, stream-oriented processing on Spark. Experiments show that the parallel streaming Bayes classification algorithm effectively improves the efficiency of big data classification.
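The abstract does not show the algorithm itself. As a minimal sketch (not the authors' implementation), the pure-Python code below illustrates why multinomial naive Bayes training parallelizes well: each data partition independently counts class and (class, word) occurrences (a map step), and the partial counts are combined with an associative sum (a reduce step). That is the pattern a Spark job would apply across RDD partitions. The partition lists, function names, toy corpus, and the choice of Laplace (add-one) smoothing are all assumptions for illustration; Spark's MLlib also ships a ready-made `pyspark.ml.classification.NaiveBayes`.

```python
import math
from collections import Counter
from functools import reduce


def count_partition(docs):
    """Map step (hypothetical helper): count classes and (class, word) pairs in one partition."""
    class_counts = Counter()
    word_counts = Counter()
    for label, text in docs:
        class_counts[label] += 1
        for word in text.split():
            word_counts[(label, word)] += 1
    return class_counts, word_counts


def merge(a, b):
    """Reduce step: summing counters is associative, so partitions can merge in any order."""
    return a[0] + b[0], a[1] + b[1]


def train(partitions):
    """Combine per-partition counts into the statistics naive Bayes needs."""
    class_counts, word_counts = reduce(merge, map(count_partition, partitions))
    vocab = {word for (_, word) in word_counts}
    total_docs = sum(class_counts.values())
    # Total word tokens per class: the denominator of the smoothed word likelihood.
    tokens_per_class = Counter()
    for (label, _), n in word_counts.items():
        tokens_per_class[label] += n
    return class_counts, word_counts, vocab, total_docs, tokens_per_class


def predict(model, text):
    """Pick the class maximizing log P(c) + sum of log P(w | c), with add-one smoothing (assumed)."""
    class_counts, word_counts, vocab, total_docs, tokens_per_class = model
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = tokens_per_class[label] + len(vocab)
        for word in text.split():
            # Counter returns 0 for unseen (class, word) pairs without inserting them.
            score += math.log((word_counts[(label, word)] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical toy corpus split into two "partitions".
    parts = [
        [("sports", "ball game win"), ("tech", "cpu code compile")],
        [("sports", "team ball score"), ("tech", "code bug fix")],
    ]
    print(predict(train(parts), "ball team"))  # expected: sports
```

Because `merge` only adds counts, training on the split corpus yields exactly the same statistics as training on the whole corpus in one partition, which is what allows the counting work to be distributed.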
Authors
ZHANG Ruimin; ZHANG Qimiao; DU Shuqiang; JIA Guixia
(Department of Software, Lanzhou Institute of Technology, Lanzhou 730050, China; Lanzhou Municipal Public Security Bureau, Lanzhou 730030, China)
Source
Industrial Instrumentation & Automation
2018, Issue 3, pp. 116-118, 123 (4 pages)
Funding
2016 Scientific Research Project of Higher Education Institutions of Gansu Province, self-funded (2016B-115)