TFIDF algorithm based on dual parallel calculation model

Cited by: 2
Abstract: Text classification algorithms run inefficiently on a single machine when applied to large data sets. To address this, the paper proposes a cloud computing framework with dual parallel computation based on GPU (graphics processing unit) and MapReduce technology. By constructing an adaptive computation process for dual parallel computing and combining it with the characteristics of an improved TFIDF (term frequency-inverse document frequency) algorithm, the authors implement the improved TFIDF algorithm on a dual parallel adaptive computing model. The experiments compare the running efficiency of the improved TFIDF algorithm in different operating environments and across different numbers of computing nodes. The results show that the improved TFIDF algorithm can process massive data quickly and effectively, and that under dual parallel adaptive computing its execution efficiency improves as the number of nodes increases.
Authors: SUN Yu-qiang (孙玉强), CHAO Bi-xia (巢碧霞) (School of Information Science and Engineering, Changzhou University, Changzhou 213164, China)
Source: Computer Engineering and Design (《计算机工程与设计》), PKU Core journal, 2016, No. 11, pp. 3016-3021 (6 pages)
Keywords: improved TFIDF algorithm; MapReduce model; GPU; parallel computing; adaptive
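
The abstract does not spell out the improved TFIDF formula or the GPU stage, so the following is only a minimal sketch of how plain (unimproved) TF-IDF can be decomposed into map and reduce steps, which is the part a MapReduce cluster would distribute across nodes. It runs sequentially in memory; the function names (`map_term_counts`, `reduce_sum`, `tfidf`) and the demo corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def map_term_counts(doc_id, text):
    """Map step: emit ((term, doc_id), 1) for every whitespace-separated token."""
    for term in text.lower().split():
        yield (term, doc_id), 1


def reduce_sum(pairs):
    """Reduce step: sum the values that share the same key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def tfidf(corpus):
    """Compute plain TF-IDF weights for a {doc_id: text} corpus.

    Assumption: tf = count / document length, idf = ln(N / document frequency).
    The paper's improved weighting is not reproduced here.
    """
    # Job 1: occurrences of each term in each document.
    counts = reduce_sum(
        pair
        for doc_id, text in corpus.items()
        for pair in map_term_counts(doc_id, text)
    )
    # Job 2: total number of tokens per document.
    doc_len = reduce_sum((doc_id, n) for (_, doc_id), n in counts.items())
    # Job 3: number of documents that contain each term.
    doc_freq = reduce_sum((term, 1) for (term, _) in counts)

    n_docs = len(corpus)
    return {
        (term, doc_id): (n / doc_len[doc_id]) * math.log(n_docs / doc_freq[term])
        for (term, doc_id), n in counts.items()
    }


# Hypothetical two-document corpus for illustration only.
demo_corpus = {"d1": "gpu gpu cluster", "d2": "cluster node"}
weights = tfidf(demo_corpus)
```

Each of the three jobs is a key-grouped sum, so in a real deployment the grouping could be handled by the MapReduce shuffle while the per-record arithmetic is the kind of work that might be offloaded to a GPU; how the paper actually divides the work between the two is not stated in the abstract.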
