Abstract
The paper first reviews the characteristics and development of text filtering models. Because traditional information filtering approaches cannot meet business needs in today's massive-data environments, it proposes a text data filtering model based on the MapReduce framework, which extends the traditional vector space model to a distributed setting. Tests in a real environment show that the model's filtering accuracy and speed are both satisfactory and that it meets users' needs well.
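The abstract does not give the model's details, so the following is only a minimal sketch of how vector space filtering might be split into map and reduce steps. Everything in it is an assumption made for illustration: the function names, the raw term-frequency weighting, the cosine similarity measure, the single `"match"` key, and the threshold of 0.3. It runs locally in plain Python rather than on a MapReduce cluster.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a raw term-frequency vector from whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_phase(doc_id, text, profile, threshold):
    """Mapper (hypothetical): score one document, emit it only if it passes."""
    score = cosine(tf_vector(text), profile)
    if score >= threshold:
        yield ("match", (doc_id, score))


def reduce_phase(key, values):
    """Reducer (hypothetical): rank passing documents by descending score."""
    return key, sorted(values, key=lambda pair: pair[1], reverse=True)


def run_filter(docs, profile_text, threshold=0.3):
    """Local stand-in for a MapReduce job: map each document, group, reduce."""
    profile = tf_vector(profile_text)
    grouped = {}
    for doc_id, text in docs.items():
        for key, value in map_phase(doc_id, text, profile, threshold):
            grouped.setdefault(key, []).append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Because each document is scored independently against the profile, the map step needs no shared state, which is what would let such a filter be spread across many workers.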
Source
Netinfo Security (《信息网络安全》)
2011, Issue 9, pp. 91-93, 119 (4 pages in total)
Funding
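National High Technology Research and Development Program of China (863 Program), Grant Nos. 2010AA012505 and 2011AA010702
Key Program of the National Natural Science Foundation of China, Grant Nos. 60933005 and 60873204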