
A Chinese Text Classification System Based on Improved TF-IDF Features

Cited by: 12
Abstract: With the development of Internet technology, people can not only obtain information from the network but also express personal opinions and share their own experiences online. Since Web 2.0, the network has shifted from a read-only medium to today's interactive one, and the volume of online information has grown geometrically along with it. Text is an important part of that information, and texts can be sorted into categories such as news, entertainment, commentary, and finance. Chinese text classification not only facilitates the construction of text corpora but can also be applied to other data mining tasks. This paper designs an automated Chinese text classification system based on improved TF-IDF features combined with an SVM model. Experiments show that, compared with traditional feature extraction methods, the improved TF-IDF feature strategy achieves higher classification accuracy.
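Authors: DAN Tangpeng (但唐朋); XU Tiancheng (许天成); ZHANG Shuhan (张姝涵) (School of Computer, Central China Normal University, Wuhan 430079)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2020, No. 3, pp. 556-560 (5 pages)
Funding: Supported by the National Undergraduate Innovation and Entrepreneurship Training Program of Central China Normal University (No. 201810511002) and the School-level Undergraduate Innovation and Entrepreneurship Training Program of Central China Normal University (No. CA20180418221834349C).
Keywords: text classification; natural language processing; BOW model; machine learning; improved TF-IDF feature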
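
The abstract names TF-IDF features as the input to the classifier but does not spell out the improved weighting, so the sketch below shows only the standard TF-IDF baseline that such a variant would modify. The function name `tf_idf`, the toy documents, and the plain `tf * log(N / df)` formula are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (standard TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Weight = relative term frequency * log(N / document frequency).
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical pre-segmented documents (assumed data); raw Chinese text
# would first need a word segmentation step.
docs = [
    ["股票", "市场", "上涨"],   # stock, market, rise
    ["球队", "比赛", "胜利"],   # team, match, victory
    ["股票", "基金", "市场"],   # stock, fund, market
]
vectors = tf_idf(docs)
```

Terms that appear in fewer documents receive larger weights, and a term present in every document gets a weight of zero. In a pipeline like the one the abstract describes, these per-document weight vectors would be the features passed to a classifier such as an SVM.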
