Abstract
R is open-source software that integrates a wide range of data analysis and visualization methods. Its strong analytical capabilities and good extensibility make it well suited to data mining. Using a case study that mines the major economic indicators of cities, this paper shows how R is applied in each main stage of the data mining process. The data preparation stage covers data extraction, data selection, and statistical analysis. The modeling stage presents typical clustering and classification applications. The model evaluation stage presents a method for assessing decision trees. The concise R scripts and good analysis results illustrate the general characteristics of R and its advantages in data mining applications.
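The abstract names three stages: preparation, clustering/classification, and decision-tree evaluation. The following is a minimal sketch of that pipeline. It is written in Python rather than R, and it does not reproduce the paper's scripts or data. Every concrete choice is a hypothetical stand-in: the city records and their values, z-score standardization, k-means with farthest-first initialization, a depth-1 decision tree (a decision stump) in place of a full tree, and leave-one-out confusion-matrix evaluation. The same applies to all function names.

```python
"""Toy sketch of the abstract's three stages, written in Python rather than R.

Assumptions (hypothetical, not from the paper): the city records below,
z-score standardization, k-means with farthest-first initialization,
a depth-1 decision tree (decision stump), and leave-one-out evaluation.
"""
import math
from statistics import mean, stdev

# Hypothetical toy records: (city, GDP, fiscal revenue). Illustrative only.
CITIES = [
    ("A", 120.0, 10.0),
    ("B", 130.0, 12.0),
    ("C", 125.0, 11.0),
    ("D", 480.0, 45.0),
    ("E", 510.0, 50.0),
    ("F", 495.0, 47.0),
]


def standardize(rows):
    """Data preparation: z-score every numeric column (sample std. dev.)."""
    stats = [(mean(col), stdev(col)) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def kmeans(points, k, iters=20):
    """Clustering: Lloyd's k-means with farthest-first initial centers."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (ties go to the lower index).
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(mean(col) for col in zip(*members))
    return labels


def fit_stump(X, y):
    """Classification: depth-1 decision tree for binary labels 0/1,
    chosen by the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate split halfway between adjacent values
            for left, right in ((0, 1), (1, 0)):
                errors = sum((left if x[f] <= t else right) != label
                             for x, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    _, f, t, left, right = best
    return lambda x: left if x[f] <= t else right


def loo_confusion(X, y, n_classes=2):
    """Evaluation: leave-one-out confusion matrix, rows = true, cols = predicted."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for i in range(len(X)):
        model = fit_stump(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        cm[y[i]][model(X[i])] += 1
    return cm


if __name__ == "__main__":
    X = [(gdp, rev) for _, gdp, rev in CITIES]
    labels = kmeans(standardize(X), k=2)
    cm = loo_confusion(X, labels)
    accuracy = sum(cm[i][i] for i in range(len(cm))) / len(X)
    print("cluster labels:", labels)
    print("confusion matrix:", cm, "accuracy:", accuracy)
```

The cluster labels produced in the modeling stage serve as the classes that the tree learns. Leave-one-out evaluation is used because a toy set this small leaves too few rows for a separate holdout split. For real indicator data, a full decision tree and a larger test set would replace the stump and leave-one-out loop.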
Source
Journal of Chongqing Technology and Business University (Natural Science Edition) [《重庆工商大学学报(自然科学版)》]
2011, Issue 6, pp. 602-607 (6 pages)
Funding
Natural Science Foundation of Fujian Province (Project No. 2008J04005)