Abstract
Data mining is the extraction, or "mining", of knowledge from large amounts of data. To analyze how soil nutrients contribute to soil fertility, and to extract knowledge that describes soil fertility status, this paper selects three data mining algorithms: the C4.5 decision tree and the K-means and DBSCAN clustering algorithms. Using soil nutrient data collected from three townships in Nong'an County, simulation experiments evaluate the algorithms on accuracy and time efficiency, comparing (1) different algorithms on the same data set and (2) the same algorithm on different data sets. The results show that, on the same data set, C4.5 and K-means both achieve high accuracy and time efficiency (accuracy 98.7903% and 98.1182%; running time 0.03 s and 0.08 s, respectively). However, for analyzing soil fertility status from large amounts of data in order to predict future fertility trends, K-means is clearly the more suitable of the two. Across different data sets, DBSCAN performs well (accuracy 97.1774%, 94.0226%, and 92.3240%). These results provide a new reference for analyzing soil fertility status.
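The two evaluation criteria named in the abstract, accuracy and running time, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration rather than the authors' procedure: the data are synthetic blobs standing in for soil samples, `cluster_accuracy` scores a clustering by mapping each cluster to its majority reference class, and the K-means here uses farthest-point seeding (not necessarily what the paper used) so the run is deterministic.

```python
import random
import time
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster index per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_accuracy(labels, truth):
    """Map each cluster to its majority reference class and return the
    fraction of samples that agree with that mapping."""
    correct = 0
    for c in set(labels):
        classes = [t for lab, t in zip(labels, truth) if lab == c]
        correct += Counter(classes).most_common(1)[0][1]
    return correct / len(truth)


def make_synthetic(n_per_class=50, seed=0):
    """Hypothetical stand-in for soil samples: three well-separated blobs in a
    3-feature space. This is not the paper's data."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0), (20.0, 0.0, 10.0)]
    points, truth = [], []
    for cls, center in enumerate(blob_centers):
        for _ in range(n_per_class):
            points.append(tuple(rng.gauss(m, 0.5) for m in center))
            truth.append(cls)
    return points, truth


points, truth = make_synthetic()
start = time.perf_counter()
labels = kmeans(points, k=3)
elapsed = time.perf_counter() - start
print(f"accuracy = {cluster_accuracy(labels, truth):.4f}, time = {elapsed:.4f} s")
```

The same timing and scoring wrapper could be placed around any other algorithm that returns one label per sample, which is roughly the shape of a same-data-set, different-algorithm comparison.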
Source
Journal of Chinese Agricultural Mechanization (《中国农机化学报》)
Peking University Core Journal (北大核心)
2014, No. 3, pp. 252-255, 262 (5 pages)
Funding
National 863 Program (2006AA10A309)
National Spark Program (2008GA661003)
Jilin Province World Bank Project (2011-Z20)
Keywords
data mining
C4.5 decision tree
clustering algorithm
soil fertility