Abstract
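Data classification is an important application of data mining techniques in medical data analysis. After analyzing the characteristics of medical data and taking early colorectal cancer diagnosis data as an example, this paper proposes classifying such data with the counting nearest neighbor algorithm. Based on an analysis of that algorithm's performance, it further proposes a new counting nearest neighbor algorithm, built on an index tree and sample density, to analyze the data: constructing the index tree improves the computational efficiency of the original algorithm, and improved variants based on global density and K-density improve its accuracy. Experiments show that, in the analysis of early colorectal cancer data, the new algorithm achieves considerable improvements in computational complexity, storage space, and classification accuracy. The new algorithm is also suitable for classifying numerical data, text data, and mixed data.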
Data classification is an important application of data mining in biomedicine. After analyzing the characteristics of biomedical data, this paper proposes a method for analyzing colorectal carcinoma diagnosis data based on the counting KNN algorithm. Although count-weight k-nearest neighbours (CwKNN) classification is simple and effective, it does not handle biomedical data well. After analyzing the algorithm's performance, a novel counting KNN algorithm based on an index tree and sample density is presented. The new method improves classification accuracy by using variants based on overall density and K-local density, and improves efficiency by using a tree-structured index. Experiments show that this method outperforms both distance-based voting KNN and CwKNN. More importantly, it is a single method that works for ordinal, nominal, or mixed data.
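For orientation, the distance-based voting KNN that the abstract names as a comparison baseline can be sketched as follows. This is only a minimal illustration of the generic textbook technique, not the paper's counting KNN, index tree, or density-weighted variants, whose details the abstract does not give. The function name `knn_vote`, the Euclidean distance, and the toy samples are all assumptions made for this example.

```python
import math
from collections import Counter

def knn_vote(train, query, k=3):
    """Generic distance-based voting KNN (hypothetical helper, not the paper's method).

    train: list of (feature_tuple, label) pairs with numeric features.
    query: feature tuple of the same length.
    Returns the most common label among the k nearest training samples.
    """
    # Rank training samples by Euclidean distance to the query (an assumption;
    # the abstract does not say which distance the baseline uses).
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Each of the k neighbours casts one unweighted vote for its label.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up toy data for illustration only; not taken from the paper.
samples = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.9), "A"),
    ((0.8, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.8), "B"),
]

print(knn_vote(samples, (1.1, 1.0), k=3))
print(knn_vote(samples, (5.1, 5.0), k=3))
```

Sorting all samples costs O(n log n) per query, which is the kind of overhead a tree-structured index is meant to reduce. Plain Euclidean distance also applies only to numeric features, whereas the abstract claims the proposed method handles nominal and mixed data as well.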
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2008, Issue 20, pp. 208-211 (4 pages)
Computer Engineering and Applications
Funding
National Natural Science Foundation of China (Grant No. 60776834)
Natural Science Foundation of Hunan Province, China (Grant No. 06JJ50143)