Abstract
The purpose of data reduction is to make a data set smaller while preserving the classification structures of interest. This paper proposes a data reduction and classification algorithm based on spatial partitioning. The algorithm maps the records of a conventional database relation into a multidimensional space. This turns data reduction into the problem of spatially merging data of the same class or, equivalently, of spatially partitioning data of different classes in that space. The result is a series of partitioned regions that both reduce the data and can be used directly for classification. The method was evaluated on 7 real-world data sets, and the authors report notable results in comparison with those obtained by C4.5.
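The idea of merging same-class records into regions of a multidimensional space can be illustrated with a minimal Python sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes numeric attributes, uses axis-aligned boxes as the regions, and applies a simple greedy rule: a point is merged into an existing box of its class only if the enlarged box contains no point of another class. The names `merge_boxes`, `contains`, `reduce_data`, and `classify` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)


def merge_boxes(a: Box, b: Box) -> Box:
    """Return the smallest axis-aligned box covering both a and b."""
    lo = tuple(min(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(max(x, y) for x, y in zip(a[1], b[1]))
    return lo, hi


def contains(box: Box, p: Point) -> bool:
    """True if p lies inside box (boundaries included)."""
    return all(l <= v <= h for l, v, h in zip(box[0], p, box[1]))


def reduce_data(points: Sequence[Point],
                labels: Sequence[str]) -> Dict[str, List[Box]]:
    """Greedily merge same-class points into boxes (illustrative rule only)."""
    regions: Dict[str, List[Box]] = {}
    for p, label in zip(points, labels):
        # Points of every other class must stay outside any box of this class.
        others = [q for q, l in zip(points, labels) if l != label]
        boxes = regions.setdefault(label, [])
        for i, box in enumerate(boxes):
            candidate = merge_boxes(box, (p, p))
            if not any(contains(candidate, q) for q in others):
                boxes[i] = candidate  # merge p into this region
                break
        else:
            boxes.append((p, p))  # no safe merge: start a new region
    return regions


def classify(regions: Dict[str, List[Box]], p: Point) -> Optional[str]:
    """Return the label of the first region containing p, or None."""
    for label, boxes in regions.items():
        if any(contains(b, p) for b in boxes):
            return label
    return None


points = [(1, 1), (2, 2), (8, 8), (9, 9), (1.5, 1.5)]
labels = ["A", "A", "B", "B", "A"]
regions = reduce_data(points, labels)
```

Here five records collapse into two boxes, one per class, and `classify(regions, (1.2, 1.8))` looks up the box that contains the query point. This sketch does not stop boxes of different classes from overlapping in empty space, and it returns `None` for points outside every box. A practical method would need rules for both cases.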
Source
《小型微型计算机系统》
CSCD
PKU Core (Peking University core journal list)
2002, Issue 4, pp. 456-459 (4 pages)
Journal of Chinese Computer Systems
Funding
Supported by the Sino-British cultural exchange fund (Academic Links with China Scheme (ALCS), Ref: CTN/992/244)
Keywords
data reduction
similarity
multidimensional space
data classification
classification
spatial partition
partition
database
hyper relation