Abstract
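Handling high-dimensional categorical data has long been a major challenge in data mining research. Traditional clustering algorithms mainly target low-dimensional continuous data and have difficulty handling high-dimensional datasets with categorical attributes. This paper proposes a subspace clustering algorithm for high-dimensional categorical datasets (FP-Tree-based SUBspace clustering algorithm, FPSUB). It uses a frequent pattern tree to transform the clustering problem into the problem of discovering frequent patterns of attribute values. The frequent patterns obtained serve as candidate subspaces, and clustering is then performed on these subspaces. Experimental results on real datasets show that FPSUB achieves higher accuracy than other algorithms.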
High-dimensional categorical datasets play an important role in data mining, so clustering them is a significant problem. However, traditional clustering algorithms mainly target low-dimensional continuous data and have difficulty handling categorical datasets. A new subspace clustering algorithm, FPSUB, is proposed. It stores the dataset information in an FP-Tree structure, which transforms the clustering problem into one of finding frequent patterns, and then uses these patterns to cluster the objects. The experimental results demonstrate the feasibility and robustness of the algorithm.
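The abstract gives no implementation detail, so the following is only a minimal Python sketch of the general idea it describes: each row becomes a set of (attribute index, value) items, an FP-tree is mined for frequent patterns, and each pattern is treated as a candidate subspace. The names `fp_growth`, `cluster`, `min_support`, and `min_dims`, the toy data, and the rule that assigns a row to its longest matching pattern are all illustrative assumptions, not the FPSUB procedure itself.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return (header, freq)."""
    freq = defaultdict(int)
    for items, count in transactions:
        for item in items:
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, most frequent first, so prefixes are shared.
        kept = sorted((i for i in items if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset of items: support} for all frequent patterns."""
    header, freq = build_tree(transactions, min_support)
    patterns = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        patterns[pattern] = freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        patterns.update(fp_growth(conditional, min_support, pattern))
    return patterns


def to_items(row):
    """Encode a categorical row as a set of (attribute index, value) items."""
    return frozenset(enumerate(row))


def cluster(rows, min_support, min_dims=2):
    """Assumed rule: label each row with its longest, best-supported pattern."""
    transactions = [(to_items(r), 1) for r in rows]
    patterns = fp_growth(transactions, min_support)
    candidates = sorted((p for p in patterns if len(p) >= min_dims),
                        key=lambda p: (-len(p), -patterns[p], sorted(p)))
    return [next((p for p in candidates if p <= items), None)
            for items, _ in transactions]


# Hypothetical toy data: attributes are colour, shape, size.
rows = [
    ("red", "round", "small"), ("red", "round", "large"),
    ("red", "round", "small"), ("blue", "square", "large"),
    ("blue", "square", "small"), ("green", "square", "large"),
]
labels = cluster(rows, min_support=2)
for row, label in zip(rows, labels):
    print(row, "->", sorted(label) if label else None)
```

In this sketch the attribute indices appearing in a pattern identify the subspace, and the rows that share a label form one cluster. A real subspace clustering method would also need a principled way to rank competing candidate subspaces, which the abstract does not specify.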
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2009, No. 10, pp. 2016-2021 (6 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 70671016, 60673066)
Keywords
categorical attributes
categorical data
subspace clustering
frequent pattern
FP-Tree