Abstract: Most rule-based classifiers cannot directly handle continuous data such as blood pressure. Discretization preprocessing converts continuous data into a categorical format. Existing discretization algorithms do not account for the multimodal class density of continuous variables in a dataset, which can degrade the performance of rule-based classifiers. This paper proposes a new Discretization Algorithm based on a Gaussian Mixture Model (DAGMM), which preserves the original patterns in the data by modeling the multimodal distribution of each continuous variable. The effectiveness of DAGMM is validated on four publicly available medical datasets. Experimental results show that DAGMM outperforms six other static discretization algorithms in terms of the number of generated rules and the classification accuracy of associative classification algorithms. Applying this method in clinical expert systems therefore has the potential to improve the performance of rule-based classifiers.
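A minimal sketch, not the authors' DAGMM implementation, of the core idea the abstract describes: fit a Gaussian mixture to a continuous clinical variable and use the boundaries between mixture components as discretization cut points, so each bin follows one mode of the multimodal density. The component count, the synthetic "blood pressure" values, and the variable name are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic "blood pressure" readings drawn from two modes (illustrative only).
values = np.concatenate([rng.normal(80, 5, 200), rng.normal(130, 8, 200)])

# Fit a Gaussian mixture to the one-dimensional continuous variable.
gmm = GaussianMixture(n_components=2, random_state=0).fit(values.reshape(-1, 1))

# Sweep the value range and place a cut point wherever the most likely
# mixture component changes; these cuts separate the modes of the density.
grid = np.linspace(values.min(), values.max(), 1000)
labels = gmm.predict(grid.reshape(-1, 1))
cuts = grid[1:][labels[1:] != labels[:-1]]

# Map each reading to a categorical bin bounded by the learned cut points.
bins = np.digitize(values, cuts)
print("cut points:", np.round(cuts, 1), "bin counts:", np.bincount(bins))
```

The design choice this illustrates is that cut points follow the estimated density rather than fixed-width or fixed-frequency intervals, which is what lets a mode-aware discretizer preserve the original pattern of a multimodal variable.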
Abstract: This paper presents a new inductive learning algorithm, HGR (Version 2.0), based on the newly developed extension matrix theory. The basic idea is to partition the positive examples of a specific class in a given example set into consistent groups, where each group corresponds to a consistent rule that covers all the examples in the group and none of the negative examples. The paper then compares HGR with other inductive algorithms such as C4.5, OC1, HCV, and SVM. The authors evaluate it not only on 15 databases from the well-known UCI machine learning repository but also on a real-world problem. Experimental results show that their method achieves higher accuracy with fewer rules than the other algorithms.
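A minimal sketch of the "consistent group" notion described above, not HGR's extension-matrix procedure: positive examples are greedily added to a group as long as the smallest interval rule covering the group still excludes every negative example; otherwise a new group (and hence a new rule) is started. The toy data, attribute layout, and helper names are illustrative assumptions.

```python
from typing import List, Tuple

Example = Tuple[float, ...]

def rule_of(group: List[Example]) -> List[Tuple[float, float]]:
    """Smallest per-attribute interval rule covering all examples in the group."""
    return [(min(vals), max(vals)) for vals in zip(*group)]

def covers(rule: List[Tuple[float, float]], x: Example) -> bool:
    """True if every attribute of x falls inside the rule's intervals."""
    return all(lo <= v <= hi for (lo, hi), v in zip(rule, x))

def consistent_groups(pos: List[Example], neg: List[Example]) -> List[List[Example]]:
    """Partition positives into groups whose covering rules exclude all negatives."""
    groups: List[List[Example]] = []
    for p in pos:
        placed = False
        for g in groups:
            candidate = rule_of(g + [p])
            if not any(covers(candidate, n) for n in neg):  # group stays consistent
                g.append(p)
                placed = True
                break
        if not placed:
            groups.append([p])  # start a new group seeded by p
    return groups

# Toy data: two positive clusters separated by a negative example,
# so the greedy pass yields two groups, i.e. two consistent rules.
pos = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9)]
neg = [(3.0, 3.0)]
for g in consistent_groups(pos, neg):
    print("group:", g, "rule:", rule_of(g))
```

Each resulting group induces one rule that covers all of its own positives and no negatives, which is the consistency property the abstract attributes to HGR's partitioning, though HGR derives its groups via extension matrices rather than this greedy interval heuristic.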