Abstract
To address the high error rate of the K-modes algorithm when it initializes its k clusters, and the inaccurate classification of the K-nearest-neighbor (KNN) algorithm on large sample sets, this paper analyzes the full course of the traditional K-modes algorithm, from the initialization of the k clusters until the cluster centers stop changing, together with the low execution efficiency of KNN on large sample data, and proposes an improved K-modes-KNN algorithm. A string kernel function is used to initialize the k clusters and to iteratively compute the distance from each sample to the cluster centers, so that the centers are updated dynamically. The improved K-modes algorithm partitions the data set into clusters, and a KNN classification model is then built within each sub-cluster. Experiments on real data from a research institute show that the proposed algorithm outperforms classification algorithms of the same kind to some extent.
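The two-stage idea in the abstract (cluster categorical data first, then run KNN only inside the matching cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the string kernel from the paper is not specified here, so simple matching (Hamming) dissimilarity stands in for it, the first k distinct samples serve as initial centers, and all function names and the toy data are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ.
    Assumption: stands in for the paper's string-kernel distance."""
    return sum(x != y for x, y in zip(a, b))


def nearest(sample, centers):
    """Index of the center closest to the sample."""
    return min(range(len(centers)), key=lambda i: mismatch(sample, centers[i]))


def column_modes(rows):
    """Most frequent value of each attribute (the 'mode' vector)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(samples, k, max_iter=20):
    """Plain K-modes: iterate until the cluster centers stop changing."""
    # Assumption: the first k distinct samples seed the clusters.
    centers = list(dict.fromkeys(samples))[:k]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for s in samples:
            groups[nearest(s, centers)].append(s)
        # Keep the old center when a cluster ends up empty.
        updated = [column_modes(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


def fit(samples, labels, k):
    """Cluster the data, then store labeled members per cluster."""
    centers = k_modes(samples, k)
    clusters = [[] for _ in centers]
    for s, y in zip(samples, labels):
        clusters[nearest(s, centers)].append((s, y))
    return centers, clusters


def classify(query, centers, clusters, n_neighbors=3):
    """Route the query to the nearest non-empty cluster, then vote by KNN."""
    candidates = [i for i, members in enumerate(clusters) if members]
    c = min(candidates, key=lambda i: mismatch(query, centers[i]))
    ranked = sorted(clusters[c], key=lambda item: mismatch(query, item[0]))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Toy categorical data (hypothetical, for illustration only).
samples = [
    ("red", "small", "round"), ("blue", "large", "square"),
    ("red", "small", "oval"), ("blue", "large", "flat"),
    ("red", "medium", "round"), ("blue", "medium", "square"),
]
labels = ["fruit", "tool", "fruit", "tool", "fruit", "tool"]

centers, clusters = fit(samples, labels, k=2)
print(classify(("red", "small", "flat"), centers, clusters))
```

Because each query is compared only with the members of one cluster, the neighbor search touches a fraction of the data set, which is the efficiency argument the abstract makes for large samples.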
Authors
王志华
刘绍廷
罗齐
WANG Zhi-hua; LIU Shao-ting; LUO Qi (School of Software and Applied Science and Technology, Zhengzhou University, Zhengzhou 450002, China)
Source
《计算机工程与设计》 (Computer Engineering and Design)
Peking University Core Journal (北大核心)
2019, No. 8, pp. 2228-2234 (7 pages)
Funding
National Social Science Fund of China (15BTQ064)
Henan Province Science and Technology Research Project (182102210007)