摘要
基于欧氏距离的传统模糊划分聚类算法较适用于球型结构的聚类。将其应用于维度较高的文本聚类时,准确率和效率均有所下降。为解决这一问题,提出一种基于马氏距离的文本聚类算法。该算法可发现非球形结构的类簇,在不需要先验知识的情况下,仅通过数学迭代即可得到聚类结果。鉴于当前无纸化考试系统的广泛应用,将该算法应用于主观题的自动阅卷系统中。通过对多种主观题的仿真实验,表明了该算法与C均值和FCM算法相比,不仅能获得较高的准确率,算法收敛速度也较快。
Traditional clustering algorithm with fuzzy partition based on Euclidean distance fits more the clustering of spherical structural clusters.When applying it to the text clustering with higher dimensions,the accuracy and efficiency will all be decreased.Focus on solving this problem,we propose a Mahalanobis distance-based text clustering algorithm.It can detect the class clusters with non-spherical structure, and can obtain the clustering result just through the mathematical iteration without the need of priori knowledge.In view of the wide applica-tion of paperless examination system at present,we apply this algorithm to automatic paper marking system of subjective questions.Through the simulation experiments on a variety of subjective questions,it is demonstrate that the algorithm can achieve higher accuracy rate than the c-means and FCM algorithms,furthermore,its convergence rate is higher as well.
出处
《计算机应用与软件》
CSCD
2015年第4期80-82,86,共4页
Computer Applications and Software
基金
河南省教育厅自然科学研究计划项目(2011C510002)
关键词
聚类
文本聚类
模糊C均值
欧氏距离
马氏距离
自动阅卷
Clustering Text clustering Fuzzy c-means (FCM) Euclidean distance Mahalanobis distance Automatic paper marking