Abstract
Training a support vector machine (SVM) on the large-scale data sets that arise in speech recognition is difficult. Pre-selecting support vectors is one way to address this problem, and a new support vector pre-selection algorithm is proposed. First, kernel fuzzy C-means clustering is performed separately on each class of the original data set, and all the resulting cluster centers are taken as the representative set of that class. Second, following the geometric distribution of support vectors and borrowing the idea of the one-versus-one strategy used in multi-class SVM classification, boundary samples of the original data set are extracted as pre-selected support vectors for training and prediction. The algorithm was applied to an embedded speech recognition system. Experimental results show that the method improves the training efficiency of the speech recognition system and reduces computational cost while maintaining a high recognition rate.
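The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not give the kernel, the update equations, or the exact boundary criterion, so everything specific here is an assumption: a Gaussian kernel with the commonly used input-space kernel fuzzy C-means updates, and a boundary rule that keeps, for every one-versus-one class pair, the fraction `keep_ratio` of samples lying closest to the other class's cluster centers. The names `kernel_fcm`, `preselect_boundary`, and `keep_ratio` are hypothetical and are not taken from the paper.

```python
import itertools
import math
import random


def gaussian_kernel(x, v, sigma):
    """Gaussian (RBF) kernel value between two points (assumed kernel choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def kernel_fcm(points, n_clusters, sigma=1.0, m=2.0, n_iter=50, seed=0):
    """Kernel fuzzy C-means with centers kept in the input space."""
    rng = random.Random(seed)
    dim = len(points[0])
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(n_iter):
        # Membership update: u_ik proportional to (1 - K(x_k, v_i))^(-1/(m-1)).
        memberships = []
        for x in points:
            w = [(1.0 - gaussian_kernel(x, v, sigma) + 1e-12) ** (-1.0 / (m - 1.0))
                 for v in centers]
            total = sum(w)
            memberships.append([wi / total for wi in w])
        # Center update: kernel-weighted fuzzy mean of the samples.
        for i in range(n_clusters):
            coef = [(u[i] ** m) * gaussian_kernel(x, centers[i], sigma)
                    for x, u in zip(points, memberships)]
            denom = sum(coef)
            if denom > 0.0:
                centers[i] = [sum(c * x[d] for c, x in zip(coef, points)) / denom
                              for d in range(dim)]
    return centers


def preselect_boundary(data_by_class, n_clusters=2, keep_ratio=0.3, sigma=1.0):
    """Return, per class, the sorted indices of samples kept as candidate support vectors."""
    # Stage 1: cluster each class separately; its centers form the representative set.
    centers = {c: kernel_fcm(pts, n_clusters, sigma)
               for c, pts in data_by_class.items()}
    selected = {c: set() for c in data_by_class}
    # Stage 2: one-versus-one pairs; keep the samples nearest the opposing class
    # (assumed boundary criterion, not specified in the abstract).
    for a, b in itertools.combinations(sorted(data_by_class), 2):
        for src, dst in ((a, b), (b, a)):
            pts = data_by_class[src]
            dist = [min(math.dist(x, v) for v in centers[dst]) for x in pts]
            k = max(1, math.ceil(keep_ratio * len(pts)))
            nearest = sorted(range(len(pts)), key=dist.__getitem__)[:k]
            selected[src].update(nearest)
    return {c: sorted(idx) for c, idx in selected.items()}
```

The reduced sample set returned by `preselect_boundary` would then be passed to an ordinary SVM trainer in place of the full data set, which is where the saving in training cost would come from under these assumptions.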
Source
Journal of System Simulation (《系统仿真学报》)
CAS
CSCD
Peking University Core Journals (北大核心)
2015, Issue 11, pp. 2714-2721 (8 pages)
Funding
Shanxi Scholarship Council of China (2009-28)
Natural Science Foundation of Shanxi Province (2009011022-2)
Keywords
support vector
multi-class classification
kernel fuzzy C-Means clustering
sample pre-extracting
speech recognition system simulation