Abstract
To identify users who spread spam over Internet telephony (SPIT), this paper proposes a SPIT recognition method based on weighted k-means and a support vector machine (SVM). The method builds a communication-behavior network model from each user's historical communication activity. It then clusters the users with a semi-supervised, weighted k-means algorithm. From each cluster it uniformly samples a portion of the users as the training set and trains an SVM model that predicts the classes of the remaining users. Experimental results show that the method produces a finer-grained classification of users and can predict the classes of users outside the training set, which gives it a degree of machine-learning capability. It can also be applied to discovering key customers and related customers and to service recommendation.
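The paper does not publish code. The Python sketch below only illustrates the three steps the abstract names: weighted k-means clustering, uniform per-cluster sampling and SVM prediction of the remaining users. It rests on several assumptions.

- The "weighting" is modelled as per-feature weights in the distance.
- The "semi-supervision" is modelled as seeding each centroid with a few users whose category is already known.
- The SVM is a linear one-vs-rest SVM trained with the Pegasos sub-gradient method. This solver is swapped in so the code stays within the standard library.

The helper names, the two synthetic features, the weights `[2.0, 1.0]`, the 30% sampling fraction and the Pegasos parameters are all hypothetical.

```python
import random

# Minimal sketch of the pipeline in the abstract. All names, feature values,
# weights and parameters below are hypothetical illustrations.

def weighted_dist2(x, c, w):
    """Squared Euclidean distance with assumed per-feature weights w."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))

def seeded_weighted_kmeans(X, w, seeds, iters=20):
    """Semi-supervised (seeded) k-means: centroid c starts at the mean of the
    labeled seed users seeds[c] (ids 0..k-1 -> row indices), then Lloyd
    iterations run with the weighted distance. Returns a cluster id per row."""
    k = len(seeds)
    cents = [[sum(col) / len(seeds[c]) for col in zip(*(X[i] for i in seeds[c]))]
             for c in range(k)]
    labels = [0] * len(X)
    for _ in range(iters):
        # Assignment step: nearest centroid under the weighted distance.
        labels = [min(range(k), key=lambda c: weighted_dist2(x, cents[c], w))
                  for x in X]
        # Update step: recompute the mean of every non-empty cluster.
        for c in range(k):
            members = [X[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def sample_per_cluster(labels, frac, rng):
    """Draw the same fraction of users from every cluster as training rows."""
    train = []
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        train += rng.sample(idx, max(1, round(frac * len(idx))))
    return sorted(train)

def train_linear_svm(X, y, lam=0.1, epochs=100, rng=None):
    """Linear SVM (y in {-1, +1}) trained with Pegasos sub-gradient steps.
    A constant 1 feature stands in for the bias term."""
    rng = rng or random.Random(0)
    Xb = [x + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(a * b for a, b in zip(w, Xb[i]))
            w = [(1 - eta * lam) * a for a in w]  # regularization shrink
            if margin < 1:  # hinge loss active: step toward the example
                w = [a + eta * y[i] * b for a, b in zip(w, Xb[i])]
    return w

def fit_ovr(X, labels, **kw):
    """One-vs-rest: one binary SVM per cluster id."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kw)
            for c in sorted(set(labels))}

def predict_ovr(models, x):
    """Return the cluster id whose SVM gives the highest score for x."""
    xb = x + [1.0]
    return max(models, key=lambda c: sum(a * b for a, b in zip(models[c], xb)))

def run_pipeline(X, seeds, w, frac=0.3, seed=0):
    """Cluster, sample evenly per cluster, train SVMs, predict the rest."""
    rng = random.Random(seed)
    labels = seeded_weighted_kmeans(X, w, seeds)
    train = sample_per_cluster(labels, frac, rng)
    models = fit_ovr([X[i] for i in train], [labels[i] for i in train], rng=rng)
    train_set = set(train)
    preds = {i: predict_ovr(models, X[i]) for i in range(len(X)) if i not in train_set}
    return labels, train, preds

def make_demo_users(rng, per_group=20):
    """Synthetic 2-feature users scattered around three well-separated centers."""
    X, truth = [], []
    for g, (cx, cy) in enumerate([(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]):
        for _ in range(per_group):
            X.append([cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)])
            truth.append(g)
    return X, truth

if __name__ == "__main__":
    X, truth = make_demo_users(random.Random(42))
    seeds = {g: [20 * g, 20 * g + 1] for g in range(3)}  # 2 known users per group
    labels, train, preds = run_pipeline(X, seeds, w=[2.0, 1.0])
    acc = sum(preds[i] == labels[i] for i in preds) / len(preds)
    print(f"train={len(train)} predicted={len(preds)} agreement={acc:.2f}")
```

The SVM is trained on cluster ids, so the users left out of the training set (or newly arriving users) can be classified by the trained model without rerunning the clustering. This is one plausible reading of the "prediction" capability the abstract claims. Real communication-behavior features would need scaling before the weighted distance and the linear SVM are meaningful.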
Source
Journal of Zhengzhou University of Light Industry (Natural Science Edition) 《郑州轻工业学院学报(自然科学版)》
CAS
2014, No. 1, pp. 94-97 and 108 (5 pages)
Keywords
data mining
k-means
support vector machine (SVM)
spam over Internet telephony (SPIT)