Abstract
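To analyze the browsing behavior of digital TV users effectively, this paper proposes a bisecting K-medoids algorithm for digital TV user browsing behavior. Because Euclidean distance tends to lose information in the data and is strongly affected by outliers, the clustering algorithm is improved with cloud similarity, which reduces the influence of uncertain factors such as abnormal data on the clustering result. To address the stopping-criterion problems of the K-means algorithm, namely an iteration count that is easily influenced by human choice and the difficulty, in a big-data setting, of reaching the point where the cluster centers no longer change, the paper uses a stopping criterion that combines three factors: intra-class similarity, inter-class similarity, and the number of clusters. This criterion meets practical clustering needs without consuming excessive system resources. In the experiments, the cloud-similarity-based bisecting K-medoids (BKS) algorithm and the cloud-similarity-based K-medoids (KS) algorithm were tested with different numbers of users. The results show that the proposed algorithm improves clustering accuracy and the robustness of the algorithm.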
To analyze user browsing behavior on connected TV, this paper proposes a bisecting K-medoids algorithm. The study makes two major improvements. First, the cloud model replaces the traditional Euclidean distance to measure the similarity of different behaviors under uncertainty. Second, a bisecting structure is developed to improve the stability of the clustering method, and a compromise between maximizing inter-class similarity and minimizing the number of clusters is used as the stopping criterion, so as to obtain large clusters with acceptable inter-class similarity. In the numerical experiments, the proposed bisecting K-medoids method outperforms the classical K-medoids method in both solution quality and algorithm efficiency.
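The bisecting structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the cloud similarity formula or the exact stopping criterion, so the sketch takes the dissimilarity as a pluggable function `d` (plain Euclidean distance is used here only as a stand-in) and stops on a hypothetical rule: a cap on the number of clusters (`max_clusters`) or a mean within-cluster dissimilarity at or below `tol`. All function and parameter names are assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Dissim = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    # Stand-in dissimilarity; the paper uses cloud similarity instead.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def medoid(cluster: Sequence[Point], d: Dissim) -> Point:
    # The member with the smallest total dissimilarity to all members.
    return min(cluster, key=lambda c: sum(d(c, p) for p in cluster))


def mean_cost(cluster: Sequence[Point], d: Dissim) -> float:
    # Average dissimilarity between the medoid and the cluster members.
    m = medoid(cluster, d)
    return sum(d(m, p) for p in cluster) / len(cluster)


def split_in_two(cluster: List[Point], d: Dissim, max_iter: int = 20):
    # 2-medoids step: start from the farthest pair (an assumed heuristic),
    # then alternate between assignment and medoid update.
    m1, m2 = max(combinations(cluster, 2), key=lambda pq: d(*pq))
    for _ in range(max_iter):
        a = [p for p in cluster if d(p, m1) <= d(p, m2)]
        b = [p for p in cluster if d(p, m1) > d(p, m2)]
        if not a or not b:
            break
        n1, n2 = medoid(a, d), medoid(b, d)
        if (n1, n2) == (m1, m2):
            break  # medoids are stable
        m1, m2 = n1, n2
    return a, b


def bisecting_k_medoids(points: Sequence[Point], d: Dissim = euclidean,
                        max_clusters: int = 4, tol: float = 1.0):
    clusters = [list(points)]
    while len(clusters) < max_clusters:
        # Pick the loosest cluster as the next one to bisect.
        i = max(range(len(clusters)), key=lambda j: mean_cost(clusters[j], d))
        worst = clusters[i]
        # Hypothetical stopping rule: every cluster is already tight enough.
        if len(worst) < 2 or mean_cost(worst, d) <= tol:
            break
        a, b = split_in_two(worst, d)
        if not a or not b:
            break
        clusters.pop(i)
        clusters += [a, b]
    return clusters
```

Calling `bisecting_k_medoids(points, max_clusters=5, tol=2.0)` on two well-separated groups of points stops after one split, because both halves are then within `tol`. To follow the paper's approach, a cloud-similarity-based dissimilarity would be passed as `d`, and the stopping rule would be replaced by the authors' combined criterion.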
Authors
Fei Hongying (费红英)
Sun Dan (孙丹)
Fei Hongying; Sun Dan (School of Management, Shanghai University, Shanghai 200444, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journals (北大核心)
2018, No. 12, pp. 3575-3578 (4 pages)
Application Research of Computers