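To address the inability of traditional K-means clustering to handle large-scale data efficiently, an accelerated K-means clustering method based on random sampling (Kmeans Clustering Algorithm Based on Random Sampling, Kmeans_RS) is proposed to improve the efficiency of traditional K-means clustering. First, a random sample is drawn from the large-scale data set to obtain a smaller working set, and traditional K-means clustering is run on the working set to obtain the cluster centers and radii, which constitute the sampling result. The remaining samples are then assigned to clusters by measuring their relationship to the sampling result. Because random sampling greatly reduces the size of the problem on which K-means is run, the method improves clustering efficiency and can handle large-scale data. Experimental results show that, on large-scale data sets, Kmeans_RS substantially improves clustering efficiency while maintaining clustering quality.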
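The sample-then-assign procedure in the Kmeans_RS abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the relationship between a remaining sample and the sampling result is measured, so the sketch assumes nearest-center assignment and takes each radius to be the largest center-to-member distance in the working set. The names `nearest`, `lloyd_kmeans`, and `kmeans_rs` are hypothetical.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def lloyd_kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


def kmeans_rs(points, k, sample_size, seed=0):
    """Cluster a random working set, then label every point by its nearest center."""
    rng = random.Random(seed)
    working_set = rng.sample(points, sample_size)
    centers = lloyd_kmeans(working_set, k, seed=seed)

    # Assumption: radius = largest distance from a center to a working-set member.
    radii = [0.0] * k
    for p in working_set:
        j = nearest(p, centers)
        radii[j] = max(radii[j], math.dist(p, centers[j]))

    # Assumption: remaining samples join the cluster of the nearest center.
    labels = [nearest(p, centers) for p in points]
    return centers, radii, labels
```

K-means itself only runs on `sample_size` points; the full data set is touched once, in the final labeling pass, which is where the efficiency gain described in the abstract would come from.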
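To address the inability of traditional k-means clustering to handle massive data efficiently, this paper proposes an accelerated k-means clustering method based on parallel computing (K-means clustering based on parallel computing, Pk-means). The method first randomly partitions the massive set of samples into multiple independent and identically distributed working sets and runs traditional k-means clustering on each working set in parallel to obtain the corresponding cluster centers and radii. By measuring the relationship between the clustering results of the different subsets, the sub-clusters obtained in each working set are merged, and special data undergo a second merging pass to correct the clustering result, so that massive data can be clustered effectively. Experimental results show that, on large-scale data sets, Pk-means substantially improves clustering efficiency while maintaining clustering quality.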
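The partition, cluster, and merge steps in the Pk-means abstract can be sketched as follows. This is an illustration under stated assumptions rather than the paper's algorithm: the merge criterion is not given in the abstract, so the sketch merges two sub-clusters when the distance between their centers is at most the sum of their radii, and it omits the second merging pass for special data. A thread pool is used only to keep the example self-contained; CPU-bound work in Python would normally need a process pool for a real speedup. All function names are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def lloyd_kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


def summarize(working_set, k, seed):
    """Cluster one working set; return (center, radius, size) per non-empty sub-cluster."""
    centers = lloyd_kmeans(working_set, k, seed=seed)
    groups = [[] for _ in range(k)]
    for p in working_set:
        groups[nearest(p, centers)].append(p)
    return [
        (c, max(math.dist(p, c) for p in g), len(g))
        for c, g in zip(centers, groups) if g
    ]


def pk_means(points, k, n_parts, seed=0):
    """Partition at random, cluster each part concurrently, merge overlapping sub-clusters."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    seeds = [seed + i for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(summarize, parts, [k] * n_parts, seeds))
    subclusters = [s for result in results for s in result]

    # Union-find over sub-clusters; assumption: merge when the two balls overlap.
    parent = list(range(len(subclusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(subclusters)):
        for j in range(i):
            (ci, ri, _), (cj, rj, _) = subclusters[i], subclusters[j]
            if math.dist(ci, cj) <= ri + rj:
                parent[find(i)] = find(j)

    # Each merged center is the size-weighted mean of its member sub-cluster centers.
    merged = {}
    for i, (center, _, size) in enumerate(subclusters):
        merged.setdefault(find(i), []).append((center, size))
    dims = len(points[0])
    return [
        tuple(sum(c[d] * n for c, n in members) / sum(n for _, n in members)
              for d in range(dims))
        for members in merged.values()
    ]
```

With this overlap rule the number of merged clusters is not guaranteed to equal `k`; a correction pass of the kind the abstract mentions would be one way to deal with that.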
Fueled by the boom in online games, there is an increasing demand for monitoring online games in various settings. One application scenario is the monitoring of computer games in school computer labs, for which an intelligent game recognition method is required. In this paper, a method is introduced that identifies game processes according to their private working sets (i.e., the amount of memory occupied by a process that cannot be shared with other processes). Results of the W test showed that the memory sizes occupied by legitimate processes (e.g., the processes of common native Windows applications) and by game processes followed normal distributions. Using the t-test, a significant difference was identified between the legitimate processes and C/S-based computer games in terms of the means and variances of their private working sets. Subsequently, we derived the probability density functions of the private working sets of the considered game processes and of the legitimate processes. Given the private working set of a process and the derived probability density functions, the probability that the process is a legitimate process and the probability that it is a game process can be determined. Comparing the two probabilities then determines whether the process is a game process. The test results showed that the recognition accuracy of this method for C/S-based computer games was approximately 90%.
Funding: This work is funded in part by the National Natural Science Foundation of China (File Nos. 61872451 and 61872452) and in part by the Science and Technology Development Fund, Macao SAR (File Nos. 0098/2018/A3 and 0076/2019/A2). Li Feng is the corresponding author.
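The density comparison described in the game-recognition abstract can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: both classes are given equal prior probability, so the decision reduces to comparing the two fitted normal densities at the observed private working set size. The function names are hypothetical, and any sample values passed in are the caller's own measurements, not data from the paper.

```python
from statistics import NormalDist


def fit_density(private_working_sets_mb):
    """Fit a normal density to observed private working set sizes (in MB)."""
    return NormalDist.from_samples(private_working_sets_mb)


def classify(value_mb, legit, game):
    """Compare the two densities at value_mb; assumes equal class priors.

    Returns (label, posterior probability of 'game'); the posterior is None
    when both densities underflow to zero and no decision can be made.
    """
    p_legit = legit.pdf(value_mb)
    p_game = game.pdf(value_mb)
    total = p_legit + p_game
    if total == 0.0:
        return "unknown", None
    label = "game" if p_game > p_legit else "legitimate"
    return label, p_game / total
```

If the two classes are not equally common on the monitored machines, each density would be weighted by its class prior before the comparison.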