Speed and clustering quality are the two main challenges facing clustering algorithms. DBSCAN (density based spatial clustering of applications with noise) is a typical density-based clustering method, and clustering experiments on large databases have demonstrated its advantage in speed. This paper proposes a recursive density-based clustering algorithm (RDBC) that can intelligently and dynamically modify its density parameters. RDBC is an improvement on DBSCAN and has the same computational complexity as DBSCAN. Clustering experiments on Web documents show that RDBC not only retains DBSCAN's high speed but also produces clustering results far better than DBSCAN's.
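For readers unfamiliar with the density parameters mentioned in this abstract, the following is a minimal sketch of plain DBSCAN, not of RDBC itself: the abstract does not state how RDBC adjusts its parameters, so that step is deliberately left out. The names `dbscan`, `region_query`, `eps` (neighborhood radius), and `min_pts` (minimum neighbors for a core point) are illustrative assumptions, and the brute-force neighbor search is chosen for clarity rather than speed.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Assign each point a cluster id (0, 1, ...) or NOISE.

    eps and min_pts are the two density parameters (assumed names).
    """
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to some cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
    return labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(sample, eps=1.5, min_pts=3))
```

On the sample data, the two tight groups form separate clusters and the isolated point is marked as noise. A single fixed pair of `eps` and `min_pts` is exactly what the abstract says RDBC modifies dynamically.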
The study of mathematical models of information retrieval is an important area in the Information Retrieval (IR) community. Because of the inherent uncertainty of IR, models based on statistical probability are promising both now and in the future. These models can be classified into classical models and probabilistic network models. This paper introduces several well-known models and points out their shortcomings. We also clarify the relationships among these models and briefly introduce a new model based on statistical language models.
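As a rough illustration of what a retrieval model based on statistical language models can look like, the sketch below ranks documents by query likelihood under a unigram model with Jelinek-Mercer smoothing. This is a commonly used formulation and an assumption on our part; the abstract does not say which language-model variant the paper introduces. The function name `query_log_likelihood` and the mixing weight `lam` are hypothetical.

```python
from collections import Counter
from math import log


def query_log_likelihood(query, doc, collection, lam=0.5):
    """Return log P(query | doc) under a smoothed unigram language model.

    query, doc, collection: lists of tokens (collection = all documents joined).
    lam: assumed weight on the document model; (1 - lam) goes to the
         collection model, so terms missing from the document still get
         a nonzero probability if they occur somewhere in the collection.
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc) if doc else 0.0
        p_coll = coll_counts[term] / len(collection)
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += log(p)
    return score


if __name__ == "__main__":
    docs = [
        ["density", "clustering", "algorithm"],
        ["probability", "retrieval", "model"],
    ]
    collection = [tok for d in docs for tok in d]
    query = ["retrieval", "model"]
    for d in docs:
        print(d, query_log_likelihood(query, d, collection))
```

Documents are then ranked by descending score; here the second document, which contains both query terms, scores higher than the first.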