Data mining is a powerful emerging technology that helps to extract hidden information from a huge volume of historical data. This paper is concerned with finding the frequent trajectories of moving objects in spatio-...Data mining is a powerful emerging technology that helps to extract hidden information from a huge volume of historical data. This paper is concerned with finding the frequent trajectories of moving objects in spatio-temporal data by a novel method adopting the concepts of clustering and sequential pattern mining. The algorithms used logically split the trajectory span area into clusters and then apply the k-means algorithm over this clusters until the squared error minimizes. The new method applies the threshold to obtain active clusters and arranges them in descending order based on number of trajectories passing through. From these active clusters, inter cluster patterns are found by a sequential pattern mining technique. The process is repeated until all the active clusters are linked. The clusters thus linked in sequence are the frequent trajectories. A set of experiments conducted using real datasets shows that the proposed method is relatively five times better than the existing ones. A comparison is made with the results of other algorithms and their variation is analyzed by statistical methods. Further, tests of significance are conducted with ANOVA to find the efficient threshold value for the optimum plot of frequent trajectories. The results are analyzed and found to be superior than the existing ones. This approach may be of relevance in finding alternate paths in busy networks ( congestion control), finding the frequent paths of migratory birds, or even to predict the next level of pattern characteristics in case of time series data with minor alterations and finding the frequent path of balls in certain games.展开更多
研究了利用已发现的频繁序列模式对序列数据库进行再聚类再发现的问题,针对已有的K-均值聚类算法随机选取初始中心点而导致聚类结果不稳定性的缺点,提出了一种基于Huffman思想的初始中心点选取算法——K-SPAM(K-means algorithm of sequ...研究了利用已发现的频繁序列模式对序列数据库进行再聚类再发现的问题,针对已有的K-均值聚类算法随机选取初始中心点而导致聚类结果不稳定性的缺点,提出了一种基于Huffman思想的初始中心点选取算法——K-SPAM(K-means algorithm of sequence pattern mining based on the Huffman Method)算法.该算法能够在一定程度上减少陷入局部最优的可能,而且对序列间相似度的计算采用一种高效的"与"、"或"运算,可极大提高算法的执行效率.展开更多
基金the receipt of research supported by the TATA Consultancy Service's scholarship
文摘Data mining is a powerful emerging technology that helps to extract hidden information from a huge volume of historical data. This paper is concerned with finding the frequent trajectories of moving objects in spatio-temporal data by a novel method adopting the concepts of clustering and sequential pattern mining. The algorithms used logically split the trajectory span area into clusters and then apply the k-means algorithm over this clusters until the squared error minimizes. The new method applies the threshold to obtain active clusters and arranges them in descending order based on number of trajectories passing through. From these active clusters, inter cluster patterns are found by a sequential pattern mining technique. The process is repeated until all the active clusters are linked. The clusters thus linked in sequence are the frequent trajectories. A set of experiments conducted using real datasets shows that the proposed method is relatively five times better than the existing ones. A comparison is made with the results of other algorithms and their variation is analyzed by statistical methods. Further, tests of significance are conducted with ANOVA to find the efficient threshold value for the optimum plot of frequent trajectories. The results are analyzed and found to be superior than the existing ones. This approach may be of relevance in finding alternate paths in busy networks ( congestion control), finding the frequent paths of migratory birds, or even to predict the next level of pattern characteristics in case of time series data with minor alterations and finding the frequent path of balls in certain games.
文摘研究了利用已发现的频繁序列模式对序列数据库进行再聚类再发现的问题,针对已有的K-均值聚类算法随机选取初始中心点而导致聚类结果不稳定性的缺点,提出了一种基于Huffman思想的初始中心点选取算法——K-SPAM(K-means algorithm of sequence pattern mining based on the Huffman Method)算法.该算法能够在一定程度上减少陷入局部最优的可能,而且对序列间相似度的计算采用一种高效的"与"、"或"运算,可极大提高算法的执行效率.