Abstract: Exchanging data in XML format has become increasingly popular and widely applied because XML documents are simple to maintain and transfer. Accelerating search within such documents therefore helps ensure search engine efficiency. In this paper, we propose a technique for detecting similarity in the structure of XML documents and then cluster the documents using the Delaunay triangulation method. The technique represents the structure of an XML document as a time series in which each occurrence of a tag corresponds to an impulse. The Discrete Fourier Transform then serves as a simple method for analyzing these signals in the frequency domain and for building similarity matrices through a distance measure, so that the documents can be grouped into clusters. We use Delaunay triangulation as the clustering method for the d-dimensional points that represent the XML documents. The results show significant efficiency and accuracy compared with common methods.
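The encoding and comparison steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the amplitude assigned to each tag (its index in a shared vocabulary), the zero-padding to a common length, and the Euclidean distance between magnitude spectra are all assumptions made for illustration, and the function names are hypothetical. The Delaunay clustering step is not shown.

```python
import cmath
import math
import xml.etree.ElementTree as ET


def tag_signal(xml_text, vocab):
    """Encode a document as a series with one impulse per element, in document order.

    Assumption: the impulse amplitude is the tag's 1-based index in `vocab`,
    a dictionary shared by all documents being compared.
    """
    root = ET.fromstring(xml_text)
    signal = []
    for elem in root.iter():  # depth-first, i.e. document order
        amplitude = vocab.setdefault(elem.tag, len(vocab) + 1)
        signal.append(float(amplitude))
    return signal


def dft_magnitudes(signal, n):
    """Magnitude spectrum of `signal` zero-padded to length n (naive O(n^2) DFT)."""
    padded = signal + [0.0] * (n - len(signal))
    magnitudes = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(padded))
        magnitudes.append(abs(coeff))
    return magnitudes


def structural_distance(xml_a, xml_b):
    """Euclidean distance between the magnitude spectra of two tag signals."""
    vocab = {}
    a = tag_signal(xml_a, vocab)
    b = tag_signal(xml_b, vocab)
    n = max(len(a), len(b))
    fa = dft_magnitudes(a, n)
    fb = dft_magnitudes(b, n)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(fa, fb)))
```

Two documents with the same tag sequence but different text content get distance zero under this sketch, because only element tags contribute to the signal. Pairwise distances computed this way could fill the similarity matrix that a clustering step consumes.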
Abstract: The visual assessment of tendency (VAT) technique, developed by J. C. Bezdek, R. J. Hathaway and J. M. Huband for visually finding the number of meaningful clusters in data, is very useful, but there is room for improvement. Instead of displaying the ordered dissimilarity matrix (ODM) as a 2D gray-level image for human interpretation, as VAT does, we trace the changes in dissimilarities along the diagonal of the ODM. This turns the 2D data structure (a matrix) into 1D arrays, displayed as what we call tendency curves, so that one can concentrate on a single variable, namely the height. One of these curves, called the d-curve, clearly shows the existence of cluster structure as patterns of peaks and valleys, which can be detected not only by the human eye but also by a computer. Our numerical experiments showed that the computer can detect cluster structure from the d-curve even in some cases where the human eye sees no structure in the visual output of VAT. Success in all numerical experiments was obtained using the same (fixed) set of program parameter values.
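As a rough illustration of the idea, the sketch below reorders a dissimilarity matrix with the standard VAT ordering (a Prim-like nearest-neighbor sweep) and then reads a 1D curve off the reordered matrix. The abstract does not define the d-curve, so the curve used here, the dissimilarity between consecutive objects in VAT order, is only an assumed stand-in, and all function names are hypothetical.

```python
def vat_order(D):
    """Return the VAT ordering of objects for a symmetric dissimilarity matrix D."""
    n = len(D)
    # Start from an object that takes part in the largest dissimilarity.
    first = max(range(n), key=lambda i: max(D[i]))
    order = [first]
    remaining = [i for i in range(n) if i != first]
    while remaining:
        # Pick the unordered object closest to any already ordered object.
        nxt = min(remaining, key=lambda q: min(D[p][q] for p in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(D, order):
    """Ordered dissimilarity matrix (ODM): rows and columns of D permuted by `order`."""
    return [[D[i][j] for j in order] for i in order]


def off_diagonal_curve(odm):
    """Assumed 1D curve: first off-diagonal of the ODM (consecutive dissimilarities)."""
    return [odm[k][k + 1] for k in range(len(odm) - 1)]


# Usage: five points on a line forming two well-separated groups.
points = [0.0, 1.0, 10.0, 11.0, 0.5]
D = [[abs(p - q) for q in points] for p in points]
order = vat_order(D)
curve = off_diagonal_curve(reorder(D, order))
```

In this toy example the curve has a single tall peak where the ordering jumps from one group to the other, which is the kind of peak-and-valley pattern a program can detect without inspecting an image.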
Funding: Supported in part by funding from the State Key Laboratory of Industrial Control Technology, Zhejiang University (ICT2021B19), and by the Technological Innovation and Application Demonstration in Chongqing (Major Themes of Industry: cstc2019jscx-zdztzxX0033, cstc2019jscx-fxyd0158).
Abstract: Clustering is an unsupervised learning technique that groups information (observations or datasets) according to similarity measures. Developing clustering algorithms has been a hot topic in recent years, and the area is developing rapidly as data grow more complex and datasets grow larger. In this paper, the concept of clustering is introduced, and clustering techniques are analyzed from traditional and modern perspectives. First, the paper summarizes the principles, advantages, and disadvantages of 20 traditional clustering algorithms and 4 modern algorithms. Then, the core elements of clustering, such as similarity measures and evaluation indices, are presented. Finally, considering that data processing is often applied in vehicle engineering, some specific applications of clustering algorithms in vehicles are listed, and the future development of clustering in the era of big data is highlighted. The purpose of this review is to provide a comprehensive survey that helps readers learn about various clustering algorithms and choose appropriate methods, especially for vehicle applications.
Abstract: Clustering techniques classify raw data in a reasonable manner to create disjoint clusters. Many clustering algorithms based on specific parameters have been proposed to handle large volumes of data. This paper focuses on cluster analysis based on neutrosophic set implication, i.e., a k-means algorithm combined with a threshold-based clustering technique. This algorithm addresses the shortcomings of the k-means clustering algorithm by overcoming the limitations of the threshold-based clustering algorithm. To evaluate the validity of the proposed method, several validity measures and validity indices are applied to the Iris dataset (from the University of California, Irvine, Machine Learning Repository), alongside the k-means and threshold-based clustering algorithms. The proposed method yields better-separated data with more compact clusters and thus achieves higher validity indices. The method also eliminates the limitations of the threshold-based clustering algorithm, and its validity measures and indices are reported alongside those of the k-means and threshold-based clustering algorithms.
Funding: This work is supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) under grant number 227198829/GRK1931. The authors gratefully acknowledge the financial support from DFG.
Abstract: In this paper, we focus on trajectories at intersections governed by various regulation types, such as traffic lights, priority/yield signs, and right-of-way rules. We test several methods for detecting and recognizing movement patterns from GPS trajectories in terms of their geometrical and spatio-temporal components. In particular, we first identify the main paths that vehicles follow at such locations. We then investigate how vehicles move along these geometric paths. For these purposes, machine learning methods are used, and the performance of some known methods for trajectory similarity measurement (DTW, Hausdorff, and Fréchet distance) and clustering (affinity propagation and agglomerative clustering) is compared based on clustering accuracy. Afterward, the movement behavior observed at six different intersections is analyzed by identifying certain movement patterns in the speed and time profiles of the trajectories. We show that different movement patterns are observed at intersections depending on the regulation type. This finding can be useful for categorizing intersections according to traffic regulations. The practical value of automatically identifying traffic rules from GPS tracks lies in enriching modern maps with additional navigation-related information (traffic signs, traffic lights, etc.).
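Two of the trajectory similarity measures named above, DTW and the Hausdorff distance, can be sketched as follows. This is a textbook-style illustration and not the paper's code: trajectories are assumed to be lists of planar (x, y) points with Euclidean point distance, the Hausdorff variant is the discrete one over sample points, and the function names are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping cost between two point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def hausdorff_distance(a, b):
    """Symmetric discrete Hausdorff distance between two point sets."""
    def directed(p, q):
        # Farthest any point of p lies from its nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


# Usage: two parallel straight tracks one unit apart.
track_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
track_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
dtw_ab = dtw_distance(track_a, track_b)
hausdorff_ab = hausdorff_distance(track_a, track_b)
```

DTW accumulates cost along an order-preserving alignment, so it is sensitive to how a path is traversed, whereas the Hausdorff distance ignores point order and compares only the shapes. A matrix of such pairwise distances is the usual input to clustering methods like affinity propagation or agglomerative clustering.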