Support vector machines (SVMs) are supervised learning models traditionally employed for classification and regression analysis. In classification analysis, a set of training data is chosen, and each instance in the training data is assigned a categorical class. An SVM then constructs a model based on a separating hyperplane that maximizes the margin between different classes. Although SVMs are among the most popular classification models because of their strong empirical performance, understanding the knowledge captured in an SVM remains difficult. SVMs are typically applied in a black-box manner, where the details of parameter tuning, training, and even the final constructed model are hidden from users. This is natural, since these details are often complex and difficult to understand without proper visualization tools. However, such an approach brings about various problems, including trial-and-error tuning and skeptical users who are forced to trust these models blindly. The contribution of this paper is a visual analysis approach for building SVMs in an open-box manner. Our goal is to improve an analyst's understanding of the SVM modeling process through a suite of visualization techniques that give users full interactive visual control over the entire SVM training process. Our visual exploration tools are designed to enable intuitive parameter tuning, training data manipulation, and rule extraction as part of the SVM training process. To demonstrate the efficacy of our approach, we conduct a case study using a real-world robot control dataset.
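The abstract describes an SVM as a separating hyperplane that maximizes the margin between classes. The following is a minimal sketch of that idea in pure Python. It is a generic illustration and not the paper's visual-analytics system. It fits a soft-margin linear SVM with a Pegasos-style stochastic subgradient method on the primal objective λ/2·‖w‖² + mean hinge loss. The function names `train_linear_svm`, `predict` and `margin_width`, the value of λ, the epoch count, and the toy data are all assumptions made for illustration. For brevity, the bias is folded into `w` through a constant feature, so it is lightly regularized. Kernels and the interactive tuning the paper addresses are out of scope.

```python
import random
from typing import List, Sequence


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 100,
    seed: int = 0,
) -> List[float]:
    """Hypothetical helper: soft-margin linear SVM via Pegasos-style subgradient steps.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)). Labels must be +1/-1.
    A constant 1.0 is appended to each input, so the last weight acts as the bias
    (a simplification: the bias is regularized along with the other weights).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]  # constant feature standing in for the bias
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size used by Pegasos
            score = sum(wj * xj for wj, xj in zip(w, data[i]))
            violated = y[i] * score < 1  # point lies inside the margin or is misclassified
            w = [(1.0 - eta * lam) * wj for wj in w]  # step along the regularizer's gradient
            if violated:
                # hinge-loss subgradient pushes the hyperplane away from this point
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 x falls."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def margin_width(w: Sequence[float]) -> float:
    """Geometric margin width 2 / ||w||, excluding the bias entry."""
    norm = sum(wj * wj for wj in w[:-1]) ** 0.5
    return 2.0 / norm


if __name__ == "__main__":
    # Toy, made-up 2-D data: two linearly separable clusters.
    X = [(2, 1), (1, 2), (3, 3), (-2, -1), (-1, -2), (-3, -3)]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print("weights (last = bias):", w)
    print("training predictions:", [predict(w, x) for x in X])
    print("margin width:", margin_width(w))
```

In this formulation, λ plays the role of the inverse of the usual SVM penalty parameter: λ = 1/(nC) for n training points. Larger λ favors a wider margin at the cost of more margin violations. That trade-off is the kind of parameter otherwise tuned by trial and error.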
Data visualization blends art and science to convey stories from data via graphical representations. Given different problems, applications, requirements, and design goals, it is challenging to combine these two components to their full effect. While the art component involves creating visually appealing and easily interpreted graphics for users, the science component requires accurate representations of a large amount of input data. Without the science component, visualization cannot serve its role of correctly representing the actual data, which leads to incorrect perception, interpretation, and decisions. Worse still, incorrect visual representations may be produced intentionally to deceive viewers. To address common pitfalls in graphical representations, this paper focuses on identifying and understanding the root causes of misinformation in graphical representations. We reviewed misleading data visualization examples in scientific publications collected from indexing databases and then mapped them onto the fundamental units of visual communication, such as color, shape, size, and spatial orientation. Moreover, a text mining technique was applied to extract practical insights from common visualization pitfalls. Cochran’s Q test and McNemar’s test were conducted to examine whether the proportions of common errors differ among color, shape, size, and spatial orientation. The findings showed that the pie chart is the most misused graphical representation and that size is the most critical issue. We also observed statistically significant differences in the proportion of errors among color, shape, size, and spatial orientation.
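The abstract does not say how the tests were computed or which variants were used. The following is a sketch of the textbook forms of both statistics in pure Python, under stated assumptions. Each row is one reviewed visualization, and each column is one visual channel (color, shape, size, spatial orientation), with 1 meaning that an error was found in that channel. Cochran's Q is assumed to serve as the omnibus test, with pairwise McNemar tests as follow-ups. The function names are hypothetical. McNemar is shown without continuity correction by default, and no multiple-comparison adjustment is applied. The 0/1 table is made-up toy data, not data from the study.

```python
from itertools import combinations
from math import erfc, exp, gamma, sqrt
from typing import Dict, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        p, k = exp(-x / 2), 2  # closed form for df = 2
    else:
        p, k = erfc(sqrt(x / 2)), 1  # closed form for df = 1
    while k < df:  # recurrence: sf(x; k+2) = sf(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        p += (x / 2) ** (k / 2) * exp(-x / 2) / gamma(k / 2 + 1)
        k += 2
    return p


def cochrans_q(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Cochran's Q for an n x k table of 0/1 outcomes; returns (Q, p) with k - 1 df."""
    k = len(table[0])
    col = [sum(row[j] for row in table) for j in range(k)]  # errors per channel
    row_tot = [sum(row) for row in table]  # errors per visualization
    n_total = sum(row_tot)
    num = (k - 1) * (k * sum(c * c for c in col) - n_total ** 2)
    den = k * n_total - sum(r * r for r in row_tot)
    if den == 0:
        raise ValueError("every row is all 0s or all 1s; Q is undefined")
    q = num / den
    return q, chi2_sf(q, k - 1)


def mcnemar(a: Sequence[int], b: Sequence[int], correction: bool = False) -> Tuple[float, float]:
    """McNemar's chi-square test on paired 0/1 outcomes; only discordant pairs matter."""
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    if n10 + n01 == 0:
        return 0.0, 1.0
    diff = abs(n10 - n01) - (1 if correction else 0)  # optional continuity correction
    stat = max(diff, 0) ** 2 / (n10 + n01)
    return stat, chi2_sf(stat, 1)


def pairwise_mcnemar(
    table: Sequence[Sequence[int]], labels: Sequence[str]
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Run McNemar's test on every pair of columns (no multiple-comparison adjustment)."""
    results = {}
    for i, j in combinations(range(len(labels)), 2):
        a = [row[i] for row in table]
        b = [row[j] for row in table]
        results[(labels[i], labels[j])] = mcnemar(a, b)
    return results


if __name__ == "__main__":
    labels = ["color", "shape", "size", "orientation"]
    # Made-up toy data: 6 visualizations x 4 channels (1 = error found in that channel).
    table = [[1, 0, 1, 0], [0, 0, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 0]]
    print("Cochran's Q, p:", cochrans_q(table))
    for pair, (stat, p) in pairwise_mcnemar(table, labels).items():
        print(pair, round(stat, 3), round(p, 4))
```

For the toy table, the column totals are 3, 1, 5 and 1, which gives Q = 6.6 on 3 degrees of freedom. A library implementation such as statsmodels would be a natural cross-check for real data.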
Funding: Supported in part by the National Basic Research Program of China (973 Program, No. 2015CB352503), the Major Program of the National Natural Science Foundation of China (No. 61232012), and the National Natural Science Foundation of China (No. 61422211).