In this paper we investigate the effectiveness of ensemble-based learners for web robot session identification from web server logs. We also perform multi fold robot session labeling to improve the performance of lear...In this paper we investigate the effectiveness of ensemble-based learners for web robot session identification from web server logs. We also perform multi fold robot session labeling to improve the performance of learner. We conduct a comparative study for various ensemble methods (Bagging, Boosting, and Voting) with simple classifiers in perspective of classification. We also evaluate the effectiveness of these classifiers (both ensemble and simple) on five different data sets of varying session length. Presently the results of web server log analyzers are not very much reliable because the input log files are highly inflated by sessions of automated web traverse software’s, known as web robots. Presence of web robots access traffic entries in web server log repositories imposes a great challenge to extract any actionable and usable knowledge about browsing behavior of actual visitors. So web robots sessions need accurate and fast detection from web server log repositories to extract knowledge about genuine visitors and to produce correct results of log analyzers.展开更多
Many companies like credit card, insurance, bank, retail industry require direct marketing. Data mining can help those institutes to set marketing goal. Data mining techniques have good prospects in their target audie...Many companies like credit card, insurance, bank, retail industry require direct marketing. Data mining can help those institutes to set marketing goal. Data mining techniques have good prospects in their target audiences and improve the likelihood of response. In this work we have investigated two data mining techniques: the Naive Bayes and the C4.5 decision tree algorithms. The goal of this work is to predict whether a client will subscribe a term deposit. We also made comparative study of performance of those two algorithms. Publicly available UCI data is used to train and test the performance of the algorithms. Besides, we extract actionable knowledge from decision tree that focuses to take interesting and important decision in business area.展开更多
A main focus of machine learning research has been improving the generalization accuracy and efficiency of prediction models. However, what emerges as missing in many applications is actionability, i.e., the ability t...A main focus of machine learning research has been improving the generalization accuracy and efficiency of prediction models. However, what emerges as missing in many applications is actionability, i.e., the ability to turn prediction results into actions. Existing effort in deriving such actionable knowledge is few and limited to simple action models while in many real applications those models are often more complex and harder to extract an optimal solution. In this paper, we propose a novel approach that achieves actionability by combining learning with planning, two core areas of AI. In particular, we propose a framework to extract actionable knowledge from random forest, one of the most widely used and best off-the-shelf classifiers. We formulate the actionability problem to a sub-optimal action planning (SOAP) problem, which is to find a plan to alter certain features of a given input so that the random forest would yield a desirable output, while minimizing the total costs of actions. Technically, the SOAP problem is formulated in the SAS+ planning formalism, and solved using a Max-SAT based ap- proach. Our experimental results demonstrate the effectiveness and efficiency of the proposed approach on a personal credit dataset and other benchmarks. Our work represents a new application of automated planning on an emerging and challenging machine learning paradigm.展开更多
Although amazing progress has been made in ma- chine learning to achieve high generalization accuracy and ef- ficiency, there is still very limited work on deriving meaning- ful decision-making actions from the result...Although amazing progress has been made in ma- chine learning to achieve high generalization accuracy and ef- ficiency, there is still very limited work on deriving meaning- ful decision-making actions from the resulting models. How- ever, in many applications such as advertisement, recommen- dation systems, social networks, customer relationship man- agement, and clinical prediction, the users need not only ac- curate prediction, but also suggestions on actions to achieve a desirable goal (e.g., high ads hit rates) or avert an unde- sirable predicted result (e.g., clinical deterioration). Existing works for extracting such actionability are few and limited to simple models such as a decision tree. The dilemma is that those models with high accuracy are often more complex and harder to extract actionability from. In this paper, we propose an effective method to extract ac- tionable knowledge from additive tree models (ATMs), one of the most widely used and best off-the-shelf classifiers. We rigorously formulate the optimal actionable planning (OAP) problem for a given ATM, which is to extract an action- able plan for a given input so that it can achieve a desirable output while maximizing the net profit. Based on a state space graph formulation, we first propose an optimal heuris- tic search method which intends to find an optimal solution. Then, we also present a sub-optimal heuristic search with an admissible and consistent heuristic function which can re- markably improve the efficiency of the algorithm. Our exper- imental results demonstrate the effectiveness and efficiency of the proposed algorithms on several real datasets in the application domain of personal credit and banking.展开更多
文摘In this paper we investigate the effectiveness of ensemble-based learners for web robot session identification from web server logs. We also perform multi fold robot session labeling to improve the performance of learner. We conduct a comparative study for various ensemble methods (Bagging, Boosting, and Voting) with simple classifiers in perspective of classification. We also evaluate the effectiveness of these classifiers (both ensemble and simple) on five different data sets of varying session length. Presently the results of web server log analyzers are not very much reliable because the input log files are highly inflated by sessions of automated web traverse software’s, known as web robots. Presence of web robots access traffic entries in web server log repositories imposes a great challenge to extract any actionable and usable knowledge about browsing behavior of actual visitors. So web robots sessions need accurate and fast detection from web server log repositories to extract knowledge about genuine visitors and to produce correct results of log analyzers.
文摘Many companies like credit card, insurance, bank, retail industry require direct marketing. Data mining can help those institutes to set marketing goal. Data mining techniques have good prospects in their target audiences and improve the likelihood of response. In this work we have investigated two data mining techniques: the Naive Bayes and the C4.5 decision tree algorithms. The goal of this work is to predict whether a client will subscribe a term deposit. We also made comparative study of performance of those two algorithms. Publicly available UCI data is used to train and test the performance of the algorithms. Besides, we extract actionable knowledge from decision tree that focuses to take interesting and important decision in business area.
基金This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61502412, 61379066, and 61402395), Natural Science Foundation of the Jiangsu Province (BK20150459, BK20151314, and BK20140492), Natural Science Foundation of the Jiangsu Higher Education Institutions (15KJB520036), United States NSF grants (IIS-0534699, IIS-0713109, CNS-1017701), Microsoft Research New Faculty Fellowship, and the Research Innovation Program for Graduate Student in Jiangsu Province (KYLX16 1390).
文摘A main focus of machine learning research has been improving the generalization accuracy and efficiency of prediction models. However, what emerges as missing in many applications is actionability, i.e., the ability to turn prediction results into actions. Existing effort in deriving such actionable knowledge is few and limited to simple action models while in many real applications those models are often more complex and harder to extract an optimal solution. In this paper, we propose a novel approach that achieves actionability by combining learning with planning, two core areas of AI. In particular, we propose a framework to extract actionable knowledge from random forest, one of the most widely used and best off-the-shelf classifiers. We formulate the actionability problem to a sub-optimal action planning (SOAP) problem, which is to find a plan to alter certain features of a given input so that the random forest would yield a desirable output, while minimizing the total costs of actions. Technically, the SOAP problem is formulated in the SAS+ planning formalism, and solved using a Max-SAT based ap- proach. Our experimental results demonstrate the effectiveness and efficiency of the proposed approach on a personal credit dataset and other benchmarks. Our work represents a new application of automated planning on an emerging and challenging machine learning paradigm.
基金This work was supported in part by China Postdoctoral Science Foundation (2013M531527), the Fundamental Research Funds for the Central Universities (0110000037), the National Natural Science Foun- dation of China (Grant Nos. 61502412, 61033009, and 61175057), Natural Science Foundation of the Jiangsu Province (BK20150459), Natural Science Foundation of the Jiangsu Higher Education Institutions (15KJB520036), National Science Foundation, United States (IIS-0534699, IIS-0713109, CNS-1017701), and a Microsoft Research New Faculty Fellowship.
文摘Although amazing progress has been made in ma- chine learning to achieve high generalization accuracy and ef- ficiency, there is still very limited work on deriving meaning- ful decision-making actions from the resulting models. How- ever, in many applications such as advertisement, recommen- dation systems, social networks, customer relationship man- agement, and clinical prediction, the users need not only ac- curate prediction, but also suggestions on actions to achieve a desirable goal (e.g., high ads hit rates) or avert an unde- sirable predicted result (e.g., clinical deterioration). Existing works for extracting such actionability are few and limited to simple models such as a decision tree. The dilemma is that those models with high accuracy are often more complex and harder to extract actionability from. In this paper, we propose an effective method to extract ac- tionable knowledge from additive tree models (ATMs), one of the most widely used and best off-the-shelf classifiers. We rigorously formulate the optimal actionable planning (OAP) problem for a given ATM, which is to extract an action- able plan for a given input so that it can achieve a desirable output while maximizing the net profit. Based on a state space graph formulation, we first propose an optimal heuris- tic search method which intends to find an optimal solution. Then, we also present a sub-optimal heuristic search with an admissible and consistent heuristic function which can re- markably improve the efficiency of the algorithm. Our exper- imental results demonstrate the effectiveness and efficiency of the proposed algorithms on several real datasets in the application domain of personal credit and banking.