The use of prediction error to optimize the number of splitting rules in a tree model does not control the probability of the emergence of splitting rules with a predictor that has no functional relationship with the target variable. To solve this problem, a new optimization method is proposed. Using this method, the probability that the predictors used in splitting rules in the optimized tree model have no functional relationships with the target variable is confined to less than 0.05. It is fairly convincing that the tree model given by the new method represents knowledge contained in the data.
Static analysis is an efficient approach for software assurance. Its most effective usage is reported to be interactive analysis throughout the software development process, which imposes a high performance requirement. This paper concentrates on rule-based static analysis tools and proposes an optimized rule-checking algorithm. Our technique improves the performance of static analysis tools by filtering vulnerability rules in terms of characteristic objects before checking source files. Since a source file typically contains vulnerabilities covered by only a small subset of the rules rather than all of them, our approach may achieve better performance. To investigate our technique's feasibility and effectiveness, we implemented it in an open source static analysis tool called PMD and used it to conduct experiments. Experimental results show that our approach obtains an average performance improvement of 28.7% compared with the original PMD. Our approach remains effective and precise in detecting vulnerabilities, with no side effects.
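The rule-filtering idea in the static-analysis abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Rule` class, the `select_rules` and `analyze` functions, the two example rules, and the use of plain substring matching to detect "characteristic objects". None of it is PMD's actual API or the paper's algorithm.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass
class Rule:
    """Hypothetical rule: a cheap pre-filter plus a (notionally expensive) check."""
    name: str
    characteristic_objects: FrozenSet[str]  # tokens that must appear for the rule to be relevant
    check: Callable[[str], List[str]]       # full check, returns violation names


def select_rules(source: str, rules: List[Rule]) -> List[Rule]:
    """Keep only rules with at least one characteristic object present in the source."""
    return [r for r in rules if any(obj in source for obj in r.characteristic_objects)]


def analyze(source: str, rules: List[Rule]) -> List[str]:
    """Run the full check only for rules that survive the pre-filter."""
    violations: List[str] = []
    for rule in select_rules(source, rules):
        violations.extend(rule.check(source))
    return violations


# Two made-up rules, used only to exercise the sketch.
RULES = [
    Rule("AvoidSystemExit", frozenset({"System.exit"}),
         lambda src: ["AvoidSystemExit"] if "System.exit(" in src else []),
    Rule("AvoidRuntimeExec", frozenset({"Runtime.getRuntime"}),
         lambda src: ["AvoidRuntimeExec"] if ".exec(" in src else []),
]

SOURCE = "class A { void f() { System.exit(1); } }"
selected = [r.name for r in select_rules(SOURCE, RULES)]
found = analyze(SOURCE, RULES)
```

In this sketch the second rule is never checked against `SOURCE`, because its characteristic object does not occur in the file. Any speed-up depends on the pre-filter being much cheaper than the full check.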
This paper proposes machine learning techniques to discover knowledge in a dataset in the form of if-then rules for the purpose of formulating queries for validation of a Bayesian belief network model of the same data. Although domain expertise is often available, the query formulation task is tedious and laborious, and hence automation of query formulation is desirable. In an effort to automate the query formulation process, a machine learning algorithm is leveraged to discover knowledge in the form of if-then rules in the data from which the Bayesian belief network model under validation was also induced. The set of if-then rules is processed and filtered through domain expertise to identify a subset that consists of “interesting” and “significant” rules. The subset of interesting and significant rules is formulated into corresponding queries to be posed, for validation purposes, to the Bayesian belief network induced from the same dataset. The promise of the proposed methodology was assessed through an empirical study performed on a real-life dataset, the National Crime Victimization Survey, which has over 250 attributes and well over 200,000 data points. The study demonstrated that the proposed approach is feasible and partially automates the query formulation process for validation of a complex probabilistic model, which substantially reduces the need for human expert involvement and investment.
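One way to picture the rule-to-query step described in the abstract above is sketched below. It assumes that a rule "IF A THEN C" maps to the conditional query P(C | A), and that the rule's confidence in the data is the reference value to compare against the network's answer. That mapping, the `IfThenRule` class, both helper functions, and the toy records are all illustrative assumptions. They are not the paper's method and not data from the National Crime Victimization Survey.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IfThenRule:
    """Hypothetical rule: attribute=value conditions implying attribute=value conclusions."""
    antecedent: Dict[str, str]
    consequent: Dict[str, str]


def to_query(rule: IfThenRule) -> str:
    """Render the rule as a conditional-probability query string."""
    def fmt(assignment: Dict[str, str]) -> str:
        return ", ".join(f"{k}={v}" for k, v in sorted(assignment.items()))
    return f"P({fmt(rule.consequent)} | {fmt(rule.antecedent)})"


def empirical_confidence(rule: IfThenRule, records: List[Dict[str, str]]) -> Optional[float]:
    """Fraction of records matching the antecedent that also match the consequent."""
    matches = [r for r in records
               if all(r.get(k) == v for k, v in rule.antecedent.items())]
    if not matches:
        return None  # rule never fires on this data
    hits = sum(all(r.get(k) == v for k, v in rule.consequent.items()) for r in matches)
    return hits / len(matches)


# Made-up toy records, for illustration only.
records = [
    {"location": "home", "reported": "yes"},
    {"location": "home", "reported": "no"},
    {"location": "home", "reported": "yes"},
    {"location": "street", "reported": "no"},
]
rule = IfThenRule({"location": "home"}, {"reported": "yes"})
query = to_query(rule)
confidence = empirical_confidence(rule, records)
```

The value returned by `empirical_confidence` would then be compared with the probability that the Bayesian belief network returns for the same query. A large gap would flag the model for closer inspection.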
The IEC 61850 standard stipulates the Substation Configuration Description Language (SCL) file as a means to define the substation equipment, IED function and also the communication mechanism for the substation area network. The SCL is an eXtensible Markup Language (XML) based file which helps to describe the configuration of the substation Intelligent Electronic Devices (IED) including their associated functions. The SCL file is also configured to contain all IED capabilities including data model which is structured into objects for easy descriptive modeling. The effective functioning of this SCL file relies on appropriate validation techniques which check the data model for errors due to non-conformity to the IEC 61850 standard. In this research, we extend the conventional SCL validation algorithm to develop a more advanced validator which can validate the standard data model using the Unified Modeling Language (UML). By using the Rule-based SCL validation tool, we implement validation test cases for a more comprehensive understanding of the various validation functionalities. It can be observed from the algorithm and the various implemented test cases that the proposed validation tool can improve SCL information validation and also help automation engineers to comprehend the IEC 61850 substation system architecture.
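As a rough illustration of what rule-based checking of an SCL file can look like, the sketch below parses a small SCL fragment with Python's standard `xml.etree.ElementTree` module and applies two simple rules. The rule functions, the `validate` helper, and the sample document are assumptions made for this example. They do not reproduce the paper's validator or its UML-based data model, and they cover only a tiny part of what IEC 61850 conformance requires.

```python
import xml.etree.ElementTree as ET

# SCL namespace as used in IEC 61850-6 configuration files.
NS = {"scl": "http://www.iec.ch/61850/2003/SCL"}


def check_ied_names(root):
    """Rule: every IED has a name, and IED names are unique."""
    errors, seen = [], set()
    for ied in root.findall("scl:IED", NS):
        name = ied.get("name")
        if not name:
            errors.append("IED without a name attribute")
        elif name in seen:
            errors.append(f"duplicate IED name: {name}")
        else:
            seen.add(name)
    return errors


def check_ln_types(root):
    """Rule: every LN0/LN lnType refers to an LNodeType declared in DataTypeTemplates."""
    declared = {t.get("id")
                for t in root.findall("scl:DataTypeTemplates/scl:LNodeType", NS)}
    errors = []
    for tag in ("LN0", "LN"):
        for ln in root.iterfind(f".//scl:{tag}", NS):
            ln_type = ln.get("lnType")
            if ln_type not in declared:
                errors.append(
                    f"{tag} lnClass={ln.get('lnClass')} references undeclared lnType {ln_type}")
    return errors


RULES = [check_ied_names, check_ln_types]


def validate(scl_text):
    """Apply every rule to the parsed document and collect the error messages."""
    root = ET.fromstring(scl_text)
    return [err for rule in RULES for err in rule(root)]


# Made-up sample: the XCBR node points at a type that is never declared.
SAMPLE = """<SCL xmlns="http://www.iec.ch/61850/2003/SCL">
  <IED name="IED1">
    <AccessPoint name="AP1"><Server><LDevice inst="LD0">
      <LN0 lnClass="LLN0" inst="" lnType="LLN0_T"/>
      <LN lnClass="XCBR" inst="1" lnType="XCBR_T"/>
    </LDevice></Server></AccessPoint>
  </IED>
  <DataTypeTemplates>
    <LNodeType id="LLN0_T" lnClass="LLN0"/>
  </DataTypeTemplates>
</SCL>"""

errors = validate(SAMPLE)
```

Keeping each rule as a separate function means new checks can be appended to `RULES` without touching the driver.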
Funding: Project supported by the National High-Tech R&D Program (863) of China (No. 2013AA12A202), the National Natural Science Foundation of China (Nos. 61172173, 41501505, and 61502205), the Natural Science Foundation of Hubei Province, China (No. 2014CFB779), and the Youths Science Foundation of Wuhan Institute of Technology (No. K201546).