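Abstract: Random forest is a widely used classification algorithm in machine learning, with advantages such as broad applicability and resistance to overfitting. To improve the ability of random forests to handle multi-class classification problems, a space transformation based random forest algorithm (ST-RF) is proposed. First, a priority class based linear discriminant analysis method (PCLDA) is presented, which uses a projection matrix targeted at the priority class to transform the sample space, thereby enhancing the separation between samples of the priority class and samples of the other classes. Then, PCLDA is introduced into the random forest construction process: a priority class is randomly selected for each decision tree to preserve the diversity of the forest, and PCLDA is used to build decision trees that focus on different priority classes, which improves the classification accuracy of individual trees and thus the overall classification performance of the ensemble. Finally, ST-RF is compared with 7 typical random forest algorithms on 10 standard datasets to verify its effectiveness, and the PCLDA-based space transformation strategy is applied to the comparison algorithms to compare their performance before and after the modification. The experimental results show that ST-RF has a clear advantage in handling multi-class classification problems, and that the proposed space transformation strategy is broadly applicable and can significantly improve the classification performance of the original algorithms.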
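The idea of giving each tree its own priority class and projecting the samples so that this class stands out can be sketched as follows. This is a minimal illustration, not the PCLDA formulation from the paper: it assumes a plain two-class Fisher discriminant (priority class versus all other classes) on 2-D features, and the names `fisher_direction`, `project`, and `pick_priority_class` are hypothetical. Training a decision tree on the transformed samples is left out.

```python
import random

def _mean(rows):
    """Component-wise mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def _scatter(rows, mu):
    """2x2 scatter matrix of the points around their mean mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - mu[0], r[1] - mu[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(X, y, priority):
    """Assumed stand-in for PCLDA: w = Sw^-1 (mu_priority - mu_rest)."""
    pos = [x for x, c in zip(X, y) if c == priority]
    neg = [x for x, c in zip(X, y) if c != priority]
    mu_p, mu_n = _mean(pos), _mean(neg)
    sp, sn = _scatter(pos, mu_p), _scatter(neg, mu_n)
    sw = [[sp[i][j] + sn[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = (mu_p[0] - mu_n[0], mu_p[1] - mu_n[1])
    # Multiply the inverse of the 2x2 matrix sw by diff.
    return [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]

def project(X, w):
    """Project every 2-D sample onto the direction w."""
    return [x[0] * w[0] + x[1] * w[1] for x in X]

def pick_priority_class(y, rng):
    """Randomly choose the priority class for one tree of the forest."""
    return rng.choice(sorted(set(y)))

# Toy three-class data set (illustrative values only).
X = [(0, 0), (1, 0), (0, 1), (1, 1),
     (5, 5), (6, 5), (5, 6), (6, 6),
     (0, 5), (1, 5), (0, 6), (1, 6)]
y = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
```

In a forest, each tree would call `pick_priority_class` with its own random generator, compute the direction for that class, and be trained on the projected (or augmented) samples, so different trees specialize in different classes.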
Funding: Supported by grants from the Department of Science and Technology, Government of India, under SERC Project
Abstract: Even after thorough testing, a few bugs remain in a program of moderate complexity. These residual bugs are randomly distributed throughout the code. We have observed that bugs in some parts of a program cause more frequent and more severe failures than bugs in other parts. It is therefore necessary to decide what to test more and what to test less within the testing budget. The methods and classes of an object-oriented program can be prioritized according to their potential to cause failures. For this purpose, we propose a program metric, called the influence metric, that measures the influence of a program element on the source code. First, we represent the source code as an intermediate graph called the extended system dependence graph. Then, forward slicing is applied to a node of the graph to obtain the influence of that node. The influence metric of a method m is the number of program statements that directly or indirectly use the result produced by m. We compute the influence metric of a class c from the influence metrics of all its methods. Because the influence metric is computed statically, it does not reflect the expected behavior of a class at run time. Faults in frequently executed parts are known to cause more failures. Therefore, we use the operational profile to find the average execution time of a class in a system. Classes are then prioritized based on the influence metric and the average execution time. The priority of an element indicates its potential to cause failures. Once all program elements have been prioritized, the testing effort can be apportioned so that the elements causing frequent failures are tested thoroughly. We have conducted experiments on two well-known case studies, Library Management System and Trading Automation System, and successfully identified critical elements in the source code of each. We have also conducted experiments to compare our scheme with a related scheme. The experime
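The influence computation described in this abstract can be sketched as a reachability count on a dependence graph. This is a minimal sketch under stated assumptions: the graph is a plain adjacency dictionary rather than a real extended system dependence graph, the class-level metric is assumed to be the size of the union of its methods' slices, and the priority score is assumed to be influence multiplied by average execution time. The abstract does not give these formulas, and all function names are hypothetical.

```python
from collections import deque

def forward_slice(graph, start):
    """Nodes reachable from start along dependence edges, excluding start.

    graph maps a node to the nodes that use its result.
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def method_influence(graph, method_node):
    """Number of statements that directly or indirectly use the method's result."""
    return len(forward_slice(graph, method_node))

def class_influence(graph, method_nodes):
    """Assumption: a class influences the union of what its methods influence."""
    influenced = set()
    for m in method_nodes:
        influenced |= forward_slice(graph, m)
    return len(influenced)

def prioritize(classes, graph, avg_exec_time):
    """Assumption: score = static influence * average execution time."""
    scores = {
        name: class_influence(graph, methods) * avg_exec_time[name]
        for name, methods in classes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: m1 influences three statements, m2 only one.
graph = {"m1": ["s1", "s2"], "s1": ["s3"], "s2": ["s3"], "m2": ["s4"]}
classes = {"A": ["m1"], "B": ["m2"]}
avg_exec_time = {"A": 0.2, "B": 0.9}  # hypothetical operational-profile values
```

In the toy data, class `A` has the larger static influence, but class `B` runs far more often, so `prioritize(classes, graph, avg_exec_time)` ranks `B` first. This illustrates why the abstract combines the static metric with run-time information.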
Abstract: Guaranteed Quality of Service (QoS) has become essential because numerous network-based applications are now available, such as video conferencing, data streaming, and data transfer. This has led to the multi-class switch architecture, which caters for different QoS requirements. Introducing a threshold in a multi-class switch to solve the starvation problem in the loss-sensitive class has increased the mean delay of the delay-sensitive class. In this research, a new scheduling architecture is introduced to improve the mean delay of the delay-sensitive class when the threshold is active. The proposed architecture has been simulated under uniform and non-uniform traffic to evaluate the switch's mean delay. The results show that the proposed architecture achieves better performance than Weighted Fair Queueing (WFQ) and Priority Queue (PQ).
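The trade-off this abstract starts from, where a threshold protects the loss-sensitive class at the cost of extra delay for the delay-sensitive class, can be illustrated with a small slot-based simulation. This sketch shows only one plausible reading of the threshold mechanism, not the proposed scheduling architecture, which the abstract does not specify. The two-queue model, the one-cell-per-slot service, and the names `pick_queue` and `simulate` are assumptions.

```python
from collections import deque

def pick_queue(delay_q, loss_q, threshold):
    """Choose the queue to serve in this slot, or None if both are empty.

    Assumed rule: the delay-sensitive queue has priority unless the
    loss-sensitive queue has reached the threshold (threshold >= 1).
    """
    if len(loss_q) >= threshold:
        return "loss"
    if delay_q:
        return "delay"
    if loss_q:
        return "loss"
    return None

def simulate(arrivals, threshold):
    """Return the mean waiting time (in slots) of served delay-sensitive cells.

    arrivals is a list of (n_delay, n_loss) cell counts, one pair per slot.
    One cell is served per slot; cells still queued at the end are ignored.
    """
    delay_q, loss_q = deque(), deque()
    waits = []
    for slot, (n_delay, n_loss) in enumerate(arrivals):
        delay_q.extend([slot] * n_delay)  # store each cell's arrival slot
        loss_q.extend([slot] * n_loss)
        choice = pick_queue(delay_q, loss_q, threshold)
        if choice == "delay":
            waits.append(slot - delay_q.popleft())
        elif choice == "loss":
            loss_q.popleft()
    return sum(waits) / len(waits) if waits else 0.0

# Four busy slots followed by four idle slots (illustrative traffic only).
arrivals = [(1, 1)] * 4 + [(0, 0)] * 4
```

With a threshold that never triggers, such as `simulate(arrivals, 100)`, delay-sensitive cells are served immediately. With `simulate(arrivals, 2)` the threshold becomes active and their mean wait rises, which is the effect the proposed architecture aims to reduce.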
Abstract: The issue of document management has been raised for a long time, especially since the appearance of office automation in the 1980s, which led to dematerialization and Electronic Document Management (EDM). In the same period, workflow management developed significantly but became more focused on industry. Document workflows, it seems to us, have not attracted the same interest from the scientific community. Today, however, the emergence and dominance of the Internet in electronic exchanges is leading to a massive dematerialization of documents, which requires a conceptual reconsideration of the organizational framework for processing these documents in both public and private administrations. This problem seems open to us and deserves the interest of the scientific community. Indeed, EDM has mainly focused on the storage (referencing) and circulation (traceability) of documents, and has paid little attention to the overall behavior of the system in processing documents. The purpose of our research is to model document processing systems. In previous work, we proposed a general model and its specialization to small documents (any document processed by a single person at a time during its processing life cycle), which represent 70% of the documents processed by administrations, according to our study. In this contribution, we extend the model for processing small documents to the case where they are managed in a system comprising document classes organized in subclasses, which is the case for most administrations. We have observed that this model is a Markovian network of <i>M<sup>L×K</sup>/M<sup>L×K</sup>/</i>1 queues. We have analyzed the constraints of this model and deduced certain characteristics and metrics. Ultimately, the objective of our work is to design a document workflow management system that integrates a component for predicting global behavior.
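As a point of reference for the kind of metrics such a queueing model yields, the textbook steady-state formulas for a single M/M/1 queue can be computed as below. This is a sketch under assumptions: it treats each (class, subclass) pair as an independent M/M/1 queue, which may not match the paper's network model, and the rates and the names `mm1_metrics` and `network_metrics` are hypothetical.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Standard steady-state metrics of one M/M/1 queue.

    utilization rho = lambda / mu, mean number in system L = rho / (1 - rho),
    mean time in system W = 1 / (mu - lambda).
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative, service rate positive")
    rho = arrival_rate / service_rate
    if rho >= 1:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return {
        "utilization": rho,
        "mean_in_system": rho / (1 - rho),
        "mean_time_in_system": 1 / (service_rate - arrival_rate),
    }

def network_metrics(rates):
    """Apply mm1_metrics to every (class, subclass) queue independently.

    rates maps (class, subclass) to (arrival_rate, service_rate).
    """
    return {key: mm1_metrics(lam, mu) for key, (lam, mu) in rates.items()}

# Hypothetical rates in documents per hour for two classes with subclasses.
rates = {
    ("invoices", "domestic"): (2.0, 5.0),
    ("invoices", "foreign"): (1.0, 4.0),
    ("permits", "renewal"): (3.0, 6.0),
}
```

A behavior-prediction component like the one the abstract envisions could use such values to flag queues whose utilization approaches 1, where waiting times grow without bound.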