Abstract: Based on the principle of the discernibility matrix, a reduction algorithm with attribute order has been developed, and its solution has been proved to be complete for reducts and unique for a given attribute order. Called the reduct problem, this algorithm can be regarded as a mapping R = Reduct(S) from the attribute order space Θ to the reduct space R for an information system <U, C ∪ D>, where U is the universe and C and D are the sets of condition and decision attributes, respectively. This paper focuses on the reverse of the reduct problem, S = Order(R): for a given reduct R of an information system, we determine the solutions of S = Order(R) in the space Θ. First, we prove that there is at least one attribute order S such that S = Order(R). Then, decision rules are proposed that can be used directly to decide whether a pair of attribute orders has the same reduct. The main method rests on the fact that one attribute order can be transformed into another by moving attributes a finite number of times. Thus, deciding a pair of attribute orders reduces to deciding a sequence of neighboring pairs of attribute orders. Accordingly, the basic theorem for neighboring pairs of attribute orders is proved first, and the decision theorem of attribute order is then proved by means of the second attribute.
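The idea of a mapping from attribute orders to reducts can be illustrated with a small sketch. This is not the algorithm from the paper: it assumes a simple reverse-deletion scheme over the discernibility matrix (drop the least preferred attributes first, as long as every matrix entry is still covered), and the names `discernibility_entries`, `reduct_for_order` and the toy table are hypothetical.

```python
from itertools import combinations

def discernibility_entries(rows, decisions):
    """Collect the non-empty discernibility-matrix entries of a decision table.

    Each entry is the set of condition attributes on which two objects
    with different decisions differ.
    """
    entries = set()
    for (x, dx), (y, dy) in combinations(list(zip(rows, decisions)), 2):
        if dx != dy:
            diff = frozenset(a for a in x if x[a] != y[a])
            if diff:  # inconsistent pairs (no differing attribute) are skipped
                entries.add(diff)
    return entries

def reduct_for_order(entries, order):
    """Assumed stand-in for Reduct(S): reverse-deletion along the order.

    Attributes are tried for removal from the end of the order; an attribute
    is dropped only if the remaining ones still intersect every entry.
    The result is deterministic for a given order.
    """
    kept = list(order)
    for attr in reversed(order):
        trial = [a for a in kept if a != attr]
        if all(e & set(trial) for e in entries):
            kept = trial
    return kept

# Toy decision table (hypothetical data): condition attributes a, b, c.
rows = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 0},
]
decisions = [0, 1, 1, 0]
entries = discernibility_entries(rows, decisions)

reduct_abc = reduct_for_order(entries, ["a", "b", "c"])
reduct_bac = reduct_for_order(entries, ["b", "a", "c"])
reduct_cab = reduct_for_order(entries, ["c", "a", "b"])
```

In this toy table the neighboring orders a, b, c and b, a, c lead to the same attribute set {a, b}, while the order c, a, b leads to {c}, so different orders can share a reduct or not depending on which attribute is moved.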
Funding: This work is partially supported by the National Grand Fundamental Research 973 Program of China under Grant No. 2004CB318103 and the National Natural Science Foundation of China under Grant No. 60573078.
Abstract: The discernibility matrix is one of the most important approaches to computing the positive region, reduct, core and value reduct in rough sets. The subject of this paper is to develop a parallel approach to it, called "tree expression". Its computational complexity for the positive region and reduct is O(m^2 × n), instead of O(m × n^2) in the discernibility-matrix-based approach, and does not exceed O(n^2) for the other concepts in rough sets, where m and n are the numbers of attributes and objects, respectively, in a given dataset (also called an "information system" in rough sets). This approach suits information systems with n ≥ m that contain over one million objects.
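Why avoiding pairwise object comparison matters can be shown with a minimal sketch of the positive region. It is not the paper's tree expression: it assumes a plain hash-based grouping of objects by their condition values, and the function name `positive_region` and the toy data are hypothetical.

```python
from collections import defaultdict

def positive_region(rows, decisions, attrs):
    """Return the indices of objects in the positive region POS_attrs(D).

    Objects are grouped by their values on `attrs` in one pass; a group
    belongs to the positive region when all its objects share one decision.
    No pair of objects is ever compared directly.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    pos = set()
    for members in groups.values():
        if len({decisions[i] for i in members}) == 1:
            pos.update(members)
    return pos

# Toy information system (hypothetical data): objects 2 and 3 agree on
# both condition attributes but carry different decisions.
rows = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 1},
    {"a": 1, "b": 1},
]
decisions = [0, 1, 0, 1]

pos_ab = positive_region(rows, decisions, ["a", "b"])
pos_a = positive_region(rows, decisions, ["a"])
```

To put the two bounds from the abstract side by side: for m = 10 attributes and n = 10^6 objects, m^2 × n is 10^8, whereas m × n^2 is 10^13.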
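Abstract: Knowledge acquisition in incomplete information systems (IIS) has recently become one of the active research directions in granular computing. To explore an efficient method of knowledge acquisition, a complete knowledge acquisition algorithm is proposed, based on the basic principles of tolerance granular computing and tailored to the characteristics of incomplete information systems. The algorithm comprises an attribute reduction algorithm for incomplete information systems and a reduction algorithm for the objects in the system. Its main feature is that knowledge representation and acquisition are studied in a granular world formed by a complete covering, whose basic granules are the maximal tolerance classes. The performance of the algorithm is analyzed theoretically and experimentally, demonstrating its effectiveness and feasibility.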
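The notion of maximal tolerance classes forming a covering can be made concrete with a short sketch. It assumes the usual tolerance relation for incomplete data (two objects are tolerant when, on every attribute, their values agree or at least one is missing) and enumerates maximal classes with a basic Bron-Kerbosch clique search; the `"*"` marker for missing values, the function names and the toy objects are hypothetical, and this is not the algorithm proposed in the paper.

```python
MISSING = "*"  # assumed marker for an unknown attribute value

def tolerant(x, y):
    """Two objects are tolerant if every attribute agrees or is missing."""
    return all(u == v or u == MISSING or v == MISSING for u, v in zip(x, y))

def maximal_tolerance_classes(objects):
    """Enumerate maximal sets of pairwise tolerant objects (by index)."""
    n = len(objects)
    nbrs = {
        i: {j for j in range(n) if j != i and tolerant(objects[i], objects[j])}
        for i in range(n)
    }
    classes = []

    def expand(clique, candidates, excluded):
        # A clique is maximal when nothing can extend it.
        if not candidates and not excluded:
            classes.append(frozenset(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & nbrs[v], excluded & nbrs[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(range(n)), set())
    return classes

# Toy incomplete information system (hypothetical data).
objects = [
    (1, MISSING, 0),
    (1, 2, 0),
    (1, 3, 0),
    (0, 3, 1),
]
classes = maximal_tolerance_classes(objects)
```

Here object 0 belongs to two classes, {0, 1} and {0, 2}, so the classes cover the universe without partitioning it.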
Funding: Supported by the UGC, SERO, Hyderabad under FDP during the XI plan period, and by the UGC, New Delhi, through financial assistance under major research project Grant No. F-34-105/2008.
Abstract: Feature selection (FS) is the process of selecting the more informative features, and it is one of the important steps in knowledge discovery. Not all features are important: some may be redundant, and others may be irrelevant or noisy. Conventional supervised FS methods evaluate various feature subsets using an evaluation function or metric, so as to select only those features that are related to the decision classes of the data under consideration. In many data mining applications, however, decision class labels are unknown or incomplete, which makes unsupervised feature selection significant, since unsupervised learning does not rely on decision class labels. In this paper, we propose a new unsupervised quick reduct (QR) algorithm using rough set theory. The quality of the reduced data is measured by classification performance, evaluated using the WEKA classifier tool. The method is compared with existing supervised methods, and the results demonstrate the efficiency of the proposed algorithm.
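A greedy reduct search without decision labels can be sketched as follows. The abstract does not give the selection criterion, so this sketch assumes one: the mean rough-set dependency of every feature on the current subset, with features added greedily until that mean matches the value obtained from the full feature set. The function names and the toy data are hypothetical.

```python
from collections import defaultdict

def dependency(rows, subset, target):
    """Fraction of objects whose `subset`-class has a single `target` value."""
    values = defaultdict(set)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in subset)
        values[key].add(row[target])
        counts[key] += 1
    consistent = sum(counts[k] for k, seen in values.items() if len(seen) == 1)
    return consistent / len(rows)

def mean_dependency(rows, subset, attrs):
    """Assumed criterion: average dependency of every feature on `subset`."""
    return sum(dependency(rows, subset, y) for y in attrs) / len(attrs)

def unsupervised_quick_reduct(rows, attrs):
    """Greedily add the feature that raises the mean dependency the most."""
    goal = mean_dependency(rows, attrs, attrs)
    reduct = []
    best = mean_dependency(rows, reduct, attrs)
    while best < goal:
        candidates = [x for x in attrs if x not in reduct]
        scores = [mean_dependency(rows, reduct + [x], attrs) for x in candidates]
        best = max(scores)
        # Ties are broken in favour of the earliest feature in `attrs`.
        reduct.append(candidates[scores.index(best)])
    return reduct

# Toy unlabeled data (hypothetical): feature c duplicates feature a.
rows = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
selected = unsupervised_quick_reduct(rows, ["a", "b", "c"])
```

In this toy data the redundant feature c is never selected, because a and b together already determine every feature.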