Funding: Supported by the National Natural Science Foundation of China (30671212).
Abstract: The Normalized Difference Vegetation Index (NDVI) is a useful feature for differentiating vegetation from non-vegetation in remotely sensed imagery. In light of the function of NDVI and the spatial patterns of vegetation landscapes, we propose a lacunarity texture derived from NDVI to characterize the "gappiness" or "emptiness" of those spatial patterns. The NDVI-based lacunarity texture was incorporated into object-oriented classification to improve the identification of vegetation categories, especially Torreya, the target tree species of this research. A three-level hierarchical network of image objects was defined, and the proposed texture was integrated as a potential source of information in the rule base. A knowledge base of rules created by the C5.0 classifier indicated that the texture could be applied in object-oriented classification. Adding the texture improved the identification of every vegetation category. The results demonstrate that the texture can characterize the spatial patterns of vegetation structures and is a promising approach for vegetation identification.
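As a rough illustration of the two quantities named in this abstract, the sketch below computes per-pixel NDVI and a gliding-box lacunarity value. It is not the authors' implementation: the abstract does not say how the texture was computed, so the gliding-box estimator (second moment of the box mass divided by the squared first moment), the box size, and the function names `ndvi` and `lacunarity` are all assumptions made for this example.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); 0.0 where the sum is zero."""
    return [
        [(n - r) / (n + r) if (n + r) != 0 else 0.0
         for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir, red)
    ]


def lacunarity(grid, box):
    """Gliding-box lacunarity (assumed estimator): E[S^2] / E[S]^2,
    where S is the sum of grid values inside each box x box window."""
    rows, cols = len(grid), len(grid[0])
    masses = []
    # Slide the box over every position where it fits entirely in the grid.
    for i in range(rows - box + 1):
        for j in range(cols - box + 1):
            masses.append(sum(grid[i + di][j + dj]
                              for di in range(box) for dj in range(box)))
    mean = sum(masses) / len(masses)
    if mean == 0:
        return float("nan")  # undefined when every window is empty
    second_moment = sum(m * m for m in masses) / len(masses)
    return second_moment / (mean * mean)


# Hypothetical 4x4 vegetation masks (1 = vegetated pixel).
uniform = [[1, 1, 1, 1] for _ in range(4)]   # no gaps
clumped = [[1, 1, 0, 0] for _ in range(4)]   # one large gap
lac_uniform = lacunarity(uniform, 2)
lac_clumped = lacunarity(clumped, 2)
```

Under this estimator a gap-free pattern gives a lacunarity of 1, and patterns with larger or more unevenly distributed gaps give higher values, which is the "gappiness" the abstract refers to.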
Funding: Supported by the National Natural Science Foundation of China (No. 61105057) and the Ph.D. Foundation of Jiangsu University of Science and Technology (Nos. 35301002 and 35211104).
Abstract: Many skewed cancer gene expression datasets have appeared in the post-genomic era. When traditional algorithms are used to extract differentially expressed genes or construct decision rules from these skewed datasets, performance on the minority class suffers badly, leading to inaccurate diagnosis in clinical trials. This paper presents a skewed gene selection algorithm that introduces a weighted metric into the gene selection procedure. The extracted genes are paired as decision rules to distinguish the two classes, and these decision rules are then integrated into an ensemble learning framework that classifies test examples by majority voting, thus avoiding tedious data normalization and classifier construction. Mining and integrating a few reliable decision rules gave classification performance higher than, or at least comparable to, that of many traditional class imbalance learning algorithms on four benchmark imbalanced cancer gene expression datasets.
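The idea of combining paired-gene decision rules by majority voting can be sketched as follows. The abstract does not give the form of the rules or of the weighted metric, so this example assumes a simple comparison rule ("if gene A is expressed more than gene B, predict one class, otherwise the other"). The gene names, class labels, and the helpers `apply_rule` and `majority_vote` are hypothetical.

```python
from collections import Counter


def apply_rule(sample, rule):
    """Apply one assumed pairwise rule: compare two gene expression values."""
    gene_a, gene_b, label_if_greater, label_otherwise = rule
    if sample[gene_a] > sample[gene_b]:
        return label_if_greater
    return label_otherwise


def majority_vote(sample, rules):
    """Return the label predicted by most rules.
    With two classes, an odd number of rules avoids ties."""
    votes = Counter(apply_rule(sample, rule) for rule in rules)
    return votes.most_common(1)[0][0]


# Hypothetical expression profile and three hypothetical rules.
sample = {"g1": 5.0, "g2": 2.0, "g3": 1.0, "g4": 3.0}
rules = [
    ("g1", "g2", "tumor", "normal"),  # 5.0 > 2.0 -> tumor
    ("g3", "g4", "tumor", "normal"),  # 1.0 > 3.0 is false -> normal
    ("g1", "g4", "tumor", "normal"),  # 5.0 > 3.0 -> tumor
]
predicted = majority_vote(sample, rules)
```

Because each rule only compares two values within the same sample, a scheme of this kind needs no cross-sample normalization, which is consistent with the abstract's remark about avoiding data normalization.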
Funding: Supported by the National Natural Science Foundation of China (61202082) and the Fundamental Research Funds for the Central Universities (BUPT2012RC0218, BUPT2012RC0219).
Abstract: The main challenges of data stream classification are infinite length, concept drift, the arrival of novel classes, and a lack of labeled instances. Most existing techniques address only some of these challenges and ignore the others. This paper therefore presents an ensemble classification model based on decision feedback (ECM-BDF) that addresses all of them. First, a data stream is divided into sequential chunks, and a classification model is trained on each labeled chunk. To handle infinite length and concept drift, a fixed number of such models constitute an ensemble model E, and subsequent labeled chunks are used to update E. To handle novel classes and limited labeled instances, the model incorporates a novel class detection mechanism that detects the arrival of a novel class without training E on labeled instances of that class. Meanwhile, unsupervised models are trained on unlabeled instances to provide useful constraints for E. With these constraints as feedback information, an extended ensemble model Ex is obtained, and unlabeled instances can then be classified more accurately by satisfying the maximum consensus of Ex. Experimental results demonstrate that the proposed ECM-BDF outperforms traditional techniques in classifying data streams with limited labeled data.
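The chunk-based part of this scheme (train one model per labeled chunk, keep a fixed number of models, update the ensemble as new chunks arrive) can be sketched as below. This is only an illustration of the general pattern, not ECM-BDF itself: the abstract does not specify the base learner or the update rule, so the one-dimensional nearest-centroid learner, the "keep the models most accurate on the newest chunk" policy, and all function names are assumptions. Novel class detection and the unsupervised feedback step are not covered.

```python
from collections import Counter, defaultdict


def train_centroid_model(chunk):
    """Train an assumed base learner: per-class mean of a 1-D feature.
    chunk is a list of (x, label) pairs."""
    sums, counts = defaultdict(float), Counter()
    for x, label in chunk:
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


def predict(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def accuracy(model, chunk):
    return sum(predict(model, x) == label for x, label in chunk) / len(chunk)


def update_ensemble(ensemble, chunk, max_models):
    """Add a model trained on the new chunk, then keep the max_models
    models that score best on that chunk (assumed update policy)."""
    candidates = ensemble + [train_centroid_model(chunk)]
    candidates.sort(key=lambda m: accuracy(m, chunk), reverse=True)
    return candidates[:max_models]


def ensemble_predict(ensemble, x):
    """Classify x by majority vote over the ensemble members."""
    votes = Counter(predict(model, x) for model in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical stream: class "a" drifts from around 0 to around 20.
chunk1 = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
chunk2 = [(20.0, "a"), (21.0, "a"), (9.0, "b"), (10.0, "b")]
ensemble = []
for chunk in (chunk1, chunk2):
    ensemble = update_ensemble(ensemble, chunk, max_models=1)
label_after_drift = ensemble_predict(ensemble, 19.0)
```

Bounding the ensemble size keeps memory use constant on an unbounded stream, and re-scoring members on the newest chunk lets models trained before a drift be dropped.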