Funding: This work is supported by the National Grand Fundamental Research 973 Program of China (Grant No. 2006CB303103) and the National Natural Science Foundation of China under Grants No. 60573089, No. 60273079 and No. 60473074.
Abstract: As explored by biologists, there is a real and emerging need to identify co-regulated gene clusters, which include both positively and negatively regulated gene clusters. However, the existing pattern-based and tendency-based clustering approaches are designed only for finding positively regulated gene clusters. In this paper, a new subspace clustering model called g-Cluster is proposed for gene expression data. The proposed model has the following advantages: 1) it finds both positively and negatively co-regulated genes in a single pass, 2) it removes the restriction of a magnitude transformation relationship among co-regulated genes, and 3) it guarantees the quality of clusters and the significance of regulations using a novel similarity measurement, gCode, and a user-specified regulation threshold δ, respectively. No previous work accomplishes this task. Moreover, the MDL technique is introduced to avoid generating insignificant g-Clusters. A tree structure, namely GS-tree, is also designed, and two algorithms that combine efficient pruning and optimization strategies are developed to identify all qualified g-Clusters. Extensive experiments are conducted on real and synthetic datasets. The experimental results show that 1) the algorithm is able to find a number of co-regulated gene clusters missed by previous models, which are potentially of high biological significance, and 2) the algorithms are effective and efficient, and outperform the existing approaches.
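The difference between positive and negative co-regulation can be illustrated with a small Python sketch. It is not the paper's gCode measure, whose definition the abstract does not give: the helper names `regulation_signs` and `co_regulation`, and the rule of comparing adjacent conditions, are assumptions made purely for illustration. A change between adjacent conditions counts as a regulation only when its magnitude reaches the threshold δ (`delta`).

```python
def regulation_signs(expr, delta):
    """Map an expression profile to +1/-1/0 steps between adjacent conditions.

    A step is +1 (up) or -1 (down) only if |change| >= delta, otherwise 0.
    Hypothetical helper for illustration; this is not the paper's gCode.
    """
    signs = []
    for prev, curr in zip(expr, expr[1:]):
        change = curr - prev
        if change >= delta:
            signs.append(1)
        elif change <= -delta:
            signs.append(-1)
        else:
            signs.append(0)
    return signs


def co_regulation(gene_a, gene_b, delta):
    """Return 'positive', 'negative', or None for two expression profiles."""
    a = regulation_signs(gene_a, delta)
    b = regulation_signs(gene_b, delta)
    if not a or 0 in a or 0 in b:
        return None  # at least one step is not a significant regulation
    if a == b:
        return "positive"  # the genes rise and fall together
    if a == [-s for s in b]:
        return "negative"  # one gene rises whenever the other falls
    return None


g1 = [1.0, 3.0, 2.0, 5.0]
g2 = [10.0, 14.0, 11.0, 20.0]  # same tendency as g1, different magnitudes
g3 = [9.0, 6.0, 8.0, 2.0]      # opposite tendency to g1
```

Because only the direction of each significant change is compared, `g1` and `g2` match even though their magnitudes are unrelated, which is the kind of freedom from magnitude relationships the abstract refers to.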
Funding: Partially supported by the National Natural Science Foundation of China under Grant Nos. 60872009 and 6002016; the Hi-Tech Research and Development 863 Program of China under Grant Nos. 2007AA01Z428 and 2009AA01Z148; and the Post Doctoral Fellowship (ID No. P10356) for Scientific Research of the Japan Society for Promotion of Science (JSPS).
Abstract: Target coverage is an important yet challenging problem in wireless sensor networks, especially when both coverage and energy constraints must be taken into account. Due to its nonlinear nature, previous studies of this problem have mainly focused on heuristic algorithms; the theoretical bound remains unknown. Moreover, the most popular method used in the previous literature, i.e., discretization of continuous time, has yet to be justified. This paper fills these gaps with two theoretical results. The first is a formal justification for the discretization method. We use a simple example to illustrate the procedure of transforming a solution in the time domain into a corresponding solution in the pattern domain with the same network lifetime, and obtain two key observations. We then formally prove these two observations and use them as the basis to justify the method. The second result is an algorithm that guarantees a network lifetime of at least (1 - ε) of the optimal network lifetime, where ε can be made arbitrarily small depending on the required precision. The algorithm is based on column generation (CG) theory, which decomposes the original problem into two sub-problems and solves them iteratively in a way that approaches the optimal solution. Moreover, we developed several constructive approaches to further optimize the algorithm. Numerical results verify the efficiency of our CG-based algorithm.
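The time-domain to pattern-domain transformation mentioned above can be sketched in Python. The sketch assumes that a time-domain solution is a list of (active sensor set, duration) intervals, that a pattern is a distinct set of simultaneously active sensors, and that a sensor spends one unit of energy per unit of active time. These representations and all function names are illustrative assumptions, not the paper's notation.

```python
from collections import defaultdict


def to_pattern_domain(schedule):
    """Aggregate a time-domain schedule into total duration per pattern.

    schedule: list of (iterable of sensor ids, duration) in time order.
    Returns a dict {frozenset(sensor ids): total duration}.
    """
    durations = defaultdict(float)
    for sensors, duration in schedule:
        durations[frozenset(sensors)] += duration
    return dict(durations)


def lifetime(pattern_durations):
    """Network lifetime is the total time assigned to all patterns."""
    return sum(pattern_durations.values())


def energy_used(pattern_durations):
    """Active time per sensor (assumed: one energy unit per time unit)."""
    used = defaultdict(float)
    for sensors, duration in pattern_durations.items():
        for sensor in sensors:
            used[sensor] += duration
    return dict(used)


# Hypothetical schedule: the pattern {s1, s2} is used in two separate intervals.
schedule = [
    ({"s1", "s2"}, 2.0),
    ({"s3"}, 1.5),
    ({"s1", "s2"}, 1.0),
    ({"s2", "s3"}, 0.5),
]
patterns = to_pattern_domain(schedule)
```

Merging repeated intervals changes neither the total lifetime nor any sensor's energy use, which is why the pattern-domain solution keeps the same network lifetime as the time-domain one.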
Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 30870475), the Ministry of Science and Technology of China (Grant No. 2009CB918801), the Ministry of Health of China (Grant No. 2008ZX10001-003), and the International Development Research Center, Ottawa, Canada (Grant No. 104519-010).
Abstract: The long asymptomatic stage of HIV infection poses a great challenge in identifying recent HIV infections. This is a bottleneck for monitoring HIV epidemic trends and evaluating the effectiveness of national AIDS control programs. Several serological methods have been used to address this issue with some success. Because of high false-positive rates in patients with advanced infection or receiving ART, UNAIDS still hesitates to recommend their use in routine surveillance. We developed a new pattern-based method that measures intra-patient viral genetic diversity to determine recent infections and estimate population incidence. The method is verified using several datasets (424 subtype B and 77 CRF07_BC samples) with clearly identified HIV-1 infection times. Pattern-based diversities of recent infections are significantly lower than those of chronic ones. With larger window periods varying from 200 to 350 days, a higher accuracy (90% to 95%), unaffected by advanced disease or ART, could be obtained. The pattern-based genetic method is supplementary to the existing serology-based assays, and the two could be suitable for use in low and high epidemic regions, respectively.
Abstract: Pattern-based time series segmentation (PTSS) is an important task for many time series data mining applications. In this paper, a generalized model is proposed for PTSS according to its characteristics. First, a new interpretation of PTSS is given by comparing the problem with prototype-based clustering (PC). Then, a novel model, called the clustering-inverse model (CI-model), is presented. Finally, two algorithms are presented to implement this model. Our experimental results on artificial and real-world time series demonstrate that the proposed algorithms are quite effective.
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. 2017R1E1A1A01077913) and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) funded by the Korea Government (MSIT) (Development of Smart Signage Technology for Automatic Classification of Untact Examination and Patient Status Based on AI) under Grant 2020-0-01907.
Abstract: The early diagnosis of pre-existing coronary disorders helps to control complications such as pulmonary hypertension, irregular cardiac functioning, and heart failure. Machine-based learning of heart sounds is an efficient technology that can help minimize the workload of manual auscultation by automatically identifying irregular cardiac sounds. Phonocardiogram (PCG) and electrocardiogram (ECG) waveforms provide the much-needed information for the diagnosis of these diseases. In this work, the researchers have converted the heart sound signal into its corresponding repeating pattern-based spectrogram. PhysioNet 2016 and PASCAL 2011 have been taken as the benchmark datasets for experimentation. Existing transfer learning models, viz. MobileNet, Xception, Visual Geometry Group (VGG16), ResNet, DenseNet, and InceptionV3, have been used to classify the heart sound signals as normal or abnormal. For PhysioNet 2016, DenseNet has outperformed its peer models with an accuracy of 89.04 percent, whereas for PASCAL 2011, VGG has outperformed its peer approaches with an accuracy of 92.96 percent.
Funding: The National Natural Science Foundation of China (No. 60273075).
Abstract: The problem of pattern-based subspace clustering, a special type of subspace clustering that uses pattern similarity as a measure of similarity, is studied. Unlike most traditional clustering algorithms, which group objects whose values are close in all dimensions or in a set of dimensions, clustering by pattern similarity groups objects that exhibit a coherent pattern of rise and fall in subspaces. A novel approach named EMaPle is designed to mine the maximal pattern-based subspace clusters. EMaPle searches for clusters only in the attribute enumeration spaces, which are relatively few compared to the large number of row combinations in typical datasets, and it exploits novel pruning techniques. EMaPle can find clusters satisfying the coherence constraints, size constraints and sign constraints neglected in MaPle. Both synthetic and real data sets are used to evaluate EMaPle and demonstrate that it is more effective and scalable than MaPle.
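One common way in the pattern-based clustering literature to formalize "a coherent pattern of rise and fall" is the pScore of a 2×2 submatrix: a set of rows and columns is treated as a δ-pCluster when every 2×2 pScore within it is at most δ. The Python sketch below illustrates only that definition. Whether EMaPle applies exactly this test is an assumption, the function names are hypothetical, and the brute-force check does not reproduce EMaPle's attribute enumeration or pruning.

```python
from itertools import combinations


def p_score(x_a, x_b, y_a, y_b):
    """pScore of the 2x2 submatrix [[x_a, x_b], [y_a, y_b]]."""
    return abs((x_a - x_b) - (y_a - y_b))


def is_p_cluster(data, rows, cols, delta):
    """Check by brute force that every 2x2 submatrix has pScore <= delta."""
    for x, y in combinations(rows, 2):
        for a, b in combinations(cols, 2):
            if p_score(data[x][a], data[x][b], data[y][a], data[y][b]) > delta:
                return False
    return True


# Made-up data: on columns 0-2 the rows differ only by a constant shift,
# so they rise and fall together even though their values are far apart.
data = [
    [1, 5, 3, 9],
    [11, 15, 13, 2],
    [4, 8, 6, 7],
]
```

On this data, rows 0 to 2 restricted to columns 0 to 2 pass the test with `delta = 0`, while adding column 3 breaks the coherence.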
Funding: Funded by the Korea Meteorological Administration Research and Development Program under Grant CATER 2012-2040 and supported by the BK21 project of the Korean government.
Abstract: Recently, the National Typhoon Center (NTC) at the Korea Meteorological Administration launched a track-pattern-based model that predicts the horizontal distribution of tropical cyclone (TC) track density from June to October. This model is the first approach to target seasonal TC track clusters covering the entire western North Pacific (WNP) basin, and may represent a milestone for seasonal TC forecasting, since it uses a simple statistical method that can be applied at weather operation centers. In this note, we describe the procedure of the track-pattern-based model with brief technical background to provide practical information on the use and operation of the model. The model comprises three major steps. First, long-term data of WNP TC tracks reveal seven climatological track clusters. Second, the TC counts for each cluster are predicted with a hybrid statistical-dynamical method that uses seasonal predictions of large-scale environments. Third, the final forecast map of track density is constructed by merging the spatial probabilities of the seven clusters and applying necessary bias corrections. Although the model was developed to issue the seasonal forecast in mid-May, it can be applied to alternative dates and target seasons following the procedure described in this note. Work continues on establishing an automatic system for this model at the NTC.
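The third step can be sketched in Python as a count-weighted sum of per-cluster spatial probability maps. This is only one plausible reading of "merging the spatial probabilities": the function name, the tiny 2×2 grid, and the two clusters (instead of seven) are made-up placeholders, and the bias correction is omitted. The operational NTC procedure may differ.

```python
def merge_track_density(cluster_probs, predicted_counts):
    """Combine per-cluster probability grids into one track-density grid.

    cluster_probs: list of 2D grids (lists of rows), one per cluster, all
        with the same shape; each cell is the probability that a TC of that
        cluster passes through the cell.
    predicted_counts: predicted seasonal TC count for each cluster.
    Assumed rule: density[i][j] = sum over clusters of count * probability.
    """
    if len(cluster_probs) != len(predicted_counts):
        raise ValueError("need one predicted count per cluster")
    n_rows = len(cluster_probs[0])
    n_cols = len(cluster_probs[0][0])
    density = [[0.0] * n_cols for _ in range(n_rows)]
    for grid, count in zip(cluster_probs, predicted_counts):
        for i in range(n_rows):
            for j in range(n_cols):
                density[i][j] += count * grid[i][j]
    return density


# Hypothetical inputs: two clusters on a 2x2 grid.
cluster_a = [[0.5, 0.25], [0.0, 0.0]]
cluster_b = [[0.0, 0.5], [0.25, 0.0]]
density = merge_track_density([cluster_a, cluster_b], [4, 2])
```

Each cell of `density` is then an expected number of TC passages for the season under the stated assumption, to which a bias correction would be applied in a fuller implementation.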