Funding: Supported by the Program for Changjiang Scholars and Innovative Research Team in University and the Fundamental Research Funds for the Central Universities (wk2060190040).
Abstract: Determining the number of chemical species is the first step in the analysis of a chemical or biological system. A novel method is proposed to address this issue by taking advantage of the frequency differences between chemical information and noise. Two interlaced submatrices are obtained by downsampling the original spectral data matrix in an interlacing manner. The two submatrices contain similar chemical information but different noise levels. The number of relevant chemical species is determined through pairwise comparisons of the principal components obtained by principal component analysis of the two submatrices. The proposed method, referred to as SRISM, thus uses two self-referencing interlaced submatrices to make the determination. On simulated datasets, SRISM selectively distinguished relevant chemical species from various interference factors such as signal overlap, minor components, and noise. Its performance was further validated on experimental datasets that contained high levels of instrument aberrations, signal overlap, and collinearity. SRISM was also applied to infrared spectral data obtained from atmospheric monitoring. The method shows great potential for overcoming various types of interference factors, and it is mathematically rigorous, computationally efficient, and readily automated.
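The interlacing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which axis is downsampled, how the principal components are compared, or what decision rule is used. The sketch assumes that rows are samples and columns are spectral channels, that the interlaced submatrices are the even- and odd-indexed channels, that an uncentered SVD stands in for PCA, and that components are compared by the absolute cosine between matching sample-space vectors against a hypothetical threshold of 0.9. The names `interlaced_submatrices`, `estimate_n_species`, and `simulate_mixture` are illustrative only.

```python
import numpy as np


def interlaced_submatrices(spectra):
    """Split a (samples x channels) matrix into even- and odd-indexed channels."""
    return spectra[:, 0::2], spectra[:, 1::2]


def estimate_n_species(spectra, threshold=0.9, max_components=10):
    """Count leading components whose sample-space vectors agree across submatrices.

    Assumption: agreement is measured by the absolute cosine between the k-th
    left singular vectors; `threshold` is a hypothetical cut-off, not a value
    taken from the paper.
    """
    sub_a, sub_b = interlaced_submatrices(np.asarray(spectra, dtype=float))
    u_a, _, _ = np.linalg.svd(sub_a, full_matrices=False)
    u_b, _, _ = np.linalg.svd(sub_b, full_matrices=False)
    limit = min(max_components, u_a.shape[1], u_b.shape[1])
    count = 0
    for k in range(limit):
        similarity = abs(float(u_a[:, k] @ u_b[:, k]))
        if similarity < threshold:
            break  # first component that looks like noise ends the count
        count += 1
    return count


def simulate_mixture(n_samples=40, n_channels=200, noise_sd=0.01, seed=0):
    """Toy data: three Gaussian bands with random concentrations plus white noise."""
    rng = np.random.default_rng(seed)
    axis = np.arange(n_channels)
    centers = [50, 100, 150]
    amplitudes = [3.0, 2.0, 1.0]
    pure = np.array([a * np.exp(-((axis - c) / 10.0) ** 2)
                     for a, c in zip(amplitudes, centers)])
    conc = rng.uniform(0.0, 1.0, size=(n_samples, len(centers)))
    noise = noise_sd * rng.normal(size=(n_samples, n_channels))
    return conc @ pure + noise


if __name__ == "__main__":
    print(estimate_n_species(simulate_mixture()))
```

Because smooth spectral bands change little between neighbouring channels while white noise is independent from channel to channel, components driven by chemical signal should reappear in both submatrices, whereas noise components should not. The pairwise comparison in the sketch relies on the signal components having well-separated singular values so that their ordering is the same in both submatrices.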
Funding: The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong SAR (No. CUHK4177/07E).
Abstract: Addressing the problem of detecting the number of signals, this paper provides a systematic empirical investigation of the model selection performance of several classical criteria and recently developed methods, including Akaike's information criterion (AIC), Schwarz's Bayesian information criterion, Bozdogan's consistent AIC, the Hannan-Quinn information criterion, Minka's (MK) principal component analysis (PCA) criterion, Kritchman & Nadler's hypothesis tests (KN), Perry & Wolfe's minimax rank estimation thresholding algorithm (MM), and Bayesian Ying-Yang (BYY) harmony learning, by varying the signal-to-noise ratio (SNR) and the training sample size N. A family of model selection indifference curves is defined by the contour lines of model selection accuracy, so that the joint effect of N and SNR can be examined, rather than only the effect of one with the other held fixed, as is usually done in the literature. The indifference curves show visually that the relative advantages of the methods are most apparent within a region of moderate N and SNR. The importance of studying this region is also confirmed by an alternative reference criterion based on maximizing the testing likelihood. Extensive simulations show that AIC and BYY harmony learning, as well as MK, KN, and MM, are relatively more robust than the other methods against decreasing N and SNR, and that BYY is superior for small sample sizes.
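As a rough illustration of how the classical criteria named in this abstract select the number of signals, the sketch below applies AIC and Schwarz's Bayesian information criterion to the eigenvalues of a sample covariance matrix. It is not the paper's experimental code and covers none of the other methods (MK, KN, MM, BYY). It assumes real-valued Gaussian data, the standard PCA-type model of k signal directions plus isotropic noise, and a free-parameter count of k(2p - k + 1)/2 + 1; the function names and the toy simulation settings are hypothetical.

```python
import numpy as np


def select_num_signals(samples):
    """Return (k_aic, k_bic) for an (N x p) array of observations.

    For each candidate k, the fit term is N*(p-k)*log(AM/GM) of the p-k
    smallest covariance eigenvalues (arithmetic over geometric mean), which
    equals -2 * log-likelihood up to a constant for real Gaussian data.
    """
    x = np.asarray(samples, dtype=float)
    n, p = x.shape
    x = x - x.mean(axis=0)
    eigvals = np.linalg.eigvalsh(x.T @ x / n)[::-1]  # descending order
    aic, bic = [], []
    for k in range(p):
        tail = eigvals[k:]
        q = p - k
        log_gm_over_am = np.mean(np.log(tail)) - np.log(np.mean(tail))  # <= 0
        neg2_loglik = -n * q * log_gm_over_am
        n_params = k * (2 * p - k + 1) / 2 + 1  # assumed parameter count
        aic.append(neg2_loglik + 2 * n_params)
        bic.append(neg2_loglik + n_params * np.log(n))
    return int(np.argmin(aic)), int(np.argmin(bic))


def simulate_signals(n=500, p=10, seed=0):
    """Toy data: two independent unit-variance signals plus unit-variance noise."""
    rng = np.random.default_rng(seed)
    loadings = np.zeros((p, 2))
    loadings[0, 0] = 3.0  # stronger signal
    loadings[1, 1] = 2.0  # weaker signal
    sources = rng.normal(size=(n, 2))
    noise = rng.normal(size=(n, p))
    return sources @ loadings.T + noise


if __name__ == "__main__":
    k_aic, k_bic = select_num_signals(simulate_signals())
    print("AIC:", k_aic, "BIC:", k_bic)
```

The two criteria share the same fit term and differ only in the penalty, 2 per free parameter for AIC against log N for BIC, which is why their behaviour diverges as N and SNR change. Repeating such a simulation over a grid of N and SNR values and recording how often the true k is recovered would give the kind of accuracy surface whose contour lines the abstract calls indifference curves.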