Funding: Supported by the Program for Changjiang Scholars and Innovative Research Team in University and the Fundamental Research Funds for the Central Universities (wk2060190040).
Abstract: Determining the number of chemical species is the first step in analyzing a chemical or biological system. A novel method is proposed to address this problem by exploiting the frequency differences between chemical information and noise. Two interlaced submatrices are obtained by downsampling the original spectral data matrix in an interlacing manner. The two submatrices contain similar chemical information but different noise levels. The number of relevant chemical species is determined through pairwise comparisons of the principal components obtained by principal component analysis of the two submatrices. The proposed method, referred to as SRISM, uses these two self-referencing interlaced submatrices to make the determination. In simulated datasets, SRISM selectively distinguished relevant chemical species from various interference factors such as signal overlap, minor components, and noise. Its performance was further validated using experimental datasets that contained high levels of instrument aberrations, signal overlap, and collinearity. SRISM was also applied to infrared spectral data obtained from atmospheric monitoring. The method shows great potential for overcoming various types of interference factors, and it is mathematically rigorous, computationally efficient, and readily automated.
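The interlacing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which axis is interlaced, how the principal components are compared, or what decision rule is used. The sketch assumes that rows are samples and columns are spectral channels, that the interlacing takes alternating channels, that PCA is done by an uncentered SVD, and that components are compared by the absolute cosine between matching score vectors with a hypothetical threshold of 0.9. The function names and the synthetic three-band mixture are likewise illustrative.

```python
import numpy as np


def interlaced_submatrices(data):
    """Split the columns (assumed: spectral channels) into two interlaced halves."""
    n_even = data.shape[1] - data.shape[1] % 2  # drop a trailing odd channel
    return data[:, 0:n_even:2], data[:, 1:n_even:2]


def estimate_species_count(data, threshold=0.9, max_components=10):
    """Count leading principal components that agree between the two submatrices.

    Assumption: agreement is the absolute cosine between the k-th left singular
    vectors (sample-space scores) of the two submatrices; counting stops at the
    first pair that falls below `threshold`.
    """
    sub_a, sub_b = interlaced_submatrices(np.asarray(data, dtype=float))
    u_a, _, _ = np.linalg.svd(sub_a, full_matrices=False)
    u_b, _, _ = np.linalg.svd(sub_b, full_matrices=False)
    limit = min(max_components, u_a.shape[1], u_b.shape[1])
    count = 0
    for k in range(limit):
        # abs() removes the arbitrary sign of singular vectors
        similarity = abs(float(u_a[:, k] @ u_b[:, k]))
        if similarity < threshold:
            break
        count += 1
    return count


def simulate_mixture(n_samples=30, n_channels=200, noise=1e-3, seed=0):
    """Synthetic spectra: three smooth Gaussian bands plus white noise (illustrative)."""
    rng = np.random.default_rng(seed)
    channels = np.arange(n_channels)
    centers = (50.0, 100.0, 150.0)
    amplitudes = (1.0, 0.6, 0.3)
    spectra = np.array([
        a * np.exp(-0.5 * ((channels - c) / 10.0) ** 2)
        for a, c in zip(amplitudes, centers)
    ])
    concentrations = rng.uniform(0.0, 1.0, size=(n_samples, len(centers)))
    noise_term = noise * rng.standard_normal((n_samples, n_channels))
    return concentrations @ spectra + noise_term


if __name__ == "__main__":
    print(estimate_species_count(simulate_mixture()))
```

The sketch relies on smooth (low-frequency) bands looking nearly identical in adjacent channels, so their score vectors match across the two submatrices, whereas independent noise produces unrelated directions. Pairwise matching of this kind can break down when two components have nearly equal singular values, so the threshold and the stopping rule should be treated as tunable assumptions.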