Abstract
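Infrared spectroscopy is a technique for identifying the chemical composition of substances based on the theory of molecular vibrations and energy-level transitions. The resulting spectral data are often high-dimensional and heavily overlapping, which makes subsequent data processing difficult. To address this, a GK possibilistic C-means clustering algorithm (GKIPCM) is proposed. It combines the Mahalanobis distance measure of the Gustafson-Kessel (GK) clustering algorithm with the fuzzy membership and cluster-center update equations of the improved possibilistic C-means clustering algorithm (IPCM), so that the distance measure adapts to the samples and coincident cluster centers are avoided. GKIPCM achieves higher classification accuracy, and its accuracy is less sensitive to parameter settings. Four groups of washed Chinese cabbage were used as the objects of spectral analysis, three pesticide (lambda-cyhalothrin) ratios were applied, and the Fourier transform mid-infrared (FT-MIR) spectra of the cabbage were collected with an Agilent Cary 630 FTIR spectrometer. First, the samples were preprocessed with multiplicative scatter correction (MSC) to reduce noise and remove data offset. Second, because the collected spectra span 4300~590 cm^(-1) and therefore have 971 dimensions, principal component analysis (PCA) was used for dimensionality reduction: the data were reduced to 23 dimensions, and the cumulative contribution of the 23 principal components reached 99.60%. Because the feature information of the different classes was still mixed, linear discriminant analysis (LDA) was then used to extract discriminant features, further reducing the data to 3 dimensions. Finally, fuzzy C-means clustering (FCM) was run to obtain good initial cluster centers, GKIPCM was used to cluster the four classes of reduced spectral data, and the results were compared with those of the GK and IPCM clustering algorithms. GKIPCM required a total iteration time of 0.2188 s and reached a classification accuracy of 97.22%, whereas GK and IPCM reached accuracies of 63.89% and 91.67% with total run times of 0.0938 s and 0.0625 s, respectively. The experimental results show that GKIPCM can analyze spectral data to qualitatively distinguish different levels of pesticide residue.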
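The two numeric preprocessing steps above (MSC, then choosing how many principal components to keep) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: each spectrum is assumed to be a list of absorbances on a common wavenumber grid, MSC uses the mean spectrum as its reference, and the PCA eigenvalues are assumed to have been computed elsewhere (for example from the covariance matrix of the corrected spectra). The function names `msc` and `n_components_for` are hypothetical, and the default threshold of 0.996 merely mirrors the reported 99.60% cumulative contribution; the paper does not say which selection rule was used.

```python
from statistics import fmean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is regressed on the reference r as x ~ a + b*r by
    ordinary least squares, then corrected to (x - a) / b.
    """
    n_points = len(spectra[0])
    reference = [fmean(s[j] for s in spectra) for j in range(n_points)]
    r_mean = fmean(reference)
    r_var = sum((r - r_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = fmean(x)
        # Least-squares slope b and intercept a of x against the reference
        b = sum((r - r_mean) * (xi - x_mean) for r, xi in zip(reference, x)) / r_var
        a = x_mean - b * r_mean
        corrected.append([(xi - a) / b for xi in x])
    return corrected


def n_components_for(eigenvalues, threshold=0.996):
    """Smallest number of principal components whose cumulative
    contribution (share of total variance) reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(ordered)
```

Spectra that differ only by an additive offset and a multiplicative scale collapse onto the reference after MSC: `msc([[1, 2, 3, 4], [3, 5, 7, 9]])` returns two copies of the mean spectrum `[2.0, 3.5, 5.0, 6.5]`, up to floating-point rounding.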
Infrared spectroscopy is a technique for identifying the chemical composition of substances based on the theory of molecular vibrations and energy-level transitions. Because each functional group has a characteristic absorbance, spectral data relating absorbance to wavelength (or wavenumber) can be obtained when an infrared beam irradiates the molecules. However, experimental spectral data are typically high-dimensional and overlapping, which makes them difficult to process. This paper therefore proposes an improved Gustafson-Kessel possibilistic c-means clustering algorithm (GKIPCM), which combines the Mahalanobis distance of GK clustering with the iterative update equations for fuzzy membership values and cluster centers from improved possibilistic c-means clustering (IPCM). GKIPCM adapts the distance measure to the data and avoids identical cluster centers. Furthermore, GKIPCM achieves higher classification accuracy, and its accuracy is less sensitive to parameter settings. In the experiments, four groups of washed Chinese cabbage were the objects of spectral analysis, and different concentrations of lambda-cyhalothrin pesticide were sprayed on the cabbages. Spectral data were collected with an Agilent Cary 630 FTIR spectrometer. First, multiplicative scatter correction (MSC) was applied during preprocessing to reduce noise and eliminate data offset. Second, principal component analysis (PCA) was used to reduce dimensionality, because of the wide wavenumber range (4300~590 cm^(-1)) and the high data dimensionality (971). After PCA, the dimensionality of the data was reduced to 23, and the cumulative contribution of the 23 principal components reached 99.60%. Nonetheless, the feature information was still mixed, so linear discriminant analysis (LDA) was used to extract discriminant features, reducing the dimensionality of the spectral data to 3. Finally, fuzzy c-means clustering (FCM) was employed to obtain good initial cluster centers. Then, the GKIPCM algorithm was applied to cluster four different groups of
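The abstract names the two ingredients of GKIPCM but does not give its update equations, so the sketch below illustrates only those ingredients in their textbook forms and is not the authors' algorithm. It computes the Gustafson-Kessel norm-inducing matrix A_i = [rho_i * det(F_i)]^(1/p) * inv(F_i), built from the weighted (fuzzy) covariance F_i of cluster i in p dimensions, and the Krishnapuram-Keller possibilistic typicality t_ik = 1 / (1 + (D_ik^2 / eta_i)^(1/(m-1))). The IPCM-specific membership and center updates are not reproduced. The function names, the default cluster volume rho = 1, the fuzzifier m = 2 and the scale eta are hypothetical choices; plain Python is used so the example has no dependencies.

```python
def inv_and_det(a):
    """Inverse and determinant of a small square matrix via Gauss-Jordan
    elimination with partial pivoting."""
    n = len(a)
    # Augment [A | I]; after elimination the right half holds inv(A)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pv = aug[col][col]
        det *= pv
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def gk_sq_distances(X, center, weights, rho=1.0):
    """Squared GK (adaptive Mahalanobis-type) distances from each sample
    to one cluster center; weights play the role of u_ik**m."""
    p = len(center)
    diffs = [[xj - cj for xj, cj in zip(x, center)] for x in X]
    total = sum(weights)
    # Weighted (fuzzy) covariance matrix F_i of the cluster
    F = [[sum(w * d[a] * d[b] for w, d in zip(weights, diffs)) / total
          for b in range(p)] for a in range(p)]
    F_inv, det_F = inv_and_det(F)
    scale = (rho * det_F) ** (1.0 / p)  # makes det(A_i) equal to rho
    A = [[scale * F_inv[a][b] for b in range(p)] for a in range(p)]
    return [sum(d[a] * A[a][b] * d[b] for a in range(p) for b in range(p))
            for d in diffs]


def pcm_typicality(sq_distances, eta, m=2.0):
    """Possibilistic typicality 1 / (1 + (D^2 / eta)^(1/(m-1))) per sample."""
    return [1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0))) for d2 in sq_distances]
```

For the elongated toy cluster `[(2, 0), (-2, 0), (0, 1), (0, -1)]` centered at the origin with equal weights, `gk_sq_distances` returns 2.0 for all four points, whereas the squared Euclidean distances would be 4, 4, 1 and 1. The covariance-derived matrix reshapes the metric to fit the cluster, which is the kind of adaptive distance measure the abstract attributes to the GK component.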
Authors
谭阳
武小红
武斌
沈砚君
刘锦茂
TAN Yang; WU Xiao-hong; WU Bin; SHEN Yan-jun; LIU Jin-mao (Institute of Talented Engineering Students, Jiangsu University, Zhenjiang 212013, China; School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China; High-tech Key Laboratory of Agricultural Equipment and Intelligence of Jiangsu Province, Jiangsu University, Zhenjiang 212013, China; Department of Information Engineering, Chuzhou Polytechnic, Chuzhou 239000, China)
Source
《光谱学与光谱分析》
SCIE
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2022, No. 5, pp. 1465-1470 (6 pages)
Spectroscopy and Spectral Analysis
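Funding
National Natural Science Foundation of China (31471413)
Key Natural Science Research Project of Higher Education Institutions, Anhui Provincial Department of Education (KJ2019A1129)
Chuzhou Polytechnic School-Level Key Natural Science Project (YJZ-2020-12)
Chuzhou Polytechnic Institute-Level Talent Project "Outstanding Key Teacher" (YG2019026, YG2019024)
Jiangsu University Undergraduate Innovation Training Program (202010299244Y)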
Keywords
Chinese cabbage
Pesticide residues
Infrared spectroscopy
Principal component analysis
Linear discriminant analysis
Fuzzy clustering