
Study on Feature Selection of Near Infrared Spectra Based on Feature Hierarchical Combining Improved Particle Swarm Optimization  Cited by: 10
Abstract In quantitative modeling of near-infrared (NIR) spectral data, high redundancy and high noise severely degrade the robustness and accuracy of the model. This paper therefore proposes a spectral feature selection method that combines feature stratification with an improved particle swarm optimization (PSO) algorithm.

First, mutual information is used to score the importance of each feature, and the features are sorted in descending order of importance. This avoids the loss of important information that can occur when principal components are obtained through dimensionality reduction. Second, the concept of jump degree is introduced and a feature stratification method is constructed: features of similar importance are merged into the same feature subset, so that the descending-ordered feature set is split into distinct feature subsets. This avoids the uncertainty caused by manually setting a threshold on the importance score during feature screening. Finally, PSO, which converges quickly and has few control parameters, is used to search for the optimal feature subset, with two improvements. A chaotic model is introduced to increase population diversity, which improves the global search ability of PSO and helps it avoid local optima. The number of features is also introduced into the fitness function, and a penalty factor adjusts its influence on the fitness function in the early iterations, which improves the adaptability of the algorithm. The stratified data are accumulated one feature subset at a time and used as input to the improved PSO, which then selects a highly discriminative feature subset.

The feature selection process is described using the nicotine index as an example. NIR spectra were collected with a Nicolet Antaris II near-infrared spectrometer over a scanning range of 4 000~10 000 cm^(-1). First, mutual information theory is used to compute the importance score of each of the 1 557 full-spectrum features for quantitative modeling of the target index, with each score taken as the mean of 30 runs. Next, all features are sorted in descending order of importance score, the jump degree of every feature is computed, and the critical points for stratification are located from the jump degrees. The features are thereby assigned to different layers, giving a feature set of eight subsets, S={S′_1,S′_2,S′_3,S′_4,S′_5,S′_6,S′_7,S′_8}. Then the subsets S′_1,{S′_1,S′_2},{S′_1,S′_2,S′_3},…,{S′_1,S′_2,S′_3,S′_4,S′_5,S′_6,S′_7,S′_8} are used in turn as the candidate set for the initial particle swarm, with R/(1+RMSEP) serving as the feature-subset …
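The ranking, stratification, and cumulative-candidate steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not define the jump degree, so the sketch assumes it is the drop in importance score between adjacent features in the descending ranking, and it assumes the layer boundaries are the largest such drops. The importance scores are taken as given here; in practice they could come from a mutual-information estimator such as scikit-learn's `mutual_info_regression`. The function names `rank_features`, `stratify`, and `cumulative_candidates` are hypothetical.

```python
from typing import List, Sequence


def rank_features(scores: Sequence[float]) -> List[int]:
    """Return feature indices sorted by importance score, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def stratify(scores: Sequence[float], n_layers: int) -> List[List[int]]:
    """Split the descending ranking into layers at the largest score drops.

    Assumption: the jump degree at rank k is ranked[k] - ranked[k + 1].
    The paper's exact definition may differ.
    """
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    order = rank_features(scores)
    ranked = [scores[i] for i in order]
    jumps = [ranked[k] - ranked[k + 1] for k in range(len(ranked) - 1)]
    # The (n_layers - 1) largest jumps become the layer boundaries.
    by_size = sorted(range(len(jumps)), key=lambda k: jumps[k], reverse=True)
    cuts = sorted(by_size[: n_layers - 1])
    layers, start = [], 0
    for k in cuts:
        layers.append(order[start : k + 1])
        start = k + 1
    layers.append(order[start:])
    return layers


def cumulative_candidates(layers: List[List[int]]) -> List[List[int]]:
    """Build S1, S1+S2, S1+S2+S3, ... as candidate pools for the optimizer."""
    pools, pool = [], []
    for layer in layers:
        pool = pool + layer
        pools.append(list(pool))
    return pools


# Toy importance scores for seven features (illustrative values only).
scores = [0.90, 0.10, 0.88, 0.52, 0.50, 0.12, 0.49]
layers = stratify(scores, n_layers=3)
candidates = cumulative_candidates(layers)
print(layers)
print(candidates)
```

Each pool in `candidates` would then be handed to the optimizer in turn, so the search starts from the most important layer and widens only as lower layers are added.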
Authors XU Bao-ding (徐宝鼎); QIN Yu-hua (秦玉华); YANG Ning (杨宁); GAO Rui (高锐); YUAN Cheng-cheng (苑程程) (College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China; College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China; China Tobacco Yunnan Industrial Co., Ltd., Technical Research Center, Kunming 650024, China)
Source Spectroscopy and Spectral Analysis (《光谱学与光谱分析》), indexed in SCIE, EI, CAS, CSCD, and the Peking University Core Journals list; 2019, No. 3, pp. 717-722 (6 pages)
Funding National Key Research and Development Program of China (2016YFB1001103); China Tobacco Yunnan Industrial Co., Ltd. projects (2017XX02, 2018JC01)
Keywords Feature selection; Feature stratification; Jumping degree; Improved particle swarm optimization; Near infrared spectroscopy
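The two PSO improvements named in the abstract, chaotic population diversity and a feature-count penalty in the fitness function, might look like the sketch below. Several details are assumptions, because the abstract does not give them: the chaotic model is taken to be the logistic map, R/(1+RMSEP) (which the abstract names in a truncated sentence) is taken as the base fitness, and the penalty is assumed to shrink linearly to zero over the early iterations. All function and parameter names are hypothetical.

```python
from typing import List


def logistic_map_sequence(x0: float, n: int, mu: float = 4.0) -> List[float]:
    """Generate n chaotic values with x_{k+1} = mu * x_k * (1 - x_k)."""
    values, x = [], x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_population(n_particles: int, n_features: int,
                       x0: float = 0.37) -> List[List[int]]:
    """Binary particles: feature j is selected when its chaotic value > 0.5.

    Assumption: a logistic map seeds the swarm; the paper may use a
    different chaotic model or apply it at a different stage.
    """
    seq = logistic_map_sequence(x0, n_particles * n_features)
    return [
        [1 if seq[p * n_features + j] > 0.5 else 0 for j in range(n_features)]
        for p in range(n_particles)
    ]


def penalized_fitness(r: float, rmsep: float, n_selected: int, n_total: int,
                      iteration: int, early_iters: int,
                      penalty: float = 0.1) -> float:
    """Base fitness R/(1+RMSEP), minus a feature-count penalty early on.

    Assumption: the penalty weight decays linearly to zero at early_iters.
    """
    base = r / (1.0 + rmsep)
    if iteration >= early_iters:
        return base
    weight = penalty * (1.0 - iteration / early_iters)
    return base - weight * n_selected / n_total


population = chaotic_population(n_particles=4, n_features=6)
early = penalized_fitness(0.9, 0.5, n_selected=10, n_total=100,
                          iteration=0, early_iters=50)
late = penalized_fitness(0.9, 0.5, n_selected=10, n_total=100,
                         iteration=50, early_iters=50)
print(population, early, late)
```

Under these assumptions, two subsets with the same R and RMSEP are ranked by size in the early iterations, with the smaller subset scoring higher, while later iterations compare them on prediction quality alone.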