Abstract
Machine learning has achieved great success in many fields. Boosting algorithms in particular adapt well to diverse scenarios and reach high accuracy, yet they are rarely applied in astronomy. To address the low classification accuracy for the faint source set in star/galaxy classification of Sloan Digital Sky Survey (SDSS) data, a relatively recent machine learning method, eXtreme Gradient Boosting (XGBoost), is introduced. The complete photometric data set is obtained from SDSS Data Release 7 (SDSS-DR7) and divided by magnitude into a bright source set and a faint source set. First, ten-fold cross-validation is applied to the bright and faint source sets separately, and XGBoost is used to build star/galaxy classification models. Then the XGBoost parameters are tuned using grid search and other methods. Finally, based on the galaxy classification accuracy and other metrics, the results are compared with those of function tree (FT), Adaptive Boosting (Adaboost), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Stacked Denoising AutoEncoders (SDAE), and Deep Belief Network (DBN) models. The experimental results show that, for faint source classification, XGBoost improves galaxy classification accuracy by nearly 10% over the function tree algorithm, and by nearly 5% at the faintest magnitudes of the faint source set. XGBoost also improves, to varying degrees, on the other traditional machine learning algorithms and deep neural networks.
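The evaluation protocol summarized in the abstract (magnitude split, ten-fold cross-validation, grid search scored by galaxy classification accuracy) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `mag_cut` parameter, and the toy threshold classifier are all hypothetical, and "galaxy classification accuracy" is assumed here to mean the fraction of true galaxies labelled as galaxies. The magnitude cut used in the paper is not given in the abstract, so any value passed in is a placeholder.

```python
import itertools
import random

# Each row is (magnitude, features, label); label 1 = galaxy, 0 = star.
# This row layout is an assumption made for the sketch.

def split_by_magnitude(rows, mag_cut):
    """Split rows into (bright, faint); smaller magnitude means brighter."""
    bright = [r for r in rows if r[0] < mag_cut]
    faint = [r for r in rows if r[0] >= mag_cut]
    return bright, faint

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def galaxy_accuracy(y_true, y_pred):
    """Fraction of true galaxies predicted as galaxies (None if no galaxies)."""
    preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds:
        return None
    return sum(1 for p in preds if p == 1) / len(preds)

def cross_validate(rows, fit, k=10, seed=0):
    """Mean galaxy accuracy over k held-out folds.

    `fit(X, y)` must return a function mapping one feature vector to a label.
    Folds whose held-out part contains no galaxies are skipped.
    """
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [r for i, r in enumerate(rows) if i not in held]
        test = [rows[i] for i in held_out]
        predict = fit([r[1] for r in train], [r[2] for r in train])
        score = galaxy_accuracy([r[2] for r in test],
                                [predict(r[1]) for r in test])
        if score is not None:
            scores.append(score)
    if not scores:
        raise ValueError("no fold contained a galaxy")
    return sum(scores) / len(scores)

def grid_search(rows, make_fit, grid, k=10):
    """Score every parameter combination; return (best_params, best_score)."""
    keys = list(grid)
    best = None
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        score = cross_validate(rows, make_fit(params), k)
        if best is None or score > best[1]:
            best = (params, score)
    return best

def make_threshold_fit(params):
    """Toy stand-in for a real model: galaxy if first feature > threshold."""
    def fit(X, y):  # ignores the training data; illustration only
        return lambda x: 1 if x[0] > params["threshold"] else 0
    return fit
```

To use a real model, `make_fit` could wrap any classifier that offers `fit` and `predict` methods, for example `xgboost.XGBClassifier`, and the search would then be run once on the bright set and once on the faint set.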
Authors
LI Chao
ZHANG Wen-hui
LIN Ji-ming
(College of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin 541004; Key Laboratory of Cognitive Radio and Information Processing, the Ministry of Education, Guilin University of Electronic Technology, Guilin 541004; Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin 541004; Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin 541004)
Source
Acta Astronomica Sinica (《天文学报》)
CSCD
PKU Core Journal (北大核心)
2019, Issue 2, pp. 73-82 (10 pages)
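Funding
Guangxi Cooperative Innovation Center of Cloud Computing and Big Data
Supported by the Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems project (No. 1716)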
Keywords
stars:fundamental parameters
galaxies:fundamental parameters
techniques:photometric
methods:data analysis