Abstract
Pulsar detection has recently become an active research topic in radio astronomy. One of its essential procedures is pulsar candidate sifting (PCS), which identifies potential pulsar signals in a survey. However, pulsar candidates are always class-imbalanced: most candidates are non-pulsars, such as radio frequency interference (RFI), and only a tiny fraction come from real pulsars. Class imbalance can greatly degrade the performance of machine learning (ML) models, at a heavy cost, since some real pulsars are misclassified. To address this problem, this work focuses on feature selection, the process of selecting a subset of the most relevant features from a feature pool. Features that distinguish pulsars from non-pulsars can significantly improve classifier performance even when the data are highly imbalanced. In this work, a feature selection algorithm called K-fold Relief-Greedy (KFRG) is designed. KFRG is a two-stage algorithm. In the first stage, it filters out irrelevant features according to their K-fold Relief scores; in the second stage, it removes redundant features and selects the most relevant ones using a forward greedy search strategy. Experiments on the data set of the High Time Resolution Universe survey verify that ML models based on KFRG are capable of PCS, correctly separating pulsars from non-pulsars even when the candidates are highly class-imbalanced.
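The two-stage idea summarized in the abstract can be illustrated with a rough sketch. This is not the authors' implementation. It assumes that "K-fold Relief" means averaging basic binary Relief scores over K stratified folds, it swaps in a nearest-centroid classifier scored by F1 on a held-out split as the greedy search criterion, and it runs on synthetic data rather than survey candidates. The names `relief_scores`, `kfold_relief`, `kfrg_sketch`, and the parameters `keep` and `k` are hypothetical.

```python
import numpy as np

def relief_scores(X, y, rng, n_samples=50):
    """Basic binary Relief: reward features that differ at the nearest miss
    and agree at the nearest hit (Manhattan distance on min-max scaled data)."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(X.shape[1])
    picks = rng.choice(len(y), size=min(n_samples, len(y)), replace=False)
    for i in picks:
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf                      # never match a sample with itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / len(picks)

def kfold_relief(X, y, k=5, seed=0):
    """Assumed reading of K-fold Relief: mean Relief score over k stratified folds."""
    rng = np.random.default_rng(seed)
    parts = [np.array_split(rng.permutation(np.flatnonzero(y == c)), k) for c in (0, 1)]
    scores = []
    for neg, pos in zip(*parts):              # each fold keeps both classes
        idx = np.concatenate([neg, pos])
        scores.append(relief_scores(X[idx], y[idx], rng))
    return np.mean(scores, axis=0)

def f1_nearest_centroid(Xtr, ytr, Xva, yva):
    """Stand-in classifier: nearest class centroid, scored by F1 on the pulsar class."""
    c0, c1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    pred = np.linalg.norm(Xva - c1, axis=1) < np.linalg.norm(Xva - c0, axis=1)
    tp = np.sum(pred & (yva == 1))
    fp = np.sum(pred & (yva == 0))
    fn = np.sum(~pred & (yva == 1))
    return float(2 * tp / max(2 * tp + fp + fn, 1))

def kfrg_sketch(Xtr, ytr, Xva, yva, keep=4, k=5, seed=0):
    scores = kfold_relief(Xtr, ytr, k=k, seed=seed)
    # Stage 1: keep only the `keep` highest-scoring features.
    candidates = [int(j) for j in np.argsort(scores)[::-1][:keep]]
    # Stage 2: forward greedy search; add a feature only if it improves F1.
    selected, best = [], 0.0
    while candidates:
        trial = {j: f1_nearest_centroid(Xtr[:, selected + [j]], ytr,
                                        Xva[:, selected + [j]], yva)
                 for j in candidates}
        j_best = max(trial, key=trial.get)
        if trial[j_best] <= best:
            break
        selected.append(j_best)
        best = trial[j_best]
        candidates.remove(j_best)
    return selected, best, scores

# Synthetic, imbalanced demo data (10:1): feature 0 is informative,
# feature 1 is a near-duplicate of it (redundant), features 2-5 are noise.
data_rng = np.random.default_rng(42)

def make_data(n_neg, n_pos):
    y = np.r_[np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)]
    X = data_rng.normal(size=(len(y), 6))
    X[:, 0] += 6.0 * y
    X[:, 1] = X[:, 0] + 0.1 * data_rng.normal(size=len(y))
    return X, y

Xtr, ytr = make_data(500, 50)
Xva, yva = make_data(500, 50)
selected, best, scores = kfrg_sketch(Xtr, ytr, Xva, yva)
print("selected features:", selected, "validation F1:", round(best, 3))
```

In this sketch the filter stage only ranks features individually, so the redundant copy survives it; the greedy stage is what can drop it, because a near-duplicate feature rarely improves the validation score once its twin is already selected.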
Funding
Support from the National Natural Science Foundation of China (NSFC, grant Nos. 11973022 and 12373108)
the Natural Science Foundation of Guangdong Province (No. 2020A1515010710)
Hanshan Normal University Startup Foundation for Doctor Scientific Research (No. QD202129).