Abstract
As the material requirements of underground water conservancy projects and water pipeline networks grow more diverse and complex, using machine learning to design special materials that meet individual needs efficiently and conveniently has attracted wide attention. Traditional supervised learning methods all rely on large datasets for training, but for the special materials needed in deeply buried water pipeline networks, high-end military equipment, and similar fields, such as rare and precious high-entropy alloys, acquiring large datasets is extremely costly and time-consuming. To address this problem, a small-sample expansion model, RX-SMOGN, is proposed. It screens features using extreme gradient boosting (XGBoost) together with recursive feature elimination with cross-validation (RFECV), and expands the dataset using the SMOGN algorithm. The phase structure of high-entropy alloys is taken as the research object, and traditional machine learning models are trained to predict it in order to verify the effectiveness of RX-SMOGN. Results from five-fold cross-validation and four evaluation metrics show that RX-SMOGN substantially improves the performance of the machine learning models, offering a more convenient method for alloy material design and improving its efficiency.
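The dataset-expansion idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' RX-SMOGN implementation and not the reference SMOGN algorithm: it only shows the core mechanism usually associated with SMOGN-style oversampling, namely interpolating between a seed sample and a nearby neighbour when the two are close, and falling back to small Gaussian perturbation when they are not. The function name `expand_samples`, its parameters (`k`, `noise_scale`, `seed`), the "half the median distance" threshold as coded here, and the toy numbers are all assumptions made for illustration. Feature screening (XGBoost with RFECV) and the five-fold evaluation are not covered.

```python
import math
import random
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def expand_samples(rows, n_new, k=3, noise_scale=0.05, seed=0):
    """Return n_new synthetic rows; each row is [feature..., target].

    Hypothetical, simplified SMOGN-style generator (illustration only).
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    k = min(k, len(rows) - 1)
    # Per-column spread, used to scale the Gaussian perturbation.
    col_std = [statistics.pstdev(col) for col in zip(*rows)]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        # Feature-space distances from the seed row to every other row,
        # nearest first (the last column is treated as the target).
        ranked = sorted(
            ((euclidean(base[:-1], r[:-1]), r) for r in rows if r is not base),
            key=lambda pair: pair[0],
        )
        # Assumed "safe" threshold: half the median distance to the other rows.
        safe_dist = statistics.median(d for d, _ in ranked) / 2
        dist, partner = ranked[rng.randrange(k)]
        if dist < safe_dist:
            # Close neighbour: interpolate features and target along the segment.
            t = rng.random()
            new_row = [b + t * (p - b) for b, p in zip(base, partner)]
        else:
            # Distant neighbour: perturb the seed row with Gaussian noise instead.
            new_row = [b + rng.gauss(0.0, noise_scale * s)
                       for b, s in zip(base, col_std)]
        synthetic.append(new_row)
    return synthetic


# Made-up toy rows (two features plus a target), not data from the paper.
toy = [
    [0.10, 1.2, 3.0],
    [0.12, 1.1, 3.2],
    [0.50, 2.0, 5.1],
    [0.55, 2.2, 5.0],
    [0.90, 3.1, 7.4],
]
augmented = toy + expand_samples(toy, n_new=10)
print(len(augmented))  # 15
```

In a fuller pipeline, generation would normally be restricted to under-represented target ranges, and the synthetic rows would be added only to the training folds so that the cross-validation scores are measured on real samples.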
Authors
YANG Tao (杨涛); ZHANG Zhaobo (张兆波); ZHENG Tianyi (郑添屹); PENG Bao (彭保)
Affiliations: Shenzhen Koron Soft Co., Ltd., Shenzhen 518063, China; GD Holdings Pearl River Delta Water Supply Co., Ltd., Guangzhou 511455, China; South China Academy of Advanced Optoelectronics, South China Normal University, Guangzhou 510006, China; School of Information and Communication, Shenzhen Institute of Information Technology, Shenzhen 518172, China
Source
Big Data Research (《大数据》), 2024, No. 1, pp. 185-194 (10 pages)
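Funding
Shenzhen University Stable Support Program Project (No. 20200829114939001)
Shenzhen Institute of Information Technology University-Level Innovative Research Team Project (No. TD2020E001)
Pearl River Delta Water Resources Allocation Project Research Program (No. CD88-QT01-2022-0068)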
Keywords
small sample expansion
feature engineering
machine learning
high-entropy alloy
rare precious metal