Abstract
To address the large data volumes, high annotation costs, and imbalanced class distributions encountered in operations and maintenance (O&M) fault diagnosis, this paper proposes a pseudo-label semi-supervised learning method for business anomaly data. First, the method applies data augmentation to the pseudo-labeled data and introduces a pseudo-label loss function to optimize the model iteratively. Second, an adaptive imbalance-aware network is designed with an adaptive loss function that narrows the imbalance gap between samples and thereby improves the model's generalization. Finally, a distribution-alignment strategy is used to build a selective pseudo-label self-training framework, which mitigates the prediction drift that can arise during iterative training. Experiments on a real-world disk dataset show that the method significantly outperforms traditional baseline semi-supervised learning algorithms in fault diagnosis.
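The abstract does not spell out how distribution alignment and selective pseudo-labeling are implemented, so the following is only a minimal sketch of the general idea as it commonly appears in semi-supervised learning, not the paper's own algorithm. The function names, the `target_prior` argument, and the `threshold` value are all assumptions introduced for illustration.

```python
from typing import List, Sequence, Tuple


def align_distribution(
    probs: Sequence[Sequence[float]],
    target_prior: Sequence[float],
    eps: float = 1e-12,
) -> List[List[float]]:
    """Rescale predicted class probabilities toward an assumed target prior.

    Each class column is multiplied by target_prior / mean_prediction,
    then every row is renormalized so it sums to 1.
    """
    n = len(probs)
    k = len(target_prior)
    # Average predicted probability per class over the unlabeled batch.
    model_prior = [sum(row[c] for row in probs) / n for c in range(k)]
    ratio = [target_prior[c] / (model_prior[c] + eps) for c in range(k)]
    aligned = []
    for row in probs:
        scaled = [row[c] * ratio[c] for c in range(k)]
        total = sum(scaled)
        aligned.append([s / total for s in scaled])
    return aligned


def select_pseudo_labels(
    probs: Sequence[Sequence[float]],
    target_prior: Sequence[float],
    threshold: float = 0.9,  # hypothetical confidence cutoff
) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) pairs whose aligned confidence
    reaches the threshold; all other samples are left unlabeled."""
    selected = []
    for i, row in enumerate(align_distribution(probs, target_prior)):
        confidence = max(row)
        if confidence >= threshold:
            selected.append((i, row.index(confidence)))
    return selected
```

In a batch where the raw predictions all favor the majority class, alignment toward a uniform prior can flip the less confident samples to the minority class, which is one way a self-training loop can limit drift toward the majority class. Only samples that pass the threshold would then be added to the training set for the next iteration.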
Author
Lu Hongbo (陆宏波), Anhui Jiyuan Software Co., Ltd., Hefei, China
Source
Scientific and Technological Innovation (《科学技术创新》)
2024, No. 22, pp. 101-104 (4 pages)
Keywords
semi-supervised fault diagnosis
pseudo-label
imbalanced learning
self-adaptive network