Abstract
With the emergence of various intelligent applications, machine learning technologies face many challenges in practice, including large-scale models, application-oriented real-time datasets, and the limited capabilities of individual nodes. Distributed machine learning (DML) and semi-supervised learning, which help address these problems, have therefore attracted attention in both academia and industry. In this paper, a semi-supervised learning method is combined with a data-parallel DML framework. The pseudo-label based local loss function of each distributed node is studied, and the distributed parameter update rule based on stochastic gradient descent (SGD) is derived. A demo that implements pseudo-label based semi-supervised learning in the DML framework is built, and the CIFAR-10 dataset for target classification is used to evaluate its performance. Experimental results confirm the convergence and the accuracy of the model trained with pseudo-label based semi-supervised learning in the DML framework. When the proportion of the pseudo-label dataset is 20%, the model accuracy exceeds 90% provided that the number of local parameter update steps between two global aggregations is less than 5. In addition, with the global aggregation interval fixed at 3, the model converges with acceptable performance degradation as the proportion of the pseudo-label dataset varies from 20% to 80%.
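The scheme summarized above (a pseudo-label term in each node's local loss, several local SGD steps, then a global aggregation) can be illustrated with a minimal sketch. It is not the paper's implementation: the logistic model, the hard-threshold pseudo-labels, the weighting factor `alpha`, the size-weighted averaging, and all function names are assumptions chosen to keep the example small. Here `tau` plays the role of the number of local update steps between two global aggregations.

```python
import math
import random

def predict(w, x):
    """Probability of class 1 under a logistic model (last weight is the bias)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def accumulate(grad, x, err, scale):
    """Add one sample's cross-entropy gradient contribution into grad."""
    for j, xj in enumerate(x):
        grad[j] += scale * err * xj
    grad[-1] += scale * err

def local_gradient(w, batch_lab, batch_unl, alpha):
    """Gradient of: labeled cross-entropy + alpha * pseudo-label cross-entropy.

    The form of the loss and the weight alpha are illustrative assumptions.
    """
    grad = [0.0] * len(w)
    for x, y in batch_lab:
        accumulate(grad, x, predict(w, x) - y, 1.0 / len(batch_lab))
    for x in batch_unl:
        p = predict(w, x)
        pseudo = 1.0 if p >= 0.5 else 0.0  # hard pseudo-label from current model
        accumulate(grad, x, p - pseudo, alpha / len(batch_unl))
    return grad

def train(nodes, dim, tau=3, rounds=40, lr=0.5, alpha=0.3, batch=16, seed=0):
    """nodes: list of (labeled, unlabeled) pairs, one pair per worker.

    Each round, every worker starts from the global weights, runs tau local
    mini-batch SGD steps, and the results are averaged (weighted by data size).
    """
    rng = random.Random(seed)
    w_global = [0.0] * (dim + 1)
    sizes = [len(lab) + len(unl) for lab, unl in nodes]
    total = float(sum(sizes))
    for _ in range(rounds):
        local_ws = []
        for lab, unl in nodes:
            w = list(w_global)
            for _ in range(tau):
                b_lab = rng.sample(lab, min(batch, len(lab)))
                b_unl = rng.sample(unl, min(batch, len(unl)))
                g = local_gradient(w, b_lab, b_unl, alpha)
                w = [wi - lr * gi for wi, gi in zip(w, g)]
            local_ws.append(w)
        # Global aggregation: size-weighted average of the local weights.
        w_global = [sum(s * w[j] for s, w in zip(sizes, local_ws)) / total
                    for j in range(dim + 1)]
    return w_global
```

Calling `train(nodes, dim=2, tau=3)` on a list of per-node `(labeled, unlabeled)` datasets returns the aggregated weight vector. Raising `tau` reduces how often nodes synchronize, which is the trade-off the experiments in the abstract examine.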
Funding
Supported by the National Key R&D Program of China (No. 2020YFC1807904),
the Natural Science Foundation of Beijing Municipality (No. L192002), and
the National Natural Science Foundation of China (No. U1633115).