Optimizing Synchronous Stochastic Gradient Descent with Local Efficient Sign and Model Averaging Correction
28 Pages · Posted: 24 Sep 2024
Abstract
Synchronous Stochastic Gradient Descent (SSGD) is a key method in distributed deep learning, but its many iterations and the large volume of weight parameters communicated per iteration create communication bottlenecks. To address this problem, this paper proposes a communication-optimized approach for Synchronous Stochastic Gradient Descent with Local Efficient Sign and Model Averaging Correction (LEFS-SGDM). LEFS-SGDM combines delayed communication with the gradient compression technique Sign Stochastic Gradient Descent (SignSGD) to reduce both communication frequency and communication data volume, and it incorporates error accumulation and global model constraint mechanisms to enhance training accuracy. Experiments were carried out on the Residual Network-20 (ResNet-20), Visual Geometry Group-11 (VGG-11), and Dense Convolutional Network-40 (DenseNet-40) models with the Canadian Institute for Advanced Research 10-class (CIFAR-10) and 100-class (CIFAR-100) datasets. Compared with existing Local Stochastic Gradient Descent, the results show that LEFS-SGDM reduces the amount of communicated data by 97.04% while improving test accuracy by 0.46%-2.32%. These results demonstrate the effectiveness of the method and its potential applicability in distributed deep learning.
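The abstract only outlines the ingredients of LEFS-SGDM (local steps with delayed communication, sign-based compression, error accumulation, and a global model constraint); the exact algorithm is not given on this page. The following is a minimal, hypothetical Python sketch of how those ingredients typically fit together on a toy least-squares problem with simulated workers. All names, the objective, and the hyperparameters are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: local SGD steps, sign compression of the local
# model change with error feedback (error accumulation), averaging of the
# compressed updates, and a pull of each local model toward the global one
# (a stand-in for the paper's model averaging correction / global constraint).
import numpy as np

rng = np.random.default_rng(0)
dim, workers, rounds, local_steps, lr, pull = 10, 4, 50, 5, 0.1, 0.5

# Toy least-squares objective; each worker holds its own data shard.
A = [rng.normal(size=(20, dim)) for _ in range(workers)]
b = [a @ rng.normal(size=dim) + 0.01 * rng.normal(size=20) for a in A]

x_global = np.zeros(dim)                             # averaged (server) model
x_local = [x_global.copy() for _ in range(workers)]  # per-worker models
residual = [np.zeros(dim) for _ in range(workers)]   # error-feedback memory

for _ in range(rounds):
    updates = []
    for k in range(workers):
        # Local phase: several SGD steps without any communication.
        for _ in range(local_steps):
            grad = A[k].T @ (A[k] @ x_local[k] - b[k]) / len(b[k])
            x_local[k] -= lr * grad
        # Compression phase: sign of (model change + accumulated error).
        delta = (x_local[k] - x_global) + residual[k]
        scale = np.mean(np.abs(delta))          # one scalar per worker
        compressed = scale * np.sign(delta)     # ~1 bit per coordinate
        residual[k] = delta - compressed        # accumulate compression error
        updates.append(compressed)
    # Server: average the compressed updates and move the global model.
    x_global += np.mean(updates, axis=0)
    # Correction: pull local models toward the global model.
    for k in range(workers):
        x_local[k] += pull * (x_global - x_local[k])

print("global loss:",
      sum(np.mean((A[k] @ x_global - b[k]) ** 2) for k in range(workers)))
```

In this sketch the communication saving comes from two places the abstract names: workers exchange information only once every `local_steps` iterations, and each exchange carries signs plus a single scale factor rather than full-precision parameters; the residual term and the pull toward the global model are the hedged stand-ins for the error accumulation and global model constraint mechanisms.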
Keywords: Distributed Deep Learning, Synchronous Stochastic Gradient Descent, Local Efficient Sign, Model Averaging Correction, Gradient Compression, Communication Bottleneck