Optimizing Synchronous Stochastic Gradient Descent with Local Efficient Sign and Model Averaging Correction
28 Pages · Posted: 24 Sep 2024
Abstract
Synchronous Stochastic Gradient Descent (SSGD) is a key method in distributed deep learning, but its many iterations and the large volume of weight parameters communicated per iteration create communication bottlenecks. To address this problem, this paper proposes a communication-optimized approach for Synchronous Stochastic Gradient Descent with Local Efficient Sign and Model Averaging Correction (LEFS-SGDM). LEFS-SGDM combines delayed communication with the gradient compression technique Sign Stochastic Gradient Descent (SignSGD) to reduce both communication frequency and communication data volume, and it incorporates error accumulation and global model constraint mechanisms to enhance training accuracy. Experiments were carried out on the Residual Network-20 (ResNet-20), Visual Geometry Group-11 (VGG-11), and Dense Convolutional Network-40 (DenseNet-40) models with the Canadian Institute for Advanced Research 10-class (CIFAR-10) and 100-class (CIFAR-100) datasets. Compared with existing Local Stochastic Gradient Descent, the results show that LEFS-SGDM reduces the amount of communicated data by 97.04% while improving test accuracy by 0.46%-2.32%. These results demonstrate the effectiveness of the method and its potential applicability in distributed deep learning.
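The abstract only outlines the ingredients of LEFS-SGDM (local steps with delayed communication, sign-based compression, error accumulation, and a global model constraint); the exact algorithm is not given on this page. The following is a minimal, hypothetical Python sketch of how those ingredients typically fit together on a toy least-squares problem with simulated workers. All names, the objective, and the hyperparameters are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: local SGD steps, sign compression of the local
# model change with error feedback (error accumulation), averaging of the
# compressed updates, and a pull of each local model toward the global one
# (a stand-in for the paper's model averaging correction / global constraint).
import numpy as np

rng = np.random.default_rng(0)
dim, workers, rounds, local_steps, lr, pull = 10, 4, 50, 5, 0.1, 0.5

# Toy least-squares objective; each worker holds its own data shard.
A = [rng.normal(size=(20, dim)) for _ in range(workers)]
b = [a @ rng.normal(size=dim) + 0.01 * rng.normal(size=20) for a in A]

x_global = np.zeros(dim)                             # averaged (server) model
x_local = [x_global.copy() for _ in range(workers)]  # per-worker models
residual = [np.zeros(dim) for _ in range(workers)]   # error-feedback memory

for _ in range(rounds):
    updates = []
    for k in range(workers):
        # Local phase: several SGD steps without any communication.
        for _ in range(local_steps):
            grad = A[k].T @ (A[k] @ x_local[k] - b[k]) / len(b[k])
            x_local[k] -= lr * grad
        # Compression phase: sign of (model change + accumulated error).
        delta = (x_local[k] - x_global) + residual[k]
        scale = np.mean(np.abs(delta))          # one scalar per worker
        compressed = scale * np.sign(delta)     # ~1 bit per coordinate
        residual[k] = delta - compressed        # accumulate compression error
        updates.append(compressed)
    # Server: average the compressed updates and move the global model.
    x_global += np.mean(updates, axis=0)
    # Correction: pull local models toward the global model.
    for k in range(workers):
        x_local[k] += pull * (x_global - x_local[k])

print("global loss:",
      sum(np.mean((A[k] @ x_global - b[k]) ** 2) for k in range(workers)))
```

In this sketch the communication saving comes from two places the abstract names: workers exchange information only once every `local_steps` iterations, and each exchange carries signs plus a single scale factor rather than full-precision parameters; the residual term and the pull toward the global model are the hedged stand-ins for the error accumulation and global model constraint mechanisms.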
Keywords: Distributed Deep Learning, Synchronous Stochastic Gradient Descent, Local Efficient Sign, Model Averaging Correction, Gradient Compression, Communication Bottleneck