
Distributed Learning Algorithms: Communication Efficiency and Error Resilience

Friday, 10/30/2020 10:00am to 12:00pm
Zoom Meeting
PhD Dissertation Proposal Defense
Speaker: Raj Kumar Maity
Zoom Meeting: https://umass-amherst.zoom.us/j/98483617156?pwd=UVdSdXpkeHBZcU92a1BualpTeC80UT09
Meeting ID: 984 8361 7156
Passcode: rajumass

Abstract

In modern real-world applications such as self-driving cars, recommender systems, robotics, and genetics, the size of the training data has grown to the point that it has become essential to design distributed learning algorithms. A general framework for distributed learning is data parallelism, where the data is distributed among worker machines for parallel processing and computation to speed up learning. With billions of devices such as cellphones and computers, data is also inherently distributed and stored locally on users' devices; learning in this setup is popularly known as Federated Learning. The speed-up offered by the distributed framework is hindered by several fundamental problems: straggler workers, the communication bottleneck caused by the high communication overhead between workers and the central server, and adversarial failures, popularly known as Byzantine failures. In this thesis, we study and develop distributed algorithms that are error resilient and communication efficient.

First, we address the problem of straggler workers, where learning is delayed by slow workers in the distributed setup. To mitigate the effect of stragglers, we employ LDPC (low-density parity-check) codes to encode the data and implement gradient descent in the distributed setup. Second, we present vqSGD (vector-quantized Stochastic Gradient Descent), a family of vector quantization schemes that provides an asymptotic reduction in communication cost with convergence guarantees for first-order distributed optimization. We also show that vqSGD provides strong privacy guarantees. Third, we address the problem of Byzantine failures together with communication efficiency in first-order gradient-descent-based algorithms. We consider a generic class of $\delta$-approximate compressors for communication efficiency and employ a simple norm-based thresholding scheme to make the learning algorithm robust to Byzantine failures. We establish statistical error rates for arbitrary (convex or non-convex) smooth losses. Fourth, we employ the generic class of $\delta$-approximate compressors to develop a communication-efficient second-order Newton-type algorithm and provide rates of convergence for smooth objectives. Fifth, we propose COMRADE (COMmunication-efficient and Robust Approximate Distributed nEwton), an iterative second-order algorithm that is communication efficient as well as robust against Byzantine failures.

Advisor: Arya Mazumdar
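For readers unfamiliar with vector quantization of gradients, the following is a minimal sketch of one unbiased cross-polytope quantizer of the kind vqSGD builds on: a gradient with $\|v\|_2 \le 1$ is written as a convex combination of the $2d$ vertices $\pm\sqrt{d}\,e_i$, and a single vertex is sampled so that its expectation equals $v$, so only the vertex index (about $\log_2(2d)$ bits) needs to be communicated. The function names and the exact construction below are illustrative assumptions, not necessarily the scheme presented in the talk.

import numpy as np

def cross_polytope_quantize(v, rng=None):
    # Sample a vertex +/- sqrt(d) * e_i of the scaled cross-polytope so that
    # the expectation of the sampled vertex equals v (assumes ||v||_2 <= 1).
    rng = rng if rng is not None else np.random.default_rng()
    d = v.shape[0]
    scale = np.sqrt(d)
    w_pos = np.maximum(v, 0.0) / scale              # weight on +scale * e_i
    w_neg = np.maximum(-v, 0.0) / scale             # weight on -scale * e_i
    leftover = 1.0 - (w_pos.sum() + w_neg.sum())    # >= 0 because ||v||_1 <= sqrt(d)
    w_pos = w_pos + leftover / (2 * d)              # spread the remaining mass evenly;
    w_neg = w_neg + leftover / (2 * d)              # the even split cancels in expectation
    probs = np.concatenate([w_pos, w_neg])
    probs = probs / probs.sum()                     # guard against floating-point drift
    idx = rng.choice(2 * d, p=probs)                # only this index is transmitted
    coord, sign = (idx, 1.0) if idx < d else (idx - d, -1.0)
    vertex = np.zeros(d)
    vertex[coord] = sign * scale                    # the server decodes the vertex from the index
    return idx, vertex

Averaging the decoded vertices across workers gives an unbiased, low-communication estimate of the average gradient; the variance analysis and the privacy guarantees mentioned in the abstract are beyond this sketch.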
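The $\delta$-approximate compressor appearing in the third and fourth parts is, I assume, the standard notion from the gradient-compression literature: a (possibly randomized) map $\mathcal{C}:\mathbb{R}^d \to \mathbb{R}^d$ is a $\delta$-approximate compressor for some $\delta \in (0,1]$ if
\[
\|\mathcal{C}(x) - x\|_2^2 \;\le\; (1-\delta)\,\|x\|_2^2 \qquad \text{for all } x \in \mathbb{R}^d.
\]
For example, keeping only the $k$ largest-magnitude coordinates (top-$k$) satisfies this with $\delta = k/d$, and $\delta = 1$ corresponds to no compression.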
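The norm-based thresholding used for Byzantine robustness can be illustrated with a short sketch: the central server receives one (compressed) gradient per worker, discards the fraction with the largest norms, and averages the rest. The parameter name beta and the exact trimming rule below are assumptions made for illustration; the precise scheme and its statistical error rates are developed in the thesis.

import numpy as np

def norm_thresholded_mean(worker_grads, beta):
    # Aggregate worker gradients by discarding the beta-fraction with the
    # largest Euclidean norms and averaging the remaining ones.
    # worker_grads: list of d-dimensional numpy arrays, one per worker.
    # beta: assumed upper bound on the fraction of Byzantine workers.
    m = len(worker_grads)
    keep = m - int(np.ceil(beta * m))              # number of gradients retained
    norms = np.array([np.linalg.norm(g) for g in worker_grads])
    kept = np.argsort(norms)[:keep]                # smallest-norm gradients survive
    return np.mean([worker_grads[i] for i in kept], axis=0)

The server then takes a gradient step with this aggregate; intuitively, a Byzantine worker that wants to inject a large perturbation must send a large-norm vector, which the thresholding removes.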