An Investigation into Federated Learning

Published in Towards Data Science, Jan 13, 2021

A decentralized distributed machine learning technique without sharing raw data

What is Federated Learning?

Federated learning, also known as collaborative learning or decentralized learning, is a machine learning technique in which a model is trained while the training data stays decentralized.

How cool is that?

So why do we need federated learning?

Before we start, let’s quickly go through how a model is trained in a conventional machine learning (ML) setting. The first step, before any training, is to download or store the data you would like to use in your local repository; you then apply machine learning algorithms to that data to make predictions. The key point to note here is that you collect the data in one place and therefore train the model centrally. But this process may not keep working in the future because of data privacy concerns. Data, whatever domain it comes from, is often private or sensitive, and it might be impossible for an organization to share it. Because of industry competition, privacy and security concerns, and complicated administrative procedures, even data integration between different departments of the same company faces heavy resistance.

Let us now consider an example from the health care domain. Medical data is highly sensitive. Because of regulations like the Health Insurance Portability and Accountability Act (HIPAA) in the United States or the General Data Protection Regulation (GDPR) in the European Union, the data cannot be freely shared or made public.

That said, the conventional machine learning approach needs to be reformed to address these growing regulatory concerns.

So how will you start the training process when you don’t have access to any data? How can you train a model if you cannot collect and store the data in your local repository?

There are three solutions to every problem: accept it, change it, or leave it. If you can’t accept it, change it. If you can’t change it, leave it.

Google first introduced federated learning in 2016. You can read more about it in the paper by McMahan et al. [1]. Google uses federated learning on edge devices (mobile devices) for next-word prediction in its Gboard keyboard.

Let’s understand federated learning in a nutshell.

The main idea of federated learning is that, instead of collecting and storing the data in one place to train a model, we send the model to the devices that hold the data and train it there.

A model that has already been trained in a centralized machine learning setting is sent to all devices participating in the federated learning process. Each device has its own local, private data. The model is trained on each device using that device’s data, which produces a local update. These updates are sent to a trusted central server, which takes the weighted average of all device updates. This weighted average becomes the base model for the next round of training. This is repeated for k rounds until the model converges. In this way, a model is trained while the training data stays decentralized.
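The round-based loop described above can be sketched in a few lines of plain Python. This is only a minimal illustration under my own assumptions, not the implementation Google uses: the toy model is a 1-D linear regression, each device sends back its full locally trained weights (rather than a difference), and the names `local_update`, `weighted_average` and `federated_training`, along with the sample data, learning rate and number of rounds, are all hypothetical.

```python
from typing import List, Tuple

Weights = List[float]                 # [slope, intercept] of a toy model y = w0 * x + w1
Dataset = List[Tuple[float, float]]   # (x, y) pairs that never leave the device


def local_update(global_w: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Weights:
    """Start from the global model and run a few SGD epochs on one device's data."""
    w0, w1 = global_w
    for _ in range(epochs):
        for x, y in data:
            err = (w0 * x + w1) - y   # prediction error on one sample
            w0 -= lr * err * x
            w1 -= lr * err
    return [w0, w1]


def weighted_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Average the device models, weighting each by its number of local samples."""
    total = sum(sizes)
    return [sum(w[i] * n for w, n in zip(updates, sizes)) / total
            for i in range(len(updates[0]))]


def federated_training(global_w: Weights, devices: List[Dataset], rounds: int) -> Weights:
    """Repeat: send the model to the devices, train locally, average on the server."""
    sizes = [len(d) for d in devices]
    for _ in range(rounds):
        updates = [local_update(global_w, d) for d in devices]
        global_w = weighted_average(updates, sizes)  # base model for the next round
    return global_w


# Hypothetical private datasets, all drawn from y = 2x + 1, with unequal sizes.
devices = [
    [(x, 2 * x + 1) for x in (0.0, 0.2, 0.4)],
    [(x, 2 * x + 1) for x in (0.6, 0.8)],
    [(x, 2 * x + 1) for x in (1.0, 1.2, 1.4, 1.6)],
]
final_w = federated_training([0.0, 0.0], devices, rounds=200)
print(final_w)  # should have moved from [0, 0] toward slope 2 and intercept 1
```

Notice that the server only ever sees model weights and sample counts; the `(x, y)` pairs stay inside `local_update`. Weighting by sample count means a device with more data has more influence on the averaged model.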

Image by author. The image shows four participating devices in federated learning, where each worker trains on its own data and sends its update to the server. The server takes the weighted average of the updates from these four devices and distributes it to all workers, where it becomes the base model for the next round of training.

But there are many practical challenges and constraints in federated learning. Here are a few of them:

1. Devices differ in system resources such as memory

2. There is a lot of communication overhead, as each device has to send model updates to the server

3. A device’s updates can potentially be reverse-engineered to recover the raw data

4. Local device data can be heterogeneous and non-IID. That is, a device’s data may be imbalanced, and the number of samples can differ from device to device
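To make the non-IID point concrete, here is a small sketch of one common way to simulate it when experimenting: sort the samples by label and cut them into shards of unequal size. The function name `label_skew_partition`, the toy labels and the shard sizes are assumptions for illustration only; real device data is of course not produced this way.

```python
from collections import Counter
from typing import List


def label_skew_partition(labels: List[int], shard_sizes: List[int]) -> List[List[int]]:
    """Split sample indices into contiguous shards after sorting them by label.

    Sorting first means each device sees only one or two classes (non-IID),
    and unequal shard sizes give the devices different numbers of samples.
    """
    if sum(shard_sizes) != len(labels):
        raise ValueError("shard sizes must add up to the number of samples")
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shards, start = [], 0
    for size in shard_sizes:
        shards.append(order[start:start + size])
        start += size
    return shards


# Hypothetical dataset: 18 samples, three classes, six samples per class.
labels = [i % 3 for i in range(18)]
shards = label_skew_partition(labels, shard_sizes=[3, 6, 9])

for device_id, shard in enumerate(shards):
    histogram = Counter(labels[i] for i in shard)
    print(device_id, len(shard), dict(histogram))
```

Although the full dataset is perfectly balanced, no single device sees all three classes, and the largest device holds three times as many samples as the smallest.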

In my next article, we will talk more about federated learning: its types, its implementation using PyTorch, and the various model topologies that can be adopted. This is just an introduction to federated learning.

Stay tuned guys. See you in my next article.

About me

I am Manjari Ganapathy, a Master of Science student in Computer Science at the University of Nevada, Las Vegas. I am currently doing my Master’s thesis on federated learning under the guidance of Prof. Mingon Kang.

References

[1] Brendan McMahan et al., Communication-Efficient Learning of Deep Networks from Decentralized Data (2017)

[2] Peter Kairouz et al., Advances and Open Problems in Federated Learning (2019)

Written by Manjari Ganapathy