Vertical Federated Learning concept

Marc Deveaux
Jun 26, 2022



Notes on Vertical Federated Learning


1. Vertical federated learning overview

Introduction

Today’s AI still faces two major challenges:

  • One is that in most industries, data exists in the form of isolated islands
  • The other is the strengthening of data privacy and security regulations (for example, GDPR)

One answer to those challenges is Federated Learning

  • Federated learning is a machine learning technique that trains an algorithm across multiple decentralized servers holding local data samples, without exchanging them
  • This approach stands in contrast to traditional centralized machine learning techniques where all the local datasets are uploaded to one server
  • Federated learning enables multiple actors to build a common machine learning model without sharing data

Vertical Federated Learning Applicability

There are different types of federated learning techniques; in this post, we will talk about the “Vertical Federated Learning” technique. Vertical federated learning is applicable to cases where two data sets share the same sample ID space but differ in feature space, as the two lists and the sketch below illustrate.

Same sample ID (here, easy_id) across the two data sets

  • a user named Hector is in both dataset A and dataset B -> good
  • a user named Martha is only in dataset B -> bad

Different features

  • dataset A has columns “Age” and “Gender” while dataset B has “Personal income” -> good
  • dataset A and B have the same columns “Age” and “Gender” -> bad
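A minimal sketch of this setup, assuming two pandas data frames with a shared easy_id column and made-up feature values; only the users in the intersection can be used for vertical training:

```python
import pandas as pd

# Party A: demographic features (illustrative data)
df_a = pd.DataFrame({
    "easy_id": [1, 2, 3],          # Hector = 1; Martha is absent here
    "age": [34, 51, 28],
    "gender": ["M", "F", "F"],
})

# Party B: an income feature over a partially overlapping user base
df_b = pd.DataFrame({
    "easy_id": [1, 3, 4],          # Martha = 4 exists only in B
    "personal_income": [52000, 61000, 48000],
})

# Vertical FL applies to the overlap: same IDs, disjoint feature columns
common_ids = set(df_a["easy_id"]) & set(df_b["easy_id"])
print(sorted(common_ids))          # [1, 3] -> only shared users can be trained on
```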

Example

  • Consider two different companies in the same city: one is a bank, and the other is an e-commerce company. Their user lists are likely to contain most of the residents of the area, so the intersection of their user spaces is large (they have many users in common)
  • The bank records the user’s revenue and expenditure behavior, while the e-commerce company retains the user’s browsing and purchase history, so their feature spaces are very different
  • We want both parties to have a prediction model for product purchase based on user and product information. By exploiting the two different datasets, we have more features to build a better learning model
  • Vertical federated learning is the process of aggregating these different features and building a model in a privacy-preserving manner, using data from both parties collaboratively

Architecture

  • Part 1, encrypted entity alignment: since the user groups of the two companies are not identical, the system uses encryption-based user ID alignment techniques to confirm the common users of both parties without A or B exposing their respective data (a simplified sketch follows this list)
  • Part 2, encrypted model training: train the machine learning model on the common user list. In this process, only model parameters and intermediate results are shared, never the raw data
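As a rough illustration of part 1, the sketch below intersects salted hashes of the IDs instead of the raw IDs. This is a deliberate simplification: production systems use proper private set intersection (PSI) protocols, and the shared salt here is an assumption for the toy example:

```python
import hashlib

def blind(ids, salt):
    """Hash each ID with a shared salt so raw IDs are never exchanged."""
    return {hashlib.sha256((salt + str(i)).encode()).hexdigest(): i for i in ids}

salt = "shared-secret-salt"          # agreed upon out of band (assumption)
party_a = blind([1, 2, 3], salt)     # bank's user IDs
party_b = blind([1, 3, 4], salt)     # e-commerce company's user IDs

# Each party only sees the other's hashes; the overlap reveals common users
common_hashes = party_a.keys() & party_b.keys()
common_users = sorted(party_a[h] for h in common_hashes)
print(common_users)                  # [1, 3]
```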

2. Reminder on Fully Connected Layer Neural Network

To understand the step-by-step idea behind encrypted model training, we need to recall the basics of fully connected neural networks.

Layers and weights

Input features: the information that the network will attempt to learn about

Hidden units: where the information gets processed. The system becomes more “knowledgeable” as it goes along, filtering information through multiple hidden layers

Output layer: self-explanatory

Weights

  • Every unit is connected to the units in the neighboring layers
  • The connections between one unit and another are represented by a number called a weight
  • It can be either positive or negative
  • The higher the weight, the more influence one unit has on another

Information Flow

Basically (a toy version in code follows these steps):

  1. calculate output
  2. compare the calculated output against the real output
  3. update the weights
  4. repeat
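Here is that loop on a single-weight model; the training pair, initial weight, and learning rate are all made up for illustration:

```python
# Toy training loop: fit y = w * x on one example
x, y_true = 2.0, 10.0    # made-up training pair
w, lr = 0.5, 0.1         # initial weight and learning rate (assumptions)

for step in range(20):
    y_pred = w * x                 # 1. calculate output
    error = y_pred - y_true        # 2. compare against the real output
    w -= lr * error * x            # 3. update the weight (gradient step)
print(round(w, 3))                 # 4. repeated updates drive w toward 5.0
```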

Forward propagation

Goal: calculate the output probability (sketched in code after this list)

  • When a NN is being trained, information is fed into the network via the input units, which triggers the layers of hidden units and finally arrives at the output units
  • Each unit receives inputs from the units to its left, and the inputs are multiplied by the weights of the connections they travel along
  • Every unit adds up all the inputs it receives in this way, and if the sum is more than a certain threshold value, the unit “fires” and triggers the units it is connected to (those on its right)
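A minimal NumPy forward pass through one hidden layer. The layer sizes and random weights are assumptions, and the sigmoid is used as a smooth stand-in for the “fire above a threshold” rule:

```python
import numpy as np

def sigmoid(z):
    # Smooth version of "fire if the weighted sum exceeds a threshold"
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.2, 0.7])          # input features (made up)
W1 = rng.normal(size=(2, 3))      # input -> hidden weights
W2 = rng.normal(size=(3, 1))      # hidden -> output weights

h = sigmoid(x @ W1)               # each hidden unit sums its weighted inputs
y = sigmoid(h @ W2)               # output layer produces a probability
print(y)                          # a single value between 0 and 1
```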

Backward propagation

Goal: update the weights (sketched in code after this list)

  • The NN learns through feedback (being told whether what it is doing is right or wrong)
  • The NN compares the output it produced with the output it was meant to produce, and uses the difference between them to modify the weights of the connections between the units in the network
  • The NN changes the weights from the output units through the hidden units to the input units by going backward
  • Over time, backpropagation causes the network to learn, reducing the difference between actual and intended output until the two closely match
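Continuing the forward-pass sketch, here is a hand-written backward pass for the same two-layer network, using a squared-error loss and the chain rule (the target value and learning rate are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.2, 0.7])
W1 = rng.normal(size=(2, 3))
W2 = rng.normal(size=(3, 1))
target, lr = np.array([1.0]), 0.5   # intended output and step size (assumptions)

for _ in range(100):
    # forward pass
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)
    # backward pass: propagate the error from the output back to the input weights
    d_y = (y - target) * y * (1 - y)     # output-layer error signal
    d_h = (d_y @ W2.T) * h * (1 - h)     # hidden-layer error signal
    W2 -= lr * np.outer(h, d_y)          # update hidden -> output weights
    W1 -= lr * np.outer(x, d_h)          # update input -> hidden weights

print(sigmoid(sigmoid(x @ W1) @ W2))     # the output has moved toward the target
```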

3. Vertical Federated Learning Step by Step

See for details: https://arxiv.org/pdf/2202.04309.pdf

  1. Select the common users across all the participating organizations
  2. Each local model does a forward propagation using its local data. No data or weights are shared across organizations
  3. Each local model transmits its forward output to the label owner. Forward outputs contain intermediate results of the local NN
  4. The top model does forward propagation. It connects all the local intermediate NN results and creates the final output
  5. The top model does backward propagation. Its parameters are updated
  6. Backward output transmission: gradients are sent back to each local model
  7. Local model backward propagation: each local model’s parameters are updated (a toy end-to-end sketch follows this list)

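Below is a toy end-to-end sketch of these seven steps with two parties and a label owner: linear “bottom” models, a logistic “top” model, and plain NumPy. All shapes, data, and hyperparameters are assumptions, and the encryption layer described in the paper is omitted for readability:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: aligned common users (4 samples); each party holds different features
Xa = rng.normal(size=(4, 2))               # party A's local features
Xb = rng.normal(size=(4, 3))               # party B's local features
y = np.array([[0.], [1.], [1.], [0.]])     # labels held by the label owner

Wa = rng.normal(size=(2, 2)) * 0.1         # party A's bottom model
Wb = rng.normal(size=(3, 2)) * 0.1         # party B's bottom model
Wt = rng.normal(size=(4, 1)) * 0.1         # label owner's top model
lr = 0.5

for _ in range(200):
    # Steps 2-3: local forward passes; only intermediate outputs are transmitted
    Ha, Hb = Xa @ Wa, Xb @ Wb
    # Step 4: the top model concatenates the intermediates and predicts
    H = np.concatenate([Ha, Hb], axis=1)
    p = sigmoid(H @ Wt)
    # Step 5: top-model backward pass (logistic-loss gradient) and update
    d_out = p - y
    d_H = d_out @ Wt.T
    Wt -= lr * H.T @ d_out / len(y)
    # Step 6: gradients w.r.t. each party's intermediate are sent back
    d_Ha, d_Hb = d_H[:, :2], d_H[:, 2:]    # party A's intermediate is 2 columns wide
    # Step 7: each party updates its local bottom model
    Wa -= lr * Xa.T @ d_Ha / len(y)
    Wb -= lr * Xb.T @ d_Hb / len(y)

# Predictions for the 4 aligned samples should move toward y
print(np.round(sigmoid(np.concatenate([Xa @ Wa, Xb @ Wb], axis=1) @ Wt), 2))
```

Note how the label owner never sees Xa or Xb, only the intermediate matrices Ha and Hb, and each party only receives the gradient of the loss with respect to its own intermediate output.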