Strategy: FedSCR#

Overview#

| Method | Sparse method | Quant method | Residual | Encoding | Upstream | Downstream |
| --- | --- | --- | --- | --- | --- | --- |
| FedSCR | structured threshold | None | Yes | None | Yes | No |

| Handle Non-IID | Handle Dropping/Skipping | Generality |
| --- | --- | --- |
| Adaptive threshold | None | Only Conv-models |

The main motivation of FedSCR is to selectively send back only the important updates. The main contributions are as follows:

  1. Empirical research on the pattern of parameter updates in convolutional networks, finding that parameter gradients "in the same filter" and "in the same channel" often have a strong correlation;

  2. Based on the finding in (1), "unimportant" filter or channel updates (those whose aggregated absolute value falls below a threshold) are withheld from the upstream transmission, as a form of structured sparsification;

  3. Adaptive FedSCR is proposed for the non-IID setting; it lets each client use a different threshold according to its degree of heterogeneity.

Empirical Research#

During the training of a convolutional neural network, a strong correlation is observed between the parameters belonging to the same filter and between the parameters belonging to the same channel:

fed_scr_1

The figure visualizes the parameter gradients of a fixed layer at epoch 1/15/50/100. Every three rows correspond to one filter of the layer, and every three columns within a filter correspond to one input channel of the layer. A strong correlation between parameters in the same row/column can be observed:

  1. The update gradients of weights in the same filter/channel are very similar;

  2. When a parameter is close to fitting, other parameters in the same filter/channel have a high probability of being close to fitting;

Compression Design#

Mathematical#

Calculate the sum of the absolute values of the update gradients belonging to one channel (call it C);

Calculate the sum of the absolute values of the update gradients belonging to one filter (call it F);

If C (or F) is below the threshold, set the whole channel (or filter) update to 0; the dropped values are kept in the local residual rather than discarded.
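The two importance measures can be sketched as follows (the tensor name and index layout are my assumption: $u$ is the local update with shape filters × channels × kernel × kernel, $j$ indexes filters, $c$ indexes input channels):

```latex
C_c = \sum_{j}\sum_{m,n} \lvert u_{j,c,m,n} \rvert ,\qquad
F_j = \sum_{c}\sum_{m,n} \lvert u_{j,c,m,n} \rvert
```

A channel is zeroed when $C_c < \delta$ and a filter is zeroed when $F_j < \delta$, where $\delta$ is the sparsification threshold.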

Pseudo Code (Compression)#

algo
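The algorithm figure above can be sketched in code. This is a minimal NumPy illustration, not the paper's reference implementation; the function name and shapes are assumptions:

```python
import numpy as np

def fedscr_compress(update, residual, threshold):
    """Sketch of FedSCR structured sparsification.

    update:    this round's local update, shape (filters, channels, k, k)
    residual:  values withheld in earlier rounds, same shape
    threshold: scalar delta controlling how much is dropped
    """
    # Add back the residual accumulated from previous rounds.
    u = update + residual

    # Importance of each input channel: sum of |u| over filters and kernel.
    channel_score = np.abs(u).sum(axis=(0, 2, 3))   # shape (channels,)
    # Importance of each filter: sum of |u| over channels and kernel.
    filter_score = np.abs(u).sum(axis=(1, 2, 3))    # shape (filters,)

    mask = np.ones_like(u)
    mask[:, channel_score < threshold, :, :] = 0.0  # drop weak channels
    mask[filter_score < threshold, :, :, :] = 0.0   # drop weak filters

    sparse_update = u * mask
    new_residual = u - sparse_update  # keep what was not sent
    return sparse_update, new_residual
```

The client uploads only `sparse_update`; `new_residual` stays local and is folded into the next round, so no gradient mass is permanently lost.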

Adaptive FedSCR#

To handle non-IID data distributions, adaptive FedSCR is proposed. It allows each client to adjust the threshold it uses for sparsification according to its own parameter updates (Weight Divergence, Significance of Local Updates) and the global data distribution (Impact of Data Distribution).
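The paper gives the exact adaptive rule; as a purely illustrative sketch of the idea described above, one simple scheme would scale each client's threshold with its weight divergence from the global model (function name and formula are my assumptions, not the paper's):

```python
import numpy as np

def adaptive_threshold(base_threshold, local_weights, global_weights):
    """Illustrative only: shrink the threshold for clients whose weights
    diverge more from the global model, so heterogeneous clients
    transmit a larger share of their updates."""
    divergence = np.linalg.norm(local_weights - global_weights) / (
        np.linalg.norm(global_weights) + 1e-12)
    return base_threshold / (1.0 + divergence)
```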

Convergence Proof#

Reference: Structure-Based Communication Reduction for Federated Learning

Experiment#

on threshold#

scr_exp_1

on Convergence#

scr_exp_2 scr_exp_3

on Convergence (non-iid, compare with FedSTC)#

scr_exp_4

on accuracy#

scr_exp_5

Reference#

Structure-Based Communication Reduction for Federated Learning