# Secure Aggregation

> The following code is for demonstration only. It is **NOT for production** because of system security concerns; please **DO NOT** use it directly in production.


We recommend running this tutorial in [Jupyter](https://jupyter.org/).

In secure aggregation, multiple parties that each hold private data cooperate to compute an aggregate value (such as a sum) without revealing their individual inputs.

Secure aggregation is an important concept in federated learning and has been studied extensively in the academic community. SecretFlow uses secure aggregation for gradient/weight aggregation in horizontal federated learning and for data statistics (such as data exploration and preprocessing).

The following sections explain the secure aggregation methods used by SecretFlow.


## Preparation

Initialize SecretFlow.

In [None]:
import secretflow as sf

# In case you have a running SecretFlow runtime already.
sf.shutdown()

sf.init(['alice', 'bob'], address='local')

Prepare some data for testing.

In [2]:
import numpy as np

arr0, arr1 = np.random.rand(2, 3), np.random.rand(2, 3)
print('arr0:\n', arr0, '\narr1:\n', arr1)

print('Sum:\n', np.sum([arr0, arr1], axis=0))
print('Average:\n', np.average([arr0, arr1], axis=0))
print('Min:\n', np.min([arr0, arr1], axis=0))
print('Max:\n', np.max([arr0, arr1], axis=0))

arr0:
 [[0.53867365 0.69040348 0.42628929]
 [0.76128941 0.5444343 0.7680543 ]] 
arr1:
 [[0.74303296 0.7274792 0.47244091]
 [0.88295957 0.80091356 0.82681861]]
Sum:
 [[1.28170662 1.41788268 0.8987302 ]
 [1.64424898 1.34534786 1.59487291]]
Average:
 [[0.64085331 0.70894134 0.4493651 ]
 [0.82212449 0.67267393 0.79743646]]
Min:
 [[0.53867365 0.69040348 0.42628929]
 [0.76128941 0.5444343 0.7680543 ]]
Max:
 [[0.74303296 0.7274792 0.47244091]
 [0.88295957 0.80091356 0.82681861]]


Create parties alice and bob.

In [3]:
alice, bob = sf.PYU('alice'), sf.PYU('bob')

## Aggregate operation

SecretFlow provides several `Aggregator` implementations for users to choose from. Each `Aggregator` supports sum and average.

### SPU-based secure aggregation

[SPU](../design/spu.md) is a secure device in SecretFlow whose underlying principle is [MPC](https://en.wikipedia.org/wiki/Secure_multi-party_computation). SecretFlow implements SPU-based secure aggregation; the following shows how to use it.

In [4]:
# Create an spu device.
spu = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob']))

# Create an aggregator instance using this spu.
spu_aggr = sf.security.aggregation.SPUAggregator(spu)

In [5]:
# Simulate that alice and bob hold data respectively
a = alice(lambda: arr0)()
b = bob(lambda: arr1)()

In [6]:
# Sum the data.
sf.reveal(spu_aggr.sum([a, b], axis=0))

array([[1.2817066 , 1.4178827 , 0.89873016],
 [1.644249 , 1.3453479 , 1.594873 ]], dtype=float32)

In [7]:
# Average the data.
sf.reveal(spu_aggr.average([a, b], axis=0))

array([[0.6408533 , 0.70894134, 0.44936508],
 [0.8221245 , 0.67267394, 0.7974364 ]], dtype=float32)
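To build intuition for how an MPC-based device can add inputs it never sees in the clear, here is a minimal sketch of additive secret sharing in plain NumPy. It is only an illustration of the general idea and is not SPU's actual protocol; the names `MODULUS` and `share`, the ring size, and the sample integer inputs are all assumptions made for this example.

```python
import numpy as np

MODULUS = 2**32  # ring size; an assumed value for illustration only


def share(x, n_nodes, rng):
    """Split integer array x into n_nodes additive shares that sum to x mod MODULUS."""
    shares = [
        rng.integers(0, MODULUS, size=x.shape, dtype=np.int64)
        for _ in range(n_nodes - 1)
    ]
    # The last share is chosen so that all shares add up to x modulo MODULUS.
    last = (x - sum(shares)) % MODULUS
    return shares + [last]


rng = np.random.default_rng(1)  # demo randomness only, not a CSPRNG
x_alice = np.array([[3, 1, 4], [1, 5, 9]], dtype=np.int64)
x_bob = np.array([[2, 7, 1], [8, 2, 8]], dtype=np.int64)

# Each input owner splits its data into one share per compute node.
alice_shares = share(x_alice, 2, rng)
bob_shares = share(x_bob, 2, rng)

# Each compute node adds the shares it holds; a single share looks random.
node_sums = [(alice_shares[i] + bob_shares[i]) % MODULUS for i in range(2)]

# Only combining the per-node results reveals the aggregate.
total = sum(node_sums) % MODULUS
print(total)
```

Each node works only on random-looking shares, and the sum of the inputs appears only when the per-node results are combined.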

### Masking with One-Time Pads

`Masking with One-Time Pads` negotiates a secret $s_{u,v}$ between every pair of participants $u < v$. Each participant $u$ then uses its secrets to hide its input $x_u$ and outputs

$$ y_u = x_u + \sum_{u < v}s_{u,v} - \sum_{u > v}s_{v,u} \bmod R $$

The secrets cancel out after aggregation, so the correct result is obtained:

$$ \sum_u y_u = \sum_u x_u \bmod R $$


For example, participants Alice, Bob, and Carol own $x_1, x_2, x_3$ respectively, negotiate the secrets $s_{a,b}, s_{a,c}, s_{b,c}$, and then output:
$y_1 = x_1 + s_{a,b} + s_{a,c}$ 
$y_2 = x_2 - s_{a,b} + s_{b,c}$ 
$y_3 = x_3 - s_{a,c} - s_{b,c}$ 
It is then easy to verify that

$$ y_1 + y_2 + y_3 = x_1 + s_{a,b} + s_{a,c} + x_2 - s_{a,b} + s_{b,c} + x_3 - s_{a,c} - s_{b,c} = x_1 + x_2 + x_3 $$
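The cancellation above can be checked numerically. The following is a minimal NumPy sketch of pairwise masking, not SecretFlow's implementation: the modulus `R`, the fixed-point `SCALE`, and the helpers `encode`, `decode`, and `mask_inputs` are assumptions introduced for illustration, and the secrets are drawn locally instead of being negotiated between parties.

```python
import numpy as np

R = 2**32      # modulus; an assumed value for illustration
SCALE = 2**16  # fixed-point scale for encoding floats as integers (assumption)


def encode(x):
    """Encode a float array as non-negative integers modulo R."""
    return np.round(x * SCALE).astype(np.int64) % R


def decode(v):
    """Map integers modulo R back to floats; the upper half represents negatives."""
    v = np.where(v >= R // 2, v - R, v)
    return v / SCALE


def mask_inputs(inputs, rng):
    """Return y_u = x_u + sum_{u<v} s_{u,v} - sum_{u>v} s_{v,u} (mod R) for each u."""
    n = len(inputs)
    shape = inputs[0].shape
    # One secret per unordered pair (u, v) with u < v.
    pair_secrets = {
        (u, v): rng.integers(0, R, size=shape, dtype=np.int64)
        for u in range(n)
        for v in range(u + 1, n)
    }
    masked = []
    for u in range(n):
        y = encode(inputs[u])
        for v in range(n):
            if u < v:
                y = (y + pair_secrets[(u, v)]) % R
            elif u > v:
                y = (y - pair_secrets[(v, u)]) % R
        masked.append(y)
    return masked


rng = np.random.default_rng(0)  # demo randomness only, not a CSPRNG
xs = [rng.random((2, 3)) for _ in range(3)]  # inputs of Alice, Bob, Carol
ys = mask_inputs(xs, rng)

# The aggregator only sees the masked values; the masks cancel in the sum.
result = decode(sum(ys) % R)
print(result)
print(sum(xs))
```

The two printed arrays agree up to the rounding error of the fixed-point encoding, while each individual `ys[u]` looks random to the aggregator.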

Note that `Masking with One-Time Pads` assumes semi-honest participants and does not tolerate client dropout. For more information, please refer to [Practical Secure Aggregation for Privacy-Preserving Machine Learning](https://eprint.iacr.org/2017/281.pdf).

> **_Warning:_** The SecureAggregator uses [numpy.random.PCG64](https://numpy.org/doc/stable/reference/random/bit_generators/pcg64.html#numpy.random.PCG64). Whether PCG is a CSPRNG is still under discussion (e.g. https://crypto.stackexchange.com/questions/77101/is-the-pcg-prng-a-csprng-or-why-not), so we take a conservative position until further security analysis is available. We therefore recommend that users use a standardized CSPRNG in industrial scenarios.
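As an illustration of the recommendation above, the sketch below draws mask values from the operating system's CSPRNG through Python's standard `secrets` module instead of PCG64. It is not part of SecretFlow's API: the helper `csprng_mask` and the modulus `R` are hypothetical, and a real protocol would also need both parties of a pair to derive the same mask from a shared key, which is not shown here.

```python
import secrets

import numpy as np

R = 2**32  # modulus; an assumed value for illustration


def csprng_mask(shape, modulus=R):
    """Draw a mask uniformly from [0, modulus) using the OS CSPRNG (hypothetical helper)."""
    size = int(np.prod(shape))
    # secrets.randbelow returns a uniformly random int in [0, modulus).
    values = [secrets.randbelow(modulus) for _ in range(size)]
    return np.array(values, dtype=np.int64).reshape(shape)


mask = csprng_mask((2, 3))
print(mask.shape, mask.dtype)
```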

In [8]:
# Create a secure aggregator instance with alice and bob,
# where alice is responsible for performing aggregate computing operations.
secure_aggr = sf.security.aggregation.SecureAggregator(device=alice, participants=[alice, bob])

In [9]:
# Sum the data.
sf.reveal(secure_aggr.sum([a, b], axis=0))

array([[1.28170395, 1.41788101, 0.89872742],
 [1.64424515, 1.34534454, 1.59486771]])

In [10]:
# Average the data.
sf.reveal(secure_aggr.average([a, b], axis=0))

array([[0.64085197, 0.70894051, 0.44936371],
 [0.82212257, 0.67267227, 0.79743385]])

### Plaintext aggregation (for test only, not recommended for production use)

**`PlainAggregator` is for testing only and must not be used in production.**

For simple local simulation, SecretFlow also provides a plaintext aggregator.

In [12]:
# Create a plaintext aggregator instance and alice is responsible for performing aggregation.
plain_aggr = sf.security.aggregation.PlainAggregator(alice)

In [13]:
# Sum the data.
sf.reveal(plain_aggr.sum([a, b], axis=0))

array([[1.2817066 , 1.4178827 , 0.89873016],
 [1.644249 , 1.3453479 , 1.594873 ]], dtype=float32)

In [14]:
# Average the data.
sf.reveal(plain_aggr.average([a, b], axis=0))

array([[0.6408533 , 0.70894134, 0.44936508],
 [0.8221245 , 0.67267394, 0.7974365 ]], dtype=float32)

## Comparison

In addition, SecretFlow provides several `Comparator` implementations that support operations such as maximum (max) and minimum (min).
For example, in a horizontally partitioned data scenario, global values can be obtained through secure comparison without exposing the private information of the participants.
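The horizontally partitioned case can be pictured with a small plaintext NumPy sketch. It only shows which values a comparator would combine and provides no privacy by itself; the row counts and variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Each party holds different rows of the same three columns.
alice_rows = rng.random((4, 3))
bob_rows = rng.random((5, 3))

# Each party first reduces its own rows to per-column statistics.
local_mins = [alice_rows.min(axis=0), bob_rows.min(axis=0)]
local_maxs = [alice_rows.max(axis=0), bob_rows.max(axis=0)]

# An element-wise comparison across parties yields the global values.
global_min = np.min(local_mins, axis=0)
global_max = np.max(local_maxs, axis=0)

# Sanity check against the (normally unavailable) pooled data.
all_rows = np.vstack([alice_rows, bob_rows])
print(np.array_equal(global_min, all_rows.min(axis=0)))
print(np.array_equal(global_max, all_rows.max(axis=0)))
```

With a secure comparator, the element-wise step would run on protected inputs, so each party's local statistics stay hidden from the others.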

### SPU-based secure comparison

SecretFlow implements SPU-based secure comparison, and the following shows how to use it.

In [15]:
# Create an spu comparator instance.
spu_com = sf.security.compare.SPUComparator(spu)

In [16]:
# Get the minimum.
sf.reveal(spu_com.min([a, b], axis=0))

array([[0.53867364, 0.69040346, 0.4262893 ],
 [0.7612894 , 0.5444343 , 0.7680543 ]], dtype=float32)

In [18]:
# Get the maximum.
sf.reveal(spu_com.max([a, b], axis=0))

array([[0.743033 , 0.7274792 , 0.4724409 ],
 [0.88295954, 0.8009136 , 0.8268186 ]], dtype=float32)

### Plaintext comparison (not recommended for production use)

**`PlainComparator` is for testing only and must not be used in production.**

For simple local simulation, SecretFlow also provides a plaintext comparator.

In [19]:
# Create a plaintext comparator instance and alice is responsible for performing the comparison.
plain_com = sf.security.compare.PlainComparator(alice)

In [21]:
# Get the minimum.
sf.reveal(plain_com.min([a, b], axis=0))

array([[0.53867364, 0.69040346, 0.4262893 ],
 [0.7612894 , 0.5444343 , 0.7680543 ]], dtype=float32)

In [22]:
# Get the maximum.
sf.reveal(plain_com.max([a, b], axis=0))

array([[0.743033 , 0.7274792 , 0.4724409 ],
 [0.88295954, 0.8009136 , 0.8268186 ]], dtype=float32)

## Ending

In [23]:
sf.shutdown()

## Summary

This article introduced secure aggregation in SecretFlow. SecretFlow provides several secure aggregation methods, and users can choose among them according to their own security requirements.
The plaintext aggregator and comparator are not recommended for use in a production environment.