Mix Federated Learning - Logistic Regression#
The following code is a demo only. It is NOT suitable for production due to system security concerns; please DO NOT use it directly in production.
This tutorial demonstrates how to train a Logistic Regression model on mix partitioned data.
The following is an example of mix partitioned data.

SecretFlow supports Logistic Regression on mix partitioned data. The algorithm combines Homomorphic Encryption and secure aggregation for better security. For more details, please refer to Federated Logistic Regression.
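To build intuition for the secure aggregation part, here is a minimal, self-contained numpy sketch of additive masking: each party perturbs its local update with random masks that cancel out in the sum, so the aggregator only learns the aggregate. This is a toy illustration of the idea only, not the SecretFlow implementation.
[ ]:
import numpy as np

rng = np.random.default_rng(0)
# Three parties' local gradient updates (toy values).
local_updates = [rng.normal(size=3) for _ in range(3)]

# Pairwise random masks shared between party i and party j (j > i).
masks = {(i, j): rng.normal(size=3) for i in range(3) for j in range(i + 1, 3)}

masked_updates = []
for i, update in enumerate(local_updates):
    masked = update.copy()
    for (a, b), mask in masks.items():
        if a == i:
            masked += mask  # the lower-indexed party adds the mask
        elif b == i:
            masked -= mask  # the higher-indexed party subtracts it
    masked_updates.append(masked)

# The aggregator sums the masked updates; all masks cancel out,
# so only the aggregate of the true updates is revealed.
assert np.allclose(np.sum(masked_updates, axis=0), np.sum(local_updates, axis=0))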
Initialization#
Init secretflow.
[ ]:
import secretflow as sf

# In case you already have a running secretflow runtime.
sf.shutdown()

sf.init(['alice', 'bob', 'carol', 'dave', 'eric'], address='local', num_cpus=64)

alice, bob, carol, dave, eric = (
    sf.PYU('alice'),
    sf.PYU('bob'),
    sf.PYU('carol'),
    sf.PYU('dave'),
    sf.PYU('eric'),
)
Data Preparation#
We use the breast cancer dataset from sklearn.
Let us build a mix partitioned data with this dataset. The partitions are as follows:
| label | feature1~feature10 | feature11~feature20 | feature21~feature30 |
|---|---|---|---|
| alice_y0 | alice_x0 | bob_x0 | carol_x |
| alice_y1 | alice_x1 | bob_x1 | dave_x |
| alice_y2 | alice_x2 | bob_x2 | eric_x |
Alice holds all labels and features 1~10, Bob holds all features 11~20, while Carol, Dave, and Eric each hold a part of features 21~30.
[2]:
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# Load the breast cancer dataset and standardize the features.
features, label = load_breast_cancer(return_X_y=True, as_frame=True)
features.iloc[:, :] = StandardScaler().fit_transform(features)
label = label.to_frame()

# Split the 30 features into three groups of 10.
feat_list = [
    features.iloc[:, :10],
    features.iloc[:, 10:20],
    features.iloc[:, 20:],
]

# Split the samples into three row blocks to build the mix partitions.
alice_y0, alice_y1, alice_y2 = label.iloc[0:200], label.iloc[200:400], label.iloc[400:]

alice_x0, alice_x1, alice_x2 = (
    feat_list[0].iloc[0:200, :],
    feat_list[0].iloc[200:400, :],
    feat_list[0].iloc[400:, :],
)
bob_x0, bob_x1, bob_x2 = (
    feat_list[1].iloc[0:200, :],
    feat_list[1].iloc[200:400, :],
    feat_list[1].iloc[400:, :],
)
carol_x, dave_x, eric_x = (
    feat_list[2].iloc[0:200, :],
    feat_list[2].iloc[200:400, :],
    feat_list[2].iloc[400:, :],
)
[3]:
import tempfile

tmp_dir = tempfile.mkdtemp()


def filepath(filename):
    return f'{tmp_dir}/{filename}'


alice_y0_file, alice_y1_file, alice_y2_file = (
    filepath('alice_y0'),
    filepath('alice_y1'),
    filepath('alice_y2'),
)
alice_x0_file, alice_x1_file, alice_x2_file = (
    filepath('alice_x0'),
    filepath('alice_x1'),
    filepath('alice_x2'),
)
bob_x0_file, bob_x1_file, bob_x2_file = (
    filepath('bob_x0'),
    filepath('bob_x1'),
    filepath('bob_x2'),
)
carol_x_file, dave_x_file, eric_x_file = (
    filepath('carol_x'),
    filepath('dave_x'),
    filepath('eric_x'),
)

alice_x0.to_csv(alice_x0_file, index=False)
alice_x1.to_csv(alice_x1_file, index=False)
alice_x2.to_csv(alice_x2_file, index=False)

bob_x0.to_csv(bob_x0_file, index=False)
bob_x1.to_csv(bob_x1_file, index=False)
bob_x2.to_csv(bob_x2_file, index=False)

carol_x.to_csv(carol_x_file, index=False)
dave_x.to_csv(dave_x_file, index=False)
eric_x.to_csv(eric_x_file, index=False)

alice_y0.to_csv(alice_y0_file, index=False)
alice_y1.to_csv(alice_y1_file, index=False)
alice_y2.to_csv(alice_y2_file, index=False)
Now create MixDataFrame x and y for further usage. A MixDataFrame is a list of HDataFrames or VDataFrames; you can read SecretFlow's DataFrame documentation for more details.
[4]:
vdf_x0 = sf.data.vertical.read_csv(
    {alice: alice_x0_file, bob: bob_x0_file, carol: carol_x_file}
)
vdf_x1 = sf.data.vertical.read_csv(
    {alice: alice_x1_file, bob: bob_x1_file, dave: dave_x_file}
)
vdf_x2 = sf.data.vertical.read_csv(
    {alice: alice_x2_file, bob: bob_x2_file, eric: eric_x_file}
)
vdf_y0 = sf.data.vertical.read_csv({alice: alice_y0_file})
vdf_y1 = sf.data.vertical.read_csv({alice: alice_y1_file})
vdf_y2 = sf.data.vertical.read_csv({alice: alice_y2_file})

x = sf.data.mix.MixDataFrame(partitions=[vdf_x0, vdf_x1, vdf_x2])
y = sf.data.mix.MixDataFrame(partitions=[vdf_y0, vdf_y1, vdf_y2])
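As a quick sanity check, you can confirm that x and y each hold three partitions. This sketch assumes MixDataFrame exposes the partition list via a partitions attribute (mirroring the constructor argument above); the exact attribute may vary across SecretFlow versions.
[ ]:
# Hypothetical sanity check; `partitions` mirrors the constructor argument.
print(len(x.partitions), len(y.partitions))  # expected: 3 3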
Training#
Construct HEU and SecureAggregator for further training.
[5]:
from typing import List
from secretflow.security.aggregation import SecureAggregator
import spu


def heu_config(sk_keeper: str, evaluators: List[str]):
    return {
        'sk_keeper': {'party': sk_keeper},
        'evaluators': [{'party': evaluator} for evaluator in evaluators],
        'mode': 'PHEU',
        'he_parameters': {
            'schema': 'paillier',
            'key_pair': {'generate': {'bit_size': 2048}},
        },
    }


heu0 = sf.HEU(heu_config('alice', ['bob', 'carol']), spu.spu_pb2.FM128)
heu1 = sf.HEU(heu_config('alice', ['bob', 'dave']), spu.spu_pb2.FM128)
heu2 = sf.HEU(heu_config('alice', ['bob', 'eric']), spu.spu_pb2.FM128)

aggregator0 = SecureAggregator(alice, [alice, bob, carol])
aggregator1 = SecureAggregator(alice, [alice, bob, dave])
aggregator2 = SecureAggregator(alice, [alice, bob, eric])
Fit the model.
[6]:
import logging

logging.root.setLevel(level=logging.INFO)

from secretflow.ml.linear import FlLogisticRegressionMix

model = FlLogisticRegressionMix()

model.fit(
    x,
    y,
    batch_size=64,
    epochs=3,
    learning_rate=0.1,
    aggregators=[aggregator0, aggregator1, aggregator2],
    heus=[heu0, heu1, heu2],
)
INFO:root:MixLr epoch 0: loss = 0.22200048124132613
INFO:root:MixLr epoch 1: loss = 0.10997288443236536
INFO:root:MixLr epoch 2: loss = 0.08508413270494121
INFO:root:MixLr epoch 3: loss = 0.07325763613227645
Prediction#
Let us do predictions with the model.
[7]:
import numpy as np
from sklearn.metrics import roc_auc_score
y_pred = np.concatenate(sf.reveal(model.predict(x)))
auc = roc_auc_score(label.values, y_pred)
acc = np.mean((y_pred > 0.5) == label.values)
print('auc:', auc, ', acc:', acc)
auc: 0.9875535119708261 , acc: 0.9384885764499121
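If you want a fuller breakdown of classification quality, a standard sklearn report can be computed from the same revealed predictions. This is plain sklearn usage and independent of the SecretFlow API:
[ ]:
from sklearn.metrics import classification_report

# y_pred holds probabilities; threshold at 0.5 to obtain hard class labels.
print(classification_report(label.values.ravel(), (y_pred > 0.5).astype(int).ravel()))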
The end#
Clean temp files.
[8]:
import shutil
shutil.rmtree(tmp_dir, ignore_errors=True)