Mix Federated Learning - Logistic Regression#
The following code is a demo only. It is NOT suitable for production due to system security concerns; please DO NOT use it directly in production.
This tutorial demonstrates how to train a Logistic Regression model on mix partitioned data.
The following is an example of mix partitioned data.

SecretFlow supports Logistic Regression on mix partitioned data. The algorithm combines Homomorphic Encryption and secure aggregation for better security. For more details, please refer to Federated Logistic Regression.
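To build intuition for the secure aggregation part, here is a minimal, self-contained numpy sketch of additive masking: each party perturbs its local update with random masks that cancel out in the sum, so the aggregator only learns the aggregate. This is a toy illustration of the idea only, not the SecretFlow implementation.
[ ]:
import numpy as np

rng = np.random.default_rng(0)
# Three parties' local gradient updates (toy values).
local_updates = [rng.normal(size=3) for _ in range(3)]

# Pairwise random masks shared between party i and party j (j > i).
masks = {(i, j): rng.normal(size=3) for i in range(3) for j in range(i + 1, 3)}

masked_updates = []
for i, update in enumerate(local_updates):
    masked = update.copy()
    for (a, b), mask in masks.items():
        if a == i:
            masked += mask  # the lower-indexed party adds the mask
        elif b == i:
            masked -= mask  # the higher-indexed party subtracts it
    masked_updates.append(masked)

# The aggregator sums the masked updates; all masks cancel out,
# so only the aggregate of the true updates is revealed.
assert np.allclose(np.sum(masked_updates, axis=0), np.sum(local_updates, axis=0))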
Initialization#
Init secretflow.
[ ]:
import secretflow as sf

# In case you already have a running secretflow runtime.
sf.shutdown()

sf.init(['alice', 'bob', 'carol', 'dave', 'eric'], address='local', num_cpus=64)

alice, bob, carol, dave, eric = (
    sf.PYU('alice'),
    sf.PYU('bob'),
    sf.PYU('carol'),
    sf.PYU('dave'),
    sf.PYU('eric'),
)
Data Preparation#
We use the breast cancer dataset from sklearn.
Let us build a mix partitioned data with this dataset. The partitions are as follows:
| label | feature1~feature10 | feature11~feature20 | feature21~feature30 |
|---|---|---|---|
| alice_y0 | alice_x0 | bob_x0 | carol_x |
| alice_y1 | alice_x1 | bob_x1 | dave_x |
| alice_y2 | alice_x2 | bob_x2 | eric_x |
Alice holds all labels and features 1~10, Bob holds all features 11~20, while Carol, Dave, and Eric each hold a part of features 21~30.
[2]:
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# Load the breast cancer dataset and standardize the features.
features, label = load_breast_cancer(return_X_y=True, as_frame=True)
features.iloc[:, :] = StandardScaler().fit_transform(features)
label = label.to_frame()

# Split the 30 features into three groups of 10.
feat_list = [
    features.iloc[:, :10],
    features.iloc[:, 10:20],
    features.iloc[:, 20:],
]

# Split the samples into three row blocks to build the mix partitions.
alice_y0, alice_y1, alice_y2 = label.iloc[0:200], label.iloc[200:400], label.iloc[400:]

alice_x0, alice_x1, alice_x2 = (
    feat_list[0].iloc[0:200, :],
    feat_list[0].iloc[200:400, :],
    feat_list[0].iloc[400:, :],
)
bob_x0, bob_x1, bob_x2 = (
    feat_list[1].iloc[0:200, :],
    feat_list[1].iloc[200:400, :],
    feat_list[1].iloc[400:, :],
)
carol_x, dave_x, eric_x = (
    feat_list[2].iloc[0:200, :],
    feat_list[2].iloc[200:400, :],
    feat_list[2].iloc[400:, :],
)
[3]:
import tempfile

tmp_dir = tempfile.mkdtemp()


def filepath(filename):
    return f'{tmp_dir}/{filename}'


alice_y0_file, alice_y1_file, alice_y2_file = (
    filepath('alice_y0'),
    filepath('alice_y1'),
    filepath('alice_y2'),
)
alice_x0_file, alice_x1_file, alice_x2_file = (
    filepath('alice_x0'),
    filepath('alice_x1'),
    filepath('alice_x2'),
)
bob_x0_file, bob_x1_file, bob_x2_file = (
    filepath('bob_x0'),
    filepath('bob_x1'),
    filepath('bob_x2'),
)
carol_x_file, dave_x_file, eric_x_file = (
    filepath('carol_x'),
    filepath('dave_x'),
    filepath('eric_x'),
)

alice_x0.to_csv(alice_x0_file, index=False)
alice_x1.to_csv(alice_x1_file, index=False)
alice_x2.to_csv(alice_x2_file, index=False)

bob_x0.to_csv(bob_x0_file, index=False)
bob_x1.to_csv(bob_x1_file, index=False)
bob_x2.to_csv(bob_x2_file, index=False)

carol_x.to_csv(carol_x_file, index=False)
dave_x.to_csv(dave_x_file, index=False)
eric_x.to_csv(eric_x_file, index=False)

alice_y0.to_csv(alice_y0_file, index=False)
alice_y1.to_csv(alice_y1_file, index=False)
alice_y2.to_csv(alice_y2_file, index=False)
Now create MixDataFrame x and y for further usage. A MixDataFrame is a list of HDataFrames or VDataFrames; you can read SecretFlow's DataFrame documentation for more details.
[4]:
vdf_x0 = sf.data.vertical.read_csv(
    {alice: alice_x0_file, bob: bob_x0_file, carol: carol_x_file}
)
vdf_x1 = sf.data.vertical.read_csv(
    {alice: alice_x1_file, bob: bob_x1_file, dave: dave_x_file}
)
vdf_x2 = sf.data.vertical.read_csv(
    {alice: alice_x2_file, bob: bob_x2_file, eric: eric_x_file}
)
vdf_y0 = sf.data.vertical.read_csv({alice: alice_y0_file})
vdf_y1 = sf.data.vertical.read_csv({alice: alice_y1_file})
vdf_y2 = sf.data.vertical.read_csv({alice: alice_y2_file})

x = sf.data.mix.MixDataFrame(partitions=[vdf_x0, vdf_x1, vdf_x2])
y = sf.data.mix.MixDataFrame(partitions=[vdf_y0, vdf_y1, vdf_y2])
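As a quick sanity check, you can confirm that x and y each hold three partitions. This sketch assumes MixDataFrame exposes the partition list via a partitions attribute (mirroring the constructor argument above); the exact attribute may vary across SecretFlow versions.
[ ]:
# Hypothetical sanity check; `partitions` mirrors the constructor argument.
print(len(x.partitions), len(y.partitions))  # expected: 3 3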
Training#
Construct HEU and SecureAggregator for further training.
[5]:
from typing import List
from secretflow.security.aggregation import SecureAggregator
import spu


def heu_config(sk_keeper: str, evaluators: List[str]):
    return {
        'sk_keeper': {'party': sk_keeper},
        'evaluators': [{'party': evaluator} for evaluator in evaluators],
        'mode': 'PHEU',
        'he_parameters': {
            'schema': 'paillier',
            'key_pair': {'generate': {'bit_size': 2048}},
        },
    }


heu0 = sf.HEU(heu_config('alice', ['bob', 'carol']), spu.spu_pb2.FM128)
heu1 = sf.HEU(heu_config('alice', ['bob', 'dave']), spu.spu_pb2.FM128)
heu2 = sf.HEU(heu_config('alice', ['bob', 'eric']), spu.spu_pb2.FM128)

aggregator0 = SecureAggregator(alice, [alice, bob, carol])
aggregator1 = SecureAggregator(alice, [alice, bob, dave])
aggregator2 = SecureAggregator(alice, [alice, bob, eric])
Fit the model.
[6]:
import logging

logging.root.setLevel(level=logging.INFO)

from secretflow.ml.linear import FlLogisticRegressionMix

model = FlLogisticRegressionMix()

model.fit(
    x,
    y,
    batch_size=64,
    epochs=3,
    learning_rate=0.1,
    aggregators=[aggregator0, aggregator1, aggregator2],
    heus=[heu0, heu1, heu2],
)
INFO:root:MixLr epoch 0: loss = 0.22200048124132613
INFO:root:MixLr epoch 1: loss = 0.10997288443236536
INFO:root:MixLr epoch 2: loss = 0.08508413270494121
INFO:root:MixLr epoch 3: loss = 0.07325763613227645
Prediction#
Let us do predictions with the model.
[7]:
import numpy as np
from sklearn.metrics import roc_auc_score
y_pred = np.concatenate(sf.reveal(model.predict(x)))
auc = roc_auc_score(label.values, y_pred)
acc = np.mean((y_pred > 0.5) == label.values)
print('auc:', auc, ', acc:', acc)
auc: 0.9875535119708261 , acc: 0.9384885764499121
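If you want a fuller breakdown of classification quality, a standard sklearn report can be computed from the same revealed predictions. This is plain sklearn usage and independent of the SecretFlow API:
[ ]:
from sklearn.metrics import classification_report

# y_pred holds probabilities; threshold at 0.5 to obtain hard class labels.
print(classification_report(label.values.ravel(), (y_pred > 0.5).astype(int).ravel()))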
The end#
Clean temp files.
[8]:
import shutil
shutil.rmtree(tmp_dir, ignore_errors=True)