{ "cells": [ { "cell_type": "markdown", "id": "eb4f5fec-1a5c-4b6f-b583-fb369472e94b", "metadata": {}, "source": [ "# PSI On SPU" ] }, { "cell_type": "markdown", "id": "4172c7c4", "metadata": {}, "source": [ ">The following code is for demonstration only. It is **NOT for production** use due to system security concerns; please **DO NOT** use it directly in production." ] }, { "cell_type": "markdown", "id": "f9b632d8-b12f-44a1-8a75-5d9c0a704a38", "metadata": {}, "source": [ "PSI ([Private Set Intersection](https://en.wikipedia.org/wiki/Private_set_intersection)) is a cryptographic technique that allows two parties holding sets to compare encrypted versions of these sets in order to compute the intersection. In this scenario, neither party reveals anything to the counterparty except for the elements in the intersection.\n", "\n", "In SecretFlow, the SPU device supports three PSI protocols:\n", "\n", "- [ECDH](https://ieeexplore.ieee.org/document/6234849/): semi-honest, based on public key encryption, suitable for small datasets.\n", "- [KKRT](https://eprint.iacr.org/2016/799.pdf): semi-honest, based on cuckoo hashing and OT extension, suitable for large datasets.\n", "- [BC22PCG](https://eprint.iacr.org/2022/334): semi-honest, PSI from pseudorandom correlation generators.\n", "\n", "Before we start, we need to initialize the environment. The following code creates three nodes, `alice`, `bob`, and `carol`, on a single machine to simulate multiple participants." 
] }, { "cell_type": "code", "execution_count": 1, "id": "3d7c4fa2-ea20-4e0d-b1ad-648cce23e729", "metadata": {}, "outputs": [], "source": [ "import secretflow as sf\n", "\n", "# Shut down any SecretFlow runtime that is already running.\n", "sf.shutdown()\n", "\n", "sf.init(['alice', 'bob', 'carol'], address='local')" ] }, { "cell_type": "markdown", "id": "00a798bd", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "5ed0a08b-3aa4-4fa6-9e1d-0caba207bdf5", "metadata": {}, "source": [ "## Preparing dataset" ] }, { "cell_type": "markdown", "id": "b4c16f07-1c67-4bad-af70-d8a4fe9266f3", "metadata": {}, "source": [ "First, we need a dataset for constructing a vertically partitioned scenario. For simplicity, we use the [iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) dataset here. We add two columns to it for the subsequent single-column and multi-column intersection demonstrations:\n", "\n", "- uid: unique sample ID.\n", "- month: simulates a scenario where samples are generated monthly. The first 50% of the samples are generated in January and the last 50% in February." ] }, { "cell_type": "code", "execution_count": 2, "id": "31f0a010-0a2e-4ee2-996a-169d7cb2731d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"|     | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | uid | month |\n",
"| --- | --- | --- | --- | --- | --- | --- |\n",
"| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 | Jan |\n",
"| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 1 | Jan |\n",
"| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 2 | Jan |\n",
"| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 3 | Jan |\n",
"| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 4 | Jan |\n",
"| ... | ... | ... | ... | ... | ... | ... |\n",
"| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 145 | Feb |\n",
"| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 146 | Feb |\n",
"| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 147 | Feb |\n",
"| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 148 | Feb |\n",
"| 149 | 5.9 | 3.0 | 5.1 | 1.8 | 149 | Feb |\n",
"\n",
"150 rows × 6 columns
\n",
"\n",
"|     | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | uid | month |\n",
"| --- | --- | --- | --- | --- | --- | --- |\n",
"| 0 | 6.3 | 3.3 | 6.0 | 2.5 | 100 | Feb |\n",
"| 1 | 5.8 | 2.7 | 5.1 | 1.9 | 101 | Feb |\n",
"| 2 | 7.1 | 3.0 | 5.9 | 2.1 | 102 | Feb |\n",
"| 3 | 6.3 | 2.9 | 5.6 | 1.8 | 103 | Feb |\n",
"| 4 | 6.5 | 3.0 | 5.8 | 2.2 | 104 | Feb |\n",
"| ... | ... | ... | ... | ... | ... | ... |\n",
"| 101 | 5.6 | 2.7 | 4.2 | 1.3 | 94 | Feb |\n",
"| 102 | 5.7 | 2.9 | 4.2 | 1.3 | 96 | Feb |\n",
"| 103 | 6.2 | 2.9 | 4.3 | 1.3 | 97 | Feb |\n",
"| 104 | 5.1 | 2.5 | 3.0 | 1.1 | 98 | Feb |\n",
"| 105 | 5.7 | 2.8 | 4.1 | 1.3 | 99 | Feb |\n",
"\n",
"106 rows × 6 columns
\n", "