Private Set Intersection(PSI) ============================= SecretFlow SPU implements the following PSI protocols, - Semi-honest ECDH-based two-party PSI protocol [HFH99]_ - Semi-honest ECDH-based three-party PSI protocol - Semi-honest OT-based two-party PSI protocol [KKRT16]_ - Semi-honest PCG/VOLE-based two-party PSI protocol (with improved communication efficiency) [BC22]_ - Semi-honest EC-OPRF based two-party Unbalanced PSI protocol - Differentially Private (DP) PSI Protocol [DP-PSI]_ As a general rule, OT-based PSI protocols are (significantly) faster but require more communication than Diffie-Hellman-based PSI protocols. In some scenarios, communication cost is overwhelmingly more important than computation cost. ECDH-PSI (2P) ------------- The semi-honest DH-PSI protocol is due to Huberman, Franklin, and Hogg [HFH99]_, but with roots as far back as Meadows [Mea86]_. It is a semi-honest protocol that requires exponentiations in a Diffie-Hellman group proportional to the number of items in the sets. DH-PSI protocol based on the Decisional Diffie-Hellman assumption: - Agree on a group G, with a generator g. - The assumption: for random a,b,c cannot distinguish :math:`(g^a, g^b, g^{ab})` from :math:`(g^a, g^b, g^c)` Several candidate groups are widely used, such as subgroups of the multiplication group of a finite field and elliptic curve groups. In practice, carefully chosen elliptic curves like Curve25519 [Ber06]_ offer a good balance between security and performance. .. figure:: ./resources/dh_psi.svg Note that at the beginning of ECDH-PSI protocol, we assume the input data from both Alice and Bob are shuffled. Protocol: 1. For each element :math:`x_i` in its set, Alice applies the hash function and then exponentiates it using its key :math:`\alpha`, thus computing :math:`{H(x_i)}^\alpha` . Alice sends :math:`{\{\,{H(x_i)}^\alpha\}\,}_{i=1}^{n_1}` to Bob. 2. For each element :math:`{H(x_i)}^\alpha` received from Alice in the previous step, Bob exponentiates it using its key :math:`\beta`, computing :math:`{H(x_i)}^{\alpha\beta}`. Bob sends :math:`{\{\,{H(x_i)}^{\alpha\beta}\}\,}_{i=1}^{n_1}` to Alice. 3. For each element :math:`y_i` in its set, Bob applies the hash function and then exponentiates it using its key :math:`\beta`, thus computing :math:`{H(y_i)}^\beta` . Bob sends the set :math:`{\,\{\,{H(y_i)}^\beta\}\,}_{i=1}^{n_2}` to Alice. 4. For each element :math:`{H(y_i)}^\beta` received from Bob in the previous step, Alice exponentiates it using its key :math:`\alpha`, computing :math:`{H(y_i)}^{\beta\alpha}` . 5. Alice compares two set :math:`{\{\,{H(x_i)}^{\alpha\beta}\}\,}_{i=1}^{n_1}` and :math:`{\{\,{H(y_i)}^{\beta\alpha}\}\,}_{i=1}^{n_2}` and gets intersection. The Elliptic Curve groups, supported in secretflow SPU PSI moudule. +-------------+------------------------+------------------------------------------------------+ | EC group | Reference | CryptoLib | +=============+========================+======================================================+ | Curve25519 | [Ber06]_ | `LibSoidum `_ | | | +------------------------------------------------------+ | | | [ipp-crypto]_ (Intel® CPU support AVX-512 IFMA) | +-------------+------------------------+------------------------------------------------------+ | Secp256k1 | [SEC2-v2]_ | `OpenSSL `_ | +-------------+------------------------+------------------------------------------------------+ | SM2 | GBT.32918.1-2016 | `OpenSSL `_ | | +------------------------+ | | | ISO/IEC 14888-3:2018 | | +-------------+------------------------+------------------------------------------------------+ | FourQ | [FourQ]_ | `FourQlib `_ | +-------------+------------------------+------------------------------------------------------+ ECDH-PSI (3P) ------------- We implement our own three-party PSI protocol based on ECDH. Note that our implementation has known leakage, please use at your own risk. Assume Alice, Bob, Charlie (receiver) want to perform 3P PSI, in addition to the final output, our protocol leaks the intersection size of Alice's data and Bob's data to Charlie. .. figure:: ./resources/dh_psi_3p.svg Note that at the beginning of ECDH-PSI protocol, we assume the input data from both Alice and Charlie are shuffled (It's not necessary to shuffle Bob's set). Protocol: 1. For i-th element in its set, Alice calculates :math:`H(x_i)^\alpha` and sends to Bob. 2. For i-th element, Bob calculates :math:`H(x_i)^{\alpha\beta}` and :math:`H(y_i)^\beta`, then random shuffles and sends them to Alice. 3. For i-th element, Alice calculates :math:`H(y_i)^{\alpha\beta}` and gets the intersection of :math:`H(x_i)^{\alpha\beta} \cap H(y_i)^{\alpha\beta}` (we denote the intersection as :math:`I^{\alpha\beta}`), then sends :math:`I^{\alpha\beta}` to Charlie. 4. For i-th element, Charlie sends :math:`H(z_i)^{\gamma}` to Bob, Bob calculates and sends to Alice :math:`H(z_i)^{\beta\gamma}`, finally Alice calculates and sends to Charlie :math:`H(z_i)^{\alpha\beta\gamma}`. 5. Charlie calculates :math:`I^{\alpha\beta\gamma}` and compares :math:`I^{\alpha\beta\gamma}` with :math:`H(z_i)^{\alpha\beta\gamma}`. KKRT16-PSI ---------- [KKRT16]_ is semi-honest OT-based PSI, based on OT Extension, BaRK-OPRF and CuckooHash. [KKRT16]_ is the first PSI protocol requiring only one minute for the case of larger sets ( :math:`2^{24}` items each) of long strings (128 bits). We use 3-way stash-less CuckooHash proposed in [PSZ18]_. .. figure:: ./resources/kkrt16_psi.png Protocol: 1. Sender and Receiver Agree on CuckooHash :math:`h_1,h_2,h_3: {\{0,1\}\,}^{*} \rightarrow [m]` 2. Receiver insert each x into bin :math:`h_1(x)`, :math:`h_2(x)` or :math:`h_3(x)` 3. Sender insert each y into bin :math:`h_1(y)`, :math:`h_2(y)` and :math:`h_3(y)` 4. Run BaRK-OPRF, Receiver get :math:`F_{s,k_i}(x)`,Sender get :math:`F_{s,k_i}(y)`, for :math:`bin_i` 5. Sender sends all :math:`{F_{s,k_i}(y)}` values to Receiver 6. Receiver compares two BaRK-OPRFs sets and gets intersection. BC22 PCG-PSI ------------ Pseudorandom Correlation Generator (PCG), is a primitive introduced in the work of Boyle et al. [BCG+19b]_, [BCGI18]_, [SGRR19]_, [BCG+19a]_, [CIK+20]_. The goal of PCG is to compress long sources of correlated randomness without violating security. Boyle et al. have designed multiple concretely efficient PCGs for specific correlations, such as vector oblivious linear evaluation (VOLE) or batch oblivious linear evaluation (BOLE). These primitives are at the heart of modern secure computation protocols with low communication overhead.The VOLE functionality allows a receiver to learn a secret linear combination of two vectors held by a sender and it was constructed (with sublinear communication) under variants of the syndrome decoding assumption. [BC22]_ use PCG speeding up private set intersection protocols, minimizing computation and communication. We implement semi-honest version psi in [BC22]_ and use PCG/VOLE from [WYKW21]_ . [BC22]_ PSI protocol require only 30 seconds for the case of larger sets ( :math:`2^{24}` items each) of long strings (128 bits), and reduce 1/3 communication than [KKRT16]_. .. figure:: ./resources/pcg_psi.svg 1. Sender and Receiver agree on :math:`(3,2)`-Generalized CuckooHash :math:`h_1,h_2: {\{0,1\}\,}^{*} \rightarrow [N]` 2. Receiver insert each x into bin :math:`h_1(x)` or :math:`h_2(x)` 3. Sender insert each y into bin :math:`h_1(y)` and :math:`h_2(y)` 4. Run PCG/VOLE from [WYKW21]_, :math:`w_i = \Delta * u_i + v_i`, Sender get :math:`w_i` and :math:`\Delta`, Receiver get :math:`u_i` and :math:`v_i`, for each :math:`bin_i` 5. Receiver send Masked Bin Polynomial Coefficients to Sender, and receive BaRK-OPRF values 6. Sender sends all BaRK-OPRF values for each :math:`{\{y_i\}\,}_{i=1}^{n_2}` to Receiver 7. Receiver compares two BaRK-OPRFs sets and gets intersection. Unbalanced PSI -------------- Ecdh-OPRF based PSI >>>>>>>>>>>>>>>>>>> [RA18]_ section 3 introduce Basic Unbalanced PSI(Ecdh-OPRF based) protocol proposed in [BBCD+11]_ that relaxes the security of the [JL10]_ to be secure against semi-honest adversaries. The protocol has two phases, the preprocessing phase and the online phase. The authors introduced many optimizations to push as much computation and communication cost to the preprocessing phase as possible An Oblivious Pseudorandom Function (OPRF) is a two-party protocol between client and server for computing the output of a Pseudorandom Function (PRF). [draft-irtf-cfrg-voprf-10]_ specifies OPRF, VOPRF, and POPRF protocols built upon prime-order groups. .. figure:: ./resources/ecdh_oprf_psi.png - Offline Phase 1. For each element :math:`y_i` in its set, Bob applies PRF using private key :math:`\beta`, i.e. computing :math:`H_2(y_i,{H_1(y_i)}^\beta)` . 2. Bob sends :math:`{\,\{H_2(y_i,{H_1(y_i)}^\beta)\}\,}_{i=1}^{n_2}` to Alice in shuffled order - Online Phase 1. For each element :math:`x_i` in its set, Alice applies the hash function and then exponentiates it using its blind key :math:`r_i`, thus computing :math:`{H_1(x_i)}^{r_i}`. Alice sends :math:`{\,\{\,{H_1(x_i)}^{r_i}\,\}\,}_{i=1}^{n_1}` to Bob. 2. For each element :math:`H_1(x_i)^{r_i}` received from Alice in the previous step, Bob exponentiates it using its key :math:`\beta`, computing :math:`{H_1(x_i)}^{r_i\beta}`. Bob sends :math:`{\,\{\,{H_1(x_i)}^{\,{r_i}\,\beta}\,\}\,}_{i=1}^{n_1}` to Alice. 3. Alice receive :math:`{\,\{\,{H_1(x_i)}^{r_i\beta}\}\,}_{i=1}^{n_1}` from Bob, and unblind it use :math:`r_i`, Get :math:`{\,\{\,{H_1(x_i)}^\beta\}\,}_{i=1}^{n_1}`, compute OPRF :math:`{\,\{H_2(x_i,{H_1(x_i)}^\beta)\}\,}_{i=1}^{n_1}`. 4. Alice compares two sets :math:`{\,\{H_2(x_i,{H_1(x_i)}^\beta)\}\,}_{i=1}^{n_1}` and :math:`{\,\{H_2(y_i,{H_1(y_i)}^\beta)\}\,}_{i=1}^{n_2}` and gets intersection. Differentially Private PSI -------------------------- We also implement a Differentially Private (DP) Private Set Intersection (PSI) Protocol. Our implementaion bases on ECDH-PSI, and provides: - Differentially private PSI results. This feature is currently under test, please use at your own risk! Why PSI with differentially private results? If we want a scheme that protects both the private inputs and output privacy, an ideal way is to use `circuit PSI`, which is a typical PSI variant that allows secure computation (e.g. MPC or HE) on the PSI result without revealing it. `PSTY19 `_ However those protocols are expensive in terms of efficiency. DP-PSI is a way of utilising the up-sampling and sub-sampling mechanism to add calibrated noises to the PSI results, without revealing its concise value. The protocol is listed below, assume Alice has a (hashed and shuffled) set :math:`X` and Bob has a (hashed and shuffled) :math:`Y`. .. figure:: ./resources/dp_psi.png Note that we use encrypt to denote the process of calculating :math:`y\gets x^a`. Protocol: 1. Alice and Bob first encrypts their own dataset, and gets :math:`X^a` and :math:`Y^b` separately. 2. Alice sends :math:`X^a` to Bob. 3. Bob performs random subsampling on :math:`Y^b`, gets :math:`Y_*^b` and send to Alice. In the meantime, on receiving :math:`X^a` from Alice, Bob reencrypts it with :math:`b`, gets :math:`X^{ab}`. Then it samples a random permutation :math:`\pi` to permute Alice's set, and send permuted :math:`\pi(X^{ab})` back to Alice. 4. On receving :math:`Y_*^b` and :math:`\pi(X^{ab})` from Bob, Alice reencrypts :math:`Y_*^b` and gets :math:`Y_*^{ab}`, then calculate the intersection :math:`I_*^{ab}\gets\pi(X^{ab})\cap Y_*^{ab}`. 5. Alice randomly subsamples the intersection, gets :math:`I_{**}^{ab}`, and then find-out their corresponding index in :math:`Y_*^b`. Then randomly add non-intersection index to this set. 6. Alice sends the index set to Bob, then Bob reveal the final results. In the end, this scheme ensures that the receiver (Bob) only learns the noised intersection, without the ability of pointing out whether an element is in the actual set intersection or not. Note that multiple invocation of DP-PSI inevitably weaken the privacy protection, therefore, we strongly suggest that user should implement a protection mechanism to prevent multiple DP-PSI executions on the same input value. +---------------------------+--------+---------+---------+---------+-----------+ | Intel(R) Xeon(R) Platinum | 2^20 | 2^21 | 2^22 | 2^23 | 2^24 | +===========================+========+=========+=========+=========+===========+ | DP-PSI | 9.806s | 20.134s | 42.067s | 86.580s | 170.359s | +---------------------------+--------+---------+---------+---------+-----------+ For DP, our default privacy protection strength is :math:`\epsilon=3`. For more details, please refer to the original paper: [DP-PSI]_ Tutorial -------- Please check :ref:`/tutorial/PSI_On_SPU.ipynb` for details. Reference ------------ .. [BBCD+11] Baldi, P., Baronio, R., Cristofaro, E.D., Gasti, P., Tsudik, G.: Countering GATTACA: Efficient and Secure Testing of Fully-sequenced Human Genomes. In: ACM Conference on Computer and Communications Security. pp. 691–702. ACM (2011) .. [BCGI18] E. Boyle, G. Couteau, N. Gilboa, and Y. Ishai. Compressing vector OLE. In ACM CCS 2018, pages 896-912. ACM Press, October 2018. .. [BCG+19a] E. Boyle, G. Couteau, N. Gilboa, Y. Ishai, L. Kohl, P. Rindal, and P. Scholl. Efficient two-round OT extension and silent non-interactive secure computation. In ACM CCS 2019, pages 291–308. ACM Press, November 2019. .. [BCG+19b] E. Boyle, G. Couteau, N. Gilboa, Y. Ishai, L. Kohl, P. Rindal, and P. Scholl. Efficient two-round OT extension and silent non-interactive secure computation. In ACM CCS 2019, pages 291–308. ACM Press, November 2019. .. [BC22] Private Set Intersection from Pseudorandom Correlation Generators .. [Ber06] Daniel J. Bernstein. Curve25519: new diffie-hellman speed records. In In Public Key Cryptography (PKC), Springer-Verlag LNCS 3958, page 2006, 2006. (Cited on page 4.) .. [CIK+20] G. Couteau, Y. Ishai, L. Kohl, E. Boyle, P. Scholl, and N. Gilboa. Efficient pseudorandom correlation generators from ring-lpn. Springer-Verlag, 2020. .. [DP-PSI] Differentially-Private PSI https://arxiv.org/pdf/2208.13249.pdf .. [FourQ] Costello, C., Longa, P.: Fourq: four-dimensional decompositions on a q-curve over the mersenne prime. Cryptology ePrint Archive, Report 2015/565 (2015), https://eprint.iacr.org/2015/565 .. [HFH99] Bernardo A. Huberman, Matt Franklin, and Tad Hogg. Enhancing privacy and trust in electronic communities. In ACM CONFERENCE ON ELECTRONIC COMMERCE. ACM, 1999. .. [ipp-crypto] https://github.com/intel/ipp-crypto/ .. [JL10] Jarecki, S., Liu, X.: Fast Secure Computation of Set Intersection. In: SCN. LNCS, vol. 6280, pp. 418–435. Springer (2010) .. [KKRT16] V. Kolesnikov, R. Kumaresan, M. Rosulek, and N. Trieu. Efficient batched oblivious PRF with applications to private set intersection. In ACM CCS 2016, pages 818-829. ACM Press, October 2016. .. [Mea86] C. Meadows. A more efficient cryptographic matchmaking protocol for use in the absence of a continuously available third party. In 1986 IEEE Symposium on Security and Privacy, pages 134-134, April 1986. .. [PSZ18] B. Pinkas, T. Schneider, and M. Zohner. Scalable private set intersection based on ot extension. ACM Transactions on Privacy and Security (TOPS), 21(2):1-35, 2018. .. [RA18] Resende, A.C.D., Aranha, D.F.: Faster unbalanced private set intersection. In: Meiklejohn, S., Sako, K. (eds.) FC2018. LNCS, vol. 10957, pp. 203{221. Springer, Heidelberg (Feb / Mar 2018) .. [SEC2-v2] Standards for Efficient Cryptography (SEC) .. [SGRR19] P. Schoppmann, A. Gascón, L. Reichert, and M. Raykova. Distributed vector-OLE: Improved constructions and implementation. In ACM CCS 2019, pages 1055-1072. ACM Press, November 2019. .. [WYKW21] C. Weng, K. Yang, J. Katz, and X. Wang. Wolverine: fast, scalable, and communication-efficient zero-knowledge proofs for boolean and arithmetic circuits. In 2021 IEEE Symposium on Security and Privacy (SP), pages 1074-1091. IEEE, 2021. .. [draft-irtf-cfrg-voprf-10] Oblivious Pseudorandom Functions (OPRFs) using Prime-Order Groups. https://www.ietf.org/archive/id/draft-irtf-cfrg-voprf-10.html