{ "cells": [ { "cell_type": "markdown", "id": "a07356b3-744a-4319-9fec-cd62f37fa865", "metadata": {}, "source": [ "# Data Preprocessing with DataFrame" ] }, { "cell_type": "markdown", "id": "7d72fa78", "metadata": {}, "source": [ ">The following code is for demonstration only. It is **NOT for production** because of system security concerns; please **DO NOT** use it directly in production." ] }, { "cell_type": "markdown", "id": "e89e9dcf", "metadata": {}, "source": [ "It is recommended to use [Jupyter](https://jupyter.org/) to run this tutorial." ] }, { "cell_type": "markdown", "id": "8ff05a38-2211-4240-a9db-0d79c813ab99", "metadata": {}, "source": [ "SecretFlow provides a variety of tools for preprocessing data." ] }, { "cell_type": "markdown", "id": "25ec1569-f9f7-4f27-90b8-a6c7feab28e2", "metadata": { "tags": [] }, "source": [ "## Preparation\n", "\n", "Initialize SecretFlow and create two parties, alice and bob." ] }, { "cell_type": "markdown", "id": "83e1596d-8ca1-40ae-9681-7254c563ff7e", "metadata": {}, "source": [ "> 💡 Before using the preprocessing tools, you may want to get familiar with SecretFlow's [DataFrame](../components/preprocessing/DataFrame.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "id": "9ad74320-2c3a-4c86-aea4-6688d96d2230", "metadata": {}, "outputs": [], "source": [ "import secretflow as sf\n", "\n", "# Shut down any SecretFlow runtime that is already running.\n", "sf.shutdown()\n", "\n", "# Start a local runtime with two parties, then create one PYU device per party.\n", "sf.init(['alice', 'bob'], address='local')\n", "alice = sf.PYU('alice')\n", "bob = sf.PYU('bob')" ] }, { "cell_type": "markdown", "id": "94c83c7b-417a-4772-9de1-2efc589cd89f", "metadata": { "tags": [] }, "source": [ "## Data Preparation" ] }, { "cell_type": "markdown", "id": "86168ad6-2fe0-4410-b59c-fd65cbe8ea9b", "metadata": {}, "source": [ "Here we use the [iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) dataset as example data." 
] }, { "cell_type": "code", "execution_count": 2, "id": "9d7d70b8-2d12-40c0-891e-d42cbd567cab", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | NaN | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |

150 rows × 5 columns
\n", "