{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "9d11321e", "metadata": {}, "source": [ "# Horizontally Federated XGBoost" ] }, { "cell_type": "markdown", "id": "079c8a84", "metadata": {}, "source": [ ">The following codes are demos only. It's **NOT for production** due to system security concerns, please **DO NOT** use it directly in production." ] }, { "cell_type": "markdown", "id": "89135d8c", "metadata": {}, "source": [ "In this tutorial, we will learn how to use `SecretFlow` to train tree models for horizontal federation. Secretflow provides `tree modeling` capabilities for horizontal scenarios(`SFXgboost`), The usage of `SFXgboost` is similar to `XGBoost`, you can easily convert your existing XGBoost program into a federated model for `SecretFlow`." ] }, { "cell_type": "markdown", "id": "a7190826", "metadata": {}, "source": [ "## Xgboost" ] }, { "cell_type": "markdown", "id": "19cb8b99", "metadata": {}, "source": [ "XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework\n", "\n", "official tutorial [XGBoost tutorials](https://xgboost.readthedocs.io/en/latest/tutorials/index.html)\n" ] }, { "cell_type": "markdown", "id": "dafa8ea9", "metadata": {}, "source": [ "### prepare secretflow devices " ] }, { "cell_type": "code", "execution_count": 1, "id": "c40b050a", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-08-19 13:47:10.795519: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n" ] } ], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "\n", "import secretflow as sf\n", "# In case you have a running secretflow runtime already.\n", "sf.shutdown()\n", "\n", "sf.init(['alice', 'bob', 'charlie'], address='local')\n", "alice, bob, charlie = sf.PYU('alice'), sf.PYU('bob'), sf.PYU('charlie')" ] }, { "cell_type": "markdown", "id": "b5295255", "metadata": {}, "source": [ "### XGBoost Example" ] }, { "cell_type": "code", "execution_count": 2, "id": "7ca2f164", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/fengjun.feng/miniconda3/envs/py3.8/lib/python3.8/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", " from pandas import MultiIndex, Int64Index\n", "/home/fengjun.feng/miniconda3/envs/py3.8/lib/python3.8/site-packages/xgboost/data.py:262: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", " elif isinstance(data.columns, (pd.Int64Index, pd.RangeIndex)):\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "erythema int64\n", "scaling int64\n", "definite_borders int64\n", "itching int64\n", "koebner_phenomenon int64\n", "polygonal_papules int64\n", "follicular_papules int64\n", "oral_mucosal_involvement int64\n", "knee_and_elbow_involvement int64\n", "scalp_involvement int64\n", "family_history int64\n", "melanin_incontinence int64\n", "eosinophils_in_the_infiltrate int64\n", "pnl_infiltrate int64\n", "fibrosis_of_the_papillary_dermis int64\n", "exocytosis int64\n", "acanthosis int64\n", "hyperkeratosis int64\n", "parakeratosis int64\n", "clubbing_of_the_rete_ridges int64\n", "elongation_of_the_rete_ridges int64\n", "thinning_of_the_suprapapillary_epidermis int64\n", "spongiform_pustule int64\n", "munro_microabcess int64\n", "focal_hypergranulosis int64\n", "disappearance_of_the_granular_layer int64\n", "vacuolisation_and_damage_of_basal_layer int64\n", "spongiosis int64\n", "saw-tooth_appearance_of_retes int64\n", "follicular_horn_plug int64\n", "perifollicular_parakeratosis int64\n", "inflammatory_monoluclear_inflitrate int64\n", "band-like_infiltrate int64\n", "age float64\n", "class int64\n", "dtype: object\n", "[0]\ttrain-merror:0.01913\n", "[1]\ttrain-merror:0.01366\n", "[2]\ttrain-merror:0.01366\n", "[3]\ttrain-merror:0.00820\n" ] } ], "source": [ "import xgboost as xgb\n", "import pandas as pd\n", "from secretflow.utils.simulation.datasets import dataset\n", "\n", "df = pd.read_csv(dataset('dermatology'))\n", "df.fillna(value=0)\n", "print(df.dtypes)\n", "y = df['class']\n", "y = y - 1\n", "x = df.drop(columns=\"class\")\n", "dtrain = xgb.DMatrix(x, y)\n", "dtest = dtrain\n", "params = {\n", " 'max_depth': 4,\n", " 'objective': 'multi:softmax',\n", " 'min_child_weight': 1,\n", " 'max_bin': 10,\n", " 'num_class': 6,\n", " 'eval_metric': 'merror',\n", "}\n", "num_round = 4\n", "watchlist = [(dtrain, 'train')]\n", "bst = xgb.train(params, dtrain, num_round, evals=watchlist, early_stopping_rounds=2)\n" ] }, { "cell_type": "markdown", "id": "8c23a85f", "metadata": {}, "source": [ "### Then, How to do federated xgboost in secretflow?\n", "1. Use federate Binning method based on iteration to calculate the global bucket information combined with the data of all sides, which was used as the candidate to enter the subsequent construction procedure\n", "2. The data is input into each Client XGBoost engine to calculate G & H\n", "3. Train federated boosting model \n", " 1) Data is reassigned to the node to be split \n", " 2) The sum of grad and the sum of hess are calculated according to the previously calculated binning buckets \n", " 3) Send the sum of grad and the sum of hess to server,server use secure aggregation to produce global summary,then choose best split point,Send best split info back to clients. \n", " 4) Clients Updates local model \n", "4. Finish training,and save model" ] }, { "cell_type": "markdown", "id": "3be5325d", "metadata": {}, "source": [ "Create 3 entities in the Secretflow environment [Alice, Bob, Charlie] Where 'Alice' and 'Bob' are clients, and `Charlie` is the server,then you can happily start `Federate Boosting`." ] }, { "cell_type": "markdown", "id": "67d6f007", "metadata": {}, "source": [ "### Prepare Data" ] }, { "cell_type": "code", "execution_count": 3, "id": "54ac3a76", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(pid=3817970)\u001b[0m 2022-08-19 13:47:17.904107: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(pid=3817969)\u001b[0m 2022-08-19 13:47:17.904107: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n" ] } ], "source": [ "from secretflow.data.horizontal import read_csv\n", "from secretflow.security.aggregation import SecureAggregator\n", "from secretflow.security.compare import SPUComparator\n", "from secretflow.utils.simulation.datasets import load_dermatology\n", "\n", "aggr = SecureAggregator(charlie, [alice, bob])\n", "spu = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob']))\n", "comp = SPUComparator(spu)\n", "data = load_dermatology(parts=[alice, bob], aggregator=aggr, comparator=comp)\n", "data.fillna(value=0, inplace=True)\n" ] }, { "cell_type": "markdown", "id": "baffdd20", "metadata": {}, "source": [ "### Prepare Params" ] }, { "cell_type": "code", "execution_count": 4, "id": "2d51d646", "metadata": {}, "outputs": [], "source": [ "params = {\n", " # XGBoost parameter tutorial\n", " # https://xgboost.readthedocs.io/en/latest/parameter.html\n", " 'max_depth': 4, # max depth\n", " 'eta': 0.3, # learning rate\n", " 'objective': 'multi:softmax', # objection function,support \"binary:logistic\",\"reg:logistic\",\"multi:softmax\",\"multi:softprob\",\"reg:squarederror\"\n", " 'min_child_weight': 1, # The minimum value of weight\n", " 'lambda': 0.1, # L2 regularization term on weights (xgb's lambda)\n", " 'alpha': 0, # L1 regularization term on weights (xgb's alpha)\n", " 'max_bin': 10, # Max num of binning\n", " 'num_class': 6, # Only required in multi-class classification\n", " 'gamma': 0, # Same to min_impurity_split,The minimux gain for a split\n", " 'subsample': 1.0, # Subsample rate by rows\n", " 'colsample_bytree': 1.0, # Feature selection rate by tree\n", " 'colsample_bylevel': 1.0, # Feature selection rate by level\n", " 'eval_metric': 'merror', # supported eval metric:\n", " # 1. rmse\n", " # 2. rmsle\n", " # 3. mape\n", " # 4. logloss\n", " # 5. error\n", " # 6. error@t\n", " # 7. merror\n", " # 8. mlogloss\n", " # 9. auc\n", " # 10. aucpr\n", " # Special params in SFXgboost\n", " # Required\n", " 'hess_key': 'hess', # Required, Mark hess columns, optionally choosing a column name that is not in the data set\n", " 'grad_key': 'grad', # Required,Mark grad columns, optionally choosing a column name that is not in the data set\n", " 'label_key': 'class', # Required,ark label columns, optionally choosing a column name that is not in the data set\n", "}\n" ] }, { "cell_type": "markdown", "id": "57bf92f0", "metadata": {}, "source": [ "### Create SFXgboost" ] }, { "cell_type": "code", "execution_count": 6, "id": "4bde4412", "metadata": {}, "outputs": [], "source": [ "from secretflow.ml.boost.homo_boost import SFXgboost\n", "\n", "bst = SFXgboost(server=charlie, clients=[alice, bob])\n" ] }, { "cell_type": "markdown", "id": "5c6714ee", "metadata": {}, "source": [ "run SFXgboost" ] }, { "cell_type": "code", "execution_count": 7, "id": "64d5e208", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(_run pid=3817967)\u001b[0m 2022-08-19 13:48:05.541675: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(_run pid=3817967)\u001b[0m 2022-08-19 13:48:07,217,217 WARNING [xla_bridge.py:backends:265] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)\n", "\u001b[2m\u001b[36m(_run pid=3817957)\u001b[0m 2022-08-19 13:48:07.943512: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(pid=3817961)\u001b[0m 2022-08-19 13:48:08.108831: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(pid=3817959)\u001b[0m 2022-08-19 13:48:08.068793: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(pid=3817954)\u001b[0m 2022-08-19 13:48:08.108831: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(pid=3817964)\u001b[0m 2022-08-19 13:48:08.108831: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(pid=3817963)\u001b[0m 2022-08-19 13:48:08.111619: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(pid=3817956)\u001b[0m 2022-08-19 13:48:08.108832: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(pid=3817955)\u001b[0m 2022-08-19 13:48:08.127188: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(pid=3817960)\u001b[0m 2022-08-19 13:48:08.157280: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(pid=3817962)\u001b[0m 2022-08-19 13:48:08.127188: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(pid=3817968)\u001b[0m 2022-08-19 13:48:08.140477: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(_run pid=3817957)\u001b[0m 2022-08-19 13:48:09,720,720 WARNING [xla_bridge.py:backends:265] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)\n", "\u001b[2m\u001b[36m(pid=3817954)\u001b[0m /home/fengjun.feng/miniconda3/envs/py3.8/lib/python3.8/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", "\u001b[2m\u001b[36m(pid=3817954)\u001b[0m from pandas import MultiIndex, Int64Index\n", "\u001b[2m\u001b[36m(_run pid=3817964)\u001b[0m /home/fengjun.feng/miniconda3/envs/py3.8/lib/python3.8/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", "\u001b[2m\u001b[36m(_run pid=3817964)\u001b[0m from pandas import MultiIndex, Int64Index\n", "\u001b[2m\u001b[36m(pid=3817963)\u001b[0m /home/fengjun.feng/miniconda3/envs/py3.8/lib/python3.8/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", "\u001b[2m\u001b[36m(pid=3817963)\u001b[0m from pandas import MultiIndex, Int64Index\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817954)\u001b[0m /home/fengjun.feng/miniconda3/envs/py3.8/lib/python3.8/site-packages/xgboost/data.py:262: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817954)\u001b[0m elif isinstance(data.columns, (pd.Int64Index, pd.RangeIndex)):\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817954)\u001b[0m /home/fengjun.feng/miniconda3/envs/py3.8/lib/python3.8/site-packages/xgboost/data.py:262: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817954)\u001b[0m elif isinstance(data.columns, (pd.Int64Index, pd.RangeIndex)):\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817964)\u001b[0m /home/fengjun.feng/miniconda3/envs/py3.8/lib/python3.8/site-packages/xgboost/data.py:262: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817964)\u001b[0m elif isinstance(data.columns, (pd.Int64Index, pd.RangeIndex)):\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817964)\u001b[0m /home/fengjun.feng/miniconda3/envs/py3.8/lib/python3.8/site-packages/xgboost/data.py:262: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817964)\u001b[0m elif isinstance(data.columns, (pd.Int64Index, pd.RangeIndex)):\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817963)\u001b[0m /home/fengjun.feng/miniconda3/envs/py3.8/lib/python3.8/site-packages/xgboost/data.py:262: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817963)\u001b[0m elif isinstance(data.columns, (pd.Int64Index, pd.RangeIndex)):\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817963)\u001b[0m /home/fengjun.feng/miniconda3/envs/py3.8/lib/python3.8/site-packages/xgboost/data.py:262: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817963)\u001b[0m elif isinstance(data.columns, (pd.Int64Index, pd.RangeIndex)):\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(HomoBooster pid=3817964)\u001b[0m [0]\ttrain-merror:0.01366\tvalid-merror:0.01366\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817954)\u001b[0m [0]\ttrain-merror:0.01366\tvalid-merror:0.01366\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817963)\u001b[0m [0]\ttrain-merror:0.01366\tvalid-merror:0.01366\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817964)\u001b[0m [1]\ttrain-merror:0.00820\tvalid-merror:0.00820\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817954)\u001b[0m [1]\ttrain-merror:0.00820\tvalid-merror:0.00820\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817963)\u001b[0m [1]\ttrain-merror:0.00820\tvalid-merror:0.00820\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817964)\u001b[0m [2]\ttrain-merror:0.00820\tvalid-merror:0.00820\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817954)\u001b[0m [2]\ttrain-merror:0.00820\tvalid-merror:0.00820\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817963)\u001b[0m [2]\ttrain-merror:0.00820\tvalid-merror:0.00820\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817964)\u001b[0m [3]\ttrain-merror:0.01093\tvalid-merror:0.01093\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817954)\u001b[0m [3]\ttrain-merror:0.01093\tvalid-merror:0.01093\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817963)\u001b[0m [3]\ttrain-merror:0.01093\tvalid-merror:0.01093\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817964)\u001b[0m [4]\ttrain-merror:0.00820\tvalid-merror:0.00820\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817954)\u001b[0m [4]\ttrain-merror:0.00820\tvalid-merror:0.00820\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817963)\u001b[0m [4]\ttrain-merror:0.00820\tvalid-merror:0.00820\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(HomoBooster pid=3817964)\u001b[0m [5]\ttrain-merror:0.00546\tvalid-merror:0.00546\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817954)\u001b[0m [5]\ttrain-merror:0.00546\tvalid-merror:0.00546\n", "\u001b[2m\u001b[36m(HomoBooster pid=3817963)\u001b[0m [5]\ttrain-merror:0.00546\tvalid-merror:0.00546\n" ] } ], "source": [ "bst.train(data, data, params=params, num_boost_round=6)\n" ] }, { "cell_type": "markdown", "id": "56f1ee3c", "metadata": {}, "source": [ "Now our Federated XGBoost training is complete, where the BST is the federated Boost object" ] }, { "cell_type": "markdown", "id": "fcebea9b", "metadata": {}, "source": [ "## Conclusion\n", "* This tutorial introduces how to use tree models for training etc\n", "* SFXgboost encapsulates the logic of the federated subtree model. Sfxgboost trained models remain compatible with XGBoost, and we can directly use the existing infrastructure for online prediction and so on.\n", "* Next, you can try SFXgboost on your data, just need to follow this tutorial\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.13 ('sf')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "vscode": { "interpreter": { "hash": "db45a4cb4cd37a8de684dfb7fcf899b68fccb8bd32d97c5ad13e5de1245c0986" } } }, "nbformat": 4, "nbformat_minor": 5 }