secretflow.ml.boost.sgb_v#

Classes:

SgbModel(label_holder, objective, base)

SGBoost model and prediction.

Sgb(heu)

This class provides both classification and regression tree boosting (also known as GBDT or GBM) for the vertically split dataset setting, using SecureBoost.

SGBFactory()

You can build customized boosting algorithms based on any combination of ideas from SecureBoost, XGBoost, and LightGBM.

class secretflow.ml.boost.sgb_v.SgbModel(label_holder: PYU, objective: RegType, base: float)[source]#

Bases: object

SGBoost model and prediction. In essence, it is a distributed tree ensemble.

Methods:

__init__(label_holder, objective, base)

param label_holder:

PYU device, label holder's PYU device.

predict(dtrain[, to_pyu])

predict on dtrain with this model.

to_dict()

save_model(device_path_dict[, ...])

Save model to different parties

__init__(label_holder: PYU, objective: RegType, base: float) None[source]#
Parameters:
  • label_holder – PYU device, the label holder’s PYU device.

  • objective – RegType, specifies whether to perform logistic regression or linear regression.

  • base – float

predict(dtrain: Union[FedNdarray, VDataFrame], to_pyu: Optional[PYU] = None) Union[PYUObject, FedNdarray][source]#

predict on dtrain with this model.

Parameters:
  • dtrain – [FedNdarray, VDataFrame] vertically split dataset.

  • to_pyu – the prediction receiver. If not None, the prediction result is revealed to the to_pyu device and saved as a FedNdarray; otherwise, the prediction result is kept in plaintext and saved as a PYUObject on the label_holder device.

Returns:

Prediction values, stored in a PYUObject or a FedNdarray.

to_dict() Dict[source]#
save_model(device_path_dict: Dict, wait_before_proceed=True)[source]#

Save model to different parties

Parameters:
  • device_path_dict (Dict) – {device: a path to save model for the device}.

  • wait_before_proceed (bool) – if False, a handle is returned immediately, allowing the caller to do other work in the meantime and wait for the model write to finish later.
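The wait_before_proceed flag describes a common non-blocking write pattern. As a generic sketch of that pattern only (using Python's concurrent.futures, not secretflow's actual handle type; the writer function and path are illustrative):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_model_part(path: str, payload: str) -> str:
    # Illustrative writer; not secretflow's actual serialization.
    with open(path, "w") as f:
        f.write(payload)
    return path

part_path = os.path.join(tempfile.gettempdir(), "model_part.json")
with ThreadPoolExecutor() as pool:
    # wait_before_proceed=False: a handle comes back immediately...
    handle = pool.submit(write_model_part, part_path, "{}")
    # ...the caller can do other work, then block until the write finishes.
    saved_path = handle.result()
```

With wait_before_proceed=True (the default), save_model instead blocks until the write completes, analogous to calling result() right away.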

class secretflow.ml.boost.sgb_v.Sgb(heu: HEU)[source]#

Bases: object

This class provides both classification and regression tree boosting (also known as GBDT, GBM) for vertical split dataset setting by using secure boost.

SGB is short for SecureBoost. Compared to its safer counterpart SS-XGB, SecureBoost focuses on protecting the label holder’s information.

Parameters:

heu – secure device that runs homomorphic encryption

Methods:

__init__(heu)

train(params, dtrain, label[, audit_paths])

train on dtrain and label.

__init__(heu: HEU) None[source]#
train(params: Dict, dtrain: Union[FedNdarray, VDataFrame], label: Union[FedNdarray, VDataFrame], audit_paths: Dict = {}) SgbModel[source]#

train on dtrain and label.

Parameters:
  • params – Dict, booster params; details are as follows.

  • dtrain – {FedNdarray, VDataFrame} vertically split dataset.

  • label – {FedNdarray, VDataFrame} label column.

  • audit_paths – {party: party_audit_path} for each party. party_audit_path is a file location for gradients. Leave it empty if you do not need the audit function.

booster params details:

‘num_boost_round’: int. Number of boosting iterations.

default: 10 range: [1, 1024]

‘max_depth’: int. Maximum depth of a tree.

default: 5 range: [1, 16]

‘learning_rate’: float. Step size shrinkage used in update to prevent overfitting.

default: 0.3 range: (0, 1]

‘objective’: Specify the learning objective.

default: ‘logistic’ range: [‘linear’, ‘logistic’]

‘reg_lambda’: float. L2 regularization term on weights.

default: 0.1 range: [0, 10000]

‘gamma’: float. Greater than 0 means pre-pruning is enabled; a split whose gain is less than gamma will not be made.

default: 0.1 range: [0, 10000]

‘subsample’: Subsample ratio of the training instances.

default: 1 range: (0, 1]

‘colsample_by_tree’: Subsample ratio of columns when constructing each tree.

default: 1 range: (0, 1]

‘sketch_eps’: This roughly translates into O(1 / sketch_eps) number of bins.

default: 0.1 range: (0, 1]

‘base_score’: The initial prediction score of all instances, global bias.

default: 0

‘seed’: Pseudorandom number generator seed.

default: 42

‘fixed_point_parameter’: int. Any floating point number encoded by the HEU is multiplied by a scale and rounded, where scale = 2 ** fixed_point_parameter. A larger value gives more numerical accuracy, but a value that is too large leads to overflow. See HEU’s documentation for more details.

default: 20

Returns:

SgbModel
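As a sketch, the booster params above can be passed as a plain Python dict. The values below are the documented defaults; the actual call to Sgb.train is shown only in a comment, since it requires a configured HEU device and a vertically split dataset:

```python
# Booster params for Sgb.train, filled with the documented defaults.
params = {
    "num_boost_round": 10,        # [1, 1024]
    "max_depth": 5,               # [1, 16]
    "learning_rate": 0.3,         # (0, 1]
    "objective": "logistic",      # 'linear' or 'logistic'
    "reg_lambda": 0.1,            # [0, 10000]
    "gamma": 0.1,                 # [0, 10000]; > 0 enables pre-pruning
    "subsample": 1,               # (0, 1]
    "colsample_by_tree": 1,       # (0, 1]
    "sketch_eps": 0.1,            # (0, 1]; roughly 1 / sketch_eps bins
    "base_score": 0,
    "seed": 42,
    "fixed_point_parameter": 20,  # HEU encoding scale = 2 ** 20
}

# Assuming sgb = Sgb(heu) and dtrain / label are already prepared
# (both require a running SecretFlow cluster), training would look like:
# model = sgb.train(params, dtrain, label)
```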

class secretflow.ml.boost.sgb_v.SGBFactory[source]#

Bases: object

You can build customized boosting algorithms based on any combination of ideas from SecureBoost, XGBoost, and LightGBM. The parameters for the produced booster algorithm depend on which components it consists of. See the components’ parameters.

params_dict#

A dict containing params for the factory, the booster, and its components.

Type:

dict

factory_params#

Validated params for the factory.

Type:

SGBFactoryParams

heu#

The device for HE computations. Must be set before training.

Methods:

__init__()

set_params(params)

Set params by a dictionary.

set_heu(heu)

get_params([detailed])

Get the params that have been set.

fit(dataset, label)

train(params, dataset, label)

__init__()[source]#
set_params(params: dict)[source]#

Set params by a dictionary.

set_heu(heu: HEU)[source]#
get_params(detailed: bool = False) dict[source]#

Get the params that have been set.

Parameters:

detailed (bool, optional) – Whether to include default settings. Defaults to False.

Returns:

current params.

Return type:

dict

fit(dataset: Union[FedNdarray, VDataFrame], label: Union[FedNdarray, VDataFrame]) SgbModel[source]#
train(params: dict, dataset: Union[FedNdarray, VDataFrame], label: Union[FedNdarray, VDataFrame]) SgbModel[source]#
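The get_params(detailed) contract above can be sketched in pure Python. This is an illustrative sketch only, not SGBFactory's actual implementation, and the param names in DEFAULTS are hypothetical examples: with detailed=False only explicitly set params are returned, while detailed=True also includes default settings.

```python
# Sketch of the get_params(detailed) contract; NOT the real SGBFactory code.
DEFAULTS = {"num_boost_round": 10, "max_depth": 5, "learning_rate": 0.3}

class FactorySketch:
    def __init__(self):
        self._set = {}  # params explicitly set by the user

    def set_params(self, params: dict) -> None:
        self._set.update(params)

    def get_params(self, detailed: bool = False) -> dict:
        if detailed:
            # include default settings, overridden by user-set values
            return {**DEFAULTS, **self._set}
        return dict(self._set)

f = FactorySketch()
f.set_params({"max_depth": 8})
print(f.get_params())               # {'max_depth': 8}
print(f.get_params(detailed=True))  # defaults merged with user settings
```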

secretflow.ml.boost.sgb_v.model#

Classes:

SgbModel(label_holder, objective, base)

SGBoost model and prediction.

Functions:

from_dict(model_dict)

from_json_to_dict(device_path_dict, label_holder)

load_model(device_path_dict, label_holder)

class secretflow.ml.boost.sgb_v.model.SgbModel(label_holder: PYU, objective: RegType, base: float)[source]#

Bases: object

SGBoost model and prediction. In essence, it is a distributed tree ensemble.

Methods:

__init__(label_holder, objective, base)

param label_holder:

PYU device, label holder's PYU device.

predict(dtrain[, to_pyu])

predict on dtrain with this model.

to_dict()

save_model(device_path_dict[, ...])

Save model to different parties

__init__(label_holder: PYU, objective: RegType, base: float) None[source]#
Parameters:
  • label_holder – PYU device, the label holder’s PYU device.

  • objective – RegType, specifies whether to perform logistic regression or linear regression.

  • base – float

predict(dtrain: Union[FedNdarray, VDataFrame], to_pyu: Optional[PYU] = None) Union[PYUObject, FedNdarray][source]#

predict on dtrain with this model.

Parameters:
  • dtrain – [FedNdarray, VDataFrame] vertically split dataset.

  • to_pyu – the prediction receiver. If not None, the prediction result is revealed to the to_pyu device and saved as a FedNdarray; otherwise, the prediction result is kept in plaintext and saved as a PYUObject on the label_holder device.

Returns:

Prediction values, stored in a PYUObject or a FedNdarray.

to_dict() Dict[source]#
save_model(device_path_dict: Dict, wait_before_proceed=True)[source]#

Save model to different parties

Parameters:
  • device_path_dict (Dict) – {device: a path to save model for the device}.

  • wait_before_proceed (bool) – if False, a handle is returned immediately, allowing the caller to do other work in the meantime and wait for the model write to finish later.

secretflow.ml.boost.sgb_v.model.from_dict(model_dict: Dict) SgbModel[source]#
secretflow.ml.boost.sgb_v.model.from_json_to_dict(device_path_dict: Dict, label_holder: PYU) Dict[source]#
secretflow.ml.boost.sgb_v.model.load_model(device_path_dict: Dict, label_holder: PYU) SgbModel[source]#
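The save_model / load_model pair above distributes model parts to different parties via a device_path_dict, and from_json_to_dict suggests each part is stored as JSON. As a hedged sketch of that round-trip pattern with plain dicts and files (the on-disk layout and part contents here are hypothetical; secretflow's actual format may differ):

```python
import json
import tempfile
from pathlib import Path

# Illustrative per-party save/load round trip; NOT secretflow's real format.
def save_parts(path_dict: dict, model_parts: dict) -> None:
    # path_dict maps a party name to the path where its model part is saved
    for party, path in path_dict.items():
        Path(path).write_text(json.dumps(model_parts[party]))

def load_parts(path_dict: dict) -> dict:
    return {party: json.loads(Path(path).read_text())
            for party, path in path_dict.items()}

tmp = tempfile.mkdtemp()
paths = {"alice": f"{tmp}/alice.json", "bob": f"{tmp}/bob.json"}
# Hypothetical model-part contents, one piece per party.
parts = {"alice": {"split_points": [0.5]}, "bob": {"leaf_weights": [0.1, -0.2]}}
save_parts(paths, parts)
assert load_parts(paths) == parts
```

In the real API the keys are PYU devices rather than strings, and loading additionally requires the label_holder device to reassemble an SgbModel.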

secretflow.ml.boost.sgb_v.sgb#

Classes:

Sgb(heu)

This class provides both classification and regression tree boosting (also known as GBDT or GBM) for the vertically split dataset setting, using SecureBoost.

Functions:

move_config(pyu, params)

write_log(x, path)

class secretflow.ml.boost.sgb_v.sgb.Sgb(heu: HEU)[source]#

Bases: object

This class provides both classification and regression tree boosting (also known as GBDT or GBM) for the vertically split dataset setting, using SecureBoost.

SGB is short for SecureBoost. Compared to its safer counterpart SS-XGB, SecureBoost focuses on protecting the label holder’s information.

Parameters:

heu – secure device that runs homomorphic encryption

Methods:

__init__(heu)

train(params, dtrain, label[, audit_paths])

train on dtrain and label.

__init__(heu: HEU) None[source]#
train(params: Dict, dtrain: Union[FedNdarray, VDataFrame], label: Union[FedNdarray, VDataFrame], audit_paths: Dict = {}) SgbModel[source]#

train on dtrain and label.

Parameters:
  • params – Dict, booster params; details are as follows.

  • dtrain – {FedNdarray, VDataFrame} vertically split dataset.

  • label – {FedNdarray, VDataFrame} label column.

  • audit_paths – {party: party_audit_path} for each party. party_audit_path is a file location for gradients. Leave it empty if you do not need the audit function.

booster params details:

‘num_boost_round’: int. Number of boosting iterations.

default: 10 range: [1, 1024]

‘max_depth’: int. Maximum depth of a tree.

default: 5 range: [1, 16]

‘learning_rate’: float. Step size shrinkage used in update to prevent overfitting.

default: 0.3 range: (0, 1]

‘objective’: Specify the learning objective.

default: ‘logistic’ range: [‘linear’, ‘logistic’]

‘reg_lambda’: float. L2 regularization term on weights.

default: 0.1 range: [0, 10000]

‘gamma’: float. Greater than 0 means pre-pruning is enabled; a split whose gain is less than gamma will not be made.

default: 0.1 range: [0, 10000]

‘subsample’: Subsample ratio of the training instances.

default: 1 range: (0, 1]

‘colsample_by_tree’: Subsample ratio of columns when constructing each tree.

default: 1 range: (0, 1]

‘sketch_eps’: This roughly translates into O(1 / sketch_eps) number of bins.

default: 0.1 range: (0, 1]

‘base_score’: The initial prediction score of all instances, global bias.

default: 0

‘seed’: Pseudorandom number generator seed.

default: 42

‘fixed_point_parameter’: int. Any floating point number encoded by the HEU is multiplied by a scale and rounded, where scale = 2 ** fixed_point_parameter. A larger value gives more numerical accuracy, but a value that is too large leads to overflow. See HEU’s documentation for more details.

default: 20

Returns:

SgbModel
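The fixed_point_parameter description above can be illustrated with plain integer arithmetic. This is a sketch of the encode-by-scale-and-round idea only; HEU's actual encoder has more machinery:

```python
# Sketch of the fixed-point encoding idea behind 'fixed_point_parameter'.
def encode(x: float, fixed_point_parameter: int = 20) -> int:
    scale = 2 ** fixed_point_parameter
    return round(x * scale)

def decode(n: int, fixed_point_parameter: int = 20) -> float:
    return n / (2 ** fixed_point_parameter)

x = 0.1
# A larger parameter gives a smaller round-trip error...
assert abs(decode(encode(x, 20), 20) - x) < abs(decode(encode(x, 4), 4) - x)
# ...but a larger encoded integer, which is why an overly large
# parameter can overflow the ciphertext's plaintext space.
assert encode(x, 40) > encode(x, 20)
```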

secretflow.ml.boost.sgb_v.sgb.move_config(pyu, params)[source]#
secretflow.ml.boost.sgb_v.sgb.write_log(x, path)[source]#