secretflow.ml.linear

Classes:

FlLogisticRegressionMix()

SGD based logistic regression for mix partitioned data.

FlLogisticRegressionVertical(devices, ...[, ...])

Vertical logistic regression.

HESSLogisticRegression(spu, heu_x, heu_y)

Logistic regression for vertically split datasets, using secret sharing and homomorphic encryption with a mini-batch SGD training solver.

SSRegression(spu)

Linear and logistic regression for vertically split datasets, using secret sharing with a mini-batch SGD training solver.

LinearModel(weights, reg_type, sig_type)

Unified linear regression model.

RegType(value)

An enumeration.

SSGLM(spu)

class secretflow.ml.linear.FlLogisticRegressionMix

Bases: object

SGD based logistic regression for mix partitioned data.

The following is an example to illustrate the algorithm.

Suppose alice has features and label, while bob/carol/dave have features only.

The perspective of MixDataFrame X is as follows:

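+---------------------------------------------+
|                      X                      |
+--------------+----------+---------+---------+
| VDataFrame_0 | alice_x0 | bob_x   | dave_x0 |
+--------------+----------+---------+---------+
| VDataFrame_1 | alice_x1 | carol_x | dave_x1 |
+--------------+----------+---------+---------+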

The perspective of MixDataFrame Y is as follows:

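+-------------------------+
|            Y            |
+--------------+----------+
| VDataFrame_0 | alice_y0 |
+--------------+----------+
| VDataFrame_1 | alice_y1 |
+--------------+----------+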

When fitted with X and Y, two FlLogisticRegressionVertical instances are constructed. The first is fitted with VDataFrame_0 of X and Y, and the second with VDataFrame_1 of X and Y.

The main steps of one epoch are:

  1. Each FlLogisticRegressionVertical instance is fitted with its respective VDataFrame of X and Y.

  2. Aggregate \({\theta}\) of the FlLogisticRegressionVertical instances with SecureAggregator.

  3. Send the aggregated \({\theta}\) back to the FlLogisticRegressionVertical instances.
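Steps 2 and 3 can be pictured with a plaintext sketch. It is an illustration only: the weights are averaged in the clear, whereas the real flow uses SecureAggregator so that no party sees another party's weights. The function name `average_weights`, the use of a plain mean, the per-party grouping, and the sample values are all assumptions made for this example and are not taken from SecretFlow.

```python
from typing import Dict, List


def average_weights(per_model: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Average each party's weight vector over the models that contain it."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for weights in per_model:
        for party, w in weights.items():
            if party not in sums:
                sums[party] = [0.0] * len(w)
                counts[party] = 0
            sums[party] = [s + v for s, v in zip(sums[party], w)]
            counts[party] += 1
    return {p: [s / counts[p] for s in sums[p]] for p in sums}


# Hypothetical weights after one local epoch on VDataFrame_0 and VDataFrame_1.
model_0 = {"alice": [0.2, 0.4], "dave": [1.0]}
model_1 = {"alice": [0.4, 0.0], "dave": [3.0]}

# Step 2: aggregate theta across the vertical models.
theta = average_weights([model_0, model_1])

# Step 3: every vertical model continues from the same aggregated weights.
model_0.update(theta)
model_1.update(theta)
```

The agg_epochs parameter of fit controls how often this aggregation happens; with the default of 1 it runs after every epoch.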

Methods:

fit(x, y, batch_size, epochs, aggregators, heus)

Fit the model.

predict(x)

Predict the score.

fit(x: MixDataFrame, y: MixDataFrame, batch_size: int, epochs: int, aggregators: List[Aggregator], heus: List[HEU], fxp_bits: Optional[int] = 18, tol: Optional[float] = 0.0001, learning_rate: Optional[float] = 0.1, agg_epochs: Optional[int] = 1, audit_log_dir: Optional[Dict[PYU, str]] = None)

Fit the model.

Parameters:
  • x – training vector. X should be a horizontally partitioned MixDataFrame, which consists of VDataFrames (secretflow.data.vertical.VDataFrame).

  • y – target vector relative to x. Y should also be a horizontally partitioned MixDataFrame. X and Y should have the same number of VDataFrames.

  • batch_size – number of samples per gradient update.

  • epochs – number of epochs to train the model.

  • aggregators – a list of aggregators used to compute vertical LR. The number of aggregators should equal the number of VDataFrames in X.

  • heus – a list of HEUs used to compute vertical LR. The number of HEUs should equal the number of VDataFrames in X.

  • fxp_bits – the fraction bit length used for encoding before sending to the HEU device. Defaults to spu_fxp_precision(spu.spu_pb2.FM64), which is 18 in the signature above.

  • tol – optional, tolerance for the stopping criteria. Defaults to 1e-4.

  • learning_rate – optional, learning rate. Defaults to 0.1.

  • agg_epochs – aggregate weights every {agg_epochs} epochs. Defaults to 1.

  • audit_log_dir – a dict specifying the audit log directory for each device. No audit log is written if None. Defaults to None. Leave it as None unless you fully understand what the audit does and accept the risk.

predict(x: MixDataFrame) -> List[PYUObject]

Predict the score.

Parameters:

x – the samples to predict.

Returns:

a list of PYUObjects holding prediction results.

class secretflow.ml.linear.FlLogisticRegressionVertical(devices: List[PYU], aggregator: Aggregator, heu: HEU, fxp_bits: Optional[int] = 18, audit_log_dir: Optional[Dict[PYU, str]] = None)

Bases: object

Vertical logistic regression.

Implements basic SGD-based logistic regression among multiple vertical participants.

To explain the algorithm, suppose alice has features and the label, while bob and charlie have features only. The main steps of SGD are:

  1. Alice computes the prediction using secure aggregation.

  2. Alice sends the residual to bob and charlie as HE (Homomorphic Encryption) ciphertext.

  3. Bob and charlie compute their gradients on the HE ciphertext and send masked gradients to alice.

  4. Alice decrypts the masked gradients and sends them back to bob and charlie.

  5. Bob and charlie unmask the gradients and update their weights independently.

  6. Alice also updates her own weights.
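The masking in steps 3 to 5 can be sketched in plaintext. This is a simplified illustration, not the library's implementation: homomorphic encryption is replaced by ordinary arithmetic, and the function name `masked_gradient_round`, the additive uniform mask, and the sample data are assumptions chosen for the example. The point it shows is that the label holder only ever handles the masked value, while the feature holder recovers the exact gradient.

```python
import random


def masked_gradient_round(x_batch, residual, rng):
    """Simulate steps 3-5 for one feature-only party (HE replaced by plaintext)."""
    n_features = len(x_batch[0])
    # Step 3: the party computes X^T * residual (on ciphertext in the real protocol)
    gradient = [
        sum(row[j] * r for row, r in zip(x_batch, residual))
        for j in range(n_features)
    ]
    # ... and adds a random mask before sending the result to the label holder.
    mask = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    masked = [g + m for g, m in zip(gradient, mask)]
    # Step 4: the label holder decrypts and returns the masked value unchanged.
    returned = masked
    # Step 5: the party removes its own mask to recover the gradient.
    unmasked = [v - m for v, m in zip(returned, mask)]
    return unmasked, gradient


rng = random.Random(0)
x_bob = [[1.0, 2.0], [3.0, 4.0]]   # hypothetical feature batch held by bob
residual = [0.5, -0.25]            # hypothetical residual sent by alice
unmasked, expected = masked_gradient_round(x_bob, residual, rng)
```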

Methods:

__init__(devices, aggregator, heu[, ...])

Init VanillaVerLogisticRegression.

init_train_data(x, y, epochs, batch_size[, ...])

predict(x)

Predict the score.

compute_loss(x, y[, avg_flag])

Compute the loss.

get_weight()

Get weight from this estimator.

set_weight(weight)

Set weight to this estimator.

fit(x, y, batch_size, epochs[, tol, ...])

Fit the model.

fit_in_steps(n_step, learning_rate, epoch)

Fit in steps.

__init__(devices: List[PYU], aggregator: Aggregator, heu: HEU, fxp_bits: Optional[int] = 18, audit_log_dir: Optional[Dict[PYU, str]] = None)

Init VanillaVerLogisticRegression.

Parameters:
  • devices – a list of PYU devices taking part in the computation.

  • aggregator – the aggregator instance.

  • heu – the HEU device instance.

  • fxp_bits – the fraction bit length used for encoding before sending to the HEU device. Defaults to spu_fxp_precision(spu.spu_pb2.FM64), which is 18 in the signature above.

  • audit_log_dir – a dict specifying the audit log directory for each device. No audit log is written if None. Defaults to None. Leave it as None unless you fully understand what the audit does and accept the risk.
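To make fxp_bits concrete, here is a minimal sketch of fixed-point encoding with 18 fraction bits. It assumes the common scheme of scaling by 2**frac_bits and rounding to an integer; whether SecretFlow uses exactly this rounding is not stated above, and the standalone `encode` and `decode` functions below are illustrative stand-ins, not the library's methods.

```python
def encode(value: float, frac_bits: int = 18) -> int:
    """Map a float to an integer carrying `frac_bits` fractional bits."""
    return round(value * (1 << frac_bits))


def decode(encoded: int, frac_bits: int = 18) -> float:
    """Map a fixed-point integer back to a float."""
    return encoded / (1 << frac_bits)


half = encode(0.5)               # exactly representable: 2**17
approx = decode(encode(0.1))     # 0.1 is rounded to the nearest multiple of 2**-18
error = abs(approx - 0.1)        # bounded by half a unit in the last place, 2**-19
```

More fraction bits reduce the rounding error but leave less headroom for the integer part in the encrypted domain.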

init_train_data(x: FedNdarray, y: FedNdarray, epochs: int, batch_size: int, shuffle_seed: Optional[int] = None)

predict(x: Union[VDataFrame, FedNdarray, List[PYUObject]]) -> PYUObject

Predict the score.

Parameters:

x – the samples to predict.

Returns:

a PYUObject holding the prediction results.

Return type:

PYUObject

compute_loss(x: FedNdarray, y: FedNdarray, avg_flag: Optional[bool] = True) -> PYUObject

Compute the loss.

Parameters:
  • x – the samples.

  • y – the label.

  • avg_flag – whether to divide the loss by the number of samples. Defaults to True.

Returns:

a PYUObject holding the loss value.

Return type:

PYUObject

get_weight() -> Dict[PYU, PYUObject]

Get weight from this estimator.

Returns:

A dict mapping each PYU to its weight. Note that the intercept (w0) is the first column of the label device's weight.

set_weight(weight: Dict[PYU, Union[PYUObject, ndarray]])

Set weight to this estimator.

Parameters:

weight – a dict mapping each PYU to its weight.

fit(x: Union[VDataFrame, FedNdarray], y: Union[VDataFrame, FedNdarray], batch_size: int, epochs: int, tol: Optional[float] = 0.0001, learning_rate: Optional[float] = 0.1)

Fit the model.

Parameters:
  • x – training vector.

  • y – target vector relative to x.

  • batch_size – number of samples per gradient update.

  • epochs – number of epochs to train the model.

  • tol – optional, tolerance for stopping criteria. Defaults to 1e-4.

  • learning_rate – optional, learning rate. Defaults to 0.1.

fit_in_steps(n_step: int, learning_rate: float, epoch: int)

Fit in steps.

Parameters:
  • n_step – the number of steps.

  • learning_rate – learning rate.

  • epoch – the current epoch.

class secretflow.ml.linear.HESSLogisticRegression(spu: SPU, heu_x: HEU, heu_y: HEU)

Bases: object

This method provides logistic regression for vertically split datasets, using secret sharing and homomorphic encryption with a mini-batch SGD training solver. HESS-SGD is short for HE & secret sharing SGD training.

During the calculation, the HEU is used to protect the weights and calculate the predicted y, and the SPU is used to calculate the sigmoid and the gradient.

SPU is a verifiable and measurable secure computing device that runs under various MPC protocols to provide provable security. More detail: https://www.secretflow.org.cn/docs/spu/en/

HEU is a secure computing device that implements HE encryption and decryption and provides NumPy-like matrix operations, which lowers the barrier to use. More detail: https://www.secretflow.org.cn/docs/heu/en/

For more detail, please refer to the KDD’21 paper: https://dl.acm.org/doi/10.1145/3447548.3467210

Parameters:
  • spu – SPU. The SPU device.

  • heu_x – HEU. The HEU device without the label.

  • heu_y – HEU. The HEU device with the label.

Note

The training dataset should be normalized or standardized; otherwise the SGD solver will not converge.
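As a reminder of what standardization means here, the sketch below rescales one feature column to zero mean and unit variance using only the standard library. It is a plain local illustration: the helper name `standardize` is hypothetical, and in a vertical setting each party would presumably apply such a transform to its own columns before training, which is an assumption rather than something this page prescribes.

```python
import statistics


def standardize(column):
    """Rescale a list of floats to zero mean and unit (population) variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


ages = [18.0, 35.0, 52.0, 69.0]   # hypothetical raw feature values
scaled = standardize(ages)
```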

Methods:

__init__(spu, heu_x, heu_y)

fit(x, y[, learning_rate, epochs, batch_size])

Fit linear model with Stochastic Gradient Descent.

save_model()

Save fit model in LinearModel format.

load_model(m)

Load LinearModel format model.

predict(x)

Probability estimates.

__init__(spu: SPU, heu_x: HEU, heu_y: HEU) -> None

fit(x: Union[FedNdarray, VDataFrame], y: Union[FedNdarray, VDataFrame], learning_rate=0.001, epochs=1, batch_size=None)

Fit linear model with Stochastic Gradient Descent.

Parameters:
  • x – {FedNdarray, VDataFrame}. Input data, must be colocated with the SPU.

  • y – {FedNdarray, VDataFrame}. Target data, must be located on self._heu_y.

  • learning_rate – float, default=1e-3. Learning rate.

  • epochs – int, default=1. Number of epochs to train the model.

  • batch_size – int, default=None. Number of samples per gradient update. If None, batch_size defaults to the total number of samples.

save_model() -> LinearModel

Save fit model in LinearModel format.

load_model(m: LinearModel) -> None

Load LinearModel format model.

predict(x: Union[FedNdarray, VDataFrame]) -> PYUObject

Probability estimates.

Parameters:

x – {FedNdarray, VDataFrame}. Predict samples.

Returns:

probability of the sample for each class in the model.

Return type:

PYUObject

class secretflow.ml.linear.SSRegression(spu: SPU)

Bases: object

This method provides both linear and logistic regression for vertically split datasets, using secret sharing with a mini-batch SGD training solver. SS-SGD is short for secret sharing SGD training.

More detail on SGD: https://stats.stackexchange.com/questions/488017/understanding-mini-batch-gradient-descent

Linear regression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.

More detail on linear regression: https://en.wikipedia.org/wiki/Linear_regression

Logistic regression, despite its name, is a linear model for classification rather than regression. It is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. The probabilities describing the possible outcomes of a single trial are modeled using a logistic function. This method can fit binary logistic regression with optional L2 regularization.

More detail on logistic regression: https://en.wikipedia.org/wiki/Logistic_regression
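For reference, the logistic function mentioned above can be written in a few lines. The second function is a first-order Taylor approximation around zero, clipped to [0, 1]; it is included because the fit method takes a sig_type with default t1, but reading t1 as "first-order Taylor" and the coefficients 0.5 and 0.25 are assumptions made for this sketch, not something this page confirms.

```python
import math


def sigmoid(z: float) -> float:
    """Exact logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_taylor1(z: float) -> float:
    """First-order Taylor expansion of the logistic function at 0, clipped to [0, 1]."""
    return min(1.0, max(0.0, 0.5 + 0.25 * z))


# Near zero the two agree closely; far from zero the approximation saturates.
gap_near_zero = abs(sigmoid(0.5) - sigmoid_taylor1(0.5))
saturated = sigmoid_taylor1(10.0)
```

Polynomial approximations of this kind are attractive under MPC because they avoid evaluating an exponential on secret-shared values.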

SPU is a verifiable and measurable secure computing device that runs under various MPC protocols to provide provable security.

More detail on SPU: https://www.secretflow.org.cn/docs/spu/en/

This method protects the original dataset and the final model by secret sharing the dataset to the SPU device and running the model fit under SPU.

Parameters:

spu – secure device.

Note

The training dataset should be normalized or standardized; otherwise the SGD solver will not converge.

Methods:

__init__(spu)

fit(x, y, epochs[, learning_rate, ...])

Fit the model according to the given training data.

save_model()

Save fit model in LinearModel format.

load_model(m)

Load LinearModel format model.

predict(x[, batch_size, to_pyu])

Predict using the model.

__init__(spu: SPU) -> None

fit(x: Union[FedNdarray, VDataFrame], y: Union[FedNdarray, VDataFrame], epochs: int, learning_rate: float = 0.1, batch_size: int = 1024, sig_type: str = 't1', reg_type: str = 'logistic', penalty: str = 'None', l2_norm: float = 0.5, eps: float = 0.001, decay_epoch: Optional[int] = None, decay_rate: Optional[float] = None, strategy: str = 'naive_sgd') -> None

Fit the model according to the given training data.

Parameters:
  • x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y – {FedNdarray, VDataFrame} of shape (n_samples,). Target vector relative to x.

  • epochs – int. Number of iteration rounds.

  • learning_rate – float, default=0.1. Controls how much the model changes in one epoch.

  • batch_size – int, default=1024. Number of samples used in one calculation.

  • sig_type – str, default=t1. Sigmoid approximation type.

  • reg_type – str, default=logistic. Linear or logistic regression.

  • penalty – str, default=None. The penalty (aka regularization term) to be used.

  • l2_norm – float, default=0.5. L2 regularization term.

  • eps – float, default=1e-3. If the change rate of W is less than this threshold, the model is considered converged and training stops early. 0 disables early stopping.

  • decay_epoch / decay_rate – int / float, default=None. Decays the learning rate as learning_rate * (decay_rate ** floor(epoch / decay_epoch)). None disables decay. If strategy=policy_sgd, decay_rate and decay_epoch default to 0.5 and 5.

  • strategy – str, default=naive_sgd. Optimization strategy used in training.

    naive_sgd is the original SGD.

    policy_sgd (LR only) scales the learning rate in each update, like Adam but with a unified factor, so the batch size can be larger and the early-stop strategy can be more aggressive. This accelerates training in most scenarios, but is not recommended for training with large regularization.

Returns:

Final weights in SPUObject.
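The learning-rate decay formula quoted in the parameter list is easy to check by hand. The helper below evaluates learning_rate * (decay_rate ** floor(epoch / decay_epoch)) in plain Python; the function name `decayed_learning_rate` is hypothetical, and treating a missing decay_epoch or decay_rate as "no decay" follows the "None disables decay" wording above.

```python
import math
from typing import Optional


def decayed_learning_rate(
    learning_rate: float,
    epoch: int,
    decay_epoch: Optional[int] = None,
    decay_rate: Optional[float] = None,
) -> float:
    """Step decay: multiply by decay_rate once every decay_epoch epochs."""
    if decay_epoch is None or decay_rate is None:
        return learning_rate
    return learning_rate * (decay_rate ** math.floor(epoch / decay_epoch))


# With decay_rate=0.5 and decay_epoch=5 the rate halves at epochs 5, 10, 15, ...
schedule = [decayed_learning_rate(0.1, e, decay_epoch=5, decay_rate=0.5) for e in range(12)]
```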

save_model() -> LinearModel

Save fit model in LinearModel format.

load_model(m: LinearModel) -> None

Load LinearModel format model.

predict(x: Union[FedNdarray, VDataFrame], batch_size: int = 1024, to_pyu: Optional[PYU] = None) -> Union[SPUObject, FedNdarray]

Predict using the model.

Parameters:
  • x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Predict samples.

  • batch_size – int, default=1024. Number of samples used in one calculation.

  • to_pyu – the prediction initiator. If not None, the prediction result is revealed to the to_pyu device and saved as a FedNdarray; otherwise, the result is kept secret and saved as an SPUObject.

Returns:

pred scores in SPUObject or FedNdarray, shape (n_samples,)

class secretflow.ml.linear.LinearModel(weights: Union[SPUObject, List[PYUObject]], reg_type: RegType, sig_type: SigType)

Bases: object

Unified linear regression model.

weights

{SPUObject, List[PYUObject]}. For MPC LR, a single SPUObject holds all weights; for FL LR, a list of PYUObjects is used.

Type:

Union[secretflow.device.device.spu.SPUObject, List[secretflow.device.device.pyu.PYUObject]]

reg_type

RegType. Linear regression or logistic regression model.

Type:

secretflow.ml.linear.linear_model.RegType

sig_type

SigType. Which sigmoid approximation to use; only used in MPC LR.

Type:

secretflow.utils.sigmoid.SigType

Attributes:

weights

reg_type

sig_type

Methods:

dump(dir_path)

load(record[, spu, pyus])

__init__(weights, reg_type, sig_type)

weights: Union[SPUObject, List[PYUObject]]

reg_type: RegType

sig_type: SigType

dump(dir_path: Dict[str, str]) -> LinearModelRecord

classmethod load(record: LinearModelRecord, spu: Optional[SPU] = None, pyus: Optional[List[PYU]] = None) -> LinearModel

__init__(weights: Union[SPUObject, List[PYUObject]], reg_type: RegType, sig_type: SigType) -> None

class secretflow.ml.linear.RegType(value)

Bases: Enum

An enumeration.

Attributes:

Linear

Logistic

Linear = 'linear'

Logistic = 'logistic'
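Because the members carry string values, a string such as the reg_type argument of SSRegression.fit can be mapped to a member by value lookup. The sketch below uses a local stand-in enum that mirrors the two members listed above so that it runs without SecretFlow installed; whether the library performs the conversion in exactly this way is an assumption.

```python
from enum import Enum


class RegType(Enum):
    """Local stand-in mirroring secretflow.ml.linear.RegType for illustration."""

    Linear = 'linear'
    Logistic = 'logistic'


# Value lookup turns the string form into the enum member.
chosen = RegType('logistic')

# An unknown string raises ValueError, which gives early validation of user input.
try:
    RegType('poisson')
    rejected = False
except ValueError:
    rejected = True
```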
class secretflow.ml.linear.SSGLM(spu: SPU)

Bases: object

Methods:

__init__(spu)

fit_irls(x, y, offset, weight, epochs, link, ...)

Fit the model by IRLS(Iteratively reweighted least squares).

fit_sgd(x, y, offset, weight, epochs, link, dist)

Fit the model by SGD(stochastic gradient descent).

predict(x[, o, to_pyu])

Predict using the model.

__init__(spu: SPU) -> None

fit_irls(x: Union[FedNdarray, VDataFrame], y: Union[FedNdarray, VDataFrame], offset: Union[FedNdarray, VDataFrame], weight: Union[FedNdarray, VDataFrame], epochs: int, link: str, dist: str, tweedie_power: float = 1, scale: float = 1, eps: float = 0.0001) -> None

Fit the model by IRLS (iteratively reweighted least squares).

Parameters:
  • x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y – {FedNdarray, VDataFrame} of shape (n_samples,). Target vector relative to x.

  • offset – {FedNdarray, VDataFrame} of shape (n_samples,). Specify a column to use as the offset. Offsets are per-row “bias values” that are used during model training.

  • weight – {FedNdarray, VDataFrame} of shape (n_samples,). Specify a column to use for the observation weights, which are used for bias correction.

  • epochs – int. Number of iteration rounds.

  • link – str. Specify a link function (Logit, Log, Reciprocal, Identity).

  • dist – str. Specify a probability distribution (Bernoulli, Poisson, Gamma, Tweedie).

  • tweedie_power – float. Tweedie distributions are a family of distributions that includes the normal, gamma and Poisson distributions and their combinations.

    0: specialized as normal
    1: specialized as Poisson
    2: specialized as gamma
    (1, 2): combinations of gamma and Poisson

  • scale – float. A guess value for the distribution’s scale.

  • learning_rate – float, default=0.1. Controls how much the model changes in one epoch.

  • batch_size – int, default=1024. Number of samples used in one calculation.

  • iter_start_irls – int, default=0. Run a few rounds of IRLS training to initialize w. 0 disables this.

  • eps – float, default=1e-4. If the change rate of W is less than this threshold, the model is considered converged and training stops early. 0 disables early stopping.
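The four link functions named for the link parameter have standard textbook definitions, sketched below as plain Python pairs of (link, inverse link). This only illustrates the mathematics: in SecretFlow the computation runs under SPU, and the dictionary name `LINKS` and the exact spelling of the keys accepted by the link argument are assumptions made for the example.

```python
import math

# Each entry maps a link name to (g, g_inverse), where eta = g(mu) and mu = g_inverse(eta).
LINKS = {
    "Logit": (lambda mu: math.log(mu / (1.0 - mu)), lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "Log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "Reciprocal": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "Identity": (lambda mu: mu, lambda eta: eta),
}

# Round trip: applying the link and then its inverse returns the original mean.
mu = 0.3
round_trips = {name: inv(g(mu)) for name, (g, inv) in LINKS.items()}
```

The link connects the linear predictor to the mean of the chosen distribution, which is why link and dist are usually picked together (for example Logit with Bernoulli, Log with Poisson).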

fit_sgd(x: Union[FedNdarray, VDataFrame], y: Union[FedNdarray, VDataFrame], offset: Union[FedNdarray, VDataFrame], weight: Union[FedNdarray, VDataFrame], epochs: int, link: str, dist: str, tweedie_power: float = 1, scale: float = 1, learning_rate: float = 0.1, batch_size: int = 1024, iter_start_irls: int = 0, eps: float = 0.0001, decay_epoch: Optional[int] = None, decay_rate: Optional[float] = None) -> None

Fit the model by SGD (stochastic gradient descent).

Parameters:
  • x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y – {FedNdarray, VDataFrame} of shape (n_samples,). Target vector relative to x.

  • offset – {FedNdarray, VDataFrame} of shape (n_samples,). Specify a column to use as the offset. Offsets are per-row “bias values” that are used during model training.

  • weight – {FedNdarray, VDataFrame} of shape (n_samples,). Specify a column to use for the observation weights, which are used for bias correction.

  • epochs – int. Number of iteration rounds.

  • link – str. Specify a link function (Logit, Log, Reciprocal, Identity).

  • dist – str. Specify a probability distribution (Bernoulli, Poisson, Gamma, Tweedie).

  • tweedie_power – float. Tweedie distributions are a family of distributions that includes the normal, gamma and Poisson distributions and their combinations.

    0: specialized as normal
    1: specialized as Poisson
    2: specialized as gamma
    (1, 2): combinations of gamma and Poisson

  • scale – float. A guess value for the distribution’s scale.

  • learning_rate – float, default=0.1. Controls how much the model changes in one epoch.

  • batch_size – int, default=1024. Number of samples used in one calculation.

  • iter_start_irls – int, default=0. Run a few rounds of IRLS training to initialize w. 0 disables this.

  • eps – float, default=1e-4. If the change rate of W is less than this threshold, the model is considered converged and training stops early. 0 disables early stopping.

  • decay_epoch / decay_rate – int / float, default=None. Decays the learning rate as learning_rate * (decay_rate ** floor(epoch / decay_epoch)). None disables decay.

predict(x: Union[FedNdarray, VDataFrame], o: Optional[Union[FedNdarray, VDataFrame]] = None, to_pyu: Optional[PYU] = None) -> Union[SPUObject, PYUObject]

Predict using the model.

Parameters:
  • x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Predict samples.

  • o – {FedNdarray, VDataFrame} of shape (n_samples,). Specify a column to use as the offset, that is, the per-row “bias values” used in prediction.

  • to_pyu – the prediction initiator. If not None, the prediction result is revealed to the to_pyu device and saved as a FedNdarray; otherwise, the result is kept secret and saved as an SPUObject.

Returns:

pred scores in SPUObject, shape (n_samples,)

secretflow.ml.linear.fl_lr_mix

Classes:

FlLogisticRegressionMix()

SGD based logistic regression for mix partitioned data.

class secretflow.ml.linear.fl_lr_mix.FlLogisticRegressionMix

Bases: object

SGD based logistic regression for mix partitioned data.

The following is an example to illustrate the algorithm.

Suppose alice has features and label, while bob/carol/dave have features only.

The perspective of MixDataFrame X is as follows:

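+---------------------------------------------+
|                      X                      |
+--------------+----------+---------+---------+
| VDataFrame_0 | alice_x0 | bob_x   | dave_x0 |
+--------------+----------+---------+---------+
| VDataFrame_1 | alice_x1 | carol_x | dave_x1 |
+--------------+----------+---------+---------+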

The perspective of MixDataFrame Y is as follows:

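+-------------------------+
|            Y            |
+--------------+----------+
| VDataFrame_0 | alice_y0 |
+--------------+----------+
| VDataFrame_1 | alice_y1 |
+--------------+----------+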

When fitted with X and Y, two FlLogisticRegressionVertical instances are constructed. The first is fitted with VDataFrame_0 of X and Y, and the second with VDataFrame_1 of X and Y.

The main steps of one epoch are:

  1. Each FlLogisticRegressionVertical instance is fitted with its respective VDataFrame of X and Y.

  2. Aggregate \({\theta}\) of the FlLogisticRegressionVertical instances with SecureAggregator.

  3. Send the aggregated \({\theta}\) back to the FlLogisticRegressionVertical instances.

Methods:

fit(x, y, batch_size, epochs, aggregators, heus)

Fit the model.

predict(x)

Predict the score.

fit(x: MixDataFrame, y: MixDataFrame, batch_size: int, epochs: int, aggregators: List[Aggregator], heus: List[HEU], fxp_bits: Optional[int] = 18, tol: Optional[float] = 0.0001, learning_rate: Optional[float] = 0.1, agg_epochs: Optional[int] = 1, audit_log_dir: Optional[Dict[PYU, str]] = None)

Fit the model.

Parameters:
  • x – training vector. X should be a horizontally partitioned MixDataFrame, which consists of VDataFrames (secretflow.data.vertical.VDataFrame).

  • y – target vector relative to x. Y should also be a horizontally partitioned MixDataFrame. X and Y should have the same number of VDataFrames.

  • batch_size – number of samples per gradient update.

  • epochs – number of epochs to train the model.

  • aggregators – a list of aggregators used to compute vertical LR. The number of aggregators should equal the number of VDataFrames in X.

  • heus – a list of HEUs used to compute vertical LR. The number of HEUs should equal the number of VDataFrames in X.

  • fxp_bits – the fraction bit length used for encoding before sending to the HEU device. Defaults to spu_fxp_precision(spu.spu_pb2.FM64), which is 18 in the signature above.

  • tol – optional, tolerance for the stopping criteria. Defaults to 1e-4.

  • learning_rate – optional, learning rate. Defaults to 0.1.

  • agg_epochs – aggregate weights every {agg_epochs} epochs. Defaults to 1.

  • audit_log_dir – a dict specifying the audit log directory for each device. No audit log is written if None. Defaults to None. Leave it as None unless you fully understand what the audit does and accept the risk.

predict(x: MixDataFrame) -> List[PYUObject]

Predict the score.

Parameters:

x – the samples to predict.

Returns:

a list of PYUObjects holding prediction results.

secretflow.ml.linear.fl_lr_v

Classes:

FlLrVWorker()

PYUFlLrVWorker

alias of ActorProxy(PYUFlLrVWorker)

FlLogisticRegressionVertical(devices, ...[, ...])

Vertical logistic regression.

class secretflow.ml.linear.fl_lr_v.FlLrVWorker

Bases: object

Methods:

init_train_data(x, batch_size, epochs[, y, ...])

Initialize the training data.

next_batch()

Get next batch of X and y.

compute_mul(x_batch)

Compute Xi*Wi.

predict(mul)

Do prediction.

compute_loss(y, h, avg_flag)

compute_residual(y, h)

encode(data, frac_bits)

decode(data, frac_bits)

generate_rand_mask(decode_frac)

get_weight()

set_weight(w)

update_weight(masked_gradient, learning_rate)

update_weight_agg(x_batch, residual, ...)

init_train_data(x: Union[DataFrame, ndarray], batch_size: int, epochs: int, y: Optional[Union[DataFrame, ndarray]] = None, shuffle_seed: Optional[int] = None)

Initialize the training data.

Parameters:
  • x – the training vector.

  • batch_size – number of samples per gradient update.

  • epochs – number of epochs to train the model.

  • y – optional; the target vector relative to x.

  • shuffle_seed – optional; the data is shuffled if this is not None.

next_batch() -> Tuple[ndarray, ndarray]

Get next batch of X and y.

Returns:

A tuple of (x batch, y batch), where y batch is None if there is no y.
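The batching behaviour described by init_train_data and next_batch can be sketched with a small generator. This is an illustration under assumptions, not the worker's code: the name `iter_batches` is hypothetical, lists stand in for ndarrays, and a single pass over the data is shown rather than the multi-epoch stream the worker presumably maintains.

```python
import random


def iter_batches(x, batch_size, y=None, shuffle_seed=None):
    """Yield (x_batch, y_batch) tuples; y_batch is None when no y is given."""
    indices = list(range(len(x)))
    if shuffle_seed is not None:
        # Seeding the shuffle keeps row order reproducible, so parties that
        # use the same seed see their rows in the same order.
        random.Random(shuffle_seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        x_batch = [x[i] for i in batch]
        y_batch = [y[i] for i in batch] if y is not None else None
        yield x_batch, y_batch


features = [[1.0], [2.0], [3.0], [4.0], [5.0]]   # hypothetical local features
batches = list(iter_batches(features, batch_size=2))
```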

compute_mul(x_batch: ndarray) -> ndarray

Compute Xi*Wi.

predict(mul: ndarray) -> ndarray

Do prediction.

Parameters:

mul – the sum of Xi*Wi (i>0).

Returns:

The prediction results.
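How compute_mul and predict fit together can be shown with a plaintext sketch: each party computes its partial product Xi*Wi on its own feature slice, the partial products are summed, and the logistic function is applied to the sum. The sketch assumes the standard logistic function and sums in the clear, whereas the real flow combines the partial products through secure aggregation. The standalone functions, party names, and numbers are illustrative only.

```python
import math


def compute_mul(x_batch, w):
    """Partial linear predictor Xi*Wi for one party's feature slice."""
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in x_batch]


def predict(mul):
    """Apply the logistic function to the summed linear predictor."""
    return [1.0 / (1.0 + math.exp(-m)) for m in mul]


# Hypothetical vertical split: alice holds two features, bob holds one.
alice_part = compute_mul([[1.0, 2.0]], [0.0, -1.0])   # -> [-2.0]
bob_part = compute_mul([[2.0]], [1.0])                # -> [2.0]

# Summing the partial products gives the full linear predictor.
total = [a + b for a, b in zip(alice_part, bob_part)]
scores = predict(total)
```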

compute_loss(y: ndarray, h: ndarray, avg_flag: bool) -> ndarray

compute_residual(y: ndarray, h: ndarray) -> ndarray

encode(data: ndarray, frac_bits: int) -> ndarray

decode(data: ndarray, frac_bits: int) -> ndarray

generate_rand_mask(decode_frac: int) -> ndarray

get_weight() -> ndarray

set_weight(w: ndarray)

update_weight(masked_gradient: ndarray, learning_rate: float)

update_weight_agg(x_batch: ndarray, residual: ndarray, learning_rate: float)

secretflow.ml.linear.fl_lr_v.PYUFlLrVWorker

alias of ActorProxy(PYUFlLrVWorker)

Methods:

__init__(*args, **kwargs)

Abstraction device object base class.

compute_loss(y, h, avg_flag)

compute_mul(x_batch)

Compute Xi*Wi.

compute_residual(y, h)

decode(data, frac_bits)

encode(data, frac_bits)

generate_rand_mask(decode_frac)

get_weight()

init_train_data(x, batch_size, epochs[, y, ...])

Initialize the training data.

next_batch()

Get next batch of X and y.

predict(mul)

Do prediction.

set_weight(w)

update_weight(masked_gradient, learning_rate)

update_weight_agg(x_batch, residual, ...)

class secretflow.ml.linear.fl_lr_v.FlLogisticRegressionVertical(devices: List[PYU], aggregator: Aggregator, heu: HEU, fxp_bits: Optional[int] = 18, audit_log_dir: Optional[Dict[PYU, str]] = None)

Bases: object

Vertical logistic regression.

Implements basic SGD-based logistic regression among multiple vertical participants.

To explain the algorithm, suppose alice has features and the label, while bob and charlie have features only. The main steps of SGD are:

  1. Alice computes the prediction using secure aggregation.

  2. Alice sends the residual to bob and charlie as HE (Homomorphic Encryption) ciphertext.

  3. Bob and charlie compute their gradients on the HE ciphertext and send masked gradients to alice.

  4. Alice decrypts the masked gradients and sends them back to bob and charlie.

  5. Bob and charlie unmask the gradients and update their weights independently.

  6. Alice also updates her own weights.

Methods:

__init__(devices, aggregator, heu[, ...])

Init VanillaVerLogisticRegression.

init_train_data(x, y, epochs, batch_size[, ...])

predict(x)

Predict the score.

compute_loss(x, y[, avg_flag])

Compute the loss.

get_weight()

Get weight from this estimator.

set_weight(weight)

Set weight to this estimator.

fit(x, y, batch_size, epochs[, tol, ...])

Fit the model.

fit_in_steps(n_step, learning_rate, epoch)

Fit in steps.

__init__(devices: List[PYU], aggregator: Aggregator, heu: HEU, fxp_bits: Optional[int] = 18, audit_log_dir: Optional[Dict[PYU, str]] = None)

Init VanillaVerLogisticRegression.

Parameters:
  • devices – a list of PYU devices taking part in the computation.

  • aggregator – the aggregator instance.

  • heu – the HEU device instance.

  • fxp_bits – the fraction bit length used for encoding before sending to the HEU device. Defaults to spu_fxp_precision(spu.spu_pb2.FM64), which is 18 in the signature above.

  • audit_log_dir – a dict specifying the audit log directory for each device. No audit log is written if None. Defaults to None. Leave it as None unless you fully understand what the audit does and accept the risk.

init_train_data(x: FedNdarray, y: FedNdarray, epochs: int, batch_size: int, shuffle_seed: Optional[int] = None)

predict(x: Union[VDataFrame, FedNdarray, List[PYUObject]]) -> PYUObject

Predict the score.

Parameters:

x – the samples to predict.

Returns:

a PYUObject holding the prediction results.

Return type:

PYUObject

compute_loss(x: FedNdarray, y: FedNdarray, avg_flag: Optional[bool] = True) -> PYUObject

Compute the loss.

Parameters:
  • x – the samples.

  • y – the label.

  • avg_flag – whether to divide the loss by the number of samples. Defaults to True.

Returns:

a PYUObject holding the loss value.

Return type:

PYUObject

get_weight() -> Dict[PYU, PYUObject]

Get weight from this estimator.

Returns:

A dict mapping each PYU to its weight. Note that the intercept (w0) is the first column of the label device's weight.

set_weight(weight: Dict[PYU, Union[PYUObject, ndarray]])

Set weight to this estimator.

Parameters:

weight – a dict mapping each PYU to its weight.

fit(x: Union[VDataFrame, FedNdarray], y: Union[VDataFrame, FedNdarray], batch_size: int, epochs: int, tol: Optional[float] = 0.0001, learning_rate: Optional[float] = 0.1)

Fit the model.

Parameters:
  • x – training vector.

  • y – target vector relative to x.

  • batch_size – number of samples per gradient update.

  • epochs – number of epochs to train the model.

  • tol – optional, tolerance for stopping criteria. Defaults to 1e-4.

  • learning_rate – optional, learning rate. Defaults to 0.1.

fit_in_steps(n_step: int, learning_rate: float, epoch: int)

Fit in steps.

Parameters:
  • n_step – the number of steps.

  • learning_rate – learning rate.

  • epoch – the current epoch.

secretflow.ml.linear.linear_model

Classes:

RegType(value)

An enumeration.

PartyPath(party, path)

LinearModelRecord(reg_type, sig_type, ...)

LinearModel(weights, reg_type, sig_type)

Unified linear regression model.

class secretflow.ml.linear.linear_model.RegType(value)

Bases: Enum

An enumeration.

Attributes:

Linear

Logistic

Linear = 'linear'

Logistic = 'logistic'

class secretflow.ml.linear.linear_model.PartyPath(party: str, path: str)

Bases: object

Attributes:

party

path

Methods:

__init__(party, path)

party: str

path: str

__init__(party: str, path: str) -> None

class secretflow.ml.linear.linear_model.LinearModelRecord(reg_type: secretflow.ml.linear.linear_model.RegType, sig_type: secretflow.utils.sigmoid.SigType, weights_spu: List[secretflow.ml.linear.linear_model.PartyPath], weights_pyu: List[secretflow.ml.linear.linear_model.PartyPath])

Bases: object

Attributes:

reg_type

sig_type

weights_spu

weights_pyu

Methods:

__init__(reg_type, sig_type, weights_spu, ...)

reg_type: RegType

sig_type: SigType

weights_spu: List[PartyPath]

weights_pyu: List[PartyPath]

__init__(reg_type: RegType, sig_type: SigType, weights_spu: List[PartyPath], weights_pyu: List[PartyPath]) -> None

class secretflow.ml.linear.linear_model.LinearModel(weights: Union[SPUObject, List[PYUObject]], reg_type: RegType, sig_type: SigType)

Bases: object

Unified linear regression model.

weights

{SPUObject, List[PYUObject]}. For MPC LR, a single SPUObject holds all weights; for FL LR, a list of PYUObjects is used.

Type:

Union[secretflow.device.device.spu.SPUObject, List[secretflow.device.device.pyu.PYUObject]]

reg_type

RegType. Linear regression or logistic regression model.

Type:

secretflow.ml.linear.linear_model.RegType

sig_type

SigType. Which sigmoid approximation to use; only used in MPC LR.

Type:

secretflow.utils.sigmoid.SigType

Attributes:

weights

reg_type

sig_type

Methods:

dump(dir_path)

load(record[, spu, pyus])

__init__(weights, reg_type, sig_type)

weights: Union[SPUObject, List[PYUObject]]

reg_type: RegType

sig_type: SigType

dump(dir_path: Dict[str, str]) -> LinearModelRecord

classmethod load(record: LinearModelRecord, spu: Optional[SPU] = None, pyus: Optional[List[PYU]] = None) -> LinearModel

__init__(weights: Union[SPUObject, List[PYUObject]], reg_type: RegType, sig_type: SigType) -> None