secretflow.ml.linear.ss_sgd

Classes:

SSRegression(spu)

This method provides linear and logistic regression models for vertically partitioned datasets, using secret sharing with a mini-batch SGD training solver.

class secretflow.ml.linear.ss_sgd.SSRegression(spu: SPU)

Bases: object

This method provides linear and logistic regression models for vertically partitioned datasets, using secret sharing with a mini-batch SGD training solver. SS-SGD is short for secret-sharing SGD training.

More details on mini-batch SGD: https://stats.stackexchange.com/questions/488017/understanding-mini-batch-gradient-descent

Linear regression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.

More details on linear regression: https://en.wikipedia.org/wiki/Linear_regression
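To make the mini-batch SGD solver concrete, the sketch below runs the same kind of update on plaintext data for linear regression. It is an illustration only, not SecretFlow's implementation: there is no secret sharing, `linear_sgd` is a hypothetical helper, and the loss scaling (mean squared error with a 1/2 factor) and per-epoch shuffling are assumptions.

```python
import random

def linear_sgd(X, y, epochs, learning_rate=0.1, batch_size=2, seed=0):
    """Plaintext mini-batch SGD for linear regression (hypothetical helper).

    Minimizes (1 / 2m) * sum((x_i . w + b - y_i)^2) over each mini-batch of size m.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    rng = random.Random(seed)          # fixed seed for reproducible shuffling
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)             # visit samples in a new order each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            grad_w, grad_b = [0.0] * p, 0.0
            for i in batch:
                # residual = prediction - target
                r = sum(wj * xj for wj, xj in zip(w, X[i])) + b - y[i]
                for j in range(p):
                    grad_w[j] += r * X[i][j]
                grad_b += r
            m = len(batch)
            # one gradient step per mini-batch
            w = [wj - learning_rate * gj / m for wj, gj in zip(w, grad_w)]
            b -= learning_rate * grad_b / m
    return w, b

# Noise-free, centered data generated by y = 2x + 1
X = [[-1.5], [-0.5], [0.5], [1.5]]
y = [-2.0, 0.0, 2.0, 4.0]
w, b = linear_sgd(X, y, epochs=500)
print(w, b)  # close to [2.0] and 1.0
```

Each epoch performs n / batch_size updates, so a larger batch gives fewer but less noisy steps per epoch.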

Logistic regression, despite its name, is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier. The probabilities describing the possible outcomes of a single trial are modeled using a logistic function. This method fits binary classification with optional L2 regularization.

More details on logistic regression: https://en.wikipedia.org/wiki/Logistic_regression
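The plaintext sketch below shows binary logistic regression trained by mini-batch SGD with an optional L2 penalty, plus a first-order Taylor approximation of the sigmoid. Under MPC the exponential in the exact sigmoid is expensive, which is presumably why `fit` exposes a `sig_type` parameter; treating `sig_type='t1'` as the expansion 0.5 + 0.25z is an assumption. The helper names and the penalty form ((l2_norm / 2) * ||w||², bias not regularized) are also assumptions, not the library's code.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_t1(z):
    """First-order Taylor approximation of the sigmoid around z = 0.

    Assumed to correspond to sig_type='t1'; only accurate for small |z|.
    """
    return 0.5 + 0.25 * z

def logistic_sgd(X, y, epochs, learning_rate=0.1, batch_size=2,
                 l2_norm=0.0, sig=sigmoid, seed=0):
    """Plaintext mini-batch SGD for binary logistic regression (hypothetical helper).

    Assumes an L2 penalty of (l2_norm / 2) * ||w||^2 on the weights only.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            m = len(batch)
            grad_w, grad_b = [0.0] * p, 0.0
            for i in batch:
                # error = predicted probability - label (label is 0 or 1)
                err = sig(sum(wj * xj for wj, xj in zip(w, X[i])) + b) - y[i]
                for j in range(p):
                    grad_w[j] += err * X[i][j]
                grad_b += err
            # data gradient averaged over the batch, plus the L2 term on w
            w = [wj - learning_rate * (gj / m + l2_norm * wj)
                 for wj, gj in zip(w, grad_w)]
            b -= learning_rate * grad_b / m
    return w, b

X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = logistic_sgd(X, y, epochs=300, l2_norm=0.1)
print([sigmoid(w[0] * x[0] + b) > 0.5 for x in X])  # [False, False, True, True]
print(sigmoid(0.5), sigmoid_t1(0.5))  # about 0.6225 vs 0.625
```

Note that `sigmoid_t1(4.0)` is 1.5, outside [0, 1]: a low-order approximation is only trustworthy when the linear scores stay small, which is one more reason to standardize the inputs.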

SPU is a verifiable and measurable secure computing device that runs under various MPC protocols to provide provable security.

More detail for SPU: https://www.secretflow.org.cn/docs/spu/en/

This method protects both the original dataset and the final model by secret-sharing the dataset to the SPU device and running model fitting on the SPU.
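As a toy illustration of what "secret sharing the dataset" means, the sketch below splits integers into additive shares over the ring of integers mod 2^64: any subset of fewer than all the shares is uniformly random, and parties can add shared values locally. This is a simplified assumption for exposition, not SPU's implementation; SPU's actual protocols (for example SEMI2K or ABY3) add fixed-point encoding and interactive multiplication on top of this idea. `share`, `reconstruct`, and `add_shared` are hypothetical names.

```python
import secrets

MODULUS = 2 ** 64  # toy choice: shares live in the ring of integers mod 2^64

def share(secret, n_parties=2):
    """Split a non-negative integer < MODULUS into n additive shares."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod MODULUS."""
    return sum(shares) % MODULUS

def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]

x_shares = share(42)
y_shares = share(100)
print(reconstruct(add_shared(x_shares, y_shares)))  # 142
```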

Parameters: spu – secure device (SPU).

Note

The training dataset should be normalized or standardized; otherwise, the SGD solver may fail to converge.
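A minimal plaintext sketch of z-score standardization follows. Because each column's mean and standard deviation depend only on that column, in a vertically partitioned setting each party could standardize its own feature columns locally before training. The helper `standardize_columns` is hypothetical and is not a SecretFlow preprocessing API; mapping constant columns to 0.0 is an assumed convention.

```python
import statistics

def standardize_columns(X):
    """Z-score standardize each column of a row-major matrix: (v - mean) / std.

    Constant columns (std == 0) are mapped to 0.0 to avoid division by zero.
    """
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]  # population standard deviation
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in X
    ]

X = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
print(standardize_columns(X))
```

The same means and standard deviations computed on the training data should be reused when preparing data for `predict`.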

Methods:

`__init__`(spu)
`fit`(x, y, epochs[, learning_rate, ...])	Fit the model according to the given training data.
`save_model`()	Save the fitted model in LinearModel format.
`load_model`(m)	Load LinearModel format model.
`predict`(x[, batch_size, to_pyu])	Predict using the model.

__init__(spu: SPU) → None

fit(x: Union[FedNdarray, VDataFrame], y: Union[FedNdarray, VDataFrame], epochs: int, learning_rate: float = 0.1, batch_size: int = 1024, sig_type: str = 't1', reg_type: str = 'logistic', penalty: str = 'None', l2_norm: float = 0.5, eps: float = 0.001, decay_epoch: Optional[int] = None, decay_rate: Optional[float] = None, strategy: str = 'naive_sgd') → None

Fit the model according to the given training data.

Parameters:

x – {FedNdarray, VDataFrame} of shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.
y – {FedNdarray, VDataFrame} of shape (n_samples,) Target vector relative to x.
epochs – int Number of training epochs (iteration rounds).
learning_rate – float, default=0.1 Step size that controls how much the model changes in each update.
batch_size – int, default=1024 Number of samples used in each mini-batch calculation.
sig_type – str, default=t1 sigmoid approximation type.
reg_type – str, default=logistic Regression type: linear or logistic.
penalty – str, default=None The penalty (also known as the regularization term) to be used.
l2_norm – float, default=0.5 Strength of the L2 regularization term.
eps – float, default=1e-3 If the change rate of the weights W falls below this threshold, the model is considered converged and training stops early. Set to 0 to disable.
decay_epoch / decay_rate – int / float, default=None Decay the learning rate as learning_rate * (decay_rate ** floor(epoch / decay_epoch)). None disables decay. If strategy=policy_sgd, decay_rate and decay_epoch default to 0.5 and 5, respectively.
strategy –
str, default=naive_sgd Optimization strategy used in training:

naive_sgd: the original (plain) SGD.

policy_sgd (logistic regression only): scales the learning rate in each update, similar to Adam but with a unified factor, so the batch size can be larger and early stopping can be more aggressive. This accelerates training in most scenarios, but is not recommended when training with strong regularization.
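The learning-rate decay formula and the early-stop rule described for decay_epoch / decay_rate and eps can be sketched in plain Python as below. The decay function follows the documented formula directly; the convergence test is an assumption, since the exact "change rate" metric is not specified here (this sketch uses the largest absolute weight change). Both helper names are hypothetical.

```python
import math

def decayed_learning_rate(learning_rate, epoch, decay_epoch=None, decay_rate=None):
    """learning_rate * (decay_rate ** floor(epoch / decay_epoch)); None disables decay."""
    if decay_epoch is None or decay_rate is None:
        return learning_rate
    return learning_rate * decay_rate ** math.floor(epoch / decay_epoch)

def has_converged(old_w, new_w, eps=1e-3):
    """Hypothetical early-stop test: largest absolute weight change below eps.

    The metric itself is an assumption; eps == 0 disables early stopping.
    """
    if eps == 0:
        return False
    return max(abs(n - o) for o, n in zip(old_w, new_w)) < eps

# With the policy_sgd defaults (decay_epoch=5, decay_rate=0.5):
for epoch in (0, 4, 5, 12):
    print(epoch, decayed_learning_rate(0.1, epoch, decay_epoch=5, decay_rate=0.5))
```

The rate stays constant within each block of decay_epoch epochs and halves at each block boundary, so epochs 0 to 4 use 0.1 and epochs 5 to 9 use 0.05.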

Returns:

Final weights in SPUObject.

save_model() → LinearModel: Save the fitted model in LinearModel format.

load_model(m: LinearModel) → None: Load LinearModel format model.

predict(x: Union[FedNdarray, VDataFrame], batch_size: int = 1024, to_pyu: Optional[PYU] = None) → Union[SPUObject, FedNdarray]

Predict using the model.

Parameters:

x – {FedNdarray, VDataFrame} of shape (n_samples, n_features) Samples to predict.
batch_size – int, default=1024 Number of samples used in each calculation.
to_pyu – the prediction initiator. If not None, the prediction result is revealed to the to_pyu device and saved as a FedNdarray; otherwise, the result is kept secret and saved as an SPUObject.

Returns:

Prediction scores as an SPUObject or FedNdarray, of shape (n_samples,).
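The arithmetic of batched prediction can be sketched in plaintext as below. The real predict runs on the SPU and either keeps the scores secret (SPUObject) or reveals them to to_pyu (FedNdarray); this sketch ignores that entirely. `predict_scores` is a hypothetical helper, and returning sigmoid(x · w + b) for logistic regression is an assumption about what "scores" means. In the sketch, batch_size only controls how many rows are processed per step and does not change the result.

```python
import math

def predict_scores(X, w, b, reg_type="logistic", batch_size=1024):
    """Plaintext batched prediction for a fitted linear model (hypothetical helper).

    reg_type='linear' returns x . w + b; reg_type='logistic' returns
    sigmoid(x . w + b), assumed here to be the probability of class 1.
    """
    scores = []
    for start in range(0, len(X), batch_size):   # process batch_size rows at a time
        for row in X[start:start + batch_size]:
            z = sum(wj * xj for wj, xj in zip(w, row)) + b
            if reg_type == "logistic":
                # numerically stable sigmoid
                z = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
            scores.append(z)
    return scores

X = [[0.0, 1.0], [2.0, -1.0], [1.0, 1.0]]
print(predict_scores(X, w=[1.0, 2.0], b=0.5, reg_type="linear"))  # [2.5, 0.5, 3.5]
```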