secretflow.ml.nn.fl.backend.torch#

secretflow.ml.nn.fl.backend.torch.fl_base#

Classes:

BaseTorchModel(builder_base[, random_seed])

class secretflow.ml.nn.fl.backend.torch.fl_base.BaseTorchModel(builder_base: Callable[[], TorchModel], random_seed: Optional[int] = None)[source]#

Bases: ABC

Methods:

__init__(builder_base[, random_seed])

build_dataset_from_csv(csv_file_path, label)

Build a torch DataLoader.

build_dataset(x[, y, s_w, sampling_rate, ...])

Build a torch DataLoader.

build_dataset_from_builder(dataset_builder, x)

Build a dataset from a user-provided dataset builder.

get_rows_count(filename)

get_weights()

set_weights(weights)

Set the weights of the client model.

set_validation_metrics(global_metrics)

wrap_local_metrics()

evaluate([evaluate_steps])

predict([predict_steps])

init_training(callbacks[, epochs, steps, ...])

on_train_begin()

on_epoch_begin(epoch)

on_epoch_end(epoch)

transform_metrics(logs[, stage])

on_train_end()

get_stop_training()

train_step(weights, cur_steps, train_steps, ...)

save_model(model_path)

For compatibility reasons it is recommended to save only the model's state dict. Ref: https://pytorch.org/docs/master/notes/serialization.html#id5

load_model(model_path)

Load the model from a state dict; the model structure must be defined before loading.

__init__(builder_base: Callable[[], TorchModel], random_seed: Optional[int] = None)[source]#
build_dataset_from_csv(csv_file_path: str, label: str, sampling_rate=None, shuffle=False, random_seed=1234, na_value='?', repeat_count=1, sample_length=0, buffer_size=None, ignore_errors=True, prefetch_buffer_size=None, stage='train', label_decoder=None)[source]#

Build a torch DataLoader.

Parameters:
  • csv_file_path – path to the CSV file

  • label – name of the label column

  • sampling_rate – fraction of the dataset drawn per batch (determines the batch size)

  • shuffle – a bool indicating whether the input should be shuffled

  • random_seed – randomization seed to use for shuffling

  • na_value – additional string to recognize as NA/NaN

  • repeat_count – number of times to repeat the dataset

  • sample_length – length of the sample to read

  • buffer_size – size of the shuffle buffer

  • ignore_errors – if True, ignore errors while parsing the CSV file

  • prefetch_buffer_size – an int specifying the number of feature batches to prefetch for performance improvement

  • stage – the stage of the dataset ("train" or "eval")

  • label_decoder – callable used to preprocess labels
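
A hedged sketch of the call shape follows; BaseTorchModel is abstract, so MyFLModel, the builder, and the file path below are hypothetical stand-ins rather than names from this API.

    # Hypothetical: MyFLModel implements the abstract train_step;
    # my_torch_model_builder is a Callable[[], TorchModel].
    worker = MyFLModel(builder_base=my_torch_model_builder, random_seed=42)
    worker.build_dataset_from_csv(
        csv_file_path="train.csv",  # local CSV partition (hypothetical path)
        label="y",                  # name of the label column
        sampling_rate=0.01,         # fraction of rows drawn per batch
        shuffle=True,
        random_seed=1234,
        stage="train",
    )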

build_dataset(x: ndarray, y: Optional[ndarray] = None, s_w: Optional[ndarray] = None, sampling_rate=None, buffer_size=None, shuffle=False, random_seed=1234, repeat_count=1, sampler_method='batch', stage='train')[source]#

Build a torch DataLoader.

Parameters:
  • x – features; the local ndarray partition of a FedNdArray or HDataFrame

  • y – labels; the local partition of a FedNdArray or HDataFrame

  • s_w – sample weights for this dataset

  • sampling_rate – fraction of the dataset drawn per batch (determines the batch size)

  • buffer_size – size of the shuffle buffer

  • shuffle – a bool indicating whether the input should be shuffled

  • random_seed – PRNG seed for shuffling

  • repeat_count – number of times to repeat the dataset

  • sampler_method – sampling method, "batch" or "possion" (spelling as in the API)

  • stage – the stage of the dataset ("train" or "eval")
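
A hedged sketch of the in-memory path, reusing the hypothetical worker from above; the toy shapes are assumptions.

    import numpy as np

    x = np.random.rand(1000, 20).astype(np.float32)  # toy local features
    y = np.random.randint(0, 2, size=(1000,))        # toy local labels
    worker.build_dataset(
        x, y,
        sampling_rate=0.032,     # ~32 samples per batch out of 1000
        shuffle=True,
        sampler_method="batch",  # or "possion", spelling as in the API
        stage="train",
    )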

build_dataset_from_builder(dataset_builder: Callable, x: Union[DataFrame, str], y: Optional[ndarray] = None, s_w: Optional[ndarray] = None, repeat_count=1, stage='train')[source]#

Build a dataset from a user-provided dataset builder.

Parameters:
  • dataset_builder – function that builds the dataset; must return the dataset and steps_per_epoch

  • x – a pandas DataFrame, or a string path to a CSV file or data folder containing the input data

  • y – an optional NumPy array containing the labels for the dataset. Defaults to None.

  • s_w – an optional NumPy array containing the sample weights for the dataset. Defaults to None.

  • repeat_count – an integer specifying the number of times to repeat the dataset; useful for increasing the effective size of the dataset

  • stage – a string indicating the stage of the dataset (either "train" or "eval"). Defaults to "train".

Returns:

The dataset produced by dataset_builder.
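
A minimal sketch of a conforming builder; only the (dataset, steps_per_epoch) return contract comes from the description above, while the argument layout and the "y" label column are assumptions.

    import pandas as pd
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def my_dataset_builder(x, stage="train"):  # argument layout is an assumption
        df = pd.read_csv(x) if isinstance(x, str) else x
        features = torch.tensor(df.drop(columns=["y"]).values, dtype=torch.float32)
        labels = torch.tensor(df["y"].values)
        loader = DataLoader(TensorDataset(features, labels), batch_size=32,
                            shuffle=(stage == "train"))
        return loader, len(loader)  # the dataset and steps_per_epoch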

get_rows_count(filename)[source]#
get_weights()[source]#
set_weights(weights)[source]#

Set the weights of the client model.

set_validation_metrics(global_metrics)[source]#
wrap_local_metrics()[source]#
evaluate(evaluate_steps=0)[source]#
predict(predict_steps=0)[source]#
init_training(callbacks, epochs=1, steps=0, verbose=0)[source]#
on_train_begin()[source]#
on_epoch_begin(epoch)[source]#
on_epoch_end(epoch)[source]#
transform_metrics(logs, stage='train')[source]#
on_train_end()[source]#
get_stop_training()[source]#
abstract train_step(weights, cur_steps, train_steps, **kwargs)[source]#
save_model(model_path: str)[source]#

For compatibility reasons it is recommended to save only the model's state dict. Ref: https://pytorch.org/docs/master/notes/serialization.html#id5

load_model(model_path: str)[source]#

Load the model from a state dict; the model structure must be defined before loading.
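
These two methods follow the standard PyTorch state-dict round trip from the linked serialization notes; the linear layer below is a stand-in for the real model.

    import torch
    import torch.nn as nn

    net = nn.Linear(20, 2)                    # stand-in for the real model
    torch.save(net.state_dict(), "model.pt")  # persist parameters only

    # To load, the structure must be constructed first, then filled in:
    net = nn.Linear(20, 2)
    net.load_state_dict(torch.load("model.pt"))
    net.eval()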

secretflow.ml.nn.fl.backend.torch.sampler#

Functions:

batch_sampler(x, y, s_w, sampling_rate, ...)

Implementation of the batch sampler.

possion_sampler(x, y, s_w, sampling_rate, ...)

Implementation of the Poisson sampler.

sampler_data([sampler_method, x, y, s_w, ...])

Sample data according to sampler_method.

secretflow.ml.nn.fl.backend.torch.sampler.batch_sampler(x, y, s_w, sampling_rate, buffer_size, shuffle, repeat_count, random_seed)[source]#

Implementation of the batch sampler.

Parameters:
  • x – features; the local ndarray partition of a FedNdArray or HDataFrame

  • y – labels; the local partition of a FedNdArray or HDataFrame

  • s_w – sample weights for this dataset

  • sampling_rate – fraction of the dataset drawn per batch (determines the batch size)

  • buffer_size – size of the shuffle buffer

  • shuffle – a bool indicating whether the input should be shuffled

  • repeat_count – number of times to repeat the dataset

  • random_seed – PRNG seed for shuffling

Returns:

a torch DataLoader over the sampled data

Return type:

torch.utils.data.DataLoader

secretflow.ml.nn.fl.backend.torch.sampler.possion_sampler(x, y, s_w, sampling_rate, random_seed)[source]#

Implementation of the Poisson sampler (the function name retains the original spelling).

Parameters:
  • x – features; the local ndarray partition of a FedNdArray or HDataFrame

  • y – labels; the local partition of a FedNdArray or HDataFrame

  • s_w – sample weights for this dataset

  • sampling_rate – probability with which each sample is independently included in a batch

  • random_seed – PRNG seed for sampling

Returns:

a torch DataLoader over the Poisson-sampled data

Return type:

torch.utils.data.DataLoader
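
To illustrate the scheme (this is not library code): Poisson sampling includes each row independently with probability sampling_rate, so batch sizes vary around len(x) * sampling_rate; this is the subsampling style commonly paired with differentially private training.

    import numpy as np

    x = np.random.rand(1000, 20)               # toy feature matrix
    rng = np.random.default_rng(1234)
    sampling_rate = 0.01
    mask = rng.random(len(x)) < sampling_rate  # independent Bernoulli draw per row
    batch = x[mask]                            # expected size: 1000 * 0.01 = 10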

secretflow.ml.nn.fl.backend.torch.sampler.sampler_data(sampler_method='batch', x=None, y=None, s_w=None, sampling_rate=None, buffer_size=None, shuffle=False, repeat_count=1, random_seed=1234)[source]#

Sample data according to sampler_method.

Parameters:
  • sampler_method – sampling method, "batch" or "possion" (spelling as in the API)

  • x – features; the local ndarray partition of a FedNdArray or HDataFrame

  • y – labels; the local partition of a FedNdArray or HDataFrame

  • s_w – sample weights for this dataset

  • sampling_rate – fraction of the dataset drawn per batch (determines the batch size)

  • buffer_size – size of the shuffle buffer

  • shuffle – a bool indicating whether the input should be shuffled

  • repeat_count – number of times to repeat the dataset

  • random_seed – PRNG seed for shuffling

Returns:

a torch DataLoader built by the selected sampler

Return type:

torch.utils.data.DataLoader
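
A hedged usage sketch with toy arrays; the import path is the one documented on this page.

    import numpy as np
    from secretflow.ml.nn.fl.backend.torch.sampler import sampler_data

    x = np.random.rand(1000, 20).astype(np.float32)
    y = np.random.randint(0, 2, size=(1000,))
    loader = sampler_data(
        sampler_method="batch",  # or "possion", spelling as in the API
        x=x, y=y, s_w=None,
        sampling_rate=0.032,
        shuffle=True,
        random_seed=1234,
    )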

secretflow.ml.nn.fl.backend.torch.utils#

Classes:

BaseModule(*args, **kwargs)

TorchModel([model_fn, loss_fn, optim_fn, ...])

class secretflow.ml.nn.fl.backend.torch.utils.BaseModule(*args, **kwargs)[source]#

Bases: ABC, Module

Methods:

forward(x)

Defines the computation performed at every call.

get_weights([return_numpy])

set_weights(weights)

update_weights(weights)

get_gradients([parameters])

set_gradients(gradients[, parameters])

Attributes:

training

abstract forward(x)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance rather than forward directly, since the former takes care of running the registered hooks while the latter silently ignores them.
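
In short, for any Module instance and suitable input x:

    out = module(x)          # preferred: __call__ runs the registered hooks
    out = module.forward(x)  # computes the same result but skips the hooks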

get_weights(return_numpy=False)[source]#
set_weights(weights)[source]#
update_weights(weights)[source]#
get_gradients(parameters=None)[source]#
set_gradients(gradients: List[Union[Tensor, ndarray]], parameters: Optional[List[Tensor]] = None)[source]#
training: bool#
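A hedged sketch of a concrete subclass: only forward must be supplied, after which the weight helpers above apply (the exact return format of get_weights is an assumption).

    import torch.nn as nn
    from secretflow.ml.nn.fl.backend.torch.utils import BaseModule

    class MLP(BaseModule):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2)
            )

        def forward(self, x):
            return self.net(x)

    model = MLP()
    weights = model.get_weights(return_numpy=True)  # params as numpy (assumed)
    model.set_weights(weights)                      # round-trip into the module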
class secretflow.ml.nn.fl.backend.torch.utils.TorchModel(model_fn: Optional[BaseModule] = None, loss_fn: Optional[_Loss] = None, optim_fn: Optional[Optimizer] = None, metrics: List[Metric] = [])[source]#

Bases: object

Methods:

__init__([model_fn, loss_fn, optim_fn, metrics])

__init__(model_fn: Optional[BaseModule] = None, loss_fn: Optional[_Loss] = None, optim_fn: Optional[Optimizer] = None, metrics: List[Metric] = [])[source]#
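
A hedged construction sketch: despite the Optimizer and _Loss type hints, the *_fn arguments are conventionally passed as factories that the framework invokes later, once the model's parameters exist; treat that convention as an assumption to verify against the library's tutorials. MLP is the subclass sketched above, and torchmetrics supplies the metric.

    import torch
    import torch.nn as nn
    from torchmetrics import Accuracy
    from secretflow.ml.nn.fl.backend.torch.utils import TorchModel

    torch_model = TorchModel(
        model_fn=MLP,                 # class, not an instance
        loss_fn=nn.CrossEntropyLoss,  # loss factory
        optim_fn=lambda params: torch.optim.Adam(params, lr=1e-3),
        metrics=[lambda: Accuracy(task="multiclass", num_classes=2)],
    )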