secretflow.preprocessing#

Classes:

`KBinsDiscretizer`([n_bins, strategy])	Bin continuous data into intervals.
`LabelEncoder`()	Encode target labels with value between 0 and n_classes-1.
`OneHotEncoder`([min_frequency, max_categories])	Encode categorical features as a one-hot numeric array.
`MinMaxScaler`()	Transform features by scaling each feature to a given range.
`StandardScaler`([with_mean, with_std])	Standardize features by removing the mean and scaling to unit variance.
`LogroundTransformer`([decimals, bias])	Constructs a transformer for calculating round(log2(x + bias)) of (partition of) dataframe.

class secretflow.preprocessing.KBinsDiscretizer(n_bins=5, strategy: str = 'quantile')[source]#

Bases: _PreprocessBase

Bin continuous data into intervals.

This KBinsDiscretizer is almost the same as sklearn.preprocessing.KBinsDiscretizer, except that the input and output are federated dataframes.

_discretizer#: the sklearn.preprocessing.KBinsDiscretizer instance used.

_n_bins#: The number of bins to produce.

_strategy#: {‘uniform’, ‘quantile’}. Note that ‘kmeans’ is not supported yet.
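To make the difference between the two supported strategies concrete, here is a minimal plain-Python sketch that runs on an ordinary list, not on a federated dataframe. The helper names (`uniform_edges`, `quantile_edges`, `assign_bins`) are hypothetical and are not part of secretflow; the sketch also assumes that the output is an ordinal bin index, which this page does not state.

```python
import bisect
import statistics


def uniform_edges(values, n_bins):
    """Interior cut points for equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def quantile_edges(values, n_bins):
    """Interior cut points so that each bin holds roughly the same number of samples."""
    # method="inclusive" interpolates linearly between order statistics.
    return statistics.quantiles(values, n=n_bins, method="inclusive")


def assign_bins(values, interior_edges):
    """Map each value to the index of the bin it falls into (illustrative only)."""
    return [bisect.bisect_right(interior_edges, v) for v in values]


# One large outlier shows how the two strategies diverge.
values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
uniform_bins = assign_bins(values, uniform_edges(values, 4))
quantile_bins = assign_bins(values, quantile_edges(values, 4))
print(uniform_bins)   # [0, 0, 0, 0, 0, 0, 0, 0, 0, 3]
print(quantile_bins)  # [0, 0, 0, 1, 1, 2, 2, 3, 3, 3]
```

With ‘uniform’ the outlier stretches the range so that almost every sample lands in the first bin, while ‘quantile’ spreads the samples evenly across bins.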

Methods:

`__init__`([n_bins, strategy])
`fit`(df[, aggregator, comparator, ...])	Fit the estimator.
`transform`(df)	Discretize the data.
`fit_transform`(df[, aggregator, comparator, ...])	Fit the estimator with X and then transform.
`get_params`()

__init__(n_bins=5, strategy: str = 'quantile') → None[source]#

fit(df: Union[HDataFrame, VDataFrame, MixDataFrame], aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None, compress_thres: int = 10000, error: float = 10000.0, max_iter: int = 200) → KBinsDiscretizer[source]#

Fit the estimator.

Parameters:

df – the X to fit.
aggregator – optional; must be provided if df is a horizontally partitioned MixDataFrame.
comparator – optional; must be provided if df is a horizontally partitioned MixDataFrame.
compress_thres – optional; the compression threshold of HomoBinning.
error – optional; the error of HomoBinning.
max_iter – optional; the maximum number of iterations of HomoBinning.

Returns:

the instance itself.

transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame][source]#

Discretize the data.

Parameters: df – the X to discretize.
Returns: the transformed X as a federated dataframe.

fit_transform(df: Union[HDataFrame, VDataFrame, MixDataFrame], aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None, compress_thres: int = 10000, error: float = 10000.0, max_iter: int = 200)[source]#: Fit the estimator with X and then transform. A convenience combination of the fit and transform methods.

get_params() → Dict[str, Any][source]#

class secretflow.preprocessing.LabelEncoder[source]#

Bases: _PreprocessBase

Encode target labels with value between 0 and n_classes-1.

The same as sklearn.preprocessing.LabelEncoder, except that the input/output is a federated dataframe.

_encoder#: the sklearn LabelEncoder instance.

Examples

>>> from secretflow.preprocessing import LabelEncoder
>>> le = LabelEncoder()
>>> le.fit(df)
>>> le.transform(df)
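As a rough illustration of what "value between 0 and n_classes-1" means, the following plain-Python sketch encodes a list of labels the way sklearn's LabelEncoder does (classes sorted, then numbered). It does not use secretflow or federated dataframes, and `fit_label_mapping` is a hypothetical helper name.

```python
def fit_label_mapping(labels):
    """Assign each distinct label an integer index, in sorted class order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


labels = ["paris", "tokyo", "amsterdam", "tokyo"]
mapping = fit_label_mapping(labels)
encoded = [mapping[label] for label in labels]
print(mapping)  # {'amsterdam': 0, 'paris': 1, 'tokyo': 2}
print(encoded)  # [1, 2, 0, 2]
```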

Methods:

`fit`(df)	Fit label encoder.
`transform`(df)	Transform labels to normalized encoding.
`fit_transform`(df)	Fit label encoder and return encoded labels.
`get_params`()

fit(df: Union[HDataFrame, VDataFrame, MixDataFrame])[source]#: Fit label encoder.

transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame][source]#: Transform labels to normalized encoding.

fit_transform(df: Union[HDataFrame, VDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame][source]#: Fit label encoder and return encoded labels.

get_params() → Dict[str, Any][source]#

class secretflow.preprocessing.OneHotEncoder(min_frequency=None, max_categories=None)[source]#

Bases: _PreprocessBase

Encode categorical features as a one-hot numeric array.

The same as sklearn.preprocessing.OneHotEncoder, except that the input/output is a federated dataframe.

Note: min_frequency and max_categories are calculated per partition, so they are currently only available for vertical scenarios.

Parameters:

min_frequency – int or float, default=None. Specifies the minimum frequency below which a category will be considered infrequent.
- If int, categories with a smaller cardinality will be considered infrequent.
- If float, categories with a smaller cardinality than min_frequency * n_samples will be considered infrequent.
max_categories – int, default=None. Specifies an upper limit to the number of output features for each input feature when considering infrequent categories. If there are infrequent categories, max_categories includes the category representing the infrequent categories along with the frequent categories. If None, there is no limit to the number of output features.
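The int and float forms of min_frequency described above can be sketched in plain Python as follows. This is only an illustration of the threshold rule on an ordinary list; `infrequent_categories` is a hypothetical helper, not a secretflow or sklearn function.

```python
from collections import Counter


def infrequent_categories(values, min_frequency):
    """Return the categories whose count falls below the min_frequency threshold."""
    counts = Counter(values)
    if isinstance(min_frequency, float):
        # A float is interpreted as a fraction of the number of samples.
        threshold = min_frequency * len(values)
    else:
        # An int is interpreted as an absolute count.
        threshold = min_frequency
    return sorted(cat for cat, n in counts.items() if n < threshold)


values = ["a"] * 5 + ["b"] * 3 + ["c", "d"]      # 10 samples in total
by_count = infrequent_categories(values, 2)       # threshold: 2 occurrences
by_fraction = infrequent_categories(values, 0.4)  # threshold: 0.4 * 10 = 4 occurrences
print(by_count)     # ['c', 'd']
print(by_fraction)  # ['b', 'c', 'd']
```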

_encoder#: the sklearn OneHotEncoder instance.

Examples

>>> from secretflow.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit(df)
>>> enc.transform(df)

Methods:

`__init__`([min_frequency, max_categories])
`fit`(df)	Fit this encoder with X.
`transform`(df)	Transform X using one-hot encoding.
`fit_transform`(df)	Fit this OneHotEncoder with X, then transform X.
`get_params`()

__init__(min_frequency=None, max_categories=None)[source]#

fit(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame][source]#: Fit this encoder with X.

transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame][source]#: Transform X using one-hot encoding.

fit_transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame][source]#: Fit this OneHotEncoder with X, then transform X.

get_params() → Dict[str, Any][source]#

class secretflow.preprocessing.MinMaxScaler[source]#

Bases: _PreprocessBase

Transform features by scaling each feature to a given range.

_scaler#: the sklearn MinMaxScaler instance.

Examples

>>> from secretflow.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> scaler.fit(df)
>>> scaler.transform(df)
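For reference, the arithmetic behind min-max scaling can be sketched on a plain list as shown here. The `min_max_scale` helper and its `feature_range` argument are assumptions for illustration, following the usual sklearn definition; the constructor documented on this page takes no arguments.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map values so that min(values) and max(values) hit the range bounds."""
    lo, hi = min(values), max(values)
    out_lo, out_hi = feature_range
    span = hi - lo
    if span == 0:
        # A constant column carries no spread; map it to the lower bound.
        return [out_lo for _ in values]
    return [(v - lo) / span * (out_hi - out_lo) + out_lo for v in values]


scaled = min_max_scale([1.0, 2.0, 3.0, 5.0])
symmetric = min_max_scale([1.0, 2.0, 3.0, 5.0], feature_range=(-1.0, 1.0))
print(scaled)     # [0.0, 0.25, 0.5, 1.0]
print(symmetric)  # [-1.0, -0.5, 0.0, 1.0]
```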

Methods:

`fit`(df)	Compute the minimum and maximum for later scaling.
`transform`(df)	Scale features of X according to feature_range.
`fit_transform`(df)	Fit to X, then transform X.
`get_params`()

fit(df: Union[HDataFrame, VDataFrame, MixDataFrame])[source]#: Compute the minimum and maximum for later scaling.

transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame][source]#: Scale features of X according to feature_range.

fit_transform(df: Union[HDataFrame, VDataFrame])[source]#: Fit to X, then transform X.

get_params() → Dict[str, Any][source]#

class secretflow.preprocessing.StandardScaler(with_mean=True, with_std=True)[source]#

Bases: _PreprocessBase

Standardize features by removing the mean and scaling to unit variance.

StandardScaler is similar to sklearn.preprocessing.StandardScaler. The main differences are that it a) takes HDataFrame/VDataFrame/MixDataFrame as input/output, and b) does not support sparse matrices.

The standard score of a sample x is calculated as:

z = (x - u) / s

where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
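The formula above can be checked with a small plain-Python sketch. It assumes the population standard deviation (as sklearn's StandardScaler uses) and operates on an ordinary list; `standardize` is a hypothetical helper, not the secretflow implementation.

```python
import statistics


def standardize(values, with_mean=True, with_std=True):
    """Compute z = (x - u) / s for every x, honoring with_mean and with_std."""
    u = statistics.fmean(values) if with_mean else 0.0
    s = statistics.pstdev(values) if with_std else 1.0
    if s == 0:
        # Avoid dividing by zero for constant columns.
        s = 1.0
    return [(x - u) / s for x in values]


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, standard deviation 2
print(standardize(values))                   # centered and scaled
print(standardize(values, with_mean=False))  # scaled only (u = 0)
print(standardize(values, with_std=False))   # centered only (s = 1)
```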

_scaler#: the sklearn StandardScaler instance.

_with_mean#: bool, default=True. If True, center the data before scaling.

_with_std#: bool, default=True. If True, scale the data to unit variance (or equivalently, unit standard deviation).

Examples

>>> from secretflow.preprocessing import StandardScaler
>>> data = HDataFrame(...) # your HDataFrame/VDataFrame/MixDataFrame instance.
>>> scaler = StandardScaler()
>>> scaler.fit(data)
>>> print(scaler._scaler.mean_, scaler._scaler.var_)
>>> scaler.transform(data)

Methods:

`__init__`([with_mean, with_std])	param with_mean: optional; same as sklearn StandardScaler.
`fit`(df[, aggregator])	Fit a federated dataframe.
`transform`(df)	Transform a federated dataframe.
`fit_transform`(df[, aggregator])	A convenience combine of fit and transform.
`get_params`()

__init__(with_mean=True, with_std=True) → None[source]#

Parameters:

with_mean – optional; same as sklearn StandardScaler.
with_std – optional; same as sklearn StandardScaler.

fit(df: Union[HDataFrame, VDataFrame, MixDataFrame], aggregator: Optional[Aggregator] = None)[source]#

Fit a federated dataframe.

Parameters:

df – the X to fit.
aggregator – optional; the aggregator used to compute the global mean and standard deviation. Must be provided if X is a horizontally partitioned MixDataFrame.
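To see why an aggregator is needed for horizontally partitioned data, consider that each party holds only some of the rows, so global statistics have to be combined from per-party summaries. The sketch below shows one common way to do that with counts, sums, and sums of squares. It is an assumption for illustration only: this page does not describe the protocol secretflow actually uses, and `local_stats` and `combine` are hypothetical names.

```python
import math


def local_stats(values):
    """Summary a single party could compute on its own rows."""
    return len(values), sum(values), sum(v * v for v in values)


def combine(stats):
    """Combine per-party (count, sum, sum of squares) into a global mean and std."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / n
    # Population variance via E[x^2] - (E[x])^2, clamped against rounding error.
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Three parties, each holding different rows of the same column.
parties = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
global_mean, global_std = combine([local_stats(p) for p in parties])
print(global_mean, global_std)
```

The combined result matches what would be obtained by pooling all rows, without any party having to share its raw values in this simplified picture.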

transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame][source]#

Transform a federated dataframe.

Parameters: df – the X to transform.
Returns: a federated dataframe corresponding to the input X.

fit_transform(df: Union[HDataFrame, VDataFrame], aggregator: Optional[Aggregator] = None)[source]#: A convenience combination of fit and transform.

get_params() → Dict[str, Any][source]#

class secretflow.preprocessing.LogroundTransformer(decimals: int = 6, bias: float = 0.5)[source]#

Bases: _FunctionTransformer

Constructs a transformer for calculating round(log2(x + bias)) of (partition of) dataframe.

Parameters:

decimals – Number of decimal places to round each column to. Defaults to 6.
bias – Add bias to value before log2. Defaults to 0.5.
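As a quick illustration of the formula, the sketch below applies round(log2(x + bias)) to a plain list, interpreting decimals as the number of decimal places passed to the rounding step. That interpretation and the `loground` helper name are assumptions; the sketch does not call secretflow.

```python
import math


def loground(values, decimals=6, bias=0.5):
    """Compute round(log2(x + bias), decimals) for every x."""
    return [round(math.log2(v + bias), decimals) for v in values]


result = loground([0.5, 3.5, 7.5])       # x + bias is 1, 4, 8
coarse = loground([1.0], decimals=2)     # log2(1.5) rounded to two decimals
print(result)  # [0.0, 2.0, 3.0]
print(coarse)  # [0.58]
```

The default bias of 0.5 keeps the argument of log2 positive when x is 0.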

Methods:

`__init__`([decimals, bias])
`get_params`()

__init__(decimals: int = 6, bias: float = 0.5)[source]#

get_params() → Dict[str, Any][source]#

secretflow.preprocessing.binning