secretflow.preprocessing
Classes:
KBinsDiscretizer | Bin continuous data into intervals. |
LabelEncoder | Encode target labels with value between 0 and n_classes-1. |
OneHotEncoder | Encode categorical features as a one-hot numeric array. |
MinMaxScaler | Transform features by scaling each feature to a given range. |
StandardScaler | Standardize features by removing the mean and scaling to unit variance. |
LogroundTransformer | Constructs a transformer for calculating round(log2(x + bias)) of (partition of) dataframe. |
- class secretflow.preprocessing.KBinsDiscretizer(n_bins=5, strategy: str = 'quantile')
Bases: _PreprocessBase
Bin continuous data into intervals.
This KBinsDiscretizer is almost the same as sklearn.preprocessing.KBinsDiscretizer, except that the input and output are federated dataframes.
- _discretizer
the sklearn.preprocessing.KBinsDiscretizer instance used.
- _n_bins
The number of bins to produce.
- _strategy
{'uniform', 'quantile'}. Note that 'kmeans' is not supported yet.
Methods:
__init__([n_bins, strategy]) |
fit(df[, aggregator, comparator, ...]) | Fit the estimator. |
transform(df) | Discretize the data. |
fit_transform(df[, aggregator, comparator, ...]) | Fit the estimator with X and then transform. |
- fit(df: Union[HDataFrame, VDataFrame, MixDataFrame], aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None, compress_thres: int = 10000, error: float = 10000.0, max_iter: int = 200) → KBinsDiscretizer
Fit the estimator.
- Parameters:
df – the X to fit.
aggregator – optional; shall be provided if df is a horizontally partitioned MixDataFrame.
comparator – optional; shall be provided if df is a horizontally partitioned MixDataFrame.
compress_thres – optional; the compress threshold of HomoBinning.
error – optional; the error of HomoBinning.
max_iter – optional; the max iterations of HomoBinning.
- Returns:
the instance itself.
- transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Discretize the data.
- Parameters:
df – the X to discretize.
- Returns:
the transformed X as a federated dataframe.
- fit_transform(df: Union[HDataFrame, VDataFrame, MixDataFrame], aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None, compress_thres: int = 10000, error: float = 10000.0, max_iter: int = 200)
Fit the estimator with X and then transform. A convenience method that combines fit and transform.
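The difference between the two supported strategies can be illustrated without any federated machinery. The following is a minimal plain-Python sketch, not SecretFlow's implementation: the helper names (uniform_edges, quantile_edges, assign_bins) are hypothetical, the quantile edges assume linear interpolation between order statistics, and the output is assumed to be an ordinal bin index.

```python
from bisect import bisect_right


def uniform_edges(values, n_bins):
    """Equal-width bin edges between min and max (the 'uniform' strategy)."""
    lo, hi = min(values), max(values)
    return [lo + (hi - lo) * i / n_bins for i in range(n_bins + 1)]


def quantile_edges(values, n_bins):
    """Edges at evenly spaced quantiles (the 'quantile' strategy).

    Assumption: linear interpolation between neighbouring order statistics.
    """
    xs = sorted(values)
    edges = []
    for i in range(n_bins + 1):
        pos = i / n_bins * (len(xs) - 1)
        lower = int(pos)
        upper = min(lower + 1, len(xs) - 1)
        edges.append(xs[lower] + (xs[upper] - xs[lower]) * (pos - lower))
    return edges


def assign_bins(values, edges):
    """Map each value to an ordinal bin index using the interior edges."""
    interior = edges[1:-1]
    return [bisect_right(interior, v) for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
uniform_bins = assign_bins(data, uniform_edges(data, 4))
quantile_bins = assign_bins(data, quantile_edges(data, 4))
print(uniform_bins)
print(quantile_bins)
```

With a single outlier (100) in the data, equal-width bins place almost every value in bin 0, while quantile bins spread the values roughly evenly across the four bins.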
- class secretflow.preprocessing.LabelEncoder
Bases: _PreprocessBase
Encode target labels with value between 0 and n_classes-1.
The same as sklearn.preprocessing.LabelEncoder, except that the input/output is a federated dataframe.
- _encoder
the sklearn LabelEncoder instance.
Example
>>> from secretflow.preprocessing import LabelEncoder
>>> le = LabelEncoder()
>>> le.fit(df)
>>> le.transform(df)
Methods:
fit(df) | Fit label encoder. |
transform(df) | Transform labels to normalized encoding. |
fit_transform(df) | Fit label encoder and return encoded labels. |
- fit(df: Union[HDataFrame, VDataFrame, MixDataFrame])
Fit label encoder.
- transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Transform labels to normalized encoding.
- fit_transform(df: Union[HDataFrame, VDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Fit label encoder and return encoded labels.
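As an illustration of what "value between 0 and n_classes-1" means, here is a small plain-Python sketch of label encoding. It is not SecretFlow's code; the function names are hypothetical, and the sorted class order is an assumption taken from how sklearn's LabelEncoder orders its classes.

```python
def fit_label_encoder(labels):
    """Return the sorted distinct labels; a label's position is its code."""
    return sorted(set(labels))


def transform_labels(classes, labels):
    """Replace each label by its index in `classes`."""
    index = {label: code for code, label in enumerate(classes)}
    return [index[label] for label in labels]


labels = ["paris", "tokyo", "amsterdam", "tokyo"]
classes = fit_label_encoder(labels)
encoded = transform_labels(classes, labels)
print(classes)
print(encoded)
```

Three distinct labels yield the codes 0, 1 and 2, and repeated labels share a code.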
- class secretflow.preprocessing.OneHotEncoder(min_frequency=None, max_categories=None)
Bases: _PreprocessBase
Encode categorical features as a one-hot numeric array.
The same as sklearn.preprocessing.OneHotEncoder, except that the input/output is a federated dataframe.
Note: min_frequency and max_categories are calculated per partition, so they are currently only available for vertical scenarios.
- Parameters:
min_frequency – int or float, default=None. Specifies the minimum frequency below which a category will be considered infrequent. If int, categories with a smaller cardinality will be considered infrequent. If float, categories with a smaller cardinality than min_frequency * n_samples will be considered infrequent.
max_categories – int, default=None. Specifies an upper limit to the number of output features for each input feature when considering infrequent categories. If there are infrequent categories, max_categories includes the category representing the infrequent categories along with the frequent categories. If None, there is no limit to the number of output features.
- _encoder
the sklearn OneHotEncoder instance.
Example
>>> from secretflow.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit(df)
>>> enc.transform(df)
Methods:
__init__([min_frequency, max_categories]) |
fit(df) | Fit this encoder with X. |
transform(df) | Transform X using one-hot encoding. |
fit_transform(df) | Fit this OneHotEncoder with X, then transform X. |
- fit(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Fit this encoder with X.
- transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Transform X using one-hot encoding.
- fit_transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Fit this OneHotEncoder with X, then transform X.
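The int and float forms of min_frequency described above can be made concrete with a short plain-Python sketch. This is only an illustration on a single local column, not the federated implementation; the function names and the "infrequent" column label are hypothetical.

```python
from collections import Counter


def infrequent_categories(values, min_frequency):
    """Return the categories that count as infrequent.

    An int threshold is an absolute count; a float threshold is a
    fraction of the number of samples.
    """
    counts = Counter(values)
    if isinstance(min_frequency, float):
        threshold = min_frequency * len(values)
    else:
        threshold = min_frequency
    return {cat for cat, count in counts.items() if count < threshold}


def one_hot(values, min_frequency=None):
    """One-hot encode, merging infrequent categories into one column."""
    rare = set() if min_frequency is None else infrequent_categories(values, min_frequency)
    columns = sorted(set(values) - rare)
    if rare:
        columns.append("infrequent")  # hypothetical column label
    rows = []
    for v in values:
        key = "infrequent" if v in rare else v
        rows.append([1 if key == col else 0 for col in columns])
    return columns, rows


values = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
print(infrequent_categories(values, 2))    # absolute count threshold
print(infrequent_categories(values, 0.4))  # 0.4 * 10 samples = 4
columns, rows = one_hot(values, min_frequency=2)
print(columns)
```

Because each party would compute these counts on its own partition, the thresholds are only meaningful when a column lives entirely with one party, which matches the note that the options apply to vertical scenarios.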
- class secretflow.preprocessing.MinMaxScaler
Bases: _PreprocessBase
Transform features by scaling each feature to a given range.
- _scaler
the sklearn MinMaxScaler instance.
Example
>>> from secretflow.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> scaler.fit(df)
>>> scaler.transform(df)
Methods:
fit(df) | Compute the minimum and maximum for later scaling. |
transform(df) | Scale features of X according to feature_range. |
fit_transform(df) | Fit to X, then transform X. |
- fit(df: Union[HDataFrame, VDataFrame, MixDataFrame])
Compute the minimum and maximum for later scaling.
- transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Scale features of X according to feature_range.
- fit_transform(df: Union[HDataFrame, VDataFrame])
Fit to X, then transform X.
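The scaling itself is simple arithmetic, sketched below in plain Python for one column. This is an illustration rather than SecretFlow's code: min_max_scale is a hypothetical helper, its feature_range argument is modelled on sklearn's parameter of the same name, and the constant-column behaviour is an assumption. The class signature above takes no arguments, so the default range (0, 1) is the relevant case.

```python
def min_max_scale(column, feature_range=(0.0, 1.0)):
    """Linearly map a column so its min and max hit the range bounds."""
    low, high = feature_range
    col_min, col_max = min(column), max(column)
    span = col_max - col_min
    if span == 0:
        # Assumption: a constant column maps to the lower bound.
        return [low for _ in column]
    return [low + (x - col_min) / span * (high - low) for x in column]


scaled = min_max_scale([2, 4, 6, 10])
print(scaled)
print(min_max_scale([2, 4, 6, 10], feature_range=(-1.0, 1.0)))
```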
- class secretflow.preprocessing.StandardScaler(with_mean=True, with_std=True)
Bases: _PreprocessBase
Standardize features by removing the mean and scaling to unit variance.
StandardScaler is similar to sklearn.preprocessing.StandardScaler. The main differences are that it a) takes HDataFrame/VDataFrame/MixDataFrame as input/output and b) does not support sparse matrices.
The standard score of a sample x is calculated as:
z = (x - u) / s
where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
- _scaler
the sklearn StandardScaler instance.
- _with_mean
bool, default=True. If True, center the data before scaling.
- _with_std
bool, default=True. If True, scale the data to unit variance (or equivalently, unit standard deviation).
Example
>>> from secretflow.preprocessing import StandardScaler
>>> data = HDataFrame(...)  # your HDataFrame/VDataFrame/MixDataFrame instance.
>>> scaler = StandardScaler()
>>> scaler.fit(data)
>>> print(scaler._scaler.mean_, scaler._scaler.var_)
>>> scaler.transform(data)
Methods:
__init__([with_mean, with_std]) | Parameters are the same as in sklearn StandardScaler. |
fit(df[, aggregator]) | Fit a federated dataframe. |
transform(df) | Transform a federated dataframe. |
fit_transform(df[, aggregator]) | A convenience combination of fit and transform. |
- __init__(with_mean=True, with_std=True) → None
- Parameters:
with_mean – optional; same as sklearn StandardScaler.
with_std – optional; same as sklearn StandardScaler.
- fit(df: Union[HDataFrame, VDataFrame, MixDataFrame], aggregator: Optional[Aggregator] = None)
Fit a federated dataframe.
- Parameters:
df – the X to fit.
aggregator – optional; the aggregator used to compute the global mean and standard deviation. Shall be provided if X is a horizontally partitioned MixDataFrame.
- transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Transform a federated dataframe.
- Parameters:
df – the X to transform.
- Returns:
a federated dataframe corresponding to the input X.
- fit_transform(df: Union[HDataFrame, VDataFrame], aggregator: Optional[Aggregator] = None)
A convenience combination of fit and transform.
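The standard score formula z = (x - u) / s and the effect of the with_mean and with_std flags can be checked with a few lines of plain Python. This is a sketch on a single local column, not the federated implementation; standardize is a hypothetical helper, and the use of the population standard deviation (dividing by n) is an assumption carried over from sklearn's StandardScaler.

```python
import statistics


def standardize(column, with_mean=True, with_std=True):
    """Apply z = (x - u) / s to every value in the column."""
    u = statistics.fmean(column) if with_mean else 0.0
    # Assumption: population standard deviation (divide by n), as in sklearn.
    s = statistics.pstdev(column) if with_std else 1.0
    if s == 0:
        s = 1.0  # leave constant columns unscaled instead of dividing by zero
    return [(x - u) / s for x in column]


column = [1, 2, 3, 4, 5]
z = standardize(column)
print(z)
print(standardize(column, with_std=False))   # centering only
print(standardize(column, with_mean=False))  # scaling only
```

After standardization the column has mean 0 and unit population standard deviation; turning off one flag leaves the corresponding step out.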
- class secretflow.preprocessing.LogroundTransformer(decimals: int = 6, bias: float = 0.5)
Bases: _FunctionTransformer
Constructs a transformer that computes round(log2(x + bias)) on (a partition of) a dataframe.
- Parameters:
decimals – Number of decimal places to round each column to. Defaults to 6.
bias – Value added to x before taking log2. Defaults to 0.5.
Methods:
__init__([decimals, bias]) |
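The transformation round(log2(x + bias)) can be reproduced on plain numbers as follows. This is a minimal sketch, assuming that decimals is the number of decimal places passed to the rounding step; loground is a hypothetical helper, not the library function.

```python
import math


def loground(values, decimals=6, bias=0.5):
    """Compute round(log2(x + bias), decimals) for each value."""
    return [round(math.log2(x + bias), decimals) for x in values]


result = loground([0, 0.5, 1.5, 2.5, 7.5])
print(result)
```

The bias keeps the logarithm defined at x = 0; any input with x + bias <= 0 would still raise an error in this sketch.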
secretflow.preprocessing.base
secretflow.preprocessing.discretization
Classes:
KBinsDiscretizer | Bin continuous data into intervals. |
- class secretflow.preprocessing.discretization.KBinsDiscretizer(n_bins=5, strategy: str = 'quantile')
Bases: _PreprocessBase
Bin continuous data into intervals.
This KBinsDiscretizer is almost the same as sklearn.preprocessing.KBinsDiscretizer, except that the input and output are federated dataframes.
- _discretizer
the sklearn.preprocessing.KBinsDiscretizer instance used.
- _n_bins
The number of bins to produce.
- _strategy
{'uniform', 'quantile'}. Note that 'kmeans' is not supported yet.
Methods:
__init__([n_bins, strategy]) |
fit(df[, aggregator, comparator, ...]) | Fit the estimator. |
transform(df) | Discretize the data. |
fit_transform(df[, aggregator, comparator, ...]) | Fit the estimator with X and then transform. |
- fit(df: Union[HDataFrame, VDataFrame, MixDataFrame], aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None, compress_thres: int = 10000, error: float = 10000.0, max_iter: int = 200) → KBinsDiscretizer
Fit the estimator.
- Parameters:
df – the X to fit.
aggregator – optional; shall be provided if df is a horizontally partitioned MixDataFrame.
comparator – optional; shall be provided if df is a horizontally partitioned MixDataFrame.
compress_thres – optional; the compress threshold of HomoBinning.
error – optional; the error of HomoBinning.
max_iter – optional; the max iterations of HomoBinning.
- Returns:
the instance itself.
- transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Discretize the data.
- Parameters:
df – the X to discretize.
- Returns:
the transformed X as a federated dataframe.
- fit_transform(df: Union[HDataFrame, VDataFrame, MixDataFrame], aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None, compress_thres: int = 10000, error: float = 10000.0, max_iter: int = 200)
Fit the estimator with X and then transform. A convenience method that combines fit and transform.
secretflow.preprocessing.encoder
Classes:
LabelEncoder | Encode target labels with value between 0 and n_classes-1. |
OneHotEncoder | Encode categorical features as a one-hot numeric array. |
- class secretflow.preprocessing.encoder.LabelEncoder
Bases: _PreprocessBase
Encode target labels with value between 0 and n_classes-1.
The same as sklearn.preprocessing.LabelEncoder, except that the input/output is a federated dataframe.
- _encoder
the sklearn LabelEncoder instance.
Example
>>> from secretflow.preprocessing import LabelEncoder
>>> le = LabelEncoder()
>>> le.fit(df)
>>> le.transform(df)
Methods:
fit(df) | Fit label encoder. |
transform(df) | Transform labels to normalized encoding. |
fit_transform(df) | Fit label encoder and return encoded labels. |
- fit(df: Union[HDataFrame, VDataFrame, MixDataFrame])
Fit label encoder.
- transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Transform labels to normalized encoding.
- fit_transform(df: Union[HDataFrame, VDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Fit label encoder and return encoded labels.
- class secretflow.preprocessing.encoder.OneHotEncoder(min_frequency=None, max_categories=None)
Bases: _PreprocessBase
Encode categorical features as a one-hot numeric array.
The same as sklearn.preprocessing.OneHotEncoder, except that the input/output is a federated dataframe.
Note: min_frequency and max_categories are calculated per partition, so they are currently only available for vertical scenarios.
- Parameters:
min_frequency – int or float, default=None. Specifies the minimum frequency below which a category will be considered infrequent. If int, categories with a smaller cardinality will be considered infrequent. If float, categories with a smaller cardinality than min_frequency * n_samples will be considered infrequent.
max_categories – int, default=None. Specifies an upper limit to the number of output features for each input feature when considering infrequent categories. If there are infrequent categories, max_categories includes the category representing the infrequent categories along with the frequent categories. If None, there is no limit to the number of output features.
- _encoder
the sklearn OneHotEncoder instance.
Example
>>> from secretflow.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit(df)
>>> enc.transform(df)
Methods:
__init__([min_frequency, max_categories]) |
fit(df) | Fit this encoder with X. |
transform(df) | Transform X using one-hot encoding. |
fit_transform(df) | Fit this OneHotEncoder with X, then transform X. |
- fit(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Fit this encoder with X.
- transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Transform X using one-hot encoding.
- fit_transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Fit this OneHotEncoder with X, then transform X.
secretflow.preprocessing.scaler
Classes:
MinMaxScaler | Transform features by scaling each feature to a given range. |
StandardScaler | Standardize features by removing the mean and scaling to unit variance. |
- class secretflow.preprocessing.scaler.MinMaxScaler
Bases: _PreprocessBase
Transform features by scaling each feature to a given range.
- _scaler
the sklearn MinMaxScaler instance.
Example
>>> from secretflow.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> scaler.fit(df)
>>> scaler.transform(df)
Methods:
fit(df) | Compute the minimum and maximum for later scaling. |
transform(df) | Scale features of X according to feature_range. |
fit_transform(df) | Fit to X, then transform X. |
- fit(df: Union[HDataFrame, VDataFrame, MixDataFrame])
Compute the minimum and maximum for later scaling.
- transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Scale features of X according to feature_range.
- fit_transform(df: Union[HDataFrame, VDataFrame])
Fit to X, then transform X.
- class secretflow.preprocessing.scaler.StandardScaler(with_mean=True, with_std=True)
Bases: _PreprocessBase
Standardize features by removing the mean and scaling to unit variance.
StandardScaler is similar to sklearn.preprocessing.StandardScaler. The main differences are that it a) takes HDataFrame/VDataFrame/MixDataFrame as input/output and b) does not support sparse matrices.
The standard score of a sample x is calculated as:
z = (x - u) / s
where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
- _scaler
the sklearn StandardScaler instance.
- _with_mean
bool, default=True. If True, center the data before scaling.
- _with_std
bool, default=True. If True, scale the data to unit variance (or equivalently, unit standard deviation).
Example
>>> from secretflow.preprocessing import StandardScaler
>>> data = HDataFrame(...)  # your HDataFrame/VDataFrame/MixDataFrame instance.
>>> scaler = StandardScaler()
>>> scaler.fit(data)
>>> print(scaler._scaler.mean_, scaler._scaler.var_)
>>> scaler.transform(data)
Methods:
__init__([with_mean, with_std]) | Parameters are the same as in sklearn StandardScaler. |
fit(df[, aggregator]) | Fit a federated dataframe. |
transform(df) | Transform a federated dataframe. |
fit_transform(df[, aggregator]) | A convenience combination of fit and transform. |
- __init__(with_mean=True, with_std=True) → None
- Parameters:
with_mean – optional; same as sklearn StandardScaler.
with_std – optional; same as sklearn StandardScaler.
- fit(df: Union[HDataFrame, VDataFrame, MixDataFrame], aggregator: Optional[Aggregator] = None)
Fit a federated dataframe.
- Parameters:
df – the X to fit.
aggregator – optional; the aggregator used to compute the global mean and standard deviation. Shall be provided if X is a horizontally partitioned MixDataFrame.
- transform(df: Union[HDataFrame, VDataFrame, MixDataFrame]) → Union[HDataFrame, VDataFrame, MixDataFrame]
Transform a federated dataframe.
- Parameters:
df – the X to transform.
- Returns:
a federated dataframe corresponding to the input X.
- fit_transform(df: Union[HDataFrame, VDataFrame], aggregator: Optional[Aggregator] = None)
A convenience combination of fit and transform.
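Why a horizontally partitioned input needs an aggregator can be seen from the arithmetic: each party holds only some rows, so the global mean and standard deviation have to be assembled from per-party summaries. The sketch below shows one way this could work in plain Python. It is an assumption for illustration, not SecretFlow's actual protocol: the function names and the party names alice and bob are hypothetical, and a real Aggregator would combine the summaries securely rather than in the clear.

```python
import math


def local_stats(column):
    """Statistics one party can compute on its own rows."""
    return len(column), sum(column), sum(x * x for x in column)


def global_mean_std(per_party_stats):
    """Combine (count, sum, sum of squares) triples from all parties."""
    n = sum(s[0] for s in per_party_stats)
    total = sum(s[1] for s in per_party_stats)
    total_sq = sum(s[2] for s in per_party_stats)
    mean = total / n
    # Population variance: E[x^2] - (E[x])^2, clamped against rounding error.
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


alice_rows = [1.0, 2.0]       # rows held by a hypothetical party "alice"
bob_rows = [3.0, 4.0, 5.0]    # rows held by a hypothetical party "bob"
mean, std = global_mean_std([local_stats(alice_rows), local_stats(bob_rows)])
print(mean, std)
```

The result equals the mean and population standard deviation of the concatenated rows, although neither party ever sees the other's raw values in this scheme.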
secretflow.preprocessing.transformer
Classes:
LogroundTransformer | Constructs a transformer for calculating round(log2(x + bias)) of (partition of) dataframe. |
- class secretflow.preprocessing.transformer.LogroundTransformer(decimals: int = 6, bias: float = 0.5)
Bases: _FunctionTransformer
Constructs a transformer that computes round(log2(x + bias)) on (a partition of) a dataframe.
- Parameters:
decimals – Number of decimal places to round each column to. Defaults to 6.
bias – Value added to x before taking log2. Defaults to 0.5.
Methods:
__init__([decimals, bias]) |