secretflow.data.horizontal#
Classes:
HDataFrame | Federated dataframe holds horizontal partitioned data.
Functions:
read_csv | Read a comma-separated values (csv) file into HDataFrame.
to_csv | Write object to a comma-separated values (csv) file.
- class secretflow.data.horizontal.HDataFrame(partitions: Dict[PYU, Partition] = <factory>, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) [source]#
Federated dataframe holds horizontal partitioned data.
This dataframe is designed to provide a federated counterpart of the pandas DataFrame that is used in the same way as pandas. The original data stays with its holder and is not transmitted out of the domain during the execution of any method.
Some methods need global statistics; for example, the global maximum is needed when calling the max method. An aggregator or a comparator is expected for the global sum or the global extreme value, respectively.
- partitions#
a dict mapping each PYU to its partition.
- aggregator#
the aggregator for computing global values such as the mean.
- comparator#
the comparator for computing global values such as the maximum/minimum.
Example
>>> from secretflow.data.horizontal import read_csv
>>> from secretflow.security.aggregation import PlainAggregator
>>> from secretflow.security.compare import PlainComparator
>>> from secretflow import PYU
>>> alice = PYU('alice')
>>> bob = PYU('bob')
>>> h_df = read_csv({alice: 'alice.csv', bob: 'bob.csv'},
...                 aggregator=PlainAggregator(alice),
...                 comparator=PlainComparator(alice))
>>> h_df.columns
Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'], dtype='object')
>>> h_df.mean(numeric_only=True)
sepal_length    5.827693
sepal_width     3.054000
petal_length    3.730000
petal_width     1.198667
dtype: float64
>>> h_df.min(numeric_only=True)
sepal_length    4.3
sepal_width     2.0
petal_length    1.0
petal_width     0.1
dtype: float64
>>> h_df.max(numeric_only=True)
sepal_length    7.9
sepal_width     4.4
petal_length    6.9
petal_width     2.5
dtype: float64
>>> h_df.count()
sepal_length    130
sepal_width     150
petal_length    120
petal_width     150
class           150
dtype: int64
>>> h_df.fillna({'sepal_length': 2})
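The idea that global statistics are derived from per-party results can be sketched in plain Python. This is only an illustration under simplifying assumptions, not SecretFlow's implementation: the lists stand in for local partitions, the helper `local_stats` is hypothetical, and everything is computed in plaintext, whereas a secure aggregator or comparator would also protect the per-party intermediate values.

```python
# Each list stands in for one party's local partition of a single column.
alice_sepal_length = [5.1, 4.9, 4.7]
bob_sepal_length = [6.3, 5.8, 7.1, 6.5]
partitions = {"alice": alice_sepal_length, "bob": bob_sepal_length}


def local_stats(values):
    """Hypothetical helper: statistics a party computes on its own rows."""
    return {
        "sum": sum(values),
        "count": len(values),
        "min": min(values),
        "max": max(values),
    }


stats = {party: local_stats(values) for party, values in partitions.items()}

# Aggregator role: combine per-party sums and counts into a global mean.
global_sum = sum(s["sum"] for s in stats.values())
global_count = sum(s["count"] for s in stats.values())
global_mean = global_sum / global_count

# Comparator role: the global extremes are the extremes of the local extremes.
global_min = min(s["min"] for s in stats.values())
global_max = max(s["max"] for s in stats.values())

print(global_mean, global_min, global_max)
```

Only the four summary numbers per party take part in the combination step; the individual rows never leave their list.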
Attributes:
values | Return a federated NumPy representation of the DataFrame.
dtypes | Return the dtypes in the DataFrame.
columns | The column labels of the DataFrame.
shape | Return a tuple representing the dimensionality of the DataFrame.
Methods:
mean(*args, **kwargs) | Return the mean of the values over the requested axis.
min(*args, **kwargs) | Return the min of the values over the requested axis.
max(*args, **kwargs) | Return the max of the values over the requested axis.
sum(*args, **kwargs) | Return the sum of the values over the requested axis.
count(*args, **kwargs) | Count non-NA cells for each column or row.
isna() | Detects missing values for an array-like object. Same as pandas.DataFrame.isna. Returns a mask of bool values for each element in the DataFrame that indicates whether the element is an NA value.
quantile([q, axis]) |
kurtosis(*args, **kwargs) |
skew(*args, **kwargs) |
sem(*args, **kwargs) |
std(*args, **kwargs) |
var(*args, **kwargs) |
replace(*args, **kwargs) |
mode(*args, **kwargs) |
astype(dtype[, copy, errors]) | Cast object to the specified dtype.
| Return shapes of each partition.
copy() | Shallow copy of this dataframe.
drop([labels, axis, index, columns, level, ...]) | Drop specified labels from rows or columns.
fillna([value, method, axis, inplace, ...]) | Fill NA/NaN values using the specified method.
to_csv(fileuris, **kwargs) | Write object to a comma-separated values (csv) file.
__init__([partitions, aggregator, comparator]) |
- aggregator: Aggregator = None#
- comparator: Comparator = None#
- mean(*args, **kwargs) -> Series [source]#
Return the mean of the values over the requested axis.
All arguments are the same as pandas.DataFrame.mean().
- Returns:
pd.Series
- min(*args, **kwargs) -> Series [source]#
Return the min of the values over the requested axis.
All arguments are the same as pandas.DataFrame.min().
- Returns:
pd.Series
- max(*args, **kwargs) -> Series [source]#
Return the max of the values over the requested axis.
All arguments are the same as pandas.DataFrame.max().
- Returns:
pd.Series
- sum(*args, **kwargs) -> Series [source]#
Return the sum of the values over the requested axis.
All arguments are the same as pandas.DataFrame.sum().
- Returns:
pd.Series
- count(*args, **kwargs) -> Series [source]#
Count non-NA cells for each column or row.
All arguments are the same as pandas.DataFrame.count().
- Returns:
pd.Series
- isna() -> HDataFrame [source]#
Detects missing values for an array-like object. Same as pandas.DataFrame.isna.
- Returns:
HDataFrame: mask of bool values for each element in the DataFrame, indicating whether the element is an NA value.
- Reference:
pd.DataFrame.isna
- property values: FedNdarray#
Return a federated NumPy representation of the DataFrame.
- Returns:
FedNdarray.
- property dtypes: Series#
Return the dtypes in the DataFrame.
- Returns:
the data type of each column.
- Return type:
pd.Series
- astype(dtype, copy: bool = True, errors: str = 'raise') [source]#
Cast object to the specified dtype.
All arguments are the same as pandas.DataFrame.astype().
- property columns#
The column labels of the DataFrame.
- property shape#
Return a tuple representing the dimensionality of the DataFrame.
- copy() -> HDataFrame [source]#
Shallow copy of this dataframe.
- Returns:
HDataFrame.
- drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise') -> Optional[HDataFrame] [source]#
Drop specified labels from rows or columns.
All arguments are the same as pandas.DataFrame.drop().
- Returns:
HDataFrame without the removed index or column labels, or None if inplace=True.
- fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None) -> Optional[HDataFrame] [source]#
Fill NA/NaN values using the specified method.
All arguments are the same as pandas.DataFrame.fillna().
- Returns:
HDataFrame with missing values filled, or None if inplace=True.
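How count and fillna interact on horizontally partitioned data can be illustrated with a small plain-Python sketch. It is an assumption-laden stand-in rather than the library's code: `None` marks a missing value, the lists play the role of per-party partitions, and the helpers `count_non_na` and `fill_missing` are hypothetical names.

```python
# None marks a missing value; each list is one party's local column.
partitions = {
    "alice": [5.1, None, 4.7],
    "bob": [None, 5.8, 7.1, 6.5],
}


def count_non_na(parts):
    """Global non-NA count: the sum of each partition's local count."""
    return sum(
        sum(value is not None for value in values) for values in parts.values()
    )


def fill_missing(parts, fill_value):
    """Return new partitions with missing entries replaced, party by party."""
    return {
        party: [fill_value if value is None else value for value in values]
        for party, values in parts.items()
    }


before = count_non_na(partitions)
filled = fill_missing(partitions, 2)
after = count_non_na(filled)
print(before, after)
```

Filling happens independently inside each partition, which is why no cross-party communication is needed for it; only the count has to be combined across parties.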
- to_csv(fileuris: Dict[PYU, str], **kwargs) [source]#
Write object to a comma-separated values (csv) file.
- Parameters:
fileuris – a dict of file URIs specifying the file for each PYU.
kwargs – other arguments are the same as pandas.DataFrame.to_csv().
- Returns:
A list of PYUObjects whose value is None. You can use secretflow.wait to wait for the save to complete.
- __init__(partitions: Dict[PYU, Partition] = <factory>, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) -> None#
- secretflow.data.horizontal.read_csv(filepath: Dict[PYU, str], aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None, **kwargs) -> HDataFrame [source]#
Read a comma-separated values (csv) file into HDataFrame.
- Parameters:
filepath – a dict {PYU: file path}.
aggregator – optional; the aggregator assigned to the dataframe.
comparator – optional; the comparator assigned to the dataframe.
kwargs – all other arguments are the same as pandas.read_csv().
- Returns:
HDataFrame
Example
>>> read_csv({PYU('alice'): 'alice.csv', PYU('bob'): 'bob.csv'})
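What "horizontally partitioned" means for the input files can be sketched with the standard library alone: every party holds different rows under the same columns. The snippet below is a rough illustration, not how read_csv is implemented; the in-memory strings stand in for 'alice.csv' and 'bob.csv', their contents are made up, and `read_partitions` is a hypothetical helper.

```python
import csv
import io

# In-memory stand-ins for each party's CSV file (hypothetical contents).
files = {
    "alice": "sepal_length,class\n5.1,setosa\n4.9,setosa\n",
    "bob": "sepal_length,class\n6.3,virginica\n",
}


def read_partitions(sources):
    """Parse each party's CSV and check that all parties share one header."""
    partitions, header = {}, None
    for party, text in sources.items():
        reader = csv.DictReader(io.StringIO(text))
        rows = list(reader)
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError(f"column mismatch for {party}")
        partitions[party] = rows
    return header, partitions


columns, partitions = read_partitions(files)
total_rows = sum(len(rows) for rows in partitions.values())
print(columns, total_rows)
```

The header check reflects the assumption that a horizontal split shares one schema across parties, while the row counts may differ freely.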
- secretflow.data.horizontal.to_csv(df: HDataFrame, file_uris: Dict[PYU, str], **kwargs) [source]#
Write object to a comma-separated values (csv) file.
- Parameters:
df – the HDataFrame to save.
file_uris – the file path for each PYU.
kwargs – all other arguments are the same as pandas.DataFrame.to_csv().
secretflow.data.horizontal.dataframe#
Classes:
HDataFrame | Federated dataframe holds horizontal partitioned data.
- class secretflow.data.horizontal.dataframe.HDataFrame(partitions: Dict[PYU, Partition] = <factory>, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) [source]#
Federated dataframe holds horizontal partitioned data.
This dataframe is designed to provide a federated counterpart of the pandas DataFrame that is used in the same way as pandas. The original data stays with its holder and is not transmitted out of the domain during the execution of any method.
Some methods need global statistics; for example, the global maximum is needed when calling the max method. An aggregator or a comparator is expected for the global sum or the global extreme value, respectively.
- partitions#
a dict mapping each PYU to its partition.
- aggregator#
the aggregator for computing global values such as the mean.
- comparator#
the comparator for computing global values such as the maximum/minimum.
Example
>>> from secretflow.data.horizontal import read_csv
>>> from secretflow.security.aggregation import PlainAggregator
>>> from secretflow.security.compare import PlainComparator
>>> from secretflow import PYU
>>> alice = PYU('alice')
>>> bob = PYU('bob')
>>> h_df = read_csv({alice: 'alice.csv', bob: 'bob.csv'},
...                 aggregator=PlainAggregator(alice),
...                 comparator=PlainComparator(alice))
>>> h_df.columns
Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'], dtype='object')
>>> h_df.mean(numeric_only=True)
sepal_length    5.827693
sepal_width     3.054000
petal_length    3.730000
petal_width     1.198667
dtype: float64
>>> h_df.min(numeric_only=True)
sepal_length    4.3
sepal_width     2.0
petal_length    1.0
petal_width     0.1
dtype: float64
>>> h_df.max(numeric_only=True)
sepal_length    7.9
sepal_width     4.4
petal_length    6.9
petal_width     2.5
dtype: float64
>>> h_df.count()
sepal_length    130
sepal_width     150
petal_length    120
petal_width     150
class           150
dtype: int64
>>> h_df.fillna({'sepal_length': 2})
Attributes:
values | Return a federated NumPy representation of the DataFrame.
dtypes | Return the dtypes in the DataFrame.
columns | The column labels of the DataFrame.
shape | Return a tuple representing the dimensionality of the DataFrame.
Methods:
mean(*args, **kwargs) | Return the mean of the values over the requested axis.
min(*args, **kwargs) | Return the min of the values over the requested axis.
max(*args, **kwargs) | Return the max of the values over the requested axis.
sum(*args, **kwargs) | Return the sum of the values over the requested axis.
count(*args, **kwargs) | Count non-NA cells for each column or row.
isna() | Detects missing values for an array-like object. Same as pandas.DataFrame.isna. Returns a mask of bool values for each element in the DataFrame that indicates whether the element is an NA value.
quantile([q, axis]) |
kurtosis(*args, **kwargs) |
skew(*args, **kwargs) |
sem(*args, **kwargs) |
std(*args, **kwargs) |
var(*args, **kwargs) |
replace(*args, **kwargs) |
mode(*args, **kwargs) |
astype(dtype[, copy, errors]) | Cast object to the specified dtype.
| Return shapes of each partition.
copy() | Shallow copy of this dataframe.
drop([labels, axis, index, columns, level, ...]) | Drop specified labels from rows or columns.
fillna([value, method, axis, inplace, ...]) | Fill NA/NaN values using the specified method.
to_csv(fileuris, **kwargs) | Write object to a comma-separated values (csv) file.
__init__([partitions, aggregator, comparator]) |
- aggregator: Aggregator = None#
- comparator: Comparator = None#
- mean(*args, **kwargs) -> Series [source]#
Return the mean of the values over the requested axis.
All arguments are the same as pandas.DataFrame.mean().
- Returns:
pd.Series
- min(*args, **kwargs) -> Series [source]#
Return the min of the values over the requested axis.
All arguments are the same as pandas.DataFrame.min().
- Returns:
pd.Series
- max(*args, **kwargs) -> Series [source]#
Return the max of the values over the requested axis.
All arguments are the same as pandas.DataFrame.max().
- Returns:
pd.Series
- sum(*args, **kwargs) -> Series [source]#
Return the sum of the values over the requested axis.
All arguments are the same as pandas.DataFrame.sum().
- Returns:
pd.Series
- count(*args, **kwargs) -> Series [source]#
Count non-NA cells for each column or row.
All arguments are the same as pandas.DataFrame.count().
- Returns:
pd.Series
- isna() -> HDataFrame [source]#
Detects missing values for an array-like object. Same as pandas.DataFrame.isna.
- Returns:
HDataFrame: mask of bool values for each element in the DataFrame, indicating whether the element is an NA value.
- Reference:
pd.DataFrame.isna
- property values: FedNdarray#
Return a federated NumPy representation of the DataFrame.
- Returns:
FedNdarray.
- property dtypes: Series#
Return the dtypes in the DataFrame.
- Returns:
the data type of each column.
- Return type:
pd.Series
- astype(dtype, copy: bool = True, errors: str = 'raise') [source]#
Cast object to the specified dtype.
All arguments are the same as pandas.DataFrame.astype().
- property columns#
The column labels of the DataFrame.
- property shape#
Return a tuple representing the dimensionality of the DataFrame.
- copy() -> HDataFrame [source]#
Shallow copy of this dataframe.
- Returns:
HDataFrame.
- drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise') -> Optional[HDataFrame] [source]#
Drop specified labels from rows or columns.
All arguments are the same as pandas.DataFrame.drop().
- Returns:
HDataFrame without the removed index or column labels, or None if inplace=True.
- fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None) -> Optional[HDataFrame] [source]#
Fill NA/NaN values using the specified method.
All arguments are the same as pandas.DataFrame.fillna().
- Returns:
HDataFrame with missing values filled, or None if inplace=True.
- to_csv(fileuris: Dict[PYU, str], **kwargs) [source]#
Write object to a comma-separated values (csv) file.
- Parameters:
fileuris – a dict of file URIs specifying the file for each PYU.
kwargs – other arguments are the same as pandas.DataFrame.to_csv().
- Returns:
A list of PYUObjects whose value is None. You can use secretflow.wait to wait for the save to complete.
- __init__(partitions: Dict[PYU, Partition] = <factory>, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) -> None#
secretflow.data.horizontal.io#
Functions:
read_csv | Read a comma-separated values (csv) file into HDataFrame.
to_csv | Write object to a comma-separated values (csv) file.
- secretflow.data.horizontal.io.read_csv(filepath: Dict[PYU, str], aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None, **kwargs) -> HDataFrame [source]#
Read a comma-separated values (csv) file into HDataFrame.
- Parameters:
filepath – a dict {PYU: file path}.
aggregator – optional; the aggregator assigned to the dataframe.
comparator – optional; the comparator assigned to the dataframe.
kwargs – all other arguments are the same as pandas.read_csv().
- Returns:
HDataFrame
Example
>>> read_csv({PYU('alice'): 'alice.csv', PYU('bob'): 'bob.csv'})
- secretflow.data.horizontal.io.to_csv(df: HDataFrame, file_uris: Dict[PYU, str], **kwargs) [source]#
Write object to a comma-separated values (csv) file.
- Parameters:
df – the HDataFrame to save.
file_uris – the file path for each PYU.
kwargs – all other arguments are the same as pandas.DataFrame.to_csv().
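The per-party destination mapping taken by to_csv can be pictured with a short standard-library sketch: each party's rows go to that party's own target and nowhere else. This is an illustrative assumption, not the library's code; `io.StringIO` buffers stand in for file paths, the rows are made up, and `write_partitions` is a hypothetical helper.

```python
import csv
import io

# Made-up rows held by each party.
partitions = {
    "alice": [{"sepal_length": "5.1", "class": "setosa"}],
    "bob": [{"sepal_length": "6.3", "class": "virginica"}],
}

# One destination per party; StringIO buffers stand in for file paths.
destinations = {party: io.StringIO() for party in partitions}


def write_partitions(parts, dests, columns):
    """Write every party's rows to that party's own destination."""
    for party, rows in parts.items():
        writer = csv.DictWriter(
            dests[party], fieldnames=columns, lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


write_partitions(partitions, destinations, ["sepal_length", "class"])
alice_csv = destinations["alice"].getvalue()
bob_csv = destinations["bob"].getvalue()
print(alice_csv)
```

Keeping one writer per party mirrors the requirement that data is saved where it already lives instead of being gathered in one place.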
secretflow.data.horizontal.sampler#
Classes:
PoissonDataSampler | Generates data with Poisson sampling.
- class secretflow.data.horizontal.sampler.PoissonDataSampler(x, y, s_w, sampling_rate, **kwargs) [source]#
Bases: Sequence
Generates data with Poisson sampling.
Methods:
__init__(x, y, s_w, sampling_rate, **kwargs) | Initialization
set_random_seed(random_seed) |
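Poisson sampling in general means that every example is included in a batch independently with a fixed probability, so the batch size itself is random. The following is a minimal sketch of that general technique, not PoissonDataSampler's actual code; the function `poisson_sample` and its `seed` parameter are hypothetical, and only `sampling_rate` corresponds to a name in the signature above.

```python
import random


def poisson_sample(n_items, sampling_rate, seed=None):
    """Keep each index independently with probability sampling_rate."""
    rng = random.Random(seed)
    return [i for i in range(n_items) if rng.random() < sampling_rate]


# The batch size varies from draw to draw; on average it is
# n_items * sampling_rate.
batch = poisson_sample(1000, 0.1, seed=42)
print(len(batch))
```

Fixing the seed makes a draw reproducible, which is presumably the purpose of a method such as set_random_seed, although its behavior is not documented here.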