secretflow.utils.simulation.data#
secretflow.utils.simulation.data.dataframe#
Functions:
| create_df | Create a federated dataframe from a single data source. |
| create_hdf | Create a HDataFrame from a single dataset source. |
| create_vdf | Create a VDataFrame from a single dataset source. |
- secretflow.utils.simulation.data.dataframe.create_df(source: Union[str, DataFrame, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], axis: int = 0, shuffle: bool = False, random_state: Optional[int] = None, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) -> Union[HDataFrame, VDataFrame] [source]#
Create a federated dataframe from a single data source.
- Parameters:
source – the dataset source: a file path, a pandas.DataFrame, or a callable that returns a pandas.DataFrame.
parts – the data partitions. If parts is a list of PYUs, the dataset is distributed as evenly as possible across them. If parts is a dict mapping each PYU to a value, each value shall be either (1) a float giving that party's proportion of the data, or (2) a tuple giving an index interval, closed on the left and open on the right.
axis – optional, 0 or 1. 0 splits by row and returns a horizontally partitioned federated DataFrame. 1 splits by column and returns a vertically partitioned federated DataFrame.
shuffle – optional, whether to shuffle the dataset before splitting.
random_state – optional, the random state used for shuffling.
aggregator – optional, shall be provided only when axis is 0. For details, refer to secretflow.data.horizontal.HDataFrame.
comparator – optional, shall be provided only when axis is 0. For details, refer to secretflow.data.horizontal.HDataFrame.
- Returns:
a HDataFrame if axis is 0, otherwise a VDataFrame.
- Return type:
Union[HDataFrame, VDataFrame]
Examples
>>> import pandas as pd
>>> from secretflow.utils.simulation.data.dataframe import create_df
>>> # alice and bob are PYU devices created beforehand.
>>> df = pd.DataFrame({'f1': [1, 2, 3, 4], 'f3': [11, 12, 13, 14]})
>>> # Create a HDataFrame evenly.
>>> hdf = create_df(df, [alice, bob], axis=0)
>>> # Create a VDataFrame with a given percentage.
>>> vdf = create_df(df, {alice: 0.3, bob: 0.7}, axis=1)
>>> # Create a HDataFrame with a given index.
>>> hdf = create_df(df, {alice: (0, 1), bob: (1, 4)})
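The three accepted forms of parts (a list, a dict of floats, a dict of intervals) can each be read as a set of half-open index ranges. The following is a minimal pure-Python sketch of that reading, not SecretFlow's implementation: the helper name resolve_parts is hypothetical, party names are plain strings standing in for PYUs, and the rule for turning fractions into boundaries (rounding the cumulative share) is an assumption that may differ from the library's actual rounding.

```python
from typing import Dict, List, Tuple, Union

Part = Union[float, Tuple[int, int]]


def resolve_parts(
    total: int, parts: Union[List[str], Dict[str, Part]]
) -> Dict[str, Tuple[int, int]]:
    """Map each party to a half-open index range [start, end).

    Hypothetical helper for illustration only; `total` is the number of
    rows (axis=0) or columns (axis=1) being split.
    """
    if isinstance(parts, list):
        # Even split: the first `total % n` parties receive one extra item.
        base, extra = divmod(total, len(parts))
        ranges, start = {}, 0
        for i, party in enumerate(parts):
            end = start + base + (1 if i < extra else 0)
            ranges[party] = (start, end)
            start = end
        return ranges

    values = list(parts.values())
    if all(isinstance(v, tuple) for v in values):
        # Explicit intervals are used as given: closed left, open right.
        return {party: (int(v[0]), int(v[1])) for party, v in parts.items()}

    if all(isinstance(v, float) for v in values):
        # Assumed rule: cut at the rounded cumulative share of `total`.
        ranges, start, cumulative = {}, 0, 0.0
        for party, fraction in parts.items():
            cumulative += fraction
            end = min(total, round(cumulative * total))
            ranges[party] = (start, end)
            start = end
        return ranges

    raise ValueError("parts values must be all floats or all tuples")


even = resolve_parts(4, ["alice", "bob"])
by_fraction = resolve_parts(10, {"alice": 0.3, "bob": 0.7})
by_interval = resolve_parts(4, {"alice": (0, 1), "bob": (1, 4)})
```

Each resulting pair can be used directly as a slice, for example rows[start:end], which is why the interval form is closed on the left and open on the right.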
- secretflow.utils.simulation.data.dataframe.create_hdf(source: Union[str, DataFrame, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], shuffle: bool = False, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) -> HDataFrame [source]#
Create a HDataFrame from a single dataset source.
Refer to create_df() for full documentation.
- secretflow.utils.simulation.data.dataframe.create_vdf(source: Union[str, DataFrame, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], shuffle: bool = False) -> VDataFrame [source]#
Create a VDataFrame from a single dataset source.
Refer to create_df() for full documentation.
secretflow.utils.simulation.data.ndarray#
Functions:
| create_ndarray | Create a federated ndarray from a single data source. |
- secretflow.utils.simulation.data.ndarray.create_ndarray(source: Union[str, ndarray, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], axis: int = 0, shuffle: bool = False, random_state: Optional[int] = None, allow_pickle: bool = False, is_torch: bool = False) -> FedNdarray [source]#
Create a federated ndarray from a single data source.
- Parameters:
source – the dataset source: a file path, a numpy.ndarray, or a callable that returns a numpy.ndarray.
parts – the data partitions. If parts is a list of PYUs, the dataset is distributed as evenly as possible across them. If parts is a dict {PYU: value}, each value shall be either (1) a float giving that party's proportion of the data, or (2) a tuple giving an index interval, closed on the left and open on the right.
axis – optional, 0 or 1. 0 splits by row and returns a horizontally partitioned FedNdarray. 1 splits by column and returns a vertically partitioned FedNdarray.
shuffle – optional, whether to shuffle the dataset before splitting.
random_state – optional, the random state used for shuffling.
allow_pickle – optional, the allow_pickle argument passed to np.load when source is a file path.
- Returns:
a FedNdarray.
Examples
>>> import numpy as np
>>> from secretflow.utils.simulation.data.ndarray import create_ndarray
>>> # alice and bob are PYU devices created beforehand.
>>> arr = np.array([[1, 2, 3, 4], [11, 12, 13, 14]])
>>> # Create a horizontally partitioned FedNdarray evenly.
>>> h_arr = create_ndarray(arr, [alice, bob], axis=0)
>>> # Create a vertically partitioned FedNdarray with a given percentage.
>>> v_arr = create_ndarray(arr, {alice: 0.3, bob: 0.7}, axis=1)
>>> # Create a horizontally partitioned FedNdarray with a given index.
>>> h_arr = create_ndarray(arr, {alice: (0, 1), bob: (1, 4)})
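To make the meaning of axis concrete, here is a small sketch that slices a nested list by row (axis=0) or by column (axis=1) using half-open intervals. It only illustrates the partitioning semantics described above: split_matrix is a hypothetical helper, party names are plain strings rather than PYUs, and nothing here places data on devices the way create_ndarray does.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def split_matrix(
    data: Matrix, ranges: Dict[str, Tuple[int, int]], axis: int = 0
) -> Dict[str, Matrix]:
    """Slice `data` per party over half-open ranges [start, end).

    axis=0 keeps whole rows (horizontal partitioning);
    axis=1 keeps whole columns (vertical partitioning).
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    pieces = {}
    for party, (start, end) in ranges.items():
        if axis == 0:
            # Each party gets a copy of a contiguous block of rows.
            pieces[party] = [row[:] for row in data[start:end]]
        else:
            # Each party gets the same rows, restricted to its columns.
            pieces[party] = [row[start:end] for row in data]
    return pieces


arr = [[1, 2, 3, 4], [11, 12, 13, 14]]
by_row = split_matrix(arr, {"alice": (0, 1), "bob": (1, 2)}, axis=0)
by_col = split_matrix(arr, {"alice": (0, 1), "bob": (1, 4)}, axis=1)
```

With axis=0 every party sees all columns for its own rows; with axis=1 every party sees all rows for its own columns.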