secretflow.ml.boost.sgb_v.core.split_tree_trainer#

secretflow.ml.boost.sgb_v.core.split_tree_trainer.order_map_context#

Classes:

OrderMapContext()

Manage the context related to the order map, and the bucket and split point information derived from it.

class secretflow.ml.boost.sgb_v.core.split_tree_trainer.order_map_context.OrderMapContext[source]#

Bases: object

Manage the context related to the order map, and the bucket and split point information derived from it.

Methods:

__init__()

build_maps(x, buckets)

Split features into buckets and build the maps used in training.

get_order_map()

get_features()

get_feature_buckets()

get_feature_bucket_at(index)

get_split_points()

get_order_map_shape()

__init__()[source]#
build_maps(x: ndarray, buckets: int) None[source]#

Split features into buckets and build the maps used in training.

Parameters:
  • x – dataset from this partition.

  • buckets – number of buckets to split each feature into.
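The bucketing step can be sketched as follows. This is an illustrative stand-alone version assuming quantile-based split points; the helper name `build_order_map` and the exact layout are not the actual OrderMapContext internals.

```python
import numpy as np

def build_order_map(x: np.ndarray, buckets: int):
    """Sketch: map each feature value to a quantile-bucket index.

    Returns an order map (same shape as x, integer bucket ids) and
    the per-feature split points. Illustrative only.
    """
    n_samples, n_features = x.shape
    order_map = np.zeros_like(x, dtype=np.int64)
    split_points = []
    for f in range(n_features):
        # Interior quantiles of this feature become its split points.
        qs = np.quantile(x[:, f], np.linspace(0, 1, buckets + 1)[1:-1])
        qs = np.unique(qs)
        split_points.append(qs.tolist())
        # Bucket id = number of split points the value exceeds.
        order_map[:, f] = np.searchsorted(qs, x[:, f], side="right")
    return order_map, split_points
```

Downstream methods such as get_order_map, get_split_points, and get_feature_buckets would then serve views of these two structures.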

get_order_map() ndarray[source]#
get_features() int[source]#
get_feature_buckets() List[int][source]#
get_feature_bucket_at(index: int) int[source]#
get_split_points() List[List[float]][source]#
get_order_map_shape() Tuple[int, int][source]#

secretflow.ml.boost.sgb_v.core.split_tree_trainer.shuffler#

Classes:

Shuffler()

class secretflow.ml.boost.sgb_v.core.split_tree_trainer.shuffler.Shuffler[source]#

Bases: object

Methods:

__init__()

create_shuffle_mask(key, bucket_list)

Create a shuffle of the buckets for the given key; the random mask is a list of lists of int.

get_shuffling_indices(key)

reset_shuffle_mask()

is_shuffled()

undo_shuffle_mask(key, index)

__init__()[source]#
create_shuffle_mask(key: int, bucket_list: List[int]) List[int][source]#

Create a shuffle of the buckets for the given key. The random mask, a list of lists of int, is private to each worker. It is used to shuffle the encrypted gh sums before they are sent to the label holder, and applied again to restore the correct split bucket after the result is received from the label holder.

The key is the index of a node at a fixed level. One mask is created for each of the fewer-sample child-node selects at a level. Note that when calculating bucket sums, we only do it for either the left or the right child; the other can be derived by subtracting from the parent's sums.

Parameters:
  • key – int. Each node select corresponds to one key.

  • bucket_list – List[int]. List of bucket counts.
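As a rough sketch, the mask can be thought of as one random permutation per bucket group, applied to the gh sums before they leave the worker and inverted on the index that comes back. The helper names below are illustrative, not the Shuffler API:

```python
import numpy as np

def create_shuffle_mask(rng, bucket_list):
    # One permutation per bucket group; lengths follow bucket_list.
    return [rng.permutation(n) for n in bucket_list]

def apply_mask(mask, gh_sums):
    # Shuffle each group's bucket sums before sending to the label holder.
    return [sums[perm] for perm, sums in zip(mask, gh_sums)]

def undo_mask(mask, group, shuffled_index):
    # Map the index chosen by the label holder back to the original bucket.
    return int(mask[group][shuffled_index])
```

Because only the worker holds the permutations, the label holder learns the argmax over shuffled positions but not which real bucket it corresponds to.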

get_shuffling_indices(key: int) List[int][source]#
reset_shuffle_mask()[source]#
is_shuffled() bool[source]#
undo_shuffle_mask(key: int, index: int) int[source]#

secretflow.ml.boost.sgb_v.core.split_tree_trainer.split_tree_trainer#

Classes:

SplitTreeTrainer

alias of ActorProxy(SplitTreeTrainer)

secretflow.ml.boost.sgb_v.core.split_tree_trainer.split_tree_trainer.SplitTreeTrainer[source]#

alias of ActorProxy(SplitTreeTrainer)

Methods:

__init__(*args, **kwargs)

Abstract device object base class.

global_setup(x, buckets, seed)

Set up global context.

set_buckets_count(buckets_count)

Save the number of buckets for all the features in each partition.

tree_setup(colsample)

Set up tree context and perform column sampling if colsample < 1.

predict_leaf_selects(x)

tree_finish(leaf_indices)

do_split(split_buckets, sampled_rows, ...)

Record split info and generate the next level's left-child selects.

create_shuffle_mask(key)

reset_shuffle_mask()

secretflow.ml.boost.sgb_v.core.split_tree_trainer.splitter#

Classes:

Splitter(idx)

class secretflow.ml.boost.sgb_v.core.split_tree_trainer.splitter.Splitter(idx: int)[source]#

Bases: object

Methods:

__init__(idx)

get_features()

get_feature_buckets()

get_feature_bucket_at(index)

get_order_map()

get_order_map_shape()

build_maps(x, buckets)

get_split_points()

get_col_choices()

set_up_col_choices(colsample)

set_buckets_count(buckets_count)

Save the number of buckets for all the features in each partition.

find_split_bucket(split_bucket)

Check whether this partition contains the split bucket.

get_split_feature(split_bucket)

Find which feature the split bucket belongs to.

compute_left_child_selects(feature, ...[, ...])

Compute the left-child node select as a boolean array based on the order map, feature, and split_point_index.

__init__(idx: int)[source]#
get_features() int[source]#
get_feature_buckets() List[int][source]#
get_feature_bucket_at(index: int) int[source]#
get_order_map() ndarray[source]#
get_order_map_shape() Tuple[int, int][source]#
build_maps(x: ndarray, buckets: int)[source]#
get_split_points() List[List[int]][source]#
get_col_choices() List[int][source]#
set_up_col_choices(colsample: float) Tuple[ndarray, int][source]#
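Column sampling as performed by set_up_col_choices can be sketched as follows (an assumed, simplified version; the real method works on the partition's own feature set and return layout):

```python
import numpy as np

def set_up_col_choices(rng, n_features, colsample):
    """Sketch: choose a random column subset when colsample < 1.

    Returns the sorted chosen column indices and their count.
    Names and signature are illustrative only.
    """
    if colsample >= 1.0:
        choices = np.arange(n_features)
    else:
        # Keep at least one column; sort to preserve feature order.
        k = max(1, int(n_features * colsample))
        choices = np.sort(rng.choice(n_features, size=k, replace=False))
    return choices, len(choices)
```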
set_buckets_count(buckets_count: List[int]) None[source]#

Save the number of buckets for all the features in each partition.

find_split_bucket(split_bucket: int) int[source]#

Check whether this partition contains the split bucket.

get_split_feature(split_bucket: int) Tuple[int, int][source]#

Find which feature the split bucket belongs to.
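A minimal sketch of that lookup, assuming buckets are laid out feature by feature and `feature_buckets` holds the per-feature bucket counts (the helper name is hypothetical):

```python
def split_bucket_to_feature(split_bucket, feature_buckets):
    """Sketch: map a flat bucket index to (feature, split_point_index)."""
    offset = 0
    for feature, n_buckets in enumerate(feature_buckets):
        # The feature whose bucket range covers split_bucket wins.
        if split_bucket < offset + n_buckets:
            return feature, split_bucket - offset
        offset += n_buckets
    return -1, -1  # split bucket is not in this partition
```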

compute_left_child_selects(feature: int, split_point_index: int, sampled_indices: Optional[List[int]] = None) ndarray[source]#

Compute the left-child node select as a boolean array based on the order map, feature, and split_point_index.

Parameters:
  • feature (int) – which feature to split on

  • split_point_index (int) – choose which bucket

  • sampled_indices (Union[List[int], None], optional) – sample indices in the original node. Defaults to None, meaning all samples.

Returns:

a 0/1 select array of shape (1, sample number).

1 means in left child, 0 otherwise.

Return type:

np.ndarray
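A rough stand-alone sketch of this computation, assuming the order map stores each sample's bucket id per feature and that bucket ids at or below split_point_index go to the left child (names are illustrative, not the Splitter internals):

```python
import numpy as np

def compute_left_child_selects(order_map, feature, split_point_index,
                               sampled_indices=None):
    """Sketch: rows whose bucket id <= split_point_index go left."""
    col = order_map[:, feature]
    if sampled_indices is not None:
        # Restrict to the samples belonging to the original node.
        col = col[sampled_indices]
    # 0/1 row vector of shape (1, sample number).
    return (col <= split_point_index).astype(np.int8).reshape(1, -1)
```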