secretflow.ml.boost.sgb_v.core.split_tree_trainer#

secretflow.ml.boost.sgb_v.core.split_tree_trainer.order_map_context#

Classes:

OrderMapContext()

Manage context related to the order map, and the bucket and split point information derived from it.

class secretflow.ml.boost.sgb_v.core.split_tree_trainer.order_map_context.OrderMapContext[source]#

Bases: object

Manage context related to the order map, and the bucket and split point information derived from it.

Methods:

__init__()

build_maps(x, buckets)

Split features into buckets and build the maps used in training.

get_order_map()

get_features()

get_feature_buckets()

get_feature_bucket_at(index)

get_split_points()

get_order_map_shape()

__init__()[source]#
build_maps(x: ndarray, buckets: int) → None[source]#

Split features into buckets and build the maps used in training.

Parameters:
  • x – dataset from this partition.

  • buckets – number of buckets to split each feature into.
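For orientation, a minimal sketch of the bucketing idea behind build_maps (this is not the SecretFlow implementation; build_order_map is a hypothetical helper): each feature column is cut at quantile split points, and the order map records, per sample and per feature, the index of the bucket that the value falls into:

    import numpy as np
    from typing import List, Tuple

    def build_order_map(x: np.ndarray, buckets: int) -> Tuple[np.ndarray, List[List[float]]]:
        # Hypothetical illustration: quantile-bucket each feature column.
        n_samples, n_features = x.shape
        order_map = np.zeros((n_samples, n_features), dtype=np.int8)
        split_points: List[List[float]] = []
        for f in range(n_features):
            # interior quantiles serve as candidate split points for this feature
            qs = np.unique(np.quantile(x[:, f], np.linspace(0, 1, buckets + 1)[1:-1]))
            split_points.append(qs.tolist())
            # bucket index of each sample's value for this feature
            order_map[:, f] = np.searchsorted(qs, x[:, f], side="right")
        return order_map, split_points

    rng = np.random.default_rng(0)
    om, sp = build_order_map(rng.normal(size=(100, 3)), buckets=8)
    print(om.shape, len(sp))  # (100, 3) 3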

get_order_map() → ndarray[source]#
get_features() → int[source]#
get_feature_buckets() → List[int][source]#
get_feature_bucket_at(index: int) → int[source]#
get_split_points() → List[List[float]][source]#
get_order_map_shape() → Tuple[int, int][source]#

secretflow.ml.boost.sgb_v.core.split_tree_trainer.shuffler#

Classes:

Shuffler()

class secretflow.ml.boost.sgb_v.core.split_tree_trainer.shuffler.Shuffler[source]#

Bases: object

Methods:

__init__()

create_shuffle_mask(key, bucket_list)

Create a shuffle of the buckets for node key; the random mask is a list of lists of int.

get_shuffling_indices(key)

reset_shuffle_mask()

is_shuffled()

undo_shuffle_mask(key, index)

__init__()[source]#
create_shuffle_mask(key: int, bucket_list: List[int]) → List[int][source]#

Create a shuffle of the buckets for node key. The random mask is a list of lists of int and is private to each worker. It is used to shuffle the encrypted gh sums before they are sent to the label holder, and it is applied again to restore the correct split bucket after the label holder's answer is received.

The key is the index of the node within a fixed level. We create one mask for each of the (fewer) child node selects at a level: when calculating bucket sums we only do so for either the left or the right child, since the other can be calculated from the parent's sums.

Parameters:
  • key – int. Each node select corresponds to one key.

  • bucket_list – List[int]. List of bucket counts.
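To make the shuffle/undo contract concrete, a minimal standalone sketch (hypothetical helper names, not the Shuffler internals): one private random permutation is kept per node key, applied to the bucket sums before they are sent out, and used afterwards to map the label holder's chosen index back to the original bucket:

    import numpy as np
    from typing import Dict, List

    masks: Dict[int, np.ndarray] = {}      # one permutation per node key (illustration only)
    rng = np.random.default_rng(42)

    def create_shuffle_mask(key: int, bucket_list: List[int]) -> List[int]:
        # one permutation covering all buckets of this partition
        masks[key] = rng.permutation(sum(bucket_list))
        return masks[key].tolist()

    def shuffle_bucket_sums(key: int, bucket_sums: np.ndarray) -> np.ndarray:
        # permute (encrypted) bucket sums before sending them to the label holder
        return bucket_sums[masks[key]]

    def undo_shuffle_mask(key: int, index: int) -> int:
        # map an index chosen on the shuffled sums back to the original bucket
        return int(masks[key][index])

    create_shuffle_mask(0, [4, 4])                   # e.g. two features, four buckets each
    sums = np.arange(8.0)                            # stand-in for encrypted gh sums
    picked = int(np.argmax(shuffle_bucket_sums(0, sums)))
    print(undo_shuffle_mask(0, picked))              # original bucket index: 7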

get_shuffling_indices(key: int) → List[int][source]#
reset_shuffle_mask()[source]#
is_shuffled() → bool[source]#
undo_shuffle_mask(key: int, index: int) → int[source]#

secretflow.ml.boost.sgb_v.core.split_tree_trainer.split_tree_trainer#

Classes:

SplitTreeTrainer

alias of ActorProxy(SplitTreeTrainer)

secretflow.ml.boost.sgb_v.core.split_tree_trainer.split_tree_trainer.SplitTreeTrainer[source]#

alias of ActorProxy(SplitTreeTrainer)

Methods:

__init__(*args, **kwargs)

Abstraction device object base class.

global_setup(x, buckets, seed)

Set up global context.

set_buckets_count(buckets_count)

Save the number of buckets for all features in each partition.

tree_setup(colsample)

Set up tree context and do column sampling if colsample < 1.

predict_leaf_selects(x)

tree_finish(leaf_indices)

do_split(split_buckets, sampled_rows, ...)

Record split info and generate the next level's left children select.

create_shuffle_mask(key)

reset_shuffle_mask()
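The method list above suggests a per-boosting-round workflow on each worker. A minimal, self-contained sketch of the assumed call order only (a print-only stub rather than the real ActorProxy; arguments are placeholders and the exact sequencing inside SGB may differ):

    class _TrainerStub:
        # Print-only stand-in for a SplitTreeTrainer actor; real SGB code would
        # drive a PYU-backed ActorProxy instead.
        def __getattr__(self, name):
            return lambda *args, **kwargs: print("call:", name)

    trainer = _TrainerStub()

    trainer.global_setup(x=None, buckets=8, seed=42)   # once per partition: order map, buckets
    trainer.set_buckets_count([8, 8, 8])               # bucket counts of all partitions' features

    for tree in range(2):                              # per boosting round (assumed)
        trainer.tree_setup(colsample=0.9)              # column sampling if colsample < 1
        for level in range(3):                         # per tree level (assumed)
            trainer.create_shuffle_mask(key=0)         # hide bucket order from the label holder
            trainer.do_split([1, 5], None)             # record split, emit next level's selects
            trainer.reset_shuffle_mask()
        trainer.tree_finish(leaf_indices=[0, 1, 2, 3]) # persist this tree's leaves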

secretflow.ml.boost.sgb_v.core.split_tree_trainer.splitter#

Classes:

Splitter(idx)

class secretflow.ml.boost.sgb_v.core.split_tree_trainer.splitter.Splitter(idx: int)[source]#

Bases: object

Methods:

__init__(idx)

get_features()

get_feature_buckets()

get_feature_bucket_at(index)

get_order_map()

get_order_map_shape()

build_maps(x, buckets)

get_split_points()

get_col_choices()

set_up_col_choices(colsample)

set_buckets_count(buckets_count)

Save the number of buckets for all features in each partition.

find_split_bucket(split_bucket)

Check if this partition contains the split bucket.

get_split_feature(split_bucket)

Find which feature the split bucket belongs to.

compute_left_child_selects(feature, ...[, ...])

Compute the left child node select as a bool array, based on the order map, the chosen feature and split_point_index.

__init__(idx: int)[source]#
get_features() → int[source]#
get_feature_buckets() → List[int][source]#
get_feature_bucket_at(index: int) → int[source]#
get_order_map() → ndarray[source]#
get_order_map_shape() → Tuple[int, int][source]#
build_maps(x: ndarray, buckets: int)[source]#
get_split_points() → List[List[int]][source]#
get_col_choices() → List[int][source]#
set_up_col_choices(colsample: float) → Tuple[ndarray, int][source]#
set_buckets_count(buckets_count: List[int]) → None[source]#

Save the number of buckets for all features in each partition.

find_split_bucket(split_bucket: int) → int[source]#

Check if this partition contains the split bucket.

get_split_feature(split_bucket: int) → Tuple[int, int][source]#

Find which feature the split bucket belongs to.
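A minimal sketch of the index arithmetic these two methods imply (the helper name and the partition-offset handling are assumptions): a global split-bucket index is first checked against this partition's range, then located within the cumulative feature bucket counts to recover (feature, bucket within feature):

    from typing import List, Optional, Tuple

    def locate_split_bucket(
        split_bucket: int,
        partition_offset: int,
        feature_buckets: List[int],
    ) -> Optional[Tuple[int, int]]:
        # Hypothetical: map a global bucket index to (feature, bucket-in-feature)
        # for one partition, or None if the bucket lies in another partition.
        local = split_bucket - partition_offset
        if local < 0 or local >= sum(feature_buckets):
            return None
        for feature, n in enumerate(feature_buckets):
            if local < n:
                return feature, local
            local -= n

    # partition holds 3 features with 4, 8 and 8 buckets, starting at global index 10
    print(locate_split_bucket(15, partition_offset=10, feature_buckets=[4, 8, 8]))
    # -> (1, 1): second feature, its second bucket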

compute_left_child_selects(feature: int, split_point_index: int, sampled_indices: Optional[List[int]] = None) → ndarray[source]#

Compute the left child node select as a bool array, based on the order map, the chosen feature and split_point_index.

Parameters:
  • feature (int) – which feature to split on

  • split_point_index (int) – choose which bucket

  • sampled_indices (Union[List[int], None], optional) – samples in original node. Defaults to None. None means all.

Returns:

a 0/1 select array, shape (1, sample number).

1 means in left child, 0 otherwise.

Return type:

np.ndarray
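A minimal sketch of what the description implies (the "bucket index <= split_point_index goes left" comparison direction is an assumption): take the order-map column of the chosen feature, optionally restrict it to the node's sampled rows, and mark the qualifying samples as left-child members:

    import numpy as np
    from typing import List, Optional

    def left_child_selects(
        order_map: np.ndarray,                   # (n_samples, n_features) bucket indices
        feature: int,
        split_point_index: int,
        sampled_indices: Optional[List[int]] = None,
    ) -> np.ndarray:
        # Hypothetical re-statement: 1 = sample goes to the left child, 0 = otherwise.
        col = order_map[:, feature]
        if sampled_indices is not None:
            col = col[sampled_indices]           # restrict to samples of this node
        return (col <= split_point_index).astype(np.uint8).reshape(1, -1)

    om = np.array([[0, 2], [1, 0], [3, 1], [2, 3]])
    print(left_child_selects(om, feature=0, split_point_index=1))  # [[1 1 0 0]]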