secretflow.ml.boost.homo_boost.tree_core#
secretflow.ml.boost.homo_boost.tree_core.criterion#
Classes:
- Criterion – Base class for split criterion
- XgboostCriterion – XgboostCriterion split criterion class
- class secretflow.ml.boost.homo_boost.tree_core.criterion.Criterion[source]#
Bases: ABC
Base class for split criterion
Methods:
- split_gain(left_node_sum, right_node_sum)
- class secretflow.ml.boost.homo_boost.tree_core.criterion.XgboostCriterion(reg_lambda: float = 0.1, reg_alpha: float = 0, decimal: int = 10)[source]#
Bases: Criterion
XgboostCriterion split criterion class.
- reg_lambda#
L2 regularization term on weight
- reg_alpha#
L1 regularization term on weight
- decimal#
truncation precision (number of decimal places)
Methods:
- __init__([reg_lambda, reg_alpha, decimal])
- split_gain(node_sum, left_node_sum, ...) – Calculate split gain
- truncate(f[, decimal]) – Truncate f to control precision
- node_gain(sum_grad, sum_hess) – Calculate node gain
- node_weight(sum_grad, sum_hess) – Calculate node weight
- split_gain(node_sum: Tuple[float, float], left_node_sum: Tuple[float, float], right_node_sum: Tuple[float, float]) → float[source]#
Calculate split gain.
- Parameters:
node_sum – sum of Grad and Hess at the node being split
left_node_sum – sum of Grad and Hess at the left child after the split
right_node_sum – sum of Grad and Hess at the right child after the split
- Returns:
Split gain of this split
- Return type:
gain
- static truncate(f, decimal=10)[source]#
Truncate f to the given number of decimal places; controlling precision this way can reduce training time by allowing earlier stopping.
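For orientation, here is a hedged usage sketch of these methods; the formulas in the comments are the standard XGBoost ones, and the exact values returned depend on reg_lambda, reg_alpha, and truncation:

```python
from secretflow.ml.boost.homo_boost.tree_core.criterion import XgboostCriterion

# Standard XGBoost formulas these methods implement (up to regularization
# details): node_gain(G, H) = G^2 / (H + lambda),
# node_weight(G, H) = -G / (H + lambda),
# split_gain = node_gain(left) + node_gain(right) - node_gain(parent).
criterion = XgboostCriterion(reg_lambda=0.1, reg_alpha=0, decimal=10)

parent = (10.0, 20.0)                  # (sum_grad, sum_hess) of the node being split
left, right = (4.0, 8.0), (6.0, 12.0)  # children; element-wise they sum to the parent

gain = criterion.split_gain(parent, left, right)
weight = criterion.node_weight(*left)  # leaf weight if the left child stops here
rounded = XgboostCriterion.truncate(weight, decimal=4)
```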
secretflow.ml.boost.homo_boost.tree_core.decision_tree#
Classes:
- DecisionTree – Class for local version decision tree
- class secretflow.ml.boost.homo_boost.tree_core.decision_tree.DecisionTree(tree_param: Optional[TreeParam] = None, data: Optional[DataFrame] = None, bin_split_points: Optional[ndarray] = None, tree_id: Optional[int] = None, group_id: Optional[int] = None, iter_round: Optional[int] = None, grad_key: str = 'grad', hess_key: str = 'hess', label_key: str = 'label')[source]#
Bases: object
Class for local version decision tree
- tree_param#
params for tree build
- data#
training data, an HDataFrame
- bin_split_points#
global binning info
- tree_id#
tree id
- group_id#
group id indicates which class the tree classifies
- iter_round#
iteration round
- hess_key#
unique column name for hess value
- grad_key#
unique column name for grad value
Methods:
- __init__([tree_param, data, ...])
- feature_col_sample(all_features[, sample_rate]) – Column sample for features
- … – convert bid to real value
- get_grad_hess_sum(data_frame) – Calculate the sum of grad and hess
- update_feature_importance(split_info) – Calculate feature importance, by default as split count
- fit() – Entrance for the local decision tree
- init_xgboost_model(model_path) – Init a standard xgboost model
- update_tree(cur_to_split, split_info, ...) – Tree update function
- save_xgboost_model(model_path, tree_nodes) – Transform tree info into a standard xgboost model
- __init__(tree_param: Optional[TreeParam] = None, data: Optional[DataFrame] = None, bin_split_points: Optional[ndarray] = None, tree_id: Optional[int] = None, group_id: Optional[int] = None, iter_round: Optional[int] = None, grad_key: str = 'grad', hess_key: str = 'hess', label_key: str = 'label')[source]#
- feature_col_sample(all_features: List[str], sample_rate: float = 1.0)[source]#
Column sample for features.
- Parameters:
all_features – A list of feature names for all columns
sample_rate – subsample rate, a float in [0, 1]
- Returns:
A dict of valid features, which will be used in this round's build
- Return type:
valid_features
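The return shape suggests per-column validity flags. Below is a minimal, self-contained sketch of equivalent sampling logic; sample_columns and the feature names are illustrative, not part of the API:

```python
import random
from typing import Dict, List

def sample_columns(all_features: List[str], sample_rate: float = 1.0) -> Dict[int, bool]:
    # Keep roughly sample_rate of the columns (at least one) and mark the
    # rest invalid, mirroring the documented Dict[id: bool] valid_features.
    k = max(1, int(len(all_features) * sample_rate))
    chosen = set(random.sample(range(len(all_features)), k))
    return {i: i in chosen for i in range(len(all_features))}

valid_features = sample_columns(["x1", "x2", "x3"], sample_rate=0.67)
```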
- get_grad_hess_sum(data_frame)[source]#
Calculate the sum of grad and hess.
- Parameters:
data_frame – data frame which contains the hess and grad columns
- Returns:
grad – sum of grad; hess – sum of hess
- Return type:
grad, hess
- update_feature_importance(split_info)[source]#
Calculate feature importance, by default as split count.
- Parameters:
split_info – Global optimal splitting information calculated from the histogram
- update_tree(cur_to_split: List[Node], split_info: List[SplitInfo], cur_data_frames: List[DataFrame])[source]#
Tree update function.
- Parameters:
cur_to_split – List of nodes to be split
split_info – Global optimal split info
cur_data_frames – List of dataframes, one per node
- Returns:
next_layer_node – List of nodes to be evaluated in the next iteration; next_layer_data – List of data to be evaluated in the next iteration
- Return type:
next_layer_node, next_layer_data
- save_xgboost_model(model_path: str, tree_nodes: List[Node])[source]#
Transform tree info into a standard xgboost model. Ref: https://xgboost.readthedocs.io/en/latest/dev/structxgboost_1_1TreeParam.html#aab8ff286e59f1bbab47bfa865da4a107
- Parameters:
model_path – model path
tree_nodes – federated decision tree internal model
- Returns:
Updates the standard xgboost model at the model path
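Putting the pieces together, here is a hedged construction sketch based only on the signature above. The data values are synthetic, and a fully populated TreeParam (not shown on this page) is required before fit() can actually run:

```python
import numpy as np
import pandas as pd
from secretflow.ml.boost.homo_boost.tree_core.decision_tree import DecisionTree

# A local frame carrying per-sample grad/hess under the documented column keys.
df = pd.DataFrame({
    "x1": np.random.rand(100),
    "grad": np.random.randn(100),
    "hess": np.ones(100),
    "label": np.random.randint(0, 2, 100),
})
# One row of global bin split points per feature (values are illustrative).
bin_split_points = np.array([np.quantile(df["x1"], [0.25, 0.5, 0.75])])

tree = DecisionTree(
    tree_param=None,  # pass a real TreeParam here; None is only a placeholder
    data=df,
    bin_split_points=bin_split_points,
    tree_id=0,
    group_id=0,
    iter_round=0,
    grad_key="grad",
    hess_key="hess",
    label_key="label",
)
# tree.fit()  # entrance for the local decision tree (needs a valid TreeParam)
```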
secretflow.ml.boost.homo_boost.tree_core.feature_histogram#
Classes:
- HistogramBag – Histogram container
- FeatureHistogram – Feature Histogram
- class secretflow.ml.boost.homo_boost.tree_core.feature_histogram.HistogramBag(histogram: Optional[List] = None, hid: int = -1, p_hid: int = -1)[source]#
Bases: object
Histogram container
- histogram#
Histogram list calculated by calculate_histogram
- Type:
List
- hid#
histogram id
- Type:
int
- p_hid#
parent histogram id
- Type:
int
Attributes:
- histogram, hid, p_hid
Methods:
- binary_op(other, func[, inplace])
- __init__([histogram, hid, p_hid])
- histogram: List = None#
- hid: int = -1#
- p_hid: int = -1#
- __init__(histogram: Optional[List] = None, hid: int = -1, p_hid: int = -1) → None#
- class secretflow.ml.boost.homo_boost.tree_core.feature_histogram.FeatureHistogram[source]#
Bases: object
Feature Histogram
Methods:
- calculate_histogram(data_frame_list, ...[, ...]) – Calculate histogram according to G and H
- calculate_single_histogram(data, bin_split_point)
- static calculate_histogram(data_frame_list: List[DataFrame], bin_split_points: ndarray, valid_features: Optional[Dict] = None, use_missing: bool = False, grad_key: str = 'grad', hess_key: str = 'hess', thread_pool: Optional[ThreadPoolExecutor] = None)[source]#
Calculate histogram according to G and H. Histogram layout: [cols, [buckets, [sum_g, sum_h, count]]]
- Parameters:
data_frame_list – A list of data frames which contain grad and hess
bin_split_points – global split point dicts
valid_features – valid feature names Dict[id: bool]
use_missing – whether missing values participate in training
grad_key – unique column name for grad value
hess_key – unique column name for hess value
- Returns:
A list: [histogram1, histogram2, …]
- Return type:
node_histograms
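To make the documented nesting concrete, here is a small plain-Python sketch that accumulates the same [cols, [buckets, [sum_g, sum_h, count]]] structure; build_histogram is illustrative, not the library function:

```python
import numpy as np

def build_histogram(binned, grad, hess, n_buckets):
    # binned: (n_samples, n_features) array of bucket ids per feature.
    n_samples, n_features = binned.shape
    hist = [[[0.0, 0.0, 0] for _ in range(n_buckets)] for _ in range(n_features)]
    for i in range(n_samples):
        for f in range(n_features):
            b = binned[i, f]
            hist[f][b][0] += grad[i]   # sum_g
            hist[f][b][1] += hess[i]   # sum_h
            hist[f][b][2] += 1         # count
    return hist

binned = np.array([[0, 1], [1, 1], [2, 0]])
hist = build_histogram(binned, grad=[0.1, -0.2, 0.3], hess=[1.0, 0.9, 1.1], n_buckets=3)
# hist[0][1] -> [-0.2, 0.9, 1]: feature 0, bucket 1 holds one sample
```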
secretflow.ml.boost.homo_boost.tree_core.feature_importance#
Classes:
- FeatureImportance – Feature importance class
- class secretflow.ml.boost.homo_boost.tree_core.feature_importance.FeatureImportance(main_importance: float = 0, other_importance: float = 0, main_type: str = 'split')[source]#
Bases: object
Feature importance class
- main_importance#
main importance value, of the type given by main_type
- other_importance#
the other importance value, of the type opposite to main_type
- main_type#
type of the main importance, e.g. gain
Methods:
- __init__([main_importance, ...])
- add_gain(val)
- add_split(val)
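A short usage sketch following the documented constructor and methods; with main_type='split' the split count is the main importance and gain is tracked as the other type:

```python
from secretflow.ml.boost.homo_boost.tree_core.feature_importance import FeatureImportance

fi = FeatureImportance(main_importance=0, other_importance=0, main_type="split")
fi.add_split(1)     # one more split made on this feature
fi.add_gain(0.42)   # gain contributed by that split (tracked as the other type)
```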
secretflow.ml.boost.homo_boost.tree_core.loss_function#
Classes:
- LossFunction – Inner definitions for loss functions
- class secretflow.ml.boost.homo_boost.tree_core.loss_function.LossFunction(obj_name: str)[source]#
Bases: object
Inner definitions for loss functions
- obj_name#
Name of the loss function, one of:
- "binary:logistic" – logistic regression for binary classification, output probability
- "reg:logistic" – logistic regression
- "multi:softmax" – multiclass classification using the softmax objective, output class
- "multi:softprob" – multiclass classification via softmax, output per-class probability
- "reg:squarederror" – regression with squared loss
Methods:
- __init__(obj_name)
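A minimal construction sketch; obj_name must be one of the objective strings listed above:

```python
from secretflow.ml.boost.homo_boost.tree_core.loss_function import LossFunction

# Binary classification with probability output.
loss = LossFunction(obj_name="binary:logistic")
```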
secretflow.ml.boost.homo_boost.tree_core.node#
Classes:
- Node – Tree Node
- class secretflow.ml.boost.homo_boost.tree_core.node.Node(id: Optional[int] = None, fid: Optional[int] = None, bid: Optional[int] = None, weight: float = 0.0, is_leaf: bool = False, sum_grad: Optional[float] = None, sum_hess: Optional[float] = None, left_nodeid: int = -1, right_nodeid: int = -1, missing_dir: int = 1, sample_num: int = 0, parent_nodeid: Optional[int] = None, is_left_node: bool = False, sibling_nodeid: Optional[int] = None, loss_change: float = 0.0)[source]#
Bases: object
Tree Node
- id#
node id
- Type:
int
- fid#
feature id
- Type:
int
- bid#
bucket id
- Type:
int
- weight#
node weight
- Type:
float
- is_leaf#
whether this node is leaf
- Type:
bool
- sum_grad#
sum of grad
- Type:
float
- sum_hess#
sum of hess
- Type:
float
- left_nodeid#
left node id
- Type:
int
- right_nodeid#
right node id
- Type:
int
- missing_dir#
which branch to take when encountering a missing value; default 1 (right)
- Type:
int
- sample_num#
num of data sample
- Type:
int
- parent_nodeid#
parent node id
- Type:
int
- is_left_node#
whether this node is the left child of its parent
- Type:
bool
- sibling_nodeid#
sibling node id
- Type:
int
- loss_change#
the loss change.
- Type:
float
Attributes:
- id, fid, bid, weight, is_leaf, sum_grad, sum_hess, left_nodeid, right_nodeid, missing_dir, sample_num, parent_nodeid, is_left_node, sibling_nodeid, loss_change
Methods:
- __init__([id, fid, bid, weight, is_leaf, ...])
- id: int = None#
- fid: int = None#
- bid: int = None#
- weight: float = 0.0#
- is_leaf: bool = False#
- sum_grad: float = None#
- sum_hess: float = None#
- left_nodeid: int = -1#
- right_nodeid: int = -1#
- missing_dir: int = 1#
- sample_num: int = 0#
- parent_nodeid: int = None#
- is_left_node: bool = False#
- sibling_nodeid: int = None#
- loss_change: float = 0.0#
- __init__(id: Optional[int] = None, fid: Optional[int] = None, bid: Optional[int] = None, weight: float = 0.0, is_leaf: bool = False, sum_grad: Optional[float] = None, sum_hess: Optional[float] = None, left_nodeid: int = -1, right_nodeid: int = -1, missing_dir: int = 1, sample_num: int = 0, parent_nodeid: Optional[int] = None, is_left_node: bool = False, sibling_nodeid: Optional[int] = None, loss_change: float = 0.0) → None#
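Since Node is a plain data container, constructing a tiny two-level tree shows how the id fields link nodes together; all values are illustrative:

```python
from secretflow.ml.boost.homo_boost.tree_core.node import Node

# Root splits on feature 3 at bucket 7; missing values go right (missing_dir=1).
root = Node(id=0, fid=3, bid=7, sum_grad=10.0, sum_hess=20.0,
            left_nodeid=1, right_nodeid=2, sample_num=100)
left = Node(id=1, is_leaf=True, weight=-0.35, parent_nodeid=0,
            is_left_node=True, sibling_nodeid=2)
right = Node(id=2, is_leaf=True, weight=0.27, parent_nodeid=0,
             is_left_node=False, sibling_nodeid=1)
```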
secretflow.ml.boost.homo_boost.tree_core.splitter#
Classes:
- SplitInfo – Split info class
- Splitter – Split calculation class
- class secretflow.ml.boost.homo_boost.tree_core.splitter.SplitInfo(best_fid: Optional[int] = None, best_bid: Optional[int] = None, sum_grad: float = 0, sum_hess: float = 0, gain: Optional[float] = None, missing_dir: int = 1, sample_count: int = -1)[source]#
Bases: object
SplitInfo class.
- best_fid#
best split on feature id
- Type:
int
- best_bid#
best split on bucket id
- Type:
int
- sum_grad#
sum of grad
- Type:
float
- sum_hess#
sum of hess
- Type:
float
- gain#
split gain
- Type:
float
- missing_dir#
which branch to take when encountering a missing value; default 1 (right)
- Type:
int
- sample_count#
num of sample after split
- Type:
int
Attributes:
- best_fid, best_bid, sum_grad, sum_hess, gain, missing_dir, sample_count
Methods:
- __init__([best_fid, best_bid, sum_grad, ...])
- best_fid: int = None#
- best_bid: int = None#
- sum_grad: float = 0#
- sum_hess: float = 0#
- gain: float = None#
- missing_dir: int = 1#
- sample_count: int = -1#
- __init__(best_fid: Optional[int] = None, best_bid: Optional[int] = None, sum_grad: float = 0, sum_hess: float = 0, gain: Optional[float] = None, missing_dir: int = 1, sample_count: int = -1) → None#
- class secretflow.ml.boost.homo_boost.tree_core.splitter.Splitter(criterion_method: str, criterion_params: List = [0, 0, 10], min_impurity_split: float = 0.01, min_sample_split: int = 2, min_leaf_node: int = 1, min_child_weight: int = 1)[source]#
Bases: object
Split calculation class.
- criterion_method#
criterion method
- criterion_params#
criterion params, e.g. [l1: 0.1, l2: 0.2]
- min_impurity_split#
minimum gain threshold for a split
- min_sample_split#
minimum number of samples required to split a node, default 2
- min_leaf_node#
minimum number of samples required on a node to split
- min_child_weight#
minimum sum of hess after a split
Methods:
- __init__(criterion_method[, ...])
- find_split_once(histogram, valid_features, ...) – Find the best split info from a histogram
- find_split(histograms, valid_features[, ...]) – Find the optimal split points
- node_gain(grad, hess)
- node_weight(grad, hess)
- split_gain(sum_grad, sum_hess, sum_grad_l, ...)
- __init__(criterion_method: str, criterion_params: List = [0, 0, 10], min_impurity_split: float = 0.01, min_sample_split: int = 2, min_leaf_node: int = 1, min_child_weight: int = 1)[source]#
- find_split_once(histogram: List, valid_features: Dict, use_missing: bool) → SplitInfo[source]#
Find the best split info from a histogram.
- Parameters:
histogram – a three-dimensional matrix storing G, H, Count
valid_features – valid feature names Dict[id: bool]
use_missing – whether missing values participate in training
- Returns:
best split point info
- Return type:
SplitInfo
- find_split(histograms: List, valid_features: Dict, use_missing: bool = False) → List[SplitInfo][source]#
Find the optimal split points.
- Parameters:
histograms – a list of histograms
valid_features – valid feature names Dict[id: bool]
use_missing – whether missing values participate in training
- Returns:
best split info for each node
- Return type:
tree_node_splitinfo
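Finally, a hedged construction sketch for Splitter. The "xgboost" value for criterion_method and the [reg_lambda, reg_alpha, decimal] reading of criterion_params are assumptions inferred from the default [0, 0, 10] and the XgboostCriterion signature above; they are not confirmed by this page:

```python
from secretflow.ml.boost.homo_boost.tree_core.splitter import Splitter

splitter = Splitter(
    criterion_method="xgboost",       # assumed method name (see lead-in)
    criterion_params=[0.1, 0.0, 10],  # assumed [reg_lambda, reg_alpha, decimal]
    min_impurity_split=0.01,
    min_sample_split=2,
    min_leaf_node=1,
    min_child_weight=1,
)
# With a list of HistogramBag objects (see feature_histogram above):
# split_infos = splitter.find_split(histograms, valid_features={0: True},
#                                   use_missing=False)
```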