secretflow.preprocessing.binning.kernels#
secretflow.preprocessing.binning.kernels.base_binning#
Classes:
- BaseBinning
- class secretflow.preprocessing.binning.kernels.base_binning.BaseBinning(bin_names: List, bin_indexes: List, bin_num: int, abnormal_list: List)[source]#
Bases: ABC
Methods:
- __init__(bin_names, bin_indexes, bin_num, ...)
- fit_split_points(data)
Attributes:
- property split_points#
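BaseBinning is an abstract base class; concrete binning kernels subclass it and implement fit_split_points. Below is a minimal, hypothetical sketch of such a subclass, assuming the constructor arguments listed above and a plain pandas.DataFrame input. The EqualWidthBinning name and its equal-width logic are illustrative only and not part of the library:

```python
from typing import List

import pandas as pd

from secretflow.preprocessing.binning.kernels.base_binning import BaseBinning


class EqualWidthBinning(BaseBinning):
    """Illustrative subclass: equal-width split points instead of quantiles."""

    def __init__(self, bin_names: List, bin_indexes: List, bin_num: int, abnormal_list: List):
        super().__init__(bin_names, bin_indexes, bin_num, abnormal_list)
        # Keep local copies rather than relying on base-class attribute names.
        self._names = list(bin_names)
        self._num = bin_num

    def fit_split_points(self, data: pd.DataFrame) -> pd.DataFrame:
        # For each selected column, cut the [min, max] range into bin_num
        # equally wide intervals and return the interior boundaries.
        points = {}
        for col in self._names:
            lo, hi = float(data[col].min()), float(data[col].max())
            step = (hi - lo) / self._num
            points[col] = [lo + step * i for i in range(1, self._num)]
        return pd.DataFrame(points)
```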
secretflow.preprocessing.binning.kernels.quantile_binning#
Classes:
- QuantileBinning: Use the QuantileSummary algorithm for equal-frequency binning
- class secretflow.preprocessing.binning.kernels.quantile_binning.QuantileBinning(bin_num: int = 10, compress_thres: int = 10000, head_size: int = 10000, error: float = 0.0001, bin_indexes: List[int] = [], bin_names: List[str] = [], local_only: bool = False, abnormal_list: Optional[List[str]] = None, allow_duplicate: bool = False)[source]#
Bases: BaseBinning
Use the QuantileSummary algorithm for equal-frequency binning.
- bin_num#
the number of buckets
- compress_thres#
if the size of the summary is greater than compress_thres, perform a compress operation
- cols_dict#
mapping of column name to index: {key: col_name, value: index}
- head_size#
buffer size
- error#
0 <= error < 1, default 1e-4. Error tolerance: floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N)
- abnormal_list#
list of abnormal features that will not participate in binning
- summary_dict#
a dict storing the summary of each feature
- col_name_maps#
a dict mapping column index to column name
- bin_idx_name#
a dict mapping bin index to name
- allow_duplicate#
whether duplicate split points are allowed
Methods:
- __init__([bin_num, compress_thres, ...])
- fit_split_points(data_frame): calculate bin split points based on the QuantileSummary algorithm
- feature_summary(data_frame, compress_thres, ...): calculate summary
- __init__(bin_num: int = 10, compress_thres: int = 10000, head_size: int = 10000, error: float = 0.0001, bin_indexes: List[int] = [], bin_names: List[str] = [], local_only: bool = False, abnormal_list: Optional[List[str]] = None, allow_duplicate: bool = False)[source]#
- fit_split_points(data_frame: DataFrame) → DataFrame [source]#
Calculate bin split points based on the QuantileSummary algorithm.
- Parameters:
data_frame – input data
- Returns:
bin result returned as a dataframe
- Return type:
bin_result
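A minimal usage sketch for fit_split_points, assuming a plain pandas.DataFrame input; the column names and data below are illustrative only:

```python
import pandas as pd

from secretflow.preprocessing.binning.kernels.quantile_binning import QuantileBinning

# Illustrative data; any numeric columns work.
df = pd.DataFrame({
    "age": [21, 35, 48, 52, 63, 29, 41, 57],
    "income": [3.2, 5.8, 7.1, 9.4, 12.0, 4.5, 6.3, 10.2],
})

# Bin both columns into 4 equal-frequency buckets.
binning = QuantileBinning(bin_num=4, bin_names=["age", "income"])
split_points = binning.fit_split_points(df)
print(split_points)
```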
- static feature_summary(data_frame: DataFrame, compress_thres: int, head_size: int, error: float, bin_dict: Dict[str, int], abnormal_list: List[str]) → Dict [source]#
Calculate the summary of each feature.
- Parameters:
data_frame – pandas.DataFrame, input data
compress_thres – int, if the size of the summary is greater than compress_thres, perform a compress operation
head_size – int, buffer size; when the number of samples reaches head_size, the summary is created
error – float, error tolerance
bin_dict – a dict mapping column name to index
abnormal_list – list of abnormal features
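A hedged sketch of calling the static feature_summary helper directly, assuming it accepts a plain pandas.DataFrame and that bin_dict maps each selected column name to its index as described above; the data and the empty abnormal_list are illustrative:

```python
import pandas as pd

from secretflow.preprocessing.binning.kernels.quantile_binning import QuantileBinning

df = pd.DataFrame({
    "age": [21, 35, 48, 52],
    "income": [3.2, 5.8, 7.1, 9.4],
})

# Expected result: a dict with one per-column summary for each selected feature.
summaries = QuantileBinning.feature_summary(
    data_frame=df,
    compress_thres=10000,
    head_size=10000,
    error=1e-4,
    bin_dict={"age": 0, "income": 1},
    abnormal_list=[],
)
```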
secretflow.preprocessing.binning.kernels.quantile_summaries#
Classes:
- Stats: store information for each item in the summary
- QuantileSummaries: the QuantileSummary structure (insert, merge, fast_init, compress)
- class secretflow.preprocessing.binning.kernels.quantile_summaries.Stats(value: float, w: int, delta: int)[source]#
Bases: object
store information for each item in the summary
- value#
value of this stat
- Type:
float
- w#
weight of this stat
- Type:
int
- delta#
delta = rmax - rmin
- Type:
int
Attributes:
Methods:
- __init__(value, w, delta)
- value: float#
- w: int#
- delta: int#
- __init__(value: float, w: int, delta: int) → None#
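Stats behaves like a plain record of one summary entry; a minimal construction example (the values are arbitrary):

```python
from secretflow.preprocessing.binning.kernels.quantile_summaries import Stats

# One entry: value 3.5 observed with weight 2 and rank uncertainty delta = rmax - rmin = 0.
s = Stats(value=3.5, w=2, delta=0)
```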
- class secretflow.preprocessing.binning.kernels.quantile_summaries.QuantileSummaries(compress_thres: int = 10000, head_size: int = 10000, error: float = 0.0001, abnormal_list: Optional[List] = None)[source]#
Bases: object
- QuantileSummary
insert: insert data into the summary
merge: merge summaries
fast_init: a fast implementation that creates the summary with little performance loss
compress: compress the summary to a given size
- compress_thres#
if the number of stats is greater than compress_thres, perform compression
- head_size#
buffer size for inserted data; when the number of samples reaches head_size, the summary is created
- error#
0 <= error < 1, default 1e-4. Error tolerance for binning: floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N)
- abnormal_list#
list of abnormal features that will not participate in binning
Methods:
- __init__([compress_thres, head_size, error, ...])
- fast_init(col_data)
- compress(): compress the summary so that summary.sample stays under compress_thres
- query(quantile): query the value at the specified quantile
- value_to_rank(value)
- batch_query_value(values): batch query function
- __init__(compress_thres: int = 10000, head_size: int = 10000, error: float = 0.0001, abnormal_list: Optional[List] = None)[source]#
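A minimal sketch of building and querying a summary for one column, assuming fast_init accepts the raw column values (e.g. a numpy array or pandas Series); only methods listed above are used, and the data is illustrative:

```python
import numpy as np

from secretflow.preprocessing.binning.kernels.quantile_summaries import QuantileSummaries

# Illustrative column data; in practice this is one feature column.
col_data = np.random.default_rng(0).normal(size=1000)

summary = QuantileSummaries(compress_thres=10000, head_size=10000, error=1e-4)
summary.fast_init(col_data)   # build the summary from the column in one pass
summary.compress()            # keep the number of stored stats under compress_thres

# Query approximate quantile values, e.g. the median and the 90th percentile.
median = summary.query(0.5)
p90 = summary.query(0.9)
```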