secretflow.preprocessing.binning.kernels#
secretflow.preprocessing.binning.kernels.base_binning#
Classes:
- class secretflow.preprocessing.binning.kernels.base_binning.BaseBinning(bin_names: List, bin_indexes: List, bin_num: int, abnormal_list: List)[source]#
Bases: ABC

Methods:

__init__(bin_names, bin_indexes, bin_num, ...)

fit_split_points(data)

Attributes:
- property split_points#
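BaseBinning defines the interface that concrete binning kernels implement: the constructor records which columns to bin, subclasses provide fit_split_points, and the learned boundaries are exposed through the split_points property. The sketch below mirrors that interface with a toy equal-width kernel; it is illustrative only, does not import secretflow, and every name beyond the documented constructor arguments, fit_split_points, and split_points (for example the _split_points backing attribute) is an assumption.

```python
# Illustrative sketch only: mirrors the documented BaseBinning interface with a
# toy equal-width kernel; not secretflow's actual implementation.
from abc import ABC, abstractmethod
from typing import Dict, List

import pandas as pd


class SketchBaseBinning(ABC):
    def __init__(self, bin_names: List, bin_indexes: List, bin_num: int,
                 abnormal_list: List):
        self.bin_names = bin_names
        self.bin_indexes = bin_indexes
        self.bin_num = bin_num
        self.abnormal_list = abnormal_list
        self._split_points: Dict[str, List[float]] = {}

    @abstractmethod
    def fit_split_points(self, data: pd.DataFrame) -> pd.DataFrame:
        """Compute split points for the selected columns of `data`."""

    @property
    def split_points(self) -> Dict[str, List[float]]:
        return self._split_points


class EqualWidthBinning(SketchBaseBinning):
    """Toy kernel: equal-width interior boundaries for each selected column."""

    def fit_split_points(self, data: pd.DataFrame) -> pd.DataFrame:
        for name in self.bin_names:
            lo, hi = float(data[name].min()), float(data[name].max())
            step = (hi - lo) / self.bin_num
            # Interior boundaries only: bin_num buckets need bin_num - 1 split points.
            self._split_points[name] = [lo + step * i for i in range(1, self.bin_num)]
        return pd.DataFrame(self._split_points)
```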
secretflow.preprocessing.binning.kernels.quantile_binning#
Classes:

QuantileBinning([bin_num, compress_thres, ...]): use the QuantileSummary algorithm for equal-frequency binning
- class secretflow.preprocessing.binning.kernels.quantile_binning.QuantileBinning(bin_num: int = 10, compress_thres: int = 10000, head_size: int = 10000, error: float = 0.0001, bin_indexes: List[int] = [], bin_names: List[str] = [], local_only: bool = False, abnormal_list: Optional[List[str]] = None, allow_duplicate: bool = False)[source]#
Bases: BaseBinning

Use the QuantileSummary algorithm for equal-frequency binning.
- bin_num#
the number of buckets
- compress_thres#
if the size of the summary is greater than compress_thres, a compress operation is performed
- cols_dict#
mapping of column name to index: {key: col_name, value: index}
- head_size#
buffer size
- error#
error tolerance, 0 <= error < 1, default: 1e-4. The split point x for a target quantile p satisfies floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N); see the worked example after this list.
- abnormal_list#
list of abnormal features.
- summary_dict#
a dict storing the summary of each feature
- col_name_maps#
a dict mapping column index to column name
- bin_idx_name#
a dict mapping bin index to bin name
- allow_duplicate#
whether duplicate split points are allowed
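As a worked example of the error guarantee above: with N = 10000 samples, error = 0.001 and target quantile p = 0.5, the returned split point x only has to satisfy floor((0.5 - 0.002) * 10000) = 4980 <= rank(x) <= ceil((0.5 + 0.002) * 10000) = 5020, i.e. it may sit up to roughly 2 * error * N = 20 ranks away from the exact median.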
Methods:
__init__([bin_num, compress_thres, ...])

fit_split_points(data_frame): calculate bin split points based on the QuantileSummary algorithm

feature_summary(data_frame, compress_thres, ...): calculate the quantile summary for each column
- __init__(bin_num: int = 10, compress_thres: int = 10000, head_size: int = 10000, error: float = 0.0001, bin_indexes: List[int] = [], bin_names: List[str] = [], local_only: bool = False, abnormal_list: Optional[List[str]] = None, allow_duplicate: bool = False)[source]#
- fit_split_points(data_frame: DataFrame) → DataFrame[source]#
calculate bin split points based on the QuantileSummary algorithm
- Parameters:
data_frame – input data
- Returns:
bin result returned as a DataFrame
- Return type:
bin_result
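A minimal usage sketch, assuming data_frame is a plain pandas.DataFrame whose columns match bin_names (in a federated pipeline this kernel is normally driven by secretflow's higher-level binning wrappers rather than called directly, so treat this as a local illustration):

```python
import pandas as pd

from secretflow.preprocessing.binning.kernels.quantile_binning import QuantileBinning

# Toy local data; the column names must match bin_names.
df = pd.DataFrame({
    "age": [23, 45, 31, 62, 18, 40, 55, 29],
    "income": [3.2, 7.5, 5.1, 9.8, 2.4, 6.0, 8.1, 4.3],
})

binning = QuantileBinning(
    bin_num=4,                    # 4 equal-frequency buckets per column
    bin_names=["age", "income"],  # columns to bin
    error=1e-4,                   # rank error tolerance of the quantile summary
)

# Returns the bin split result as a DataFrame (see fit_split_points above).
split_points = binning.fit_split_points(df)
print(split_points)
```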
- static feature_summary(data_frame: DataFrame, compress_thres: int, head_size: int, error: float, bin_dict: Dict[str, int], abnormal_list: List[str]) → Dict[source]#
calculate the quantile summary for each column
- Parameters:
data_frame – pandas.DataFrame, input data
compress_thres – int, compress the summary when its size exceeds this threshold
head_size – int, buffer size; the summary is built once the buffer reaches head_size
error – float, error tolerance
bin_dict – a dict mapping column name to index
abnormal_list – list of abnormal features
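A hedged sketch of calling the static helper directly; the returned dict is assumed to map each column name to its quantile summary (see quantile_summaries below), and the concrete value types should be checked against the implementation:

```python
import pandas as pd

from secretflow.preprocessing.binning.kernels.quantile_binning import QuantileBinning

df = pd.DataFrame({"age": [23, 45, 31, 62], "income": [3.2, 7.5, 5.1, 9.8]})

# bin_dict maps column name to column index; abnormal values are excluded from binning.
summary_dict = QuantileBinning.feature_summary(
    data_frame=df,
    compress_thres=10000,
    head_size=10000,
    error=1e-4,
    bin_dict={"age": 0, "income": 1},
    abnormal_list=[],
)
```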
secretflow.preprocessing.binning.kernels.quantile_summaries#
Classes:

Stats(value, w, delta): store information for each item in the summary

QuantileSummaries([compress_thres, head_size, ...]): QuantileSummary
- class secretflow.preprocessing.binning.kernels.quantile_summaries.Stats(value: float, w: int, delta: int)[source]#
Bases: object

Store information for each item in the summary.
- value#
value of this stat
- Type:
float
- w#
weight of this stat
- Type:
int
- delta#
delta = rmax - rmin
- Type:
int
Attributes:
Methods:
__init__(value, w, delta)

- value: float#
- w: int#
- delta: int#
- __init__(value: float, w: int, delta: int) → None#
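A small illustration of the Stats fields (in practice these records are created internally by QuantileSummaries rather than by callers): a value seen with weight 2 whose rank is known up to a slack of delta = rmax - rmin = 1 would be stored as follows.

```python
from secretflow.preprocessing.binning.kernels.quantile_summaries import Stats

# delta is the rank uncertainty of this entry: delta = rmax - rmin.
stat = Stats(value=3.7, w=2, delta=1)
print(stat.value, stat.w, stat.delta)
```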
- class secretflow.preprocessing.binning.kernels.quantile_summaries.QuantileSummaries(compress_thres: int = 10000, head_size: int = 10000, error: float = 0.0001, abnormal_list: Optional[List] = None)[source]#
Bases: object

QuantileSummary

insert: insert data into the summary
merge: merge summaries
fast_init: a fast initialization that creates the summary with little performance loss
compress: compress the summary to a limited size
- compress_thres#
if the number of stats is greater than compress_thres, a compress operation is performed
- head_size#
buffer size for inserted data; once the buffer reaches head_size the summary is created
- error#
error tolerance for binning, 0 <= error < 1, default: 1e-4. floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N)
- abnormal_list#
list of abnormal features that will not participate in binning
Methods:
__init__([compress_thres, head_size, error, ...])

fast_init(col_data)

compress(): compress the summary so that summary.sample stays under compress_thres

query(quantile): query the value at the specified quantile location

value_to_rank(value)

batch_query_value(values): batch query function
- __init__(compress_thres: int = 10000, head_size: int = 10000, error: float = 0.0001, abnormal_list: Optional[List] = None)[source]#
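A minimal end-to-end sketch of the summary itself, assuming fast_init accepts a 1-D numpy array of column values (the exact accepted input type and the return semantics of batch_query_value are assumptions to verify against the source):

```python
import numpy as np

from secretflow.preprocessing.binning.kernels.quantile_summaries import QuantileSummaries

rng = np.random.default_rng(0)
col_data = rng.normal(loc=0.0, scale=1.0, size=50_000)

summary = QuantileSummaries(compress_thres=10000, head_size=10000, error=1e-4)
summary.fast_init(col_data)  # build the summary from the column in one pass
summary.compress()           # keep the number of stored stats under compress_thres

# Approximate quartiles; each answer's rank is within about 2 * error * N of the target.
q25, q50, q75 = summary.query(0.25), summary.query(0.5), summary.query(0.75)

# Batch query over several values at once (see batch_query_value above).
batch_result = summary.batch_query_value(np.array([-1.0, 0.0, 1.0]))
```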