multi_imbalance.ensemble package¶
multi_imbalance.ensemble.ecoc module¶
class multi_imbalance.ensemble.ecoc.ECOC(binary_classifier='KNN', preprocessing='SOUP', encoding='OVO', n_neighbors=3, weights=None)¶
Bases: sklearn.ensemble._bagging.BaggingClassifier
ECOC (Error Correcting Output Codes) is an ensemble method for multi-class classification problems. Each class is encoded with a unique binary or ternary code (where 0 means that the class is excluded from the training set of that binary classifier). In the learning phase each binary classifier is trained; in the decoding phase the class closest to the test instance in the sense of Hamming distance is chosen.
- Parameters
binary_classifier – binary classifier used by the algorithm. Possible classifiers:
- ’tree’ : Decision Tree Classifier
- ’NB’ : Naive Bayes Classifier
- ’KNN’ : K-Nearest Neighbors
- ’ClassifierMixin’ : an instance of a class that implements ClassifierMixin
preprocessing – method for oversampling between aggregated classes in each dichotomy. Possible methods:
- None : no oversampling applied
- ’globalCS’ : random oversampling; randomly chosen instances of minority classes are duplicated
- ’SMOTE’ : Synthetic Minority Oversampling Technique
- ’SOUP’ : Similarity Oversampling Undersampling Preprocessing
- ’TransformerMixin’ : an instance of a class that implements TransformerMixin
encoding – algorithm for encoding classes. Possible encodings:
- ’dense’ : ceil(10 * log2(num_of_classes)) dichotomies; each entry is -1 or 1 with probability 0.5
- ’sparse’ : ceil(10 * log2(num_of_classes)) dichotomies; each entry is 0 with probability 0.5, and -1 or 1 with probability 0.25 each
- ’OVO’ : ’one vs one’, n(n-1)/2 dichotomies, where n is the number of classes, one for each pair of classes. Each column has one 1 and one -1 for the classes included in the particular pair, and 0s for the remaining classes.
- ’OVA’ : ’one vs all’, the number of dichotomies is equal to the number of classes. Each column has a single 1 for one class and -1 for all remaining classes.
- ’complete’ : 2^(n-1)-1 dichotomies. Reference: T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, 1995.
n_neighbors – number of nearest neighbors in KNN; used only when binary_classifier is ’KNN’
weights – strategy for dichotomy weighting. Possible values:
- None : no weighting applied
- ’acc’ : accuracy-based weights
- ’avg_tpr_min’ : weights based on the average true positive rates of dichotomies
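As a hedged illustration of the ’OVO’ encoding and Hamming-distance decoding described above, here is a minimal NumPy sketch; this is a conceptual reconstruction, not the library's internal implementation:

```python
import numpy as np
from itertools import combinations

def ovo_code_matrix(n_classes):
    # One column (dichotomy) per pair of classes: 1 for the first class
    # of the pair, -1 for the second, 0 for every excluded class.
    pairs = list(combinations(range(n_classes), 2))
    matrix = np.zeros((n_classes, len(pairs)), dtype=int)
    for col, (a, b) in enumerate(pairs):
        matrix[a, col] = 1
        matrix[b, col] = -1
    return matrix

def hamming_decode(outputs, code_matrix):
    # Pick the class whose code row is closest, in Hamming distance,
    # to the vector of binary-classifier outputs.
    distances = (code_matrix != outputs).sum(axis=1)
    return int(np.argmin(distances))

M = ovo_code_matrix(3)   # 3 classes -> 3*(3-1)/2 = 3 dichotomies
print(M)
print(hamming_decode(np.array([1, 1, 0]), M))   # class 0 matches its row exactly
```

With other encodings only the construction of the code matrix changes; the decoding step stays the same.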
fit(X, y, minority_classes=None)¶
- Parameters
X – two dimensional numpy array (number of samples x number of features) with float numbers
y – one dimensional numpy array with labels for rows in X
minority_classes – list of classes considered to be minority classes
- Returns
self: object
predict(X)¶
- Parameters
X – two dimensional numpy array (number of samples x number of features) with float numbers
- Returns
numpy array, shape = [number of samples]. Predicted target values for X.
multi_imbalance.ensemble.mrbbagging module¶
class multi_imbalance.ensemble.mrbbagging.MRBBagging(k, learning_algorithm, undersampling=True, feature_selection=False, random_fs=False, half_features=True, random_state=None)¶
Bases: sklearn.ensemble._bagging.BaggingClassifier
Multi-class Roughly Balanced Bagging (MRBBagging) is a generalization of Roughly Balanced Bagging that adapts it to multiple minority classes.
Reference: M. Lango, J. Stefanowski: Multi-class and feature selection extensions of Roughly Balanced Bagging for imbalanced data. J. Intell Inf Syst (2018) 50: 97
- Parameters
k – number of classifiers (multiplied by 3 when choosing feature selection)
learning_algorithm – classifier to be used
undersampling – (optional) boolean value to determine if undersampling or oversampling should be performed
feature_selection – (optional) boolean value to determine if feature selection should be performed
random_fs – (optional) boolean value to determine if feature selection should be all random (if False, chi^2, F test and random feature selection are performed)
half_features – (optional) boolean value to determine if the number of features to be selected should be 50% (if False, it is set to the square root of the base number of features)
random_state – (optional) the seed of the pseudo random number generator
fit(x, y, **kwargs)¶
Build an MRBBagging ensemble of estimators from the training data.
- Parameters
x – Two dimensional numpy array (number of samples x number of features) with float numbers.
y – One dimensional numpy array with labels for rows in x.
- Returns
self (object)
predict(data)¶
Predict classes for examples in data.
- Parameters
data – Two dimensional numpy array (number of samples x number of features) with float numbers.
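The roughly balanced bootstrap behind MRBBagging can be sketched as follows; this is a simplification of the paper's sampling scheme, not the library code. Per-bag class counts are drawn from a multinomial distribution with equal class probabilities, so each bag is roughly, not exactly, balanced:

```python
import numpy as np

def roughly_balanced_bootstrap(X, y, rng):
    # Draw per-class sample counts from a multinomial with equal class
    # probabilities, then sample with replacement inside each class.
    classes = np.unique(y)
    counts = rng.multinomial(len(y), [1.0 / len(classes)] * len(classes))
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n, replace=True)
        for c, n in zip(classes, counts)
    ])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 80 + [1] * 15 + [2] * 5)   # imbalanced 80/15/5
Xb, yb = roughly_balanced_bootstrap(X, y, rng)
print(np.bincount(yb))   # each class lands close to 100/3, despite the imbalance
```

An ensemble then trains one base classifier per such bootstrap and aggregates their votes.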
multi_imbalance.ensemble.ovo module¶
class multi_imbalance.ensemble.ovo.OVO(binary_classifier='tree', n_neighbors=3, preprocessing='SOUP', preprocessing_between='all')¶
Bases: sklearn.ensemble._bagging.BaggingClassifier
OVO (One vs One) is an ensemble method for multi-class problems. OVO decomposes the problem into m(m-1)/2 binary problems, where m is the number of classes, and each binary classifier distinguishes between two classes. In the learning phase each classifier is trained only on instances from its particular pair of classes; in the prediction phase each classifier decides between those two classes. The results are aggregated and the final output is derived according to the chosen aggregation model.
- Parameters
binary_classifier – binary classifier. Possible classifiers:
- ’tree’ : Decision Tree Classifier
- ’KNN’ : K-Nearest Neighbors
- ’NB’ : Naive Bayes
- ’ClassifierMixin’ : an instance of a class that implements ClassifierMixin
n_neighbors – number of nearest neighbors in KNN, works only if binary_classifier==’KNN’
preprocessing – method for preprocessing pairs of classes in the learning phase of the ensemble. Possible values:
- None : no preprocessing applied
- ’globalCS’ : oversampling with the globalCS algorithm
- ’SMOTE’ : oversampling with the SMOTE algorithm
- ’SOUP’ : oversampling and undersampling with the SOUP algorithm
- ’TransformerMixin’ : an instance of a class that implements TransformerMixin
preprocessing_between – types of classes between which resampling should be applied. Possible values:
- ’all’ : oversampling between each pair of classes
- ’maj-min’ : oversampling only between majority and minority classes
fit(X, y, minority_classes=None)¶
- Parameters
X – two dimensional numpy array (number of samples x number of features) with float numbers
y – one dimensional numpy array with labels for rows in X
minority_classes – list of classes considered to be minority
- Returns
self: object
predict(X)¶
- Parameters
X – two dimensional numpy array (number of samples x number of features) with float numbers
- Returns
numpy array, shape = [number of samples]. Predicted target values for X.
should_perform_oversampling(first_class, second_class)¶
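To make the one-vs-one decomposition concrete, here is a self-contained sketch with a toy nearest-centroid rule standing in for the binary classifier; it illustrates the pairwise-voting scheme only, not the OVO class itself:

```python
import numpy as np
from itertools import combinations

class TinyOVO:
    # Conceptual sketch: one nearest-centroid "binary classifier" per
    # pair of classes, aggregated by majority vote.
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = {c: X[y == c].mean(axis=0) for c in self.classes_}
        self.pairs_ = list(combinations(self.classes_, 2))  # m(m-1)/2 pairs
        return self

    def predict(self, X):
        votes = np.zeros((len(X), len(self.classes_)), dtype=int)
        for a, b in self.pairs_:
            # Each pairwise classifier votes for one of its two classes.
            da = np.linalg.norm(X - self.centroids_[a], axis=1)
            db = np.linalg.norm(X - self.centroids_[b], axis=1)
            winner = np.where(da < db, a, b)
            for i, c in enumerate(self.classes_):
                votes[:, i] += (winner == c)
        return self.classes_[votes.argmax(axis=1)]

X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.], [10., 0.], [10., 1.]])
y = np.array([0, 0, 1, 1, 2, 2])
print(TinyOVO().fit(X, y).predict(np.array([[0., 0.5], [10., 0.5]])))  # [0 2]
```

In the library, the per-pair preprocessing step (globalCS, SMOTE, SOUP) would be applied to each pair's training data before fitting that pair's classifier.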
multi_imbalance.ensemble.soup_bagging module¶
class multi_imbalance.ensemble.soup_bagging.SOUPBagging(classifier=None, maj_int_min=None, n_classifiers=5)¶
Bases: sklearn.ensemble._bagging.BaggingClassifier
A version of bagging that applies SOUP preprocessing to the data of each component classifier.
Reference: Lango, M., and Stefanowski, J. SOUP-Bagging: a new approach for multi-class imbalanced data classification. PP-RAI ’19: Polskie Porozumienie na Rzecz Sztucznej Inteligencji (2019).
- Parameters
classifier – Instance of classifier
maj_int_min – dict {‘maj’: majority class labels, ‘min’: minority class labels}
n_classifiers – number of classifiers
fit(X, y, **kwargs)¶
- Parameters
X – array-like, sparse matrix of shape = [n_samples, n_features]. The training input samples.
y – array-like, shape = [n_samples]. The target values (class labels).
**kwargs – dict (optional)
- Returns
self : object
static fit_classifier(args)¶
predict(X, strategy: str = 'average')¶
Predict classes for X. The predicted class of an input sample is the class with the highest aggregated predicted probability.
- Parameters
X – {array-like, sparse matrix} of shape = [n_samples, n_features]. The input samples.
strategy – aggregation strategy. Possible values:
- ’average’ : picks the class with the highest average probability across classifiers
- ’optimistic’ : always takes the best (maximum) probability across classifiers
- ’pessimistic’ : always takes the worst (minimum) probability across classifiers
- ’mixed’ : uses the optimistic strategy for minority classes and the pessimistic one for the others; requires maj_int_min
- Returns
array of shape = [n_samples]. The predicted classes.
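The aggregation strategies above can be illustrated on a toy probability tensor of shape [n_classifiers, n_samples, n_classes]; this is a hedged reconstruction of the described behaviour, not the library's code:

```python
import numpy as np

# Probabilities from 3 classifiers for a single sample over 3 classes.
probas = np.array([
    [[0.6, 0.3, 0.1]],
    [[0.1, 0.2, 0.7]],
    [[0.5, 0.4, 0.1]],
])

average     = probas.mean(axis=0).argmax(axis=1)  # highest mean probability
optimistic  = probas.max(axis=0).argmax(axis=1)   # highest best-case probability
pessimistic = probas.min(axis=0).argmax(axis=1)   # highest worst-case probability
print(average, optimistic, pessimistic)   # -> [0] [2] [1]
```

The ’mixed’ strategy would additionally need the maj_int_min split to decide, per class, whether the optimistic or the pessimistic aggregation applies.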
predict_proba(X)¶
Predict class probabilities for X.
- Parameters
X – {array-like, sparse matrix} of shape = [n_samples, n_features]. The input samples.
- Returns
array of shape = [n_classifiers, n_samples, n_classes]. The class probabilities of the input samples.
multi_imbalance.ensemble.soup_bagging.fit_clf(args)¶