multi_imbalance.resampling package
Subpackages
Submodules
multi_imbalance.resampling.global_cs module
class multi_imbalance.resampling.global_cs.GlobalCS(shuffle: bool = True)
Bases: imblearn.base.BaseSampler
Global CS is an algorithm that equalizes the number of samples in each class. It duplicates the samples of each class evenly until every class reaches the size of the majority class.
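The even duplication described above can be illustrated with a minimal NumPy sketch, assuming plain array inputs. The function `global_cs_sketch` is hypothetical and is not the library's code. It cycles through each class's examples, so every example is copied the same number of times (give or take one) until the class matches the majority size. The `shuffle` flag mirrors the constructor argument only by analogy.

```python
import numpy as np

def global_cs_sketch(X, y, shuffle=True, seed=0):
    """Hypothetical sketch: duplicate each class evenly up to the majority size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()  # size of the majority class
    selected = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        # np.resize repeats idx cyclically, so each example of the class is
        # copied the same number of times (some get one extra copy)
        selected.append(np.resize(idx, target))
    sel = np.concatenate(selected)
    if shuffle:
        sel = rng.permutation(sel)  # mix the classes in the output
    return X[sel], y[sel]
```

With classes of sizes 6, 2 and 4, every class ends up with 6 examples, and each of the two examples of the smallest class appears three times.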
multi_imbalance.resampling.mdo module
class multi_imbalance.resampling.mdo.MDO(k=5, k1_frac=0.4, seed=0, prop=1, maj_int_min=None)
Bases: imblearn.base.BaseSampler
Mahalanobis Distance Oversampling (MDO) is an algorithm that oversamples all classes to the size of the majority class. Examples to be oversampled are chosen based on their k nearest neighbours, and each new example is placed at a random position that has the same Mahalanobis distance from the class centre as the chosen example.
Parameters
k – Number of neighbours considered during the neighbourhood analysis
k1_frac – Threshold on the ratio of neighbours from the example's own class to all neighbours in its neighbourhood. If an example's ratio is greater than this threshold, the example is not considered noise
seed – Seed for the random number generator
prop – Oversampling ratio; if equal to 1, each class is oversampled to the size of the largest class
maj_int_min – dict {‘maj’: majority class labels, ‘min’: minority class labels}
calculate_same_class_neighbour_quantities(S_minor, S_minor_label)
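The two ideas behind MDO described above, neighbourhood-based seed selection and Mahalanobis-preserving generation, can be sketched in NumPy. This is a simplified sketch under stated assumptions, not the library's procedure. Both `candidate_seeds` and `sample_same_mahalanobis` are hypothetical names. The brute-force Euclidean neighbour search and the strict "greater than `k1_frac`" test are assumptions. The random direction in the whitened space is one possible way to pick a "random place" at the same Mahalanobis distance.

```python
import numpy as np

def candidate_seeds(X, y, cls, k=5, k1_frac=0.4):
    """Indices of class `cls` examples whose share of same-class neighbours exceeds k1_frac."""
    idx = np.flatnonzero(y == cls)
    # Brute-force Euclidean distances from the class's examples to every example
    dists = np.linalg.norm(X[idx, None, :] - X[None, :, :], axis=2)
    dists[np.arange(len(idx)), idx] = np.inf  # an example is not its own neighbour
    nn = np.argsort(dists, axis=1)[:, :k]
    ratio = (y[nn] == cls).mean(axis=1)
    return idx[ratio > k1_frac]  # examples at or below the threshold count as noise

def sample_same_mahalanobis(X_class, x_seed, rng):
    """New point at the same Mahalanobis distance from the class mean as x_seed."""
    mean = X_class.mean(axis=0)
    # Eigendecomposition of the class covariance: cov = V @ diag(w) @ V.T
    w, V = np.linalg.eigh(np.cov(X_class, rowvar=False))
    w = np.clip(w, 1e-12, None)  # guard against a (near-)singular covariance
    # In the whitened space z = diag(w)**-0.5 * V.T @ (x - mean),
    # the Mahalanobis distance equals the Euclidean norm of z
    z_seed = (V.T @ (x_seed - mean)) / np.sqrt(w)
    radius = np.linalg.norm(z_seed)
    direction = rng.normal(size=z_seed.shape)
    direction /= np.linalg.norm(direction)  # uniformly random unit direction
    # Map a point at the same radius back to the original feature space
    return mean + V @ (direction * radius * np.sqrt(w))
```

Working in the whitened eigenbasis turns the Mahalanobis distance into an ordinary Euclidean norm, so keeping the radius fixed is enough to preserve it.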
multi_imbalance.resampling.soup module
class multi_imbalance.resampling.soup.SOUP(k: int = 7, shuffle=False, maj_int_min=None)
Bases: imblearn.base.BaseSampler
Similarity Oversampling and Undersampling Preprocessing (SOUP) is an algorithm that equalizes the number of samples in each class. It also takes the similarity between classes into account: it removes majority-class samples that lie close to samples from other classes, and duplicates minority-class samples that lie in the safest regions of the feature space.
Parameters
k – Number of nearest neighbors considered
shuffle – bool; if True, the output is shuffled
maj_int_min – dict {‘maj’: majority class labels, ‘min’: minority class labels}
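The remove-unsafe / duplicate-safe idea above can be illustrated with a heavily simplified NumPy sketch. It is not the library's implementation. The similarity-based weighting mentioned above is replaced with a plain share of same-class neighbours. The common target size (here the mean class size) is an assumption. The names `safety_levels` and `soup_like_resample` are hypothetical.

```python
import numpy as np

def safety_levels(X, y, k=7):
    """Share of each example's k nearest (Euclidean) neighbours that share its label."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # exclude each example from its own neighbourhood
    nn = np.argsort(d, axis=1)[:, :k]
    return (y[nn] == y[:, None]).mean(axis=1)

def soup_like_resample(X, y, k=7, target=None):
    """Simplified SOUP-style resampling (sketch): bring every class to `target` size."""
    classes, counts = np.unique(y, return_counts=True)
    if target is None:
        target = int(round(counts.mean()))  # assumption: mean class size
    safety = safety_levels(X, y, k)
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        order = idx[np.argsort(-safety[idx], kind="stable")]  # safest first
        if len(idx) >= target:
            # Undersample: keep the safest examples, drop those near other classes
            keep.append(order[:target])
        else:
            # Oversample: duplicate the safest examples in round-robin order
            extra = np.resize(order, target - len(idx))
            keep.append(np.concatenate([idx, extra]))
    sel = np.concatenate(keep)
    return X[sel], y[sel]
```

Sorting by safety before cutting or duplicating is what ties the resampling to the neighbourhood structure. A plain random under/oversampler would ignore it.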
multi_imbalance.resampling.spider module
class multi_imbalance.resampling.spider.SPIDER3(k, maj_int_min=None, cost=None)
Bases: imblearn.base.BaseSampler
SPIDER3 algorithm implementation for selective preprocessing of multi-class imbalanced data sets.
Reference: Wojciechowski, S., Wilk, S., Stefanowski, J.: An Algorithm for Selective Preprocessing of Multi-class Imbalanced Data. Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017
Parameters
k – Number of nearest neighbors considered while resampling.
maj_int_min – Dictionary containing lists of majority, intermediate and minority class labels.
cost – The cost matrix. An element c[i, j] of this matrix represents the cost of misclassifying an example from class i as belonging to class j.
amplify(int_min_class)
clean(int_min_class)
relabel(int_min_class)
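The c[i, j] convention of the `cost` parameter can be made concrete with a short illustration. The cost values below are invented for the example. Labels are assumed to be the integer indices 0..n-1 of the matrix rows and columns. The helper `total_misclassification_cost` is hypothetical and only shows how the matrix is indexed, not how SPIDER3 uses it internally.

```python
import numpy as np

# Hypothetical 3-class cost matrix: rows = true class i, columns = predicted class j.
# c[i, j] is the cost of misclassifying an example of class i as class j; diagonal is 0.
cost = np.array([
    [0.0, 1.0, 1.0],  # class 0 (e.g. majority)
    [2.0, 0.0, 1.0],  # class 1 (e.g. intermediate)
    [5.0, 3.0, 0.0],  # class 2 (e.g. minority)
])

def total_misclassification_cost(y_true, y_pred, cost):
    """Sum of c[true, predicted] over all examples (labels used as matrix indices)."""
    return float(cost[np.asarray(y_true), np.asarray(y_pred)].sum())

y_true = [0, 1, 2, 2]
y_pred = [0, 0, 1, 2]
print(total_misclassification_cost(y_true, y_pred, cost))  # 5.0
```

Making errors on minority classes more expensive (larger values in the bottom rows) is the usual way such a matrix expresses class importance.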
multi_imbalance.resampling.static_smote module
class multi_imbalance.resampling.static_smote.StaticSMOTE
Bases: imblearn.base.BaseSampler
Static SMOTE implementation.
Reference: Fernández-Navarro, F., Hervás-Martínez, C., Gutiérrez, P.A.: A dynamic over-sampling procedure based on sensitivity for multi-class problems. Pattern Recognit. 44, 1821–1833 (2011)
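Static SMOTE builds on SMOTE oversampling. The core SMOTE step can be sketched in NumPy: it interpolates between a randomly chosen example and one of its same-class nearest neighbours. This is a minimal sketch. The function name `smote_samples` and the brute-force Euclidean neighbour search are assumptions. It does not reproduce how Static SMOTE decides which class to oversample and by how much. `k` must be smaller than the number of examples in the class.

```python
import numpy as np

def smote_samples(X_class, n_new, k=5, seed=0):
    """Generate n_new synthetic examples by SMOTE interpolation within one class."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(X_class[:, None, :] - X_class[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # an example is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]  # k nearest same-class neighbours
    base = rng.integers(0, len(X_class), size=n_new)  # examples to start from
    neigh = nn[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    # Each new point lies on the segment between an example and its neighbour
    return X_class[base] + gap * (X_class[neigh] - X_class[base])
```

Because every synthetic point is a convex combination of two existing examples, it always stays inside the bounding box of the class.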