multi_imbalance.resampling package


multi_imbalance.resampling.global_cs module

class multi_imbalance.resampling.global_cs.GlobalCS(shuffle: bool = True)

Bases: imblearn.base.BaseSampler

Global CS is an algorithm that equalizes the number of samples in each class. It duplicates the samples of each class equally until the class reaches the size of the majority class.
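The duplication idea can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: the function name global_cs_sketch and its seed argument are hypothetical, and cycling through a class in order is only one way to copy its samples equally.

```python
from collections import Counter, defaultdict
import random


def global_cs_sketch(X, y, shuffle=True, seed=0):
    """Duplicate samples of every class until each reaches the majority size."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())

    resampled = []
    for label, rows in by_class.items():
        # Cycle through the class so each sample is copied about equally often.
        for i in range(target):
            resampled.append((rows[i % len(rows)], label))

    if shuffle:
        random.Random(seed).shuffle(resampled)
    X_res = [row for row, _ in resampled]
    y_res = [label for _, label in resampled]
    return X_res, y_res


X = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [2.0]]
y = ["a", "a", "a", "a", "b", "b", "c"]
X_res, y_res = global_cs_sketch(X, y)
counts = Counter(y_res)  # every class now has as many samples as class "a"
```

With the library itself, resampling would presumably go through the fit_resample(X, y) method inherited from imblearn.base.BaseSampler.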

multi_imbalance.resampling.mdo module

class multi_imbalance.resampling.mdo.MDO(k=5, k1_frac=0.4, seed=0, prop=1, maj_int_min=None)

Bases: imblearn.base.BaseSampler

Mahalanobis Distance Oversampling (MDO) is an algorithm that oversamples all classes up to the size of the majority class. Samples for oversampling are chosen based on their k nearest neighbours, and each new sample is created at a random location that has the same Mahalanobis distance from the class centre as the chosen sample.
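The "same Mahalanobis distance" property can be shown with a small two-dimensional sketch. It assumes a Cholesky factor of the class covariance is used to place the new point, which is a simplification chosen for this example and not necessarily how the library generates samples. All function names are hypothetical.

```python
import math
import random


def mean_and_cholesky_2d(points):
    """Return the class centre and the Cholesky factor of its 2x2 covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 ** 2)
    return (mx, my), (l11, l21, l22)


def mahalanobis_2d(x, mean, chol):
    """Mahalanobis distance of x from mean, via forward substitution."""
    l11, l21, l22 = chol
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    z1 = dx / l11
    z2 = (dy - l21 * z1) / l22
    return math.hypot(z1, z2)


def sample_same_distance(x, mean, chol, rng):
    """Draw a point in a random direction with the same Mahalanobis distance as x."""
    d = mahalanobis_2d(x, mean, chol)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    u1, u2 = math.cos(theta), math.sin(theta)
    l11, l21, l22 = chol
    # mean + d * L @ u keeps the Mahalanobis distance equal to d for any unit u.
    return (mean[0] + d * l11 * u1, mean[1] + d * (l21 * u1 + l22 * u2))


minority = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (4.0, 5.5), (5.0, 5.0)]
mean, chol = mean_and_cholesky_2d(minority)
chosen = minority[3]
new_point = sample_same_distance(chosen, mean, chol, random.Random(0))
```

The new point lies on the same covariance-shaped ellipse around the class centre as the chosen sample, so synthetic samples follow the spread of the class.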

  • k – Number of neighbours considered during the neighbourhood analysis

  • k1_frac – Threshold on the ratio of neighbours from the example's own class to all neighbours in its neighbourhood. If an example's ratio is greater than this threshold, the example is not considered noise

  • seed – Seed for the random number generator

  • prop – Oversampling ratio; if equal to 1, the size of each class after resampling equals the size of the largest class

  • maj_int_min – dict {‘maj’: majority class labels, ‘min’: minority class labels}

calculate_same_class_neighbour_quantities(S_minor, S_minor_label)
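The neighbourhood analysis controlled by k and k1_frac can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper names are hypothetical, distances are Euclidean, and the threshold is treated as inclusive.

```python
import math


def same_class_neighbour_fraction(X, y, index, k):
    """Fraction of the k nearest neighbours of X[index] that share its label."""
    dists = sorted(
        (math.dist(X[index], X[j]), j) for j in range(len(X)) if j != index
    )
    nearest = [j for _, j in dists[:k]]
    same = sum(1 for j in nearest if y[j] == y[index])
    return same / k


def non_noisy_indices(X, y, label, k=5, k1_frac=0.4):
    """Indices of `label` samples whose same-class ratio reaches k1_frac."""
    return [
        i
        for i in range(len(X))
        # Assumption: the comparison is inclusive (>=).
        if y[i] == label and same_class_neighbour_fraction(X, y, i, k) >= k1_frac
    ]


X = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),  # majority
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),              # minority
    (0.4, 0.6),                                                  # minority, inside the majority cluster
]
y = ["maj"] * 5 + ["min"] * 5
kept = non_noisy_indices(X, y, "min", k=3, k1_frac=0.4)
```

The last minority sample has only majority neighbours, so it is treated as noise and would not be used as a seed for oversampling.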

multi_imbalance.resampling.soup module

class multi_imbalance.resampling.soup.SOUP(k: int = 7, shuffle=False, maj_int_min=None)

Bases: imblearn.base.BaseSampler

Similarity Oversampling and Undersampling Preprocessing (SOUP) is an algorithm that equalizes the number of samples in each class. It also takes the similarity between classes into account: it removes majority class samples that are close to samples from other classes, and it duplicates minority class samples that lie in the safest areas of the feature space.

  • k – number of neighbors

  • shuffle – bool; if True, the output is shuffled

  • maj_int_min – dict {‘maj’: majority class labels, ‘min’: minority class labels}
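A simplified sketch of the combined undersampling and oversampling idea is given below. It makes several assumptions that are not taken from the library: the safe level of a sample is the fraction of its k nearest neighbours with the same label, the target class size is the midpoint between the largest and smallest class, and the names safe_level and soup_sketch are hypothetical.

```python
import math


def safe_level(X, y, i, k):
    """Fraction of the k nearest neighbours of X[i] that share its label."""
    order = sorted(range(len(X)), key=lambda j: math.dist(X[i], X[j]))
    neighbours = [j for j in order if j != i][:k]
    return sum(y[j] == y[i] for j in neighbours) / k


def soup_sketch(X, y, k=3):
    labels = sorted(set(y))
    sizes = {c: y.count(c) for c in labels}
    # Assumption: every class is resized to the midpoint of the extreme sizes.
    target = (max(sizes.values()) + min(sizes.values())) // 2

    keep = []
    for c in labels:
        idx = [i for i in range(len(X)) if y[i] == c]
        idx.sort(key=lambda i: safe_level(X, y, i, k), reverse=True)  # safest first
        if len(idx) >= target:
            keep += idx[:target]  # undersample: drop the least safe samples
        else:
            # Oversample: duplicate samples, starting from the safest ones.
            extra = [idx[n % len(idx)] for n in range(target - len(idx))]
            keep += idx + extra
    return [X[i] for i in keep], [y[i] for i in keep]


X = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (0.5, 0.5), (0.0, 0.5), (0.5, 0.0), (4.8, 5.0),  # last majority point borders the minority
    (5.0, 5.0), (5.2, 5.1),
]
y = ["maj"] * 8 + ["min"] * 2
X_res, y_res = soup_sketch(X, y, k=3)
```

In this toy data the majority sample next to the minority class has the lowest safe level, so it is among the samples removed, while the two minority samples are duplicated until both classes have the same size.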

multi_imbalance.resampling.spider module

class multi_imbalance.resampling.spider.SPIDER3(k, maj_int_min=None, cost=None)

Bases: imblearn.base.BaseSampler

SPIDER3 algorithm implementation for selective preprocessing of multi-class imbalanced data sets.

Reference: Wojciechowski, S., Wilk, S., Stefanowski, J.: An Algorithm for Selective Preprocessing of Multi-class Imbalanced Data. Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017

  • k – Number of nearest neighbors considered while resampling.

  • maj_int_min – Dict that contains lists of majority, intermediate and minority classes labels.

  • cost – The cost matrix. An element c[i, j] of this matrix represents the cost of misclassifying an example from class i as one from class j.
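The meaning of c[i, j] can be illustrated with a small hypothetical cost matrix. The labels and cost values below are invented for the example, and the nested-list layout is an assumption; the container type that SPIDER3 expects is not stated here.

```python
labels = ["maj", "int", "min"]
index = {label: n for n, label in enumerate(labels)}

# Rows are true classes, columns are predicted classes (hypothetical values).
cost = [
    [0, 1, 1],  # true class "maj"
    [2, 0, 1],  # true class "int"
    [5, 3, 0],  # true class "min": errors on the minority class cost the most
]


def total_cost(y_true, y_pred):
    """Sum c[i][j] over all (true class i, predicted class j) pairs."""
    return sum(cost[index[t]][index[p]] for t, p in zip(y_true, y_pred))


y_true = ["min", "min", "maj", "int"]
y_pred = ["maj", "min", "int", "int"]
result = total_cost(y_true, y_pred)  # 5 for min->maj plus 1 for maj->int
```

Correct predictions sit on the diagonal and cost nothing, so only the off-diagonal entries influence the total.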


multi_imbalance.resampling.static_smote module

class multi_imbalance.resampling.static_smote.StaticSMOTE

Bases: imblearn.base.BaseSampler

Static SMOTE implementation:

Reference: Fernández-Navarro, F., Hervás-Martínez, C., Gutiérrez, P.A.: A dynamic over-sampling procedure based on sensitivity for multi-class problems. Pattern Recognit. 44, 1821–1833 (2011)

Module contents