multi_imbalance.utils package

Submodules

multi_imbalance.utils.array_util module

multi_imbalance.utils.array_util.contains(dataset, example)

Checks whether dataset contains the example.

Parameters
  • dataset

  • example

Returns

True or False depending on whether dataset contains the example.

multi_imbalance.utils.array_util.index_of(arr, example)

Returns

Index of the learning example in arr.
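The lookups described above can be sketched with plain NumPy. This is an illustrative sketch, not the package's implementation: the helper names contains_row and index_of_row are hypothetical, and it assumes that each row of a two-dimensional array is one learning example, that rows are compared element-wise, and that a missing example yields None (the reference above does not say what index_of returns in that case).

```python
import numpy as np


def contains_row(dataset, example):
    """Return True if any row of `dataset` equals `example` element-wise."""
    dataset = np.asarray(dataset)
    if dataset.size == 0:
        return False
    # Compare every row with the example, then check for a full-row match.
    return bool(np.any(np.all(dataset == example, axis=1)))


def index_of_row(arr, example):
    """Return the index of the first row equal to `example`, or None (assumed)."""
    arr = np.asarray(arr)
    if arr.size == 0:
        return None
    matches = np.flatnonzero(np.all(arr == example, axis=1))
    return int(matches[0]) if matches.size else None


X = np.array([[1.0, 2.0], [3.0, 4.0], [1.0, 2.0]])
found = contains_row(X, np.array([3.0, 4.0]))
missing = contains_row(X, np.array([9.0, 9.0]))
first = index_of_row(X, np.array([1.0, 2.0]))
```

Here the example [1.0, 2.0] occurs twice, and the sketch reports only the first position.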

multi_imbalance.utils.array_util.intersect(arr1, arr2)

Performs the intersection operation over two numpy arrays (not removing duplicates).

Parameters
  • arr1 – Numpy array number 1.

  • arr2 – Numpy array number 2.

Returns

The intersection of arr1 and arr2.

multi_imbalance.utils.array_util.setdiff(arr1, arr2)

Performs the difference over two numpy arrays.

Parameters
  • arr1 – Numpy array number 1.

  • arr2 – Numpy array number 2.

Returns

Result of the difference of arr1 and arr2.

multi_imbalance.utils.array_util.union(arr1, arr2)

Performs the union over two numpy arrays (without removing duplicates, because this is how the SPIDER3 algorithm works).

Parameters
  • arr1 – Numpy array number 1.

  • arr2 – Numpy array number 2.

Returns

The union of arr1 and arr2.
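To make the "not removing duplicates" behaviour concrete, the three operations can be sketched as follows. This is a sketch under stated assumptions rather than the package's code: the names intersect_rows, setdiff_rows and union_rows are hypothetical, rows are treated as examples, and the exact handling of repeated rows in the library may differ.

```python
import numpy as np


def _has_row(rows, example):
    """True if `example` matches some row of `rows` element-wise."""
    return rows.size > 0 and bool(np.any(np.all(rows == example, axis=1)))


def intersect_rows(arr1, arr2):
    """Rows of arr1 that also occur in arr2; repeats in arr1 are kept."""
    kept = [row for row in arr1 if _has_row(arr2, row)]
    return np.array(kept).reshape(-1, arr1.shape[1])


def setdiff_rows(arr1, arr2):
    """Rows of arr1 that do not occur in arr2."""
    kept = [row for row in arr1 if not _has_row(arr2, row)]
    return np.array(kept).reshape(-1, arr1.shape[1])


def union_rows(arr1, arr2):
    """Plain concatenation, so rows present in both inputs appear repeatedly."""
    return np.concatenate([arr1, arr2], axis=0)


a = np.array([[1, 1], [2, 2], [2, 2]])
b = np.array([[2, 2], [3, 3]])

common = intersect_rows(a, b)   # both copies of [2, 2] survive
only_a = setdiff_rows(a, b)
both = union_rows(a, b)         # five rows, [2, 2] three times
```

Unlike np.intersect1d or np.union1d, nothing here sorts or deduplicates, which is the property the descriptions above emphasise.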

multi_imbalance.utils.data module

multi_imbalance.utils.data.construct_flat_2pc_df(X, y) → pandas.core.frame.DataFrame

Takes a two-dimensional X array and a one-dimensional y array, concatenates them, and returns the result as a data frame.

Parameters
  • X – two dimensional numpy array

  • y – one dimensional numpy array with labels

Returns

Data frame with three columns (x1, x2 and y) and as many rows as X.
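A data frame of this shape can be built as in the sketch below. It is illustrative only: flat_frame is a hypothetical name, and the sketch assumes that X already has exactly two feature columns, which is what the three output columns suggest.

```python
import numpy as np
import pandas as pd


def flat_frame(X, y):
    """Combine a two-column X and a label vector y into one data frame."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2 or X.shape[1] != 2:
        raise ValueError("X must have shape (n_samples, 2)")
    if y.shape != (X.shape[0],):
        raise ValueError("y must have one label per row of X")
    df = pd.DataFrame(X, columns=["x1", "x2"])
    df["y"] = y
    return df


X = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
y = np.array([0, 1, 0])
df = flat_frame(X, y)
```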

multi_imbalance.utils.data.construct_maj_int_min(y: numpy.ndarray, strategy='median') → collections.OrderedDict

Creates a dictionary describing which classes are minority, intermediate or majority.

Parameters
  • y – One dimensional numpy array that contains class labels

  • strategy

    The principle according to which the division into minority and majority classes will be determined:

    • ’median’:

      A class whose size is equal to the median of the class sizes will be considered “intermediate”

    • ’average’:

      The average class size is calculated; all classes smaller than it are considered minority and the rest are considered majority

Returns

Dictionary with keys ‘maj’, ‘int’ and ‘min’. The value for each key is a list of the class labels belonging to that group
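The two strategies can be illustrated with a small standard-library sketch. It is not the package's implementation: split_maj_int_min is a hypothetical name, and for the 'median' strategy it assumes that classes smaller than the median size are minority and classes larger than it are majority, which the description above implies but does not state.

```python
from collections import Counter, OrderedDict
from statistics import mean, median


def split_maj_int_min(y, strategy="median"):
    """Group class labels into 'maj', 'int' and 'min' by class size."""
    sizes = Counter(y)
    if strategy == "median":
        threshold = median(sizes.values())
    elif strategy == "average":
        threshold = mean(sizes.values())
    else:
        raise ValueError("strategy must be 'median' or 'average'")

    groups = OrderedDict([("maj", []), ("int", []), ("min", [])])
    for label, size in sizes.items():
        if size < threshold:
            groups["min"].append(label)
        elif size > threshold or strategy == "average":
            # With 'average', everything that is not smaller is majority.
            groups["maj"].append(label)
        else:
            # With 'median', a class exactly at the median is intermediate.
            groups["int"].append(label)
    return groups


y = ["a"] * 10 + ["b"] * 5 + ["c"] * 2
by_median = split_maj_int_min(y, strategy="median")
by_average = split_maj_int_min(y, strategy="average")
```

With class sizes 10, 5 and 2, the median is 5, so "b" is intermediate under 'median'; the average is about 5.67, so "b" falls into the minority group under 'average'.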

multi_imbalance.utils.data.get_project_root() → pathlib.Path

Returns project root folder.

multi_imbalance.utils.data.load_arff_dataset(path: str, one_hot_encode: bool = True, return_non_cat_length: bool = False)

Loads and returns a dataset stored in an ARFF file

Parameters
  • path (str) – location of dataset file

  • one_hot_encode (bool) – flag, if true encodes categorical variables using OneHotEncoder

  • return_non_cat_length (bool) – flag, if true returns the number of non categorical variables

Returns

  • ndarray X - numpy array in which non-categorical variables are stored in the first columns, followed by the categorical variables

  • ndarray y - one dimensional numpy array with the classification target

  • int non_cat_length - number of non-categorical variables (only if return_non_cat_length=True)

multi_imbalance.utils.data.load_datasets_arff(return_non_cat_length=False, dataset_paths=None)

multi_imbalance.utils.metrics module

multi_imbalance.utils.metrics.gmean_score(y_test, y_pred, correction: float = 0.001) → float

Calculate geometric mean score

Parameters
  • y_test – numpy array with labels

  • y_pred – numpy array with predicted labels

  • correction – value that replaces 0 during multiplication to avoid zeroing the result

Returns

geometric_mean_score: float
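The role of the correction parameter is easiest to see in a small sketch. The version below assumes that the score is the geometric mean of the per-class recalls, with any recall of 0 replaced by correction; the name gmean_sketch is hypothetical and the library's exact computation may differ.

```python
import math


def gmean_sketch(y_true, y_pred, correction=0.001):
    """Geometric mean of per-class recalls, with zero recalls replaced."""
    recalls = []
    for label in sorted(set(y_true)):
        positions = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in positions if y_pred[i] == label)
        recall = hits / len(positions)
        # A single zero recall would otherwise zero the whole product.
        recalls.append(recall if recall > 0 else correction)
    return math.prod(recalls) ** (1.0 / len(recalls))


# Per-class recalls: 0.75, 1.0 and 0.5.
score = gmean_sketch([0, 0, 0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1, 2, 0])

# Class 1 is never predicted, so its recall of 0 becomes `correction`.
floor = gmean_sketch([0, 0, 1, 1], [0, 0, 0, 0])
```

Without the correction the second call would return exactly 0, hiding how well the remaining classes are recognised.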

multi_imbalance.utils.min_int_maj module

multi_imbalance.utils.plot module

multi_imbalance.utils.plot.plot_cardinality_and_2d_data(X, y, dataset_name='') → None

Plots the cardinality of the classes in y as well as a scatter plot of X transformed to two dimensions using PCA

Parameters
  • X (ndarray) – two dimensional numpy array

  • y (ndarray) – one dimensional numpy array

  • dataset_name (str) – title of chart

multi_imbalance.utils.plot.plot_visual_comparision_datasets(X1, y1, X2, y2, dataset_name1='', dataset_name2='') → None

Plots a comparison of (X1, y1) and (X2, y2) using plot_cardinality_and_2d_data, which plots the cardinality of the classes in y as well as a scatter plot of X transformed to two dimensions using PCA

Parameters
  • X1 (ndarray) – two dimensional numpy array with data from dataset1

  • y1 (ndarray) – one dimensional numpy array with target classes from dataset1

  • X2 (ndarray) – two dimensional numpy array with data from dataset2

  • y2 (ndarray) – one dimensional numpy array with target classes from dataset2

  • dataset_name1 (str) – first dataset chart title

  • dataset_name2 (str) – second dataset chart title

Module contents