multi_imbalance.utils package
Submodules
multi_imbalance.utils.array_util module
-
multi_imbalance.utils.array_util.
contains
(dataset, example)
Checks whether dataset contains the example.
- Returns
True if dataset contains the example, False otherwise.
-
multi_imbalance.utils.array_util.
index_of
(arr, example)
- Returns
Index of the learning example in arr.
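The membership and lookup checks described above can be sketched with plain NumPy. This is a minimal illustration, not the package's implementation: the names contains_row and index_of_row are hypothetical, and the sketch assumes the dataset is a two dimensional array whose rows are examples.

```python
import numpy as np


def contains_row(dataset, example):
    """Hypothetical helper: True if any row of `dataset` equals `example`."""
    dataset = np.asarray(dataset)
    if dataset.size == 0:
        return False
    # Compare every row with the example, then require all columns to match.
    return bool(np.any(np.all(dataset == np.asarray(example), axis=1)))


def index_of_row(arr, example):
    """Hypothetical helper: index of the first matching row, or None."""
    arr = np.asarray(arr)
    if arr.size == 0:
        return None
    matches = np.flatnonzero(np.all(arr == np.asarray(example), axis=1))
    return int(matches[0]) if matches.size else None


data = np.array([[1.0, 2.0], [3.0, 4.0], [1.0, 2.0]])
print(contains_row(data, [3.0, 4.0]))
print(index_of_row(data, [1.0, 2.0]))
print(index_of_row(data, [9.0, 9.0]))
```

How the real index_of reports a missing example is not stated in this reference; returning None is a choice made only for the sketch.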
-
multi_imbalance.utils.array_util.
intersect
(arr1, arr2)
Performs the intersection operation over two numpy arrays (not removing duplicates).
- Parameters
arr1 – Numpy array number 1.
arr2 – Numpy array number 2.
- Returns
The intersection of arr1 and arr2.
-
multi_imbalance.utils.array_util.
setdiff
(arr1, arr2)
Performs the difference over two numpy arrays.
- Parameters
arr1 – Numpy array number 1.
arr2 – Numpy array number 2.
- Returns
Result of the difference of arr1 and arr2.
-
multi_imbalance.utils.array_util.
union
(arr1, arr2)
Performs the union over two numpy arrays without removing duplicates, since that is how the SPIDER3 algorithm works.
- Parameters
arr1 – Numpy array number 1.
arr2 – Numpy array number 2.
- Returns
The union of arr1 and arr2.
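To make "union without removing duplicates" concrete, here is a small sketch that treats it as row-wise concatenation. The helper name union_keep_duplicates and the handling of empty inputs are assumptions made for illustration; the package's own union may differ in detail.

```python
import numpy as np


def union_keep_duplicates(arr1, arr2):
    """Hypothetical helper: row-wise union that keeps repeated rows."""
    arr1, arr2 = np.asarray(arr1), np.asarray(arr2)
    # An empty operand contributes nothing, so return the other one unchanged.
    if arr1.size == 0:
        return arr2
    if arr2.size == 0:
        return arr1
    return np.concatenate([arr1, arr2], axis=0)


a = np.array([[1, 2], [3, 4]])
b = np.array([[3, 4], [5, 6]])
result = union_keep_duplicates(a, b)
print(result.shape)  # (4, 2): the shared row [3, 4] appears twice
```

This differs from a set union such as np.unique applied to the stacked rows, which would keep the row [3, 4] only once.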
multi_imbalance.utils.data module
-
multi_imbalance.utils.data.
construct_flat_2pc_df
(X, y) → pandas.core.frame.DataFrame
Concatenates a two dimensional array X and a one dimensional array y and returns them as a data frame.
- Parameters
X – two dimensional numpy array
y – one dimensional numpy array with labels
- Returns
Data frame with three columns, x1, x2 and y, and as many rows as X.
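The shape of the returned frame can be illustrated with a short pandas sketch. The function flat_2pc_df_sketch is a hypothetical stand-in written for this example, and it assumes X has exactly two columns, which the column names x1 and x2 suggest.

```python
import numpy as np
import pandas as pd


def flat_2pc_df_sketch(X, y):
    """Hypothetical stand-in: put two feature columns and the labels in one frame."""
    X, y = np.asarray(X), np.asarray(y)
    if X.ndim != 2 or X.shape[1] != 2:
        raise ValueError("X is expected to have exactly two columns")
    return pd.DataFrame({"x1": X[:, 0], "x2": X[:, 1], "y": y})


X = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
y = np.array([0, 1, 1])
df = flat_2pc_df_sketch(X, y)
print(df)
```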
-
multi_imbalance.utils.data.
construct_maj_int_min
(y: numpy.ndarray, strategy='median') → collections.OrderedDict
Creates a dictionary recording which classes are minority, intermediate or majority.
- Parameters
y – One dimensional numpy array that contains class labels
strategy –
The principle according to which the division into minority and majority classes will be determined:
- 'median':
A class whose size equals the median of the class sizes is considered "intermediate".
- 'average':
The average class size is calculated; all classes smaller than it are considered minority and the rest majority.
- Returns
Dictionary with keys 'maj', 'int' and 'min'. The value for each key is a list of the class labels belonging to that group.
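The two strategies can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the package's code: the name maj_int_min_sketch is hypothetical, and for 'median' the sketch assumes that classes smaller than the median are minority and classes larger than it are majority, which the description above does not spell out.

```python
from collections import OrderedDict

import numpy as np


def maj_int_min_sketch(y, strategy="median"):
    """Hypothetical re-implementation of the split described above."""
    labels, counts = np.unique(np.asarray(y), return_counts=True)
    if strategy == "median":
        threshold = np.median(counts)
    elif strategy == "average":
        threshold = np.mean(counts)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    groups = OrderedDict([("maj", []), ("int", []), ("min", [])])
    for label, count in zip(labels.tolist(), counts.tolist()):
        if count < threshold:
            groups["min"].append(label)
        elif strategy == "median" and count == threshold:
            # Only the median strategy produces an intermediate group.
            groups["int"].append(label)
        else:
            groups["maj"].append(label)
    return groups


y = np.array([0] * 10 + [1] * 5 + [2] * 2)
by_median = maj_int_min_sketch(y, strategy="median")
by_average = maj_int_min_sketch(y, strategy="average")
print(by_median)
print(by_average)
```

With class sizes 10, 5 and 2, the median is 5, so class 1 is intermediate; the average is about 5.67, so classes 1 and 2 both fall below it and count as minority.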
-
multi_imbalance.utils.data.
get_project_root
() → pathlib.Path
Returns project root folder.
-
multi_imbalance.utils.data.
load_arff_dataset
(path: str, one_hot_encode: bool = True, return_non_cat_length: bool = False)
Loads and returns a dataset stored in an ARFF file.
- Parameters
path (str) – location of dataset file
one_hot_encode (bool) – flag, if true encodes categorical variables using OneHotEncoder
return_non_cat_length (bool) – flag, if true returns the number of non categorical variables
- Returns
ndarray X - two dimensional numpy array where non categorical variables are stored in the first columns, followed by categorical variables
ndarray y - one dimensional numpy array with the classification target
int non_cat_length - number of non categorical variables (only if return_non_cat_length=True)
-
multi_imbalance.utils.data.
load_datasets_arff
(return_non_cat_length=False, dataset_paths=None)
multi_imbalance.utils.metrics module
-
multi_imbalance.utils.metrics.
gmean_score
(y_test, y_pred, correction: float = 0.001) → float
Calculates the geometric mean score.
- Parameters
y_test – numpy array with labels
y_pred – numpy array with predicted labels
correction – value that replaces 0 during multiplication to avoid zeroing the result
- Returns
geometric_mean_score: float
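The role of the correction parameter is easiest to see in code. The sketch below assumes the score is the geometric mean of per-class recall, which is the usual definition of G-mean in imbalanced learning but is not stated in this reference; gmean_sketch is a hypothetical name.

```python
import numpy as np


def gmean_sketch(y_test, y_pred, correction=0.001):
    """Hypothetical sketch: geometric mean of per-class recall."""
    y_test, y_pred = np.asarray(y_test), np.asarray(y_pred)
    recalls = []
    for label in np.unique(y_test):
        mask = y_test == label
        recall = float(np.mean(y_pred[mask] == label))
        # A zero recall would zero the whole product, so substitute the correction.
        recalls.append(recall if recall > 0 else correction)
    return float(np.prod(recalls) ** (1.0 / len(recalls)))


partly_right = gmean_sketch([0, 0, 1, 1], [0, 0, 1, 0])
one_class_missed = gmean_sketch([0, 0, 1, 1], [0, 0, 0, 0])
print(partly_right, one_class_missed)
```

Without the correction, the second call would return exactly 0 no matter how well the other classes were predicted; with it, the result stays small but non-zero, so classifiers that miss a class entirely can still be ranked.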
multi_imbalance.utils.min_int_maj module
multi_imbalance.utils.plot module
-
multi_imbalance.utils.plot.
plot_cardinality_and_2d_data
(X, y, dataset_name='') → None
Plots the cardinality of the classes in y together with a scatter plot of X reduced to two dimensions using PCA.
- Parameters
X (ndarray) – two dimensional numpy array
y (ndarray) – one dimensional numpy array
dataset_name (str) – title of chart
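The two quantities this function visualizes can be computed without any plotting, which may help when checking a dataset before drawing it. The sketch below is only an assumption about the underlying steps: it performs PCA through an SVD in NumPy rather than with whichever PCA implementation the package uses, and pca_2d_sketch is a hypothetical name.

```python
import numpy as np


def pca_2d_sketch(X):
    """Hypothetical helper: project X onto its first two principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = np.array([0] * 20 + [1] * 7 + [2] * 3)

X_2d = pca_2d_sketch(X)  # coordinates for the scatter plot
labels, counts = np.unique(y, return_counts=True)  # class cardinalities
cardinality = dict(zip(labels.tolist(), counts.tolist()))
print(X_2d.shape)
print(cardinality)
```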
-
multi_imbalance.utils.plot.
plot_visual_comparision_datasets
(X1, y1, X2, y2, dataset_name1='', dataset_name2='') → None
Plots a comparison of (X1, y1) and (X2, y2) using plot_cardinality_and_2d_data, which plots the cardinality of the classes in y as well as a scatter plot of X reduced to two dimensions using PCA.
- Parameters
X1 (ndarray) – two dimensional numpy array with data from dataset1
y1 (ndarray) – one dimensional numpy array with target classes from dataset1
X2 (ndarray) – two dimensional numpy array with data from dataset2
y2 (ndarray) – one dimensional numpy array with target classes from dataset2
dataset_name1 (str) – first dataset chart title
dataset_name2 (str) – second dataset chart title