PyRBP.metricsPlot
PyRBP integrates many visualization functions for plotting different types of data and for performance analysis. They require several dependencies: matplotlib, sklearn, seaborn, shap and yellowbrick.
- PyRBP.metricsPlot.roc_curve_deeplearning(label_list, pred_proba_list, name_list, image_path='')
- Parameters:
- label_list:list
The list of label arrays corresponding to the sequences used to train each classifier. Label values should be in {-1, 1} or {0, 1}.
- pred_proba_list:list
The list of target score arrays corresponding to the sequences used to train each classifier. Scores can be probability estimates of the positive class, confidence values, or non-thresholded decision measures (as returned by “decision_function” on some classifiers).
- name_list:list
The list of names corresponding to each classifier. These names are shown in the final .png image file.
- image_path:str, default=''
The path used to store the final image file.
- Attributes:
- fpr:numpy array of shape (>2,)
False positive rate.
- tpr:numpy array of shape (>2,)
True positive rate.
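The inputs described above are parallel lists, one entry per model. As a minimal sketch of the underlying computation (it calls scikit-learn directly rather than PyRBP, omits plotting, and uses made-up labels and scores), the false and true positive rates for each model could be obtained like this:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Hypothetical inputs: one label array and one score array per model.
label_list = [np.array([0, 0, 1, 1]), np.array([0, 1, 0, 1])]
pred_proba_list = [
    np.array([0.1, 0.4, 0.35, 0.8]),
    np.array([0.2, 0.9, 0.3, 0.7]),
]
name_list = ["model_a", "model_b"]

roc_results = {}
for labels, scores, name in zip(label_list, pred_proba_list, name_list):
    # roc_curve returns the false positive rate, true positive rate and thresholds.
    fpr, tpr, _ = roc_curve(labels, scores)
    roc_results[name] = (fpr, tpr, auc(fpr, tpr))

for name, (fpr, tpr, area) in roc_results.items():
    print(f"{name}: AUC = {area:.2f}")
```

The three lists must have the same length and the same order, since each name is paired with the labels and scores at the same index.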
- PyRBP.metricsPlot.roc_curve_machinelearning(features, labels, clf_list, image_path='', test_size=0.25, random_state=0)
- Parameters:
- features:numpy array
Two-dimensional real-valued matrix used to fit each classifier.
- labels:numpy array of shape (n_samples,)
True binary labels. The value of labels should be in {-1, 1} or {0, 1}.
- clf_list:list
The list of sklearn classifiers used to analyse the ROC curve.
- image_path:str, default=''
The path used to store the final image file.
- test_size:float or int, default=0.25
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
- random_state:int, RandomState instance or None, default=0
Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
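The test_size and random_state parameters suggest a split-fit-score workflow. The sketch below shows one plausible version of it using scikit-learn only; the synthetic dataset and the choice of classifiers are assumptions for illustration, and PyRBP's internals may differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real feature matrix and label vector.
features, labels = make_classification(n_samples=200, n_features=8, random_state=0)
clf_list = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(max_depth=3, random_state=0),
]

# Hold out 25% of the samples, mirroring test_size=0.25 and random_state=0.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

curves = {}
for clf in clf_list:
    clf.fit(X_train, y_train)
    # Score of the positive class for each held-out sample.
    scores = clf.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    curves[type(clf).__name__] = (fpr, tpr, roc_auc_score(y_test, scores))
```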
- PyRBP.metricsPlot.partial_dependence(features, labels, clf, feature_names, image_path='', subsample=50, n_jobs=3, random_state=0, grid_resolution=20)
- Parameters:
- features:{numpy array or dataframe} of shape (n_samples, n_features)
Data used to generate a grid of values for the target features (where the partial dependence will be evaluated).
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- feature_names:array-like of shape (n_features,)
Name of each feature; feature_names[i] holds the name of the feature with index i.
- clf:sklearn classifier
A fitted estimator object implementing predict, predict_proba, or decision_function. Multioutput-multiclass classifiers are not supported.
- image_path:str, default=''
The path used to store the final image file.
- subsample:float, int or None, default=50
Sampling for ICE curves. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to be used to plot ICE curves. If int, represents the absolute number of samples to use.
- n_jobs:int, default=3
The number of CPUs to use to compute the partial dependences.
- random_state:int, RandomState instance or None, default=0
Controls the randomness of the selected samples when subsample is not None.
- grid_resolution:int, default=20
The number of equally spaced points on the axes of the plots, for each target feature.
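To make grid_resolution and the ICE curves concrete, here is a hedged sketch that calls scikit-learn's partial_dependence directly on a synthetic dataset. Taking the first 50 rows stands in for subsample=50 (PyRBP presumably samples at random instead), and the feature index 0 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import partial_dependence
from sklearn.linear_model import LogisticRegression

features, labels = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(features, labels)

# Evaluate feature 0 on a grid of 20 equally spaced values.
# kind="both" returns the averaged curve and one ICE curve per sample.
pd_result = partial_dependence(
    clf, features[:50], features=[0], grid_resolution=20, kind="both"
)

averaged = pd_result["average"]   # shape (n_outputs, n_grid_points)
ice = pd_result["individual"]     # shape (n_outputs, n_samples, n_grid_points)
print(averaged.shape, ice.shape)
```

The averaged curve is the mean of the ICE curves, which is why a small subsample is usually enough for a readable plot.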
- PyRBP.metricsPlot.confusion_matirx_deeplearning(test_labels, pred_labels, image_path='')
- Parameters:
- test_labels:numpy array of shape (n_samples,)
Ground truth labels corresponding to sequences in dataset.
- pred_labels:numpy array of shape (n_samples,)
Labels predicted by a deep learning model.
- image_path:str, default=''
The path used to store the final image file.
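As a small worked example of what such a plot displays, the sketch below builds the confusion matrix for six made-up samples with scikit-learn. Rows are true classes and columns are predicted classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions for six sequences.
test_labels = np.array([0, 1, 1, 0, 1, 1])
pred_labels = np.array([0, 1, 0, 0, 1, 1])

matrix = confusion_matrix(test_labels, pred_labels)
# For binary labels, ravel() yields the counts in this order.
tn, fp, fn, tp = matrix.ravel()
print(matrix)
```

Here both negatives are classified correctly, one positive is missed, and three positives are recovered.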
- PyRBP.metricsPlot.confusion_matrix_machinelearning(clf, features, labels, label_tags=None, test_size=0.25, normalize=None, random_state=0, image_path='')
- Parameters:
- clf:sklearn classifier
A sklearn classifier instance.
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
Labels to index the matrix.
- label_tags:list of names for different classes, default=None
Target names used for plotting. By default, labels will be used.
- test_size:float or int, default=0.25
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
- normalize:{'true', 'pred', 'all'}, default=None
Normalizes confusion matrix over the true (rows), predicted (columns) conditions or all the population. If None, confusion matrix will not be normalized.
- random_state:int, RandomState instance or None, default=0
Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
- image_path:str, default=''
The path used to store the final image file.
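The three normalize options differ only in what each cell is divided by. The following sketch, using scikit-learn's confusion_matrix on made-up labels, shows the effect of each setting.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 1, 0])

raw = confusion_matrix(y_true, y_pred)                        # plain counts
by_true = confusion_matrix(y_true, y_pred, normalize="true")  # each row sums to 1
by_pred = confusion_matrix(y_true, y_pred, normalize="pred")  # each column sums to 1
by_all = confusion_matrix(y_true, y_pred, normalize="all")    # all cells sum to 1
```

With normalize='true' the diagonal holds the per-class recall, while normalize='pred' puts the per-class precision on the diagonal.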
- PyRBP.metricsPlot.det_curve_machinelearning(features, labels, clf_list, image_path='', test_size=0.25, random_state=0)
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- clf_list:list
List of classifiers used to draw the DET curve.
- image_path:str, default=''
The path used to store the final image file.
- test_size:float or int, default=0.25
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
- random_state:int, RandomState instance or None, default=0
Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
- PyRBP.metricsPlot.det_curve_deeplearning(label_list, pred_proba_list, name_list, image_path='')
- Parameters:
- label_list:list
The list of label arrays corresponding to the sequences used to train each classifier. Label values should be in {-1, 1} or {0, 1}.
- pred_proba_list:list
The list of target score arrays corresponding to the sequences used to train each classifier. Scores can be probability estimates of the positive class, confidence values, or non-thresholded decision measures (as returned by “decision_function” on some classifiers).
- name_list:list
The list of names corresponding to each classifier. These names are shown in the final .png image file.
- image_path:str, default=''
The path used to store the final image file.
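A DET curve plots the false negative rate against the false positive rate as the decision threshold varies. As a rough sketch of the quantities involved (scikit-learn only, plotting omitted, inputs invented for illustration):

```python
import numpy as np
from sklearn.metrics import det_curve

# Hypothetical inputs: one label array and one score array per model.
label_list = [np.array([0, 0, 1, 1]), np.array([0, 1, 0, 1, 1, 0])]
pred_proba_list = [
    np.array([0.1, 0.4, 0.35, 0.8]),
    np.array([0.3, 0.6, 0.55, 0.9, 0.4, 0.2]),
]
name_list = ["model_a", "model_b"]

det_results = {}
for labels, scores, name in zip(label_list, pred_proba_list, name_list):
    # det_curve returns false positive rates, false negative rates and thresholds.
    fpr, fnr, thresholds = det_curve(labels, scores)
    det_results[name] = (fpr, fnr)
```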
- PyRBP.metricsPlot.precision_recall_curve_machinelearning(features, labels, clf_list, image_path='', test_size=0.25, random_state=0)
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- clf_list:list
List of classifiers used to draw the precision-recall curve.
- image_path:str, default=''
The path used to store the final image file.
- test_size:float or int, default=0.25
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
- random_state:int, RandomState instance or None, default=0
Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
- PyRBP.metricsPlot.precision_recall_curve_deeplearning(label_list, pred_labels_list, name_list, image_path='')
- Parameters:
- label_list:list
The list of label arrays corresponding to the sequences used to train each classifier. Label values should be in {-1, 1} or {0, 1}.
- pred_proba_list:list
The list of target score arrays corresponding to the sequences used to train each classifier. Scores can be probability estimates of the positive class, confidence values, or non-thresholded decision measures (as returned by “decision_function” on some classifiers).
- name_list:list
The list of names corresponding to each classifier. These names are shown in the final .png image file.
- image_path:str, default=''
The path used to store the final image file.
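For reference, the precision-recall pairs behind such a curve can be computed with scikit-learn as sketched below. The labels and scores are invented, and average precision is included only as a convenient single-number summary.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# precision and recall each have one more element than thresholds:
# the final pair (precision=1, recall=0) has no corresponding threshold.
precision, recall, thresholds = precision_recall_curve(labels, scores)
ap = average_precision_score(labels, scores)
print(f"average precision = {ap:.3f}")
```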
- PyRBP.metricsPlot.shap_bar(features, labels, clf, sample_size=(0, 100), feature_size=(0, 10), image_path='')
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- clf:sklearn classifier
A fitted estimator object implementing predict, predict_proba, or decision_function. Multioutput-multiclass classifiers are not supported.
- sample_size:tuple, default=(0, 100)
Defines the number of samples used to perform the shap value calculation.
- feature_size:tuple, default=(0, 10)
Defines the features for calculating shap values.
- image_path:str, default=''
The path used to store the final image file.
- PyRBP.metricsPlot.shap_scatter(features, labels, clf, feature_id, sample_size=(0, 100), feature_size=(0, 10), image_path='')
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- clf:sklearn classifier
A fitted estimator object implementing predict, predict_proba, or decision_function. Multioutput-multiclass classifiers are not supported.
- feature_id:int
The id of the feature to visualize. It should be at most feature_size[1] - feature_size[0] - 1, that is, one less than the number of features selected by feature_size.
- sample_size:tuple, default=(0, 100)
Defines the number of samples used to perform the shap value calculation.
- feature_size:tuple, default=(0, 10)
Defines the features for calculating shap values.
- image_path:str, default=''
The path used to store the final image file.
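The relationship between sample_size, feature_size and feature_id is easier to see with array slicing. The sketch below assumes, without confirmation from the PyRBP source, that each tuple is a half-open (start, end) range applied to the rows and columns of features.

```python
import numpy as np

# Toy feature matrix: 200 samples, 12 features, with distinct values per cell.
features = np.arange(200 * 12, dtype=float).reshape(200, 12)
sample_size = (0, 100)
feature_size = (0, 10)

# Assumed interpretation: select rows 0..99 and columns 0..9.
subset = features[sample_size[0]:sample_size[1], feature_size[0]:feature_size[1]]

# feature_id indexes into the selected columns, so its maximum is one less
# than the number of selected features.
max_feature_id = (feature_size[1] - feature_size[0]) - 1

feature_id = 3
assert 0 <= feature_id <= max_feature_id
column = subset[:, feature_id]
```

Under this reading, feature_id counts from the start of the selected range rather than from the first column of the full matrix.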
- PyRBP.metricsPlot.shap_waterfall(features, labels, clf, feature_id, sample_size=(0, 100), feature_size=(0, 10), image_path='')
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- clf:sklearn classifier
A fitted estimator object implementing predict, predict_proba, or decision_function. Multioutput-multiclass classifiers are not supported.
- sample_size:tuple, default=(0, 100)
Defines the number of samples used to perform the shap value calculation.
- feature_size:tuple, default=(0, 10)
Defines the features for calculating shap values.
- image_path:str, default=''
The path used to store the final image file.
- PyRBP.metricsPlot.shap_interaction_scatter(features, labels, clf, sample_size=(0, 100), feature_size=(0, 10), image_path='')
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- clf:sklearn classifier
A fitted estimator object implementing predict, predict_proba, or decision_function. Multioutput-multiclass classifiers are not supported.
- sample_size:tuple, default=(0, 100)
Defines the number of samples used to perform the shap value calculation.
- feature_size:tuple, default=(0, 10)
Defines the features for calculating shap values.
- image_path:str, default=''
The path used to store the final image file.
- PyRBP.metricsPlot.shap_beeswarm(features, labels, clf, sample_size=(0, 100), feature_size=(0, 10), image_path='')
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- clf:sklearn classifier
A fitted estimator object implementing predict, predict_proba, or decision_function. Multioutput-multiclass classifiers are not supported.
- sample_size:tuple, default=(0, 100)
Defines the number of samples used to perform the shap value calculation.
- feature_size:tuple, default=(0, 10)
Defines the features for calculating shap values.
- image_path:str, default=''
The path used to store the final image file.
- PyRBP.metricsPlot.shap_heatmap(features, labels, clf, sample_size=(0, 100), feature_size=(0, 10), image_path='')
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- clf:sklearn classifier
A fitted estimator object implementing predict, predict_proba, or decision_function. Multioutput-multiclass classifiers are not supported.
- sample_size:tuple, default=(0, 100)
Defines the number of samples used to perform the shap value calculation.
- feature_size:tuple, default=(0, 10)
Defines the features for calculating shap values.
- image_path:str, default=''
The path used to store the final image file.
- PyRBP.metricsPlot.violinplot(features, x_id, y_id, image_path='')
- Parameters:
- features:dataframe of shape (n_samples, n_features)
Input features corresponding to the sequences.
- x_id:str
Name of a variable in features, or vector data.
- y_id:str
Name of a variable in features, or vector data.
- image_path:str, default=''
The path used to store the final image file.
- PyRBP.metricsPlot.boxplot(features, x_id, y_id, image_path='')
- Parameters:
- features:dataframe of shape (n_samples, n_features)
Input features corresponding to the sequences.
- x_id:str
Name of a variable in features, or vector data.
- y_id:str
Name of a variable in features, or vector data.
- image_path:str, default=''
The path used to store the final image file.
- PyRBP.metricsPlot.pointplot(features, x_id, y_id, image_path='')
- Parameters:
- features:dataframe of shape (n_samples, n_features)
Input features corresponding to the sequences.
- x_id:str
Name of a variable in features, or vector data.
- y_id:str
Name of a variable in features, or vector data.
- image_path:str, default=''
The path used to store the final image file.
- PyRBP.metricsPlot.barplot(features, x_id, y_id, image_path='')
- Parameters:
- features:dataframe of shape (n_samples, n_features)
Input features corresponding to the sequences.
- x_id:str
Name of a variable in features, or vector data.
- y_id:str
Name of a variable in features, or vector data.
- image_path:str, default=''
The path used to store the final image file.
- PyRBP.metricsPlot.sns_heatmap(features, sample_size=(0, 15), feature_size=(0, 15), image_path='')
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- sample_size:tuple, default=(0, 15)
The sample range used to plot the heatmap.
- feature_size:tuple, default=(0, 15)
The feature range used to plot the heatmap.
- image_path:str, default=''
The path used to store the final image file.
- PyRBP.metricsPlot.prediction_error(features, labels, classes, clf, test_size=0.25, random_state=0, image_path='')
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- classes:list of str
The class labels to use for the legend. Specifying classes in this manner is used to change the class names to a more specific format or to label encoded integer classes.
- test_size:float or int, default=0.25
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
- random_state:int, RandomState instance or None, default=0
Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
- clf: classifier
A scikit-learn estimator that should be a classifier. If the model is not a classifier, an exception is raised.
- image_path:str, default=''
The path used to store the final image file.
- PyRBP.metricsPlot.descrimination_threshold(features, labels, clf, image_path='')
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- clf: classifier
A scikit-learn estimator that should be a classifier. If the model is not a classifier, an exception is raised.
- image_path:str, default=''
The path used to store the final image file.
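A discrimination-threshold plot typically shows how precision, recall, F1 and the queue rate (the fraction of samples flagged as positive) change as the decision threshold moves. The following NumPy sketch illustrates that idea on invented scores; threshold_metrics is a hypothetical helper, not part of PyRBP, and the choice of these four metrics is an assumption about what the function draws.

```python
import numpy as np

labels = np.array([0, 0, 1, 1, 1, 0, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.55])

def threshold_metrics(labels, scores, threshold):
    """Return (precision, recall, f1, queue_rate) at one decision threshold."""
    predicted = scores >= threshold
    tp = np.sum(predicted & (labels == 1))
    fp = np.sum(predicted & (labels == 0))
    fn = np.sum(~predicted & (labels == 1))
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    queue_rate = predicted.mean()
    return precision, recall, f1, queue_rate

# Sweep the threshold and pick the value with the highest F1.
thresholds = np.linspace(0.0, 1.0, 11)
table = [threshold_metrics(labels, scores, t) for t in thresholds]
best_threshold = thresholds[int(np.argmax([row[2] for row in table]))]
```

Raising the threshold generally trades recall for precision, and the queue rate shows how many samples would need follow-up at each setting.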
- PyRBP.metricsPlot.learning_curve(features, labels, clf, folds=5, image_path='')
- Parameters:
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- folds:int, default=5
Number of cross-validation folds. The training set is divided into 5 (or another number of) subsets; one subset serves as the validation set and the other folds - 1 subsets form the training set. Each subset is used once as the validation set.
- clf: classifier
A scikit-learn estimator that should be a classifier. If the model is not a classifier, an exception is raised.
- image_path:str, default=''
The path used to store the final image file.
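A learning curve reports training and validation scores for increasing training-set sizes, each evaluated with cross-validation. The sketch below uses scikit-learn's learning_curve on synthetic data; the five training-set fractions are an arbitrary choice for illustration and may not match what PyRBP uses.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

features, labels = make_classification(n_samples=200, n_features=8, random_state=0)
clf = LogisticRegression(max_iter=1000)

# cv=5 mirrors folds=5: each training-set size is evaluated on 5 splits.
train_sizes, train_scores, valid_scores = learning_curve(
    clf, features, labels, cv=5, train_sizes=np.linspace(0.2, 1.0, 5)
)

# One row per training-set size, one column per fold.
mean_train = train_scores.mean(axis=1)
mean_valid = valid_scores.mean(axis=1)
```

With 200 samples and 5 folds, the largest training set contains 160 samples, since one fold is always held out for validation.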
- PyRBP.metricsPlot.cross_validation_score(clf, features, labels, folds=5, scoring=None, image_path='')
- Parameters:
- folds:int, default=5
Number of cross-validation folds. The training set is divided into 5 (or another number of) subsets; one subset serves as the validation set and the other folds - 1 subsets form the training set. Each subset is used once as the validation set.
- scoring:string, callable or None, optional, default: None
A string or scorer callable object / function with signature scorer(estimator, features, labels).
- clf: classifier
A scikit-learn estimator that should be a classifier. If the model is not a classifier, an exception is raised.
- features:numpy array of shape (n_samples, n_features)
Input features corresponding to the sequences.
- labels:numpy array of shape (n_samples,)
True binary labels used to fit classifier. The value of labels should be in {-1, 1} or {0, 1}.
- image_path:str, default=''
The path used to store the final image file.
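The scoring parameter accepts either a metric name or a callable with the signature given above. As a hedged illustration using scikit-learn's cross_val_score (synthetic data; accuracy_scorer is a hypothetical example function), the two forms can be used interchangeably:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

features, labels = make_classification(n_samples=200, n_features=8, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Form 1: scoring given as a metric name.
named_scores = cross_val_score(clf, features, labels, cv=5, scoring="accuracy")

# Form 2: scoring given as a callable scorer(estimator, features, labels).
def accuracy_scorer(estimator, X, y):
    return float(np.mean(estimator.predict(X) == y))

custom_scores = cross_val_score(clf, features, labels, cv=5, scoring=accuracy_scorer)
print(named_scores.mean(), custom_scores.mean())
```

Each call returns one score per fold, so folds=5 yields five values whose mean and spread can then be plotted.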