flotilla.visualize.predict module

Visualize the result of a classification or regression algorithm on the data.

class flotilla.visualize.predict.ClassifierViz(data_name, trait_name, predictor_name=None, *args, **kwargs)

Bases: flotilla.compute.predict.Classifier, flotilla.visualize.predict.PredictorBaseViz

Visualize results from classification.

check_a_feature(feature_name, **violinplot_kwargs)

Make violin plots of a gene/probe’s values in each of the sets defined in sets.

feature_name - gene/probe id. Must be in the index of self._parent.X

sets - list of sample ids

violinplot_kwargs - extra parameters for violinplot

Returns a list of lists with the values of feature_name in each set of sets.
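The return value described above can be illustrated with a minimal, dependency-free sketch. It is not flotilla's implementation: the nested-dict table `X`, the sample ids, and the helper name `values_per_set` are all hypothetical stand-ins for a samples-by-features table and for the `sets` argument.

```python
# Hypothetical samples-by-features table: sample id -> {feature id -> value}.
X = {
    "s1": {"geneA": 1.0, "geneB": 5.0},
    "s2": {"geneA": 2.0, "geneB": 6.0},
    "s3": {"geneA": 3.0, "geneB": 7.0},
    "s4": {"geneA": 4.0, "geneB": 8.0},
}

# Hypothetical sets: each inner list holds the sample ids of one group.
sets = [["s1", "s2"], ["s3", "s4"]]


def values_per_set(X, feature_name, sets):
    """Return one list of feature values per set of sample ids."""
    return [
        [X[sample_id][feature_name] for sample_id in sample_ids]
        for sample_ids in sets
    ]


# One inner list per set; each could feed one violin in a violin plot.
per_set = values_per_set(X, "geneA", sets)
```

Each inner list lines up with one entry of `sets`, which is the shape a violin plot needs to draw one distribution per group.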

class flotilla.visualize.predict.PredictorBaseViz(predictor_name, data_name, trait_name, X_data=None, trait=None, predictor_obj=None, predictor_scoring_fun=None, score_cutoff_fun=None, n_features_dependent_kwargs=None, constant_kwargs=None, is_categorical_trait=None, predictor_dataset_manager=None, predictor_config_manager=None, feature_renamer=None, groupby=None, color=None, pooled=None, order=None, violinplot_kws=None, data_type=None, label_to_color=None, label_to_marker=None, singles=None, outliers=None)

Bases: flotilla.compute.predict.PredictorBase

A dataset-predictor pair from PredictorDatasetManager

One dataset, one predictor, from dataset manager.

Parameters:

predictor_name : str

Name for predictor

data_name : str

Name for this (subset of the) data

trait_name : str

Name for this trait

X_data : pandas.DataFrame, optional

Samples-by-features (row x col) dataset to train the predictor on

trait : pandas.Series, optional

A variable you want to predict using X_data. Indexed like X_data.

predictor_obj : sklearn predictor, optional

A scikit-learn predictor that implements fit and score on (X_data, trait). Default: ExtraTreesClassifier

predictor_scoring_fun : function, optional

Function to get the feature scores for a scikit-learn classifier. This can differ between classifiers, e.g. for a classifier named “x” it could be x.scores_, while for others it’s x.feature_importances_. Default: lambda x: x.feature_importances_

score_cutoff_fun : function, optional

Function to cut off insignificant scores. Default: lambda scores: np.mean(scores) + 2 * np.std(scores)

n_features_dependent_kwargs : dict, optional

kwargs to the predictor that depend on n_features. Default: {}

constant_kwargs : dict, optional

kwargs to the predictor that are constant, i.e.: {'n_estimators': 100, 'bootstrap': True, 'max_features': 'auto', 'random_state': 0, 'oob_score': True, 'n_jobs': 2, 'verbose': True}
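As a rough illustration of how predictor_scoring_fun and score_cutoff_fun might work together, the sketch below pulls per-feature scores off a fitted predictor and keeps the features that score above mean + 2 standard deviations. It is a standard-library approximation, not flotilla's code: the `fitted` object, the feature names, and the strict `>` comparison are assumptions, and `statistics.pstdev` stands in for `np.std` (both compute the population standard deviation by default).

```python
from statistics import mean, pstdev
from types import SimpleNamespace


def predictor_scoring_fun(predictor):
    """Get per-feature scores from a fitted predictor (documented default)."""
    return predictor.feature_importances_


def score_cutoff_fun(scores):
    """Mean plus two population standard deviations of the scores."""
    return mean(scores) + 2 * pstdev(scores)


# Hypothetical stand-in for a fitted scikit-learn predictor.
fitted = SimpleNamespace(
    feature_importances_=[0.01, 0.02, 0.01, 0.03, 0.02, 0.01, 0.02, 0.88]
)
# Hypothetical feature names, one per score.
feature_names = ["gene_a", "gene_b", "gene_c", "gene_d",
                 "gene_e", "gene_f", "gene_g", "gene_h"]

scores = predictor_scoring_fun(fitted)
cutoff = score_cutoff_fun(scores)

# Assumption: a feature counts as important when it scores strictly above the cutoff.
important = [name for name, score in zip(feature_names, scores) if score > cutoff]
```

Passing a different predictor_scoring_fun (for example one that reads `scores_`) would let the same cutoff logic apply to predictors that expose their feature scores under another attribute.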

do_pca(**plotting_kwargs)

plot(**pca_plotting_kwargs)

plot_scores(ax=None)

Plot the kernel density of the predictor scores and draw a vertical line where the cutoff was selected.

ax - ax to plot on. If None: plt.gca()

set_reducer_plotting_args(rpa)

class flotilla.visualize.predict.RegressorViz(data_name, trait_name, predictor_name=None, *args, **kwargs)

Bases: flotilla.compute.predict.Regressor, flotilla.visualize.predict.PredictorBaseViz

Olga B. Botvinnik is funded by the NDSEG fellowship and is a NumFOCUS John Hunter Technology Fellow.
Michael T. Lovci was partially funded by a fellowship from Genentech.
Partially funded by NIH grants NS075449 and HG004659 and CIRM grants RB4-06045 and TR3-05676 to Gene Yeo.