flotilla.visualize.decomposition module

class flotilla.visualize.decomposition.DecompositionViz(reduced_space, components_, explained_variance_ratio_, feature_renamer=None, groupby=None, singles=None, pooled=None, outliers=None, featurewise=False, order=None, violinplot_kws=None, data_type='expression', label_to_color=None, label_to_marker=None, scale_by_variance=True, x_pc='pc_1', y_pc='pc_2', n_vectors=20, distance='L1', n_top_pc_features=50, max_char_width=30)[source]

Bases: object

Plots the reduced space from a decomposed dataset. Does not perform any reductions of its own.

Plot the results of a decomposition.

Parameters:

reduced_space : pandas.DataFrame

A (n_samples, n_dimensions) DataFrame of the data after dimensionality reduction.

components_ : pandas.DataFrame

A (n_features, n_dimensions) DataFrame of how much each feature contributes to the components (trailing underscore to be consistent with scikit-learn)

explained_variance_ratio_ : pandas.Series

A (n_dimensions,) Series of how much variance each component explains (trailing underscore to be consistent with scikit-learn)

feature_renamer : function, optional

A function which takes the name of the feature and renames it, e.g. from an ENSEMBL ID to a HUGO known gene symbol. If not provided, the original name is used.

groupby : mapping function | dict, optional

A mapping of the samples to a label, e.g. sample IDs to phenotype, for the violinplots. If None, all samples are treated the same and are colored the same.

singles : pandas.DataFrame, optional

For violinplots only. If provided and ‘plot_violins’ is True, will plot the raw (not reduced) measurement values as violin plots.

pooled : pandas.DataFrame, optional

For violinplots only. If provided, pooled samples are plotted as black dots within their label.

outliers : pandas.DataFrame, optional

For violinplots only. If provided, outlier samples are plotted as a grey shadow within their label.

featurewise : bool, optional

If True, then the “samples” are features, e.g. genes instead of samples, and the “features” are the samples, e.g. the cells instead of the gene ids. Essentially, the transpose of the original matrix. If True, then violins aren’t plotted. (default False)

order : list-like, optional

The order of the labels for the violinplots, e.g. if the data is from a differentiation timecourse, then this would be the labels of the phenotypes, in the differentiation order.

violinplot_kws : dict, optional

Any additional keyword arguments to pass to violinplot

data_type : ‘expression’ | ‘splicing’, optional

For violinplots only. The kind of data that was originally used for the reduction. (default ‘expression’)

label_to_color : dict, optional

A mapping of the label, e.g. the phenotype, to the desired plotting color (default None, auto-assigned with the groupby)

label_to_marker : dict, optional

A mapping of the label, e.g. the phenotype, to the desired plotting symbol (default None, auto-assigned with the groupby)

scale_by_variance : bool, optional

If True, scale the x- and y-axes by their explained_variance_ratio_ (default True)

{x,y}_pc : str, optional

Principal component to plot on the x- and y-axis. (default “pc_1” and “pc_2”)

n_vectors : int, optional

Number of vectors to plot of the principal components. (default 20)

distance : ‘L1’ | ‘L2’, optional

The distance metric used to compute the vector lengths. L1 is “Cityblock”, i.e. the sum of the absolute values of the x and y coordinates, and L2 is the traditional Euclidean distance. (default “L1”)

n_top_pc_features : int, optional

The number of top features from the principal components to plot. (default 50)

max_char_width : int, optional

Maximum character width of a feature name. Useful for very long feature IDs such as MISO IDs.
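The interaction between distance and n_vectors can be sketched in plain Python. This is a minimal illustration, not flotilla's implementation: the helper names vector_magnitude and top_vectors and the toy loadings are hypothetical, and it assumes the plotted vectors are the features with the largest magnitude on the two chosen components.

```python
import math


def vector_magnitude(x, y, distance="L1"):
    """Length of a loading vector (x, y) under the chosen metric."""
    if distance == "L1":
        # Cityblock: sum of the absolute coordinates
        return abs(x) + abs(y)
    if distance == "L2":
        # Euclidean: square root of the sum of squares
        return math.hypot(x, y)
    raise ValueError("distance must be 'L1' or 'L2'")


def top_vectors(loadings, n_vectors=20, distance="L1"):
    """Names of the n_vectors features with the largest magnitude."""
    ranked = sorted(
        loadings,
        key=lambda name: vector_magnitude(*loadings[name], distance=distance),
        reverse=True,
    )
    return ranked[:n_vectors]


# Hypothetical loadings of four features on (pc_1, pc_2)
loadings = {
    "gene_a": (0.6, -0.6),
    "gene_b": (1.0, 0.0),
    "gene_c": (-0.1, 0.2),
    "gene_d": (0.0, -0.9),
}

l1_top = top_vectors(loadings, n_vectors=2, distance="L1")
l2_top = top_vectors(loadings, n_vectors=2, distance="L2")
```

With these toy values the two metrics select different features: L1 favors gene_a, which contributes moderately to both components, while L2 favors features that load strongly on a single component.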

plot(ax=None, title='', plot_violins=False, show_point_labels=False, show_vectors=True, show_vector_labels=True, markersize=10, legend=True, bokeh=False, metadata=None)[source]

plot_explained_variance(title='PCA explained variance')[source]

If the reducer is a form of PCA, then plot the explained variance ratio by the components.
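For orientation, an explained variance ratio is conventionally each component's variance divided by the total variance. The sketch below assumes the per-component variances are already known; the function name and the numbers are hypothetical, and the "pc_1"-style labels only mirror the naming used by x_pc and y_pc.

```python
def explained_variance_ratio(component_variances):
    """Fraction of the total variance carried by each component."""
    total = sum(component_variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    return [variance / total for variance in component_variances]


# Hypothetical variances of three principal components
variances = [6.0, 3.0, 1.0]
ratios = explained_variance_ratio(variances)

# Label the ratios the way the components are named in this class
labels = ["pc_{}".format(i + 1) for i in range(len(ratios))]
by_component = dict(zip(labels, ratios))
```

The ratios sum to 1 only when the variances of all components are included; if some components were discarded, the retained ratios sum to less than 1.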

plot_loadings(pc='pc_1', n_features=50, ax=None)[source]
plot_samples(show_point_labels=True, title='PCA', show_vectors=True, show_vector_labels=True, markersize=10, three_d=False, legend=True, ax=None)[source]

Plot PCA scatterplot

Parameters:

groupby : groupby

How to group the samples by color/label

label_to_color : dict

A mapping of group labels to matplotlib colors, e.g. if you’ve already chosen specific colors to indicate a particular group. Otherwise colors are auto-assigned.

label_to_marker : dict

A mapping of group labels to matplotlib markers

title : str

Title of the plot

show_vectors : bool

Whether or not to draw the vectors indicating the supporting principal components

show_vector_labels : bool

Whether or not to draw the names of the vectors

show_point_labels : bool

Whether or not to label the scatter points

markersize : int

Size of the scatter markers on the plot

text_group : list of str

Group names that you want labeled with text

three_d : bool

If you want the plot in 3D (the axes need to be set up beforehand)

Returns:

For each vector in data:

x, y, marker, distance
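How groupby, label_to_color and label_to_marker might combine can be sketched as follows. This is an illustration under assumptions, not the library's code: assign_styles is a hypothetical helper, the sample and label names are made up, and the fallback colors and markers are simply matplotlib's "C0"-style color codes and standard marker codes taken in order.

```python
from itertools import cycle


def assign_styles(sample_to_label, label_to_color=None, label_to_marker=None):
    """Map each sample to a (color, marker) pair via its group label.

    Labels missing from the supplied mappings get the next default.
    """
    label_to_color = dict(label_to_color or {})
    label_to_marker = dict(label_to_marker or {})
    default_colors = cycle(["C0", "C1", "C2", "C3"])
    default_markers = cycle(["o", "s", "^", "D"])

    styles = {}
    for sample, label in sample_to_label.items():
        # Only consume a default when the label has no explicit choice
        if label not in label_to_color:
            label_to_color[label] = next(default_colors)
        if label not in label_to_marker:
            label_to_marker[label] = next(default_markers)
        styles[sample] = (label_to_color[label], label_to_marker[label])
    return styles


# Hypothetical groupby: sample IDs to phenotype labels
groupby = {"cell_1": "neuron", "cell_2": "neuron", "cell_3": "progenitor"}

# One color chosen explicitly; everything else is auto-assigned
styles = assign_styles(groupby, label_to_color={"neuron": "purple"})
```

Samples that share a label end up with the same color and marker, which is what makes the groups readable in the scatterplot and its legend.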

plot_violins()[source]

Make violinplots of each feature

Must be called after plot_samples because it depends on the existence of the “self.magnitudes” attribute.
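The call-order requirement above can be illustrated with a small stand-in class. VizSketch is hypothetical and does no plotting; it only mimics the stated dependency, in which one method creates a magnitudes attribute that the other one reads.

```python
class VizSketch:
    """Hypothetical stand-in for illustration; not the flotilla class."""

    def plot_samples(self):
        # In this sketch, the first method creates the attribute
        self.magnitudes = {"gene_a": 1.2, "gene_b": 1.0}
        return self

    def plot_violins(self):
        # Fail with a clear message if the attribute is not there yet
        if not hasattr(self, "magnitudes"):
            raise AttributeError("call plot_samples() before plot_violins()")
        # Return features ordered from largest to smallest magnitude
        return sorted(self.magnitudes, key=self.magnitudes.get, reverse=True)


viz = VizSketch()

# Calling the methods in the wrong order raises
try:
    viz.plot_violins()
    called_too_early = False
except AttributeError:
    called_too_early = True

# Calling them in the documented order works
features = viz.plot_samples().plot_violins()
```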

shorten(x)[source]

Olga B. Botvinnik is funded by the NDSEG fellowship and is a NumFOCUS John Hunter Technology Fellow.
Michael T. Lovci was partially funded by a fellowship from Genentech.
Partially funded by NIH grants NS075449 and HG004659 and CIRM grants RB4-06045 and TR3-05676 to Gene Yeo.