flotilla.visualize.decomposition module

class flotilla.visualize.decomposition.DecompositionViz(reduced_space, components_, explained_variance_ratio_, feature_renamer=None, groupby=None, singles=None, pooled=None, outliers=None, featurewise=False, order=None, violinplot_kws=None, data_type='expression', label_to_color=None, label_to_marker=None, scale_by_variance=True, x_pc='pc_1', y_pc='pc_2', n_vectors=20, distance='L2', n_top_pc_features=50, max_char_width=30)

Bases: object

Plots the reduced space from a decomposed dataset. Does not perform any reductions of its own.

Plot the results of a decomposition visualization
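Because the class does no reduction of its own, its three required inputs have to be computed beforehand. The following is a minimal sketch, assuming a plain NumPy SVD as the reducer (flotilla's own reducers are not shown in this section) and assuming the components are named `pc_1`, `pc_2`, ... to match the `x_pc`/`y_pc` defaults. The helper `pca_inputs` is hypothetical; its main purpose is to show the expected shapes, in particular that `components_` is (n_features, n_dimensions), which is the transpose of scikit-learn's (n_components, n_features) layout.

```python
import numpy as np
import pandas as pd


def pca_inputs(data, n_components=2):
    """Compute PCA with a plain SVD and return the three DecompositionViz inputs.

    data: (n_samples, n_features) DataFrame. The 'pc_1', 'pc_2', ... column
    names are an assumption chosen to match the x_pc/y_pc defaults.
    """
    # Center each feature (column) on its mean, as PCA requires
    centered = data - data.mean(axis=0)
    # Economy SVD: centered = U @ diag(S) @ Vt, with S sorted in descending order
    U, S, Vt = np.linalg.svd(centered.values, full_matrices=False)
    pcs = [f"pc_{i + 1}" for i in range(n_components)]

    # (n_samples, n_components): sample coordinates in PC space
    reduced_space = pd.DataFrame(U[:, :n_components] * S[:n_components],
                                 index=data.index, columns=pcs)
    # (n_features, n_components): loadings, i.e. the TRANSPOSE of
    # scikit-learn's (n_components, n_features) components_ array
    components_ = pd.DataFrame(Vt[:n_components].T,
                               index=data.columns, columns=pcs)
    # (n_components,): each component's share of the total variance
    variance = S ** 2
    explained_variance_ratio_ = pd.Series(variance[:n_components] / variance.sum(),
                                          index=pcs)
    return reduced_space, components_, explained_variance_ratio_


rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(10, 6)),
                    index=[f"sample_{i}" for i in range(10)],
                    columns=[f"gene_{j}" for j in range(6)])
reduced_space, components_, explained_variance_ratio_ = pca_inputs(data, n_components=3)
# These three objects would then be passed on, e.g.
# DecompositionViz(reduced_space, components_, explained_variance_ratio_)
```

Any reducer works as long as its outputs are reshaped into these three objects; with scikit-learn, for example, the fitted `components_` array would need to be transposed before wrapping it in a DataFrame.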

Parameters:

reduced_space : pandas.DataFrame

A (n_samples, n_dimensions) DataFrame of the post-dimensionality reduction data

components_ : pandas.DataFrame

A (n_features, n_dimensions) DataFrame of how much each feature contributes to the components (trailing underscore to be consistent with scikit-learn)

explained_variance_ratio_ : pandas.Series

A (n_dimensions,) Series of how much variance each component explains. (trailing underscore to be consistent with scikit-learn)

feature_renamer : function, optional

A function which takes the name of the feature and renames it, e.g. from an ENSEMBL ID to a known HUGO gene symbol. If not provided, the original name is used.

groupby : mapping function | dict, optional

A mapping of the samples to a label, e.g. sample IDs to phenotype, for the violinplots. If None, all samples are treated the same and are colored the same.

singles : pandas.DataFrame, optional

For violinplots only. If provided and ‘plot_violins’ is True, will plot the raw (not reduced) measurement values as violin plots.

pooled : pandas.DataFrame, optional

For violinplots only. If provided, pooled samples are plotted as black dots within their label.

outliers : pandas.DataFrame, optional

For violinplots only. If provided, outlier samples are plotted as a grey shadow within their label.

featurewise : bool, optional

If True, then the “samples” are features, e.g. genes instead of samples, and the “features” are the samples, e.g. the cells instead of the gene ids. Essentially, the transpose of the original matrix. If True, then violins aren’t plotted. (default False)

order : list-like, optional

The order of the labels for the violinplots, e.g. if the data is from a differentiation timecourse, then this would be the labels of the phenotypes, in the differentiation order.

violinplot_kws : dict, optional

Any additional keyword arguments to pass to violinplot.

data_type : ‘expression’ | ‘splicing’, optional

For violinplots only. The kind of data that was originally used for the reduction. (default ‘expression’)

label_to_color : dict, optional

A mapping of the label, e.g. the phenotype, to the desired plotting color (default None, auto-assigned with the groupby)

label_to_marker : dict, optional

A mapping of the label, e.g. the phenotype, to the desired plotting symbol (default None, auto-assigned with the groupby)

scale_by_variance : bool, optional

If True, scale the x- and y-axes by their explained_variance_ratio_ (default True)

{x,y}_pc : str, optional

Principal component to plot on the x- and y-axis. (default “pc_1” and “pc_2”)

n_vectors : int, optional

Number of feature vectors to plot on the principal component axes. (default 20)

distance : ‘L1’ | ‘L2’, optional

The distance metric used to compute the vector lengths. L1 is the “cityblock” (Manhattan) distance, i.e. the sum of the absolute values of the x and y coordinates, and L2 is the traditional Euclidean distance. (default “L2”)

n_top_pc_features : int, optional

The number of top features from the principal components to plot. (default 50)

max_char_width : int, optional

Maximum character width of a feature name. Useful for very long feature IDs, such as MISO IDs.
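The `distance` and `n_vectors` parameters together decide which feature vectors get drawn. Below is a minimal sketch of that ranking, assuming each feature's vector is its pair of loadings on `x_pc` and `y_pc` in `components_`; this is an assumption about the implementation, and `top_loading_vectors` is a hypothetical helper rather than part of flotilla. The usage shows that the two metrics can rank the same features differently.

```python
import numpy as np
import pandas as pd


def top_loading_vectors(components_, x_pc="pc_1", y_pc="pc_2",
                        n_vectors=20, distance="L2"):
    """Return the n_vectors features with the longest (x_pc, y_pc) loading vectors.

    Hypothetical helper: a sketch of the ranking, not flotilla's own code.
    """
    xy = components_[[x_pc, y_pc]]
    if distance == "L1":
        # Cityblock length: |x| + |y|
        magnitudes = xy.abs().sum(axis=1)
    elif distance == "L2":
        # Euclidean length: sqrt(x**2 + y**2)
        magnitudes = np.sqrt((xy ** 2).sum(axis=1))
    else:
        raise ValueError("distance must be 'L1' or 'L2'")
    return magnitudes.sort_values(ascending=False).head(n_vectors)


components_ = pd.DataFrame({"pc_1": [0.6, -0.1, 0.5],
                            "pc_2": [0.0, 0.7, -0.4]},
                           index=["gene_a", "gene_b", "gene_c"])
l1 = top_loading_vectors(components_, n_vectors=2, distance="L1")
l2 = top_loading_vectors(components_, n_vectors=2, distance="L2")
# L1 ranks gene_c (0.9) above gene_b (0.8); L2 ranks gene_b (~0.71) above gene_c (~0.64)
```

L1 favors features that load moderately on both axes, while L2 favors features that load strongly on a single axis.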

plot(ax=None, title='', plot_violins=False, show_point_labels=False, show_vectors=True, show_vector_labels=True, markersize=10, legend=True, bokeh=False, metadata=None, plot_loadings='heatmap', n_components=5)

Plot reduced space

Figures can be saved with:

    dv.plot()
    dv.fig_reduced.savefig('decomposition.pdf')

Parameters:

ax : matplotlib.axes.Axes object, optional

An axes object to plot the reduced space and the components onto

title : str, optional

Title to add to the plot

plot_violins : bool, optional

If True, also make a matplotlib.figure.Figure object with violin plots of the top features

show_point_labels : bool, optional

If True, show the labels of the points plotted onto the canvas. If this was not featurewise (default), then the points are the samples. If this was featurewise, then the points are the features

show_vectors : bool, optional

If True, plot the vectors with the highest magnitude in PC1 and PC2 space

show_vector_labels : bool, optional

If True, label the vectors with their features (if not featurewise), else their samples

markersize : int, optional

Size of the plotting marker of the samples on the plot

legend : bool, optional

If True, plot a legend showing which celltype corresponds to which color

bokeh : bool, optional

If True, attempt to plot this using interactive BokehJS plots which allow for hovering tooltips

metadata : pandas.DataFrame, optional

If provided, all columns of this metadata will be shown when hovering over the samples in the bokeh version of the plot

plot_loadings : ‘heatmap’ | ‘scatter’, optional

Whether to plot the loadings of features as a heatmap or a scatterplot (default ‘heatmap’)

n_components : int, optional

Number of components to plot on the heatmap

Returns:

self : DecompositionViz

plot_explained_variance(title='PCA explained variance')

If the reducer is a form of PCA, then plot the explained variance ratio by the components.
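The plot shows each component's explained variance ratio, and reading it usually involves the running total as well. Here is a small pandas sketch of both quantities, assuming `explained_variance_ratio_` is the Series described for the constructor. `explained_variance_table` and `n_components_for` are hypothetical helpers, and the ratios in the usage are toy values.

```python
import pandas as pd


def explained_variance_table(explained_variance_ratio_):
    """Per-component and cumulative explained variance ratio."""
    return pd.DataFrame({"ratio": explained_variance_ratio_,
                         "cumulative": explained_variance_ratio_.cumsum()})


def n_components_for(explained_variance_ratio_, threshold=0.9):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cumulative = explained_variance_ratio_.cumsum()
    reached = cumulative[cumulative >= threshold]
    if reached.empty:
        raise ValueError("threshold is never reached by the given components")
    # Position of the first component that reaches the threshold, counted from 1
    return explained_variance_ratio_.index.get_loc(reached.index[0]) + 1


evr = pd.Series([0.5, 0.3, 0.15, 0.05], index=["pc_1", "pc_2", "pc_3", "pc_4"])
table = explained_variance_table(evr)
print(n_components_for(evr, threshold=0.75))  # 2
```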

plot_loadings(pc='pc_1', n_features=50, ax=None)
plot_loadings_heatmap(n_features=50, n_components=5)

Plot the loadings of each feature in the top principal components

Creates a heatmap of the top features contributing to the first few principal components, sorted by the features’ contribution to PC1.

Parameters:

n_features : int, optional

Total number of features to plot. Half of these will be the top features contributing to the positive side of PC1, and the other half will be the top features contributing to the negative side of PC1.

n_components : int, optional

Number of components to plot
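The feature selection for the heatmap can be sketched in pandas. This sketch assumes `components_` is the (n_features, n_dimensions) DataFrame from the constructor with PC1 as its first column. It takes `n_features // 2` features from each end of PC1, which is consistent with the heatmap being sorted by PC1. `top_pc1_loadings` is a hypothetical helper, not the method's actual code.

```python
import pandas as pd


def top_pc1_loadings(components_, n_features=50, n_components=5):
    """Select the features with the most positive and most negative PC1 loadings.

    Hypothetical helper: n_features // 2 features from each side of PC1
    (assumed to be the first column), restricted to the first n_components
    columns and sorted by PC1 from most positive to most negative.
    """
    half = n_features // 2
    pc1 = components_.iloc[:, 0]
    positive = pc1.nlargest(half).index
    negative = pc1.nsmallest(half).index
    # union() drops duplicates if a feature lands on both lists (tiny inputs)
    selected = components_.loc[positive.union(negative),
                               components_.columns[:n_components]]
    return selected.sort_values(by=selected.columns[0], ascending=False)


components_ = pd.DataFrame(
    {"pc_1": [0.9, -0.8, 0.1, -0.05, 0.5, -0.6],
     "pc_2": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
     "pc_3": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]},
    index=["gene_a", "gene_b", "gene_c", "gene_d", "gene_e", "gene_f"])
heatmap_data = top_pc1_loadings(components_, n_features=4, n_components=2)
```

The resulting frame has one row per selected feature and one column per component, so it could be handed directly to a heatmap function such as `seaborn.heatmap`.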

plot_samples(show_point_labels=True, title='PCA', show_vectors=True, show_vector_labels=True, markersize=10, three_d=False, legend=True, ax=None)

Plot PCA scatterplot

Parameters:

groupby : mapping function | dict

How to group the samples by color/label

label_to_color : dict

Mapping of group labels to matplotlib colors, e.g. if you’ve already chosen specific colors to indicate a particular group. If not provided, colors are auto-assigned.

label_to_marker : dict

Mapping of group labels to matplotlib markers

title : str

Title of the plot

show_vectors : bool

Whether or not to draw the vectors indicating the supporting principal components

show_vector_labels : bool

Whether or not to draw the names of the vectors

show_point_labels : bool

Whether or not to label the scatter points

markersize : int

Size of the scatter markers on the plot

text_group : list of str

Group names that you want labeled with text

three_d : bool

If True, make the plot in 3D (the axes must be set up for 3D beforehand)

Returns:

For each vector in data:

x, y, marker, distance
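The `groupby`, `label_to_color`, and `label_to_marker` parameters combine roughly as sketched below: samples are mapped to labels, and each label gets a color and a marker. This sketch assumes that labels missing from the given mappings receive the next entry from a fixed color or marker cycle. Flotilla's actual palette and marker choices are not shown in this section, and `group_samples` is a hypothetical helper.

```python
import itertools

import pandas as pd


def group_samples(reduced_space, groupby, label_to_color=None, label_to_marker=None,
                  x_pc="pc_1", y_pc="pc_2"):
    """Split sample coordinates by label and attach a color and marker to each label.

    Hypothetical helper; colors/markers missing from the given mappings are
    auto-assigned from fixed cycles (an assumption, not flotilla's palette).
    """
    # groupby may be a dict or a function, like the constructor's groupby parameter
    labels = reduced_space.index.map(groupby)
    colors = itertools.cycle([f"C{i}" for i in range(10)])  # matplotlib color-cycle names
    markers = itertools.cycle(["o", "s", "^", "D", "v"])
    label_to_color = dict(label_to_color or {})
    label_to_marker = dict(label_to_marker or {})

    groups = {}
    for label in labels.unique():  # preserves order of first appearance
        if label not in label_to_color:
            label_to_color[label] = next(colors)
        if label not in label_to_marker:
            label_to_marker[label] = next(markers)
        groups[label] = {
            "color": label_to_color[label],
            "marker": label_to_marker[label],
            "points": reduced_space.loc[labels == label, [x_pc, y_pc]],
        }
    return groups


reduced_space = pd.DataFrame({"pc_1": [1.0, 1.2, -0.5, 0.3],
                              "pc_2": [0.2, 0.1, 0.8, -0.9]},
                             index=["s1", "s2", "s3", "s4"])
groups = group_samples(reduced_space,
                       {"s1": "iPSC", "s2": "iPSC", "s3": "NPC", "s4": "MN"},
                       label_to_color={"NPC": "red"})
```

Each entry could then be drawn with `ax.scatter(points[x_pc], points[y_pc], color=..., marker=..., label=...)`, which also gives the legend one entry per label.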

plot_violins()

Make violinplots of each feature

Must be called after plot_samples because it depends on the existence of the “self.magnitudes” attribute.

shorten(x)
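`shorten(x)` is not documented here. Judging by the `max_char_width` parameter, it presumably trims long feature names for display. The following hypothetical sketch shows one way to do such truncation; it is not the method's actual behavior. The trailing ellipsis is an assumption, and the default width of 30 is taken from the constructor default.

```python
def shorten(x, max_char_width=30):
    """Truncate str(x) to at most max_char_width characters, marking cuts with '...'.

    Hypothetical stand-in for the undocumented shorten method.
    """
    x = str(x)
    if len(x) <= max_char_width:
        return x
    if max_char_width <= 3:
        # Too narrow for an ellipsis; hard-truncate instead
        return x[:max_char_width]
    return x[: max_char_width - 3] + "..."


long_id = "chr1:100:200:+@chr1:300:400:+@chr1:500:600:+"  # made-up MISO-style ID
short_id = shorten(long_id)
```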
Olga B. Botvinnik is funded by the NDSEG fellowship and is a NumFOCUS John Hunter Technology Fellow.
Michael T. Lovci was partially funded by a fellowship from Genentech.
Partially funded by NIH grants NS075449 and HG004659 and CIRM grants RB4-06045 and TR3-05676 to Gene Yeo.