flotilla.data_model.splicing module

class flotilla.data_model.splicing.SplicingData(data, feature_data=None, binsize=0.1, outliers=None, feature_rename_col=None, feature_ignore_subset_cols=None, excluded_max=0.2, included_min=0.8, pooled=None, predictor_config_manager=None, technical_outliers=None, minimum_samples=0, feature_expression_id_col=None)[source]

Bases: flotilla.data_model.base.BaseData

Instantiate an object for percent spliced-in (PSI) scores

Parameters:

data : pandas.DataFrame

A [n_events, n_samples] DataFrame of splicing events

n_components : int

Number of components to use in the reducer

binsize : float

Value between 0 and 1, the bin size for binning the PSI scores

excluded_max : float

Maximum value for the “excluded” bin of psi scores. Default 0.2.

included_min : float

Minimum value for the “included” bin of psi scores. Default 0.8.

Notes

‘thresh’ from BaseData is not used.
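
The excluded_max and included_min thresholds can be illustrated with a minimal, standard-library sketch. It is not flotilla's implementation: the helper name bin_psi, the "middle" label, and the choice to treat both thresholds as inclusive are assumptions made for illustration only.

```python
from collections import Counter


def bin_psi(psi_values, excluded_max=0.2, included_min=0.8):
    """Return the fraction of PSI scores in each of three bins.

    Hypothetical helper, not part of flotilla. Treating the thresholds
    as inclusive is an assumption of this sketch.
    """
    # Ignore missing scores (represented here as None)
    observed = [psi for psi in psi_values if psi is not None]
    labels = ("excluded", "middle", "included")
    if not observed:
        return {label: 0.0 for label in labels}

    counts = Counter()
    for psi in observed:
        if psi <= excluded_max:
            counts["excluded"] += 1
        elif psi >= included_min:
            counts["included"] += 1
        else:
            counts["middle"] += 1

    n_observed = len(observed)
    return {label: counts[label] / n_observed for label in labels}


# Five observed scores and one missing value for a single event
fractions = bin_psi([0.05, 0.1, 0.5, 0.9, 0.95, None])
print(fractions)
```

With the defaults, scores at or below 0.2 count as excluded and scores at or above 0.8 count as included; everything in between falls in the middle bin.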

binify(data)[source]
binned_reducer = None
excluded_label = 'excluded >>'
included_label = 'included >>'
modality_assignments(*args, **kwargs)[source]

Assigned modalities for these samples and features.

Parameters:

sample_ids : list of str, optional

Which samples to use. If None, use all. Default None.

feature_ids : list of str, optional

Which features to use. If None, use all. Default None.

data : pandas.DataFrame, optional

If provided, use this dataframe instead of the sample_ids and feature_ids provided

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10

Returns:

modality_assignments : pandas.Series

The modality assignments of each feature given these samples
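
The effect of min_samples on grouped celltypes can be sketched as follows. This is an illustration only, not flotilla's code: the helper name groups_with_enough_samples, the sample ids, and the celltype labels are hypothetical, and keeping groups whose size is at least min_samples (rather than strictly greater) is an assumption.

```python
from collections import defaultdict


def groups_with_enough_samples(sample_to_celltype, min_samples=10):
    """Group sample ids by celltype and keep only sufficiently large groups.

    Hypothetical helper, not part of flotilla.
    """
    groups = defaultdict(list)
    for sample_id, celltype in sample_to_celltype.items():
        groups[celltype].append(sample_id)
    # Assumption: a group is kept when it has at least min_samples members
    return {
        celltype: sample_ids
        for celltype, sample_ids in groups.items()
        if len(sample_ids) >= min_samples
    }


# Made-up sample id to celltype mapping
sample_to_celltype = {
    "s1": "neuron",
    "s2": "neuron",
    "s3": "neuron",
    "s4": "ipsc",
}
kept = groups_with_enough_samples(sample_to_celltype, min_samples=3)
print(kept)
```

In this sketch the single "ipsc" sample is dropped, so only the "neuron" group would go on to receive modality assignments.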

modality_counts(*args, **kwargs)[source]

Count the number of events in each modality for these samples and features

Parameters:

sample_ids : list of str

Which samples to use. If None, use all. Default None.

feature_ids : list of str

Which features to use. If None, use all. Default None.

data : pandas.DataFrame, optional

If provided, use this dataframe instead of the sample_ids and feature_ids provided

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10

Returns:

modalities_counts : pandas.Series

The number of events detected in each modality
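
Conceptually, the counts are a tally of the per-event assignments. The sketch below shows that tally with the standard library; the event ids and modality names are made up for illustration, and the real method returns a pandas.Series rather than a Counter.

```python
from collections import Counter

# Hypothetical modality assignments, one per splicing event
assignments = {
    "event_1": "bimodal",
    "event_2": "included",
    "event_3": "included",
    "event_4": "excluded",
}

# Tally how many events fall into each modality
modality_counts = Counter(assignments.values())
print(modality_counts.most_common())
```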

n_components = 2
plot_event_modality_estimation(event_id, sample_ids=None, data=None, groupby=None, min_samples=10)[source]

Plots the mathematical reasoning for an event’s modality assignment

Parameters:

event_id : str

Unique name of the splicing event

sample_ids : list of str, optional

Which sample ids to use

data : pandas.DataFrame

Which data to use, if e.g. you filtered splicing events on expression data

groupby : mapping, optional

A sample id to celltype mapping

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10

plot_feature(feature_id, sample_ids=None, phenotype_groupby=None, phenotype_order=None, color=None, phenotype_to_color=None, phenotype_to_marker=None, nmf_xlabel=None, nmf_ylabel=None, nmf_space=False, fig=None, axesgrid=None)[source]
plot_hist_single_vs_pooled_diff(data, feature_ids=None, color=None, title='', hist_kws=None)[source]

Plot histogram of distances between singles and pooled

plot_lavalamp(phenotype_to_color, sample_ids=None, feature_ids=None, data=None, groupby=None, order=None)[source]
plot_lavalamp_pooled_inconsistent(data, feature_ids=None, fraction_diff_thresh=0.1, color=None)[source]
plot_modalities_bars(sample_ids=None, feature_ids=None, data=None, groupby=None, phenotype_to_color=None, percentages=False, ax=None, min_samples=10)[source]

Make grouped barplots of the number of modalities per group

Parameters:

sample_ids : None or list of str

Which samples to use. If None, use all

feature_ids : None or list of str

Which features to use. If None, use all

color : None or matplotlib color

Which color to use for plotting the lavalamps of these features and samples

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10

plot_modalities_lavalamps(sample_ids=None, feature_ids=None, data=None, groupby=None, phenotype_to_color=None, min_samples=10)[source]

Plot “lavalamp” scatterplot of each event

Parameters:

sample_ids : None or list of str

Which samples to use. If None, use all

feature_ids : None or list of str

Which features to use. If None, use all

color : None or matplotlib color

Which color to use for plotting the lavalamps of these features and samples

x_offset : numeric

How much to offset the x-axis of each event. Useful if you want to plot the same event, but in several iterations with different celltypes or colors

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10

plot_modalities_reduced(sample_ids=None, feature_ids=None, data=None, ax=None, title=None, min_samples=10)[source]

Plot events modality assignments in NMF space

This will calculate modalities on all samples provided, without grouping them by celltype. This is because each NMF axis can only show the modalities of one set of sample ids.

Parameters:

sample_ids : list of str

Which samples to use. If None, use all. Default None.

feature_ids : list of str

Which features to use. If None, use all. Default None.

data : pandas.DataFrame, optional

If provided, use this dataframe instead of the sample_ids and feature_ids provided

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10

ax : matplotlib.axes.Axes object

Axes to plot on. If None, uses the current axes

title : str

Title of the reduced space plot

plot_two_features(feature1, feature2, groupby=None, label_to_color=None, fillna=None, **kwargs)[source]
plot_two_samples(sample1, sample2, fillna=None, **kwargs)[source]
pooled_inconsistent(*args, **kwargs)[source]

Return splicing events for which pooled samples are consistently different from the single cells.

Parameters:

singles_ids : list-like

List of sample ids of single cells (in the main ".data" DataFrame)

pooled_ids : list-like

List of sample ids of pooled cells (in the other ".pooled" DataFrame)

feature_ids : None or list-like

List of feature ids. If None, use all

fraction_diff_thresh : float

Returns:

large_diff : pandas.DataFrame

All splicing events which have a scaled difference larger than the fraction diff thresh
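
The idea of filtering events by the difference between pooled and single-cell scores can be sketched as below. This is a simplified illustration, not flotilla's method: the documentation does not specify how the difference is scaled, so the sketch assumes a plain absolute difference between the mean pooled score and the mean single-cell score. The helper name large_differences and the example data are hypothetical, and the 0.1 default mirrors the fraction_diff_thresh default shown in the plot_lavalamp_pooled_inconsistent signature.

```python
from statistics import mean


def large_differences(singles, pooled, fraction_diff_thresh=0.1):
    """Return events whose pooled and single-cell mean PSI differ widely.

    Hypothetical helper, not part of flotilla. Using the absolute
    difference of means is an assumption of this sketch.
    """
    large_diff = {}
    for event_id, single_scores in singles.items():
        pooled_scores = pooled.get(event_id)
        # Skip events that lack scores in either set of samples
        if not single_scores or not pooled_scores:
            continue
        diff = abs(mean(pooled_scores) - mean(single_scores))
        if diff > fraction_diff_thresh:
            large_diff[event_id] = diff
    return large_diff


# Made-up PSI scores keyed by event id
singles = {
    "event_1": [0.0, 0.0, 1.0, 1.0],
    "event_2": [0.9, 1.0, 0.95],
}
pooled = {"event_1": [0.9], "event_2": [0.95]}

result = large_differences(singles, pooled)
print(result)
```

Here only "event_1" is reported: its single cells average 0.5 while the pooled sample sits at 0.9, whereas "event_2" agrees between the two.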

raw_reducer = None
Olga B. Botvinnik is funded by the NDSEG fellowship and is a NumFOCUS John Hunter Technology Fellow.
Michael T. Lovci was partially funded by a fellowship from Genentech.
Partially funded by NIH grants NS075449 and HG004659 and CIRM grants RB4-06045 and TR3-05676 to Gene Yeo.