flotilla.data_model.splicing module

class flotilla.data_model.splicing.SplicingData(data, feature_data=None, binsize=0.1, outliers=None, feature_rename_col=None, feature_ignore_subset_cols=None, excluded_max=0.2, included_min=0.8, pooled=None, predictor_config_manager=None, technical_outliers=None, minimum_samples=0, feature_expression_id_col=None)

Bases: flotilla.data_model.base.BaseData

Instantiate an object for percent spliced-in (PSI) scores

Parameters:


data : pandas.DataFrame

A [n_events, n_samples] dataframe of data events

n_components : int

Number of components to use in the reducer

binsize : float

Bin size used when binning the PSI scores; a value between 0 and 1

excluded_max : float

Maximum value for the “excluded” bin of psi scores. Default 0.2.

included_min : float

Minimum value for the “included” bin of psi scores. Default 0.8.

Notes

‘thresh’ from BaseData is not used.
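The roles of binsize, excluded_max and included_min can be illustrated with a minimal sketch in plain pandas and NumPy. This is not flotilla's implementation: the helper bin_psi, the sample and event names, the "middle" label, and the choice of inclusive thresholds are all assumptions made for illustration, and the matrix follows the [n_events, n_samples] layout documented above.

```python
import numpy as np
import pandas as pd

# Hypothetical PSI matrix in the documented [n_events, n_samples] layout.
psi = pd.DataFrame(
    [[0.05, 0.10, 0.95], [0.50, 0.45, 0.55], [0.90, 1.00, 0.85]],
    index=["event_a", "event_b", "event_c"],
    columns=["s1", "s2", "s3"],
)


def bin_psi(values, binsize=0.1):
    """Fraction of non-null PSI values falling in each bin of width binsize.

    Hypothetical helper, not part of flotilla.
    """
    n_bins = int(round(1 / binsize))
    edges = np.linspace(0, 1, n_bins + 1)
    counts, _ = np.histogram(values.dropna(), bins=edges)
    # Label each bin by its left edge.
    return pd.Series(counts / counts.sum(), index=edges[:-1].round(10))


# One row of bin fractions per event.
binned = psi.apply(bin_psi, axis=1)

# Assumption: thresholds are inclusive, and anything in between is "middle".
excluded_max, included_min = 0.2, 0.8
values = psi.to_numpy()
labels = pd.DataFrame(
    np.select(
        [values <= excluded_max, values >= included_min],
        ["excluded", "included"],
        default="middle",
    ),
    index=psi.index,
    columns=psi.columns,
)
```

With binsize=0.1 each event is summarized by ten bin fractions that sum to 1, while the two thresholds split the same scores into three coarse categories.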

binify(data)

binned_reducer = None

excluded_label = 'excluded >>'

included_label = 'included >>'

modality_assignments(*args, **kwargs)

Assigned modalities for these samples and features.

Parameters:


sample_ids : list of str, optional

Which samples to use. If None, use all. Default None.

feature_ids : list of str, optional

Which features to use. If None, use all. Default None.

data : pandas.DataFrame, optional

If provided, use this dataframe instead of the sample_ids and feature_ids provided

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10

Returns:

modality_assignments : pandas.Series

The modality assignments of each feature given these samples

modality_counts(*args, **kwargs)

Count the number of events in each modality for these samples and features

Parameters:


sample_ids : list of str

Which samples to use. If None, use all. Default None.

feature_ids : list of str

Which features to use. If None, use all. Default None.

data : pandas.DataFrame, optional

If provided, use this dataframe instead of the sample_ids and feature_ids provided

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10

Returns:

modalities_counts : pandas.Series

The number of events detected in each modality
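Conceptually, the returned counts are a tally of a per-event modality Series such as the one modality_assignments returns. The sketch below shows that relationship with plain pandas; the event ids and modality names are made up for illustration, and it does not call flotilla itself.

```python
import pandas as pd

# Hypothetical modality assignments, one label per splicing event.
assignments = pd.Series(
    {
        "event_a": "excluded",
        "event_b": "bimodal",
        "event_c": "included",
        "event_d": "bimodal",
    },
    name="modality",
)

# Tally how many events fall into each modality.
counts = assignments.value_counts()
```

Here counts is a pandas.Series indexed by modality name, which matches the documented return type.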

n_components = 2

plot_event_modality_estimation(event_id, sample_ids=None, data=None, groupby=None, min_samples=10)

Plots the mathematical reasoning for an event’s modality assignment

Parameters:


event_id : str

Unique name of the splicing event

sample_ids : list of str, optional

Which sample ids to use

data : pandas.DataFrame

Which data to use, if e.g. you filtered splicing events on expression data

groupby : mapping, optional

A sample id to celltype mapping

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10
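The interaction between groupby and min_samples can be sketched as follows. The documentation does not spell out exactly how min_samples is applied, so this example assumes one plausible reading: within each celltype, an event is kept only if at least min_samples samples have a non-missing PSI value. The helper events_per_celltype and all ids are hypothetical and not part of flotilla.

```python
import numpy as np
import pandas as pd

# Hypothetical PSI matrix in the documented [n_events, n_samples] layout.
psi = pd.DataFrame(
    {
        "s1": [0.10, np.nan],
        "s2": [0.20, 0.90],
        "s3": [0.15, np.nan],
        "s4": [0.80, 0.70],
    },
    index=["event_a", "event_b"],
)

# A sample id to celltype mapping, as described for the groupby parameter.
groupby = {"s1": "neuron", "s2": "neuron", "s3": "neuron", "s4": "glia"}


def events_per_celltype(psi, groupby, min_samples=10):
    """List the events with enough observed samples in each celltype.

    Assumption: "enough" means at least min_samples non-null PSI values.
    """
    celltypes = pd.Series(groupby)
    kept = {}
    for celltype, sample_ids in celltypes.groupby(celltypes).groups.items():
        subset = psi[list(sample_ids)]
        enough = subset.notna().sum(axis=1) >= min_samples
        kept[celltype] = subset.index[enough].tolist()
    return kept


kept = events_per_celltype(psi, groupby, min_samples=2)
```

In this toy data only event_a has two or more observed neuron samples, and the single glia sample leaves no event above the threshold.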

plot_feature(feature_id, sample_ids=None, phenotype_groupby=None, phenotype_order=None, color=None, phenotype_to_color=None, phenotype_to_marker=None, nmf_xlabel=None, nmf_ylabel=None, nmf_space=False, fig=None, axesgrid=None)

plot_hist_single_vs_pooled_diff(data, feature_ids=None, color=None, title='', hist_kws=None)

Plot histogram of distances between singles and pooled

plot_lavalamp(phenotype_to_color, sample_ids=None, feature_ids=None, data=None, groupby=None, order=None)

plot_lavalamp_pooled_inconsistent(data, feature_ids=None, fraction_diff_thresh=0.1, color=None)

plot_modalities_bars(sample_ids=None, feature_ids=None, data=None, groupby=None, phenotype_to_color=None, percentages=False, ax=None, min_samples=10)

Make grouped barplots of the number of modalities per group

Parameters:


sample_ids : None or list of str

Which samples to use. If None, use all

feature_ids : None or list of str

Which features to use. If None, use all

color : None or matplotlib color

Which color to use for plotting the lavalamps of these features and samples

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10

plot_modalities_lavalamps(sample_ids=None, feature_ids=None, data=None, groupby=None, phenotype_to_color=None, min_samples=10)

Plot “lavalamp” scatterplot of each event

Parameters:


sample_ids : None or list of str

Which samples to use. If None, use all

feature_ids : None or list of str

Which features to use. If None, use all

color : None or matplotlib color

Which color to use for plotting the lavalamps of these features and samples

x_offset : numeric

How much to offset the x-axis of each event. Useful if you want to plot the same event, but in several iterations with different celltypes or colors

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10

plot_modalities_reduced(sample_ids=None, feature_ids=None, data=None, ax=None, title=None, min_samples=10)

Plot events modality assignments in NMF space

This will calculate modalities on all samples provided, without grouping them by celltype. This is because each NMF axis can only show one set of sample ids’ modalities.

Parameters:


sample_ids : list of str

Which samples to use. If None, use all. Default None.

feature_ids : list of str

Which features to use. If None, use all. Default None.

data : pandas.DataFrame, optional

If provided, use this dataframe instead of the sample_ids and feature_ids provided

min_samples : int, optional

Minimum number of samples to use per grouped celltype. Default 10

ax : matplotlib.axes.Axes object

Axes to plot on. If None, uses the current axes

title : str

Title of the reduced space plot

plot_two_features(feature1, feature2, groupby=None, label_to_color=None, fillna=None, **kwargs)

plot_two_samples(sample1, sample2, fillna=None, **kwargs)

pooled_inconsistent(*args, **kwargs)

Return splicing events for which the pooled samples are consistently different from the single cells.

Parameters:


singles_ids : list-like

List of sample ids of single cells (in the main ".data" DataFrame)

pooled_ids : list-like

List of sample ids of pooled cells (in the other ".pooled" DataFrame)

feature_ids : None or list-like

List of feature ids. If None, use all

fraction_diff_thresh : float

Returns:

large_diff : pandas.DataFrame

All splicing events which have a scaled difference larger than fraction_diff_thresh
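The idea of a thresholded "scaled difference" can be sketched in plain pandas. The documentation does not define the scaling, so this example assumes it is the mean absolute PSI difference between a pooled sample and the single cells for each event. The helper large_pooled_differences and all ids are hypothetical, and the code is an illustration rather than flotilla's implementation.

```python
import pandas as pd

# Hypothetical PSI matrices in the documented [n_events, n_samples] layout.
singles = pd.DataFrame(
    {"cell1": [0.1, 0.5], "cell2": [0.2, 0.5], "cell3": [0.0, 0.6]},
    index=["event_a", "event_b"],
)
pooled = pd.DataFrame({"pool1": [0.9, 0.55]}, index=["event_a", "event_b"])


def large_pooled_differences(singles, pooled, fraction_diff_thresh=0.1):
    """Keep events whose pooled PSI differs strongly from the single cells.

    Assumption: the scaled difference is the mean absolute difference
    between one pooled sample and every single cell, per event.
    """
    scaled = pd.DataFrame(
        {
            pooled_id: singles.sub(pooled[pooled_id], axis=0).abs().mean(axis=1)
            for pooled_id in pooled.columns
        }
    )
    # Mask differences at or below the threshold, then drop events
    # that have no large difference in any pooled sample.
    return scaled.where(scaled > fraction_diff_thresh).dropna(how="all")


large_diff = large_pooled_differences(singles, pooled, fraction_diff_thresh=0.1)
```

In this toy data event_a is retained because its pooled PSI is far from every single cell, while event_b falls below the threshold and is dropped.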

raw_reducer = None