flotilla.data_model.splicing module

class flotilla.data_model.splicing.DownsampledSplicingData(df, sample_descriptors)[source]

Bases: flotilla.data_model.base.BaseData

Instantiate an object of downsampled splicing data

Parameters:

df : pandas.DataFrame

A “tall” dataframe of all MISO summary events, with the usual MISO summary columns plus these required ones: ‘splice_type’, ‘probability’, and ‘iteration’. Here “probability” is the random sampling probability applied to the bam file used to generate these reads, and “iteration” is the integer index of the resampling iteration, e.g. if multiple resamplings were performed.

experiment_design_data : pandas.DataFrame

Notes

Warning: this data is usually HUGE (we’re talking ~10 GB raw .tsv files), so make sure you have enough memory available to work with it.
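The required “tall” layout can be sketched with a toy pandas DataFrame (event names and values below are made up for illustration):

```python
import pandas as pd

# Toy "tall" MISO summary table: one row per event, per sampling
# probability, per resampling iteration (hypothetical values).
df = pd.DataFrame({
    'event_name': ['exon1@chr1', 'exon1@chr1', 'exon2@chr2'],
    'splice_type': ['SE', 'SE', 'MXE'],
    'probability': [0.1, 0.5, 0.1],  # random sampling probability of the bam reads
    'iteration': [1, 2, 1],          # which resampling iteration
})

# The three columns the constructor requires
required = {'splice_type', 'probability', 'iteration'}
assert required.issubset(df.columns)
```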

binned_reducer = None
n_components = 2
raw_reducer = None
shared_events[source]
Returns:

event_count_df : pandas.DataFrame

Splicing events on the rows, splice types and probability as column MultiIndex. Values are the number of iterations which share this splicing event at that probability and splice type.
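The counting itself can be sketched as a pivot over a toy tall table (a hypothetical reimplementation of the idea, not the actual method body):

```python
import pandas as pd

# Toy tall table: which events were seen in which resampling iterations
df = pd.DataFrame({
    'event_name':  ['e1', 'e1', 'e1', 'e2'],
    'splice_type': ['SE', 'SE', 'SE', 'MXE'],
    'probability': [0.1, 0.1, 0.5, 0.1],
    'iteration':   [1, 2, 1, 1],
})

# Events on the rows, (splice_type, probability) as a column MultiIndex;
# values count the iterations sharing that event at that setting
event_count_df = df.pivot_table(index='event_name',
                                columns=['splice_type', 'probability'],
                                values='iteration', aggfunc='nunique')
```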

shared_events_barplot(figure_dir='./')[source]

Plot a “histogram”, via colored bars, of the number of events shared by different iterations at a particular sampling probability

Parameters:

figure_dir : str

Where to save the pdf figures created

shared_events_percentage(min_iter_shared=5, figure_dir='./')[source]

Plot the percentage of all events detected at each iteration that are shared by at least ‘min_iter_shared’ iterations

Parameters:

min_iter_shared : int

Minimum number of iterations sharing an event

figure_dir : str

Where to save the pdf figures created

class flotilla.data_model.splicing.SpliceJunctionData(df, phenotype_data)[source]

Bases: flotilla.data_model.splicing.SplicingData

Class to hold splice junction information from SJ.out.tab files from STAR

Constructor for SpliceJunctionData

Parameters:

data, experiment_design_data
class flotilla.data_model.splicing.SplicingData(data, feature_data=None, binsize=0.1, outliers=None, feature_rename_col=None, feature_ignore_subset_cols=None, excluded_max=0.2, included_min=0.8, pooled=None, predictor_config_manager=None, technical_outliers=None, minimum_samples=0)[source]

Bases: flotilla.data_model.base.BaseData

Instantiate an object for percent spliced-in (PSI) scores

Parameters:

data : pandas.DataFrame

A [n_events, n_samples] dataframe of PSI scores for splicing events

n_components : int

Number of components to use in the reducer

binsize : float

Value between 0 and 1, the bin size for binning the PSI scores

excluded_max : float

Maximum value for the “excluded” bin of psi scores. Default 0.2.

included_min : float

Minimum value for the “included” bin of psi scores. Default 0.8.

Notes

‘thresh’ from BaseData is not used.
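A minimal sketch of the expected input and the default bin edges (event and sample names are hypothetical, and whether flotilla treats the edges as inclusive is an assumption here):

```python
import pandas as pd

# Toy [n_events, n_samples] PSI matrix (names are hypothetical)
psi = pd.DataFrame([[0.05, 0.10, 0.95],
                    [0.90, 0.85, 0.15]],
                   index=['event1', 'event2'],
                   columns=['sample1', 'sample2', 'sample3'])

excluded_max, included_min = 0.2, 0.8  # default bin edges
excluded = psi <= excluded_max         # PSI scores in the "excluded" bin
included = psi >= included_min         # PSI scores in the "included" bin
```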

binify(data)[source]
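binify can be sketched as fixed-width histogram binning of each event’s PSI scores on [0, 1] (a hypothetical reimplementation; the real method’s output layout may differ):

```python
import numpy as np
import pandas as pd

def binify(data, binsize=0.1):
    """Bin each event's PSI scores into fixed-width bins on [0, 1].

    A minimal sketch of the idea, not flotilla's actual implementation."""
    bins = np.arange(0, 1 + binsize, binsize)
    binned = data.apply(
        lambda row: pd.Series(np.histogram(row.dropna(), bins=bins)[0]),
        axis=1)
    binned.columns = bins[:-1]  # label each bin by its left edge
    return binned

psi = pd.DataFrame([[0.05, 0.15, 0.95]], index=['event1'],
                   columns=['s1', 's2', 's3'])
counts = binify(psi)
```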
binned_reducer = None
modalities(*args, **kwargs)[source]

Assigned modalities for these samples and features.

Parameters:

sample_ids : list of str

Which samples to use. If None, use all. Default None.

feature_ids : list of str

Which features to use. If None, use all. Default None.

bootstrapped : bool

Whether or not to use bootstrapping, i.e. resample each splicing event several times to get a better estimate of its true modality.

bootstrapped_kws : dict

Valid arguments to _bootstrapped_fit_transform. If None, default is dict(n_iter=100, thresh=0.6, minimum_samples=10)

Returns:

modality_assignments : pandas.Series

The modality assignments of each feature given these samples
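The bootstrapping idea can be sketched as resampling each event’s PSI values with replacement and taking the majority call. assign_modality below is a deliberately simplified, hypothetical rule, not flotilla’s model-based estimator:

```python
import numpy as np
import pandas as pd

def assign_modality(psi, excluded_max=0.2, included_min=0.8):
    """Hypothetical, simplified modality rule: call an event by where
    most of its PSI mass lies (flotilla's estimator is model-based)."""
    psi = psi.dropna()
    frac_excluded = (psi <= excluded_max).mean()
    frac_included = (psi >= included_min).mean()
    if frac_excluded > 0.5:
        return 'excluded'
    if frac_included > 0.5:
        return 'included'
    if frac_excluded + frac_included > 0.5:
        return 'bimodal'
    return 'middle'

def bootstrapped_modality(psi, n_iter=100, seed=0):
    # Resample the samples with replacement, then take the majority call
    rng = np.random.default_rng(seed)
    calls = [assign_modality(psi.sample(n=len(psi), replace=True,
                                        random_state=rng))
             for _ in range(n_iter)]
    return pd.Series(calls).mode()[0]

event = pd.Series([0.05, 0.10, 0.00, 0.15, 0.10])  # all low PSI
modality = bootstrapped_modality(event)
```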

modalities_counts(*args, **kwargs)[source]

Count the number of events in each modality for these samples and features

Parameters:

sample_ids : list of str

Which samples to use. If None, use all. Default None.

feature_ids : list of str

Which features to use. If None, use all. Default None.

bootstrapped : bool

Whether or not to use bootstrapping, i.e. resample each splicing event several times to get a better estimate of its true modality. Default False.

bootstrapped_kws : dict

Valid arguments to _bootstrapped_fit_transform. If None, default is dict(n_iter=100, thresh=0.6, minimum_samples=10)

Returns:

modalities_counts : pandas.Series

The number of events detected in each modality
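Given per-feature assignments like those returned by modalities, the counting step amounts to a value_counts over the assignment Series (names below are hypothetical):

```python
import pandas as pd

# Hypothetical per-event modality assignments, as modalities() would return
modality_assignments = pd.Series({'event1': 'included',
                                  'event2': 'excluded',
                                  'event3': 'included'})

# Number of events detected in each modality
modalities_counts = modality_assignments.value_counts()
```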

n_components = 2
percent_pooled_inconsistent(*args, **kwargs)[source]

The percentage of splicing events for which the pooled samples are inconsistent with the single cells

plot_feature(feature_id, sample_ids=None, phenotype_groupby=None, phenotype_order=None, color=None, phenotype_to_color=None, phenotype_to_marker=None, xlabel=None, ylabel=None, nmf_space=False)[source]
plot_hist_single_vs_pooled_diff(sample_ids, feature_ids=None, color=None, title='', hist_kws=None)[source]
plot_lavalamp_pooled_inconsistent(sample_ids, feature_ids=None, fraction_diff_thresh=0.1, color=None)[source]
plot_modalities_bar(sample_ids=None, feature_ids=None, ax=None, i=0, normed=True, legend=True, bootstrapped=False, bootstrapped_kws=None)[source]

Plot stacked bar graph of each modality

Parameters:

bootstrapped : bool

Whether or not to use bootstrapping, i.e. resample each splicing event several times to get a better estimate of its true modality. Default False.

bootstrapped_kws : dict

Valid arguments to _bootstrapped_fit_transform. If None, default is dict(n_iter=100, thresh=0.6, minimum_samples=10)

plot_modalities_lavalamps(sample_ids=None, feature_ids=None, color=None, x_offset=0, use_these_modalities=True, bootstrapped=False, bootstrapped_kws=None, ax=None)[source]

Plot “lavalamp” scatterplot of each event

Parameters:

sample_ids : None or list of str

Which samples to use. If None, use all

feature_ids : None or list of str

Which features to use. If None, use all

color : None or matplotlib color

Which color to use for plotting the lavalamps of these features and samples

x_offset : numeric

How much to offset the x-axis of each event. Useful if you want to plot the same event, but in several iterations with different cell types or colors

ax : None or matplotlib.axes.Axes

Which axes to plot these on

use_these_modalities : bool

If True, then use these sample ids to calculate modalities. Otherwise, use the modalities assigned using ALL samples and features

bootstrapped : bool

Whether or not to use bootstrapping, i.e. resample each splicing event several times to get a better estimate of its true modality. Default False.

bootstrapped_kws : dict

Valid arguments to _bootstrapped_fit_transform. If None, default is dict(n_iter=100, thresh=0.6, minimum_samples=10)

plot_modalities_reduced(sample_ids=None, feature_ids=None, ax=None, title=None, bootstrapped=False, bootstrapped_kws=None)[source]

Plot modality assignments in DataFrameNMF space

Parameters:

bootstrapped : bool

Whether or not to use bootstrapping, i.e. resample each splicing event several times to get a better estimate of its true modality. Default False.

bootstrapped_kws : dict

Valid arguments to _bootstrapped_fit_transform. If None, default is dict(n_iter=100, thresh=0.6, minimum_samples=10)

pooled_inconsistent(*args, **kwargs)[source]

Return splicing events for which pooled samples are consistently different from the single cells.

Parameters:

singles_ids : list-like

List of sample ids of single cells (in the main “.data” DataFrame)

pooled_ids : list-like

List of sample ids of pooled cells (in the other “.pooled” DataFrame)

feature_ids : None or list-like

List of feature ids. If None, use all

fraction_diff_thresh : float

Returns:

large_diff : pandas.DataFrame

All splicing events which have a scaled difference larger than the fraction diff thresh
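One plausible reading of the “scaled difference” test, sketched on toy data (the actual scaling flotilla uses may differ):

```python
import pandas as pd

# Toy PSI for single cells and a pooled sample (hypothetical ids and values)
singles = pd.DataFrame({'s1': [0.1, 0.5], 's2': [0.2, 0.4]},
                       index=['event1', 'event2'])
pooled = pd.Series([0.9, 0.45], index=['event1', 'event2'])

# One plausible "scaled difference": |pooled - mean(singles)| per event
diff = (pooled - singles.mean(axis=1)).abs()
fraction_diff_thresh = 0.1
large_diff = singles.loc[diff > fraction_diff_thresh]
```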

raw_reducer = None
reduce(sample_ids=None, feature_ids=None, featurewise=False, reducer=None, standardize=False, reducer_kwargs=None, bins=None)[source]
Parameters:

sample_ids : list-like

List of sample ids

feature_ids : list-like

List of feature ids

featurewise : bool

If True, reduce the transpose (feature x sample) instead of sample x feature

reducer : DataFrameReducer

DataFrameReducer object, defaults to DataFramePCA

standardize : bool

If True, standardize columns before reduction

reducer_kwargs : dict

Keyword arguments passed to the reducer

bins : array-like

Bins to use for binify
Returns:

reducer object
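The sample-space reduction can be sketched with a plain PCA via SVD (a hypothetical stand-in for DataFramePCA; the real method returns the fitted reducer object rather than the projected data):

```python
import numpy as np
import pandas as pd

def reduce_sketch(data, n_components=2, standardize=False, featurewise=False):
    """Minimal PCA-style reduction; a simplified stand-in for the real
    method, which wraps a DataFrameReducer (DataFramePCA by default)."""
    if featurewise:
        data = data.T  # reduce (feature x sample) instead of (sample x feature)
    X = data.to_numpy(dtype=float)
    X = X - X.mean(axis=0)   # center columns
    if standardize:
        X = X / X.std(axis=0)
    # PCA via SVD: project rows onto the top principal components
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return pd.DataFrame(X @ Vt[:n_components].T, index=data.index)

psi = pd.DataFrame(np.random.default_rng(0).random((5, 4)))
reduced = reduce_sketch(psi, n_components=2)
```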
