flotilla.compute.splicing module

class flotilla.compute.splicing.Modalities(excluded_max=0.2, included_min=0.8)

Bases: object

Estimate the modality of a splicing event

This is based on the “percent spliced-in” (PSI, Ψ) score of a splicing event. For example, in a cassette exon event, the fraction of transcripts that use the cassette exon gives the PSI score of that event.

Possible modalities include:

- Excluded (most cells have excluded the exon)
- Middle (most cells have both the included and excluded isoforms)
- Included (most cells have included the exon)
- Bimodal (cells split roughly 50:50 between inclusion and exclusion)
- Uniform (exon usage is uniformly distributed across cells)

These modalities are calculated by binning each splicing event's PSI scores across all cells into three bins with edges (0, excluded_max, included_min, 1). The method then computes the Jensen-Shannon divergence (JSD) between that event's binned distribution and each of the five modalities.
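The binning step can be sketched as follows. The sketch assumes PSI values lie in [0, 1] and drops missing values. `binify` is a hypothetical helper, not a flotilla function. It returns the (n_bins, n_events) layout that sqrt_jsd_modalities takes as input.

```python
import numpy as np
import pandas as pd


def binify(psi, excluded_max=0.2, included_min=0.8):
    """Hypothetical helper: bin each event (column) of a samples x events PSI table.

    Returns an (n_bins, n_events) DataFrame. Each column holds the fraction of
    non-missing samples that fall into the excluded, middle and included bins.
    """
    edges = np.array([0.0, excluded_max, included_min, 1.0])
    binned = {}
    for event, values in psi.items():
        # Drop missing samples so they do not count toward any bin
        counts, _ = np.histogram(values.dropna().to_numpy(), bins=edges)
        # Bins are [0, excluded_max), [excluded_max, included_min), [included_min, 1];
        # np.histogram closes the last bin, so PSI == 1.0 counts as included
        binned[event] = counts / counts.sum()
    return pd.DataFrame(binned)  # index 0, 1, 2 = excluded, middle, included bins


psi = pd.DataFrame({"event_a": [0.05, 0.1, 0.0, 0.15],
                    "event_b": [0.0, 1.0, 0.95, 0.05]})
print(binify(psi))
```

In this toy input, every sample of event_a falls in bin 0. Event_b splits 0.5/0.5 between bins 0 and 2, which is the shape of the bimodal reference.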

Parameters:

excluded_max : float, optional (default=0.2)

Maximum value of excluded bin

included_min : float, optional (default=0.8)

Minimum value of included bin

assignments(sqrt_jsd_modalities)

Return the modality with the smallest square root JSD to each event

Parameters:

sqrt_jsd_modalities : pandas.DataFrame

A modalities x features dataframe of the square root Jensen-Shannon divergence between each event and each modality

Returns:

assignments : pandas.Series

The closest modality to each splicing event
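The input has modalities as rows and events as columns. Picking the closest modality therefore amounts to a column-wise arg-min. The minimal sketch below assumes that layout. `assign_modalities` is a hypothetical name, not flotilla's API.

```python
import pandas as pd


def assign_modalities(sqrt_jsd_modalities):
    """Return the closest modality (row label) for every event (column)."""
    # DataFrame.idxmin() works column-wise by default (axis=0). It returns,
    # for each column, the index label of the smallest value. Ties resolve
    # to the first label in row order.
    return sqrt_jsd_modalities.idxmin()


names = ["excluded", "middle", "included", "bimodal", "uniform"]
sqrt_jsd = pd.DataFrame({"e1": [0.0, 0.8, 1.0, 0.5, 0.6],
                         "e2": [0.5, 0.8, 0.5, 0.0, 0.4]}, index=names)
print(assign_modalities(sqrt_jsd))
```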

counts(psi, bootstrapped=False, bootstrapped_kws=None)

Return the number of events in each modality category

Parameters:

psi : pandas.DataFrame

A samples x features dataframe of PSI scores, one column per splicing event

Returns:

counts : pandas.Series

Counts of each modality
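Once each event has a modality label, the counts reduce to a tally of those labels. The sketch below assumes a Series of per-event assignments like the one fit_transform returns. `count_modalities` and `MODALITY_NAMES` are hypothetical names. Reindexing so that a modality with no events still appears with a count of 0 is a design choice of this sketch, not documented behavior.

```python
import pandas as pd

MODALITY_NAMES = ["excluded", "middle", "included", "bimodal", "uniform"]


def count_modalities(assignments):
    """Count how many events were assigned to each modality (hypothetical helper)."""
    # value_counts() tallies each distinct label. reindex() puts the counts
    # in a fixed modality order and fills absent modalities with 0.
    return assignments.value_counts().reindex(MODALITY_NAMES, fill_value=0)


assignments = pd.Series({"e1": "excluded", "e2": "included", "e3": "excluded"})
print(count_modalities(assignments))
```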

fit_transform(*args, **kwargs)

Given PSI scores, estimate the modality of each splicing event

Parameters:

data : pandas.DataFrame

A samples x features dataframe of PSI scores. The splicing modality of each column (feature) is estimated.

bootstrapped : bool

Whether or not to use bootstrapping, i.e. resample each splicing event several times to get a better estimate of its true modality. Default False.

bootstrapped_kws : dict

Keyword arguments passed to _bootstrapped_fit_transform. If None, the default is dict(n_iter=100, thresh=0.6, minimum_samples=10)

Returns:

assignments : pandas.Series

Modality assignments of each column (feature)
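The non-bootstrapped path of fit_transform can be sketched end to end. The sketch bins each column, computes the square-root JSD to every modality, and keeps the closest one. It is self-contained and is not flotilla's code. `fit_transform_sketch` and `MODALITIES` are hypothetical names, and the reference bins are copied from modalities_bins. The base-2 logarithm is an assumption. Bootstrapping is left out because the docstring does not say how n_iter, thresh and minimum_samples are used.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import jensenshannon

# Reference bin patterns (bins x modalities), copied from modalities_bins
MODALITIES = pd.DataFrame(
    {"excluded": [1, 0, 0], "middle": [0, 1, 0], "included": [0, 0, 1],
     "bimodal": [1, 0, 1], "uniform": [1, 1, 1]},
    dtype=float,
)


def fit_transform_sketch(psi, excluded_max=0.2, included_min=0.8):
    """Assign a modality to each column of a samples x events PSI DataFrame."""
    edges = [0.0, excluded_max, included_min, 1.0]
    # Step 1: per-event bin counts -> (n_bins, n_events), then column fractions
    binned = psi.apply(
        lambda col: pd.Series(np.histogram(col.dropna(), bins=edges)[0]))
    binned = binned / binned.sum()
    # Step 2: Jensen-Shannon distance (sqrt of JSD) to each reference pattern;
    # jensenshannon() normalizes the unnormalized 0/1 patterns itself
    sqrt_jsd = pd.DataFrame(
        {event: [jensenshannon(binned[event], MODALITIES[m], base=2)
                 for m in MODALITIES.columns]
         for event in binned.columns},
        index=MODALITIES.columns,
    )
    # Step 3: the closest modality for each event
    return sqrt_jsd.idxmin()


psi = pd.DataFrame({"low": [0.0, 0.05, 0.1, 0.02],
                    "switch": [0.0, 1.0, 0.0, 1.0],
                    "mid": [0.5, 0.45, 0.55, 0.6]})
print(fit_transform_sketch(psi))
```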

modalities_bins = array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 1, 1]])

modalities_names = ['excluded', 'middle', 'included', 'bimodal', 'uniform']

sqrt_jsd_modalities(binned)

Calculate the JSD between each binned splicing event and each of the true modalities

The square root of the JSD is used because, unlike the JSD itself, it is a metric: it satisfies the triangle inequality.
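For reference, the standard definition is given below. P is an event's binned distribution, and Q is a modality's bins normalized to sum to 1. The base-2 logarithm is an assumption; it bounds the JSD to [0, 1]. Terms with P_i = 0 contribute 0.

```latex
M = \tfrac{1}{2}(P + Q), \qquad
D_{\mathrm{KL}}(P \,\|\, M) = \sum_{i} P_i \log_2 \frac{P_i}{M_i}

\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, M)
                         + \tfrac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M),
\qquad 0 \le \mathrm{JSD}(P \,\|\, Q) \le 1
```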

Parameters:

binned : pandas.DataFrame

A (n_bins, n_events) sized DataFrame of binned splicing events

Returns:

sqrt_jsd : pandas.DataFrame

A (n_modalities, n_events) sized DataFrame of the square root JSD between splicing events and all modalities
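SciPy's `scipy.spatial.distance.jensenshannon` already returns the square root of the divergence. It also normalizes its inputs, so the unnormalized 0/1 modality bins can be passed in directly. The sketch below relies on that behavior. `sqrt_jsd_to_modalities` is a hypothetical name, and base 2 is an assumed choice of logarithm.

```python
import pandas as pd
from scipy.spatial.distance import jensenshannon


def sqrt_jsd_to_modalities(binned, modalities):
    """Square-root JSD between every event and every modality (sketch).

    binned:     (n_bins, n_events) DataFrame of per-event bin fractions
    modalities: (n_bins, n_modalities) DataFrame, e.g. true_modalities
    Returns an (n_modalities, n_events) DataFrame.
    """
    # jensenshannon() rescales each input to sum to 1 and returns sqrt(JSD)
    return pd.DataFrame(
        {
            event: [jensenshannon(binned[event], modalities[m], base=2)
                    for m in modalities.columns]
            for event in binned.columns
        },
        index=modalities.columns,
    )


modalities = pd.DataFrame(
    {"excluded": [1, 0, 0], "middle": [0, 1, 0], "included": [0, 0, 1],
     "bimodal": [1, 0, 1], "uniform": [1, 1, 1]}, dtype=float)
binned = pd.DataFrame({"e1": [1.0, 0.0, 0.0], "e2": [0.5, 0.0, 0.5]})
print(sqrt_jsd_to_modalities(binned, modalities))
```

With base 2, an event that is entirely excluded sits at distance 1 from "included", the maximum possible.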

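true_modalities =

   excluded  middle  included  bimodal  uniform
0         1       0         0        1        1
1         0       1         0        0        1
2         0       0         1        1        1

Rows are the three bins and columns are the five modalities. This is the transpose of modalities_bins.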

flotilla.compute.splicing.get_switchy_score_order(x)

Compute the switchy score of each event (row) in a 2-D array of data scores, and return the event indices in switchy-score order

Parameters:

x : numpy.array

A 2-D numpy array in the shape [n_events, n_samples]

Returns:

numpy.array

A 1-D array of the ordered indices, in switchy score order
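Below is a sketch of the ordering step. It assumes the switchy score is computed per row (event) and that indices are sorted in ascending order of score. The score formula inside `_switchy` is an assumption built from the description of switchy_score, not taken from flotilla's source. Both `_switchy` and `switchy_score_order` are hypothetical names.

```python
import numpy as np


def _switchy(row):
    # ASSUMED score: (1 - std of sin(pi*x)) * mean of -cos(pi*x), NaNs ignored
    row = row[~np.isnan(row)]
    return (1 - np.std(np.sin(np.pi * row))) * -np.mean(np.cos(np.pi * row))


def switchy_score_order(x):
    """Row indices of an [n_events, n_samples] array, sorted by ascending score."""
    # apply_along_axis with axis=1 calls _switchy once per row (event)
    scores = np.apply_along_axis(_switchy, 1, np.asarray(x, dtype=float))
    return np.argsort(scores)


x = [[1.0, 1.0, 1.0],   # always included
     [0.0, 0.0, 0.0],   # always excluded
     [0.5, 0.5, 0.5]]   # always in the middle
print(switchy_score_order(x))
```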

flotilla.compute.splicing.switchy_score(array)

Transform a 1D array of data scores to a vector of “switchy scores”

Calculates the standard deviation and mean of sine- and cosine-transformed versions of the array. This ordering is better than sorting by the mean alone, which does not push events with very low variance to the ends.
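The description only says that the score uses the standard deviation and mean of sine- and cosine-transformed values. One plausible combination, offered as an assumption, multiplies two terms. The consistency term is 1 minus the std of sin(πx), and the level term is the mean of -cos(πx). Low-variance events near PSI 0 or 1 then land at the extremes (-1 or +1). `switchy_score_sketch` is a hypothetical name.

```python
import numpy as np


def switchy_score_sketch(array):
    """One ASSUMED switchy score for a 1-D array-like of PSI values in [0, 1]."""
    x = np.asarray(array, dtype=float)  # lists are accepted, as in the original
    x = x[~np.isnan(x)]                 # ignore missing samples
    # -cos(pi*x) maps 0 -> -1, 0.5 -> 0, 1 -> +1, so its mean tracks the PSI level
    level = -np.mean(np.cos(np.pi * x))
    # sin(pi*x) is 0 at both ends and 1 at 0.5; a low std rewards events whose
    # samples sit at a consistent distance from the ends
    consistency = 1 - np.std(np.sin(np.pi * x))
    return consistency * level


print(switchy_score_sketch([0.0, 0.0, 0.0]))
print(switchy_score_sketch([1.0, 1.0, 1.0]))
```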

Parameters:

array : numpy.array

A 1-D numpy array or something that could be cast as such (like a list)

Returns:

float

The “switchy score” of the input, which can then be compared with the switchy scores of other splicing events