flotilla.compute.splicing module

class flotilla.compute.splicing.Modalities(excluded_max=0.2, included_min=0.8)

Bases: object

Estimate the modality of a splicing event

This is based on the “percent spliced-in” (PSI) score of a splicing event. For example, in a cassette exon event, the fraction of transcripts that use the cassette exon gives the PSI (\(\Psi\)) score of that splicing event.

Possible modalities include:

- Excluded (most cells have excluded the exon)
- Middle (most cells have both the included and excluded isoforms)
- Included (most cells have included the exon)
- Bimodal (approximately a 50:50 distribution of inclusion:exclusion)
- Uniform (uniform distribution of exon usage)

These modalities are calculated by binning the PSI scores of each splicing event across all cells into the intervals defined by (0, excluded_max, included_min, 1), then finding the Jensen-Shannon divergence (JSD) between that event's binned distribution and each of the five modalities.

Parameters:

excluded_max : float, optional (default=0.2)

Maximum value of excluded bin

included_min : float, optional (default=0.8)

Minimum value of included bin
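
The binning step described above can be sketched as follows. This is only an illustration, not flotilla's own implementation: the `psi` values, the `bin_event` helper, and the bin labels are assumptions made for the example, and `numpy.histogram` stands in for whatever binning the class performs internally.

```python
import numpy as np
import pandas as pd

# Bin edges assembled from the two class parameters
excluded_max, included_min = 0.2, 0.8
bin_edges = [0, excluded_max, included_min, 1]

# Made-up PSI scores: rows are cells (samples), columns are splicing events
psi = pd.DataFrame({
    "event_excluded": [0.01, 0.05, 0.10, 0.02],
    "event_bimodal": [0.02, 0.98, 0.05, 0.95],
    "event_middle": [0.45, 0.55, 0.50, 0.60],
})


def bin_event(values, bin_edges):
    """Hypothetical helper: fraction of non-missing cells in each PSI bin."""
    counts, _ = np.histogram(values.dropna(), bins=bin_edges)
    return pd.Series(counts / counts.sum(),
                     index=["excluded_bin", "middle_bin", "included_bin"])


# Result has shape (n_bins, n_events)
binned = pd.DataFrame({name: bin_event(col, bin_edges)
                       for name, col in psi.items()})
print(binned)
```

Each column of `binned` sums to 1, so it can be compared directly against a normalized modality pattern such as [1, 0, 1] for bimodal.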

assignments(sqrt_jsd_modalities)

Return, for each event, the modality with the smallest square root JSD to that event

Parameters:

sqrt_jsd_modalities : pandas.DataFrame

A modalities x features dataframe of the square root Jensen-Shannon divergence between each event and each modality

Returns:

assignments : pandas.Series

The closest modality to each splicing event
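
Picking the closest modality per event amounts to a column-wise argmin. The sketch below uses `pandas.DataFrame.idxmin` on made-up distances; whether the method is implemented this way internally is an assumption, and the event names and values are purely illustrative.

```python
import pandas as pd

modalities_names = ["excluded", "middle", "included", "bimodal", "uniform"]

# Illustrative (made-up) distances: rows are modalities, columns are events
sqrt_jsd_modalities = pd.DataFrame(
    {
        "event_1": [0.05, 0.90, 1.00, 0.55, 0.60],
        "event_2": [0.60, 0.95, 0.58, 0.10, 0.45],
        "event_3": [0.70, 0.40, 0.72, 0.65, 0.08],
    },
    index=modalities_names,
)

# For each column (event), take the row label (modality) of the minimum value
assignments = sqrt_jsd_modalities.idxmin(axis=0)
print(assignments)
```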

counts(psi, bootstrapped=False, bootstrapped_kws=None)

Return the number of events in each modality category

Parameters:

psi : pandas.DataFrame

A samples x features dataframe of PSI scores, with one column per splicing event

Returns:

counts : pandas.Series

Counts of each modality
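
Given per-event assignments, counting them is a one-liner in pandas. The sketch below assumes made-up assignments and additionally reindexes so that modalities with no events show a count of 0; whether flotilla reports empty categories is not stated here, so treat that step as a choice made for the example.

```python
import pandas as pd

modalities_names = ["excluded", "middle", "included", "bimodal", "uniform"]

# Made-up modality assignments, one per splicing event
assignments = pd.Series(
    ["excluded", "bimodal", "included", "included", "bimodal", "included"],
    index=["event_%d" % i for i in range(1, 7)],
)

# Count events per modality; reindex so missing modalities appear with 0
counts = assignments.value_counts().reindex(modalities_names, fill_value=0)
print(counts)
```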

fit_transform(*args, **kwargs)

Given PSI scores, estimate the modality of each splicing event (feature)

Parameters:

data : pandas.DataFrame

A samples x features dataframe, where you want to find the splicing modality of each column (feature)

bootstrapped : bool

Whether or not to use bootstrapping, i.e. resample each splicing event several times to get a better estimate of its true modality. Default False.

bootstrapped_kws : dict

Valid arguments to _bootstrapped_fit_transform. If None, default is dict(n_iter=100, thresh=0.6, minimum_samples=10)

Returns:

assignments : pandas.Series

Modality assignments of each column (feature)
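
To make the bootstrapping idea concrete, here is a rough sketch for a single event. It is not `_bootstrapped_fit_transform`: the `bootstrap_modality` helper is hypothetical, and the meanings given to `thresh` (minimum fraction of resamples that must agree) and `minimum_samples` (minimum number of non-missing cells) are assumptions inferred from the parameter names only.

```python
import numpy as np

MODALITIES = {
    "excluded": [1, 0, 0],
    "middle": [0, 1, 0],
    "included": [0, 0, 1],
    "bimodal": [1, 0, 1],
    "uniform": [1, 1, 1],
}
BIN_EDGES = [0, 0.2, 0.8, 1]


def sqrt_jsd(p, q):
    """Square root of the base-2 Jensen-Shannon divergence."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))


def closest_modality(values):
    """Bin one event's PSI values and return the nearest modality name."""
    counts, _ = np.histogram(values, bins=BIN_EDGES)
    return min(MODALITIES, key=lambda name: sqrt_jsd(counts, MODALITIES[name]))


def bootstrap_modality(values, n_iter=100, thresh=0.6, minimum_samples=10,
                       seed=0):
    """Hypothetical bootstrap: resample cells, vote, keep confident calls."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]
    if len(values) < minimum_samples:
        return None  # assumed behavior: too few cells to call a modality
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_iter):
        resampled = rng.choice(values, size=len(values), replace=True)
        name = closest_modality(resampled)
        votes[name] = votes.get(name, 0) + 1
    best = max(votes, key=votes.get)
    # Assumed meaning of thresh: fraction of iterations that must agree
    return best if votes[best] / n_iter >= thresh else None


print(bootstrap_modality([0.01] * 12))
print(bootstrap_modality([0.02] * 10 + [0.97] * 10))
```

Resampling with replacement shows how stable an assignment is under sampling noise: an event whose call flips between modalities across resamples is left unassigned in this sketch.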

modalities_bins = array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 1, 1]])

modalities_names = ['excluded', 'middle', 'included', 'bimodal', 'uniform']

sqrt_jsd_modalities(binned)

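Calculate the square root of the JSD between every binned splicing event and each true modality

The square root of the JSD is used because, unlike the JSD itself, it satisfies the triangle inequality and is therefore a true metric.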

Parameters:

binned : pandas.DataFrame

A (n_bins, n_events) sized DataFrame of binned splicing events

Returns:

sqrt_jsd : pandas.DataFrame

A (n_modalities, n_events) sized DataFrame of the square root JSD between splicing events and all modalities
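
As an illustration of the distance computation, the sketch below implements the square root of the base-2 JSD directly in NumPy and applies it to two made-up binned events. The helper names (`kl_divergence`, `sqrt_jsd`) and the explicit normalization of the modality patterns are assumptions for this example, not necessarily how flotilla computes it.

```python
import numpy as np
import pandas as pd


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms where p == 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def sqrt_jsd(p, q):
    """Square root of the Jensen-Shannon divergence (base 2, so in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize to probability distributions
    m = 0.5 * (p + q)
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return float(np.sqrt(max(jsd, 0.0)))  # clamp tiny negative rounding error


modalities_names = ["excluded", "middle", "included", "bimodal", "uniform"]
modalities_bins = np.array(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 1, 1]])
true_modalities = pd.DataFrame(modalities_bins.T, columns=modalities_names)

# Made-up binned events, shape (n_bins, n_events)
binned = pd.DataFrame({
    "event_1": [1.0, 0.0, 0.0],
    "event_2": [0.5, 0.0, 0.5],
})

# Shape (n_modalities, n_events): one distance per modality/event pair
sqrt_jsd_table = pd.DataFrame({
    event: true_modalities.apply(lambda modality: sqrt_jsd(column, modality))
    for event, column in binned.items()
})
print(sqrt_jsd_table)
```

With base-2 logarithms the distance is 0 for identical distributions and 1 for distributions with no overlapping bins, which makes the table easy to read.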

true_modalities =

   excluded  middle  included  bimodal  uniform
0         1       0         0        1        1
1         0       1         0        0        1
2         0       0         1        1        1

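
The values of `true_modalities` are the transpose of `modalities_bins`, with `modalities_names` as column labels. A minimal sketch that reproduces the same values (the exact construction used in the class is an assumption):

```python
import numpy as np
import pandas as pd

modalities_bins = np.array(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 1, 1]])
modalities_names = ["excluded", "middle", "included", "bimodal", "uniform"]

# Transpose so that rows are bins (0, 1, 2) and columns are modalities
true_modalities = pd.DataFrame(modalities_bins.T, columns=modalities_names)
print(true_modalities)
```
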
flotilla.compute.splicing.get_switchy_score_order(x)

Compute switchy scores for a 2-D array of data scores and return the indices that order it by those scores

Parameters:

x : numpy.array

A 2-D numpy array in the shape [n_events, n_samples]

Returns:

numpy.array

A 1-D array of the ordered indices, in switchy score order
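
The ordering step can be sketched with `numpy.apply_along_axis` and `numpy.argsort`. In this example `order_by_score` is a hypothetical helper, the scoring function is a plain mean used as a stand-in rather than the switchy score itself, and scoring one row per event is an assumption based on the documented [n_events, n_samples] shape.

```python
import numpy as np


def order_by_score(x, score_func):
    """Hypothetical helper: indices of events sorted by ascending score."""
    x = np.asarray(x, dtype=float)
    # Apply the scoring function to each row, giving one score per event
    scores = np.apply_along_axis(score_func, 1, x)
    return np.argsort(scores)


x = np.array([
    [0.9, 0.8, 1.0],  # mostly included
    [0.1, 0.0, 0.2],  # mostly excluded
    [0.5, 0.4, 0.6],  # intermediate
])

# Stand-in score: the mean of each row, ignoring NaNs
order = order_by_score(x, np.nanmean)
print(order)
```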

flotilla.compute.splicing.switchy_score(array)

Transform a 1D array of data scores to a vector of “switchy scores”

Calculates the standard deviation and mean of sine- and cosine-transformed versions of the array. This works better than sorting by the mean alone, which does not push the events with very low variance to the ends of the ordering.

Parameters:

array : numpy.array

A 1-D numpy array or something that could be cast as such (like a list)

Returns:

float

The “switchy score” of the array, which can then be compared to the switchy scores of other splicing events
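
One plausible way to combine a sine-based spread term with a cosine-based mean term is sketched below. The exact formula is an assumption: `switchy_score_sketch` is a hypothetical name, and the scaling by pi and the way the two terms are multiplied are illustrative choices, not a statement of what `switchy_score` computes.

```python
import numpy as np


def switchy_score_sketch(array):
    """Hypothetical score combining spread and direction of PSI values."""
    array = np.asarray(array, dtype=float)
    array = array[~np.isnan(array)]  # ignore missing values
    # sin(pi * psi) is 0 at psi = 0 or 1 and 1 at psi = 0.5; a low standard
    # deviation of this transform gives a factor close to 1
    spread = 1 - np.std(np.sin(array * np.pi))
    # -cos(pi * psi) runs from -1 (psi = 0) to +1 (psi = 1)
    direction = -np.mean(np.cos(array * np.pi))
    return spread * direction


print(switchy_score_sketch([0.0, 0.0, 0.0]))    # constant, fully excluded
print(switchy_score_sketch([1.0, 1.0, 1.0]))    # constant, fully included
print(switchy_score_sketch([0.75, 0.75]))       # constant, mean 0.75
print(switchy_score_sketch([0.5, 1.0]))         # variable, same mean 0.75
```

In this sketch two events with the same mean PSI get different scores: the constant one lands further from zero than the variable one, which is the behavior the description attributes to sorting by switchy score rather than by the mean.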

Olga B. Botvinnik is funded by the NDSEG fellowship and is a NumFOCUS John Hunter Technology Fellow.
Michael T. Lovci was partially funded by a fellowship from Genentech.
Partially funded by NIH grants NS075449 and HG004659 and CIRM grants RB4-06045 and TR3-05676 to Gene Yeo.