outrigger.psi.compute module

outrigger.psi.compute.calculate_psi(event_annotation, reads2d, isoform1_junctions, isoform2_junctions, min_reads=10, method='mean', uneven_coverage_multiplier=10, n_jobs=-1)

Compute percent-spliced-in of events based on junction reads

Parameters:

event_annotation : pandas.DataFrame

A table where each row represents a single splicing event. The required columns are the ones specified in isoform1_junctions, isoform2_junctions, and event_col.

reads2d : pandas.DataFrame

An (n_samples, n_total_junctions) table of the number of reads observed at each exon-exon splice junction in each sample. This table can be very large, e.g. 1000 samples x 50,000 junctions = 50 million elements.

isoform1_junctions : list

Columns in event_annotation which represent junctions that correspond to isoform1, the Psi=0 isoform, e.g. ['junction13'] for SE (the junction between exon1 and exon3)

isoform2_junctions : list

Columns in event_annotation which represent junctions that correspond to isoform2, the Psi=1 isoform, e.g. ['junction12', 'junction23'] for SE (the junction between exon1 and exon2, and the junction between exon2 and exon3)

min_reads : int, optional

Minimum number of reads for a junction to be viable. The rules governing compatibility of events are complex, and it is recommended to read the documentation for outrigger psi. (default=10)

method : "mean" | "min", optional

Method used to aggregate junction reads from the same isoform: either the mean or the minimum. (default="mean")

uneven_coverage_multiplier : int, optional

Maximum factor by which the read count of one junction may exceed that of another junction of the same isoform before the event is rejected. For example, for an SE event with two junctions, junction12 and junction23, if junction12=40 but junction23=500, then the event is rejected because 500 > 40*10. (default=10)

n_jobs : int, optional

Number of subprocesses to create. The default, -1, uses as many processes/cores as possible.
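The uneven_coverage_multiplier example above can be checked with a few lines of Python. This is a minimal sketch of the stated rule only: the helper name is_uneven is hypothetical and not part of outrigger, and the actual filtering in outrigger psi applies further rules (such as min_reads) that are not modeled here.

```python
def is_uneven(junction_reads, multiplier=10):
    """Return True if the largest junction count exceeds the smallest
    count times `multiplier`.

    Hypothetical helper for illustration; not outrigger's API.
    """
    low, high = min(junction_reads), max(junction_reads)
    return high > low * multiplier


# junction12=40, junction23=500: rejected because 500 > 40*10
rejected = is_uneven([40, 500])

# junction12=40, junction23=400: kept because 400 is not greater than 40*10
kept = not is_uneven([40, 400])
```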

Returns:

psi : pandas.DataFrame

An (n_samples, n_events) DataFrame of the percent spliced-in values

summary : pandas.DataFrame

An (n_samples * n_events, 7) shaped table with the sample id, junction reads, percent spliced-in (Psi), and notes for each event in each sample that explain why Psi was or was not calculated
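As a rough illustration of how the method parameter affects the result, the sketch below computes Psi for a single SE event in a single sample. It assumes Psi is the aggregated isoform2 reads divided by the sum of the aggregated isoform1 and isoform2 reads, which is consistent with isoform1 being the Psi=0 isoform and isoform2 the Psi=1 isoform. The helper sketch_psi and the read counts are hypothetical, and the min_reads and uneven coverage filters are deliberately left out.

```python
from statistics import mean


def sketch_psi(isoform1_reads, isoform2_reads, method="mean"):
    """Toy Psi for one event in one sample.

    Hypothetical helper, not outrigger's implementation. Assumes
    Psi = isoform2 / (isoform1 + isoform2) after aggregating each
    isoform's junction reads with the mean or the minimum.
    """
    aggregate = {"mean": mean, "min": min}[method]
    iso1 = aggregate(isoform1_reads)
    iso2 = aggregate(isoform2_reads)
    total = iso1 + iso2
    if total == 0:
        # No reads on either isoform, so Psi is undefined
        return None
    return iso2 / total


# SE event: junction13 supports isoform1; junction12 and junction23 support isoform2
psi_mean = sketch_psi([20], [50, 70], method="mean")  # 60 / (20 + 60) = 0.75
psi_min = sketch_psi([20], [50, 70], method="min")    # 50 / (20 + 50)
```

With these made-up counts, "min" gives a lower Psi than "mean" because it is driven by the less-covered of the two inclusion junctions.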