outrigger.psi.compute module
outrigger.psi.compute.calculate_psi(event_annotation, reads2d, isoform1_junctions, isoform2_junctions, min_reads=10, method='mean', uneven_coverage_multiplier=10, n_jobs=-1)
Compute percent-spliced-in of events based on junction reads.
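Because isoform1 is the Psi=0 isoform and isoform2 is the Psi=1 isoform (see the parameters below), Psi for one event in one sample can be read as the fraction of aggregated junction reads that support isoform2. The sketch below assumes exactly that, with each isoform's junction reads combined by the mean or the minimum as the method parameter describes. The helper psi_single is hypothetical and not part of outrigger, and returning NaN when there are no reads at all is an assumption.

```python
from statistics import mean


def psi_single(isoform1_reads, isoform2_reads, method="mean"):
    """Hypothetical helper: Psi for one event in one sample.

    isoform1_reads: read counts on the Psi=0 isoform's junctions, e.g. [junction13]
    isoform2_reads: read counts on the Psi=1 isoform's junctions, e.g. [junction12, junction23]
    method: "mean" or "min", how to combine junctions of the same isoform
    """
    aggregate = {"mean": mean, "min": min}[method]
    isoform1 = aggregate(isoform1_reads)
    isoform2 = aggregate(isoform2_reads)
    total = isoform1 + isoform2
    if total == 0:
        # No reads on either isoform: Psi is undefined (assumed NaN here)
        return float("nan")
    # Fraction of the aggregated reads that support the Psi=1 isoform
    return isoform2 / total


# SE event: 20 reads on junction13, 30 and 50 reads on junction12 and junction23
print(psi_single([20], [30, 50]))         # mean: 40 / (20 + 40)
print(psi_single([20], [30, 50], "min"))  # min: 30 / (20 + 30)
```

With method="min", the weaker of the two inclusion junctions sets the isoform2 count, which makes Psi more conservative when one junction is poorly covered.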
Parameters:
event_annotation : pandas.DataFrame
A table where each row represents a single splicing event. The required columns are the ones specified in isoform1_junctions, isoform2_junctions, and event_col.
reads2d : pandas.DataFrame
A (n_samples, n_total_junctions) table of the number of reads observed at each exon-exon splice junction in each sample. This table can be very large, e.g. 1000 samples x 50,000 junctions = 50 million elements.
isoform1_junctions : list
Columns in event_annotation which represent the junctions of isoform1, the Psi=0 isoform, e.g. ['junction13'] for a skipped exon (SE) event (the junction between exon1 and exon3).
isoform2_junctions : list
Columns in event_annotation which represent the junctions of isoform2, the Psi=1 isoform, e.g. ['junction12', 'junction23'] for an SE event (the junction between exon1 and exon2 and the junction between exon2 and exon3).
min_reads : int, optional
Minimum number of reads for a junction to be viable. The rules governing compatibility of events are complex, and it is recommended to read the documentation for outrigger psi. (default=10)
method : "mean" | "min", optional
How to aggregate the reads of junctions from the same isoform: either the mean or the minimum. (default="mean")
uneven_coverage_multiplier : int, optional
Scale factor for how many times more reads one junction of an isoform may have than another junction of the same isoform before the event is rejected. For example, for an SE event with two inclusion junctions, junction12 and junction23, if junction12=40 but junction23=500, the event is rejected because 500 > 40*10. (default=10)
n_jobs : int, optional
Number of subprocesses to create. The default, -1, uses as many processes (cores) as possible.
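The docstring defers the full event-compatibility rules to the outrigger psi documentation, so the following is only a partial sketch of the two per-junction checks that min_reads and uneven_coverage_multiplier describe. Both function names are hypothetical, and treating a junction with exactly min_reads reads as viable (an inclusive >= cutoff) is an assumption.

```python
def viable_junctions(reads, min_reads=10):
    """Hypothetical helper: flag junctions whose read count reaches min_reads.

    Whether the cutoff is inclusive (>=) is an assumption.
    """
    return [r >= min_reads for r in reads]


def has_uneven_coverage(reads, uneven_coverage_multiplier=10):
    """Hypothetical helper: True if one junction of an isoform has more than
    uneven_coverage_multiplier times the reads of another junction."""
    return len(reads) > 1 and max(reads) > min(reads) * uneven_coverage_multiplier


# Example from the uneven_coverage_multiplier description:
# junction12=40, junction23=500 -> 500 > 40*10, so the event is rejected
print(has_uneven_coverage([40, 500]))   # True
print(has_uneven_coverage([40, 300]))   # False, since 300 <= 400
print(viable_junctions([40, 500, 3]))   # [True, True, False]
```

How these per-junction flags combine into an accept or reject decision for a whole event is deliberately left out, since the description above does not spell out those rules.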
Returns:
psi : pandas.DataFrame
A (n_samples, n_events) table of the percent spliced-in values.
summary : pandas.DataFrame
A (n_samples * n_events, 7) table with the sample ID, junction reads, percent spliced-in (Psi), and notes for each event in each sample explaining why Psi was or was not calculated.
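A toy example of the input and output orientation: reads2d has one row per sample and one column per junction, while psi has one row per sample and one column per event. The sketch assumes that each junction column of event_annotation holds a junction ID that is also a column label of reads2d. The junction IDs, event ID, and the toy_psi function are hypothetical; it aggregates with the mean, applies none of the min_reads or uneven-coverage filtering, runs serially, and does not build the summary table.

```python
import pandas as pd

# Toy reads2d: (n_samples, n_total_junctions) read counts.
# Junction IDs "j_a", "j_b", "j_c" are hypothetical placeholders.
reads2d = pd.DataFrame(
    {"j_a": [20, 0], "j_b": [30, 12], "j_c": [50, 18]},
    index=["sample1", "sample2"],
)

# Toy event_annotation: one row per event; each junction column holds a
# junction ID assumed to be a column label of reads2d.
event_annotation = pd.DataFrame(
    {"junction12": ["j_b"], "junction13": ["j_a"], "junction23": ["j_c"]},
    index=["se_event1"],
)

isoform1_junctions = ["junction13"]                # Psi=0 isoform
isoform2_junctions = ["junction12", "junction23"]  # Psi=1 isoform


def toy_psi(event_annotation, reads2d, isoform1_junctions, isoform2_junctions):
    """Hypothetical, unfiltered, mean-aggregated Psi as an (n_samples, n_events) table."""
    psi = pd.DataFrame(index=reads2d.index, columns=event_annotation.index, dtype=float)
    for event_id, row in event_annotation.iterrows():
        # Look up each isoform's junction IDs, then average their reads per sample
        isoform1 = reads2d[row[isoform1_junctions].tolist()].mean(axis=1)
        isoform2 = reads2d[row[isoform2_junctions].tolist()].mean(axis=1)
        total = isoform1 + isoform2
        # Leave Psi as NaN for samples with no reads on either isoform
        psi[event_id] = (isoform2 / total).where(total > 0)
    return psi


psi = toy_psi(event_annotation, reads2d, isoform1_junctions, isoform2_junctions)
print(psi)
```

Here sample1 gets 40 / (20 + 40), about 0.667, and sample2 gets 15 / (0 + 15) = 1.0, giving a (2, 1) psi table.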