outrigger.io.star module

Read splice junction output files (SJ.out.tab) produced by the STAR aligner

outrigger.io.star.int_to_junction_motif(n)
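
This entry carries no description. As a minimal sketch of what such a conversion could look like, the block below maps the integer motif codes that STAR writes in column 5 of SJ.out.tab to their dinucleotide labels. The names STAR_INTRON_MOTIFS and motif_code_to_label are hypothetical, and the strings that int_to_junction_motif actually returns may differ.

```python
# Integer codes used by STAR in column 5 (intron motif) of SJ.out.tab.
STAR_INTRON_MOTIFS = {
    0: "non-canonical",
    1: "GT/AG",
    2: "CT/AC",
    3: "GC/AG",
    4: "CT/GC",
    5: "AT/AC",
    6: "GT/AT",
}


def motif_code_to_label(n):
    """Hypothetical helper: translate a STAR motif code to a label.

    This is an illustration, not outrigger's implementation.
    """
    # int() lets the helper accept codes read as strings from a file
    return STAR_INTRON_MOTIFS[int(n)]


label = motif_code_to_label(1)
```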
outrigger.io.star.make_metadata(spliced_reads, columns=('junction_id', 'chrom', 'junction_start', 'junction_stop', 'strand', 'annotated', 'exon_start', 'exon_stop'))

Get bare-bones junction information: chrom, start, stop, and strand

Parameters:

spliced_reads : pandas.DataFrame

Concatenated SJ.out.tab files created by read_sj_out_tab

columns : iterable

Which columns to use to make the metadata

Returns:

junctions : pandas.DataFrame

A (n_junctions, 9) dataframe containing the columns:
  • junction_id
  • chrom
  • intron_start
  • intron_stop
  • exon_start
  • exon_stop
  • strand
  • intron_motif
  • annotated
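
One way to picture this step: the concatenated reads table has one row per junction per sample, while the metadata has one row per junction. The sketch below keeps only the requested columns and drops duplicate rows. It is an illustration under assumptions, not the library's code: make_metadata_sketch, the junction_id format, and the sample_id and reads columns in the toy data are all hypothetical.

```python
import pandas as pd


def make_metadata_sketch(spliced_reads, columns):
    """Hypothetical stand-in: one row per junction, metadata columns only."""
    # Select the metadata columns, then collapse rows repeated across samples
    return (
        spliced_reads.loc[:, list(columns)]
        .drop_duplicates()
        .reset_index(drop=True)
    )


# Toy table: the first junction is observed in two samples
spliced_reads = pd.DataFrame(
    {
        "junction_id": ["chr1:100-200:+", "chr1:100-200:+", "chr1:300-400:+"],
        "chrom": ["chr1", "chr1", "chr1"],
        "junction_start": [100, 100, 300],
        "junction_stop": [200, 200, 400],
        "strand": ["+", "+", "+"],
        "annotated": [True, True, False],
        "sample_id": ["cellA", "cellB", "cellA"],
        "reads": [9, 4, 2],
    }
)

metadata = make_metadata_sketch(
    spliced_reads,
    columns=("junction_id", "chrom", "junction_start", "junction_stop",
             "strand", "annotated"),
)
```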
outrigger.io.star.read_multiple_sj_out_tab(filenames, ignore_multimapping=False, sample_id_func=os.path.basename, n_jobs=-1)

Read the splice junction files and return a tall, tidy dataframe

Adds a column called “sample_id” based on the basename of the file, minus “SJ.out.tab”

Parameters:

filenames : iterable

A list or other iterable of filenames to read

ignore_multimapping : bool

If True, exclude multimapped reads from the total read count. With the default, False, multimapped reads are included

sample_id_func : function

A function to extract the sample id from the filenames

Returns:

metadata : pandas.DataFrame

A tidy dataframe, where each row has the observed reads for a sample
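
The tall, tidy layout can be sketched as follows: read each file, tag its rows with a sample id derived from the filename, stack the tables, and compute a total read count. This is only a sketch under assumptions. The function names, the reads column, the SJ_COLUMNS labels, and the stripping of a trailing "." or "_" from the sample id are hypothetical, and the sketch reads files serially rather than in parallel.

```python
import os
import tempfile

import pandas as pd

# Illustrative labels for the nine header-less columns of SJ.out.tab
SJ_COLUMNS = ["chrom", "junction_start", "junction_stop", "strand",
              "junction_motif", "annotated", "unique_junction_reads",
              "multimap_junction_reads", "max_overhang"]
SUFFIX = "SJ.out.tab"


def sample_id_sketch(filename):
    """Basename minus the "SJ.out.tab" suffix (hypothetical helper)."""
    base = os.path.basename(filename)
    if base.endswith(SUFFIX):
        base = base[: -len(SUFFIX)]
    # Assumption: also drop a separator left over from STAR's output prefix
    return base.rstrip("._")


def read_many_sketch(filenames, ignore_multimapping=False):
    frames = []
    for filename in filenames:
        sj = pd.read_csv(filename, sep="\t", header=None,
                         names=SJ_COLUMNS, dtype={"chrom": str})
        sj["sample_id"] = sample_id_sketch(filename)
        frames.append(sj)
    tidy = pd.concat(frames, ignore_index=True)
    # Total reads: unique only, or unique plus multimapped
    reads = tidy["unique_junction_reads"]
    if not ignore_multimapping:
        reads = reads + tidy["multimap_junction_reads"]
    tidy["reads"] = reads
    return tidy


# Write two tiny example files so the sketch runs end to end
with tempfile.TemporaryDirectory() as tmp:
    rows = {"cellA.SJ.out.tab": "chr1\t100\t200\t1\t1\t1\t7\t2\t30\n",
            "cellB.SJ.out.tab": "chr1\t100\t200\t1\t1\t1\t4\t0\t25\n"}
    paths = []
    for name, row in rows.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(row)
        paths.append(path)
    with_multi = read_many_sketch(paths)
    unique_only = read_many_sketch(paths, ignore_multimapping=True)
```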

outrigger.io.star.read_sj_out_tab(filename)

Read an SJ.out.tab file as produced by the RNA-STAR aligner into a pandas DataFrame

Parameters:

filename : str of filename or file handle

Filename of the SJ.out.tab file you want to read in

Returns:

sj : pandas.DataFrame

Dataframe of splice junctions with the columns ('chrom', 'junction_start', 'junction_stop', 'strand', 'junction_motif', 'exon_start', 'exon_stop', 'annotated', 'unique_junction_reads', 'multimap_junction_reads', 'max_overhang')
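
For orientation, SJ.out.tab is a tab-separated file with nine columns and no header row. A minimal sketch of loading one with pandas is shown below, assuming the column labels listed above. read_sj_sketch is a hypothetical name, the "." label for an undefined strand is an illustrative choice, and the exon_start and exon_stop columns are left out because their derivation is not described here.

```python
import io

import pandas as pd

# Labels for the nine columns STAR writes, in file order
SJ_COLUMNS = ["chrom", "junction_start", "junction_stop", "strand",
              "junction_motif", "annotated", "unique_junction_reads",
              "multimap_junction_reads", "max_overhang"]


def read_sj_sketch(filename_or_handle):
    """Hypothetical reader: load an SJ.out.tab file into a DataFrame."""
    sj = pd.read_csv(filename_or_handle, sep="\t", header=None,
                     names=SJ_COLUMNS, dtype={"chrom": str})
    # STAR encodes strand as 0 (undefined), 1 (+), 2 (-)
    sj["strand"] = sj["strand"].map({0: ".", 1: "+", 2: "-"})
    # STAR encodes annotation status as 0 (unannotated) or 1 (annotated)
    sj["annotated"] = sj["annotated"].astype(bool)
    return sj


# In-memory stand-in for a file on disk; the values are made up
example = io.StringIO(
    "chr1\t14830\t14969\t2\t2\t1\t10\t3\t38\n"
    "chr1\t15039\t15795\t2\t2\t0\t5\t0\t40\n"
)
sj = read_sj_sketch(example)
```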