flotilla.compute.expression module

class flotilla.compute.expression.TwoWayGeneComparisonLocal(sample1_name, sample2_name, df, p_value_cutoff=0.001, local_fraction=0.1, bonferroni=True, fdr=None, dtype='RPKM')

Bases: object

Compare gene expression for two samples

Plots a scatter plot of sample1 versus sample2, taken from df. Identifies differentially expressed genes with a Z-test against the closest (local_fraction * 100)% of points. Stores the results of the statistical calculations in self.result_.
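The local Z-test described above can be illustrated with a minimal sketch. This is not flotilla's own implementation, and several details are assumptions: the helper name local_z_test is hypothetical, values are log2-transformed with a pseudocount of 1, "closest" is taken to mean closest in mean log2 expression, and each gene's log2 fold change is standardized against the fold changes of those neighbors.

```python
import math

import numpy as np


def local_z_test(x, y, local_fraction=0.1, pseudocount=1.0):
    """Hypothetical local Z-test between two expression vectors.

    Assumptions (not taken from flotilla): values are log2-transformed with a
    pseudocount, "closest" means closest in mean log2 expression, and each
    gene's log2 fold change is standardized against its neighbors.
    """
    log_x = np.log2(np.asarray(x, dtype=float) + pseudocount)
    log_y = np.log2(np.asarray(y, dtype=float) + pseudocount)
    mean_expr = (log_x + log_y) / 2.0
    fold_change = log_y - log_x
    n = len(fold_change)
    # Size of each local background: local_fraction of all genes (at least 2)
    k = min(n - 1, max(2, int(round(local_fraction * n))))
    z = np.zeros(n)
    for i in range(n):
        distance = np.abs(mean_expr - mean_expr[i])
        distance[i] = np.inf  # exclude the gene itself from its own background
        neighbors = np.argsort(distance)[:k]
        local = fold_change[neighbors]
        sd = local.std(ddof=1)
        if sd > 0:
            z[i] = (fold_change[i] - local.mean()) / sd
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p = np.array([math.erfc(abs(value) / math.sqrt(2)) for value in z])
    return z, p
```

With local_fraction=0.1 and 200 genes, each gene is compared against the 20 other genes whose mean expression is most similar to its own, so the test adapts to how noisy fold changes are at that expression level.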

Parameters:

sample1_name : str

Name of the first (control) sample. Must be a row name (index) in df. Plotted on the x-axis.

sample2_name : str

Name of the second (treatment) sample. Must be a row name (index) in df. Plotted on the y-axis.

df : pandas.DataFrame

A samples (rows) x features (columns) pandas DataFrame of expression values

p_value_cutoff : float, optional

Cutoff for the p-values. Default 0.001.

local_fraction : float, optional

Fraction of genes to use for the local Z-score calculation. Default 0.1.

bonferroni : bool, optional

Whether or not to apply the Bonferroni correction to the p-values. Default True.

fdr : ???, optional

Benjamini-Hochberg FDR filtering. Default None. Check the results and proceed with caution: this option sometimes breaks.

dtype : str, optional

Type of expression values in df. Default 'RPKM'.
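One common way a p-value cutoff and a Bonferroni flag combine is sketched below. The helper name flag_significant is hypothetical, and the sketch assumes the correction divides the cutoff by the number of genes tested (equivalently, multiplies each p-value by that number); the description above does not spell out how TwoWayGeneComparisonLocal applies it.

```python
import numpy as np


def flag_significant(p_values, p_value_cutoff=0.001, bonferroni=True):
    """Hypothetical helper: flag p-values that pass the cutoff.

    Assumes the Bonferroni correction divides the cutoff by the number of
    tests, which is equivalent to multiplying every p-value by that number.
    """
    p = np.asarray(p_values, dtype=float)
    cutoff = p_value_cutoff / len(p) if bonferroni else p_value_cutoff
    return p < cutoff
```

For the four p-values [1e-5, 5e-4, 0.01, 0.5], the corrected cutoff is 0.001 / 4 = 0.00025, so only the first passes; with bonferroni=False the first two pass.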

gstats()

Write general statistics of the two-way comparison to standard output

flotilla.compute.expression.benjamini_hochberg(p_values, fdr=0.1)

Benjamini-Hochberg correction for multiple hypothesis testing

From http://udel.edu/~mcdonald/statmultcomp.html:

One good technique for controlling the false discovery rate was briefly mentioned by Simes (1986) and developed in detail by Benjamini and Hochberg (1995). Put the individual P-values in order, from smallest to largest. The smallest P-value has a rank of i=1, the next has i=2, etc. Then compare each individual P-value to (i/m)Q, where m is the total number of tests and Q is the chosen false discovery rate. The largest P-value that has P < (i/m)Q is significant, and all P-values smaller than it are also significant.
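The step-up procedure quoted above can be sketched in a few lines of NumPy. This is a minimal illustration of the textbook procedure, not flotilla's benjamini_hochberg source; the name bh_significant is hypothetical. It uses the strict inequality P < (i/m)Q from the quote and returns a boolean array in the original input order.

```python
import numpy as np


def bh_significant(p_values, fdr=0.1):
    """Hypothetical Benjamini-Hochberg step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)  # indices that sort the p-values ascending
    ranked = p[order]
    thresholds = np.arange(1, m + 1) / m * fdr  # (i/m)Q for ranks i = 1..m
    below = ranked < thresholds  # P < (i/m)Q, strict as in the quote
    sigs = np.zeros(m, dtype=bool)
    if below.any():
        # Largest rank that passes; it and every smaller p-value are significant
        k = np.nonzero(below)[0].max()
        sigs[order[:k + 1]] = True
    return sigs
```

For p-values [0.07, 0.5, 0.01, 0.06] at fdr=0.1, the thresholds by rank are 0.025, 0.05, 0.075 and 0.1. The value 0.06 fails its own threshold at rank 2, but 0.07 passes at rank 3, so 0.01, 0.06 and 0.07 are all called significant.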

Parameters:

p_values : list

List of p-values

fdr : float, optional

Desired false-discovery rate cutoff. Default 0.1.

Returns:

sigs : numpy.array

Boolean array of whether or not the provided p-values are significant given the FDR cutoff

Olga B. Botvinnik is funded by the NDSEG fellowship and is a NumFOCUS John Hunter Technology Fellow.
Michael T. Lovci was partially funded by a fellowship from Genentech.
Partially funded by NIH grants NS075449 and HG004659 and CIRM grants RB4-06045 and TR3-05676 to Gene Yeo.