flotilla.compute.expression module

class flotilla.compute.expression.TwoWayGeneComparisonLocal(sample1_name, sample2_name, df, p_value_cutoff=0.001, local_fraction=0.1, bonferroni=True, fdr=None, dtype='RPKM')

Bases: object

Compare gene expression between two samples.

Plots a scatter plot of sample1 vs. sample2, taken from df. Identifies differentially expressed genes with a Z-test computed from the closest (local_fraction * 100)% of points. Stores the results of the statistical calculations in self.result_.
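The local Z-test idea can be sketched as follows. This is a hypothetical illustration, not flotilla's implementation: the function name `local_z_test`, measuring "closest" along the x-axis (sample1 values), using the per-gene difference y - x as the statistic, and the two-sided normal p-value are all assumptions. Real expression values are often log-transformed before a comparison like this.

```python
import math

import numpy as np


def local_z_test(x, y, local_fraction=0.1):
    """Hypothetical sketch of a local Z-test between two expression vectors.

    x, y: 1-D arrays of expression values for sample1 and sample2 (same genes).
    Assumption: a gene's reference set is the round(local_fraction * n) genes
    whose sample1 value (x) is closest to its own x value, the gene included.
    Returns (z_scores, two_sided_p_values).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # At least 2 neighbors are needed for a standard deviation, at most n.
    k = min(n, max(2, int(round(local_fraction * n))))
    diff = y - x  # assumed test statistic: per-gene difference between samples
    z = np.zeros(n)
    for i in range(n):
        # Indices of the k genes closest to gene i along the x-axis.
        neighbors = np.argsort(np.abs(x - x[i]))[:k]
        mu = diff[neighbors].mean()
        sd = diff[neighbors].std(ddof=1)
        z[i] = (diff[i] - mu) / sd if sd > 0 else 0.0
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = np.array([math.erfc(abs(zi) / math.sqrt(2)) for zi in z])
    return z, p
```

Because each gene is compared only with genes of similar sample1 expression, the spread used for its z-score adapts to the expression level instead of being one global value.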

Parameters:


sample1_name : str

Name of the first (control) sample. Must be a row name (index) in df. Plotted on the x-axis.

sample2_name : str

Name of the second (treatment) sample. Must be a row name (index) in df. Plotted on the y-axis.

df : pandas.DataFrame

A samples (rows) x features (columns) pandas DataFrame of expression values

p_value_cutoff : float, optional

Cutoff for the p-values. Default 0.001.

local_fraction : float, optional

Fraction of genes to use for the local z-score calculation. Default 0.1.

bonferroni : bool, optional

Whether to apply the Bonferroni correction to the p-values. Default True.

fdr : optional

Benjamini-Hochberg FDR filtering. Default None. Proceed with caution and check the result; this option sometimes breaks.

dtype : str, optional

Data type of the expression values. Default 'RPKM'.
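A minimal sketch of how the inputs and the p_value_cutoff and bonferroni parameters might fit together, assuming the Bonferroni correction multiplies each p-value by the number of tests (capped at 1) before comparing it with the cutoff. The sample names, gene names, values, and the helper `call_significant` are hypothetical.

```python
import numpy as np
import pandas as pd

# Samples (rows) x features (columns); sample names form the index.
# All names and values here are made up for illustration.
df = pd.DataFrame(
    {"gene_a": [10.0, 12.0, 9.5], "gene_b": [0.5, 40.0, 0.7], "gene_c": [3.0, 3.1, 2.9]},
    index=["control", "treatment", "other"],
)
x = df.loc["control"]    # sample1_name: plotted on the x-axis
y = df.loc["treatment"]  # sample2_name: plotted on the y-axis


def call_significant(p_values, p_value_cutoff=0.001, bonferroni=True):
    """Hypothetical helper: flag p-values that pass the cutoff.

    With bonferroni=True, each p-value is multiplied by the number of
    tests (capped at 1) before the comparison, an assumed convention.
    """
    p = np.asarray(p_values, dtype=float)
    if bonferroni:
        p = np.minimum(p * p.size, 1.0)
    return p < p_value_cutoff


print(call_significant([1e-6, 5e-4, 0.5]).tolist())                    # [True, False, False]
print(call_significant([1e-6, 5e-4, 0.5], bonferroni=False).tolist())  # [True, True, False]
```

With three tests, 5e-4 becomes 1.5e-3 after correction and no longer passes the default 0.001 cutoff, which shows how the correction makes the test more conservative.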

gstats()

Write general statistics of the two-way comparison to standard output.

flotilla.compute.expression.benjamini_hochberg(p_values, fdr=0.1)

Benjamini-Hochberg correction for multiple hypothesis testing

From http://udel.edu/~mcdonald/statmultcomp.html:

One good technique for controlling the false discovery rate was briefly mentioned by Simes (1986) and developed in detail by Benjamini and Hochberg (1995). Put the individual P-values in order, from smallest to largest. The smallest P-value has a rank of i=1, the next has i=2, etc. Then compare each individual P-value to (i/m)Q, where m is the total number of tests and Q is the chosen false discovery rate. The largest P-value that has P<(i/m)Q is significant, and all P-values smaller than it are also significant.
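The quoted procedure translates into a short step-up routine. The sketch below is a hypothetical stand-in (`benjamini_hochberg_sketch`), not flotilla's own benjamini_hochberg; it follows the quoted strict inequality P<(i/m)Q and returns a boolean array in the original input order, like the documented return value.

```python
import numpy as np


def benjamini_hochberg_sketch(p_values, fdr=0.1):
    """Hypothetical sketch of the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)            # indices that sort the p-values ascending
    ranks = np.arange(1, m + 1)      # ranks i = 1..m
    thresholds = ranks / m * fdr     # (i/m)Q for each rank
    below = p[order] < thresholds    # does the i-th smallest P satisfy P < (i/m)Q?
    sigs = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # position of the largest passing P-value
        sigs[order[: k + 1]] = True     # it and every smaller P-value are significant
    return sigs


sigs = benjamini_hochberg_sketch([0.5, 0.05, 0.001, 0.9, 0.055], fdr=0.1)
print(sigs.tolist())  # [False, True, True, False, True]
```

In this example 0.05 fails its own threshold (0.04 at rank 2), but it is still significant because the larger P-value 0.055 passes at rank 3 (threshold 0.06). This step-up behavior is what distinguishes the procedure from comparing each P-value to its threshold independently.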

Parameters:


p_values : list

List of p-values

fdr : float, optional

Desired false-discovery rate cutoff. Default 0.1.

Returns:

sigs : numpy.array

Boolean array of whether or not the provided p-values are significant given the FDR cutoff