flotilla.compute.infotheory module

Information-theoretic calculations

flotilla.compute.infotheory.bin_range_strings(bins)[source]

Given a list of bins, make a list of strings of those bin ranges

Parameters:

bins : list_like

List of anything, usually values of bin edges

Returns:

bin_ranges : list

List of bin ranges

>>> bin_range_strings((0, 0.5, 1))

['0-0.5', '0.5-1']
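The doctest above can be reproduced with a minimal sketch of the documented behavior (an illustration, not the flotilla source):

```python
def bin_range_strings(bins):
    # Pair each bin edge with the next one and join as "start-end"
    # (sketch of the documented behavior, not the flotilla source)
    return ['{}-{}'.format(start, end) for start, end in zip(bins, bins[1:])]

print(bin_range_strings((0, 0.5, 1)))  # ['0-0.5', '0.5-1']
```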

flotilla.compute.infotheory.binify(df, bins)[source]

Makes a histogram of each column using the provided bins

Parameters:

df : pandas.DataFrame

A samples x features dataframe. Each feature (column) will be binned into the provided bins.

bins : iterable

Bins you would like to use for this data. Must include the final bin value, e.g. (0, 0.5, 1) for the two bins (0, 0.5) and (0.5, 1). nbins = len(bins) - 1

Returns:

binned : pandas.DataFrame

An nbins x features DataFrame of each column binned across rows
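Assuming binify also normalizes the histogram counts so each column sums to 1 (consistent with the entropy and jsd docstrings below, which expect probability distributions), a minimal pandas sketch might look like:

```python
import numpy as np
import pandas as pd

def binify(df, bins):
    # Histogram each column into the given bin edges, then normalize
    # each column to sum to 1 (assumption: flotilla's binify returns
    # probability distributions, as the entropy/jsd docs require)
    binned = df.apply(lambda col: pd.Series(
        np.histogram(col.dropna(), bins=bins)[0], dtype=float))
    binned.index = ['{}-{}'.format(i, j) for i, j in zip(bins, bins[1:])]
    return binned / binned.sum()

df = pd.DataFrame({'a': [0.1, 0.2, 0.9], 'b': [0.6, 0.7, 0.8]})
print(binify(df, bins=(0, 0.5, 1)))
```

Note that numpy's histogram treats the last bin as closed on the right, so with bins (0, 0.5, 1) a value of exactly 0.5 lands in the second bin.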

flotilla.compute.infotheory.entropy(binned, base=2)[source]

Find the entropy of each column of a dataframe

Parameters:

binned : pandas.DataFrame

An nbins x features DataFrame of probability distributions, where each column sums to 1

base : numeric

The log-base of the entropy. Default is 2, so the resulting entropy is in bits.

Returns:

entropy : pandas.Series

Entropy values for each column of the dataframe.
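A sketch of column-wise Shannon entropy under these assumptions (each column already sums to 1; not necessarily flotilla's implementation):

```python
import numpy as np
import pandas as pd

def entropy(binned, base=2):
    # Shannon entropy per column: -sum(p * log_base(p)).
    # Mask zero probabilities first, since 0 * log(0) is defined as 0;
    # NaN entries are skipped by the column-wise sum.
    logs = np.log(binned.where(binned > 0)) / np.log(base)
    return -(binned * logs).sum()

probs = pd.DataFrame({'uniform': [0.5, 0.5], 'certain': [1.0, 0.0]})
print(entropy(probs))  # uniform -> 1 bit, certain -> 0 bits
```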

flotilla.compute.infotheory.jsd(p, q)[source]

Finds the per-column JSD between dataframes p and q

Jensen-Shannon divergence of two pandas dataframes of probability distributions, p and q. These distributions are usually created by running binify() on the dataframe.

Parameters:

p : pandas.DataFrame

An nbins x features DataFrame.

q : pandas.DataFrame

An nbins x features DataFrame.

Returns:

jsd : pandas.Series

Jensen-Shannon divergence of each column with the same names between p and q
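The Jensen-Shannon divergence can be computed from the Kullback-Leibler divergence of each distribution against the mixture m = (p + q) / 2. A hedged sketch (the kld helper here is an illustration, not necessarily flotilla's implementation):

```python
import numpy as np
import pandas as pd

def kld(p, q):
    # Per-column Kullback-Leibler divergence D(p || q) in bits (assumption)
    return (p * np.log2(p / q)).sum()

def jsd(p, q):
    # JSD(p, q) = 0.5 * D(p || m) + 0.5 * D(q || m), with m the mixture
    m = (p + q) / 2
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)

p = pd.DataFrame({'x': [0.5, 0.5]})
q = pd.DataFrame({'x': [0.5, 0.5]})
print(jsd(p, q))  # identical distributions -> divergence 0
```

Unlike KLD, JSD is symmetric in p and q and always finite for valid probability distributions.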

flotilla.compute.infotheory.kld(p, q)[source]

Kullback-Leibler divergence of two pandas dataframes of probability distributions, p and q

Parameters:

p : pandas.DataFrame

An nbins x features DataFrame

q : pandas.DataFrame

An nbins x features DataFrame

Returns:

kld : pandas.Series

Kullback-Leibler divergence of the common columns between the dataframes, e.g. between the 1st column in p and the 1st column in q, and the 2nd column in p and the 2nd column in q.

Notes

The input to this function must be probability distributions, not raw values. Otherwise, the output makes no sense.
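A minimal sketch of per-column KL divergence in bits, assuming aligned columns of probability distributions (an illustration under those assumptions, not the flotilla source):

```python
import numpy as np
import pandas as pd

def kld(p, q):
    # D(p || q) = sum(p * log2(p / q)), computed column-wise in bits.
    # Inputs must be probability distributions; zero entries in q where
    # p is nonzero make the divergence infinite.
    return (p * np.log2(p / q)).sum()

p = pd.DataFrame({'x': [0.9, 0.1]})
q = pd.DataFrame({'x': [0.5, 0.5]})
print(kld(p, q))  # positive, since p differs from q
```

Note that KLD is not symmetric: kld(p, q) generally differs from kld(q, p), and it is zero only when p and q are identical.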

Olga B. Botvinnik is funded by the NDSEG fellowship and is a NumFOCUS John Hunter Technology Fellow.
Michael T. Lovci was partially funded by a fellowship from Genentech.
Partially funded by NIH grants NS075449 and HG004659 and CIRM grants RB4-06045 and TR3-05676 to Gene Yeo.