flotilla.compute.decomposition module

Perform various dimensionality reduction algorithms on data

class flotilla.compute.decomposition.DataFrameICA(df, n_components=None, **kwargs)[source]

Bases: flotilla.compute.decomposition.DataFrameReducerBase, sklearn.decomposition.fastica_.FastICA

Perform Independent Component Analysis on a DataFrame

Initialize and fit a dataframe to a decomposition algorithm

Parameters:

df : pandas.DataFrame

A (samples, features) dataframe of data to fit to the reduction algorithm

n_components : int

Number of components to calculate. If None, use as many components as there are samples

kwargs : keyword arguments

Any other arguments to the reduction algorithm

class flotilla.compute.decomposition.DataFrameNMF(df, n_components=None, **kwargs)[source]

Bases: flotilla.compute.decomposition.DataFrameReducerBase, sklearn.decomposition.nmf.NMF

Perform Non-Negative Matrix Factorization on a DataFrame

fit(X)[source]

Override scikit-learn’s fit() for our purposes

The fit code is duplicated here for DataFrameNMF because sklearn’s NMF takes a shortcut for efficiency and has fit() call fit_transform. Method resolution order (“MRO”) resolves the closest _fit_transform (the one in this package) first, which causes a recursion error:

    def fit(self, X, y=None, **kwargs):
        self._fit_transform(X, **kwargs)
        return self

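
The recursion described above can be reproduced with a toy example. This is a minimal sketch using stand-in classes, not flotilla's or scikit-learn's actual code: `LibraryReducer`, `FrameBase`, `Naive` and `Fixed` are hypothetical names, and the override in `Fixed` is only one assumed way to break the cycle.

```python
class LibraryReducer:
    """Stand-in for a library estimator whose fit() delegates to fit_transform()."""

    def fit(self, X):
        self.fit_transform(X)
        return self

    def fit_transform(self, X):
        self.fitted_ = True
        return X


class FrameBase:
    """Stand-in for a wrapper base class whose fit_transform() calls fit()."""

    def fit_transform(self, X):
        self.fit(X)
        return self.transform(X)

    def transform(self, X):
        return X


class Naive(FrameBase, LibraryReducer):
    """MRO is Naive -> FrameBase -> LibraryReducer, so the library's fit()
    reaches FrameBase.fit_transform(), which calls fit() again."""


class Fixed(FrameBase, LibraryReducer):
    def fit(self, X):
        # Call the library's implementation explicitly, bypassing the MRO lookup.
        LibraryReducer.fit_transform(self, X)
        return self


try:
    Naive().fit([1, 2, 3])
    naive_recursed = False
except RecursionError:
    naive_recursed = True

fixed = Fixed().fit([1, 2, 3])
```

Here `Naive().fit(...)` never terminates on its own and ends in a `RecursionError`, while `Fixed` completes because its `fit()` no longer routes through the wrapper's `fit_transform()`.
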
class flotilla.compute.decomposition.DataFramePCA(df, n_components=None, **kwargs)[source]

Bases: flotilla.compute.decomposition.DataFrameReducerBase, sklearn.decomposition.pca.PCA

Perform Principal Component Analysis on a DataFrame

Initialize and fit a dataframe to a decomposition algorithm

Parameters:

df : pandas.DataFrame

A (samples, features) dataframe of data to fit to the reduction algorithm

n_components : int

Number of components to calculate. If None, use as many components as there are samples

kwargs : keyword arguments

Any other arguments to the reduction algorithm

class flotilla.compute.decomposition.DataFrameReducerBase(df, n_components=None, **kwargs)[source]

Bases: object

Just like scikit-learn’s reducers, but with prettied up DataFrames.

Initialize and fit a dataframe to a decomposition algorithm

Parameters:

df : pandas.DataFrame

A (samples, features) dataframe of data to fit to the reduction algorithm

n_components : int

Number of components to calculate. If None, use as many components as there are samples

kwargs : keyword arguments

Any other arguments to the reduction algorithm

fit(X)[source]

Perform a scikit-learn fit and relabel the dimensions with informative names

Parameters:

X : pandas.DataFrame

A (n_samples, n_features) DataFrame of data to reduce

Returns:

self : DataFrameReducerBase

An instance of the object, now with components_, explained_variance_, and explained_variance_ratio_ attributes
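
The fit-then-relabel idea can be sketched with scikit-learn's PCA used directly, rather than through flotilla's classes. The helper name `fit_and_label` and the `pc_N` label format are assumptions made for illustration and are not taken from the flotilla source.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def fit_and_label(df, n_components=2):
    """Fit PCA on a (samples, features) DataFrame and label the results.

    Hypothetical helper for illustration; not part of flotilla.
    """
    reducer = PCA(n_components=n_components).fit(df.values)
    # Assumed label format: 1-based "pc_1", "pc_2", ...
    labels = ["pc_{}".format(i + 1)
              for i in range(reducer.components_.shape[0])]
    # Rows are components, columns keep the original feature names.
    components = pd.DataFrame(reducer.components_,
                              index=labels, columns=df.columns)
    ratio = pd.Series(reducer.explained_variance_ratio_, index=labels)
    return components, ratio


rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(6, 4),
                  index=["sample_{}".format(i) for i in range(6)],
                  columns=["gene_a", "gene_b", "gene_c", "gene_d"])
components, ratio = fit_and_label(df, n_components=2)
```

Keeping the feature names on the columns of `components` makes it possible to look up a feature's loading on a given component by name instead of by position.
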

fit_transform(X)[source]

Perform both a fit and a transform on the input data

Fit the data to the reduction algorithm, and transform the data to the reduced space.

Parameters:

X : pandas.DataFrame

A (n_samples, n_features) dataframe to both fit and transform

Returns:

self : DataFrameReducerBase

A fitted and transformed instance of the object

Raises:

ValueError

Raised if the input is not a pandas DataFrame; the fit and transform are not performed
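
An input check of this kind might look like the following sketch. The function name `require_dataframe` and the error message are hypothetical; the source only states that a ValueError is raised for non-DataFrame input.

```python
import pandas as pd


def require_dataframe(X):
    """Return X unchanged if it is a DataFrame, otherwise raise ValueError.

    Hypothetical guard for illustration; not part of flotilla.
    """
    if not isinstance(X, pd.DataFrame):
        raise ValueError(
            "Expected a pandas DataFrame, got {}".format(type(X).__name__))
    return X


frame = require_dataframe(pd.DataFrame({"gene_a": [1.0, 2.0]}))

try:
    require_dataframe([[1.0, 2.0]])
    rejected = False
except ValueError:
    rejected = True
```
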

static relabel_pcs(x)[source]

Given a list of integers, relabel each one as a 1-based principal component name
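
A minimal sketch of such a relabeling, assuming a `pc_N` label format (the exact format and the function name `relabel` are assumptions, not confirmed by this page):

```python
def relabel(indices):
    """Map 0-based integer indices to 1-based component labels.

    Hypothetical stand-in for illustration; the "pc_" prefix is assumed.
    """
    return ["pc_{}".format(int(i) + 1) for i in indices]


labels = relabel([0, 1, 2])
# labels == ['pc_1', 'pc_2', 'pc_3']
```
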

transform(X)[source]

Transform a matrix into the component space

Parameters:

X : pandas.DataFrame

A (n_samples, n_features) sized DataFrame to transform into the current component space

Returns:

component_space : pandas.DataFrame

A (n_samples, self.n_components) sized DataFrame transformed into component space
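
The shape of the returned DataFrame can be illustrated with scikit-learn's PCA used directly. This is a sketch under assumptions: `transform_labeled` is a hypothetical helper, and the `pc_N` column names are an assumed label format.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def transform_labeled(reducer, X):
    """Project X into component space, keeping the sample labels.

    Hypothetical helper for illustration; not part of flotilla.
    """
    reduced = reducer.transform(X.values)
    # Assumed label format: 1-based "pc_1", "pc_2", ...
    columns = ["pc_{}".format(i + 1) for i in range(reduced.shape[1])]
    return pd.DataFrame(reduced, index=X.index, columns=columns)


rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(5, 3),
                  index=list("abcde"), columns=["f1", "f2", "f3"])
reducer = PCA(n_components=2).fit(df.values)
component_space = transform_labeled(reducer, df)
```

The result has one row per input sample and one column per component, so it can be joined back to sample metadata on the original index.
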

class flotilla.compute.decomposition.DataFrameTSNE(df, n_components=None, **kwargs)[source]

Bases: flotilla.compute.decomposition.DataFrameReducerBase

Perform t-Distributed Stochastic Neighbor Embedding on a DataFrame

Read more: http://homepage.tudelft.nl/19j49/t-SNE.html

Initialize and fit a dataframe to a decomposition algorithm

Parameters:

df : pandas.DataFrame

A (samples, features) dataframe of data to fit to the reduction algorithm

n_components : int

Number of components to calculate. If None, use as many components as there are samples

kwargs : keyword arguments

Any other arguments to the reduction algorithm

fit_transform(X)[source]

Perform both a fit and a transform on the input data

Fit the data to the reduction algorithm, and transform the data to the reduced space.

Parameters:

X : pandas.DataFrame

A (n_samples, n_features) dataframe to both fit and transform

Returns:

self : DataFrameReducerBase

A fitted and transformed instance of the object

Raises:

ValueError

Raised if the input is not a pandas DataFrame; the fit and transform are not performed

Olga B. Botvinnik is funded by the NDSEG fellowship and is a NumFOCUS John Hunter Technology Fellow.
Michael T. Lovci was partially funded by a fellowship from Genentech.
Partially funded by NIH grants NS075449 and HG004659 and CIRM grants RB4-06045 and TR3-05676 to Gene Yeo.