flotilla.compute.outlier module

Detect outlier samples in data

class flotilla.compute.outlier.OutlierDetection(X, method=None, nu=0.1, kernel='rbf', gamma=0.1, random_state=0, **kwargs)

Bases: object

Construct an outlier detection object

Parameters:


X : pandas.DataFrame

A (n_samples, n_features) dataframe, where the outliers will be detected from the rows (the samples)

method : sklearn classifier, optional

If None, defaults to OneClassSVM. The method class must provide both fit() and predict() methods.

nu : float, optional (default=0.1)

An upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Should be in the interval (0, 1].

kernel : str, optional (default='rbf')

The kernel to be used by the outlier detection algorithm.

gamma : float, optional (default=0.1)

Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. If gamma is 0.0 then 1/n_features will be used instead.

random_state : int, optional (default=0)

Random state of the method, for reproducibility.

kwargs : other keyword arguments, optional

All other keyword arguments are passed to method().
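
The requirement that `method` expose both `fit()` and `predict()` can be illustrated with a small stand-in detector. This is a minimal sketch, not flotilla code. The class name `ZScoreDetector`, its `threshold` parameter, and the example data are hypothetical. It returns -1 for outliers and +1 for inliers, which is the convention scikit-learn's OneClassSVM uses. Extra keyword arguments are accepted and ignored, on the assumption that the constructor forwards them. To stay dependency-free the sketch works on a plain list of rows; a detector meant to receive a pandas.DataFrame would need to handle that input type.

```python
import statistics


class ZScoreDetector:
    """Hypothetical detector exposing the fit()/predict() interface."""

    def __init__(self, threshold=2.0, **kwargs):
        # Extra keyword arguments are accepted and ignored (assumption).
        self.threshold = threshold

    def fit(self, X):
        # X is a list of rows; learn a mean and spread for each feature.
        columns = list(zip(*X))
        self.means_ = [statistics.mean(c) for c in columns]
        # Fall back to 1.0 when a feature has zero spread, to avoid dividing by zero.
        self.stdevs_ = [statistics.pstdev(c) or 1.0 for c in columns]
        return self

    def predict(self, X):
        # Label a row -1 if any feature lies beyond the threshold, else +1.
        labels = []
        for row in X:
            z = [abs(v - m) / s for v, m, s in zip(row, self.means_, self.stdevs_)]
            labels.append(-1 if max(z) > self.threshold else 1)
        return labels


# Made-up data: five similar samples and one with an extreme first feature.
samples = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.0], [1.05, 1.95], [9.0, 2.0]]
detector = ZScoreDetector(threshold=2.0).fit(samples)
labels = detector.predict(samples)
```

Only the last sample is labeled -1 here, because its first feature sits more than two standard deviations from the mean.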

predict(X=None)

Predict which samples are outliers

Parameters:


X : pandas.DataFrame, optional (default None)

A (n_samples, n_features) dataframe. If None, predict outliers in the original input data. Otherwise, use the original input data to detect outliers in this new data, which must have the same number of features as the original data.

Returns:

outliers : pandas.Series

A boolean series indicating which samples are outliers.
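
To make the return type concrete, the sketch below builds a boolean pandas.Series from detector labels. It relies on the scikit-learn convention that OneClassSVM.predict() returns +1 for inliers and -1 for outliers. It also assumes that True marks an outlier and that the series is indexed by sample name; this page states neither. The sample names and labels are made up for illustration.

```python
import pandas as pd

# Hypothetical sample names and detector labels (+1 inlier, -1 outlier).
sample_ids = ["sample_a", "sample_b", "sample_c", "sample_d"]
labels = [1, -1, 1, -1]

# Assumption: True marks an outlier; the index carries the sample names.
outliers = pd.Series(
    [label == -1 for label in labels], index=sample_ids, name="outlier"
)

# Boolean indexing then selects only the flagged samples.
flagged = outliers[outliers].index.tolist()
```

A series in this form can be used directly as a mask, for example to drop flagged rows from the original dataframe with `X.loc[~outliers]`, provided the indexes match.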