Compute predictors on data, e.g. classify or regress on features/samples
Bases: flotilla.compute.predict.PredictorBase
Classifier for categorical response variables: a dataset-predictor pair from PredictorDataSetManager. One dataset, one predictor, from the dataset manager.

Parameters:
    predictor_name : str
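A hypothetical construction sketch, assuming Classifier shares PredictorBase's documented signature below; the data, trait, and predictor names here are illustrative, not part of flotilla:

>>> import numpy as np
>>> import pandas as pd
>>> X = pd.DataFrame(np.random.randn(20, 5))                 # (n_samples, n_features)
>>> y = pd.Series(['case', 'control'] * 10, index=X.index)   # categorical trait
>>> clf = Classifier('ExtraTreesClassifier', 'expression', 'phenotype',
...                  X_data=X, trait=y)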
Bases: object
Choose the coef that makes some result most likely at all n_features (or some other function of the dataset).

I have no idea what this does. @mlovci

Parameters:
    n_features : int
    coef : float
    max_feature_scaler : function
    n_estimators_scaler : function

Returns:
    ???
Bases: object
A dataset-predictor pair from PredictorDataSetManager. One dataset, one predictor, from the dataset manager.

Parameters:
    predictor_name : str
    data_name : str
    trait_name : str
    X_data : pandas.DataFrame, optional
    trait : pandas.Series, optional
    predictor_obj : sklearn predictor, optional
    predictor_scoring_fun : function, optional
    score_cutoff_fun : function, optional
    n_features_dependent_kwargs : dict, optional
    constant_kwargs : dict, optional
Bases: object
A configuration for a predictor; names it and tracks/sets its parameters.

Dynamically configures some arguments for the predictor based on n_features (if this attribute exists). Set general parameters with __init__; yield instances, set by your parameters, with __call__.

Construct a predictor configuration.

Parameters:
    predictor_name : str
    obj : sklearn predictor
    predictor_scoring_fun : function, optional
    score_cutoff_fun : function, optional
    n_features_dependent_kwargs : dict, optional (default None)
    kwargs : other keyword arguments, optional
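A minimal sketch of the __init__/__call__ pattern described above; the parameter names come from the list above, but treating __call__ as accepting n_features is an assumption:

>>> from sklearn.ensemble import ExtraTreesClassifier
>>> config = PredictorConfig('ExtraTreesClassifier', ExtraTreesClassifier,
...     n_features_dependent_kwargs={
...         'max_features': PredictorConfigScalers.max_feature_scaler})
>>> predictor = config(n_features=500)  # kwargs scaled for 500 features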
Bases: object
Manage several predictor configurations
A container for predictor configurations, including several built-ins: ExtraTreesClassifier, ExtraTreesRegressor, GradientBoostingClassifier, and GradientBoostingRegressor. @mlovci: What is predictor_config vs new_predictor_config? Why are they separate?

Attributes:
    predictor_config
    predictor_configs
    builtin_predictor_configs

Methods:
    new_predictor_config(*args, **kwargs)
        Create a new predictor configuration
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> pcm = PredictorConfigManager()
>>> # add a new type of predictor
>>> pcm.new_predictor_config('ExtraTreesClassifier', ExtraTreesClassifier,
...     n_features_dependent_kwargs={
...         'max_features': PredictorConfigScalers.max_feature_scaler,
...         'n_estimators': PredictorConfigScalers.n_estimators_scaler,
...         'n_jobs': PredictorConfigScalers.n_jobs_scaler},
...     bootstrap=True, random_state=0,
...     oob_score=True,
...     verbose=True)
Construct a predictor configuration manager with ExtraTreesClassifier, ExtraTreesRegressor, GradientBoostingClassifier, and GradientBoostingRegressor as default predictors.
Create a new predictor configuration
Parameters:
    name : str
    obj : sklearn predictor object, optional (default=None)
    predictor_scoring_fun : function, optional (default=None)
    score_cutoff_fun : function, optional (default=None)
    n_features_dependent_kwargs : dict, optional (default=None)
    kwargs : other keyword arguments

Returns:
    predictorconfig : PredictorConfig

Raises:
    ValueError
    KeyError
Create a new predictor configuration, added to predictors
Parameters:
    name : str
    kwargs : other keyword arguments, optional

Returns:
    predictor : sklearn predictor
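A hypothetical retrieval, assuming a configuration was registered under this name as in the doctest above:

>>> clf = pcm.predictor('ExtraTreesClassifier')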
Bases: object
Scale parameters specified in the keyword arguments based on the dataset size.
Scale the maximum number of features per estimator
# TODO: @mlovci what are the principles behind this scaler? To see each feature "x" number of times?

Parameters:
    n_features : int, optional (default 500)
    coef : float, optional (default 2.5)

Returns:
    n_features : int

Raises:
    ValueError
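An illustrative implementation with the documented signature; the sub-linear growth rule here is an assumption, not flotilla's actual formula:

import numpy as np

def max_feature_scaler(n_features=500, coef=2.5):
    # Guard against missing input, per the documented ValueError
    if n_features is None:
        raise ValueError("n_features must be an integer")
    # Assumed rule: grow max_features sub-linearly so each estimator
    # sees only a subset of the features
    return int(np.ceil(np.sqrt(n_features) * coef))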
Scale the number of estimators based on the input features
# TODO: @mlovci what are the principles behind this scaler? To see each feature "x" number of times?

Parameters:
    n_features : int, optional (default 500)
    coef : float, optional (default 2.5)

Returns:
    n_estimators : int

Raises:
    ValueError
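An illustrative implementation with the documented signature; the linear-with-floor rule is an assumption:

def n_estimators_scaler(n_features=500, coef=2.5):
    # Guard against missing input, per the documented ValueError
    if n_features is None:
        raise ValueError("n_features must be an integer")
    # Assumed rule: more features warrant more estimators, with a floor
    # so small datasets still get a reasonable ensemble size
    return max(100, int(coef * n_features / 50.0))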
Scale the number of jobs based on how many features are in the data
# TODO: @mlovci what are the principles behind this scaler? To see each feature "x" number of times?

Parameters:
    n_features : int

Returns:
    n_jobs : int

Raises:
    ValueError
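An illustrative implementation; capping at the machine's CPU count is an assumption about this scaler's intent:

import multiprocessing

def n_jobs_scaler(n_features):
    # Guard against missing input, per the documented ValueError
    if n_features is None:
        raise ValueError("n_features must be an integer")
    # Assumed rule: more parallel jobs for larger data, but never more
    # than the available CPUs
    return min(multiprocessing.cpu_count(), max(1, n_features // 500))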
Bases: object
Store a (n_samples, n_features) matrix and (n_samples,) trait pair
In scikit-learn parlance, store an X (data of independent variables) and y (target prediction) pair
Parameters:
    data : pandas.DataFrame
    trait : pandas.Series

Here, data is the X matrix and trait is y.
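A hypothetical construction, assuming the documented (data, trait) signature:

>>> import numpy as np
>>> import pandas as pd
>>> data = pd.DataFrame(np.random.randn(10, 4))           # X: (n_samples, n_features)
>>> trait = pd.Series(['a', 'b'] * 5, index=data.index)   # y: (n_samples,)
>>> dataset = PredictorDataSet(data, trait)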
Check if this is the same as another dataset.
Parameters:
    data : pandas.DataFrame
    trait : pandas.Series
    categorical_trait : bool

Raises:
    AssertionError
A single, initialized PredictorConfig instance
Parameters:
    name : str
    kwargs : other keyword arguments

Returns:
    predictorconfig : PredictorConfig
Bases: object
A collection of PredictorDataSet instances.
Parameters:
    predictor_config_manager : PredictorConfigManager, optional (default None)

Attributes:
    datasets : 3-layer deep dict of {data: {trait: {categorical: dataset}}}
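An illustrative view of that nesting for a populated manager; the keys shown here are hypothetical, and the leaves are PredictorDataSet instances:

>>> pdm.datasets
{'expression': {'phenotype': {True: <PredictorDataSet>}}}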
???? @mlovci please fill in
Parameters:
    data_name : str
    trait_name : str
    categorical_trait : bool, optional (default=False)

Returns:
    dataset : PredictorDataSet
??? Difference between this and dataset??? @mlovci
Parameters:
    data_name : str
    trait_name : str
    categorical_trait : bool, optional (default=False)
    data : pandas.DataFrame, optional (default=None)
    trait : pandas.Series, optional (default=None)
    predictor_config_manager : PredictorConfigManager, optional (default=None)

Returns:
    dataset : PredictorDataSet
Bases: flotilla.compute.predict.PredictorBase
Regressor for continuous response variables: a dataset-predictor pair from PredictorDataSetManager. One dataset, one predictor, from the dataset manager.

Parameters:
    predictor_name : str
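A hypothetical construction mirroring the Classifier sketch above (reusing that sketch's X), with a continuous trait:

>>> y = pd.Series(np.random.randn(20), index=X.index)   # continuous trait
>>> reg = Regressor('ExtraTreesRegressor', 'expression', 'phenotype',
...                 X_data=X, trait=y)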
Return scores of how important a feature is to the prediction
Most predictors store feature scores in the attribute cls.feature_importances_, but others may use a different name for the scores, so this function bridges the gap.

Parameters:
    cls : sklearn predictor

Returns:
    scores : pandas.Series
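An illustrative bridge over the two common scikit-learn attribute names; whether the real default falls back to coef_ is an assumption:

def default_predictor_scoring_fun(cls):
    # Tree ensembles expose feature_importances_; linear models use coef_
    if hasattr(cls, 'feature_importances_'):
        return cls.feature_importances_
    return cls.coef_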
Calculate a minimum score cutoff for the best features
By default, this function calculates \(f(x) = \mathrm{mean}(x) + \mathrm{std\_multiplier} \cdot \mathrm{std}(x)\), with std_multiplier defaulting to 2.

Parameters:
    arr : numpy.ndarray
    std_multiplier : float, optional (default=2)

Returns:
    cutoff : float
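A minimal sketch implementing the documented formula:

import numpy as np

def default_score_cutoff_fun(arr, std_multiplier=2):
    # cutoff = mean(arr) + std_multiplier * std(arr)
    return np.mean(arr) + std_multiplier * np.std(arr)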