Base data class for all data types. All data types in flotilla inherit from this, or a child object (like ExpressionData).
Bases: object
Base class for biological data measurements.
All data types in flotilla inherit from this and have all the functionality described here.
Attributes
feature_subsets | Dict of feature subset names to their list of feature ids |
variant | Features whose variance is 2 standard deviations away from the mean variance |
data | (pandas.DataFrame) A (n_samples, m_features) sized DataFrame of filtered input data. Features detected at thresh in too few samples (fewer than minimum_samples) are removed. Compared to data_original, m_features <= n_features |
data_type | (str) String indicating what kind of data this is, e.g. “splicing” or “expression” |
data_original | (pandas.DataFrame) A (n_samples, n_features) sized DataFrame of all input data, before removing features for having too few samples |
feature_data | (pandas.DataFrame) A (k_features, n_features_about_features) sized DataFrame of metadata about the features. This DataFrame does not need to be the same size as data, but it must include at least all the features from data. Compared to data, k_features >= m_features |
predictor_config_manager | (PredictorConfigManager) Manages different combinations of predictors on different data subtypes |
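The relationship between `data_original`, `data`, and `variant` can be sketched in pandas. This is a minimal illustration, not flotilla's implementation. It assumes that "detected at thresh" means a value strictly greater than `thresh`, that a feature is kept when at least `minimum_samples` samples pass, and that "2 std devs away" means more than two standard deviations above the mean variance. The helper names `filter_features` and `variant_features` are hypothetical.

```python
import numpy as np
import pandas as pd


def filter_features(data_original, thresh=-np.inf, minimum_samples=0):
    """Keep features (columns) detected above `thresh` in enough samples."""
    # Count, per feature, the samples whose value exceeds the threshold.
    # NaN > thresh evaluates to False, so missing values never count.
    n_detected = (data_original > thresh).sum(axis=0)
    keep = n_detected >= minimum_samples
    return data_original.loc[:, keep]


def variant_features(data, std_multiplier=2):
    """Ids of features whose variance exceeds mean + multiplier * std (assumed rule)."""
    variances = data.var(axis=0)
    cutoff = variances.mean() + std_multiplier * variances.std()
    return variances.index[variances > cutoff]


data_original = pd.DataFrame(
    {
        "gene_a": [0.0, 0.0, 5.0],
        "gene_b": [3.0, 4.0, 5.0],
        "gene_c": [0.0, 0.0, 0.0],
    },
    index=["cell1", "cell2", "cell3"],
)
data = filter_features(data_original, thresh=1.0, minimum_samples=2)
print(list(data.columns))  # ['gene_b']
```

With the defaults (`thresh=-np.inf`, `minimum_samples=0`) nothing is removed, which matches the statement that m_features <= n_features.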
Methods
maybe_renamed_to_feature_id(feature_id) | Given a simple gene name, e.g. “RBFOX2”, get the official ENSG ids or MISO ids |
feature_renamer | If feature_rename_col is specified in BaseData.__init__(), this will rename the feature ID to a new name. If feature_rename_col is not specified, then this will return the original id |
Abstract base class for biological measurements
Parameters: | data : pandas.DataFrame
thresh : float, optional (default=-np.inf)
minimum_samples : int, optional (default=0)
feature_data : pandas.DataFrame, optional (default=None)
feature_rename_col : str, optional (default=None)
feature_ignore_subset_cols : list-like (default=None)
technical_outliers : list-like, optional (default=None)
outliers : list-like, optional (default=None)
pooled : list-like, optional (default=None)
predictor_config_manager : PredictorConfigManager, optional
data_type : str, optional (default=None)
|
---|
Notes
Any cells not marked as “technical_outliers”, “outliers” or “pooled” are considered as single-cell samples.
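The rule in this note amounts to a set difference. The following is a minimal sketch under that reading. The function name `single_cell_ids` and the sample ids are hypothetical, not part of flotilla.

```python
def single_cell_ids(sample_ids, technical_outliers=None, outliers=None, pooled=None):
    """Return the sample ids that are in none of the three special groups."""
    excluded = set()
    for group in (technical_outliers, outliers, pooled):
        if group is not None:  # each group is optional, as in the constructor
            excluded.update(group)
    # Preserve the original sample order
    return [sample_id for sample_id in sample_ids if sample_id not in excluded]


samples = ["cell1", "cell2", "cell3", "pool1"]
singles = single_cell_ids(samples, outliers=["cell2"], pooled=["pool1"])
print(singles)  # ['cell1', 'cell3']
```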
Get features whose change in NMF space between phenotypes is large
Parameters: | groupby : mappable
phenotype_transitions : list of length-2 tuples of str
n : int
|
---|---|
Returns: | big_transitions : pandas.DataFrame
|
Make and memoize a predictor on a categorical trait (associated with samples), using a subset of genes
Parameters: | trait : pandas.Series
sample_ids : None or list of strings
feature_ids : None or list of strings
standardize : bool
predictor : flotilla.visualize.predict classifier
predictor_kwargs : dict or None
predictor_scoring_fun : function
score_cutoff_fun : function
|
---|---|
Returns: | predictor : flotilla.compute.predict.PredictorBaseViz
|
Convert a feature subset name to a list of feature ids
Mean Jensen-Shannon divergence of features across phenotypes
Parameters: | groupby : mappable
n_iter : int
n_bins : int
|
---|---|
Returns: | jsd_2d : pandas.DataFrame
|
Jensen-Shannon divergence of features across phenotypes
Parameters: | groupby : mappable
n_iter : int
n_bins : int
|
---|---|
Returns: | jsd_df : pandas.DataFrame
|
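For readers unfamiliar with the measure, the Jensen-Shannon divergence between two binned distributions can be sketched as follows. This is an illustration of the general technique, not flotilla's code. It assumes feature values lie in [0, 1] (as percent-spliced-in values do) and uses base-2 logarithms so the result falls between 0 and 1. The names `binify` and `jsd` are chosen for the example, and the `n_iter` parameter is not modeled here.

```python
import numpy as np


def binify(values, n_bins=10):
    """Histogram values in [0, 1] into equal-width bins, returned as fractions."""
    counts, _ = np.histogram(values, bins=n_bins, range=(0, 1))
    return counts / counts.sum()


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0 by convention
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Two phenotypes whose values for one feature do not overlap at all
phenotype1 = binify([0.05, 0.10, 0.02], n_bins=2)
phenotype2 = binify([0.95, 0.90, 0.99], n_bins=2)
divergence = jsd(phenotype1, phenotype2)
```

Identical distributions give 0, and distributions with no shared bins, as in the example above, give the maximum value of 1.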
Given a simple gene name, e.g. “RBFOX2”, get the official ENSG ids or MISO ids
Parameters: | feature_id : str
|
---|---|
Returns: | feature_id : str or list-like
|
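One way such a lookup could work is a reverse search through the rename column of `feature_data`. The sketch below is an assumption about the behavior, not the library's implementation. The function `lookup_feature_ids`, the column name `gene_name`, and the placeholder ids are all hypothetical.

```python
import pandas as pd


def lookup_feature_ids(feature_id, feature_data, feature_rename_col):
    """Map a simple name back to the official id(s); pass official ids through."""
    if feature_id in feature_data.index:
        return feature_id  # already an official id
    is_match = (feature_data[feature_rename_col] == feature_id).to_numpy()
    matches = feature_data.index[is_match]
    if len(matches) == 0:
        return feature_id  # unknown name: return it unchanged
    if len(matches) == 1:
        return matches[0]
    return list(matches)  # one name can map to several ids


feature_data = pd.DataFrame(
    {"gene_name": ["RBFOX2", "RBFOX2", "MBNL1"]},
    index=["id_A", "id_B", "id_C"],  # placeholder ids, not real ENSG ids
)
ids = lookup_feature_ids("RBFOX2", feature_data, "gene_name")
print(ids)  # ['id_A', 'id_B']
```

Returning either a single string or a list mirrors the documented return type of `str or list-like`.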
Calculate NMF-space position of splicing events in phenotype groups
Parameters: | groupby : mappable
n : int or float
|
---|---|
Returns: | df : pandas.DataFrame
|
Get distance in NMF space of different splicing events
Parameters: | groupby : mappable
phenotype_transitions : list of str pairs
n : int or float
|
---|---|
Returns: | nmf_space_transitions : pandas.DataFrame
|
Violinplots and NMF transitions of features different in phenotypes
Plot violinplots and NMF-space transitions of features that have large NMF-space transitions between different phenotypes
Parameters: | n : int
|
---|
Classify samples on boolean or categorical traits
Parameters: | trait : pandas.Series
sample_ids : list-like, optional (default=None)
feature_ids : list-like, optional (default=None)
predictor_name : str
standardize : bool, optional (default=True)
data_name : str, optional (default=None)
groupby : mappable, optional (default=None)
label_to_color : dict, optional (default=None)
label_to_marker : dict, optional (default=None)
order : list, optional (default=None)
color : list, optional (default=None)
plotting_kwargs : other keyword arguments
|
---|---|
Returns: | cv : ClassifierViz
|
Principal component-like analysis of measurements
Parameters: | x_pc : int, optional
y_pc : int, optional
sample_ids : list, optional
feature_ids : list, optional
featurewise : bool, optional
reducer : DataFrameReducerBase, optional
plot_violins : bool, optional
groupby : mappable, optional
label_to_color : dict, optional
label_to_marker : dict, optional
order : list, optional
reduce_kwargs : dict, optional
title : str, optional
most_variant_features : bool, optional
std_multiplier : float, optional
scale_by_variance : bool, optional
plotting_kwargs : other keyword arguments
|
---|---|
Returns: | viz : DecompositionViz
|
Plot the violinplot of a feature, with the option to show NMF movement
Plot the values of two features
Parameters: | sample1 : str
sample2 : str
fillna : float
Any other keyword arguments valid for seaborn.jointplot |
---|---|
Returns: | jointgrid : seaborn.axisgrid.JointGrid
See Also seaborn.jointplot |
Make and memoize a reduced dimensionality representation of data
Parameters: | data : pandas.DataFrame
sample_ids : None or list of strings
feature_ids : None or list of strings
featurewise : bool
standardize : bool
title : str
reducer_kwargs : dict
|
---|---|
Returns: | reducer_object : flotilla.compute.reduce.ReducerViz
|
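As a rough picture of what the subsetting, `featurewise`, and `standardize` options do before a reduction, here is a sketch that uses a plain SVD-based PCA in NumPy. It does not reproduce flotilla's reducer classes or its memoization. The function name `reduce_pca` and the `n_components` argument are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def reduce_pca(data, sample_ids=None, feature_ids=None, featurewise=False,
               standardize=True, n_components=2):
    """Project the rows of (a subset of) `data` onto principal components."""
    subset = data
    if sample_ids is not None:
        subset = subset.loc[sample_ids]
    if feature_ids is not None:
        subset = subset.loc[:, feature_ids]
    if featurewise:
        subset = subset.T  # reduce features instead of samples

    values = subset.to_numpy(dtype=float)
    values = values - values.mean(axis=0)  # center every column
    if standardize:
        std = values.std(axis=0)
        std[std == 0] = 1.0  # leave constant columns unscaled
        values = values / std

    # Rows of vt are the principal axes; u * s gives the row scores
    u, s, vt = np.linalg.svd(values, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    columns = ["pc_%d" % (i + 1) for i in range(n_components)]
    return pd.DataFrame(scores, index=subset.index, columns=columns)


data = pd.DataFrame(
    [[1, 2, 3], [2, 4, 1], [3, 6, 5], [4, 8, 2]],
    index=["s1", "s2", "s3", "s4"],
    columns=["f1", "f2", "f3"],
)
reduced = reduce_pca(data)
```

With `featurewise=True` the result has one row per feature rather than one row per sample.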
Get NMF distance of features between phenotype transitions
Parameters: | positions : pandas.DataFrame
transitions : list of 2-string tuples
|
---|---|
Returns: | transitions : pandas.DataFrame
|
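The distance computation can be illustrated as follows. This sketch assumes `positions` is indexed by a (phenotype, feature) MultiIndex with one column per NMF coordinate, and that "distance" means Euclidean distance. Neither assumption is confirmed by the text above, and the function name `transition_distances` and the phenotype labels are hypothetical.

```python
import numpy as np
import pandas as pd


def transition_distances(positions, transitions):
    """Euclidean distance each feature moves for every phenotype pair."""
    distances = {}
    for phenotype1, phenotype2 in transitions:
        start = positions.xs(phenotype1, level="phenotype")
        end = positions.xs(phenotype2, level="phenotype")
        # Subtraction aligns on feature id; a feature missing from either
        # phenotype produces NaN, which skipna=False keeps in the result.
        diff = end - start
        label = "%s -> %s" % (phenotype1, phenotype2)
        distances[label] = np.sqrt((diff ** 2).sum(axis=1, skipna=False))
    return pd.DataFrame(distances)


index = pd.MultiIndex.from_tuples(
    [("immature", "event1"), ("immature", "event2"),
     ("mature", "event1"), ("mature", "event2")],
    names=["phenotype", "feature"],
)
positions = pd.DataFrame(
    [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]],
    index=index, columns=["x", "y"],
)
moved = transition_distances(positions, [("immature", "mature")])
```

In this toy input, `event1` moves a distance of 5 and `event2` does not move.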
Get subsets from metadata, including boolean and categorical columns
Parameters: | metadata : pandas.DataFrame
minimum : int
subset_type : str
ignore : list-like
|
---|---|
Returns: | subsets : dict
|
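A sketch of how subsets might be derived from boolean and categorical metadata columns follows. It is an illustration only: the naming scheme for subset keys is invented, numeric columns are skipped by assumption, and the `subset_type` parameter is not modeled. The function name `subsets_from_columns` is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def subsets_from_columns(metadata, minimum, ignore=()):
    """Build {subset name: list of ids} from boolean and categorical columns."""
    subsets = {}
    for column in metadata.columns:
        if column in ignore:
            continue
        series = metadata[column].dropna()
        if is_bool_dtype(series):
            # Boolean column: the subset is every id marked True
            ids = list(series[series].index)
            if len(ids) >= minimum:
                subsets[column] = ids
        elif not is_numeric_dtype(series):
            # Categorical column: one subset per distinct value
            for value, group in series.groupby(series):
                if len(group) >= minimum:
                    subsets["%s: %s" % (column, value)] = list(group.index)
    return subsets


metadata = pd.DataFrame(
    {
        "is_tf": [True, True, False, True],
        "biotype": ["protein_coding", "protein_coding", "lincRNA", "protein_coding"],
        "length": [100, 200, 300, 400],
    },
    index=["f1", "f2", "f3", "f4"],
)
subsets = subsets_from_columns(metadata, minimum=2)
```

Here the `lincRNA` group is dropped because it has fewer than `minimum` members, and `length` is ignored because it is numeric.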