Bases: object
A biological study, with associated metadata, expression, and splicing data.
Construct a biological study
This class only accepts data, no filenames. All data must already have been read in and exist as Python objects.
Parameters: | sample_metadata : pandas.DataFrame
version : str
expression_data : pandas.DataFrame
expression_feature_data : pandas.DataFrame
expression_feature_rename_col : str
expression_log_base : float
thresh : float
expression_plus_one : bool
splicing_data : pandas.DataFrame
splicing_feature_data : pandas.DataFrame
splicing_feature_rename_col : str
splicing_feature_expression_id_col : str
mapping_stats_data : pandas.DataFrame
mapping_stats_number_mapped_col : str
spikein_data : pandas.DataFrame
spikein_feature_data : pandas.DataFrame
drop_outliers : bool
species : str
gene_ontology_data : pandas.DataFrame
metadata_pooled_col : str
|
---|
Splicing events whose change in NMF space is large
By large, we mean that the difference is 2 standard deviations away from the mean.
Parameters: | phenotype_transitions : list of length-2 tuples of str
data_type : ‘splicing’ | ‘expression’
n : int
|
---|---|
Returns: | big_transitions : pandas.DataFrame
|
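The "2 standard deviations" criterion can be illustrated with a minimal sketch. It assumes each event's change in NMF space has already been reduced to a single distance and that only unusually large values are of interest (a one-sided cutoff); the function name `big_changes` and the example values are hypothetical, and this is not flotilla's own implementation.

```python
from statistics import mean, pstdev


def big_changes(distances, n_std=2.0):
    """Return the events whose distance exceeds mean + n_std * std.

    `distances` maps an event id to a non-negative distance (assumed to be
    the size of that event's move in NMF space between two phenotypes).
    """
    values = list(distances.values())
    cutoff = mean(values) + n_std * pstdev(values)
    return {event: d for event, d in distances.items() if d > cutoff}


# Hypothetical distances: nine small moves and one large one
distances = {"event_%d" % i: 0.1 for i in range(9)}
distances["event_big"] = 2.0
large = big_changes(distances)
```

Here only `event_big` lies above the cutoff, so it is the only entry kept in `large`.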
Number of cells that detected each event, per celltype
Assign samples marked as “outlier” in the metadata to the other data objects
Percentage of events inconsistent with pooled samples at each expression threshold
Parameters: | bins : list-like
|
---|---|
Returns: | expression_vs_inconsistent : pd.DataFrame
|
Given a name of a feature subset, get the associated feature ids
Parameters: | data_type : str
feature_subset : str
|
---|---|
Returns: | feature_ids : list of strings
|
Filter splicing events on expression values
Parameters: | expression_thresh : float
|
---|---|
Returns: | psi : pandas.DataFrame
|
Create a study object from a datapackage dictionary
Parameters: | datapackage : dict |
---|---|
Returns: | study : flotilla.Study
|
Create a study from a url of a datapackage.json file
Parameters: | datapackage_url : str
species_data_pacakge_base_url : str
|
---|---|
Returns: | study : Study
|
Raises: | AttributeError
|
Calculate gene ontology enrichment of provided features
Parameters: | feature_ids : list-like
background : list-like, optional
domain : str or list, optional
p_value_cutoff : float, optional
min_feature_size : int, optional
min_background_size : int, optional
Returns: | enrichment : pandas.DataFrame
|
---|
User selects from columns that start with ‘outlier_’ to merge multiple outlier classifications
Calculates the Jensen-Shannon divergence for both splicing and expression study data
Jensen-Shannon divergence quantifies how much the distribution of one measurement (e.g. a splicing event or a gene's expression) changes from one celltype to another.
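As an illustration of the quantity itself, here is a minimal pure-Python sketch of the Jensen-Shannon divergence between two discrete distributions. It assumes the measurements have already been binned into probability vectors of equal length that each sum to 1; the function names and the example values are hypothetical, and flotilla's own binning and implementation may differ.

```python
from math import log2


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; 0 * log(0) counts as 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical binned distributions of one splicing event in two celltypes
celltype_a = [0.7, 0.2, 0.1]
celltype_b = [0.1, 0.2, 0.7]
divergence = jensen_shannon_divergence(celltype_a, celltype_b)
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with no overlap), and it is symmetric in its two arguments.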
Get modality assignments of splicing data
Parameters: | sample_subset : str or None, optional
feature_subset : str or None, optional
expression_thresh : float, optional
|
---|---|
Returns: | modalities : pandas.DataFrame
|
Get number of splicing events in modality categories
Parameters: | sample_subset : str or None, optional
feature_subset : str or None, optional
expression_thresh : float, optional
|
---|---|
Returns: | modalities : pandas.DataFrame
|
The change in NMF space of splicing events across phenotypes
Parameters: | phenotype_transitions : list of length-2 tuples of str
data_type : ‘splicing’ | ‘expression’
n : int
|
---|---|
Returns: | big_transitions : pandas.DataFrame
|
Plot a predictor for the specified data type and trait(s)
Parameters: | data_type : str
trait : str
|
---|
Visualize hierarchical relationships within samples and features
Visualize clustered correlations of samples across features
Plot the violinplot and NMF transitions of a splicing event
Plot the graph (network) of these data
Parameters: | data_type : str
sample_subset : str or None
feature_subset : str or None
|
---|
Make grouped barplots of the number of modalities per phenotype
Parameters: | sample_subset : str or None
feature_subset : str or None
expression_thresh : float
percentages : bool
|
---|
Plot each modality in each celltype on a separate axes
Parameters: | sample_subset : str or None
feature_subset : str or None
expression_thresh : float
|
---|
Plot splicing events with modality assignments in NMF space
This will plot a separate NMF space for each celltype in the data, as well as one for all samples.
Parameters: | sample_subset : str or None
feature_subset : str or None
expression_thresh : float
|
---|
Performs DataFramePCA on both expression and splicing study_data
Parameters: | data_type : str
x_pc : int, optional
y_pc : int, optional
sample_subset : str or None
feature_subset : str or None
title : str, optional
featurewise : bool, optional
plot_violins : bool
show_point_labels : bool, optional
reduce_kwargs : dict, optional
color_samples_by : str, optional
bokeh : bool, optional
most_variant_features : bool, optional
std_multiplier : float, optional
scale_by_variance : bool, optional
kwargs : other keyword arguments
|
---|
Make a scatterplot of two features’ data
Parameters: | feature1 : str
feature2 : str
|
---|
Plot a scatterplot of two samples’ data
Parameters: | sample1 : str
sample2 : str
data_type : “expression” | “splicing”
Any other keyword arguments valid for seaborn.jointplot |
---|---|
Returns: | jointgrid : seaborn.axisgrid.JointGrid
See Also seaborn.jointplot |
Convert a string naming a subset of phenotypes in the data into sample ids
Parameters: | phenotype_subset : str
|
---|---|
Returns: | sample_ids : list of strings
|
Bases: object
Manage several predictor configurations
A container for predictor configurations, including several built-ins. @mlovci: built-ins such as ........ ? What is predictor_config vs new_predictor_config? Why are they separate?
Attributes
predictor_config : | |
predictor_configs : | |
builtin_predictor_configs : |
Methods
new_predictor_config(*args, **kwargs) | Create a new predictor configuration |
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> pcm = PredictorConfigManager()
>>> # add a new type of predictor
>>> pcm.new_predictor_config('ExtraTreesClassifier', ExtraTreesClassifier,
...     n_features_dependent_kwargs={
...         'max_features': PredictorConfigScalers.max_feature_scaler,
...         'n_estimators': PredictorConfigScalers.n_estimators_scaler,
...         'n_jobs': PredictorConfigScalers.n_jobs_scaler},
...     bootstrap=True, random_state=0,
...     oob_score=True,
...     verbose=True)
Construct a predictor configuration manager with ExtraTreesClassifier, ExtraTreesRegressor, GradientBoostingClassifier, and GradientBoostingRegressor as default predictors.
Names of the predictor configurations
Create a new predictor configuration
Parameters: | name : str
obj : sklearn predictor object, optional (default=None)
predictor_scoring_fun : function, optional (default=None)
score_cutoff_fun : function, optional (default=None)
n_features_dependent_kwargs : dict, optional (default=None)
kwargs : other keyword arguments
|
---|---|
Returns: | predictorconfig : PredictorConfig
|
Raises: | ValueError
KeyError
|
Create a new predictor configuration, added to predictors
Parameters: | name : str
kwargs : other keyword arguments, optional
|
---|---|
Returns: | predictor : sklearn predictor
|
Dict of predictor configurations
Bases: object
A collection of PredictorDataSet instances.
Parameters: | predictor_config_manager : PredictorConfigManager, optional (default None)
|
---|
Attributes
datasets | 3-layer deep dict of {data: {trait: {categorical: dataset}}} |
???? @mlovci please fill in
Parameters: | data_name : str
trait_name : str
categorical_trait : bool, optional (default=False)
|
---|---|
Returns: | dataset : PredictorDataSet
|
3-layer deep dict of {data: {trait: {categorical: dataset}}}
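The three-layer layout can be pictured with a small sketch. It only illustrates the shape {data: {trait: {categorical: dataset}}}; the helper name `three_layer_dict`, the keys, and the placeholder value are hypothetical, and this is not necessarily how flotilla builds the structure.

```python
from collections import defaultdict


def three_layer_dict():
    """Nested mapping that creates the two inner layers on first access."""
    return defaultdict(lambda: defaultdict(dict))


datasets = three_layer_dict()
# Keys are (data name, trait name, whether the trait is categorical);
# the string stands in for a PredictorDataSet instance.
datasets["expression"]["phenotype"][True] = "dataset placeholder"

found = datasets["expression"]["phenotype"][True]
```

Looking up a data name or trait that has not been stored yet yields an empty inner mapping rather than raising `KeyError`, which keeps insertion code short.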
??? Difference between this and dataset??? @mlovci
Parameters: | data_name : str
trait_name : str
categorical_trait : bool, optional (default=False)
data : pandas.DataFrame, optional (default=None)
trait : pandas.Series, optional (default=None)
predictor_config_manager : PredictorConfigManager (default=None) |
---|---|
Returns: | dataset : PredictorDataSet
|
Example code for making a datapackage for a Study
Bases: flotilla.data_model.base.BaseData
Object for holding and operating on expression data
Bases: flotilla.data_model.base.BaseData
Instantiate an object for percent spliced in (PSI) scores
Parameters: | data : pandas.DataFrame
n_components : int
binsize : float
excluded_max : float
included_max : float
|
---|
Notes
‘thresh’ from BaseData is not used.
Assigned modalities for these samples and features.
Parameters: | sample_ids : list of str, optional
feature_ids : list of str, optional
data : pandas.DataFrame, optional
|
---|---|
Returns: | modality_assignments : pandas.Series
|
Count the number of each modality for these samples and features
Parameters: | sample_ids : list of str
feature_ids : list of str
data : pandas.DataFrame, optional
|
---|---|
Returns: | modalities_counts : pandas.Series
|
Plot histogram of distances between singles and pooled
Make grouped barplots of the number of modalities per group
Parameters: | sample_ids : None or list of str
feature_ids : None or list of str
color : None or matplotlib color
x_offset : numeric
|
---|
Plot “lavalamp” scatterplot of each event
Parameters: | sample_ids : None or list of str
feature_ids : None or list of str
color : None or matplotlib color
x_offset : numeric
|
---|
Plot events modality assignments in NMF space
This will calculate modalities on all samples provided, without grouping them by celltype. This is because each NMF axis can only show one set of sample ids’ modalities.
Parameters: | sample_ids : list of str
feature_ids : list of str
data : pandas.DataFrame, optional
ax : matplotlib.axes.Axes object
title : str
|
---|
Return splicing events for which the pooled samples are consistently different from the single cells.
Parameters: | singles_ids : list-like
pooled_ids : list-like
feature_ids : None or list-like
fraction_diff_thresh : float |
---|---|
Returns: | large_diff : pandas.DataFrame
|
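One plausible reading of "consistently different" is sketched below, under loudly stated assumptions: an event is flagged when the fraction of single cells whose PSI differs from the pooled PSI by more than a tolerance exceeds `fraction_diff_thresh`. The `tolerance` parameter, the default values, the function name, and the plain-dict inputs are all hypothetical; flotilla's actual rule and return type (a pandas.DataFrame) may differ.

```python
def inconsistent_events(singles, pooled, fraction_diff_thresh=0.5, tolerance=0.1):
    """Return ids of events where most single cells disagree with the pool.

    singles: {event_id: [psi of each single cell]}
    pooled:  {event_id: psi of the pooled sample}
    """
    flagged = []
    for event, pooled_psi in pooled.items():
        cells = singles.get(event, [])
        if not cells:
            continue  # no single-cell measurements to compare against
        n_different = sum(abs(psi - pooled_psi) > tolerance for psi in cells)
        if n_different / len(cells) > fraction_diff_thresh:
            flagged.append(event)
    return flagged


# Hypothetical PSI values: exon_a is bimodal in single cells, exon_b is not
singles = {"exon_a": [0.0, 0.05, 1.0, 0.95], "exon_b": [0.48, 0.52, 0.5, 0.55]}
pooled = {"exon_a": 0.5, "exon_b": 0.5}
flagged = inconsistent_events(singles, pooled)
```

In this example the pooled PSI of 0.5 for `exon_a` hides the fact that every single cell is near 0 or 1, so `exon_a` is flagged while `exon_b` is not.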
Bases: flotilla.data_model.base.BaseData
Bases: flotilla.data_model.expression.ExpressionData
Class for Spikein data and associated functions
Constructor for
Parameters: | data, experiment_design_data |
---|
Bases: flotilla.data_model.base.BaseData
Constructor for mapping statistics data from STAR
Constructor for MappingStatsData
Parameters: | data, sample_descriptors |
---|
Bases: object
Object to calculate enrichment of Gene Ontology terms
Acceptable Gene Ontology tables can be downloaded from ENSEMBL’s BioMart tool: http://www.ensembl.org/biomart
Parameters: | data : pandas.DataFrame
|
---|
Bonferroni-corrected hypergeometric p-values of GO enrichment
Calculates hypergeometric enrichment of the features of interest, in each GO category.
Parameters: | features_of_interest : list-like
background : list-like, optional
p_value_cutoff : float, optional
cross_reference : dict-like, optional
min_feature_size : int, optional
min_background_size : int, optional
domain : str or list, optional
|
---|---|
Returns: | enrichment_df : pandas.DataFrame
|
Raises: | ValueError
|
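The calculation described above can be sketched with the standard library alone. This is a minimal illustration of a hypergeometric test followed by Bonferroni correction, not flotilla's implementation: the function names, the `go_terms` mapping, and the GO ids in the example are hypothetical, and the `min_feature_size`, `min_background_size`, `cross_reference`, and `domain` options are left out.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items from M, of which n are annotated."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / comb(M, N)


def go_enrichment(features_of_interest, background, go_terms, p_value_cutoff=0.05):
    """Bonferroni-corrected hypergeometric p-value for each GO term.

    go_terms maps a GO id to the set of feature ids annotated with it.
    """
    background = set(background)
    interest = set(features_of_interest) & background
    n_tests = len(go_terms)
    results = {}
    for go_id, members in go_terms.items():
        members = set(members) & background
        overlap = len(interest & members)
        if overlap == 0:
            continue  # nothing to test for this term
        p = hypergeom_sf(overlap, len(background), len(members), len(interest))
        corrected = min(1.0, p * n_tests)  # Bonferroni: multiply by number of terms
        if corrected <= p_value_cutoff:
            results[go_id] = corrected
    return results


# Hypothetical data: 20 background genes, two GO terms of five genes each
background = ["g%d" % i for i in range(20)]
go_terms = {"GO:A": set(background[:5]), "GO:B": set(background[10:15])}
results = go_enrichment(background[:5], background, go_terms)
```

Multiplying each p-value by the number of GO terms tested is the simplest multiple-testing correction; it is conservative, which suits a setting where thousands of terms are tested at once.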