Storing supplemental data on ``Study`` objects
==============================================
A recently added feature is the ability to store any pandas DataFrame on
``study.supplemental``; it will be reloaded every time you ``embark`` on
that datapackage. Let's start with the batch-corrected BrainSpan data from
the Allen Brain Institute's Brain Atlas.

.. code:: python

    import flotilla
    study = flotilla.embark(flotilla._brainspan)


.. parsed-literal::

    Creating a directory for saving your flotilla projects: /home/travis/flotilla_projects
    Creating a directory for saving the data for this project: /home/travis/flotilla_projects/brainspan_filtered_and_markers_amazon
    https://s3-us-west-2.amazonaws.com/flotilla/brainspan_batch_corrected_for_amazon_s3/datapackage.json has not been downloaded before.
    Downloading now to /home/travis/flotilla_projects/brainspan_filtered_and_markers_amazon/datapackage.json
    2015-06-09 22:42:57 Parsing datapackage to create a Study object
    https://s3-us-west-2.amazonaws.com/flotilla/brainspan_batch_corrected_for_amazon_s3/expression_feature.csv has not been downloaded before.
    Downloading now to /home/travis/flotilla_projects/brainspan_filtered_and_markers_amazon/expression_feature.csv
    https://s3-us-west-2.amazonaws.com/flotilla/brainspan_batch_corrected_for_amazon_s3/expression.csv has not been downloaded before.
    Downloading now to /home/travis/flotilla_projects/brainspan_filtered_and_markers_amazon/expression.csv
    https://s3-us-west-2.amazonaws.com/flotilla/brainspan_batch_corrected_for_amazon_s3/metadata.csv has not been downloaded before.
    Downloading now to /home/travis/flotilla_projects/brainspan_filtered_and_markers_amazon/metadata.csv
    2015-06-09 22:43:12 Initializing Study
    2015-06-09 22:43:12 Initializing Predictor configuration manager for Study
    2015-06-09 22:43:12 Predictor ExtraTreesClassifier is of type
    2015-06-09 22:43:12 Added ExtraTreesClassifier to default predictors
    2015-06-09 22:43:12 Predictor ExtraTreesRegressor is of type
    2015-06-09 22:43:12 Added ExtraTreesRegressor to default predictors
    2015-06-09 22:43:12 Predictor GradientBoostingClassifier is of type
    2015-06-09 22:43:12 Added GradientBoostingClassifier to default predictors
    2015-06-09 22:43:12 Predictor GradientBoostingRegressor is of type
    2015-06-09 22:43:12 Added GradientBoostingRegressor to default predictors
    2015-06-09 22:43:12 Loading metadata
    2015-06-09 22:43:12 Loading expression data
    2015-06-09 22:43:12 Initializing expression
    2015-06-09 22:43:13 Done initializing expression
    2015-06-09 22:43:16 Successfully initialized a Study object!

Let's take a look at how big this expression matrix is.

.. code:: python

    study.expression.data.shape

.. parsed-literal::

    (493, 14321)

Yikes, 14,321 features is a lot! Let's subset on just the most variant
genes. By default, these are the genes whose variance is more than two
standard deviations above the mean variance of all genes.
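
.. code:: python

    # Feature IDs in the precomputed "variant" subset
    variant_ids = study.expression.feature_subsets['variant']
    # Select those columns by label (.loc replaces the removed .ix indexer)
    variant_expression = study.expression.data.loc[:, variant_ids]
    variant_expression.shape

.. parsed-literal::

    (493, 553)
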
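
The ``variant`` rule described above can be sketched in plain pandas. This
is a minimal illustration, assuming samples are rows and genes are columns
(as in ``study.expression.data``) and that the cutoff is the mean of all
per-gene variances plus two standard deviations of those variances.
``variant_features`` is a hypothetical helper, not part of flotilla.

.. code:: python

    import numpy as np
    import pandas as pd


    def variant_features(data: pd.DataFrame, n_std: float = 2.0) -> pd.Index:
        """Hypothetical helper: columns whose variance exceeds
        mean(variances) + n_std * std(variances)."""
        variances = data.var().dropna()          # per-column (per-gene) variance
        cutoff = variances.mean() + n_std * variances.std()
        return variances[variances > cutoff].index


    # Toy data: 20 samples x 50 genes; gene0 and gene1 have 10x the
    # standard deviation (100x the variance) of every other gene.
    base = np.tile([-1.0, 1.0], 10)
    scales = np.ones(50)
    scales[:2] = 10.0
    toy = pd.DataFrame(np.outer(base, scales),
                       columns=[f"gene{i}" for i in range(50)])

    ids = variant_features(toy)
    subset = toy.loc[:, ids]                     # label-based column selection
    print(list(ids), subset.shape)

Here only ``gene0`` and ``gene1`` clear the cutoff, so ``subset`` keeps 2
of the 50 columns.
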
553 features isn't so bad. Let's correlate all features to each other in
this subset.

.. code:: python

    %%time
    variant_expression_corr = variant_expression.corr()
    variant_expression_corr.head()

.. parsed-literal::

    CPU times: user 393 ms, sys: 0 ns, total: 393 ms
    Wall time: 392 ms

That didn't take *too* long, but I'm sure you can imagine it would take
a really long time for ALL genes!
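
To put a rough number on "a really long time", here is a
back-of-the-envelope estimate. It assumes runtime grows with the number of
distinct gene pairs, and it reuses the 392 ms wall time measured above; the
real cost also depends on memory and the pandas version, so treat the
result as an order of magnitude only.

.. code:: python

    # Back-of-the-envelope scaling estimate for an all-vs-all correlation.
    # Assumption: runtime grows with the number of distinct feature pairs.

    def n_pairs(n_features: int) -> int:
        """Distinct unordered pairs of features, excluding self-pairs."""
        return n_features * (n_features - 1) // 2

    subset_pairs = n_pairs(553)      # variant genes only
    all_pairs = n_pairs(14321)       # every gene in the expression matrix
    ratio = all_pairs / subset_pairs

    observed_seconds = 0.392         # wall time measured above for 553 genes
    estimated_seconds = observed_seconds * ratio

    # A dense float64 matrix for all genes: 14321 x 14321 x 8 bytes
    matrix_gb = 14321 ** 2 * 8 / 1e9

    print(f"{subset_pairs:,} vs {all_pairs:,} pairs (~{ratio:.0f}x more)")
    print(f"estimated wall time: ~{estimated_seconds / 60:.1f} min")
    print(f"full correlation matrix: ~{matrix_gb:.2f} GB")

That works out to roughly 670 times as many pairs, a few minutes of
compute, and a correlation matrix of about 1.6 GB, which is a good reason
to compute it once and store it.
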
Now let's assign this to the ``study.supplemental`` object under a name
of our choice. To keep things simple, I'll give it the same name as the
variable.

.. code:: python

    study.supplemental.variant_expression_corr = variant_expression_corr

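
If you're curious how attribute-style storage like this can work, here is
a minimal, hypothetical sketch in plain Python and pandas.
``SupplementalSketch`` and its ``names()`` method are invented for
illustration and are not flotilla's implementation; the only idea carried
over from the text is that each DataFrame is kept under whatever attribute
name you assign it to.

.. code:: python

    import pandas as pd


    class SupplementalSketch:
        """Hypothetical stand-in for ``study.supplemental``: each DataFrame
        assigned as an attribute is kept under that attribute's name."""

        def __init__(self):
            # Bypass our own __setattr__ so the registry itself is not validated.
            object.__setattr__(self, "_tables", {})

        def __setattr__(self, name, value):
            if not isinstance(value, pd.DataFrame):
                raise TypeError(f"{name!r} must be a pandas DataFrame")
            self._tables[name] = value

        def __getattr__(self, name):
            # Only called when normal attribute lookup fails.
            try:
                return self._tables[name]
            except KeyError:
                raise AttributeError(name) from None

        def names(self):
            """Sorted names of all stored tables."""
            return sorted(self._tables)


    supp = SupplementalSketch()
    supp.variant_expression_corr = pd.DataFrame(
        {"g1": [1.0, 0.5], "g2": [0.5, 1.0]}, index=["g1", "g2"])
    print(supp.names())
    print(supp.variant_expression_corr.shape)

Keeping the tables in a name-to-DataFrame dictionary makes it easy to
iterate over everything that was stored, for example when saving.
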
Now let's save the object and re-``embark`` to make sure it's there.

.. code:: python

    study.save('brainspan2')
    study2 = flotilla.embark('brainspan2')

.. parsed-literal::

    Wrote datapackage to /home/travis/flotilla_projects/brainspan2/datapackage.json
    2015-06-09 22:44:04 Reading datapackage from /home/travis/flotilla_projects/brainspan2/datapackage.json
    2015-06-09 22:44:04 Parsing datapackage to create a Study object
    2015-06-09 22:44:10 Initializing Study
    2015-06-09 22:44:10 Initializing Predictor configuration manager for Study
    2015-06-09 22:44:10 Predictor ExtraTreesClassifier is of type
    2015-06-09 22:44:10 Added ExtraTreesClassifier to default predictors
    2015-06-09 22:44:10 Predictor ExtraTreesRegressor is of type
    2015-06-09 22:44:10 Added ExtraTreesRegressor to default predictors
    2015-06-09 22:44:10 Predictor GradientBoostingClassifier is of type
    2015-06-09 22:44:10 Added GradientBoostingClassifier to default predictors
    2015-06-09 22:44:10 Predictor GradientBoostingRegressor is of type
    2015-06-09 22:44:10 Added GradientBoostingRegressor to default predictors
    2015-06-09 22:44:10 Loading metadata
    2015-06-09 22:44:11 Loading expression data
    2015-06-09 22:44:11 Initializing expression
    2015-06-09 22:44:13 Done initializing expression
    2015-06-09 22:44:16 Successfully initialized a Study object!

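
The save-and-reload idea can also be illustrated with pandas alone. This is
a hypothetical sketch: the one-CSV-per-table layout and the
``save_tables``/``load_tables`` helpers are invented for illustration and
are not how flotilla actually writes its datapackage. It only shows the
concept of persisting named DataFrames and checking that they come back
unchanged.

.. code:: python

    import tempfile
    from pathlib import Path

    import numpy as np
    import pandas as pd


    def save_tables(tables, directory):
        """Write each named DataFrame to <directory>/<name>.csv (hypothetical layout)."""
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        for name, df in tables.items():
            df.to_csv(directory / f"{name}.csv")


    def load_tables(directory):
        """Read every CSV in ``directory`` back into a {name: DataFrame} dict."""
        return {path.stem: pd.read_csv(path, index_col=0)
                for path in sorted(Path(directory).glob("*.csv"))}


    # Toy correlation matrix standing in for variant_expression_corr.
    rng = np.random.default_rng(0)
    expr = pd.DataFrame(rng.normal(size=(30, 4)), columns=["g1", "g2", "g3", "g4"])
    corr = expr.corr()

    with tempfile.TemporaryDirectory() as tmp:
        save_tables({"variant_expression_corr": corr}, tmp)
        reloaded = load_tables(tmp)

    # Raises AssertionError if values, labels, or dtypes differ.
    pd.testing.assert_frame_equal(reloaded["variant_expression_corr"], corr)
    print("round trip OK")

``pandas.testing.assert_frame_equal`` compares floats with a small relative
tolerance by default, which absorbs any last-digit differences introduced
by writing the numbers out as text.
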
Let's make sure our ``variant_expression_corr`` dataframe is really
there.

.. code:: python

    study2.supplemental.variant_expression_corr.head()


.. list-table::
   :header-rows: 1
   :stub-columns: 1

   * -
     - ENSG00000003137
     - ENSG00000004848
     - ENSG00000006016
     - ENSG00000006116
     - ENSG00000006128
     - ENSG00000006377
     - ENSG00000007350
     - ENSG00000016082
     - ENSG00000041353
     - ENSG00000041982
     - ...
     - ENSG00000258283
     - ENSG00000258403
     - ENSG00000258444
     - ENSG00000258518
     - ENSG00000258752
     - ENSG00000259190
     - ENSG00000259279
     - ENSG00000259373
     - ENSG00000259410
     - ENSG00000259603
   * - ENSG00000003137
     - 1.000000
     - -0.046835
     - 0.090661
     - 0.053573
     - -0.047665
     - -0.155271
     - 0.054222
     - -0.160111
     - -0.115487
     - -0.044074
     - ...
     - -0.224868
     - -0.019139
     - -0.121898
     - -0.392903
     - -0.122529
     - 0.273103
     - 0.136641
     - 0.448154
     - 0.445432
     - 0.041884
   * - ENSG00000004848
     - -0.046835
     - 1.000000
     - 0.612271
     - 0.707699
     - 0.558755
     - 0.671949
     - 0.028770
     - 0.148974
     - 0.448605
     - 0.001084
     - ...
     - 0.701812
     - 0.328554
     - 0.040970
     - -0.478688
     - -0.082581
     - -0.196485
     - 0.135765
     - -0.662072
     - -0.570054
     - -0.623628
   * - ENSG00000006016
     - 0.090661
     - 0.612271
     - 1.000000
     - 0.652296
     - 0.585003
     - 0.450687
     - 0.037159
     - 0.111290
     - 0.414487
     - 0.022681
     - ...
     - 0.650395
     - 0.300988
     - 0.233828
     - -0.391661
     - 0.055337
     - -0.289575
     - 0.060468
     - -0.418865
     - -0.398856
     - -0.348691
   * - ENSG00000006116
     - 0.053573
     - 0.707699
     - 0.652296
     - 1.000000
     - 0.516889
     - 0.424020
     - -0.185337
     - -0.071044
     - 0.469453
     - -0.299232
     - ...
     - 0.687966
     - 0.526633
     - -0.027971
     - -0.569975
     - -0.209932
     - -0.020450
     - 0.220832
     - -0.560163
     - -0.465723
     - -0.655024
   * - ENSG00000006128
     - -0.047665
     - 0.558755
     - 0.585003
     - 0.516889
     - 1.000000
     - 0.715297
     - 0.000925
     - 0.415798
     - 0.564539
     - 0.053909
     - ...
     - 0.716306
     - 0.429350
     - 0.264993
     - -0.256103
     - 0.155573
     - -0.431776
     - -0.079680
     - -0.378022
     - -0.389113
     - -0.255839

5 rows × 553 columns

Yay, it's here!