What’s new in the package
A catalog of new features, improvements, and bug-fixes in each release.
v0.2.8 (........)
Bug fixes
- Study.tidy_splicing_with_expression now handles splicing events that
map to multiple gene names. Fixes #304 with #309.
Miscellaneous
- Rasterize the lavalamp plot when visualizing many splicing events at once;
otherwise the image is too big. PR #308
- Change modality estimation to a two-step process: Estimate \(\Psi \sim 0\) and \(\Psi \sim 1\)
first, which change 1 parameter of the Beta distribution at a time,
then bimodal and middle, which change both parameters of the Beta
distribution at once.
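The two-step idea can be sketched in plain Python. This is only an illustration under stated assumptions, not flotilla's implementation: the parameter grid, the `min_gain` log-likelihood threshold, the clipping constant `eps`, the uniform Beta(1, 1) baseline, and the function names are all hypothetical. Step 1 varies one Beta parameter while holding the other at 1; step 2 varies both parameters together.

```python
import math


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def best_loglik(psi, params):
    """Highest total log-likelihood of `psi` over a list of (a, b) pairs."""
    return max(sum(beta_logpdf(x, a, b) for x in psi) for a, b in params)


def estimate_modality(psi, min_gain=1.0, eps=1e-3):
    """Assign a modality label to Psi values (hypothetical thresholds)."""
    # Clip exact 0 and 1 so the log density stays finite.
    psi = [min(max(x, eps), 1 - eps) for x in psi]
    grid = [2, 4, 6, 8, 10]  # hypothetical parameter grid
    uniform = best_loglik(psi, [(1, 1)])  # Beta(1, 1) baseline

    # Step 1: models that change one parameter at a time.
    step1 = {
        "~0": best_loglik(psi, [(1, b) for b in grid]),
        "~1": best_loglik(psi, [(a, 1) for a in grid]),
    }
    name = max(step1, key=step1.get)
    if step1[name] - uniform > min_gain:
        return name

    # Step 2: models that change both parameters at once.
    step2 = {
        "bimodal": best_loglik(psi, [(1 / g, 1 / g) for g in grid]),
        "middle": best_loglik(psi, [(g, g) for g in grid]),
    }
    name = max(step2, key=step2.get)
    if step2[name] - uniform > min_gain:
        return name
    return "ambiguous"
```

Calling `estimate_modality` on a list of Psi values for one event returns one of the labels above; an event is only considered for "bimodal" or "middle" after neither one-parameter model fits clearly better than the baseline.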
v0.2.7 (April 20th, 2015)
This release doesn’t have any changes in code, and is just to update the
documentation.
Miscellaneous
- Fixed tutorial building on the documentation page
v0.2.6 (April 10th, 2015)
This is a patch release, with non-breaking changes from 0.2.5.
New features
- Add a data_model.SupplementalData data type, which allows the
user to store any pandas.DataFrame on the data_model.Study
object as study.supplemental. This is essentially user-driven caching.
Plotting functions
- Changed default loadings plot of PCA to a heatmap of the first 5 PCs
Miscellaneous
- Streamlined test suite to test fewer things at a time, which shortened the
test suite from ~20 minutes to ~3 minutes, about 85% time savings.
- Improved accuracy (fewer false positives) in splicing modality estimation
- Added requirement for new non-plotting features to at least be documented as
IPython notebooks, so the knowledge is shared.
- Changed PCA plot to place legend in “best” place
- Changed default plotting marker from a circle to a randomly chosen symbol
from a list
- Violinplots are now variable width and expand with the number of samples.
This affects data_model.Study.plot_event() and
data_model.Study.plot_pca() when plot_violins=True
- Add info about data type when reporting that a feature was not found
- Fix lack of tutorial on how to create a datapackage
v0.2.5 (March 3rd, 2015)
This is a patch release, with non-breaking changes from v0.2.4. This includes
many changes and bugfixes. Upgrading to this version is highly recommended.
New features
- Added data structure and functions for calculating gene ontology enrichment
in data_model.Study.go_enrichment, using the data structure
gene_ontology.GeneOntologyData
Plotting functions
- New function
data_model.Study.plot_expression_vs_inconsistent_splicing()
shows the percent of splicing events in single cells that are inconsistent
with the pooled samples. Has the option to choose an expression cutoff.
- Add options to data_model.Study.plot_pca() and
data_model.Study.interactive_pca():
- Keyword argument color_samples_by will take a column name from the
metadata DataFrame, to color samples by different columns in the
metadata.
- Keyword argument scale_by_variance is a boolean which, when True
(default), scales the \(x\) and \(y\) axes by the explained
variance of their individual principal components (PC1 for \(x\) and
PC2 for \(y\)). If False, then both axes use the same scale, set by the
variance in PC1. Often this will “squish” down the samples along the
\(y\)-axis.
API changes
- data_model.Study.plot_classifier() returns a
visualize.predict.ClassifierViz() object
- Multi-index columns for data matrices are no longer supported
- Modalities are now calculated using Bayesian methods
- data_model.Splicing._subset_and_standardize() no longer fills
NAs with the mean percent spliced-in (Psi, \(\Psi\)) score for the
event, but instead replaces NA with the value 0.5. Then, all values for
that event are transformed with the arc cosine (\(\arccos\), \(\cos^{-1}\))
so that all values range from \(-\pi\) to \(+\pi\) and are centered
around \(0\).
Bug fixes
- Fixed an issue in
data_model.Study.tidy_splicing_with_expression() and
data_model.Study.filter_splicing_on_expression() that
occurred when the index names are not “miso_id” or
“sample_id”.
- Don’t cache data_model.BaseData.feature_renamer_series(), so you
can change the column used to rename features
Miscellaneous
- Add link to PyData NYC talk
- Add scrambled dataset with ~300 samples and both expression and splicing
- Fix build status badge in README
- Removed the automatic %matplotlib inline call within
flotilla.visualize because it messes up the make lint call
and it’s dishonest to the user to be messing with their IPython under the
hood. It’s possible they don’t want the plotting to be inline, but rather
in a separate X window as specified by their $DISPLAY environment
variable.
- Reformatted all code to pass make lint and make pep8, and these
standards will be enforced for all future enhancements.
- Add Gitter chat room badge to README
v0.2.4 (November 23rd, 2014)
This is a patch release, with non-breaking changes from v0.2.3.
Plotting functions
- New clustered heatmaps via data_model.Study.plot_clustermap() and
data_model.Study.plot_correlations()
API changes
- data_model.Study.save() now saves relative instead of absolute
paths, which makes for more portable datapackages
- Underlying code for visualize.DecompositionViz and
visualize.ClassifierViz now plots via plot()
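To illustrate why relative paths make a datapackage more portable, here is a minimal sketch using only the standard library. The helper names and directory layout are hypothetical and are not taken from flotilla's save code.

```python
import os.path


def relative_resource_path(resource_path, datapackage_dir):
    """Express `resource_path` relative to the datapackage directory."""
    return os.path.relpath(resource_path, start=datapackage_dir)


def resolve_resource_path(relative_path, datapackage_dir):
    """Rebuild a usable path from wherever the datapackage now lives."""
    return os.path.normpath(os.path.join(datapackage_dir, relative_path))


# Hypothetical layout: the study is saved in one place, then moved.
old_root = os.path.join(os.sep, "home", "alice", "study")
saved = relative_resource_path(
    os.path.join(old_root, "data", "expression.csv"), old_root)

new_root = os.path.join(os.sep, "mnt", "shared", "study")
restored = resolve_resource_path(saved, new_root)
```

Because `saved` carries no reference to the original home directory, the same stored value resolves correctly under `new_root`; an absolute path would still point at the old location.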
v0.2.3 (November 17th, 2014)
This is a patch release, with non-breaking changes from v0.2.2.
Compute functions
- Restore Study.detect_outliers(),
Study.interactive_choose_outliers() and
Study.interactive_reset_outliers()
Plotting functions
- Add Study-level NMF space transitions/positions
Bug Fixes
- embark() previously didn’t work if the metadata had no pooled column;
now it does
- BaseData.drop_outliers() used to actually drop samples from the data.
We never want to remove data, only mark it as something to be removed, so
that all the original data is still there
- For all compute submodules, add a check to make sure the input
data is truly a probability distribution (non-negative, sums to 1)
- BaseData.plot_feature() now plots all features with the same name
(e.g. all splicing events within that gene) onto a single fig object
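The probability-distribution check mentioned in the list above (non-negative, sums to 1) could look roughly like the following. This is a sketch only: the function name and the tolerance are assumptions, and the check in the compute submodules may differ.

```python
import math


def check_probability_distribution(values, tol=1e-9):
    """Raise ValueError unless `values` is non-negative and sums to 1."""
    values = list(values)
    if any(v < 0 for v in values):
        raise ValueError("Probabilities must be non-negative")
    # fsum limits floating-point error when adding many small values.
    total = math.fsum(values)
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError("Probabilities must sum to 1, got {}".format(total))
    return values
```

Raising early keeps a malformed input from silently producing a meaningless divergence or entropy value further down the line.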
Other
- Rename modalities that couldn’t be assigned when bootstrapped=True in
compute.splicing.Modalities, from “unassigned” to “ambiguous”
v0.2.2 (November 7th, 2014)
This is a patch release, with non-breaking changes from v0.2.0.
v0.2.1 (November 6th, 2014)
This is a patch release, with non-breaking changes from v0.2.0.
v0.2.0 (November 5th, 2014)
This is a minor release, with some breaking changes from v0.1.1.
New features
- Plot the expression or splicing of two samples with
Study.plot_two_samples()
- Plot the expression or splicing of two features with
Study.plot_two_features()
- Detect outliers with Study.interactive_choose_outliers() which
performs a OneClassSVM on the PCA-reduced space of data (either
expression or splicing), using the first three components
- Study doesn’t filter out the pooled or outlier samples from the
data, only technical outliers with fewer reads than specified in the
argument mapping_stats_min_reads.
- To filter expression or splicing data on the number of samples that must
detect each feature, you can specify expression_thresh and
metadata_min_samples in the Study constructor.
- For example, if expression_thresh=1 and metadata_min_samples=3,
then we will only take genes which have expression values greater than
1 in at least 3 samples. Additionally, we will also take splicing events
which were detected in at least three cells, since
metadata_min_samples applies to all data types.
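The filtering rule in the example above can be sketched with plain dictionaries. The function names and the toy data are hypothetical; flotilla works on pandas DataFrames, so this only mirrors the logic, not the actual code.

```python
def filter_on_threshold(data, thresh, min_samples):
    """Keep features whose value exceeds `thresh` in >= `min_samples` samples.

    `data` maps feature name -> {sample name: value}; None marks a
    feature that was not detected in that sample.
    """
    kept = {}
    for feature, values in data.items():
        n_passing = sum(
            1 for v in values.values() if v is not None and v > thresh)
        if n_passing >= min_samples:
            kept[feature] = values
    return kept


def filter_on_detection(data, min_samples):
    """Keep features detected (not None) in >= `min_samples` samples."""
    return {
        feature: values for feature, values in data.items()
        if sum(v is not None for v in values.values()) >= min_samples
    }


# Toy data, invented for illustration only.
expression = {
    "geneA": {"s1": 5.0, "s2": 2.0, "s3": 1.5, "s4": 0.0},
    "geneB": {"s1": 3.0, "s2": 0.5, "s3": 0.0, "s4": 1.0},
}
splicing = {
    "event1": {"s1": 0.9, "s2": None, "s3": 0.1, "s4": 0.5},
    "event2": {"s1": None, "s2": None, "s3": 0.2, "s4": 0.4},
}

kept_genes = filter_on_threshold(expression, thresh=1, min_samples=3)
kept_events = filter_on_detection(splicing, min_samples=3)
```

Here geneB is dropped because only one sample has expression strictly greater than 1, and event2 is dropped because it was detected in only two samples.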
API changes
- The attribute data in BaseData (i.e.
BaseData.data) now contains all the data, including pooled,
singles, and outliers
- The attribute data_original in BaseData (i.e.
BaseData.data_original) contains the original, unfiltered
data, for example the data before removing features detected in fewer than
3 samples with expression > 1.
- BaseData now has the attributes
BaseData.singles, BaseData.pooled, and
BaseData.outliers which are on-the-fly subsets of
BaseData.data. This is to maintain data provenance, meaning that if
the set of “outliers” is changed, these subsets change with it.
- In Study, you now must specify expression_feature_rename_col,
splicing_feature_rename_col, and mapping_stats_number_mapped_col
explicitly. They no longer default to
{splicing,expression}_feature_rename_col="gene_name" and
mapping_stats_number_mapped_col="Uniquely mapped reads number"
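The "on-the-fly subsets" design described in the list above can be illustrated with properties that are recomputed on every access. The class below is a hypothetical stand-in, not flotilla's BaseData, and the way the three subsets partition the samples is an assumption made for the sketch.

```python
class MarkedData:
    """Hypothetical stand-in for a container that marks, never drops, samples."""

    def __init__(self, data, pooled=(), outliers=()):
        self.data = dict(data)            # every sample is always kept here
        self.pooled_ids = set(pooled)     # samples marked as pooled
        self.outlier_ids = set(outliers)  # samples marked as outliers

    def _subset(self, keep):
        return {s: v for s, v in self.data.items() if keep(s)}

    @property
    def outliers(self):
        return self._subset(lambda s: s in self.outlier_ids)

    @property
    def pooled(self):
        # Assumption: a pooled sample marked as an outlier counts as an outlier.
        return self._subset(
            lambda s: s in self.pooled_ids and s not in self.outlier_ids)

    @property
    def singles(self):
        return self._subset(
            lambda s: s not in self.pooled_ids and s not in self.outlier_ids)


study_data = MarkedData({"s1": 1.0, "s2": 2.0, "p1": 3.0}, pooled={"p1"})
singles_before = sorted(study_data.singles)

study_data.outlier_ids.add("s2")  # mark, don't delete
singles_after = sorted(study_data.singles)
```

Since nothing is cached, changing the outlier marks immediately changes what `singles` and `outliers` return, while `data` still holds every sample.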
Other Changes
- Status messages in embark() have been moved to stdout instead
of stderr to avoid confusion that something is going wrong
- In embark(), user gets notified which samples are removed for having
too few reads (default minimum number of reads is \(5\times 10^5\), or
half a million reads).
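As a rough sketch of the read-depth notification described above: the function name, the input format, and the message wording are hypothetical. Only the \(5\times 10^5\) default and the use of stdout come from the notes.

```python
import sys


def split_by_read_depth(mapped_reads, min_reads=5e5):
    """Separate samples with at least `min_reads` reads from the rest.

    `mapped_reads` maps sample name -> number of mapped reads.
    """
    kept = {s: n for s, n in mapped_reads.items() if n >= min_reads}
    removed = sorted(set(mapped_reads) - set(kept))
    if removed:
        # Status goes to stdout, not stderr, since nothing has gone wrong.
        print("Samples removed for having fewer than {:.0f} reads: {}".format(
            min_reads, ", ".join(removed)), file=sys.stdout)
    return kept, removed


kept, removed = split_by_read_depth(
    {"s1": 1200000, "s2": 40000, "s3": 500000})
```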