What’s new in the package

A catalog of new features, improvements, and bug fixes in each release.

v0.2.8 (........)

Bug fixes

  • Study.tidy_splicing_with_expression now handles splicing events that map to multiple gene names. Fixes #304 with #309.

Miscellaneous

  • Rasterize the lavalamp plot, which visualizes many splicing events at once; otherwise the image is too big. PR #308
  • Change modality estimation to a two-step process: first estimate \(\Psi\sim0\) and \(\Psi\sim1\), which each change one parameter of the Beta distribution at a time, then bimodal and middle, which change both parameters of the Beta distribution at once.
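
The two-step idea can be sketched with a small grid search over Beta distributions. This is only an illustration under stated assumptions: the parameter grid, the min_log_ratio cutoff, the "ambiguous" fallback, and every function name below are hypothetical and are not taken from the package's code.

```python
import math


def beta_logpdf(x, a, b, eps=1e-3):
    """Log density of Beta(a, b) at x, clipping x away from exactly 0 and 1."""
    x = min(max(x, eps), 1 - eps)
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm


def best_log_ratio(psi, params):
    """Best total log-likelihood over a parameter grid, relative to uniform.

    Beta(1, 1) is uniform with log density 0, so the total log-likelihood
    is already the log ratio against the uniform model.
    """
    return max(sum(beta_logpdf(x, a, b) for x in psi) for a, b in params)


GRID = range(2, 21, 2)  # assumed grid, for illustration only

# Step 1: models that move one Beta parameter away from 1.
ONE_PARAMETER = {
    "Psi~0": [(1, g) for g in GRID],
    "Psi~1": [(g, 1) for g in GRID],
}
# Step 2: models that move both parameters together.
TWO_PARAMETER = {
    "bimodal": [(1 / g, 1 / g) for g in GRID],
    "middle": [(g, g) for g in GRID],
}


def estimate_modality(psi, min_log_ratio=3.0):
    """Try the one-parameter models first, then the two-parameter ones."""
    for models in (ONE_PARAMETER, TWO_PARAMETER):
        scores = {name: best_log_ratio(psi, grid) for name, grid in models.items()}
        best = max(scores, key=scores.get)
        if scores[best] > min_log_ratio:  # hypothetical acceptance rule
            return best
    return "ambiguous"
```

Under these assumptions, an event is only compared against the bimodal and middle models when neither one-parameter model beats the uniform distribution by the chosen margin.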

v0.2.7 (April 20th, 2015)

This release doesn’t have any changes in code, and is just to update the documentation.

Miscellaneous

  • Fixed tutorial building on the documentation page

v0.2.6 (April 10th, 2015)

This is a patch release, with non-breaking changes from v0.2.5.

New features

  • Add a data_model.SupplementalData data type, which allows the user to store any pandas.DataFrame on the data_model.Study object as study.supplemental. This is essentially user-driven caching.

Plotting functions

  • Changed default loadings plot of PCA to a heatmap of the first 5 PCs

Bug fixes

  • Fixed data_model.Study.save() to actually save:
    • Gene Ontology Data
    • Minimum number of mapped reads per sample
    • Minimum number of samples to use per feature, at the specified threshold (e.g. use features with TPM > 1 in at least 20 cells)
  • Fixed data_model.base.subsets_from_metadata() to use boolean columns properly, which allows for boolean columns in data_model.MetaData and data_model.BaseData.feature_data
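
As an illustration of what a boolean metadata column can drive, the sketch below builds one sample subset per boolean column. It uses plain dictionaries and a hypothetical helper name; it is not the package's subsets_from_metadata() and makes no claim about that function's signature or output.

```python
def boolean_subsets(metadata):
    """Map each all-boolean column to the sample IDs where it is True.

    `metadata` is {column_name: {sample_id: value}} (an assumed layout).
    """
    subsets = {}
    for column, values in metadata.items():
        if values and all(isinstance(v, bool) for v in values.values()):
            subsets[column] = [s for s, flag in values.items() if flag]
    return subsets


# Made-up metadata: two boolean columns and one categorical column.
metadata = {
    "pooled": {"s1": False, "s2": True, "s3": False},
    "outlier": {"s1": False, "s2": False, "s3": True},
    "phenotype": {"s1": "neuron", "s2": "neuron", "s3": "progenitor"},
}
subsets = boolean_subsets(metadata)
```

Here the non-boolean phenotype column is simply skipped; how the real function treats categorical columns is not covered by this sketch.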

Miscellaneous

  • Streamlined test suite to test fewer things at a time, which shortened the test suite from ~20 minutes to ~3 minutes, about 85% time savings.

  • Improved accuracy (fewer false positives) in splicing modality estimation

  • Added requirement for new non-plotting features to at least be documented as IPython notebooks, so the knowledge is shared.

  • Changed PCA plot to place legend in “best” place

  • Changed default plotting marker from a circle to a randomly chosen symbol from a list

  • Violinplots are now variable width and expand with the number of samples in data_model.Study.plot_event() and data_model.Study.plot_pca() when plot_violins=True

  • Add info about data type when reporting that a feature was not found

  • Fix lack of tutorial on how to create a datapackage

v0.2.5 (March 3rd, 2015)

This is a patch release, with non-breaking changes from v0.2.4. This includes many changes and bugfixes. Upgrading to this version is highly recommended.

New features

  • Added data structure and functions for calculating gene ontology enrichment in data_model.Study.go_enrichment, using the data structure gene_ontology.GeneOntologyData

Plotting functions

  • New function data_model.Study.plot_expression_vs_inconsistent_splicing() shows the percent of splicing events in single cells that are inconsistent with the pooled samples. Has the option to choose an expression cutoff.

  • Add options to data_model.Study.plot_pca() and data_model.Study.interactive_pca():
    • Keyword argument color_samples_by takes a column name from the metadata DataFrame, to color samples by different columns in the metadata.
    • Keyword argument scale_by_variance is a boolean which, when True (default), scales the \(x\) and \(y\) axes by the explained variance of their individual principal components (PC1 for \(x\) and PC2 for \(y\)). If False, then the axes are the same scale, by the variance in PC1. Often this will “squish” down the samples in the \(y\)-axis.

API changes

  • data_model.Study.plot_classifier() returns a visualize.predict.ClassifierViz() object
  • Multi-index columns for data matrices are no longer supported
  • Modalities are now calculated using Bayesian methods
  • data_model.Splicing._subset_and_standardize() no longer fills NAs with the mean percent spliced-in (Psi, \(\Psi\)) score for the event; it now replaces NA with the value 0.5. Then, all values for that event are transformed with arc cosine (\(\arccos\), \(\cos^{-1}\)) so that all values range from \(-\pi\) to \(+\pi\) and are centered around \(0\).
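
These notes do not spell out the exact arc cosine formula. One mapping that matches the description (NA becomes 0.5, the output spans \(-\pi\) to \(+\pi\), and 0.5 maps to \(0\)) is \(-2\arccos(2\Psi - 1) + \pi\). The sketch below assumes that form; the function name is hypothetical and the package's actual code may differ.

```python
import math


def standardize_psi(values):
    """Replace missing values with 0.5, then map [0, 1] onto [-pi, pi]."""
    filled = [0.5 if v is None or math.isnan(v) else v for v in values]
    # Assumed transform: Psi=0 -> -pi, Psi=0.5 -> 0, Psi=1 -> +pi.
    return [-2 * math.acos(2 * v - 1) + math.pi for v in filled]


# Missing values (None or NaN) end up at the center of the range.
scores = standardize_psi([0.0, None, 1.0, float("nan")])
```

Filling with 0.5 before the transform means a missing observation lands at \(0\) after it, so it does not pull the event toward either extreme.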

Bug fixes

  • Fixed data_model.Study.tidy_splicing_with_expression() and data_model.Study.filter_splicing_on_expression(), which had problems when the index names were not “miso_id” or “sample_id”.
  • Don’t cache data_model.BaseData.feature_renamer_series(), so you can change the column used to rename features

Miscellaneous

  • Add link to PyData NYC talk
  • Add scrambled dataset with ~300 samples and both expression and splicing
  • Fix build status badge in README
  • Removed the automatic %matplotlib inline call within flotilla.visualize because it messes up the make lint call and because it’s dishonest to the user to be messing with their IPython under the hood. It’s possible they don’t want the plotting to be inline, but rather in a separate X window as specified by their $DISPLAY environment variable.
  • Reformatted all code to pass make lint and make pep8, and these standards will be enforced for all future enhancements.
  • Add Gitter chat room badge to README

v0.2.4 (November 23rd, 2014)

This is a patch release, with non-breaking changes from v0.2.3.

Plotting functions

  • New clustered heatmap functions data_model.Study.plot_clustermap() and data_model.Study.plot_correlations()

API changes

  • data_model.Study.save() now saves relative instead of absolute paths, which makes for more portable datapackages
  • Underlying code for visualize.DecompositionViz and visualize.ClassifierViz now plots via plot()
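
To illustrate why the relative paths written by save() make a datapackage more portable, here is a minimal sketch using only the standard library. The directory layout and file names are made up, and this is not the code inside save().

```python
import os


def to_relative(resource_path, package_dir):
    """Express a resource path relative to the datapackage directory."""
    return os.path.relpath(resource_path, start=package_dir)


# Hypothetical locations, for illustration only.
package_dir = os.path.join(os.sep, "home", "alice", "study")
resource = os.path.join(package_dir, "data", "expression.csv")
relative = to_relative(resource, package_dir)

# The same relative entry still resolves after the folder is moved or shared.
moved_dir = os.path.join(os.sep, "mnt", "shared", "study")
resolved = os.path.join(moved_dir, relative)
```

An absolute path would keep pointing at the original machine's directory, while the relative entry follows the datapackage wherever it is copied.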

v0.2.3 (November 17th, 2014)

This is a patch release, with non-breaking changes from v0.2.2.

Compute functions

  • Restore Study.detect_outliers(), Study.interactive_choose_outliers() and Study.interactive_reset_outliers()

Plotting functions

  • Add Study-level NMF space transitions/positions

Bug Fixes

  • embark() wouldn’t work if metadata didn’t have a pooled column; now it does
  • BaseData.drop_outliers() used to actually drop samples from the data. We never want to remove data, only mark it as something to be removed, so that all the original data is still there
  • For all compute submodules, add a check to make sure the input data is truly a probability distribution (non-negative, sums to 1)
  • BaseData.plot_feature() now plots all features with the same name (e.g. all splicing events within that gene) onto a single fig object
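
The probability-distribution check mentioned above (non-negative, sums to 1) could look something like the following. This is a sketch with a hypothetical function name and an assumed tolerance, not the check used in the compute submodules.

```python
import math


def check_probability_distribution(values, tolerance=1e-9):
    """Raise ValueError unless values are non-negative and sum to 1."""
    if any(v < 0 for v in values):
        raise ValueError("probabilities must be non-negative")
    total = math.fsum(values)
    # A small tolerance absorbs floating-point rounding in the sum.
    if not math.isclose(total, 1.0, abs_tol=tolerance):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return True
```

Comparing the sum with a tolerance rather than with exact equality avoids rejecting valid inputs such as [0.1, 0.2, 0.7] because of rounding.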

Documentation

Other

  • Rename modalities that couldn’t be assigned when bootstrapped=True in compute.splicing.Modalities, from “unassigned” to “ambiguous”

v0.2.2 (November 7th, 2014)

This is a patch release, with non-breaking changes from v0.2.0.

Documentation updates

v0.2.1 (November 6th, 2014)

This is a patch release, with non-breaking changes from v0.2.0.

Documentation updates

v0.2.0 (November 5th, 2014)

This is a minor release, with some breaking changes from v0.1.1.

New features

  • Plot the expression or splicing of two samples with Study.plot_two_samples()
  • Plot the expression or splicing of two features with Study.plot_two_features()
  • Detect outliers with Study.interactive_choose_outliers(), which fits a OneClassSVM on the PCA-reduced space of the data (either expression or splicing), using the first three components
  • Study doesn’t filter out the pooled or outlier samples from the data, only technical outliers with fewer reads than specified in the argument mapping_stats_min_reads.
  • To filter expression or splicing data on the number of samples that must detect each feature, you can specify expression_thresh and metadata_min_samples in the Study constructor.
    • For example, if expression_thresh=1 and metadata_min_samples=3, then we will only take genes which have expression values greater than 1 in at least 3 samples. Additionally, we will also take splicing events which were detected in at least three cells, since metadata_min_samples applies to all data types.
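
The filtering rule in the example above can be sketched in plain Python. The function name and the dictionary layout are assumptions made for illustration; only the rule itself (value greater than the threshold in at least the minimum number of samples) comes from the notes.

```python
def filter_features(data, thresh, min_samples):
    """Keep features whose value exceeds `thresh` in at least `min_samples` samples.

    `data` is {feature_id: {sample_id: value}} (an assumed layout).
    """
    kept = {}
    for feature, values in data.items():
        n_detected = sum(1 for v in values.values() if v is not None and v > thresh)
        if n_detected >= min_samples:
            kept[feature] = values
    return kept


# Made-up expression values for two genes across four samples.
expression = {
    "geneA": {"s1": 5.0, "s2": 2.5, "s3": 1.2, "s4": 0.0},  # > 1 in 3 samples
    "geneB": {"s1": 9.0, "s2": 0.5, "s3": 1.0, "s4": 0.0},  # > 1 in 1 sample
}
kept = filter_features(expression, thresh=1, min_samples=3)
```

With thresh=1 and min_samples=3, geneA is kept and geneB is dropped; note that a value of exactly 1 does not count, because the comparison is strictly greater than.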

API changes

  • The attribute data in BaseData (i.e. BaseData.data) now contains all the data, including pooled, singles, and outliers
  • The attribute data_original in BaseData (i.e. BaseData.data_original) contains the original, unfiltered data. For example, it holds the data as it was before removing features detected in fewer than 3 samples with expression > 1.
  • BaseData now has the attributes BaseData.singles, BaseData.pooled, and BaseData.outliers, which are on-the-fly subsets of BaseData.data. This is to maintain data provenance, meaning that if the set of outliers changes, these subsets change with it.
  • In Study, you must now specify expression_feature_rename_col, splicing_feature_rename_col, and mapping_stats_number_mapped_col explicitly. They no longer default to {splicing,expression}_feature_rename_col="gene_name" and mapping_stats_number_mapped_col="Uniquely mapped reads number"
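
The "on-the-fly subsets" idea above can be illustrated with properties that are recomputed on every access. The class below is a toy stand-in, not flotilla's BaseData: its name, its constructor, and the choice to exclude outliers from singles are all assumptions made for this sketch.

```python
class SampleTable:
    """Toy stand-in (not flotilla's BaseData) that never deletes samples."""

    def __init__(self, data, pooled_ids=()):
        self.data = dict(data)  # all samples, always kept
        self.pooled_ids = set(pooled_ids)
        self.outlier_ids = set()

    def mark_outliers(self, sample_ids):
        """Flag samples as outliers without removing them from `data`."""
        self.outlier_ids = set(sample_ids)

    @property
    def outliers(self):
        return {s: v for s, v in self.data.items() if s in self.outlier_ids}

    @property
    def pooled(self):
        return {s: v for s, v in self.data.items() if s in self.pooled_ids}

    @property
    def singles(self):
        # Assumption: singles are samples that are neither pooled nor outliers.
        excluded = self.pooled_ids | self.outlier_ids
        return {s: v for s, v in self.data.items() if s not in excluded}


table = SampleTable({"s1": 1.0, "s2": 2.0, "pool": 3.0}, pooled_ids=["pool"])
table.mark_outliers(["s2"])
```

Because the subsets are derived on access, changing the outlier set immediately changes singles and outliers, while data itself is never modified.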

Other Changes

  • Status messages in embark() have been moved to stdout instead of stderr to avoid confusion that something is going wrong
  • In embark(), user gets notified which samples are removed for having too few reads (default minimum number of reads is \(5\times 10^5\), or half a million reads).
Olga B. Botvinnik is funded by the NDSEG fellowship and is a NumFOCUS John Hunter Technology Fellow.
Michael T. Lovci was partially funded by a fellowship from Genentech.
Partially funded by NIH grants NS075449 and HG004659 and CIRM grants RB4-06045 and TR3-05676 to Gene Yeo.