What’s new in the package

A catalog of new features, improvements, and bug-fixes in each release.

v0.2.1 (November 6th, 2014)

This is a patch release, with non-breaking changes from v0.2.0.

Documentation updates

v0.2.0 (November 5th, 2014)

This is a minor release, with some breaking changes from v0.1.1.

New features

  • Plot the expression or splicing of two samples with Study.plot_two_samples()
  • Plot the expression or splicing of two features with Study.plot_two_features()
  • Detect outliers with Study.interactive_choose_outliers() which performs a OneClassSVM on the PCA-reduced space of data (either expression or splicing), using the first three components
  • Study doesn’t filter out the pooled or outlier samples from the data, only technical outliers with fewer reads than specified in the argument mapping_stats_min_reads.
  • To filter expression or splicing data on the number of samples that must detect each feature, you can specify expression_thresh, and metadata_min_samples in the Study constructor.
    • For example, if expression_thresh=1 and metadata_min_samples=3, then we will only take genes which have expression values greater than 1 in at least 3 samples. Additionally, we will also take splicing events which were detected in at least three cells, since metadata_min_samples applies to all data types.

API changes

  • The attribute data in BaseData (i.e. BaseData.data) now contains all the data, including pooled, singles, and outliers
  • The attribute data_original in BaseData (i.e. BaseData.data_original) contains the original, unfiltered data. For example, before removing features detected in fewer than 3 samples with expression > 1.
  • BaseData now has the attributes BaseData.singles, BaseData.pooled, and BaseData.outliers which are on-the-fly subsets of BaseData.data. This is to maintain data provenance, meaning if “outliers” is changed, this is also changed.
  • In Study, you now must specify expression_feature_rename_col, splicing_feature_rename_col, mapping_stats_number_mapped_col explicitly, they are no longer defaulting to, {splicing,expression}_feature_rename_col="gene_name" and mapping_stats_number_mapped_col="Uniquely mapped reads number"

Other Changes

  • Status messages in embark() have been moved to stdout instead of stderr to avoid confusion that something is going wrong
  • In embark(), user gets notified which samples are removed for having too few reads (default minimum number of reads is \(5\times 10^5\), or half a million reads).
Olga B. Botvinnik is funded by the NDSEG fellowship and is a NumFOCUS John Hunter Technology Fellow.
Michael T. Lovci was partially funded by a fellowship from Genentech.
Partially funded by NIH grants NS075449 and HG004659 and CIRM grants RB4-06045 and TR3-05676 to Gene Yeo.