What’s new in the package

A catalog of new features, improvements, and bug-fixes in each release.

v0.2.4 (November 23rd, 2014)

This is a patch release, with non-breaking changes from v0.2.3.

New clustered heatmap plots via data_model.Study.plot_clustermap() and data_model.Study.plot_correlations()

data_model.Study.save() now saves relative paths instead of absolute paths, which makes datapackages more portable
Underlying code for visualize.DecompositionViz and visualize.ClassifierViz now plots via plot()
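To illustrate why relative paths make a saved datapackage more portable, here is a minimal sketch using only the standard library. The helper name to_relative_path and the example directory layout are hypothetical and are not taken from the package; this is not the code that data_model.Study.save() actually runs.

```python
import os


def to_relative_path(resource_path, datapackage_dir):
    """Express resource_path relative to the datapackage directory.

    Hypothetical helper for illustration only.
    """
    return os.path.relpath(resource_path, start=datapackage_dir)


# Hypothetical layout: the datapackage lives in <root>/home/user/project
package_dir = os.path.join(os.sep, "home", "user", "project")
resource = os.path.join(package_dir, "data", "expression.csv")

# The stored path no longer depends on where the project directory lives
relative = to_relative_path(resource, package_dir)
print(relative)
```

Because the stored path is relative to the datapackage directory, the whole directory can be moved or copied to another machine without rewriting the paths.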

This is a patch release, with non-breaking changes from v0.2.2.

Restore Study.detect_outliers(), Study.interactive_choose_outliers() and Study.interactive_reset_outliers()

embark() previously failed if the metadata didn’t have a pooled column; it now works without one
BaseData.drop_outliers() previously dropped samples from the data. Since we never want to remove data, outliers are now only marked for removal, so all the original data is retained
For all compute submodules, add a check to make sure the input data is truly a probability distribution (non-negative, sums to 1)
BaseData.plot_feature() now plots all features with the same name (e.g. all splicing events within that gene) onto a single fig object
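The probability-distribution check mentioned above (non-negative, sums to 1) could look something like the following sketch. The function name is_probability_distribution and the tolerance value are assumptions chosen for illustration; the check inside the compute submodules may be implemented differently.

```python
import math


def is_probability_distribution(values, tol=1e-9):
    """Return True if values are all non-negative and sum to 1.

    Hypothetical helper; tol is an assumed absolute tolerance that
    absorbs floating-point rounding in the sum.
    """
    values = list(values)
    if not values:
        return False
    if any(v < 0 for v in values):
        return False
    return math.isclose(sum(values), 1.0, abs_tol=tol)


print(is_probability_distribution([0.25, 0.25, 0.5]))
print(is_probability_distribution([0.5, 0.6]))
```

The tolerance matters because sums of floating-point numbers rarely equal 1 exactly, even when the input was normalized.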

Rename modalities that couldn’t be assigned when bootstrapped=True in compute.splicing.Modalities, from “unassigned” to “ambiguous”

This is a patch release, with non-breaking changes from v0.2.0.

Update documentation

Fixed issue with pip install reported by @roryk

This is a patch release, with non-breaking changes from v0.2.0.

This is a minor release, with some breaking changes from v0.1.1.

Plot the expression or splicing of two samples with Study.plot_two_samples()
Plot the expression or splicing of two features with Study.plot_two_features()
Detect outliers with Study.interactive_choose_outliers(), which fits a OneClassSVM on the PCA-reduced space of the data (either expression or splicing), using the first three components
Study doesn’t filter out the pooled or outlier samples from the data, only technical outliers with fewer reads than specified in the argument mapping_stats_min_reads.
To filter expression or splicing data by the number of samples in which each feature must be detected, specify expression_thresh and metadata_min_samples in the Study constructor.
- For example, if expression_thresh=1 and metadata_min_samples=3, then only genes with expression values greater than 1 in at least 3 samples are kept. Splicing events are likewise kept only if they were detected in at least 3 samples, since metadata_min_samples applies to all data types.
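The filtering rule in the example above can be sketched in plain Python as follows. The function filter_features and the nested-dict layout are hypothetical and stand in for whatever the Study constructor does internally; only the rule itself (value greater than the threshold in at least a minimum number of samples) comes from the description above.

```python
def filter_features(data, thresh, min_samples):
    """Keep features whose value exceeds thresh in at least min_samples samples.

    data maps feature name -> {sample name -> value}; a value of None
    means the feature was not detected in that sample.
    Hypothetical helper for illustration only.
    """
    kept = {}
    for feature, values in data.items():
        n_detected = sum(
            1 for v in values.values() if v is not None and v > thresh
        )
        if n_detected >= min_samples:
            kept[feature] = values
    return kept


# Made-up toy values, not real measurements
expression = {
    "gene_a": {"s1": 5.0, "s2": 2.5, "s3": 1.2, "s4": 0.0},
    "gene_b": {"s1": 3.0, "s2": 0.5, "s3": 0.0, "s4": 1.0},
}

filtered = filter_features(expression, thresh=1, min_samples=3)
print(sorted(filtered))
```

Here gene_a exceeds 1 in three samples and is kept, while gene_b exceeds 1 in only one sample (a value of exactly 1 does not count as greater than 1) and is dropped.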

The attribute data in BaseData (i.e. BaseData.data) now contains all the data, including pooled, singles, and outliers
The attribute data_original in BaseData (i.e. BaseData.data_original) contains the original, unfiltered data, for example the data as it was before removing features detected in fewer than 3 samples with expression > 1.
BaseData now has the attributes BaseData.singles, BaseData.pooled, and BaseData.outliers, which are on-the-fly subsets of BaseData.data. This maintains data provenance: if the set of outliers changes, these subsets change with it.
In Study, you must now specify expression_feature_rename_col, splicing_feature_rename_col, and mapping_stats_number_mapped_col explicitly. They no longer default to {splicing,expression}_feature_rename_col="gene_name" and mapping_stats_number_mapped_col="Uniquely mapped reads number".
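The idea of on-the-fly subsets described above for BaseData.singles, BaseData.pooled, and BaseData.outliers can be sketched with Python properties. The class DataSketch and its attribute names outlier_ids and pooled_ids are hypothetical, and the choice that singles excludes both pooled and outlier samples is an assumption; the real BaseData may define these subsets differently.

```python
class DataSketch:
    """Toy stand-in for a data container with computed subsets.

    Hypothetical class for illustration; not the package's BaseData.
    """

    def __init__(self, data, outlier_ids=(), pooled_ids=()):
        self.data = data  # sample id -> values; never modified
        self.outlier_ids = set(outlier_ids)
        self.pooled_ids = set(pooled_ids)

    @property
    def outliers(self):
        # Recomputed on every access, so it always reflects outlier_ids
        return {k: v for k, v in self.data.items() if k in self.outlier_ids}

    @property
    def pooled(self):
        return {k: v for k, v in self.data.items() if k in self.pooled_ids}

    @property
    def singles(self):
        # Assumption: singles are samples that are neither pooled nor outliers
        excluded = self.outlier_ids | self.pooled_ids
        return {k: v for k, v in self.data.items() if k not in excluded}


sketch = DataSketch({"s1": 1.0, "s2": 2.0, "p1": 3.0}, pooled_ids={"p1"})
print(sorted(sketch.singles))

# Marking s2 as an outlier changes the subsets but leaves data untouched
sketch.outlier_ids.add("s2")
print(sorted(sketch.singles), sorted(sketch.outliers), len(sketch.data))
```

Computing the subsets on access means there is a single source of truth: nothing is copied or deleted, so the subsets cannot drift out of sync with the full data.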

Status messages in embark() have been moved from stderr to stdout, to avoid giving the impression that something is going wrong
In embark(), the user is now notified of which samples are removed for having too few reads (the default minimum number of reads is \(5\times 10^5\), or half a million reads).