fastTSNE
A visualization of 160,796 single-cell transcriptomes from the mouse nervous system [Zeisel 2018], computed in under 2 minutes using FFT-accelerated interpolation and approximate nearest neighbors. See the basic usage notebook for more details.
The goal of this project is to have fast implementations of t-SNE in one place, with a flexible API and without any external dependencies. This makes it easy to experiment with various aspects of t-SNE and makes the package easy to distribute.
This package provides two fast implementations of t-SNE:
 Barnes-Hut t-SNE [2] is appropriate for small data sets and has asymptotic complexity O(n log n).
 FFT-accelerated t-SNE (FIt-SNE) [3] is appropriate for larger data sets (>10,000 samples) and has asymptotic complexity O(n).
Barnes-Hut tends to be slightly faster on smaller data sets (typically by a minute or two), while FIt-SNE should always be used for larger data sets (>10,000 samples). In most cases, the FIt-SNE implementation is a safe default.
To better understand the speed tradeoffs, it is useful to know how t-SNE works. t-SNE runs in two main phases. In the first phase, we find the K nearest neighbors of each sample. We offer exact nearest neighbor search using scikit-learn's KD-trees and approximate nearest neighbor search using a Python/Numba implementation of nearest neighbor descent. Exact search tends to be faster for smaller data sets, and approximate search is faster for larger ones.
The second phase is the optimization (which can, in turn, be run in several stages). In every iteration we must evaluate the negative gradient, which involves computing all pairwise interactions. This can be accelerated using Barnes-Hut space-partitioning trees (scaling as O(n log n)) or, for larger data sets, FFT-accelerated interpolation (scaling as O(n)). For more details, see the corresponding papers.
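To make the cost of "all pairwise interactions" concrete, here is a minimal sketch of the exact repulsive (negative gradient) term of the standard t-SNE gradient, written in plain Python. It is not fastTSNE's implementation, and the function name repulsive_forces is hypothetical, chosen only for illustration. The double loop visits every ordered pair of points, which is the O(n^2) work that the Barnes-Hut and FFT approximations avoid.

```python
def repulsive_forces(embedding):
    """Exact repulsive term of the t-SNE gradient for a list of points.

    For each point i this computes sum_j q_ij^2 * Z * (y_i - y_j), where
    w_ij = 1 / (1 + ||y_i - y_j||^2), Z = sum of all w_kl, and q_ij = w_ij / Z.
    Hypothetical helper for illustration only, not part of fastTSNE.
    """
    n = len(embedding)
    dim = len(embedding[0])
    forces = [[0.0] * dim for _ in range(n)]
    z = 0.0  # normalization constant, summed over all ordered pairs

    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = [a - b for a, b in zip(embedding[i], embedding[j])]
            w = 1.0 / (1.0 + sum(d * d for d in diff))  # Student-t kernel
            z += w
            for k in range(dim):
                forces[i][k] += w * w * diff[k]

    # Dividing by Z turns w_ij^2 into q_ij^2 * Z
    return [[f / z for f in row] for row in forces]


# Two points one unit apart are pushed in opposite directions.
print(repulsive_forces([[0.0, 0.0], [1.0, 0.0]]))
# [[-0.25, 0.0], [0.25, 0.0]]
```

Doubling the number of points roughly quadruples the number of pair evaluations, which is why this exact form is only practical for very small inputs.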
Benchmarks
The numbers are not exact. The benchmarks were run on an Intel i7-7700HQ CPU @ 2.80GHz (up to 3.80GHz).
FFT benchmarks are run using approximate nearest neighbor search. Exact search is used for Barnes-Hut.
The typical benchmark to use is the MNIST data set containing 70,000 28x28 images (784 pixels).
| MNIST | Exact NN | Approximate NN | BH gradient | FFT gradient |
|---|---|---|---|---|
| 4 cores | 2086s | 22s | 243s | 67s |
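As a rough way to read the table, the sketch below adds up the two pipelines described above: exact search with the Barnes-Hut gradient, and approximate search with the FFT gradient. It assumes the nearest neighbor and gradient timings can simply be summed to give a total, which the table does not state explicitly, so treat the result as indicative only.

```python
# Timings in seconds, copied from the MNIST benchmark row (4 cores).
timings = {
    "exact_nn": 2086,
    "approx_nn": 22,
    "bh_gradient": 243,
    "fft_gradient": 67,
}

# Assumption: total runtime is neighbor search plus gradient time.
exact_bh_total = timings["exact_nn"] + timings["bh_gradient"]
approx_fft_total = timings["approx_nn"] + timings["fft_gradient"]
speedup = exact_bh_total / approx_fft_total

print(f"exact + BH:   {exact_bh_total}s")    # 2329s
print(f"approx + FFT: {approx_fft_total}s")  # 89s
print(f"speedup:      {speedup:.1f}x")       # 26.2x
```

Under this assumption, most of the difference comes from the nearest neighbor search rather than from the gradient computation.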
Installation
fastTSNE can be installed using conda from conda-forge with
conda install --channel conda-forge fasttsne
fastTSNE can also be installed using pip. The only prerequisite is numpy. Once numpy is installed, simply run
pip install fasttsne
and you're good to go.
FFTW
By default, fastTSNE uses numpy's implementation of the Fast Fourier Transform because of its wide availability. If you would like to squeeze out maximum performance, you can install the highly optimized FFTW C library, available through conda. fastTSNE will automatically detect FFTW and use it. The speedups are generally not large, but they can save seconds to minutes when running t-SNE on larger data sets.
Usage
We provide two modes of usage. One is similar to scikit-learn's TSNE.fit.
We also provide an advanced interface for finer control of the optimization, allowing us to interactively tune the embedding and make use of various tricks to improve the embedding quality.
Basic usage
We provide a basic interface somewhat similar to the one provided by scikit-learn.
from fastTSNE import TSNE
from sklearn import datasets
iris = datasets.load_iris()
x, y = iris['data'], iris['target']
tsne = TSNE(
    n_components=2, perplexity=30, learning_rate=200,
    n_jobs=4, angle=0.5, initialization='pca', metric='euclidean',
    early_exaggeration_iter=250, early_exaggeration=12, n_iter=750,
    neighbors='exact', negative_gradient_method='bh',
)
embedding = tsne.fit(x)
There are two parameters which you will want to watch out for:
 neighbors controls nearest neighbor search. If the data set is small, 'exact' is the better choice; it uses scikit-learn's KD-trees. For larger data, approximate search can be orders of magnitude faster. This is selected with 'approx'. Nearest neighbor search is performed only once, at the beginning of the optimization, but it can dominate runtime on large data sets, so this parameter must be chosen carefully.
 negative_gradient_method controls which technique is used to approximate pairwise interactions. These are computed at each step of the optimization. Van der Maaten [2] proposed the Barnes-Hut tree approximation, and this has been the de facto standard in most t-SNE implementations. It can be selected by passing 'bh'. Asymptotically, it scales as O(n log n) in the number of points and works well for up to 10,000 samples. More recently, Linderman et al. [3] developed another approximation using interpolation, which scales linearly in the number of points, O(n). It can be selected by passing 'fft'. This method has some overhead, making it slightly slower than Barnes-Hut for small numbers of points, but it is very fast for larger data sets, where Barnes-Hut becomes completely unusable. For smaller data sets the difference is typically on the order of seconds, at most minutes, so a safe default is the FFT approximation.
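The guidance above can be condensed into a small helper. This is only a sketch: suggest_tsne_settings is a hypothetical function that is not part of fastTSNE, and the 10,000-sample threshold is taken from the text above as a rule of thumb, not a hard limit. It returns keyword arguments using the neighbors and negative_gradient_method values described above.

```python
def suggest_tsne_settings(n_samples, large_threshold=10_000):
    """Suggest neighbor search and gradient settings from the data set size.

    Hypothetical helper for illustration; the threshold mirrors the
    ">10,000 samples" rule of thumb rather than any library behavior.
    """
    if n_samples > large_threshold:
        # Large data: approximate neighbors and the O(n) FFT gradient
        return {"neighbors": "approx", "negative_gradient_method": "fft"}
    # Small data: exact neighbors and the O(n log n) Barnes-Hut gradient
    return {"neighbors": "exact", "negative_gradient_method": "bh"}


print(suggest_tsne_settings(150))
print(suggest_tsne_settings(70_000))
```

The result could then be passed along as TSNE(**suggest_tsne_settings(len(x))). Since the FFT approximation is described as a safe default, returning 'fft' in the small-data branch would be an equally reasonable choice.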
Our tsne object acts as a fitter instance, and calling fit returns a TSNEEmbedding instance. This acts as a regular numpy array and can be used as such, but it can also be optimized further if we see fit, or used for adding new points to the embedding.
We don't log any progress by default, but provide callbacks that can be run at any interval of the optimization process. A simple logger is provided as an example.
from fastTSNE.callbacks import ErrorLogger
tsne = TSNE(callbacks=ErrorLogger(), callbacks_every_iters=50)
A callback can be any callable object that accepts the following arguments.
def callback(iteration, error, embedding):
    ...
Callbacks are used to control the optimization, i.e. every callback must return a boolean value indicating whether or not to stop the optimization. If we want to stop the optimization via a callback, we simply return True.
Additionally, a list of callbacks can also be passed, in which case all the callbacks must agree to continue the optimization, otherwise the process is terminated and the current embedding is returned.
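As an illustration of the callback contract described above, here is a minimal sketch of a custom callback that asks the optimization to stop once the error stops improving. The class name StopOnPlateau and its tolerance parameter are hypothetical and not part of fastTSNE; the only things taken from the text are the (iteration, error, embedding) signature and the convention that returning True stops the optimization.

```python
class StopOnPlateau:
    """Request a stop when the error changes by less than `tolerance`.

    Hypothetical example callback, not shipped with fastTSNE.
    """

    def __init__(self, tolerance=1e-4):
        self.tolerance = tolerance
        self.previous_error = None

    def __call__(self, iteration, error, embedding):
        # Never stop on the first call, since there is nothing to compare to
        stop = (
            self.previous_error is not None
            and abs(self.previous_error - error) < self.tolerance
        )
        self.previous_error = error
        return stop  # True means "stop the optimization"


# Simulated error values, as they might be reported every 50 iterations
callback = StopOnPlateau(tolerance=0.01)
for iteration, error in [(50, 3.0), (100, 2.5), (150, 2.4995)]:
    if callback(iteration, error, embedding=None):
        print(f"stopping at iteration {iteration}")
        break
```

Because it is a plain callable, an instance could be passed on its own or in a list alongside ErrorLogger via the callbacks argument.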
Advanced usage
Recently, Kobak and Berens [4] demonstrated several tricks we can use to obtain better t-SNE embeddings. The main critique of t-SNE is that global structure is mostly thrown away. This is typically the main selling point for UMAP over t-SNE. In the preprint, several techniques are presented that enable t-SNE to capture more global structure. All of these tricks can easily be implemented using fastTSNE and are shown in the notebook examples.
To introduce the API, we will implement the standard t-SNE algorithm, the one implemented by TSNE.fit.
from fastTSNE import initialization, affinity
from fastTSNE.callbacks import ErrorLogger
from fastTSNE.tsne import TSNEEmbedding
init = initialization.pca(x)
affinities = affinity.PerplexityBasedNN(x, perplexity=30, method='approx', n_jobs=8)
embedding = TSNEEmbedding(
    init, affinities, negative_gradient_method='fft',
    learning_rate=200, n_jobs=8, callbacks=ErrorLogger(),
)
embedding.optimize(n_iter=250, exaggeration=12, momentum=0.5, inplace=True)
embedding.optimize(n_iter=750, momentum=0.8, inplace=True)
References

[1] Maaten, Laurens van der, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of Machine Learning Research 9.Nov (2008): 2579-2605.
[2] Van Der Maaten, Laurens. "Accelerating t-SNE using tree-based algorithms." The Journal of Machine Learning Research 15.1 (2014): 3221-3245.
[3] Linderman, George C., et al. "Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding." arXiv preprint arXiv:1712.09005 (2017).
[4] Kobak, Dmitry, and Philipp Berens. "The art of using t-SNE for single-cell transcriptomics." bioRxiv (2018): 453449.