t-SNE

Note: The Toolbox supports parallel computing to accelerate the computation. Large files can still take a long time to process: a typical 50 MB file (Pavia, 340 x 610 x 103) takes about 10 minutes on an average desktop computer.

Features:

  • Builds a 2D or 3D scatter map from the entire datacube.

  • Enables the user to separate clusters via k-means clustering and visualize their positions on the image.

  • Enables a user to visualize up to 10 clusters.

  • Calculates the area covered by each cluster (in pixels).


Steps:

1.      Open a file.

2.      Select Toolboxes → t-SNE 

Calculations for t-SNE are computationally demanding. To reduce the load, the datacube is first reduced in size by a pre-binning step in both the spectral (bin = 3) and spatial (3 x 3) dimensions; a minimal binning sketch is given after these steps. The preprocessing calculation starts immediately.

3.      Select a method. The available options are:

a.      Barnes-Hut variation of t-SNE, used to speed up the algorithm and cut down on its memory usage. The Barnes-Hut algorithm groups nearby points together to lower the complexity and memory usage of the t-SNE optimization step; it is an approximate optimizer, not an exact one. A nonnegative tuning parameter, Theta, sets the tradeoff between speed and accuracy: larger values of Theta give faster but less accurate optimization results. The algorithm is relatively insensitive to Theta values in the range 0.2 - 0.8.

b.      Exact method.

4.      Select the type of embedding, which determines the dimensionality of the scatter plot. Available options are:

a.      2D (default)

b.      3D

5.      Under Distance Metric, choose the parameters for the calculation (see Additional Information for further explanation):

a.      Distance metric

b.      PCA components

c.      Exaggeration parameter

d.      Perplexity parameter

e.      Standardize

f.       Optimization Control settings (Learn Rate, Theta)

6.      Click the green Run t-SNE button. The t-SNE scatter plot is generated in the upper-right panel.

7.      Select the Number of clusters and the Distance Metric. The clustering results appear in the CLUSTER OUTPUT panel in the lower right.


Click the green Select Class button to see which cluster corresponds to which areas of the datacube.

A pop-up will appear asking which class/cluster the user would like to view.


The area covered by the selected cluster (highlighted in a bright color, with the area given in pixels) appears at the bottom of the image.
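For reference, the pre-binning described in step 2 can be reproduced outside the Toolbox. The following is a minimal MATLAB sketch, not the Toolbox's own code; it assumes a hypothetical datacube variable cube of size rows x columns x bands and uses simple block averaging for both bins.

% Minimal sketch of the pre-binning in step 2: spectral bin = 3, spatial bin = 3 x 3.
% Assumes a datacube "cube" of size rows x cols x bands; variable names are illustrative.
[rows, cols, bands] = size(cube);
r = floor(rows/3)*3;  c = floor(cols/3)*3;  b = floor(bands/3)*3;
cube = double(cube(1:r, 1:c, 1:b));                          % crop to multiples of 3

cube = squeeze(mean(reshape(cube, r, c, 3, b/3), 3));        % average every 3 bands
cube = squeeze(mean(reshape(cube, 3, r/3, c, b/3), 1));      % average every 3 rows
cube = squeeze(mean(reshape(cube, r/3, 3, c/3, b/3), 2));    % average every 3 x 3 block (columns)

X = reshape(cube, (r/3)*(c/3), b/3);                         % N x D matrix fed to t-SNE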


 

Additional Information:

There are two t-SNE algorithms, specified as Exact or Barnes-Hut.

The Exact algorithm optimizes the Kullback-Leibler divergence of distributions between the original space and the embedded space.

The Barnes-Hut algorithm performs an approximate optimization that is faster and uses less memory when the number of data rows is large.

For both algorithms, t-SNE uses squared pairwise distances to calculate the Gaussian kernel in the joint distribution of X.
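For reference, the conditional and joint probabilities defined by this Gaussian kernel (van der Maaten and Hinton, 2008) are, in LaTeX notation,

p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N},

where the bandwidth sigma_i of each point is chosen so that the perplexity of the conditional distribution P_i matches the Perplexity parameter described below.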

Exaggeration: Size of natural clusters in the data, specified as a scalar value of 1 or greater; default 4. A large exaggeration makes t-SNE learn larger joint probabilities of Y and creates relatively more space between clusters in Y. t-SNE uses exaggeration in the first 99 optimization iterations. If the value of the Kullback-Leibler divergence increases in the early stage of the optimization, try reducing the exaggeration.

Number PCA Components: PCA dimension reduction, specified as a nonnegative integer; default 0. Before t-SNE embeds the high-dimensional data, it first reduces the dimensionality of the data to Number PCA Components using PCA. When Number PCA Components is 0, t-SNE does not use PCA.

Perplexity: Effective number of local neighbors of each point, specified as a positive scalar; default 30. Larger Perplexity causes t-SNE to use more points as nearest neighbors. Use a larger value of Perplexity for a large dataset. Typical Perplexity values are from 5 to 50. In the Barnes-Hut algorithm, t-SNE uses min(Perplexity, N-1) as the number of nearest neighbors. The Perplexity value cannot be greater than the number of rows of X.

Standardize: Normalize input data, specified as false (default) or true. When true, t-SNE centers X and scales its columns by dividing them by their standard deviations. Set Standardize to true when features in X are on different scales; because the learning process is based on nearest neighbors, features with large scales can otherwise override the contribution of features with small scales.

Learn Rate: Learning rate for the optimization process, specified as a positive scalar; default 500. Typical values are from 100 through 1000. When Learn Rate is too small, t-SNE can converge to a poor local minimum. When Learn Rate is too large, the optimization can initially have the Kullback-Leibler divergence increase rather than decrease.

Theta: Barnes-Hut tradeoff parameter, specified as a scalar from 0 through 1; default 0.5. Higher values give a faster but less accurate optimization. Applies only when the algorithm is Barnes-Hut.
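The option names above correspond to MATLAB's tsne function (Statistics and Machine Learning Toolbox). The following is a minimal sketch, not the Toolbox's internal code, of calling tsne with the defaults listed above on a binned pixel matrix X (a hypothetical variable name):

% Minimal sketch: embedding the binned pixel spectra X (N x D) with the
% option names and defaults described above.
rng default                               % for reproducibility
Y = tsne(X, ...
    'Algorithm',        'barneshut', ...  % or 'exact'
    'Distance',         'euclidean', ...
    'NumDimensions',    2, ...            % 2D or 3D embedding
    'NumPCAComponents', 0, ...            % 0 = no PCA pre-reduction
    'Perplexity',       30, ...
    'Exaggeration',     4, ...
    'LearnRate',        500, ...
    'Standardize',      false, ...
    'Theta',            0.5);             % Barnes-Hut speed/accuracy tradeoff
scatter(Y(:,1), Y(:,2), 4, '.')           % quick look at the embedding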

 

k-Medoids Clustering: k-Medoids clustering is a partitioning method commonly used in domains that require robustness to outliers, arbitrary distance metrics, or data for which the mean or median has no clear definition. It is similar to k-Means: the goal of both methods is to divide a set of measurements or observations into k subsets or clusters such that each subset minimizes the sum of distances between its measurements and its center. In the k-Means algorithm, the center of a subset is the mean of the measurements in the subset, often called a centroid. In the k-Medoids algorithm, the center of a subset is a member of the subset, called a medoid.

k-Medoids is more robust to noise and outliers than k-Means because it minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances.

The k-Medoids algorithm returns medoids, which are actual data points in the data set. This allows you to use the algorithm in situations where the mean of the data does not exist within the data set, and it is the main difference from k-Means, whose centroids may not be members of the data set. Hence k-Medoids is useful for clustering categorical data, for which a mean is impossible to define or interpret.
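As a minimal sketch of this clustering step (assuming the clusters are computed on the t-SNE embedding; the variable names Y, nRows, and nCols are illustrative, not Toolbox internals), MATLAB's kmedoids can be applied to the embedding and the labels mapped back onto the binned image grid:

% Minimal sketch: k-medoids clustering of the 2D embedding Y (N x 2), with the
% labels mapped back onto the binned image grid.
k   = 5;                                        % number of clusters
idx = kmedoids(Y, k, 'Distance', 'euclidean');  % idx(i) = cluster label of pixel i

classMap = reshape(idx, nRows, nCols);          % nRows*nCols must equal numel(idx)
imagesc(classMap); axis image; colorbar

selected = 3;                                   % cluster chosen via "Select Class"
areaPx   = nnz(idx == selected);                % area of that cluster in pixels
fprintf('Cluster %d covers %d pixels\n', selected, areaPx)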

 

Performance of t-SNE:

Performance depends on the data size and the chosen algorithm. t-SNE can take a good deal of time to process data. If you have N data points in D dimensions that you want to map to Y dimensions, then

  • Exact t-SNE takes on the order of D x N^2 operations.

  • Barnes-Hut t-SNE takes on the order of D x N log(N) x exp(dimension(Y)) operations.

For Pavia, N is well above 1000 and the embedding dimension of Y is 2 or 3, so the Barnes-Hut algorithm can be faster than the exact algorithm.
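As a rough, order-of-magnitude illustration only (constants and per-operation costs are ignored, and the binned sizes are assumptions based on the 3-band spectral and 3 x 3 spatial binning described in the Steps):

% Back-of-the-envelope comparison of the two operation-count estimates above.
N = 23000;                              % ~ (340/3)*(610/3) binned Pavia pixels
D = 34;                                 % ~ 103/3 binned bands
y = 2;                                  % embedding dimension

opsExact     = D * N^2;                 % ~ 1.8e10
opsBarnesHut = D * N * log(N) * exp(y); % ~ 5.8e7
fprintf('exact / Barnes-Hut: ~%.0f x more operations\n', opsExact/opsBarnesHut)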


References:

van der Maaten, Laurens, and Geoffrey Hinton. "Visualizing Data using t-SNE." J. Machine Learning Research 9, 2008, pp. 2579–2605.

van der Maaten, Laurens. "Barnes-Hut-SNE." arXiv:1301.3342 [cs.LG], 2013.

Jacobs, Robert A. "Increased rates of convergence through learning rate adaptation." Neural Networks 1.4, 1988, pp. 295–307.

Wattenberg, Martin, Fernanda Viégas, and Ian Johnson. "How to Use t-SNE Effectively." Distill, 2016.

k-Medoids:

Kaufman, L., and Rousseeuw, P. J. (2009). Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken, New Jersey: John Wiley & Sons, Inc.

Park, H-S, and Jun, C-H. (2009). A simple and fast algorithm for k-Medoids clustering. Expert Systems with Applications. 36, 3336-3341.
