Towards a Common Dimensionality Reduction Approach: Unifying PCA, tSNE, and UMAP through a Cohesive Framework
Authors
Draganov, Andrew
Abstract
Dimensionality reduction is a widely studied field used to visualize data, cluster samples, and extract insights from high-dimensional distributions. Classical approaches such as PCA, Isomap, and Laplacian eigenmaps rely on clearly defined optimization problems that are solved via eigendecomposition, while more modern approaches such as tSNE and UMAP define gradient descent objectives through the disparities between the high- and low-dimensional datasets. In this work, we observe that all of these approaches can be interpreted as minimizing the difference between two kernel functions: one for the high-dimensional space and one for the low-dimensional space. Once the kernel functions are abstracted, a common framework can be developed for any dimensionality reduction problem: one needs only to specify the high-dimensional distance kernel, the low-dimensional distance kernel, and the method used for minimization. With this in mind, we define the general framework and then discuss how PCA, tSNE, and UMAP each fit into it, along with the insights obtained along the way for each method. Lastly, we highlight next steps and directions for future work.
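As a minimal sketch of the "two kernels plus a minimizer" idea, assuming linear (inner-product) kernels in both spaces and a squared Frobenius loss, the snippet below shows that plain gradient descent on the kernel gap recovers the same embedding Gram matrix as the eigendecomposition solution (classical MDS, which coincides with PCA scores up to rotation for centered data). This is one standard way to phrase PCA in kernel-matching terms, not necessarily the exact formulation used in the thesis; tSNE and UMAP would swap in different kernels (for tSNE, Gaussian affinities in the high-dimensional space and Student-t affinities in the low-dimensional space, compared with a KL divergence). The helper names `linear_kernel`, `kernel_gap`, `fit_gradient_descent`, and `fit_eigen`, the learning rate, and the step count are hypothetical choices made for illustration.

```python
import numpy as np


def linear_kernel(Z):
    """Hypothetical helper: inner-product (Gram) kernel K_ij = <z_i, z_j>."""
    return Z @ Z.T


def kernel_gap(X, Y):
    """Squared Frobenius distance between the high- and low-dimensional kernels."""
    return float(np.sum((linear_kernel(X) - linear_kernel(Y)) ** 2))


def fit_gradient_descent(X, dim=2, lr=0.05, steps=2000, seed=0):
    """Minimize ||XX^T - YY^T||_F^2 over Y with fixed-step gradient descent.

    The step size is an assumption tuned for X scaled to unit Frobenius norm.
    """
    rng = np.random.default_rng(seed)
    Y = 0.01 * rng.standard_normal((X.shape[0], dim))  # small random start
    P = linear_kernel(X)                                # high-dimensional kernel
    for _ in range(steps):
        R = Y @ Y.T - P          # residual between low- and high-dim kernels
        Y = Y - lr * 4.0 * R @ Y  # gradient of ||YY^T - P||_F^2 is 4 (YY^T - P) Y
    return Y


def fit_eigen(X, dim=2):
    """Closed-form minimizer: top eigenpairs of the Gram matrix (classical MDS)."""
    vals, vecs = np.linalg.eigh(linear_kernel(X))  # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]              # keep the largest `dim`
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


# Demo data (synthetic): 40 centered points lying near a 2-D plane inside R^5.
rng = np.random.default_rng(0)
latent = rng.standard_normal((40, 2)) * np.array([3.0, 2.0])
X = latent @ rng.standard_normal((2, 5)) + 0.05 * rng.standard_normal((40, 5))
X -= X.mean(axis=0)
X /= np.linalg.norm(X)  # unit Frobenius norm keeps the fixed step size stable

Y_gd = fit_gradient_descent(X, dim=2)
Y_eig = fit_eigen(X, dim=2)
```

The two embeddings can differ by a rotation, since the loss depends only on `Y @ Y.T`; comparing Gram matrices (or kernel gaps) rather than coordinates is what makes them comparable.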
Keywords
Dimensionality Reduction, Principal Component Analysis, tSNE, UMAP, Graph Laplacian