Title: Towards a Common Dimensionality Reduction Approach; Unifying PCA, tSNE, and UMAP through a Cohesive Framework
Authors: Berry, Tyrus; Draganov, Andrew
Date: 2021-10-14
URI: https://hdl.handle.net/1920/12149
Type: Thesis
Language: en
Subjects: Dimensionality Reduction; Principal Component Analysis; TSNE; UMAP; Graph Laplacian
Abstract: Dimensionality reduction is a widely studied field that is used to visualize data, cluster samples, and extract insights from high-dimensional distributions. Classical approaches such as PCA, Isomap, and Laplacian eigenmaps rely on clear optimization strategies, while more modern approaches such as tSNE and UMAP define gradient descent search spaces through disparities between the high- and low-dimensional datasets. In this work, we observe that all of these approaches can be interpreted as minimizing the difference between two kernel functions: one for the high-dimensional space and one for the low-dimensional space. In particular, once we abstract the kernel functions, we can develop a common framework for any dimensionality reduction problem. Namely, one needs to identify the high-dimensional distance kernel, the low-dimensional distance kernel, and the method used for minimization. With this in mind, we identify the relevant general framework and then proceed to discuss the ways in which PCA, tSNE, and UMAP all fit into it. For each, we discuss insights that were obtained during the process. Lastly, we highlight next steps and directions for future work.
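
As one hedged reading of the abstract's framework (the notation below is chosen here for illustration and is not taken from the thesis itself), the shared objective can be sketched as minimizing a discrepancy $D$ between a high-dimensional kernel $K_H$ evaluated on the data $x_i$ and a low-dimensional kernel $K_L$ evaluated on the embedding $y_i$:

$$
\min_{Y} \; \sum_{i,j} D\big( K_H(x_i, x_j),\; K_L(y_i, y_j) \big)
$$

Under this reading, PCA, tSNE, and UMAP would differ only in the choice of $K_H$, $K_L$, $D$, and the minimization procedure (e.g., eigendecomposition versus gradient descent), consistent with the abstract's description of identifying the two kernels and the minimization method.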