Exemplar-driven Learning for Data Clustering

Mani, Priya

dc.contributor.advisor	Domeniconi, Carlotta
dc.contributor.author	Mani, Priya
dc.creator	Mani, Priya
dc.date.accessioned	2022-08-03T20:18:33Z
dc.date.available	2022-08-03T20:18:33Z
dc.date.issued	2021
dc.description.abstract	Clustering is a fundamental machine learning problem that seeks to discover groups of data based on a notion of similarity. Clustering is ill-posed, as the notion of an optimal clustering depends on the application at hand. The clustering solution obtained depends on the characteristics of the data as well as on the design choices of the algorithm. Clustering high-dimensional data poses additional challenges due to unreliable estimates of distances and density. High-dimensional data often reside in subspaces, which entails discovering the dimensions relevant to each cluster. Another challenge in high-dimensional spaces is the emergence of the hubness phenomenon, whereby a few data points, known as hubs, appear frequently as nearest neighbors. While certain hubs exhibit useful clustering properties of the data, others can negatively influence neighborhood computation and clustering results. As such, not all data points are beneficial for clustering, and more accurate and reliable clustering solutions can be obtained by leveraging an informative subset of data. In my dissertation, I propose data-driven and adaptive selection strategies to leverage exemplars and guide the optimization of unsupervised learning algorithms for data clustering. I introduce a new geometric characterization of hubs to guide the discovery of subspace clusters, and develop a hubness-driven algorithm to find subspace clusters in high-dimensional data. Furthermore, I leverage selective neighborhoods to approximate the data manifold and to regularize non-negative matrix factorization for data clustering. As a result, I design an unsupervised manifold-regularized matrix factorization algorithm that jointly learns a sparse set of representatives and their neighbor affinities, along with the data factorization. I further propose a fast and effective approximation of my approach by relaxing the selectivity constraints on the data.
Finally, data exemplars can be leveraged to learn unsupervised deep representations. To this end, I use hubs to regularize a variational auto-encoder and to learn a discriminative embedding for unsupervised downstream tasks. I introduce an unsupervised and data-driven regularization of the latent space using a mixture of hub-based priors and a hub-based contrastive loss. I evaluate the quality of data clustering and generative modeling within the learned latent embedding, and achieve competitive performance with respect to state-of-the-art methods on benchmark data.
dc.format.extent	153 pages
dc.identifier.uri	https://hdl.handle.net/1920/12943
dc.language.iso	en
dc.rights	Copyright 2021 Priya Mani
dc.subject	Computer science
dc.subject	Exemplars
dc.subject	Hubness phenomenon
dc.subject	Matrix factorization
dc.subject	Selective Regularization
dc.subject	Subspace Clustering
dc.subject	Variational auto-encoders
dc.title	Exemplar-driven Learning for Data Clustering
dc.type	Dissertation
thesis.degree.discipline	Computer Science
thesis.degree.grantor	George Mason University
thesis.degree.level	Ph.D.
thesis.degree.name	Ph.D. in Computer Science
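
The abstract describes hubness as a few points appearing frequently as nearest neighbors. A common way to quantify this is the k-occurrence count N_k(x), the number of times x appears in the k-nearest-neighbor lists of the other points, together with the skewness of the N_k distribution. The sketch below is a minimal illustration under stated assumptions: Euclidean distance, ties broken by lower index, and hubs taken as the points with the largest N_k. The function names (k_occurrence, find_hubs, hubness_skewness) are hypothetical, and this is not the dissertation's hub characterization or clustering algorithm.

```python
import math


def k_occurrence(points, k):
    """Return N_k for every point: how many times each point appears among
    the k nearest neighbors of the other points (assumed: Euclidean
    distance, ties broken by lower index)."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Sort all other points by (distance to point i, index).
        neighbors = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        # Each point "votes" for its k nearest neighbors.
        for _, j in neighbors[:k]:
            counts[j] += 1
    return counts


def find_hubs(points, k, num_hubs):
    """Hypothetical helper: indices of the num_hubs points with largest N_k."""
    nk = k_occurrence(points, k)
    return sorted(range(len(points)), key=lambda i: (-nk[i], i))[:num_hubs]


def hubness_skewness(nk):
    """Skewness of the N_k distribution; large positive values signal hubness."""
    n = len(nk)
    mean = sum(nk) / n  # always equals k, since each point casts k votes
    var = sum((x - mean) ** 2 for x in nk) / n
    if var == 0:
        return 0.0
    third = sum((x - mean) ** 3 for x in nk) / n
    return third / var ** 1.5


if __name__ == "__main__":
    # A center point surrounded by four neighbors: the center is a hub.
    star = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    nk = k_occurrence(star, k=1)
    print(nk)                         # [4, 1, 0, 0, 0]
    print(find_hubs(star, 1, 1))      # [0]
    print(hubness_skewness(nk) > 0)   # True
```

In the toy example, every outer point names the center as its nearest neighbor, so the center collects N_1 = 4 while most other points have N_1 = 0, producing the right-skewed N_k distribution that characterizes hubness.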

Files

Original bundle

Name: Mani_gmu_0883E_12731.pdf
Size: 10.59 MB
Format: Adobe Portable Document Format

Collections

College of Engineering and Computing