Exemplar-driven Learning for Data Clustering

dc.contributor.advisor: Domeniconi, Carlotta
dc.contributor.author: Mani, Priya
dc.creator: Mani, Priya
dc.date.accessioned: 2022-08-03T20:18:33Z
dc.date.available: 2022-08-03T20:18:33Z
dc.date.issued: 2021
dc.description.abstract: Clustering is a fundamental machine learning problem which seeks to discover groups of data based on a notion of similarity. Clustering is ill-posed because the notion of an optimal clustering depends on the application at hand. The clustering solution obtained depends on the characteristics of the data as well as on the design choices of algorithms. Clustering high-dimensional data poses additional challenges due to unreliable estimations of distances and density. High-dimensional data often reside in subspaces, which entails discovering the dimensions relevant to each cluster. Another challenge in high-dimensional spaces is the emergence of the hubness phenomenon, whereby a few data points, known as hubs, appear frequently as nearest neighbors. While certain hubs exhibit useful clustering properties of the data, others can negatively influence neighborhood computation and clustering results. As such, not all data points are beneficial for clustering, and more accurate and reliable clustering solutions can be obtained by leveraging an informative subset of data. In my dissertation, I propose data-driven and adaptive selection strategies to leverage exemplars and guide the optimization of unsupervised learning algorithms for data clustering. I introduce a new geometric characterization of hubs to guide the discovery of subspace clusters, and introduce a hubness-driven algorithm to find subspace clusters in high-dimensional data. Furthermore, I leverage selective neighborhoods to approximate the data manifold and to regularize non-negative matrix factorization for data clustering. As a result, I design an unsupervised manifold-regularized matrix factorization algorithm which jointly learns a sparse set of representatives and their neighbor affinities, along with the data factorization. I further propose a fast and effective approximation of my approach by relaxing the selectivity constraints on the data.
Finally, data exemplars can be leveraged to learn unsupervised deep representations. To this end, I use hubs to regularize a variational auto-encoder and to learn a discriminative embedding for unsupervised downstream tasks. I introduce an unsupervised and data-driven regularization of the latent space using a mixture of hub-based priors and a hub-based contrastive loss. I evaluate the quality of data clustering and generative modeling within the learned latent embedding, and achieve competitive performance with respect to state-of-the-art methods on benchmark data.
dc.format.extent: 153 pages
dc.identifier.uri: https://hdl.handle.net/1920/12943
dc.language.iso: en
dc.rights: Copyright 2021 Priya Mani
dc.subject: Computer science
dc.subject: Exemplars
dc.subject: Hubness phenomenon
dc.subject: Matrix factorization
dc.subject: Selective Regularization
dc.subject: Subspace Clustering
dc.subject: Variational auto-encoders
dc.title: Exemplar-driven Learning for Data Clustering
dc.type: Dissertation
thesis.degree.discipline: Computer Science
thesis.degree.grantor: George Mason University
thesis.degree.level: Ph.D.
thesis.degree.name: Ph.D. in Computer Science

Files

Original bundle
Name: Mani_gmu_0883E_12731.pdf
Size: 10.59 MB
Format: Adobe Portable Document Format