Publication:
Sparse K-Means Compression for Federated Machine Learning and Linear Regression Using Sketched and Quantized Predictors

dc.contributor.advisorKepplinger, David
dc.contributor.authorHill, Daniel
dc.date.accessioned2026-06-02T20:10:21Z
dc.date.issued2025-05
dc.description.abstractThe Information Age has led to the generation of vast and unquantifiable amounts of data, but technology has struggled to keep pace with the growing demand for efficient storage and transmission. Compression algorithms provide a means to reduce storage and transmission costs while preserving essential information for learning and analysis. This dissertation makes two contributions in this area: a novel compression scheme for federated learning and a statistical analysis framework regarding the use of compressed data in linear regression. Regarding the first contribution, we propose the Sparse $k$-Means (SparK) algorithm specifically designed for Federated Learning applications. SparK compresses model parameter updates between clients and a server by combining sparsification with $k$-means clustering. Using the desired inverse compression rate as its sole hyperparameter, SparK optimizes the degree of sparsification and the number of clusters in $k$-means for each model layer to achieve the desired compression with minimal distortion. Experimental results demonstrate that SparK performs comparably or better than similar sparsification and clustering methods on a standard test bed across various compression levels. Regarding the second contribution, we examine dithered 1-bit compression of predictors and response variables in the context of linear regression. We propose an M-estimator of the associated regression coefficients and establish its asymptotic Normality and asymptotic mean squared error (MSE). This is complemented by a non-asymptotic analysis of the MSE for three compressors: 1-bit stochastic quantization, Gaussian sketching, and their combination. High-probability upper bounds are derived for each compressor under both fixed and random design assumptions. The relative efficiency in comparison to the ordinary least squares estimator with access to uncompressed data is studied as well.
dc.format.mediumdoctoral dissertations
dc.identifier.urihttps://hdl.handle.net/1920/15115
dc.identifier.urihttps://doi.org/10.13021/MARS/15350
dc.language.isoen
dc.rightsCopyright 2025 Daniel Hill
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0
dc.subjectCompressed Regression
dc.subjectFederated Learning
dc.subjectGradient Compression
dc.subjectQuantization
dc.subjectSketching
dc.subjectSparsification
dc.titleSparse K-Means Compression for Federated Machine Learning and Linear Regression Using Sketched and Quantized Predictors
dc.typeDissertation
dspace.entity.typePublication
thesis.degree.disciplineStatistics
thesis.degree.grantorGeorge Mason University
thesis.degree.levelDoctoral
thesis.degree.namePhD in Statistics

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Hill_gmu_0883E_13668.pdf
Size:
2.27 MB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
2.56 KB
Format:
Item-specific license agreed upon to submission
Description: