Publication: Sparse K-Means Compression for Federated Machine Learning and Linear Regression Using Sketched and Quantized Predictors
| dc.contributor.advisor | Kepplinger, David | |
| dc.contributor.author | Hill, Daniel | |
| dc.date.accessioned | 2026-06-02T20:10:21Z | |
| dc.date.issued | 2025-05 | |
| dc.description.abstract | The Information Age has led to the generation of vast and unquantifiable amounts of data, but technology has struggled to keep pace with the growing demand for efficient storage and transmission. Compression algorithms provide a means to reduce storage and transmission costs while preserving essential information for learning and analysis. This dissertation makes two contributions in this area: a novel compression scheme for federated learning and a statistical analysis framework regarding the use of compressed data in linear regression. Regarding the first contribution, we propose the Sparse $k$-Means (SparK) algorithm specifically designed for Federated Learning applications. SparK compresses model parameter updates between clients and a server by combining sparsification with $k$-means clustering. Using the desired inverse compression rate as its sole hyperparameter, SparK optimizes the degree of sparsification and the number of clusters in $k$-means for each model layer to achieve the desired compression with minimal distortion. Experimental results demonstrate that SparK performs comparably or better than similar sparsification and clustering methods on a standard test bed across various compression levels. Regarding the second contribution, we examine dithered 1-bit compression of predictors and response variables in the context of linear regression. We propose an M-estimator of the associated regression coefficients and establish its asymptotic Normality and asymptotic mean squared error (MSE). This is complemented by a non-asymptotic analysis of the MSE for three compressors: 1-bit stochastic quantization, Gaussian sketching, and their combination. High-probability upper bounds are derived for each compressor under both fixed and random design assumptions. The relative efficiency in comparison to the ordinary least squares estimator with access to uncompressed data is studied as well. | |
| dc.format.medium | doctoral dissertations | |
| dc.identifier.uri | https://hdl.handle.net/1920/15115 | |
| dc.identifier.uri | https://doi.org/10.13021/MARS/15350 | |
| dc.language.iso | en | |
| dc.rights | Copyright 2025 Daniel Hill | |
| dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0 | |
| dc.subject | Compressed Regression | |
| dc.subject | Federated Learning | |
| dc.subject | Gradient Compression | |
| dc.subject | Quantization | |
| dc.subject | Sketching | |
| dc.subject | Sparsification | |
| dc.title | Sparse K-Means Compression for Federated Machine Learning and Linear Regression Using Sketched and Quantized Predictors | |
| dc.type | Dissertation | |
| dspace.entity.type | Publication | |
| thesis.degree.discipline | Statistics | |
| thesis.degree.grantor | George Mason University | |
| thesis.degree.level | Doctoral | |
| thesis.degree.name | PhD in Statistics |
