Learning on Large-Scale Data with Security and Privacy





Recent advancements in the machine learning domain have been enabled by the ability to analyze massive volumes of data, and to extract and learn patterns within that data. However, large-scale data collection raises privacy concerns, as it can expose individuals' sensitive data to actors with malicious intent. This lack of privacy can lead to data breaches and, consequently, can compromise the successful development of machine learning techniques. Secure Computation is a branch of modern cryptography that introduces promising solutions for processing data in a privacy-preserving manner. It enables computing any functionality on data while the data remains "encrypted". This field has been the topic of extensive research in recent years and has made remarkable progress. However, most results have remained impractical for real applications, and deployment has remained limited due to efficiency and scalability constraints. The goal of this dissertation is to present novel protocol designs and development techniques that overcome these efficiency and scalability limitations. We demonstrate how to construct secure and privacy-preserving machine learning schemes that are practical for real-world applications while dealing with large-scale data and guaranteeing security against different types of adversaries. In the first part of this dissertation, we design and develop privacy-preserving machine learning frameworks using secure computation techniques and explore the trade-off between security and efficiency in these frameworks. To improve efficiency, we relax the security notion by allowing the adversary to learn a small amount of information during the computation. We then use Differential Privacy mechanisms to provide a formal bound on the amount of leakage, and prove that what the adversary learns is differentially private. We also leverage Parallel Computation techniques to improve the performance and running time of these novel algorithms.
These frameworks follow a centralized computation architecture in which users send their private data to untrusted computation servers, which then perform computations on it. In the second part, we design and develop secure and privacy-preserving machine learning algorithms in the distributed setting known as Federated Learning. In federated learning, users do not share their sensitive data with the computation servers. Instead, each user trains a local model on their private data and sends only the model parameters to the computation servers, which aggregate those local parameters and construct a global model over all participants' data. Our secure and privacy-preserving federated learning protocols are designed to have low communication cost and to be robust to users dropping out of the protocol at any point. We leverage secure computation and differential privacy techniques to preserve the privacy of users' data, as well as the trained model's parameters. All of the secure and privacy-preserving frameworks presented in this dissertation are designed to support two adversarial models: passive and active adversaries.
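
To make the aggregation step described above concrete, the following is a minimal sketch of one well-known way a server can learn only the sum of users' local parameters: pairwise additive masking. It is an illustration under stated assumptions, not the protocol developed in this dissertation. The names `MODULUS`, `pairwise_masks`, `mask_update`, and `aggregate` are hypothetical, the updates are assumed to be already quantized to small non-negative integers, and the masks are drawn from a single seeded generator purely for demonstration. A real protocol would derive each pairwise mask from a key agreed between the two users, and would also need dropout recovery and differential privacy noise, all of which are omitted here.

```python
import random

MODULUS = 2**32  # hypothetical ring size; all masked arithmetic is done mod this value


def pairwise_masks(num_users, dim, seed=0):
    """Build one mask vector per user such that all masks sum to zero mod MODULUS.

    Assumption: a shared seeded RNG stands in for pairwise key agreement.
    """
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_users)]
    for i in range(num_users):
        for j in range(i + 1, num_users):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                # User i adds the shared value and user j subtracts it,
                # so the pair's contribution cancels in the global sum.
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def mask_update(update, mask):
    """Hide a user's integer-encoded local update behind its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Server-side step: sum the masked vectors; the masks cancel out."""
    dim = len(masked_updates[0])
    return [sum(vec[k] for vec in masked_updates) % MODULUS for k in range(dim)]


# Toy local updates from three users (integer-encoded model parameters).
local_updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(local_updates), dim=3)
masked = [mask_update(u, m) for u, m in zip(local_updates, masks)]
total = aggregate(masked)
print(total)  # [111, 222, 333]
```

In this sketch the server only ever handles the masked vectors, yet the aggregate equals the true element-wise sum because every pairwise mask is added once and subtracted once. If a user dropped out after the masks were fixed, the remaining masks would no longer cancel, which is why robustness to dropouts requires additional protocol machinery.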