Towards Large-Scale and High-Performance Deep Learning Computing






Deep Learning (DL) has achieved performance superior to humans in complex cognitive tasks such as vision, speech, language, medicine, and games. Driven by ever-increasing market demand, DL applications and the underlying computing hardware have demonstrated strong scaling trends along two dimensions: Model Scaling and Computing Scaling (e.g., increased computing parallelism, memory, and storage to serve larger models). On the one hand, deep neural network (DNN) models are growing ever larger, with higher structural complexity, larger parameter sizes, and heavier computational workloads in pursuit of better accuracy. For example, VGG16, one of the largest models as of 2017, has only 138M parameters, while the largest model of 2021, MT-NLG, reached 530B parameters, a nearly 4,000× increase. On the other hand, high-performance computing hardware shows equally strong scaling and provides the core support for both DL model training and inference. For example, NVIDIA GPUs have maintained a roughly 2× increase in transistor count with each generation, yielding exponential gains in throughput and memory bandwidth over the past decade. This double scaling trend greatly complicates high-performance deep learning computing systems, spanning model design, kernel compilation, and runtime scheduling.

In this dissertation, we introduce several of our works on high-performance deep learning computing from a full-stack optimization point of view. These include not only algorithm-level compression and acceleration (Antidote and DCCNN), but also hardware-software co-design perspectives such as GPU-aware DNN design (TA-DNN) and multi-tenant DL runtime scheduling (MT-Graph). Finally, we summarize our prior works and share our vision and insights toward future large-scale deep learning systems.
By presenting these works and sharing our understanding, we hope this dissertation can shed some light on future research in high-performance deep learning computing.