Mason Archival Repository Service

Wukong: A Fast, Cost-Effective, and Easy-to-Use Serverless DAG Engine

Show simple item record

dc.contributor.advisor Cheng, Yue
dc.contributor.author Carver, Benjamin
dc.creator Carver, Benjamin
dc.date 2021-04-21
dc.date.accessioned 2021-10-04T21:48:46Z
dc.date.available 2021-10-04T21:48:46Z
dc.identifier.uri http://hdl.handle.net/1920/12093
dc.description.abstract Data analytics applications can often be modeled as a directed acyclic graph (DAG), where the nodes are fine-grained tasks and the edges are task dependencies. A DAG scheduler can be used to distribute the tasks to cloud computing resources where they can be executed in parallel to speedup work ow applications. Serverless computing is a cloud computing platform that enables the decomposition of traditionally monolithic, server-based applications into a collection of fine-grained cloud functions. Developers write the function logic while the service provider takes care of provisioning, scaling, and managing the back-end servers or virtual machines (VMs) that the functions run on. Creating a serverlessoriented DAG scheduler poses a major challenge, as executing complex, burst-parallel DAG jobs requires rapid scaling and high task throughput while minimizing data movement across tasks. Despite these challenges, data analytics workloads are well-suited for serverless computing. The auto-scaling property of serverless computing platforms accommodates short tasks and bursty workloads, while the pay-per-use billing model of serverless computing providers keeps the cost of short tasks low. In this thesis, we thoroughly investigate the problem space of DAG scheduling in serverless computing. Our goal is to demonstrate that serverless-oriented, parallel computing frameworks can support fast and efficient, DAG-based, parallel-computation work ows that are easy to deploy and manage. To accomplish this, we identify and evaluate a set of techniques to make DAG schedulers serverless-aware, and we implement these techniques in Wukong, a serverless DAG engine built atop AWS Lambda. Our techniques and optimizations bring multiple benefits, including enhanced data locality, reduced network I/Os, automatic resource elasticity, and improved cost effectiveness. We show that when comparing Wukong to numpywren, a serverless system for linear algebra, Wukong achieves near-ideal scalability, executes parallel computation jobs up to 68:17x faster, reduces network I/O by multiple orders of magnitude, and achieves 92:96% tenant-side cost savings compared to numpywren, a serverless linear algebra library. This thesis contains two modified, published papers along with an additional chapter describing the latest, work-in-progress version of Wukong. In Chapter 3, we describe and evaluate the initial prototype of Wukong. This first version of Wukong delivered competitive performance compared to a comparable, serverful Dask cluster. In Chapter 4, we present the second version of Wukong. We present a series of optimizations that greatly improve the cost-effectiveness and performance of Wukong. Finally, we present the current work-in-progress version of Wukong in Chapter 5. The latest version of Wukong is completely serverless, requiring no user-deployed or user-managed servers. As a result, this version of Wukong is extremely simple to use while still delivering competitive performance and cost-effectiveness. This thesis establishes that serverless computing is an appropriate setting for creating a serverless DAG engine. We show that, by designing a DAG engine that takes into account both the benefits and the challenges of serverless computing platforms, it is possible to create a fast, cost-effective, and easy-to-use serverless parallel computing framework. en_US
dc.language.iso en en_US
dc.subject serverless computing en_US
dc.subject data analytics en_US
dc.subject cloud computing en_US
dc.subject DAG scheduling en_US
dc.title Wukong: A Fast, Cost-Effective, and Easy-to-Use Serverless DAG Engine en_US
dc.type Thesis en_US
thesis.degree.name Master of Science in Computer Science en_US
thesis.degree.level Master's en_US
thesis.degree.discipline Computer Science en_US
thesis.degree.grantor George Mason University en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search MARS


Browse

My Account

Statistics