Comparison of Performance of Big Data Applications in Different Environments

Motwani, Devang

dc.contributor.advisor	Homayoun, Houman
dc.contributor.author	Motwani, Devang
dc.creator	Motwani, Devang
dc.date	2018-06-22
dc.date.accessioned	2020-01-29T18:16:15Z
dc.date.available	2020-01-29T18:16:15Z
dc.description.abstract	Technology companies routinely use virtualization in software product development and deployment. Containers have generated considerable hype thanks to the success of Docker, a tool for working with containers that made software development and deployment easier as well as more modular. Before containers, virtual machines were almost the only form of virtualization in use that was considered stable. With the emergence of these technologies, a lot of research has been published comparing the performance of host, virtual machine, and container environments. However, little work has been done on Big Data applications such as microbenchmarks, graph applications, search applications, and machine learning running in these environments over frameworks such as Apache Hadoop, Spark, and Flink, so it is not well understood how the different components of the computer architecture are affected. This thesis compares the performance of these standard Big Data applications on host, virtual machines, and containers (specifically Docker) by running the applications and collecting hardware counter values such as C0 state, L3/L2 cache hits, and CPU power, which are used to build a graphical representation of the comparison. The comparison shows that, with some trade-offs, graph, search, and machine learning applications execute faster and more efficiently on Apache Flink than on Apache Spark or Hadoop MapReduce. Microbenchmark applications, however, run faster and more efficiently on Apache Spark than on Hadoop MapReduce or Apache Flink.
Previous research has stated that host and container environments have almost the same performance and that running applications on virtual machines incurs some overhead. The results of this work, however, indicate that no general conclusion can be drawn about performance across environments: different applications running on different frameworks perform differently, as discussed in detail in this work.
dc.identifier.uri	https://hdl.handle.net/1920/11655
dc.language.iso	en
dc.subject	Virtualization performance
dc.subject	Virtual machines and docker
dc.subject	Intel performance counter monitor
dc.subject	BigData Applications Performance
dc.subject	Container performance
dc.subject	Apache Hadoop, Spark and Flink
dc.title	Comparison of Performance of Big Data Applications in Different Environments
dc.type	Thesis
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	George Mason University
thesis.degree.level	Master's
thesis.degree.name	Master of Science in Computer Engineering

Files

Original bundle

Name: Motwani_thesis_2018.pdf
Size: 10.08 MB
Format: Adobe Portable Document Format

Collections

College of Engineering and Computing