Comparison of Performance of Big Data Applications in Different Environments

dc.contributor.advisor: Homayoun, Houman
dc.contributor.author: Motwani, Devang
dc.creator: Motwani, Devang
dc.description.abstract: Technology companies routinely use virtualization in software product development and deployment. Containers attracted considerable attention following the success of Docker, a tool for working with containers that made software development and deployment easier as well as more modular. Before containers, virtual machines were almost the only form of virtualization in use and the only one considered stable. These emerging technologies have prompted a large body of published research comparing the performance of host, virtual machine, and container environments. However, little work has examined big data applications, such as microbenchmarks, graph applications, search applications, and machine learning, running in these environments over frameworks such as Apache Hadoop, Spark, and Flink, with the aim of understanding how different components of the computer architecture are affected. This thesis compares the performance of these standard big data applications on the host, in virtual machines, and in containers (specifically Docker) by running the applications and collecting hardware counter values such as C0 state, L3/L2 cache hits, and CPU power, which are then used to present the comparison graphically. The comparison shows that, with some trade-offs, graph, search, and machine learning applications execute faster and more efficiently on Apache Flink than on Apache Spark or Hadoop MapReduce. Microbenchmark applications, however, run faster and more efficiently on Apache Spark than on Hadoop MapReduce or Apache Flink.
Previous research has stated that host and container environments have nearly the same performance and that running applications in virtual machines incurs some overhead. The results of this work, however, do not support any general conclusion about performance across environments: different applications running on different frameworks perform differently, as discussed in detail in this work.
dc.subject: Virtualization performance
dc.subject: Virtual machines and Docker
dc.subject: Intel performance counter monitor
dc.subject: BigData Applications Performance
dc.subject: Container performance
dc.subject: Apache Hadoop, Spark and Flink
dc.title: Comparison of Performance of Big Data Applications in Different Environments
dc.type: Thesis (Master of Science in Computer Engineering, George Mason University)

