by David Lion and Adrian Chiu, University of Toronto; Hailong Sun, Beihang University; Xin Zhuang, University of Toronto; Nikola Grcevski, Vena Solutions; Ding Yuan, University of Toronto.
Published in and presented at OSDI '16.
Many widely used, latency-sensitive, data-parallel distributed systems, such as HDFS, Hive, and Spark, choose to use the Java Virtual Machine (JVM), despite debate on the overhead of doing so. This paper analyzes the extent and causes of the JVM performance overhead in these systems. Surprisingly, we find that the warm-up overhead, i.e., class loading and interpretation of bytecode, is frequently the bottleneck. For example, even an I/O-intensive, 1GB read on HDFS spends 33% of its execution time in JVM warm-up, and Spark queries spend an average of 21 seconds in warm-up.
The findings on JVM warm-up overhead reveal a contradiction between the principle of parallelization, i.e., speeding up long-running jobs by parallelizing them into short tasks, and amortizing JVM warm-up overhead through long tasks. We solve this problem by designing HotTub, a new JVM that amortizes the warm-up overhead over the lifetime of a cluster node instead of over a single job by reusing a pool of already-warm JVMs across multiple applications. The speed-up is significant. For example, using HotTub results in up to 1.8X speed-ups for Spark queries, despite not adhering to the JVM specification in edge cases.
By eliminating JVM warm-up overhead, HotTub gains a large speed-up for short-running, latency-sensitive queries. The average time spent in warm-up overhead is 21 seconds for a query on Spark and 13 seconds for a query on Hive.
HotTub reuses existing, already-warm JVMs by maintaining a pool of JVMs that have run before. It does not require any modification to the application and can be enabled by simply adding a flag to the 'java' command. When 'java' is run, it checks whether there is a reusable JVM in the pool. If not, it creates a new JVM and runs the application. If a JVM can be reused, it is first reinitialized, and then the application runs with no warm-up overhead. After the application finishes running, the JVM is reset before being returned to the pool for later reuse.
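The reuse lifecycle described above can be illustrated with a toy, in-process simulation. This is only a sketch of the control flow (check the pool, create or reinitialize, run, reset, return), not HotTub's implementation: the real system pools whole JVM processes, whereas here a plain object stands in for a JVM. The names `WarmPoolSketch`, `Worker`, `Pool`, `reinitialize`, and `reset` are all hypothetical and chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the pool lifecycle. A Worker object stands in for a JVM
// process; every name in this file is hypothetical, not HotTub's API.
class WarmPoolSketch {

    static final class Worker {
        final int id;
        boolean warm = false;   // true once the worker has run an application
        String appState = null; // per-application state that must not leak

        Worker(int id) { this.id = id; }

        // Prepare a pooled worker for a new application (assumed step).
        void reinitialize(String app) { appState = "state of " + app; }

        // Clear application state before the worker goes back to the pool.
        void reset() { appState = null; }
    }

    static final class Pool {
        private final Deque<Worker> idle = new ArrayDeque<>();
        private int created = 0;

        // Runs one application and reports whether the start was cold or warm.
        String run(String app) {
            Worker w = idle.poll();            // is a reusable worker available?
            boolean reused = (w != null);
            if (!reused) {
                w = new Worker(++created);     // none available: create a new one
            }
            w.reinitialize(app);               // fresh state for this application
            // ... the application would execute here ...
            w.warm = true;                     // the worker has now warmed up
            w.reset();                         // reset before returning to pool
            idle.push(w);
            return app + ": worker " + w.id + (reused ? " reused (warm)" : " created (cold)");
        }

        int created() { return created; }
        int idleCount() { return idle.size(); }
    }

    public static void main(String[] args) {
        Pool pool = new Pool();
        System.out.println(pool.run("query-1"));
        System.out.println(pool.run("query-2"));
    }
}
```

In this model the first application always pays the cold start, and later applications pick up the same worker from the pool. The sketch does not attempt to model what resetting a real JVM involves, nor the edge cases in which reuse departs from the JVM specification.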