Java Virtual Threads: An opportunity to simplify high-load programming

2024-04-17

Java 21 was released last September. It is an LTS (long-term support) release, so it will stay with us for a while. Among the features that reached General Availability in Java 21 are virtual threads, which had been in preview since Java 19. In certain scenarios, this feature can significantly improve the performance of a high-load application and make developers’ lives easier. Today, we’re going to take a closer look at virtual threads.

Let’s imagine a simple inbound request that propagates through a server application: the application makes a database request, processes the received data, makes a REST call, and finally returns a response.

With the classic Thread Per Task approach, the application serves each request on a dedicated platform thread and makes synchronous, blocking calls to third-party APIs. Every inbound request is assigned a platform thread, which blocks whenever it has to wait for a response from a third party. This is how Spring Web MVC works, for example. We like this approach because it is simple to develop, debug, and support.
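To make the Thread Per Task model concrete, here is a minimal sketch in plain Java rather than Spring; it is our illustration, not the code of the service discussed below. The methods `queryDatabase` and `callRestApi` are hypothetical stand-ins that simulate blocking calls with `Thread.sleep`, and a fixed pool of platform threads plays the role of a servlet container’s worker pool.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PlatformThreadPerTask {

    // Hypothetical stand-in for a database query; sleeping simulates waiting for the database.
    static String queryDatabase(int requestId) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(50));
        return "row-" + requestId;
    }

    // Hypothetical stand-in for a REST call; sleeping simulates waiting for the remote API.
    static String callRestApi(String payload) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(50));
        return "response-for-" + payload;
    }

    // The whole request runs top to bottom on one thread, which blocks during each call.
    static String handleRequest(int requestId) throws InterruptedException {
        String row = queryDatabase(requestId);  // thread blocked
        String processed = row.toUpperCase();   // short CPU-bound step
        return callRestApi(processed);          // thread blocked again
    }

    public static void main(String[] args) throws Exception {
        // 8 platform threads: at most 8 requests are handled at once; the rest wait in the pool's queue.
        try (ExecutorService pool = Executors.newFixedThreadPool(8)) {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < 16; i++) {
                int requestId = i;
                results.add(pool.submit(() -> handleRequest(requestId)));
            }
            for (Future<String> result : results) {
                System.out.println(result.get());
            }
        }
    }
}
```

With 16 requests, 8 threads and about 100 ms of simulated blocking per request, the batch takes roughly two rounds, and the threads spend nearly all of that time waiting rather than computing.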

The problem with this approach is that it doesn’t scale well under high load. As requests keep arriving, the application allocates more and more threads, each of which spends a significant portion of its time blocked while still occupying memory. Eventually, the application reaches a point where it can no longer create new threads, even though CPU capacity is still available. On top of that, blocking and unblocking large numbers of threads requires frequent interaction with the OS kernel and causes context switching on the CPU, which adds further overhead.

With virtual threads, the JVM manages a relatively small pool of platform threads and provides a layer of abstraction on top of that pool that exposes the familiar Thread API. Creating a virtual thread is about as cheap as creating an ordinary Java object (POJO): it happens entirely within the JVM and doesn’t require any system calls. An application can create as many virtual threads as its memory configuration allows.
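As a minimal illustration of how cheap virtual threads are to create, the sketch below starts a large number of them with the Java 21 `Thread.ofVirtual()` builder and waits for all of them to finish. The class and method names are our own, and the count in `main` is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CreateVirtualThreads {

    // Starts `count` virtual threads that each increment a shared counter, then waits for all of them.
    static int startAndJoin(int count) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // Thread.ofVirtual() returns a builder for virtual threads; start() creates and schedules one.
            threads.add(Thread.ofVirtual().start(completed::incrementAndGet));
        }
        for (Thread thread : threads) {
            thread.join();
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(startAndJoin(100_000)); // prints 100000

        // Thread.startVirtualThread is a shorthand for creating and starting a single virtual thread.
        Thread single = Thread.startVirtualThread(
                () -> System.out.println("virtual? " + Thread.currentThread().isVirtual()));
        single.join();
    }
}
```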

Whenever a virtual thread needs to run, the JVM mounts it onto one of the available platform threads from the pool (called carrier threads). If the virtual thread hits a blocking I/O call or any other API that causes it to wait (e.g. attempting to acquire a semaphore), the JVM unmounts it from the platform thread and mounts another virtual thread that is ready to run. All of this happens with minimal interaction with the OS, which makes it fast. When a virtual thread finishes and is no longer needed, it is simply garbage-collected like any other Java object.
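The following sketch, assuming Java 21, shows the effect of unmounting: thousands of virtual threads that each block for 200 ms finish in roughly the time of a single delay, even though the JVM runs them on only a handful of carrier threads. `Thread.sleep` stands in for blocking I/O. One caveat documented for Java 21: a virtual thread that blocks inside a `synchronized` block stays pinned to its carrier and is not unmounted.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

class BlockingOnVirtualThreads {

    // Runs `tasks` blocking tasks, one virtual thread each, and returns the elapsed wall-clock time.
    static Duration runBlockingTasks(int tasks, Duration blockFor) {
        Instant start = Instant.now();
        // Closing the executor (end of try-with-resources) waits for every submitted task to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, tasks).forEach(i -> executor.submit(() -> {
                // While sleeping, the virtual thread is unmounted and its carrier runs other virtual threads.
                Thread.sleep(blockFor);
                return i;
            }));
        }
        return Duration.between(start, Instant.now());
    }

    public static void main(String[] args) {
        Duration elapsed = runBlockingTasks(10_000, Duration.ofMillis(200));
        // Run one after another, this would take 10,000 x 200 ms = 2,000 s; here the delays overlap.
        System.out.println("10,000 blocking tasks took " + elapsed.toMillis() + " ms");
    }
}
```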

To demonstrate the difference in performance between applications using platform and virtual threads, we can use a simple test setup.

We implemented the test environment on AWS EC2 instances. Fine-grained control over the instance specifications lets us eliminate insignificant factors that could bias the results.

The test environment includes a Thread Service we built: a simple Spring Boot 3 application that uses Web MVC and lets us configure the thread type. Its processing is trivial: it forwards each incoming request to a third party using blocking I/O and then forwards the response back. As a result, processing threads spend most of their time blocked, waiting for the third party to respond.
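The Spring Boot service itself isn’t shown here, but the idea of a configurable thread type can be sketched with the JDK alone. This is a stand-in under several assumptions: it uses the JDK’s built-in `com.sun.net.httpserver.HttpServer` instead of Spring Web MVC, `java.net.http.HttpClient` for the blocking downstream call, and a hypothetical `threadType` switch. The platform pool size of 200 is our choice and mirrors embedded Tomcat’s default maximum number of worker threads. In a Spring Boot 3.2+ application, the equivalent switch is typically the `spring.threads.virtual.enabled` property.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadService {

    // Hypothetical switch: "platform" = fixed pool of OS threads, "virtual" = a new virtual thread per request.
    static ExecutorService executorFor(String threadType) {
        return switch (threadType) {
            case "platform" -> Executors.newFixedThreadPool(200);
            case "virtual" -> Executors.newVirtualThreadPerTaskExecutor();
            default -> throw new IllegalArgumentException("Unknown thread type: " + threadType);
        };
    }

    // Starts a server that forwards each request's query string to `downstream` and relays the reply.
    static HttpServer start(int port, String threadType, String downstream) throws IOException {
        HttpClient client = HttpClient.newHttpClient();
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            try {
                String query = exchange.getRequestURI().getRawQuery();
                URI target = URI.create(query == null ? downstream : downstream + "?" + query);
                // Blocking call: the handling thread waits here until the downstream service answers.
                HttpResponse<String> reply = client.send(
                        HttpRequest.newBuilder(target).GET().build(),
                        HttpResponse.BodyHandlers.ofString());
                respond(exchange, reply.statusCode(), reply.body());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                respond(exchange, 503, "interrupted");
            }
        });
        server.setExecutor(executorFor(threadType));
        server.start();
        return server;
    }

    // Writes a UTF-8 text body with the given status code and closes the exchange.
    static void respond(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
        exchange.close();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical downstream address; point it at any service that answers GET requests.
        start(8080, args.length > 0 ? args[0] : "virtual", "http://localhost:8081/delay");
    }
}
```

Only the executor changes between the two modes; the request-handling code stays in the same blocking, synchronous style.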

We also built a Delay Service that emulates a third-party API: it receives requests from the Thread Service and responds after a delay specified in a request parameter. It is implemented as a scalable Spring Boot 3 application using WebFlux, so it handles the traffic efficiently without becoming a bottleneck and does not distort the benchmark results.
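The Delay Service’s key property is that waiting does not occupy a thread. In WebFlux this is typically expressed with an operator such as `Mono.delayElement`; the sketch below shows the same idea using only the JDK’s `CompletableFuture.delayedExecutor`. That substitution and the method name `respondAfter` are our assumptions, not the authors’ code.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

class DelayService {

    // Completes with `payload` after `delayMs` milliseconds. No thread sleeps while waiting:
    // a timer schedules the completion, which then runs on the default async executor
    // (normally the common ForkJoinPool).
    static CompletableFuture<String> respondAfter(String payload, long delayMs) {
        return CompletableFuture.supplyAsync(
                () -> payload,
                CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS));
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // 10,000 pending "responses", each delayed by 200 ms, without 10,000 blocked threads.
        List<CompletableFuture<String>> pending = IntStream.range(0, 10_000)
                .mapToObj(i -> respondAfter("response-" + i, 200))
                .toList();
        CompletableFuture.allOf(pending.toArray(CompletableFuture[]::new)).join();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("All responses ready after " + elapsedMs + " ms");
    }
}
```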

The load is generated by a Gatling test suite with a simple load scenario. The suite sends requests to the Thread Service, which forwards them to the Delay Service, receives the responses after a 200-millisecond delay, and returns them to the test suite. As we gradually increase the request rate, we track the response time of the Thread Service and expect signs of stress to appear at some point.

The following chart shows the response time percentiles reported by the Gatling test suite for the application using platform threads.

Ideally, the Thread Service under test should return each response after roughly 200 milliseconds. That is the case at the beginning of the test, when the request rate is low. At 1100 QPS, however, the application starts showing signs of strain: the response time increases significantly. At 1200 QPS the application crashes, having exhausted the resources of the underlying system. In fact, the host system itself becomes unresponsive.
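A back-of-the-envelope way to relate these request rates to thread counts is Little’s law: the average number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system. Under Thread Per Task, every in-flight request holds a platform thread, so the estimate below (our arithmetic, not a measurement from the test) approximates how many threads are blocked at once. It also shows why degradation compounds: as response times grow under strain, the number of simultaneously blocked threads grows proportionally.

```java
class LittlesLaw {

    // Little's law: requests in flight = arrival rate (requests/s) x time in system (s).
    // Integer arithmetic in milliseconds avoids floating-point rounding.
    static long requestsInFlight(long requestsPerSecond, long latencyMillis) {
        return requestsPerSecond * latencyMillis / 1000;
    }

    public static void main(String[] args) {
        System.out.println(requestsInFlight(1100, 200));  // 220 threads blocked at the 200 ms target
        System.out.println(requestsInFlight(1200, 200));  // 240
        System.out.println(requestsInFlight(1200, 1000)); // 1200, for a hypothetical 1 s degraded latency
    }
}
```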

Next, we ran the same setup with the application under test configured to use virtual threads.

We can see several spikes in the higher percentiles that can be attributed to garbage collection pauses but, overall, the application behaves normally, with only a slight increase in the higher percentiles towards the end of the test, when the request rate is high.

To summarize:

Virtual threads are not a silver bullet. They are effective when serving requests that call blocking APIs. If processing a request is mostly computational, virtual threads won’t help.
Virtual threads are not reusable. Once finished, they are garbage-collected, which may put additional load on the garbage collector.
In certain scenarios, we can use virtual threads to easily improve the performance of existing applications written mostly in a synchronous style.
Virtual threads let us write code in a synchronous style while the JVM handles the asynchrony. This simplifies the development of high-load applications that stay maintainable and supportable because they keep the Thread Per Task model and blocking APIs.
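Because virtual threads are cheap and not meant to be reused, JEP 444 advises against pooling them; when an application still needs to cap concurrency, for example to protect a downstream service, it recommends a `Semaphore` instead. The sketch below assumes a hypothetical `callDownstream` method that should see at most 10 concurrent calls. Tasks waiting for a permit are unmounted rather than holding platform threads.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class LimitConcurrency {

    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical downstream call; records how many callers are inside it at the same time.
    static void callDownstream() throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(Duration.ofMillis(20)); // simulated blocking I/O
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // One fresh virtual thread per task, but at most `permits` tasks call downstream concurrently.
    static int run(int tasks, int permits) {
        inFlight.set(0);
        maxInFlight.set(0);
        Semaphore limiter = new Semaphore(permits);
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    limiter.acquire(); // waiting here unmounts the virtual thread
                    try {
                        callDownstream();
                    } finally {
                        limiter.release();
                    }
                    completed.incrementAndGet();
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        int completed = run(500, 10);
        System.out.println(completed + " tasks, max concurrent downstream calls: " + maxInFlight.get());
    }
}
```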
