Sayan Das

Altering Solr Resources with G1

Updated: Aug 23, 2021

Introduction

The Java community is evolving at a very rapid pace. Java 9 deprecated the famous Concurrent Mark Sweep (CMS) collector in favor of the newer G1 collector. Engineers can resist anything except giving their application beefier resources, especially with memory-hungry Solr luring us to turn the heap up. This article focuses on tuning G1 GC parameters in Solr and tells how blibli.com's search gained major performance improvements on limited resources by adding just 11 lines of configuration!

G1 Collector Overview

In this section, I will cover the basics of how G1 works and the key concepts that are essential while configuring the collector on Solr. I won't cover the entire inner working of G1; here are a few articles to refer to: [1] and [2]. Moving on, if you are using JDK 9 and onwards (later we will discuss why it is not recommended to stay on JDK 9 or lower if you are planning to move to G1), the JVM ships with the G1 collector as the default rather than Parallel GC. You can also enable the G1 collector in JDK 7 update 4 and later by adding -XX:+UseG1GC to the command-line JVM parameters. The key point that makes G1 different from other collectors is its grid-style heap layout: the heap is partitioned into small, equally sized cells of memory, and each cell can be either free or occupied by the young or old generation.
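In Solr specifically, such JVM flags typically go into the GC_TUNE variable in solr.in.sh. A minimal sketch (note that setting GC_TUNE this way overrides whatever GC flags your install already defines there, so merge rather than replace in practice):

GC_TUNE="-XX:+UseG1GC"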

[Figure: G1 collector overview]

The size of each individual cell can be configured with the -XX:G1HeapRegionSize=n JVM parameter. By default, the region size is computed as maxHeapSize/2048 and rounded down to the nearest power of 2 between 1MB and 32MB; region sizes smaller than 1MB or larger than 32MB are not supported.
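As a quick worked example, using the default formula above and the 18GB heap from the experiment setup later in this article: 18432MB / 2048 = 9MB, which rounds down to an 8MB region size, i.e. roughly 2,304 regions and, as we will see in the next section, a humongous-object threshold of 4MB.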

Humongous Allocation

There can be a situation in which an object is bigger than the free space of an individual cell and requires multiple contiguous free cells to fit in. Such objects are called humongous objects, and they play a crucial role in tuning the G1 collector on Solr. An object qualifies as humongous when its size exceeds 50% of the heap region size (i.e. -XX:G1HeapRegionSize), and it is assigned a contiguous set of regions. The JVM treats such objects a bit differently: from JDK 8u40 onwards (JDK-8048179) they can be reclaimed during any type of collection cycle, whereas on older versions they are reclaimed only during full garbage collection. Here is a good article explaining how eager reclamation works for humongous objects [4].

Heap Region and Generation

Nothing special here; it works just like other collectors. Quoting the official documentation from Oracle:

The vast majority of objects are allocated in a pool dedicated to young objects (the young generation), and most objects die there. When the young generation fills up, it causes a minor collection in which only the young generation is collected; garbage in other generations isn’t reclaimed. The costs of such collections are, to the first order, proportional to the number of live objects being collected; a young generation full of dead objects is collected very quickly. Typically, some fraction of the surviving objects from the young generation is moved to the old generation during each minor collection. Eventually, the old generation fills up and must be collected, resulting in a major collection, in which the entire heap is collected. Major collections usually last much longer than minor collections because a significantly larger number of objects are involved.

CMS vs. G1 Collector

Applications running today with either the CMS or the ParallelOld garbage collector would benefit from switching to G1 if the application has one or more of the following traits.

  • More than 50% of the Java heap is occupied with live data.

  • The rate of object allocation or promotion varies significantly.

  • Undesired long garbage collection or compaction pauses (longer than 0.5 to 1 second)

How we optimized Solr with G1

One thing to clear up first: optimizing garbage collection won't reduce the actual memory requirement; it only saves the extra CPU cycles spent on garbage collection, which can then be used for serving requests. For optimizing memory utilization you can refer to this article [3]; combining that with GC optimization lets you squeeze out much more performance on both the CPU and memory fronts. Moving on, there is no rule of thumb to determine the configuration theoretically; it “all depends” on the following factors:

  • Size of each document

  • Number of documents present in a given node

  • Request throughput for query and update requests

  • How frequently the index is committed

  • Number of rows being fetched per request

  • Whether all fields or only a few fields are fetched

All these factors are interlinked, and changing one or another may alter GC behavior. Let's see the setup that we used to arrive at our ideal GC configuration, and a few observations.

Experiment Setup

For hardware we had 3 VMs with the following configuration:

  • Operating System: CentOS 7.2

  • Memory: 32 GB

  • CPU: Intel x86, 16 cores, 2 threads per core

And the Solr configuration we were having was:

  • Solr v8.3.1

  • Documents: 5 million

  • Heap configuration: 18 GB (Xms = Xmx)

  • Single shard

  • Java 8

Each VM ran a single Solr node with a single NRT replica, all part of one Solr cluster. Further, I followed a simple “survival of the fittest” approach to get to the final result. We first started with a default config (let's say the 1st generation's Node 2, β, is the default configuration) and tinkered with (mutated) one parameter at a time, i.e. 𝛼 and ɣ. Equal load was applied to each node, and we observed the GC pauses, average response time, 95th percentile response time, and request throughput. The best-performing setup was used as the base for the second generation, and the process was repeated until we reached the final desired configuration.

[Figure: Experiment setup, generative-style testing]

Instrumenting the application throughout the process is important. We used New Relic to observe each Solr node's heap region usage and pauses for minor as well as full GCs, and accordingly decided which parameter to update for the next iteration. If you don't have a paid New Relic license, Grafana and Prometheus will serve the purpose. Or else turn on the GC logs by attaching the following parameters to the GC_TUNE variable in Solr's solr.in.sh file, and use an online tool like https://gceasy.io/ to visualize them. GC Easy helped us understand the GC activity in a much more granular way than even New Relic.

-XX:+PrintGCDetails

-XX:+PrintGCDateStamps

-XX:+PrintGCCause

-XX:+PrintTenuringDistribution

-XX:+UseGCLogFileRotation

-XX:NumberOfGCLogFiles=10

-XX:GCLogFileSize=5M
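A minimal sketch of how this could look in solr.in.sh, assuming Java 8 style GC logging flags; the log path is illustrative, and -Xloggc is needed so the rotated log files have somewhere to go:

GC_TUNE="$GC_TUNE \
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCCause \
-XX:+PrintTenuringDistribution \
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M \
-Xloggc:/var/solr/logs/solr_gc.log"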

Observation #1: Query Analysis

Let’s say you are fetching a large number of documents (assume rows=100000) but with just fl=id, while on another node you are requesting the same number of rows with fl=*; the two will have very different memory footprints. In Solr, every document is an object, and each document has its own fields that are also objects. So let's say each document in Solr has 10 string fields. Roughly speaking, the first request, i.e. rows=100000&fl=id, creates about 100000*1 objects, while the second request, rows=100000&fl=*, creates about 100000*10, which is 10x more than the previous one and results in much more aggressive GCs. One thing to note is that such requests create objects with a very short life span, so as explained earlier, if we can fit those objects in survivor space we can reduce the frequency of full GCs.
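For illustration, here are the two kinds of requests side by side (the host, port, and collection name "products" are placeholders, not our actual setup):

curl 'http://localhost:8983/solr/products/select?q=*:*&rows=100000&fl=id'

curl 'http://localhost:8983/solr/products/select?q=*:*&rows=100000&fl=*'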

Observation #2: Humongous Object

This is the most critical observation of all. By tuning this parameter alone we reduced our garbage collection time by 20x! On analyzing the GC logs, we saw that most of the time the JVM was busy allocating humongous objects.

[Figure: Analysis of the GC logs]

This was really concerning, as the majority of objects were humongous and the JVM was kept busy cleaning them up. We know that most of the objects in Solr have a very short life span, so if we could somehow keep those objects in survivor space, our problem would be solved. One way is to increase the region size so that each such object fits within 50% of a region. So we increased the region size to 16m; with this, we reduced old gen usage, started using survivor space, and reduced the GC pauses significantly.
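A minimal sketch of the change, assuming the flag is appended to the GC_TUNE variable in solr.in.sh alongside the existing options:

GC_TUNE="$GC_TUNE -XX:G1HeapRegionSize=16m"

With 16m regions, only objects larger than 8m (50% of a region) are treated as humongous, so allocations that were previously humongous go back through the normal young-generation path.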

[Figure: JVM memory usage]

Observation #3: JVM Version

G1 is the default GC from Java 9 onwards; it is present in Java 8 as well, but in a much earlier stage, and you have to enable it explicitly. We experimented with Java 8 and Java 11 specifically. In our tests, Java 11 performed significantly better than Java 8. Most of the performance gains came from improved garbage collection: CPU resources were better utilized for serving requests, i.e. more throughput, and pauses were shorter, i.e. lower latency. Quantitatively speaking, we saw 50% response time gains on Java 11 compared with Java 8.

Okay, so we have upgraded the JDK version. Let's see what new stuff is packed with the version upgrade. We went through the entire release notes and bug fixes [5]. One particular improvement that caught our eye was JEP 307, which brings parallelism to G1's full (major) GC and landed in Java 10. Here is the thumb rule we followed to configure it:

-XX:ParallelGCThreads=n, where n = 5/8 * (total CPU threads); this should not exceed 8

-XX:ConcGCThreads=n, where n = ParallelGCThreads / 4
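Plugging in the VMs from our setup (16 cores with 2 threads per core, i.e. 32 hardware threads): 5/8 * 32 = 20, which gets capped at 8, and 8 / 4 = 2. A configuration following this rule of thumb would therefore look like the two lines below.

-XX:ParallelGCThreads=8

-XX:ConcGCThreads=2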

Observation #4: Solr Caches and their impact on Old Gen

As discussed earlier, most objects in Solr are short-lived and in an “ideal” scenario should not move to old gen space. But that is not the case every time: Solr uses different types of caches that have a much longer life span. A few things that determine old gen utilization are as follows:

  1. Number of caches enabled and cache size: the higher the cache size, the more old gen utilization and the more aggressive the major GCs.

  2. Auto-warm count: whenever a new searcher is opened, a higher auto-warm count means more older objects get copied over, which means fewer objects get collected, i.e. a less aggressive mixed collection.

  3. Commit interval: with each commit a new searcher is opened, the older cache is dropped, and the hottest entries are copied over, i.e. auto-warmed. That means the longer the commit interval, the fewer objects get collected.

Considering the above 3 points and monitoring our cache utilization, we configured the young generation bounds with -XX:G1NewSizePercent=x and -XX:G1MaxNewSizePercent=y, which in turn determines how much of the heap is left for old gen. By adjusting this, we could sustain irregular surges in traffic efficiently by restricting the flow of extra short-lived objects into old gen space.
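For illustration only, here is what such a configuration could look like; the percentages below are hypothetical placeholders rather than the values we eventually shipped, and note that both flags are experimental in HotSpot, so they normally have to be preceded by -XX:+UnlockExperimentalVMOptions.

-XX:+UnlockExperimentalVMOptions

-XX:G1NewSizePercent=20

-XX:G1MaxNewSizePercent=50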

Conclusion

After just customizing our GC parameters, we gained a 23% improvement in response time and reduced the CPU time spent on garbage collection by 95%. Here is how our response time and throughput stack up.

[Figure: Response time and throughput comparison]

And the comparison of CPU cycles spent on GC.

[Figure: CPU cycles spent on GC]

By upgrading the JVM to Java 11, we gained a 50% improvement in response time compared to Java 8 (both setups running our own custom GC parameters).

[Figure: Java 8 vs Java 11 response time]

All the above stats were collected from a non-prod environment; we further tweaked the parameters in production to better support fluctuating traffic.

References

