Controlling Memory Usage By Forcing Release of Freed Memory
Tableau Server supports both Windows and Linux, giving on-premises customers (i.e., those who host their own Tableau Server instances) the flexibility to choose their operating system. Our hosted offering, Tableau Online, and our free platform, Tableau Public, both run on Linux.
As we continue to support both operating systems, we sometimes run into interesting differences between how Tableau Server operates on Windows and on Linux. This blog post sheds light on one such difference, related to memory usage patterns. We will talk about how a single function call dramatically reduced the memory footprint of Tableau Server processes on Linux and helped us improve the stability and availability of our services.
The image below is based on data from one of our Tableau Online production pods. It shows the number of times applications were restarted due to excessive memory consumption. Notice the significant drop in restarts after the change was deployed.
Memory usage
Within Tableau Server, the component responsible for rendering the viz (the vizqlserver process) can be memory-intensive. Specifically, VizQL sessions are stored in memory, and they can take up a lot of space. Memory-intensive processes within Tableau Server have a watchdog thread that continually monitors memory usage and terminates processes as needed to keep the overall node healthy. A node in a Tableau Server cluster typically hosts multiple processes, so these restarts are sometimes necessary to keep a single process from using up all the resources on that node. That said, a restart due to excessive memory usage is undesirable because it interrupts users' workflows. To prevent process restarts in production (Tableau Online and Tableau Public), we have had to provision machines with very high memory capacity.
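To make the watchdog idea concrete, here is a minimal sketch in Python. It is not Tableau Server's implementation, which this post does not show; the names `MemoryWatchdog`, `read_rss_bytes`, `limit_bytes`, and `on_exceeded` are hypothetical, and the sketch assumes the watchdog compares resident memory against a fixed limit. Instead of terminating the process, it calls a callback so the behavior is easy to observe.

```python
import os
import threading


def read_rss_bytes():
    """Current resident set size in bytes, read from /proc (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # second field: resident pages
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


class MemoryWatchdog(threading.Thread):
    """Hypothetical watchdog: polls memory usage and fires a callback once."""

    def __init__(self, limit_bytes, on_exceeded,
                 read_rss=read_rss_bytes, interval_s=1.0):
        super().__init__(daemon=True)
        self.limit_bytes = limit_bytes
        self.on_exceeded = on_exceeded  # assumption: a real watchdog would restart the process
        self.read_rss = read_rss
        self.interval_s = interval_s
        self._stop_event = threading.Event()

    def run(self):
        # wait() returns False on timeout and True once stop() has been called.
        while not self._stop_event.wait(self.interval_s):
            if self.read_rss() > self.limit_bytes:
                self.on_exceeded()
                return

    def stop(self):
        self._stop_event.set()
```

Passing the memory reader in as a parameter keeps the polling logic independent of how memory is measured, which also makes the sketch usable on platforms without `/proc`.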
Session expiry not releasing memory on Linux
When investigating a customer issue related to memory usage, we noticed a marked difference between memory patterns on Windows as compared to Linux. On Windows, when a VizQL session expired, we saw the memory usage of the vizqlserver process drop significantly. This was expected since sessions are memory intensive. Interestingly, on Linux, we did not observe this large drop in memory.
Our initial thought was that sessions were not getting cleaned up as expected on Linux, but after more troubleshooting we ruled out that theory. What we were left with was a mystery: the same code was behaving very differently when it came to releasing memory back to the operating system. On Windows, calling delete/free reduced the memory footprint of the process and returned memory to the OS. On Linux, it did not.
malloc_trim
An online search revealed that others have run into similar memory release issues on Linux. By default, glibc's allocator does not always release freed heap memory back to the OS. This can have major implications for memory usage, especially in long-running applications. glibc provides a function called malloc_trim (a GNU extension) that can be used to force the release of freed heap memory back to the operating system. We decided to run some experiments with malloc_trim to see if it would help.
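For readers who want to try malloc_trim themselves, the following is a minimal sketch that calls it from Python through ctypes. The helper name `trim_heap` is hypothetical, and the code is not taken from Tableau Server. The glibc function itself has the signature `int malloc_trim(size_t pad)` and returns 1 if memory was released to the system and 0 if not. On systems whose C library does not provide malloc_trim, the helper returns None.

```python
import ctypes
import ctypes.util


def trim_heap(pad=0):
    """Ask glibc to return free heap memory to the OS.

    Returns 1 if memory was released, 0 if not, or None when
    malloc_trim is not available (non-glibc platforms).
    """
    libname = ctypes.util.find_library("c")
    if libname is None:
        return None
    try:
        malloc_trim = ctypes.CDLL(libname).malloc_trim
    except (OSError, AttributeError):
        return None
    malloc_trim.argtypes = [ctypes.c_size_t]  # pad: bytes to leave untrimmed at the top of the heap
    malloc_trim.restype = ctypes.c_int
    return malloc_trim(pad)


if __name__ == "__main__":
    print("malloc_trim result:", trim_heap())
```

Whether the call releases anything depends on how much free memory the allocator is holding at that moment, so a result of 0 is not an error.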
Using gdb, we were able to call malloc_trim directly (without making any code changes), and the results were very encouraging. After VizQL sessions expired, calling malloc_trim did indeed force the return of a large chunk of memory to the OS. With malloc_trim, the memory profile on Linux matched Windows more closely. We were excited to find out what impact this would have on memory-related restarts in production and whether it would help us reduce the memory capacity of our hosts.
Results
Engineers from various teams quickly got together to implement the code changes required to call malloc_trim periodically. With the vizqlserver executable showing such promising results, there was good reason to call malloc_trim in other memory-intensive processes as well. The changes themselves were straightforward and were pushed out to production (Tableau Online) as part of the next release.
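The post does not describe how the periodic call is scheduled, so the following is only a sketch of one possible shape, written in Python with ctypes: a daemon thread that invokes a trim function at a fixed interval. The names `load_malloc_trim` and `start_periodic_trim` and the 60-second interval in the usage comment are assumptions made for illustration.

```python
import ctypes
import ctypes.util
import threading


def load_malloc_trim():
    """Return a callable wrapping glibc's malloc_trim(0), or None if unavailable."""
    name = ctypes.util.find_library("c")
    try:
        fn = ctypes.CDLL(name).malloc_trim if name else None
    except (OSError, AttributeError):
        fn = None
    if fn is None:
        return None
    fn.argtypes = [ctypes.c_size_t]
    fn.restype = ctypes.c_int
    return lambda: fn(0)


def start_periodic_trim(interval_s, trim, stop_event):
    """Call trim() every interval_s seconds on a daemon thread until stop_event is set."""
    def loop():
        # wait() returns False on timeout, True once stop_event is set.
        while not stop_event.wait(interval_s):
            trim()

    thread = threading.Thread(target=loop, name="heap-trimmer", daemon=True)
    thread.start()
    return thread


# Example usage (interval chosen arbitrarily):
#   stop = threading.Event()
#   trim = load_malloc_trim()
#   if trim is not None:
#       start_periodic_trim(60.0, trim, stop)
```

A fixed interval is the simplest choice. Triggering the trim right after a known large release, such as session expiry, would be an alternative design.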
As you saw in the image above, the results so far have been amazing! The number of process restarts due to excessive memory consumption has dropped dramatically. The average memory footprint over time has also dropped significantly. The image below shows the average memory consumption of the major Tableau Server processes on different nodes in one of our production pods. Each node is highlighted in a different color.
What’s next?
We are really excited about the positive impact that these changes are having on the availability and stability of our hosted services. We implemented these changes on Tableau Public as well, and they have helped us deal with the recent COVID-19 traffic surge on Public.
We encourage our self-hosted customers to continue monitoring availability and resource usage. If they find that memory usage on Linux is dramatically lower after upgrading to 2020.3 (or newer), they could consider reducing their instance sizes.