My blog post earlier this month about what causes Ruby memory bloat, and about a way to potentially reduce memory usage by 70%, triggered quite a storm in the community! It's clear that I stumbled upon a significant pain point.

What's on the grapevine?

Here are the community highlights I've observed so far.

Returning to Ruby?

Some people on Twitter were talking about returning to Ruby if the memory usage issue is resolved, because that's what drove them away in the first place.

Under discussion by Ruby core team

Sam Saffron opened an issue in the Ruby bug tracker to discuss my research, and whether the Ruby core team is willing to merge this work upstream. The Ruby core team has no objections so far, but errs on the side of caution.

Caught the attention of the malloc_trim developer

Carlos O'Donell from Red Hat, the person who wrote malloc_trim(), joined the discussion and said that he's very interested in the results. Maybe we'll see more improvements in the glibc memory allocator in the future?

First test results are coming in

Noah Gibbs posted more details about his performance benchmark of my patch. He observed a small (1%) performance improvement.

Vladimir Dementyev tested my patch using Action Cable and blogged about the results. He observed a memory usage reduction similar to setting MALLOC_ARENA_MAX=2, but also observed a performance hit of 12%-21%.

Sam Saffron benchmarked my patch using Discourse. His results show the following, when comparing default Ruby with my patch:

  • Memory usage is clearly reduced: from 4.7 GB to 3.9 GB.
  • About 7% performance reduction in the 99th percentile performance benchmark.
  • Average CPU usage is slightly higher: about 0.5% increase.
  • Jemalloc fares better than the malloc_trim patch, both w.r.t. memory usage and performance.

My take on these results:

  • So there is a performance impact after all. Noah Gibbs's performance benchmark turns out to be unrepresentative.
  • The performance problem is probably solvable. The most obvious solution is to call malloc_trim() fewer times, e.g. once every 3 GCs.

Still room for improvement

I've had a heated discussion with Nate Berkopec about the implications of my research. We concluded that, although malloc_trim() helps a lot, we can probably reduce memory usage even further if we use alternative memory allocation strategies to reduce fragmentation.

Indeed: Sam Saffron's tests seem to indicate that jemalloc still outperforms malloc_trim(), both in performance and memory usage.

Using jemalloc as Ruby's default is a bit problematic. There was a heated discussion in the Ruby bug tracker about this, but in the end no decision was made. The main issue raised is that memory usage is only reduced when using jemalloc 3; memory usage is still high when using jemalloc 5. Nobody knows why, so that makes the choice of defaulting to jemalloc very dodgy.

It is potentially possible to write a custom memory allocator especially for Ruby. But that is a significant amount of work, so no plans have been made so far to start such an effort.

How you can help with testing

I'm calling for the community to help me with testing in their own production environments. We need more answers to the following questions:

  1. Does malloc_trim() really reduce memory usage? If so, by how much?
  2. Does malloc_trim() have a performance impact? If so, by how much?
  3. Bonus question: how does malloc_trim() compare to jemalloc?

The answers to the above questions are probably application-specific, which is why the more people post their test results, the better.

(Note that dev/staging is not enough; we need production data.)

Two methodologies

In this blog post I've detailed two methodologies for collecting data to answer questions 1 and 2. (If you can also answer bonus question 3, then great! But how to do that is left as an exercise for you.)

Methodology 1 overview (preferred):

  1. Install a patched Ruby on half of your production servers.
  2. Compare memory usage and performance between the two halves.

Methodology 2 overview:

  1. Install a patched Ruby on all your production servers.
  2. Compare memory usage and performance data from before and after having patched Ruby.

Methodology 1 is preferred because your data won't be polluted by natural traffic variances over time, but methodology 2 is more convenient to set up.

Step 1: preparation

1.1 Upgrade to Ruby 2.6

We want to compare a normal Ruby 2.6 to Ruby 2.6 with the trim patch applied. We don't want to compare with older Ruby versions. So if you aren't on Ruby 2.6 already, then please be sure to upgrade to 2.6.

Even if you've chosen methodology 1, you should upgrade all your servers to 2.6. Otherwise it wouldn't be a fair comparison.

After upgrading to Ruby 2.6, be sure to redeploy your app, because bundle install will need to be rerun.

Also be sure to reconfigure your app servers (e.g. systemd service units, or Passenger settings) to make use of Ruby 2.6.
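To double-check that a server really picks up Ruby 2.6 after the upgrade, something like the following could be run on each machine. This is only a minimal sketch: the helper name is_ruby_26 is made up for illustration, and it checks whichever ruby is first in the PATH of the current shell, which may differ from the one your app server is configured to use.

```shell
# Hypothetical helper: succeeds if the given version string is a 2.6.x release.
is_ruby_26() {
    case "$1" in
        2.6.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Check the ruby found in PATH, if there is one.
if command -v ruby >/dev/null 2>&1; then
    version=$(ruby -e 'print RUBY_VERSION')
    if is_ruby_26 "$version"; then
        echo "OK: Ruby $version"
    else
        echo "Not on 2.6: Ruby $version"
    fi
fi
```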

If you're using Passenger, then modify the app's passenger_ruby setting. Here's an example, assuming you're using Passenger for Nginx:

passenger_ruby /path-to-your-ruby-2.6;

Note when using RVM: be sure to point to your RVM Ruby wrapper script, not the raw binary. Read the Passenger documentation to learn how to obtain that path.

1.2 Disable swap

In order to be able to measure memory usage accurately, I highly recommend disabling swap.

To disable swap immediately, run this command. Note however that its effect only persists until you reboot the server.

sudo swapoff -a

To disable swap permanently, you also need to remove (or comment out) the swap entry in /etc/fstab.
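To confirm that swap is really off, for example after a reboot, a small check along these lines might help. It is a sketch that assumes a Linux server where /proc/meminfo is available; the function names swap_total_kb and swap_status are made up for this example.

```shell
# Print the total configured swap in kB, read from a meminfo-style file.
# Defaults to /proc/meminfo (Linux only).
swap_total_kb() {
    awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Report whether swap is fully disabled.
# A missing SwapTotal line is treated the same as 0 kB.
swap_status() {
    total=$(swap_total_kb "$1")
    if [ "${total:-0}" -eq 0 ]; then
        echo "swap is off"
    else
        echo "swap is still on (${total} kB)"
    fi
}

# Only run the check where /proc/meminfo exists.
if [ -r /proc/meminfo ]; then
    swap_status
fi
```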

1.3 Set up performance & memory usage monitoring

If you don't have it yet, then be sure to set up performance and memory usage monitoring.

The goal is to see how performance and memory usage change over time. So any solution that is capable of producing a graph is fine. The data retention time should be a couple of weeks.

  • Performance monitoring should collect information about response times over time (average, 99th percentile).
  • Memory usage monitoring should collect information about Ruby process RSS memory usage over time.
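If you just want a quick spot check of Ruby RSS on a single server, next to whatever your monitoring service reports, a sketch like the one below could do. It assumes that your app processes show up in ps with a command name containing "ruby", which may not hold for every app server; the function name sum_ruby_rss_mb is made up for this example.

```shell
# Read "rss_kb command" lines from stdin, sum the RSS of all lines whose
# command name contains "ruby", and print the total in MB.
sum_ruby_rss_mb() {
    awk '$2 ~ /ruby/ { kb += $1 } END { printf "%.1f\n", kb / 1024 }'
}

# Feed it the RSS (in kB) and command name of every process, without headers.
ps -e -o rss= -o comm= | sum_ruby_rss_mb
```

Running this from cron and appending the output to a log file gives a crude time series, but a real monitoring service is still the better option for graphs and retention.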

If you've chosen methodology 1, then be sure to assign each half of your cluster to different monitoring projects, so that you can compare the data from both halves instead of having the data merged together.

If you've chosen methodology 2, then once you have a monitoring solution set up, let it collect data for a week or so.

Here are some monitoring services that are able to monitor performance and memory usage:

  • AppSignal (see this guide)
  • New Relic
  • Heroku's dashboard
  • Skylight
  • Scout

Step 2: install Ruby 2.6 with trim patch

If you've chosen methodology 1, then install Ruby 2.6 with the trim patch on half of your servers.
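One way to pick that half is to take every other host from your server list, so that the two groups stay evenly mixed even if the list is ordered by rack or by age. The sketch below assumes a plain list with one host name per line; the host names and the function name pick_patched_half are made up for this example.

```shell
# Print every other line of a host list read from stdin (lines 1, 3, 5, ...).
# These hosts get the patched Ruby; the rest keep the normal Ruby 2.6.
pick_patched_half() {
    awk 'NR % 2 == 1'
}

# Example with hypothetical host names; prints web1 and web3.
printf '%s\n' web1 web2 web3 web4 | pick_patched_half
```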

If you've chosen methodology 2, then install it on all your servers.

Be sure to restart your app servers after installing the patched Ruby.

If you installed Ruby through RVM

curl -SLO
rvm install 2.6.2 --patch ruby_gc_malloc_trim.patch

If you installed Ruby through rbenv

curl -SLO
rbenv install --patch 2.6.2 < ruby_gc_malloc_trim.patch

If you installed Ruby through chruby

If you used ruby-install:

ruby-install -p ruby 2.6.2

If you used ruby-build:

curl -SLO
ruby-build --patch 2.6.2 < ruby_gc_malloc_trim.patch

If you installed Ruby manually from source

Download the Ruby source code, extract it and enter the source root:

curl -SLO
tar xzf ruby-2.6.2.tar.gz
cd ruby-2.6.2

Download the patch and apply it:

curl -SLO
patch -p1 < ruby_gc_malloc_trim.patch

Finally, configure, build and install the source according to the settings you used before:

./configure [your previous configure settings here]
make -j2
sudo make install

If you installed Ruby through any other means

If you installed Ruby through any other means (e.g. through Debian/RPM packages), then you'll have to migrate away from them, and towards either RVM, rbenv, chruby or manually installing from source.

If you're using Docker, then you'll have to install Ruby inside the container using either RVM, rbenv, chruby or manually from source.

Step 3: monitor and interpret the results

Let this new setup run for a week or so.

If you've chosen methodology 1, then compare the performance and memory usage data between the two halves of your cluster. What's the difference?

If you've chosen methodology 2, then have a look at the performance and memory usage before and after installing the patch. What has changed?
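To put a number on the difference, you can express it as a relative change. The sketch below uses a made-up helper named percent_change; the example values are the memory figures from Sam Saffron's Discourse benchmark mentioned earlier (4.7 GB without the patch, 3.9 GB with it).

```shell
# Print the relative change from a baseline value to a new value, in percent.
# A negative result means the new value is lower than the baseline.
percent_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%+.1f%%\n", (after - before) / before * 100 }'
}

# Memory usage going from 4.7 GB to 3.9 GB prints -17.0%
percent_change 4.7 3.9
```

The same helper works for response times: pass the baseline first and the patched value second.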

Step 4: share your results

I would be grateful if you could blog about your results and let me know. If you're in need of some inspiration: this is what I consider a good blog post.