another man's ramblings on code and tech

Optimizing Logstash Performance

Logstash has a wide array of powerful filters, from the ones shipped with it to community-maintained plugins. However, Logstash's speed can be compromised when these filters are not used properly. I encountered this problem while performance testing a setup we've been working on for a CRM company. During times of heavy load, Logstash was capable of handling our approximately 40,000 logs per minute (about 2.4 million per hour). During heavy loads with backlogs, however, Logstash was not able to keep up. It would take hours to catch up, which was simply not acceptable. Here are some tips I have for optimizing Logstash performance.

1. Isolate performance problems

Before one can go about solving a performance problem, one must figure out where the problem is in the first place. I knew that there were 3 possible problems with our Logstash installation:

  • Input plugins were blocked and could not take in logs any faster
  • Filters were using too much CPU and taking too long to process each log
  • Outputs were blocked due to I/O speeds or Elasticsearch

To identify what the problem was, I used three main tools:

  • strace to follow the Logstash input, filter, and output threads
  • htop to view the CPU usage of inputs, filters, and outputs
  • iotop to view the I/O of the system during these backlogs

You can find out more about these tools online, but suffice it to say you can learn quite a bit about your system with them. The strategy that I used was to remove plugins one by one while monitoring stats. If, after removing a plugin, I still saw the performance hit, then I knew the problem was in one of the remaining plugins. Using these monitoring tools with this strategy I learned two things: that the bulk of the performance problem was in one type of filter, multiline, and that Logstash was only using one thread to perform all filtering. I needed the multiline filter because I needed to concatenate Java exceptions into one log. What I learned with this was…

2. Never, ever, ever use the multiline plugin on the Logstash side

The multiline plugin requires that Logstash use only one worker thread for filtering. This is because order is very important when using the multiline filter: to concatenate a log with the last one that came in, you must know exactly what the last log was. So, whenever possible, avoid using this plugin. If you really need multiline handling, then consider using Jason Woods's Log-Courier, a modification of Logstash Forwarder which includes plugin support to perform multiline operations on the forwarding side, not on the Logstash side. This allows you to increase Logstash's worker threads while still having multiline concatenation performed on incoming logs.
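To make the ordering problem concrete, here is a minimal sketch in Python. It is not Logstash's or Log-Courier's actual implementation. The continuation rule (a line that starts with whitespace belongs to the previous event) and the sample log lines are assumptions for illustration only.

```python
import re

# Assumed rule for this sketch: a line starting with whitespace (such as a
# Java stack frame) is a continuation of the event before it.
CONTINUATION = re.compile(r"^\s+")

def join_multiline(lines):
    """Merge continuation lines into the event that precedes them."""
    events = []
    for line in lines:
        if events and CONTINUATION.match(line):
            # Only correct if events[-1] really is the line that came
            # immediately before this one in the original stream.
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

# Hypothetical sample input, in the order the application wrote it.
ordered = [
    "ERROR java.lang.IllegalStateException: boom",
    "\tat com.example.Foo.bar(Foo.java:42)",
    "\tat com.example.Main.main(Main.java:7)",
    "INFO request ok",
]

# The same lines, as they might arrive if several workers interleaved them.
interleaved = [ordered[0], ordered[3], ordered[1], ordered[2]]

if __name__ == "__main__":
    for event in join_multiline(ordered):
        print(repr(event))
    for event in join_multiline(interleaved):
        print(repr(event))
```

With the ordered input the stack frames end up attached to the ERROR line. With the interleaved input they get glued onto the unrelated INFO line, which is why the concatenation has to happen somewhere that sees the lines strictly in order.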

3. Drop needless logs

Another valuable way to increase Logstash performance is to preemptively drop needless logs. This can be done via the "drop{}" plugin on the Logstash side, or via Log-Courier's "filter" codec on the forwarding side. This is a pretty simple idea to grasp: send fewer logs to your centralizer and it will have less work to do in the long run. Remember, working with Logstash is a numbers game. Any reduction in the work you do for each log, even a minimal one, will have significant results in the long run.
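The idea, sketched in Python rather than in Logstash or Log-Courier configuration syntax: decide as early as possible whether a log is worth keeping, and never pass the rest downstream. The rule used here (discard DEBUG and TRACE lines) is a made-up example, not the rule we used.

```python
import re

# Hypothetical rule: treat DEBUG and TRACE lines as needless.
NEEDLESS = re.compile(r"\b(DEBUG|TRACE)\b")

def keep(line):
    """Return True if the line should be forwarded to the centralizer."""
    return NEEDLESS.search(line) is None

def forward(lines):
    """Return only the lines worth sending on; everything else is dropped."""
    return [line for line in lines if keep(line)]

if __name__ == "__main__":
    sample = ["INFO started", "DEBUG cache miss", "ERROR disk full"]
    print(forward(sample))
```

Every line dropped here is a line that never has to go through any of the later, more expensive filters.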

4. Increase worker threads

Worker threads are the threads Logstash allocates for performing filters. You can control this setting by adding the "-w" parameter to LS_OPTS in the Logstash service file located at /etc/init.d/ on Red Hat based systems. Explicitly telling Logstash to split up work among multiple threads will increase speed considerably, as long as nothing in your filters (such as multiline) forces a single worker. Obviously you should not let Logstash have so many threads that your OS comes to a grinding halt, so tune this value to the specifications of your system.

5. Optimize regular expressions

Again, working with Logstash filters is a numbers game. Even a slight reduction in regular expression processing will increase your speed significantly when you're taking in 40,000 logs per minute. I suggest using a regex testing tool to figure out how efficient your current regex is and to optimize it further. I also suggest reading up on how to optimize regex.
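As a rough illustration of the kind of change that helps, the sketch below compares an unanchored pattern full of greedy ".*" with an anchored, more specific one. It uses Python's re module, not the regex engine Logstash uses, so treat the timings as indicative of the principle only. The log format and both patterns are assumptions made up for this example.

```python
import re
import timeit

# Hypothetical log lines for this sketch.
MATCHING = "2015-08-20 12:00:00,123 INFO  [worker-7] com.example.Service - request handled in 35 ms"
NON_MATCHING = "this line has no timestamp or level at all, so every pattern must fail on it"

# Unanchored, with greedy wildcards: the engine may retry from many positions
# and backtrack through ".*" before it can give up on a line.
SLOW = re.compile(r".*(\d{4}-\d{2}-\d{2}) .* (INFO|WARN|ERROR) .*")

# Anchored at the start and specific about what sits between the fields, so a
# line that does not begin with a date can be rejected almost immediately.
FAST = re.compile(r"^(\d{4}-\d{2}-\d{2}) \S+ (INFO|WARN|ERROR) ")

def bench(pattern, text, number=2000):
    """Return the seconds taken to run pattern.search(text) `number` times."""
    return timeit.timeit(lambda: pattern.search(text), number=number)

if __name__ == "__main__":
    for label, text in (("matching", MATCHING), ("non-matching", NON_MATCHING)):
        print(label, "slow:", bench(SLOW, text), "fast:", bench(FAST, text))
```

Both patterns extract the same date and level from a matching line. The difference usually shows up most on lines that do not match, which is exactly the case a log has to go through when it fails several patterns in a row.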

6. Selectively apply regular expressions

Technically this idea should be in the last section, but it had such a great impact on my system that I felt it should be emphasized on its own. Selectively applying regex based on the size of a log is a great way to reduce your overall processing needs. In my case, 90% of logs were less than 400 characters long. The other 10%, however, were quite long: in the 1000-35,000 character range. I knew that any log above 1000 characters in length was either a long running query (LRQ) or an exception. And given that, at the top level of our filter logic, a log would have to fail 9 steps of regex before being marked as generic, I knew it would be helpful to have these longer logs skip all but the regular expressions for parsing LRQs and exceptions. I therefore used a range filter to determine the size of a log and give long logs a "long" tag. Based on whether the "long" tag existed I could selectively apply regex. Using this method increased the speed of our setup dramatically and it teaches a good lesson: try to apply only as much regex as necessary. You can use size, type, host, or any of your own fields to selectively apply regex, but it's important that you have some method of limiting how many regular expressions each log has to go through in your filters.
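Here is the same routing logic as a small Python sketch. It is not our actual Logstash configuration. The 1000 character threshold and the "long" tag come from the description above; the pattern names and the patterns themselves are hypothetical stand-ins.

```python
import re

LONG_THRESHOLD = 1000  # from the text: longer logs are LRQs or exceptions

# Hypothetical patterns standing in for the real filter chain.
SHORT_PATTERNS = [
    ("login", re.compile(r"^LOGIN user=(\w+)")),
    ("http", re.compile(r"^(GET|POST) (\S+) (\d{3})")),
]
LONG_PATTERNS = [
    ("exception", re.compile(r"^\S+(Exception|Error): ")),
    ("lrq", re.compile(r"^LONG RUNNING QUERY ")),
]

def classify(message):
    """Return (kind, tags), trying only the patterns relevant to the log's size."""
    tags = []
    if len(message) > LONG_THRESHOLD:
        tags.append("long")
    # Long logs skip the short-log patterns entirely, and vice versa.
    patterns = LONG_PATTERNS if "long" in tags else SHORT_PATTERNS
    for kind, pattern in patterns:
        if pattern.search(message):
            return kind, tags
    return "generic", tags

if __name__ == "__main__":
    print(classify("GET /index.html 200"))
    print(classify("java.lang.RuntimeException: " + "x" * 2000))
```

The length check is a single cheap comparison, and it saves the longest (and therefore most expensive to scan) logs from being run through every pattern that could never match them.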

And that's it! Those were the most powerful methods I found of optimizing our Logstash performance with backlogs. Remember, getting Logstash to run quickly is a numbers game: an optimization seemingly small when viewed one log at a time means huge returns in the long run.

Date: Aug 20 2015