Performance tuning of Python IPC with ZeroMQ

May 6, 2014

While looking at different approaches to event parsing, e.g. Logstash or Fluentd, we also ran our own experiments in that field using Python.

Event processing tends to be a task quite suitable for parallelization. Add Python to the mix and you will soon hit one drawback of Python in a multicore environment: the infamous GIL (see here for David Beazley's brilliant talk on the topic).

One of the test scenarios consisted of a Tornado TCP server receiving log events from different syslog sources. These were passed on to be analyzed via regular expressions and then finally stored in an Elasticsearch backend. In order to utilize more than one core on a multicore machine, Python offers the multiprocessing package, which acts mostly as a drop-in replacement for the regular threading package. One thing we noticed, though, is that the performance of the default multiprocessing.Queue implementation leaves a bit to be desired. So we tried different approaches to tune the performance of IPC via a queue implementation, and also gave PyPy a test spin to see if its JIT can be beneficial here.
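One of the tweaks measured below is buffering events and shipping them through the queue in batches, which amortizes the per-event serialization and locking overhead. A minimal sketch of that idea using the stock multiprocessing.Queue (names, the batch size, and the sentinel scheme are illustrative, not our actual implementation):

```python
# Sketch: batch events before putting them on a multiprocessing.Queue,
# so pickling/locking costs are paid once per batch instead of per event.
import multiprocessing

BUFFER_SIZE = 100  # events per batch, as in the benchmarks below


def producer(queue, n_events):
    buf = []
    for i in range(n_events):
        buf.append("event %d" % i)
        if len(buf) >= BUFFER_SIZE:
            queue.put(buf)  # one pickle/put per 100 events
            buf = []
    if buf:
        queue.put(buf)  # flush the remainder
    queue.put(None)  # sentinel: tells the consumer to stop


def consumer(queue, counter):
    while True:
        batch = queue.get()
        if batch is None:
            break
        for event in batch:
            counter.value += 1  # stand-in for regex analysis + output


if __name__ == "__main__":
    q = multiprocessing.Queue()
    counter = multiprocessing.Value("i", 0)
    worker = multiprocessing.Process(target=consumer, args=(q, counter))
    worker.start()
    producer(q, 250)
    worker.join()
    print(counter.value)  # 250
```

The trade-off is latency: events sit in the buffer until a batch fills up, so a real implementation would also flush on a timer.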

Here are some numbers for handling 200,000 events of ~1 KB each (running on a 4-core X5450@3.00GHz Xen guest).
The TCP server ran in one process; analysis and output ran in two other processes. Output went to /dev/null for these tests.

Implementation                                                    | Python 2.7        | PyPy 2.2.1
native mp queue, no queue buffer, native pickling                 | 58 sec, 3400 eps  | 51 sec, 3900 eps
native mp queue, no queue buffer, msgpack pickling                | 44 sec, 4500 eps  | 34 sec, 5900 eps
native mp queue, queue buffer of 100 events, msgpack pickling     | 39 sec, 5100 eps  | 27 sec, 7400 eps
ZMQ-based mp queue, queue buffer of 100 events, msgpack pickling  | 32 sec, 6250 eps  | 14 sec, 14300 eps
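The fastest row replaces multiprocessing.Queue with a ZeroMQ PUSH/PULL socket pair and msgpack serialization. A minimal sketch of that combination, assuming pyzmq and msgpack are installed; the ipc endpoint, batch size and sentinel scheme are illustrative, not the exact implementation linked below:

```python
# Sketch: a ZeroMQ-based queue between processes. The producer PUSHes
# msgpack-serialized batches of events; the consumer PULLs and unpacks them.
import multiprocessing

import msgpack
import zmq

ADDRESS = "ipc:///tmp/zmq_queue_demo"  # hypothetical endpoint
BATCH_SIZE = 100


def producer(n_events, address=ADDRESS):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUSH)
    sock.bind(address)
    batch = []
    for i in range(n_events):
        batch.append({"id": i, "msg": "event %d" % i})
        if len(batch) >= BATCH_SIZE:
            sock.send(msgpack.packb(batch))  # one send per 100 events
            batch = []
    if batch:
        sock.send(msgpack.packb(batch))
    sock.send(msgpack.packb(None))  # sentinel: no more events
    sock.close()
    ctx.term()


def consumer(result_queue, address=ADDRESS):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PULL)
    sock.connect(address)
    count = 0
    while True:
        batch = msgpack.unpackb(sock.recv(), raw=False)
        if batch is None:
            break
        count += len(batch)  # stand-in for regex analysis + output
    result_queue.put(count)
    sock.close()
    ctx.term()


if __name__ == "__main__":
    results = multiprocessing.Queue()
    c = multiprocessing.Process(target=consumer, args=(results,))
    c.start()
    producer(250)
    print(results.get())  # 250
    c.join()
```

Unlike multiprocessing.Queue, this avoids the feeder-thread and pickle machinery entirely; note that the single sentinel only works with one consumer, and a PUSH socket simply blocks until a PULL peer connects.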

So using ZMQ and PyPy gave us a considerable performance improvement. If you want to take a closer look, you can find the sources for the ZMQ queue here.