Performance tuning of Python IPC with ZeroMQ

May 6, 2014
While looking at different approaches to event parsing, e.g. Logstash or Fluentd, we also ran our own experiments in that field using Python.
Event processing tends to be a task well suited to parallelization. Add Python to the mix and you will soon hit one drawback of Python in a multicore environment: the infamous GIL (see here for David Beazley's brilliant talk on the topic).
One of the test scenarios consisted of a Tornado TCP server receiving log events from different syslog sources. These were passed on to be analyzed via regular expressions and finally stored in an Elasticsearch backend. To utilize more than one core on a multicore machine, Python offers the multiprocessing package, which acts mostly as a drop-in replacement for the regular threading package. One thing we noticed, though, is that the performance of the default multiprocessing.Queue implementation leaves something to be desired. So we tried different approaches to tune the performance of IPC via a queue implementation, and also gave PyPy a spin to see whether its JIT can be beneficial here.
Here are some numbers for handling 200,000 events of ~1 kB each (running on a 4-core X5450 @ 3.00 GHz Xen guest).
The TCP server runs in one process, with analysis and output in two other processes. Output goes to /dev/null for these tests.
| Setup | CPython | PyPy |
|---|---|---|
| Native mp queue, no queue buffer, native pickling | 58 sec, 3,400 eps | 51 sec, 3,900 eps |
| Native mp queue, no queue buffer, msgpack pickling | 44 sec, 4,500 eps | 34 sec, 5,900 eps |
| Native mp queue, queue buffer of 100 events, msgpack pickling | 39 sec, 5,100 eps | 27 sec, 7,400 eps |
| ZMQ-based mp queue, queue buffer of 100 events, msgpack pickling | 32 sec, 6,250 eps | 14 sec, 14,300 eps |