Accelerating Kafka/Flink throughput
Last month, we built a proof of concept application in Flink SQL to process millions of messages. However, many production use cases involve much more data, on the order of hundreds of millions of messages per hour. We have worked on bursty streams of data that require processing millions of events per second. Such volume spikes can occur in financial markets at the beginning or end of the trading day. Sudden spikes exist in other industries as well: rush hour in traffic monitoring, video streaming around major sporting events, and so on.
This is what our message volume looked like on the first iteration of the POC:
In this small cluster, we processed 1.2 million events per minute at peak. Our Flink cluster would scale up to only 2 CFUs. In contrast, our Kafka cluster had hit its soft limit of 50 CKUs.
It was clear that our limiting factor was the amount of data we could push into Kafka: we were running up against network bandwidth limits. The only way around this was to find a more efficient format for the events. JSON, though human readable, very widely used, and highly portable, was not the most efficient option for our data. For the purposes of our application, portability was not a requirement. Our entire application runs within Confluent Cloud, which also supports the Protobuf format for messages. Protobuf messages are binary and not self-describing, and both properties help reduce the size of individual messages. The effort of maintaining and managing the external message schema used to serialize and deserialize Protobuf messages is borne entirely by Confluent Cloud, making it relatively simple for a developer to deploy a data pipeline with Protobuf.
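As a rough illustration, here is a minimal sketch of how a Protobuf-backed table might be declared in Confluent Cloud's Flink SQL. The table name and columns are hypothetical, and we assume the `proto-registry` value format, which tells Confluent Cloud to register and manage the Protobuf schema in Schema Registry on our behalf:

```sql
-- Hypothetical events table. In Confluent Cloud, creating the table also
-- creates the backing Kafka topic and registers the value schema in
-- Schema Registry, so no external .proto file needs to be managed by hand.
CREATE TABLE events_proto (
  event_id   STRING,
  symbol     STRING,
  price      DOUBLE,
  event_time TIMESTAMP_LTZ(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
  -- Binary, schema-based encoding instead of JSON on the wire.
  'value.format' = 'proto-registry'
);
```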
Without modifying anything else in the pipeline or increasing the size of the Kafka cluster, we were able to achieve a peak throughput of 5 million messages consumed and 6 million produced per minute.
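To make "without modifying anything else" concrete, here is a sketch of what such a processing statement might look like, using the hypothetical `events_proto` table above and an equally hypothetical `high_value_events` sink. The query is format-agnostic, so swapping the serialization from JSON to Protobuf requires no change here:

```sql
-- The statement is identical whether the topics carry JSON or Protobuf;
-- only the table definitions (and thus the serializers) differ.
INSERT INTO high_value_events
SELECT event_id, symbol, price, event_time
FROM events_proto
WHERE price > 1000;
```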
It’s important to note that the throughput improvement isn’t purely in Kafka. Flink’s Kafka source operator now retrieves fewer bytes per message, and the sink operator writes out fewer bytes per message as well. The efficiency gains percolate through the stream processing layer, not just the stream store.
Write us at info@greyker.com to learn more about our services and how we can help make your data streaming applications more efficient!