Producing & Consuming (Much) More With Kafka

Stav Alfi
octopol-engineering
5 min read · Apr 27, 2021


Over the last couple of days, @AmiBenson and I have been working on a new task that involves consuming a large amount of data from multiple Kafka topics into a Node.js service.

Requirements:

  1. We need to consume at a rate of 1–4K messages per second from 2 Kafka topics.

  2. Processing/consuming delay/lag is unacceptable.

  3. Single consumer* — the data must be consumed in the same order the producer produces it. Let’s leave that aside for now.

Tech & env: Prometheus for monitoring, Node.js, KafkaJS. This article is based on experiments on my MacBook Pro 2020 (16 GB RAM, 8 CPU cores), with Postgres (docker image postgres:13.2-alpine) and Kafka (image bitnami/kafka:2.7.0-debian-10-r114, with ZooKeeper).

To improve development velocity, we created a small mock service that produces the same data we expect.

To sum it all up, we have 2 challenges:

  1. Be able to produce the same amount of data.
  2. Be able to consume this data without lag.

— — — — — — — — — — — — — — — — — — — — — — — — — —

Mock-Producer

It generates arrays of messages forever and sends each array to our Kafka broker. Generating each message involves some JSON → XML parsing, so it may take some time as well.

First Attempt

How bad can it be?!

Well, pretty bad:

We were able to produce 700 messages per second.

Let me explain: internally, some Kafka clients keep an internal queue that collects all the messages waiting to be sent and flushes them every `X` ms. This feature is called lingering. KafkaJS currently doesn’t support it, so it really does send every single message on its own. This is extremely slow.
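To make the problem concrete, here is a minimal sketch of the first attempt’s shape. The producer here is a stub standing in for KafkaJS’s `producer.send` (every call to it costs one full broker round trip); the topic name and message values are illustrative:

```javascript
// Stub standing in for KafkaJS's producer.send: every call to send()
// represents one full round trip (network hop + broker ack).
let roundTrips = 0;
const producer = {
  send({ topic, messages }) {
    roundTrips += 1; // one request per call, no matter how small
  },
};

// First attempt: send every message on its own.
function produceNaively(values) {
  for (const value of values) {
    producer.send({ topic: 'events', messages: [{ value }] });
  }
}

const values = Array.from({ length: 700 }, (_, i) => `msg-${i}`);
produceNaively(values);
console.log(roundTrips); // 700 round trips for 700 messages
```

The per-call overhead dominates, which is why the naive loop topped out around 700 messages per second.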

Second Attempt — Produce In Batch

Let’s try sending messages to our Kafka broker in batches. That way we contact Kafka far fewer times, and each request carries much more data.

To start, let’s try sending batches of ~700 messages each:
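A sketch of the batched version, again with a stubbed `send` (with the real KafkaJS client you would pass the whole array as `messages` in a single `producer.send` call):

```javascript
// Stub standing in for KafkaJS's producer.send: one round trip per call,
// regardless of how many messages the request carries.
let roundTrips = 0;
const producer = {
  send({ topic, messages }) {
    roundTrips += 1;
  },
};

// Second attempt: accumulate ~700 messages and ship them in one request.
function produceInBatches(values, batchSize = 700) {
  for (let i = 0; i < values.length; i += batchSize) {
    const batch = values.slice(i, i + batchSize).map((value) => ({ value }));
    producer.send({ topic: 'events', messages: batch });
  }
}

const values = Array.from({ length: 2800 }, (_, i) => `msg-${i}`);
produceInBatches(values);
console.log(roundTrips); // 4 round trips instead of 2800
```

Same number of messages, a tiny fraction of the round trips — which is exactly where the jump from 700 to 4K messages per second came from.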

Now we are talking. We were able to send 4k messages per second.

But can we do better?

But when we try to increase the batch size from 700 to 1,000+, KafkaJS throws an error:

Third Attempt — Compress The Batch

KafkaJS allows us to compress the data before we send it. Let’s try it out and send 3,000 messages per batch:

Interestingly, there is no difference at all. Just to be sure, let’s increase the batch size to 6,000 messages (yes, GZIP lets us grow the batch from 700 to 6,000!).

We could even send more than 20,000, but then the service’s performance became too slow.

We improved from ~3,900 to ~4,100 messages per second. I also tested batch sizes of 10,000 and 20,000, and there was no additional improvement.

To sum it all up, this attempt did not bring any major performance improvement.

Additional improvements would have to come from the actual message-generation logic; for this article, that is out of scope.

We achieved our goal.

— — — — — — — — — — — — — — — — — — — — — — — — — — —

Consumer

Consume from Kafka → save in PostgreSQL. In the middle, we parse XML messages to JSON. I bet that’s gonna hurt.

We are using Typeorm.

First Attempt

How bad can it be?!

And again, we have the same problem: we are sending too many requests to our Postgres, so we waste time waiting for Postgres to receive each request and send us a response:
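The shape of the first consumer, sketched with a stubbed DB client (in the real service this is KafkaJS’s `eachMessage` handler issuing one TypeORM insert per message; names are illustrative):

```javascript
// Stub standing in for our Postgres client: each insert() is one DB round trip.
let queries = 0;
const db = {
  insert(record) {
    queries += 1; // network hop + parse + plan + commit, for a single row
  },
};

// First attempt: handle messages one by one, one INSERT each
// (this is the shape of a KafkaJS eachMessage handler).
function eachMessage({ message }) {
  const record = { value: message.value }; // real code: XML → JSON parsing here
  db.insert(record);
}

for (let i = 0; i < 500; i++) {
  eachMessage({ message: { value: `msg-${i}` } });
}
console.log(queries); // 500 single-row INSERTs for 500 messages
```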

All we were able to process was ~500 messages per second.

Second Attempt — Batch Inserts To PostgreSQL

In our case, all the data we receive is a representation of a state at a given time. There are no db-record updates, just inserts.

In this case, we can send batch INSERT queries to our DB. To do that, we first need to consume in batches from Kafka. That’s easy with KafkaJS:
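A sketch of the handler shape. The `{ batch, resolveOffset, heartbeat }` signature matches KafkaJS’s `eachBatch` API; the fake driver at the bottom is a stand-in so the example runs without a broker, and the stubbed `db.insertMany` stands in for a multi-row TypeORM insert:

```javascript
// Consume a whole buffered batch at once, insert it with one query,
// then commit the offsets.
const insertedBatches = [];
const db = {
  insertMany(records) {
    insertedBatches.push(records.length); // one multi-row INSERT
  },
};

// Shape of a KafkaJS eachBatch handler.
async function eachBatch({ batch, resolveOffset, heartbeat }) {
  const records = batch.messages.map((m) => ({ value: m.value })); // real code: XML → JSON here
  db.insertMany(records);
  for (const m of batch.messages) resolveOffset(m.offset);
  await heartbeat(); // keep the consumer-group session alive on long batches
}

// Stand-in driver; the real wiring is consumer.run({ eachBatch }).
const fakeBatch = {
  messages: Array.from({ length: 1000 }, (_, i) => ({ offset: String(i), value: `m${i}` })),
};
eachBatch({ batch: fakeBatch, resolveOffset: () => {}, heartbeat: async () => {} });
// → one INSERT carrying 1,000 rows instead of 1,000 single-row INSERTs
```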

What could go wrong here?!

It should work, but there was a problem: each Kafka consumer batch can contain anywhere from 1 to 60K (or even more) messages, depending on the number of unread messages in the topic. That means a single DB query may contain 60K records, and Postgres threw errors about that. To overcome them, we built our own batching mechanism using a little bit of lodash magic:

lodash’s chunk: split up an array into sub-arrays
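A sketch of that batching mechanism, assuming a cap of 1,000 rows per INSERT (the cap and helper names are ours; the inline `chunk` mirrors lodash’s `_.chunk` so the example runs without lodash installed):

```javascript
// Equivalent of lodash's _.chunk: split an array into sub-arrays of `size`.
function chunk(array, size) {
  const out = [];
  for (let i = 0; i < array.length; i += size) out.push(array.slice(i, i + size));
  return out;
}

const queries = [];
const db = {
  insertMany(records) {
    queries.push(records.length); // one multi-row INSERT per sub-batch
  },
};

// A Kafka batch can hold 60K+ messages — too many for one INSERT,
// so we cap each query at maxRowsPerInsert rows.
function saveBatch(records, maxRowsPerInsert = 1000) {
  for (const part of chunk(records, maxRowsPerInsert)) {
    db.insertMany(part);
  }
}

saveBatch(Array.from({ length: 60000 }, (_, i) => ({ id: i })));
console.log(queries.length); // 60 queries of 1,000 rows each
```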

In the first section (23:36 → 23:40), we are processing 4K messages per second. But can we do more? Yes. We stopped our service and let the mock fill up our local Kafka between 23:40 → 23:43. At 23:43 we ran our service again to see how it behaves when it needs to catch up.

We can see that the rate is now 6K+ messages per second.

— — — — — — — — — — — — — — — — — —

Processing/Consuming Delay/Lag Is Unacceptable

By now, we are able to produce and consume at a rate of 1–4K messages per second, as requested.

Let’s combine both the consumer and the producer into a single metric: lag calculation.
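A sketch of the calculation, assuming the standard definition of consumer lag per partition: the broker’s high watermark (next offset to be written) minus the consumer’s committed offset. The function names are ours; in practice we feed these numbers into a Prometheus gauge:

```javascript
// Consumer lag per partition: how many produced messages we haven't consumed yet.
function partitionLag(highWatermark, committedOffset) {
  // Offsets arrive from Kafka as strings (they are 64-bit), so convert first.
  return Math.max(0, Number(highWatermark) - Number(committedOffset));
}

// Total lag across all partitions of a topic.
function totalLag(partitions) {
  return partitions.reduce(
    (sum, p) => sum + partitionLag(p.highWatermark, p.committedOffset),
    0
  );
}

console.log(totalLag([
  { highWatermark: '1200', committedOffset: '1000' },
  { highWatermark: '800', committedOffset: '800' },
])); // 200
```

When the producer outpaces the consumer, this number grows over time instead of hovering near zero — which is exactly what the graph below showed us.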

Well, this is unacceptable! Luckily we are not in production yet :P

See you on the next chapter :)
