NodeJS Devtools Performance (CPU) Optimizations — Example 1

Stav Alfi
octopol-engineering
4 min read · Oct 10, 2021


Today my cool colleague (@AmiBenSon https://github.com/amibenson) and I waited far too long for our app to load its first data, and it made me wonder what the hell takes so much time to process events from pub/sub (Kafka) inside the NodeJS process. We consume messages in batches. In the first 2–3 minutes, our producer sends us a lot of messages, so each batch may contain 10k–30k messages. After that, batches contain at most 100 messages. Some of the big batches take anywhere from 2s (too much :S) up to 30s to process.

That’s a lot. And most of the processing is synchronous. That’s a huge problem because:

  1. NodeJS can’t react to readiness/liveness checks from monitoring systems (k8s).
  2. Prometheus can’t reach the pod to collect metrics :S
  3. Other internal I/O operations can’t happen concurrently (see the small sketch below).
Shit, we are in trouble
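
To make the problem concrete, here is a minimal, self-contained sketch (hypothetical code, not our real app): while a long synchronous batch runs, the event loop is busy, so a /healthz probe or a Prometheus scrape simply can’t be answered until it finishes.

// Minimal sketch (hypothetical, TypeScript): a synchronous batch handler
// starves the event loop, so /healthz can't respond until it's done.
import * as http from 'node:http'

function processBatchSync(messages: number[]): number {
  // Stand-in for our synchronous per-batch CPU work (parsing, sorting, ...).
  let acc = 0
  for (const m of messages) {
    for (let i = 0; i < 10_000; i++) {
      acc += (m * i) % 7
    }
  }
  return acc
}

const server = http.createServer((req, res) => {
  if (req.url === '/healthz') {
    // k8s readiness/liveness probe. While processBatchSync runs,
    // this callback never gets a chance to execute.
    res.end('ok')
    return
  }
  res.end('hello')
})

server.listen(3000, () => {
  // Simulate a big Kafka batch arriving right after startup.
  const batch = Array.from({ length: 30_000 }, (_, i) => i)
  console.time('batch')
  processBatchSync(batch)
  console.timeEnd('batch')
})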

I consider myself a beginner with DevTools, so I bet other folks will do a much better job explaining what I’m going to show and what other conclusions can be drawn from my small investigation.

Let’s Start

The first step is to identify the problem: what takes up most of the CPU.

Let’s run our NodeJS program (better to compile it first):

node --cpu-prof packages/csm-im/dist/src/index.js

Now we wait until something interesting happens (see the horrible screenshot above).

Then we hit Ctrl+C to exit our NodeJS process and stop the CPU profiling (NodeJS saves the .cpuprofile file in the directory we ran the command from).
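
By the way, if Ctrl+C isn’t convenient (for example inside a pod), the same kind of .cpuprofile file can be captured programmatically with Node’s built-in inspector module. A rough sketch, with the trigger and file name left up to you:

// Sketch: capture a CPU profile from inside the process using the
// built-in inspector module, as an alternative to the --cpu-prof flag.
import * as inspector from 'node:inspector'
import { writeFileSync } from 'node:fs'

const session = new inspector.Session()
session.connect()

export function startCpuProfile(): void {
  session.post('Profiler.enable', () => {
    session.post('Profiler.start')
  })
}

export function stopCpuProfile(outPath = './manual.cpuprofile'): void {
  session.post('Profiler.stop', (err, result) => {
    if (!err && result) {
      // The resulting JSON can be loaded in DevTools exactly like
      // the file produced by --cpu-prof.
      writeFileSync(outPath, JSON.stringify(result.profile))
    }
  })
}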

Go to DevTools, open the Performance tab, click “Load profile…” and pick your generated CPU-profiling result (ignore the details in the screenshot).

Well, that’s the fun part: understanding what the hell we’re even looking at here.

From the logs of my app, I saw that there were exactly 4 batches that my app processed during the CPU profiling. I can only guess that the following blocks represent something related to them:

Each execution of a function is shown here. Rule of thumb: the longer a function’s line is, the more time it took to run. Maybe it only did I/O and didn’t do much CPU work; we don’t know yet.

In my case, each yellow line represents a call from the NodeJS Kafka client to our processing function. Every time, the Kafka client sent 10k–30k events, so that function call took a while, as you can see.

If we look a little deeper, each line may have more lines underneath it. If a line has 4 lines underneath it, it means that line represents a function call that called 4 functions (some or all of them may be the same function). We can measure how long each function call took in milliseconds (ms). Remember that a function may also do work by itself, without calling other functions, and that can take time as well: for example, running a for-loop from 0 to 1,000,000.
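
A toy example (not from our codebase) of how code maps to the chart: parent() below would show up as one wide bar, with four narrower bars nested underneath it for the child() calls, and the remaining width of parent()’s bar is its own “self time” spent in the last loop.

// Toy example of how a flame chart maps to code.
function child(n: number): number {
  let sum = 0
  for (let i = 0; i < n; i++) sum += i
  return sum
}

function parent(): number {
  let total = 0
  // These four calls appear as four bars nested under parent()'s bar.
  for (let k = 0; k < 4; k++) total += child(5_000_000)
  // This loop is parent()'s "self time": work done without calling anything.
  for (let i = 0; i < 1_000_000; i++) total += i % 3
  return total
}

console.log(parent())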

In my screenshot, I scrolled down and deeper to see what the innermost functions represent and how much time they took:

I saw that the following line:

/Users/stavalfi/projects/singer/packages/csm-im/dist/src/persist-im/persist-molly.service.js:27

occurred a lot, and it took 11ms to run!

I went to the source code to see what that was:

I saw some other lines as well that pointed out that something really bad was going on in this screenshot.

It made me think:

con1 and con2 are strings that represent dates (UTC): 2021-10-10T19:38:47.490Z.

To sort the array (for some reason we wanted to sort it; I don’t care why at this point), we convert each date to an epoch timestamp so we can compare elements in our sort.

All in all, we are doing tons of parsing and tons of sorting in every batch.
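
Roughly the shape of the problem (the names here are hypothetical, not the actual service code): the comparator parses both ISO date strings on every single comparison, and a sort does O(n log n) comparisons per batch of 10k–30k messages.

// Hypothetical reconstruction of the hot pattern (not the real service code).
interface Conversation {
  createdAt: string // e.g. "2021-10-10T19:38:47.490Z"
}

function sortByCreatedAt(items: Conversation[]): Conversation[] {
  // new Date(...) parses the ISO string on EVERY comparison,
  // i.e. O(n log n) parses per batch.
  return items.sort(
    (con1, con2) =>
      new Date(con1.createdAt).getTime() - new Date(con2.createdAt).getTime(),
  )
}

Even if the strings had to stay, parsing once per element before sorting (instead of inside the comparator) would already cut the number of parses from O(n log n) to O(n).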

First Solution

I went over the code and our DB and converted every representation of a date from a string to a number. Now I don’t need any parsing. We still have tons of sorting, but at least no parsing.
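
A sketch of what the fix looks like (again with hypothetical names): once the date is stored as epoch milliseconds, the comparator is a plain numeric subtraction with no parsing at all.

// Hypothetical sketch of the fix: dates stored as epoch milliseconds (number).
interface Conversation {
  createdAt: number // e.g. 1633894727490, stored in the DB as a number
}

function sortByCreatedAt(items: Conversation[]): Conversation[] {
  // No Date parsing anymore; just numeric comparison.
  return items.sort((con1, con2) => con1.createdAt - con2.createdAt)
}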

You probably have a lot of questions about what the code is doing, why I had dates stored as strings, or what the harm is in sorting them as numbers. Let’s leave all these questions aside because they are irrelevant to this article.

Now let’s run the profiler again:

Looks much, much better: a 1500% improvement!! (20–30s → 1–2s)

The processing duration is still very bad (~1.5s), but at least we made a huge improvement.
