Redis Server — Debugging Performance

Stav Alfi
octopol-engineering
7 min read · Aug 7, 2021


In this post I will share some techniques for debugging Redis performance problems.

Topics:

  1. Set up Redis on your bare metal local machine
  2. Run your service with Production configurations
  3. redis-cli monitor
  4. Batching commands
  5. Redis slowlog
  6. Redis Server CPU, RAM, SSD IOPS (I/O operations per second)

What’s Not On The Menu:

  1. Networking Issues
  2. An overview of Redis MULTI, pipelines and Lua scripts (pros/cons/best practices); you need all of them.
  3. Monitoring your service: Prometheus/Grafana/K8S-Metrics
  4. Redis-Clusters

Set Up Redis On Your Bare Metal Local Machine

I created a small NodeJS script to set up a single instance or a cluster of Redis masters (3+): https://gist.github.com/stavalfi/f5c8c5373c1e619dfde53853b178935c

Check whether you are using Redis Cluster in production or not. To run a single instance: `ts-node <file-name>.ts 1`, or `ts-node <file-name>.ts 3` for a Redis Cluster of three master instances.

Run Single Redis Instance (Not a Redis-Cluster)

Why Set Up Redis Locally For This Test

Well, we are going to test performance, so we want to *avoid* any network latency and any virtualization performance degradation: k8s, Docker, AWS EC2, VMs, and so on.

— — — — — — — — —

Run your service with Production configurations

Generate some requests, fake or real. Just make sure your service calls Redis at some point. The more calls, the better.
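If you do not already have traffic, a tiny load generator is enough. Here is a minimal sketch (not part of the original setup), assuming your service exposes an HTTP endpoint (the URL below is hypothetical) and Node 18+ with the built-in fetch:

```typescript
// Minimal load generator: fire many requests at the service so it calls Redis a lot.
async function generateLoad(totalRequests: number, concurrency: number): Promise<void> {
  let sent = 0;
  const worker = async (): Promise<void> => {
    while (sent < totalRequests) {
      sent++;
      // Any request is fine, fake or real, as long as handling it ends up calling Redis.
      await fetch('http://localhost:3000/api/do-something', { method: 'POST' });
    }
  };
  // A few parallel workers so Redis receives a meaningful amount of traffic.
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
}

generateLoad(10_000, 20).catch(console.error);
```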

— — — — — — — — —

redis-cli monitor

In my service I use MULTI, pipelines and Lua scripts. My goal is to see which Redis calls were made over time. At this point, I care less about which mechanism issued each call (MULTI/pipeline/Lua/…).

Open a new terminal and run `redis-cli -p <port> monitor`. It will print every command that has been sent to Redis (you can redirect it to a file for later analysis, e.g. `redis-cli -p <port> monitor > everything.log`).

Note: in this section it is best to use a single Redis instance (not a cluster), because with a cluster you would need to monitor every instance separately.

make sure something is printed during your test

The monitor command will print every call to redis in this format:

<unix timestamp (seconds.microseconds)> [<db> <client-ip>:<client-port>] "<redis-command>" "<param 1>" "<param 2>" …

  • Note: microsecond = second/1,000,000
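For example, a line might look like this (a made-up example with hypothetical key names); note that commands issued from inside a Lua script show `lua` instead of the client address:

1628337600.123456 [0 127.0.0.1:52817] "HINCRBY" "counters:user-1" "counter-3" "2"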

We can use the monitor output (saved to a file, e.g. everything.log) to investigate how many calls were made with each command, using some bash commands: `cat everything.log | awk '{if(NR>1) print $4}' | sort | uniq -c | sort -r`

format: <count-of-repeats> <redis-command>

Well, what did I just run, and what are we seeing?

  1. cat — reads everything.log and sends it to the stdin of the next command through "|" (a Linux pipe).
  2. awk — filters and performs string manipulation on each line of everything.log.
  3. {if(NR>1) ....} — filters out the first line of everything.log, because it is "OK" and is not part of the monitor output format.
  4. awk '{... print $4}' — each line of the monitor output has the format <unix timestamp>SPACE[<db>SPACE<client-ip>:<client-port>]SPACE"<redis-command>"SPACE"…redis-command params…". print $4 prints the 4th whitespace-separated field, which is the <redis-command>.
  5. sort — sorts in ascending order (small to big). This is needed for the next command, uniq, because we want to group by command and print how many times it was called; uniq only groups identical lines that appear one after the other, so we must sort first.
  6. uniq -c — counts consecutive repetitions of identical lines.
  7. sort -r — sorts the output of uniq in descending order (big to little). (Note: sort evaluates each line as a whole, not only the first word in the line.)
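If you prefer NodeJS over awk, the same aggregation can be sketched like this (assuming the monitor output was redirected to everything.log, as above):

```typescript
// Count how many times each Redis command appears in the saved monitor output
// (everything.log), equivalent to the awk pipeline above.
import * as fs from 'fs';

const lines = fs.readFileSync('everything.log', 'utf8').split('\n');
const counts = new Map<string, number>();

for (const line of lines.slice(1)) { // skip the first line ("OK")
  const command = line.split(' ')[3]; // the 4th whitespace-separated field, e.g. "HINCRBY"
  if (!command) continue;
  counts.set(command, (counts.get(command) ?? 0) + 1);
}

// Print: <count-of-repeats> <redis-command>, biggest first.
Array.from(counts.entries())
  .sort((a, b) => b[1] - a[1])
  .forEach(([command, count]) => console.log(count, command));
```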

What Can We Do With That

Look at each line and try to understand whether it makes sense that some command had x calls during your test. In my case, I called the "evalsha" Redis command 80k times, which is a call to a custom Lua script.

At this point you can already identify bugs or excessive calls, and decrease the number of calls made to Redis from your production environment and from custom Lua scripts.

Batching Commands

We can do even more with awk. We can group by something stronger, <count> <redis-command> <redis-key>, by replacing the awk part of the pipeline with `awk '{if(NR>1) print $4, $5}'` (the comma prints a space between the two fields).

In some cases we might see that we called the same Redis key too many times with the same command.

Example

Today I noticed that I called HINCRBY (https://redis.io/commands/HINCRBY) too many times on the same key.

HINCRBY — increments a field inside a hash by x.

Problem — HINCRBY does not allow incrementing multiple fields of the same hash in one call. Why it's a problem — in my use case I need to call HINCRBY up to 30 times each time. That is a lot.

Will MULTI solve it — With MULTI I am able to send multiple Redis commands to Redis at once, so I avoid the network penalty of sending each call separately. But even with MULTI it was still a pretty slow operation when you do it a lot.

Solution? After some reading online, it seems that Redis created BITFIELD to solve this problem (and other problems as well): it is a data structure over a string that lets you manipulate bits. It is a pretty convenient command, and it is easy to set up an array of counters with it. It supports batching, so I was able to reduce 1-30 calls to HINCRBY to a single call to BITFIELD. I gained a 10% performance improvement :)
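As an illustration, here is a rough sketch of that kind of change, assuming the ioredis client and hypothetical key and field names (they are not the real ones from my service):

```typescript
import Redis from 'ioredis';

const redis = new Redis(6379);

// Before: up to 30 HINCRBY calls. MULTI saves round trips, but the Redis server
// still executes up to 30 separate commands.
async function incrementWithMulti(userId: string, increments: Map<number, number>) {
  const multi = redis.multi();
  for (const [counterIndex, delta] of increments) {
    multi.hincrby(`counters:${userId}`, `counter-${counterIndex}`, delta);
  }
  return multi.exec();
}

// After: a single BITFIELD call over an array of unsigned 16-bit counters.
// The "#<index>" offset is multiplied by the integer width (u16),
// so counter i lives at bits [i*16, i*16 + 16).
async function incrementWithBitfield(userId: string, increments: Map<number, number>) {
  const args: any[] = [];
  for (const [counterIndex, delta] of increments) {
    args.push('INCRBY', 'u16', `#${counterIndex}`, delta);
  }
  // Cast to any because the exact BITFIELD typings vary between ioredis versions.
  return (redis as any).bitfield(`counters-bitfield:${userId}`, ...args);
}
```

With one key per user holding an array of u16 counters, the 1-30 HINCRBY calls collapse into a single command on the Redis side, not just a single round trip.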

Redis SlowLog

Next, we want to see how much time it took the Redis server to run each Redis command, in microseconds (microsecond = second/1,000,000).

Redis implemented a feature called slowlog. It will record each command whose execution (inside the Redis server!) took more than x microseconds.

To set x:

redis.conf

  1. Go to https://redis.io/topics/config
  2. Check which config file you need to read for its options. Do that after you check your Redis version. For Redis 6: https://raw.githubusercontent.com/redis/redis/6.0/redis.conf
  3. Search for the slowlog settings: `slowlog-log-slower-than` (the threshold x, in microseconds) and `slowlog-max-len` (how many entries are kept in memory). For this test, set `slowlog-log-slower-than` to 0 to track all commands.

  4. Restart your Redis service.

Set In Runtime

You can also set it at runtime from redis-cli with `config set slowlog-log-slower-than 0`. For this test, set 0 to track all commands.
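If you prefer to do it from your NodeJS test code rather than redis-cli, here is a minimal sketch, assuming the ioredis client:

```typescript
import Redis from 'ioredis';

const redis = new Redis(6379);

async function prepareSlowlog(): Promise<void> {
  // Log every command, regardless of how long it took (threshold of 0 microseconds).
  await redis.config('SET', 'slowlog-log-slower-than', '0');
  // Start the test run with an empty slowlog (same as `slowlog reset` in redis-cli).
  await redis.slowlog('RESET');
}

prepareSlowlog().catch(console.error).finally(() => redis.disconnect());
```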

Notes:

  1. To delete all the slowlog entries collected so far, run: slowlog reset
  2. The slowlog in Redis is an append-only, in-memory data structure, so it won't decrease your performance. It may eat into your free RAM, though, so be careful with that.

Now, run your service and then call `slowlog get -1` from redis-cli:

-1 will print all the logs; 10 would print only the last 10 commands. Choose -1 for this test.

Format: each command is a separate slowlog entry, made of:

  1. Entry id (increasing, as you can see; you are looking at the end of the slowlog, so the top is the newest entry).
  2. The UNIX timestamp, in seconds, at which the command was processed.
  3. How much time, in microseconds, it took Redis to execute your command. Lower is better (microsecond = second/1,000,000). Remember that there is also a network penalty outside of Redis, plus your CPU, RAM and virtualization penalties; Redis may not be the root cause of the performance degradation.
  4. The Redis command: <command-name> <…params…>

Note: the output is not sorted by execution time. It is just an append-only, in-memory log data structure.

I built a small NodeJS script to evaluate the output of `slowlog get -1`:

https://gist.github.com/stavalfi/285eebfc93872d7eb83e31286ae845a5

The above script shows how much time it took Redis to execute each command, in seconds:

You can adapt the script to group by command and evaluate the average (AVG) execution time of each command, roughly as in the sketch below.
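A rough sketch of that kind of aggregation (not the gist above), assuming the ioredis client; each slowlog entry is an array whose third element is the duration in microseconds and whose fourth element is the command with its arguments (entries also carry the client address and name since Redis 4.0, ignored here):

```typescript
import Redis from 'ioredis';

const redis = new Redis(6379);

type SlowlogEntry = [id: number, timestamp: number, durationMicros: number, commandAndArgs: string[]];

async function printAvgExecutionTimePerCommand(): Promise<void> {
  const entries = (await redis.slowlog('GET', -1)) as unknown as SlowlogEntry[];
  const stats = new Map<string, { totalMicros: number; calls: number }>();

  for (const [, , durationMicros, commandAndArgs] of entries) {
    const command = commandAndArgs[0]; // e.g. "evalsha", "hincrby"
    const current = stats.get(command) ?? { totalMicros: 0, calls: 0 };
    current.totalMicros += durationMicros;
    current.calls += 1;
    stats.set(command, current);
  }

  for (const [command, { totalMicros, calls }] of stats) {
    console.log(`${command}: avg ${(totalMicros / calls).toFixed(1)} microseconds over ${calls} calls`);
  }
}

printAvgExecutionTimePerCommand().catch(console.error).finally(() => redis.disconnect());
```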

Redis Server CPU, RAM, SSD IOPS (I/O Operations Per Second)

The above techniques were used to improve performance by changing the code. Now we will evaluate the CPU, RAM and SSD of your Redis server.

If you have Prometheus or any other monitoring tool covering your Redis server, use it to evaluate the CPU, RAM and SSD IOPS (the SSD matters if you are using persistence in Redis).

Locally, or inside your production Redis server, you can use htop to evaluate CPU and RAM usage:

Run htop inside your Redis server while you run your test. Check whether any CPU core (1-4 in my case) climbs too high.

Redis is mostly single-threaded, but you still need to make sure Redis does not blow up your CPU or RAM.

Also, check whether you can disable swap (Swp in htop): evaluate how much RAM your app will need in Redis, and if it is less than the amount of RAM in your Redis server, you can disable swap. As Redis Labs recommends, there is no point in moving parts of RAM to the SSD and back if you have more RAM than you actually use at any given point in time.
