Redis Server— Debuging Performance
In this post I will share some techniques on how to debug performance problems.
Topics:
- Setup Redis on your bare metal local machine
- Run your service with Production configurations
- redis-cli monitor
- Batching commands
- Redis slowlog
- Redis Server CPU, RAM, SSD IOPS (IO Per Second)
What’s Not On The Menu:
- Networking Issues
- Overview about Redis Multi, Pipelines, Lua Scripts Pros/Cons/Best Practices — You need all of them.
- Monitoring your service: Prometheus/Grafana/K8S-Metrics
- Redis-Clusters
Setup Redis On Your Bare Metal Local Machine
I created a small NodeJS script to setup a single instance or cluster of redis-masters (3+): https://gist.github.com/stavalfi/f5c8c5373c1e619dfde53853b178935c
Check if you are using Redis-Cluster in production or not. To run single instance: ts-node <file-name>.ts 1
or ts-node <file-name>.ts 3
for redis-cluster of master-instances.
Why Setting Up Redis For This Test
Well, we are going to test performance so we want to *avoid* any network latencies, any virtualizations performance degregations: k8s, docker, aws EC2, VMs, ….
— — — — — — — — —
Run your service with Production configurations
Generate some requests: fake or not. Just make sure your service will call Redis at some point. More calls are better.
— — — — — — — — —
redis-cli monitor
In my service, I use multi, pipeline and lua-scripts. My goal is to see which redis-calls were made over time. At this point, I care less which operation made the redis-call (multi/pipeline/lua/…).
Open a new terminal and run `redis-cli -p <port> monitor`. It will print every command that has been sent to redis.
Note: In this section, it is best to use single redis instance (not cluster) because if you are using cluster, then you will need to monitor every instance seperatly.
The monitor command will print every call to redis in this format:
<unix timestamp in microseconds> [@ <client-ip>:<client-port>] “<redis-command> “…redis-command params…”
- Note: microsecond = second/1,000,000
We can make use of the monitor output to investigate how much calls were made using each command by some bash commands: `cat log | awk ‘{if(NR>1) print $4}’ | sort | uniq -c | sort -r`
Well, what did I just ran and what are we seeing?
cat
— readeverything.log
and send it to stdin of the next command using “|” (linux pipe).awk
— filter-out and string manipulation of each line ineverything.log
.{if(NR>1) ....}
— filter out the first line ineverything.log
because it is “OK” and is not part of the monitor output format.awk '{... print $4}'
— each line in the monitor output format is: <unix timestamp in microseconds>SPACE[@SPACE<client-ip>:<client-port>]SPACE“<redis-command>SPACE“…redis-command params…”.print $4
will print the 4'th word which is the <redis-command>sort
— sort in asc (small to big). this is needed for the next commanduniq
because we want to group by each command and print how much times it was called.uniq
will group by only same lines that apear one after the other so we need to sort before that.uniq
— count the re-appearance of same-linessort -r
— sort the ouput ofuniq
in desc order (big to little). (Note:sort
will evaluate each line as a hole and not only the first word in the line).
What Can We Do With That
See each line and try to understand if it makes any sense that some command has x
calls during your test. In my case, I called 80k times the “evalsha” redis-command which is a call to a custom LUA script.
At This point you can identify some bugs or too much calls and decrease the amount of calls to redis from your production enviroment and from custom LUA scripts.
Batching Commands
We can do even more with awk
. We can group by something stronger — <count> <redis-command> <redis-key>
: awk ‘{if(NR>1) print $4 $5}’
Example
Today I noticed that I called HINCRBY
(https://redis.io/commands/HINCRBY) too much times on the same key.
HINCRBY —
incr by x
a value inside a hashmap.
Problem — HINCRBY
does not allow to incr multiple keys in the same hashmap Why it’s a problem — In my usecase I need to call HINCRBY
at most 30 times everytime. This is alot.
Will MULTI
will solve it — With MULTI
I’m able to send multiple redis-commands at once to redis to avoid sending multiple redis-calls seperatly and it will be a network panelty. Even with MULTI
it was pretty slow command when you are doing it alot.
Solution? After some reading online, it seems that Redis created BITFIELD
to solve this problem (and other problems as well) — it is a datastructure over string to manipulate bits. It is pretty convient command and it’s easy to setup with that an array of counters. It does supports batching so I was able to decrease 1–30 calls to HINCRBY
to one call to BITFIELD
. I gained 10% performance optimizations :)
Redis SlowLog
Next, we want to see how much time it took Redid-server to run every redis-command in microseconds (microsecond = second/1,000,000).
Redis implemented a feature called slowlog
. it will print each command if it’s execution (inside the redis-server!!) took more than x
microseconds.
To set x
:
redis.conf
- Go to https://redis.io/topics/config
- Check which config file you need to read to check it’s options. Do that after you check your redis version. For Redis 6: https://raw.githubusercontent.com/redis/redis/6.0/redis.conf
- Search for slowlogs:
4. restart your redis service.
Set In Runtime
Notes:
- To delete all the slowlog until now, run:
slowlog reset
- Slowlog in Redis is a append-only to in-memory data-structure so it won’t descrease your performance. it may end up your free RAM so be carefull with that.
Now, run your service and then call:
Format: Each command is a different “slowlog”:
- Command-id (increasing as your can see: you see the end of the slowlog so the top is the newest log).
- UNIX timestamp and seconds when the log was made (or the command was executed — I’M NOT SURE!)
- How much time in micro seconds it took Redis to run your command. Lower is better. microsecond = second/1,000,000. Remember that there is also a network panelty outside of Redis and there is your CPU, RAM, Virtualization panelty. Redis may not be the core reason for the performance degradation.
- Redis-Command: <command-name> <…params…>
Note: the output is not sorted. It’s just an append-only data-strucutre for logs in-memory
I built a small script in NodeJS script to evaluate the output of `slowlog get -1`:
The above code will show me how much time it took Redis to execute each command in seconds:
You can manipulate the script to group by commands and evaluate AVG execution time of every command.
Redis Server CPU, RAM, SSD IOPS (IO Per Second)
The above techniques where used to improve performance by changing the code. Now, we will evaluate the CPU, RAM and your SSD inside your Redis-Server.
If you have prometheus or any other monitoring tool over your redis-server, use that to evaluate the CPU, RAM and SSD IOPS (SSD in case you are using persistency in Redis).
Locally or inside your redis-production server, you can use htop to evaluate CPU and RAM usage:
Run htop
inside your redis server when you run your test. check if each CPU Core (1–4 in my case) is increasing too much.
Redis is mosly single threaded but still, you need to make sure redis does not blow up your CPU or RAM.
Also, check if you can disable Swp by evaluating how much RAM you will need for your app in Redis and if it’s less than the amount of RAM in your Redis server, you can disable Swp. because as Redis-labs recommend, there is no point of transfering some of the RAM to the SSD and back if you have more RAM than you actually use at any given point in time.