Stabilize your MariaDB performance using this simple step

While benchmarking MariaDB Server, I often observe that throughput is high at the start, then drops by 10-15% and stays constant at that lower level. The concern is not the drawdown itself but the fluctuation that users may observe, especially while running longer workloads. Fortunately, I found a way to resolve this, though I am still investigating why it works.

Setup

Machine Configuration:
- ARM: 64 vCPU (2 NUMA) ARM Kunpeng 920 CPU @ 2.6 GHz
- x86: 64 vCPU (2 NUMA) Intel(R) Xeon(R) Gold 6151 CPU @ 3.00GHz
Workload:
- sysbench update-index uniform
Other configuration details here (+ thread_handling=pool-of-threads).
- Shared Buffer: 80GB
- Data: 74 GB
- Redo-Log: 20 GB
Storage: NVMe SSD
- sequential read/write IOPS: 190+K/125+K
- random read/write IOPS: 180+K/65+K
MariaDB Version: 10.8.3 (work in progress; chosen to make use of the redo-log optimization).
Scalability: 1-1024 threads

Benchmarking

For benchmarking, we run the sysbench update-index (uniform) workload for 60 minutes and record the throughput every second.
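A run like the one described here could be launched roughly as follows. This is a minimal sketch that assumes sysbench 1.0 with its bundled oltp_update_index script; the host, credentials, table count and table size are placeholders and not the values used for the runs in this post. The SYSBENCH variable is a hypothetical knob added only so the command can be printed instead of executed.

```shell
#!/usr/bin/env bash
# Sketch: 60-minute update-index run with uniform key distribution and
# one throughput report per second. All connection and sizing values below
# are placeholders (assumptions), not the settings used in this post.
SYSBENCH="${SYSBENCH:-sysbench}"   # set SYSBENCH=echo for a dry run that prints the command

run_update_index() {
  local threads="$1"               # client threads for this run, e.g. 64
  "$SYSBENCH" oltp_update_index \
    --db-driver=mysql \
    --mysql-host="${DB_HOST:-127.0.0.1}" \
    --mysql-user="${DB_USER:-sbtest}" \
    --mysql-password="${DB_PASS:-sbtest}" \
    --mysql-db="${DB_NAME:-sbtest}" \
    --tables="${TABLES:-100}" \
    --table-size="${TABLE_SIZE:-3000000}" \
    --rand-type=uniform \
    --threads="$threads" \
    --time=3600 \
    --report-interval=1 \
    run
}
```

For example, `run_update_index 64 | tee update-index-64.log` keeps the per-second report lines in a file for later inspection.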

Observations:

ARM starts at approximately 150K throughput, but within a short time it drops to about 125K and then remains stable at that level, a drawdown of roughly 15% from the initial high.
x86 shows a lot of jitter throughout the run.
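The drawdown can be quantified from the per-second log rather than read off a graph. The sketch below assumes the interval-report format of sysbench 1.0, where each line carries a `tps:` token followed by the value; the function name and the choice of averaging the last N samples as the "steady" level are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report peak throughput, steady-state throughput (mean of the last
# N samples) and the drawdown between them, from a sysbench interval log.
drawdown_from_log() {
  local log="$1"
  local tail_n="${2:-600}"   # number of trailing samples treated as steady state
  awk -v n="$tail_n" '
    {
      # Pick the value that follows the "tps:" token on each report line.
      for (i = 1; i < NF; i++) {
        if ($i == "tps:") {
          v = $(i + 1) + 0
          tps[++count] = v
          if (v > peak) peak = v
        }
      }
    }
    END {
      if (count == 0 || peak == 0) { print "no tps samples found"; exit 1 }
      first = (count > n) ? count - n + 1 : 1
      for (i = first; i <= count; i++) sum += tps[i]
      steady = sum / (count - first + 1)
      printf "peak=%.0f steady=%.0f drawdown=%.1f%%\n", peak, steady, 100 * (peak - steady) / peak
    }
  ' "$log"
}
```

Calling `drawdown_from_log update-index-64.log 600` would compare the single best second against the average of the final ten minutes; the peak of a noisy run is sensitive to outliers, so a smoothed peak may be a better baseline on x86-like traces.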

I first suspected furious flushing, the redo log filling up, and so on, but none of these indicators explained the drawdown.
By accident, I tried clearing the OS cache, and things started to improve. Note: the command used to purge the OS cache is: echo 3 > /proc/sys/vm/drop_caches (run as root).
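To repeat the purge at a fixed interval while the benchmark runs (the runs in this post use 100 seconds), a small loop is enough. This is a sketch, not the exact script used for these results: the function name, the DROP_FILE override and the round limit are hypothetical additions, and the `sync` call is an extra step that the original one-liner does not include.

```shell
#!/usr/bin/env bash
# Sketch: purge the Linux page cache, dentries and inodes every N seconds.
# DROP_FILE can be overridden for a dry run; the real path needs root.
DROP_FILE="${DROP_FILE:-/proc/sys/vm/drop_caches}"

drop_caches_loop() {
  local interval="${1:-100}"   # seconds between purges
  local max_rounds="${2:-0}"   # 0 means loop until interrupted
  local round=0
  while :; do
    sync                       # assumption: flush dirty pages first, since only clean cache is dropped
    echo 3 > "$DROP_FILE"      # 3 = page cache plus dentries and inodes
    round=$((round + 1))
    if [ "$max_rounds" -gt 0 ] && [ "$round" -ge "$max_rounds" ]; then
      break
    fi
    sleep "$interval"
  done
}
```

Started as root in a second terminal with `drop_caches_loop 100`, it keeps purging until interrupted with Ctrl-C. The `sync` can be removed to match the plain one-liner exactly.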

Observations:

So clearing the OS cache at a regular interval (in this case every 100 seconds) has a positive effect on throughput in the ARM case, but it fails to have a positive effect in the x86 case.
Let's reconfirm this observation on machines with a different configuration.

Observations:

With these differently configured machines (24 vCPU, 48 GB, 22K IOPS), regularly clearing the OS cache helps reduce jitter on both ARM and x86.
Just to rule out possible ambiguity, vm.swappiness is set to 1 on all benchmarked machines.
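Before comparing runs across machines, it may be worth confirming that the swappiness setting really is what you expect. A minimal check, assuming a Linux host; the function name and the SWAPPINESS_FILE override are hypothetical and exist only for this sketch.

```shell
#!/usr/bin/env bash
# Sketch: verify that vm.swappiness matches the expected value (1 in this post).
SWAPPINESS_FILE="${SWAPPINESS_FILE:-/proc/sys/vm/swappiness}"

check_swappiness() {
  local expected="${1:-1}"
  local actual
  actual="$(cat "$SWAPPINESS_FILE")"
  if [ "$actual" -eq "$expected" ]; then
    echo "vm.swappiness=$actual (ok)"
  else
    echo "vm.swappiness=$actual (expected $expected)"
    return 1
  fi
}
```

If the check fails, `sysctl -w vm.swappiness=1` (as root) changes the value for the running system.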

Conclusion

So clearing the OS cache seems to have a positive effect on performance: it helps reduce jitter. But it is not blanket advice. As we saw above, the effect can differ depending on the machine and configuration. It is worth trying in your environment to see whether it helps. We are also yet to trace why clearing the cache, and which cached data in particular, helps reduce jitter.

If you have more questions or queries, do let me know. I will try to answer them.

Written on March 7, 2022

All the product names, logo, trademarks and registered trademarks are property of their respective owners