openGauss is an open-source relational database built for internet scale and optimized for the ARM architecture (while retaining compatibility with x86). Its kernel is derived from PostgreSQL, so PostgreSQL users will find a lot familiar, but a series of optimizations has been added to make it faster and to support distributed setups (the cluster ecosystem). It supports row-based, column-based, and in-memory storage engines with full ACID compliance. The ecosystem and community contributions continue to grow.
MariaDB on the openEuler-ARM stack showed promising results during the initial evaluation (covered in an earlier post). Taking the assessment further, we decided to evaluate the setup in a multi-NUMA environment, since that is representative of enterprise deployments. The aim is to find out whether things scale as they do on other OS-ARM stacks, and whether the bottleneck remains the same or something else pops up on the openEuler-ARM stack.
openEuler is an open-source, Linux-based operating system with a customized scheduler, IO stack, libraries, etc. It is optimized for the ARM64 architecture, so it is interesting to evaluate different enterprise software running on the openEuler-ARM stack. As part of this study, let’s explore how MariaDB, which already has ARM packages for several operating systems, performs on openEuler.
While benchmarking MariaDB Server, I often observe a spike in performance at the start that eventually drops by 10-15% and then remains constant at that level. The concern is not the drop itself but the fluctuation that users may observe, especially while running longer workloads. Fortunately, I found a way to resolve this, but I am still investigating why it happens.
MariaDB is continuously evolving, and to make it more scalable, a lot of age-old data constructs are being upgraded or revamped into newer, scalable ones. This series of changes has helped it scale better than most of the available open-source databases. If you have been profiling MariaDB for some time, it is important to widen your profiling scope to cover these new hot spots.
The majority of users’ use cases are covered by sysbench workload variants, but some users have use cases that are best represented by TPCC, or they would like to compare two databases using TPCC as a baseline standard. To help fill this gap, I decided to evaluate TPCC using MariaDB on ARM.
By default, MariaDB/MySQL uses one thread per connection. This approach generally works well when connections stay active for a long time. If connections are short-lived, however, the cost of creating a connection (and its thread) could overshadow the cost of running the query. Also, as concurrency increases, OS scheduling introduces more jitter. In such cases, a threadpool could be a good alternative.
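To make the threadpool idea concrete, here is a minimal C++ sketch (an illustration only, not MariaDB’s threadpool implementation): a fixed number of worker threads is created once, and short-lived units of work, standing in for queries from short-lived connections, are queued to them instead of spawning a new thread per connection. The class name `ThreadPool` and its methods are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Fixed-size pool: workers are created once and reused, so no thread is
// created or destroyed per request. Hypothetical sketch, not MariaDB code.
class ThreadPool {
 public:
  explicit ThreadPool(std::size_t workers) {
    assert(workers > 0);
    for (std::size_t i = 0; i < workers; ++i)
      threads_.emplace_back([this] { run(); });
  }

  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& t : threads_) t.join();  // workers drain the queue before exiting
  }

  // Enqueue a short-lived unit of work (e.g. one query of a connection).
  void submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return stopping_ || !tasks_.empty(); });
        if (stopping_ && tasks_.empty()) return;
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run the work outside the lock
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::vector<std::thread> threads_;
  bool stopping_ = false;
};
```

The number of OS threads stays at the pool size no matter how many requests arrive, which is what limits scheduling jitter. In MariaDB itself, the threadpool is enabled with `thread_handling=pool-of-threads`.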
It is a well-known fact that a good compiler can emit more efficient code, allowing software to achieve better throughput. The Clang compiler continues to grow in popularity, and since I work mostly on performance issues, I am often asked whether I have tried MariaDB-on-ARM with Clang-compiled binaries. Finally, I got some time and decided to try it out.
If you have been using MariaDB for some time, you may have heard about adaptive flushing. “Adaptive” refers to behavior wherein the algorithm auto-tunes itself based on certain parameters; in today’s terminology, it might be called an “AI-based algorithm”. The same concept is now being applied to purge. Purge is a critical and resource-consuming operation, so scheduling purge alongside the user workload needs to be balanced. This is exactly what adaptive purge aims to do.
Tuning IO workloads is often challenging because it involves making optimal use of the available IO bandwidth. MariaDB has multiple options to control this, but users often ignore the simpler options and play around with complex or wrong ones. In this article, we take a step-by-step approach and see whether we can tune an IO workload.
More cores means more compute power, but that power is quite likely distributed across more NUMA nodes. The traditional arrangement wherein 1 CPU socket = 1 NUMA node is also changing: there are already arrangements wherein the cores of a single CPU socket are organized into 2 NUMA nodes. Next-generation software (including databases) needs to adapt to these changing arrangements to scale well on such multi-NUMA machines.
MariaDB has been releasing MariaDB Server packages for ARM for quite some time now. In fact, it was the first in the MySQL space to port and optimize the server for ARM. It continues to evaluate new performance features, releases, and regressions by testing them on ARM through community support.
In our last blog post, we explored MariaDB-on-ARM cluster performance in master-slave mode. In this blog post, we explore MariaDB-on-ARM cluster performance in multi-master mode. MariaDB Server has built-in support for multi-master setups using Galera synchronous replication. Users who can’t afford slave lag continue to opt for a multi-master solution, which can also aid write scalability if the load can be distributed (with reduced overlap).
MySQL is heavily tunable, and some configuration options can have a significant impact on its performance. During my NUMA scalability experiments, I encountered one such option. With the default configuration, write workloads show heavy contention, but once the option is tuned, it helps MySQL scale by more than 2x.
The majority of users run databases in cluster form (either master-slave or multi-master). I often get asked whether I have tried benchmarking MariaDB Server in a master-slave setup on ARM and, if so, what the slave lag looks like. So this time I decided to study exactly that, using a basic experiment focused on slave lag.
We all love MariaDB Server for its features and performance. Recently, it improved further through a series of optimizations (in 10.6) around locking, flushing, etc. So we decided to give it a try and also analyze its performance with more NUMA nodes (4 NUMA nodes). The article lists the issues faced on the NUMA scalability front, the solutions adopted, and how they helped hit the threshold of 1.6 million QPS for a query workload.
One of the most important things a user does on a regular basis is backing up the database. MariaDB offers Mariabackup, which lets users take full and incremental backups. Mariabackup is already supported on ARM, so let’s explore its performance on ARM.
High Performance Computing (aka HPC) software often refers to software or applications that need significant computing power. Examples include database servers, application servers, big-data applications, etc. Their operation is not limited to data processing but also involves heavy IO to different channels. Such software needs to ensure optimal overlap of CPU and IO workloads, keeping the CPU busy while the IO subsystem loads the next set of data to process.
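The CPU/IO overlap described above is often implemented as double buffering: while one batch is processed, the next one is already being loaded. The C++ sketch below shows that pattern with `std::async`. `load_batch` is a hypothetical stand-in for a real IO call (it just synthesizes numbers), and `process_batch` is a placeholder for CPU work.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for IO (e.g. reading the next set of pages).
// Produces n consecutive values starting at batch_no * n.
std::vector<long> load_batch(std::size_t batch_no, std::size_t n) {
  std::vector<long> v(n);
  std::iota(v.begin(), v.end(), static_cast<long>(batch_no * n));
  return v;
}

// Placeholder for the CPU-side work on one batch.
long process_batch(const std::vector<long>& batch) {
  return std::accumulate(batch.begin(), batch.end(), 0L);
}

// Double buffering: batch i+1 is loaded asynchronously while batch i is
// processed on the calling thread, so CPU and IO overlap instead of
// alternating.
long process_all(std::size_t batches, std::size_t n) {
  assert(n > 0 && "batch size must be positive");
  long total = 0;
  if (batches == 0) return total;
  std::future<std::vector<long>> next =
      std::async(std::launch::async, load_batch, std::size_t{0}, n);
  for (std::size_t i = 0; i < batches; ++i) {
    std::vector<long> current = next.get();  // wait for the prefetched batch
    if (i + 1 < batches)  // kick off the next IO before doing CPU work
      next = std::async(std::launch::async, load_batch, i + 1, n);
    total += process_batch(current);  // overlaps with the pending load
  }
  return total;
}
```

For example, `process_all(4, 10)` loads the values 0..39 in four batches and returns their sum. Real servers usually rely on dedicated IO threads or asynchronous IO interfaces rather than `std::async`; the sketch only illustrates the scheduling idea.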
In the previous post, we analyzed MySQL performance on x86 and ARM using the Cost-Performance Model (#cpm), wherein cost and other resources (except compute power) were kept constant while compute power was allowed to differ. ARM, being cheaper, got more compute power for the same cost. That was the user-perspective story, but developers were also interested in understanding how MySQL scales when both architectures are given the same computing power.
In the previous blog post, we saw that users don’t lose anything by moving to MySQL on ARM. In fact, users stand to gain performance and save cost. In this blog post, we look at the performance numbers and analyze them to understand where ARM scores.
MySQL on ARM is gaining consistent momentum, and the community is excited about it. Beyond performance, users also tend to explore other aspects like the feature set, ecosystem, support, etc. Let’s explore what users would gain or lose by moving to MySQL on ARM.
One of the key activities a DBA performs regularly is creating a backup of an active database instance. So having a working backup tool in place, especially for hot backups, is important when a user thinks of running DB-on-ARM. Even though Percona XtraBackup (PXB) is not yet officially offered on ARM, one can compile and successfully run it on ARM. This article explores how.
Percona Monitoring and Management (PMM) is an effective tool for tracking the stats of running MySQL servers. In particular, its timeline capability helps users see how a given stat changes over the course of the workload. Official PMM packages are not yet available on ARM, but part of PMM (importantly, the stats collector aka exporter) can be compiled on ARM. This makes it possible to report the stats of a MySQL instance running on ARM to the PMM Server, thereby allowing it to track MySQL on ARM.
ARM processors are fast gaining popularity in the High Performance Computing (HPC) space, with multiple cloud providers offering powerful and flexible variants of ARM instances. Users are still in a dilemma about whether running MySQL on ARM is really effective. To help resolve this, we introduce a Cost-Performance Model (#cpm). The model is generic in nature: it helps normalize computing configurations based on cost and could be used for other HPC kinds of software too.
ARM introduced LSE (Large System Extensions) as part of its ARMv8.1 specification. This means that if your processor is ARMv8.1-compatible, it supports LSE. LSE is meant to optimize atomic operations by replacing the old-style exclusive load/store retry loops with single instructions such as CAS (compare-and-swap), SWP (swap/exchange), etc. These extensions are known to inherently increase the performance of applications that use atomics.
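LSE does not require any source change: the same C++ atomics simply compile to different AArch64 instructions. The sketch below marks which operations can map to LSE instructions. It assumes GCC or Clang targeting AArch64; building with `-march=armv8.1-a` (or `-march=armv8-a+lse`) lets the compiler emit LSE instructions directly, while `-moutline-atomics` selects between LSE and the exclusive load/store loop at runtime. The variable and function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Without LSE (plain ARMv8.0), each read-modify-write below becomes a
// load-exclusive/store-exclusive retry loop (LDXR/STXR family).
// With LSE (ARMv8.1+), each can become a single instruction.
std::atomic<std::uint64_t> counter{0};
std::atomic<int> owner{-1};  // -1 means "not owned"

// Can compile to an LDADD-family instruction with LSE.
void bump() { counter.fetch_add(1, std::memory_order_relaxed); }

// Claim ownership for `id` only if nobody owns it yet.
// compare_exchange_strong can compile to a CAS-family instruction with LSE.
bool try_claim(int id) {
  int expected = -1;
  return owner.compare_exchange_strong(expected, id);
}

// Release ownership and return the previous owner.
// exchange can compile to a SWP-family instruction with LSE.
int release() { return owner.exchange(-1); }
```

One way to check what was actually generated is to disassemble the binary (e.g. with `objdump -d`) and look for LDADD/CAS/SWP versus LDXR/STXR sequences.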
MySQL has multiple mutex implementations, viz. a wrapper over pthread mutexes, a futex-based mutex, and a spin-lock-based one (EventMutex). Each has its own pros and cons, but MySQL has long defaulted to EventMutex, as it has been found to be optimal for MySQL use cases.
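To illustrate what “spin-lock based” means, here is a minimal test-and-test-and-set (TTAS) spin lock that yields the CPU after a bounded number of spins. This is a hypothetical sketch, not MySQL’s EventMutex: the real EventMutex is more elaborate and, roughly speaking, eventually puts the waiting thread to sleep on an event instead of only yielding. The spin limit of 100 is an arbitrary assumption.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Minimal TTAS spin lock with a yield fallback (hypothetical sketch).
class SpinYieldMutex {
 public:
  void lock() {
    for (;;) {
      // Fast path: try to grab the lock with one atomic exchange.
      if (!locked_.exchange(true, std::memory_order_acquire)) return;
      // Slow path: spin on a plain load, which stays in the local cache
      // and avoids hammering the cache line with writes.
      for (int spins = 0; locked_.load(std::memory_order_relaxed); ++spins) {
        if (spins >= kMaxSpins) {
          std::this_thread::yield();  // give up the CPU instead of burning it
          spins = 0;
        }
      }
    }
  }

  void unlock() { locked_.store(false, std::memory_order_release); }

 private:
  static constexpr int kMaxSpins = 100;  // arbitrary; would need tuning
  std::atomic<bool> locked_{false};
};
```

The trade-off this shows is the one every mutex implementation balances: spinning is cheap when the lock is held briefly, while yielding or sleeping avoids wasting CPU when it is held longer.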
“Running MySQL on selected NUMA node(s)” looks pretty straightforward, but unfortunately it isn’t. Recently, I faced a situation that demanded running MySQL on 2 (out of 4) NUMA nodes.
Managing global counters in a multi-threaded system has always been challenging, as they pose serious scalability problems. The introduction of NUMA only increased the complexity. Fortunately, multiple options have been discovered, with hardware lending support, to help solve or ease some of these issues. In this blog post, we go over how we can make global counters NUMA SMART and also see the performance impact of each approach.
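One common way to ease global-counter contention is a sharded (split) counter: each shard sits on its own cache line, writers update only their shard, and readers sum all shards. The C++ sketch below shows that general technique; it is not necessarily any of the approaches benchmarked in the post. A NUMA-aware variant would choose the shard from the caller’s NUMA node (e.g. via libnuma); here the caller simply passes a shard hint, which is an assumption of this sketch. The 64-byte cache-line size is also an assumption (some ARM cores use 128 bytes), and over-aligned `new[]` needs C++17.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Sharded counter: writers touch only one padded slot, so they do not
// bounce a single cache line between cores (or NUMA nodes).
class ShardedCounter {
 public:
  explicit ShardedCounter(std::size_t shards)
      : n_(shards), slots_(new Slot[shards]) {
    assert(shards > 0);
  }

  // `hint` would typically be derived from the CPU or NUMA node id.
  void add(std::size_t hint, std::uint64_t v = 1) {
    slots_[hint % n_].value.fetch_add(v, std::memory_order_relaxed);
  }

  // Sums all shards. Under concurrent updates this is not a point-in-time
  // snapshot, which is usually acceptable for statistics counters.
  std::uint64_t sum() const {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n_; ++i)
      total += slots_[i].value.load(std::memory_order_relaxed);
    return total;
  }

 private:
  struct alignas(64) Slot {  // one slot per (assumed 64-byte) cache line
    std::atomic<std::uint64_t> value{0};
  };
  std::size_t n_;
  std::unique_ptr<Slot[]> slots_;
};
```

The design trades cheaper writes for more expensive reads, which suits counters that are incremented on every operation but read only occasionally.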
The ARM community, with developers from varied organizations, has contributed some really good patches to MySQL. Most of them are still awaiting acceptance. This blog post analyzes these patches along with their pros and cons. Hopefully this will help MySQL/Oracle accept these long-awaited patches.
We often observe jitter in MySQL throughput while running benchmarks. The same could be true for users, but there are so many other things to look for (especially IO bottlenecks) that the aspect we discuss today may get overlooked. In this article, we discuss one such reason that could affect MySQL performance.
InnoDB uses mutexes for exclusive access and rw-locks for shared access to resources. rw-locks control access to common shared resources like buffer pool pages, tablespaces, the adaptive search system (adaptive hash index), the data dictionary, information_schema, etc. In short, rw-locks play a very important role in InnoDB, so tracking and monitoring them is important too.
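As a small illustration of tracking rw-lock usage, the C++17 sketch below wraps `std::shared_mutex` and counts how often it is taken in shared (S) and exclusive (X) mode, the kind of statistic that helps spot a hot rw-lock. This is a hypothetical class, not InnoDB’s own rw-lock implementation; InnoDB exposes its own rw-lock statistics (e.g. in the SEMAPHORES section of `SHOW ENGINE INNODB STATUS`).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <shared_mutex>

// rw-lock wrapper that counts shared and exclusive acquisitions
// (hypothetical sketch, not InnoDB code).
class CountingRwLock {
 public:
  // Shared (S) mode: many readers may hold the lock at once.
  void lock_shared() {
    mu_.lock_shared();
    s_count_.fetch_add(1, std::memory_order_relaxed);
  }
  void unlock_shared() { mu_.unlock_shared(); }

  // Exclusive (X) mode: a single writer, no readers.
  void lock() {
    mu_.lock();
    x_count_.fetch_add(1, std::memory_order_relaxed);
  }
  void unlock() { mu_.unlock(); }

  std::uint64_t shared_acquisitions() const { return s_count_.load(); }
  std::uint64_t exclusive_acquisitions() const { return x_count_.load(); }

 private:
  std::shared_mutex mu_;
  std::atomic<std::uint64_t> s_count_{0};
  std::atomic<std::uint64_t> x_count_{0};
};
```

A high ratio of exclusive to shared acquisitions on a lock that guards mostly-read data would be a hint that the locking pattern deserves a closer look.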
By and large, this is a topic of interest for most of us, including me when I started to explore this space. Before we delve into the numbers, let’s first understand some basic differences between the two architectures. Beyond one being CISC and the other RISC, let’s look at the differences that matter from a MySQL perspective.
I am sure most of you have this question. In fact, I too had it before I started working on the #mysqlonarm initiative. What does it take to run MySQL on ARM? Does it really work? What about dependencies? What kind of performance does it have? What about support? Is there enough community support? This could go on…
ARM processors are everywhere. It is quite likely that some of you are reading this blog on an ARM-powered device. Phones, IoT devices, consumer and home appliances, and health-care devices are all powered by ARM processors. ARM processors are known to be power efficient, so most of these devices, which need long intervals between recharges but less processing power, started using them.