openGauss on ARM

openGauss is an open-source relational database built for internet-scale workloads and optimized for the ARM architecture (while retaining compatibility with x86). The DB kernel is derived from PostgreSQL, so PgSQL users will find a lot of things familiar, but a series of optimizations have been added to make it faster and to support distributed setups (cluster ecosystem). It supports row-based, column-based, and in-memory storage engines with full ACID compliance. The ecosystem and community contributions continue to grow.

Since the DB is optimized for ARM, it sparked my interest, so I decided to evaluate it.

Key Features

openGauss was derived from PostgreSQL 9.2, but many features have since been added to make it more enterprise-ready.

  • Multi-threaded (vs multi-process): One of the most discussed topics around PgSQL is the need for multi-threading. openGauss has been ported to a multi-threaded model, in which each PgSQL process maps directly to a thread.
  • Thread Pool: PgSQL still doesn’t have a built-in thread pool and needs an additional component such as pgpool. openGauss has a built-in thread pool, allowing it to scale and handle many active short-lived connections effectively.
  • Incremental Checkpoint: Given how long a time-based checkpoint can take, this feature comes as a savior with continuous checkpointing (along the lines of MySQL).
  • Doublewrite: Protects against half-written pages (again along the lines of MySQL).
  • Optimization of Global Counters: Most global counters in openGauss use a thread-local copy that is then aggregated into the main counter.
  • NUMA scalability: openGauss has been designed and tuned to scale well across multiple NUMA nodes. This is evident in its widespread use of thread-local storage, multiple threads, the thread pool, etc.
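The global counter optimization can be illustrated with a small sketch. This is not openGauss code: the class and function names below are hypothetical, and Python is used only to show the idea of each thread updating its own private copy, with the copies summed when the global value is read.

```python
import threading


class ShardedCounter:
    """Counter where each thread increments its own private slot.

    Illustrative only: the names here are hypothetical and are not
    taken from the openGauss source.
    """

    def __init__(self):
        self._local = threading.local()        # per-thread storage
        self._slots = []                       # one single-element list per thread
        self._register_lock = threading.Lock()

    def _slot(self):
        slot = getattr(self._local, "slot", None)
        if slot is None:
            slot = [0]
            self._local.slot = slot
            # The lock is taken once per thread, not once per increment.
            with self._register_lock:
                self._slots.append(slot)
        return slot

    def increment(self, amount=1):
        # Hot path: only the owning thread ever writes to its slot.
        self._slot()[0] += amount

    def total(self):
        # Cold path: aggregate the per-thread copies into the global value.
        with self._register_lock:
            return sum(slot[0] for slot in self._slots)


def run_workers(counter, n_threads, increments_per_thread):
    """Increment the counter from several threads and return the total."""
    def work():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total()
```

The point of the design is that the frequent operation (increment) touches only thread-private memory, so threads on different NUMA nodes do not fight over a single shared location; the cost moves to the much rarer read of the aggregated value.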

While many of these features are already in place, a lot more work is ongoing, especially around IO.


  • Machine Configuration:
    • ARM: 96 vCPU (4 NUMA) ARM Kunpeng 920 CPU @ 2.6 GHz
  • Workload (using pgbench and sysbench):
    • CPU bound workload
    • pgbench: select, update workload
    • sysbench: read-only, read-write, write-only workload (pattern: uniform, zipfian)
  • Other configuration details here
    • Shared Buffer: 80GB
    • Data: 32 GB (pgbench)
    • Data: 75 GB (sysbench)
  • Storage: SATA SSD
    • sequential read/write IOPS: 65K/44+K (8K blocks)
    • random read/write IOPS: 51+K/38+K (8K blocks)
  • openGauss Version: 3.0.1 [compile from source]
  • Scalability: 1-1024 threads: 21/42/84 threads for server and 3/6/12 threads for sysbench (core-binding).
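The server/sysbench split in the last item appears to add up to whole NUMA nodes. The sketch below checks that arithmetic, reading the per-step "threads" as bound cores and assuming the 96 vCPUs are split evenly across the 4 NUMA nodes (24 per node). Both readings are assumptions, and the helper name is hypothetical.

```python
TOTAL_VCPUS = 96
NUMA_NODES = 4
CORES_PER_NODE = TOTAL_VCPUS // NUMA_NODES  # assumes an even split: 24

# (server, sysbench) pairs taken from the configuration list above
BINDINGS = [(21, 3), (42, 6), (84, 12)]


def nodes_used(server_cores, client_cores, cores_per_node=CORES_PER_NODE):
    """Return how many whole NUMA nodes a server/client core split occupies."""
    total = server_cores + client_cores
    if total % cores_per_node != 0:
        raise ValueError(f"{total} cores is not a whole number of NUMA nodes")
    return total // cores_per_node


for server, client in BINDINGS:
    print(f"{server} server + {client} sysbench cores -> "
          f"{nodes_used(server, client)} NUMA node(s)")
```

Under these assumptions the three steps correspond to 1, 2, and 4 NUMA nodes, with one eighth of the cores in each step reserved for the load generator.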


Let’s first explore the benchmark using some standard suites like pgbench and sysbench. We will then discuss more specific configurations, NUMA scalability, threadpool, how it performs compared to PgSQL, etc.



  • pgbench read-only workload continues to scale well as the thread count increases but hits contention after a certain point, which suggests scope for further improvement.
  • pgbench update workload continues to scale well as the thread count increases.

The drop in the pgbench read-only workload could be attributed to the memory allocation routine, which shows up in the profile:
+ 3.40% 6236 worker gaussdb [.] GenericMemoryAllocator::AllocSetAlloc<true, false, false>



  • sysbench read/update workload scales linearly until it hits a threshold; beyond that point the performance is almost flat (which is better than dropping due to increased contention).

NUMA Scalability

openGauss scales well as the thread count increases. Let’s now explore how it performs with an increasing number of NUMA nodes.


  • openGauss is well optimized for NUMA. As the number of NUMA nodes increases, it continues to scale well across the different workload variants. The update workload could be further improved but, like other databases, performance doesn’t regress with 4 NUMA nodes.

Effect of Threadpool

Threadpool is best suited for environments with a lot of short-lived connections. Let’s see the effect of threadpool.
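To illustrate why a pool suits short-lived connections, here is a minimal sketch using Python's standard `concurrent.futures.ThreadPoolExecutor`. It is not how openGauss implements its threadpool; `handle_connection` and `serve` are hypothetical names, and the "connection" is just a placeholder task.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_connection(conn_id):
    """Stand-in for serving one short-lived client connection."""
    # Record which worker thread served this connection.
    return conn_id, threading.current_thread().name


def serve(n_connections, pool_size):
    """Serve many short-lived connections with a fixed set of workers."""
    with ThreadPoolExecutor(max_workers=pool_size,
                            thread_name_prefix="worker") as pool:
        results = list(pool.map(handle_connection, range(n_connections)))
    workers = {name for _, name in results}
    return results, workers


results, workers = serve(n_connections=200, pool_size=4)
print(f"{len(results)} connections served by {len(workers)} worker thread(s)")
```

The number of worker threads stays bounded by the pool size no matter how many connections arrive, so the cost of creating and tearing down a thread is not paid per connection.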


  • Threadpool is expected to improve performance, and in openGauss it does, but only for the pgbench read-only workload. For all other workloads, including the sysbench read-only workload, it failed to show an improvement. In fact, for the update workload, a serious regression is observed with threadpool.

PgSQL vs openGauss

openGauss is derived from PgSQL, so it would be interesting to see how the two compare. We tried our best to match the configuration, using the latest release of both databases (PgSQL 14.5/openGauss 3.0.1) on comparable servers (24 ARM cores, 48 GB of memory, and similar IO volumes).


  • Read-only workload performance looks comparable.
  • Update workload performance shows a significant difference despite the multiple enhancements in openGauss. (Note: openGauss is operated with threadpool and incremental checkpoint turned off so that things are comparable.)


Based on the overall evaluation, NUMA optimization, features, etc., openGauss seems promising. Given that it is relatively new, there is ample scope for improvement.

If you have more questions/queries, do let me know. I will try to answer them.

Written on August 23, 2022
All the product names, logo, trademarks and registered trademarks are property of their respective owners