Monday, November 20, 2017

Sysbench: IO-bound, InnoDB, a small server, MySQL 5.0 to 8.0

This has results for IO-bound sysbench with InnoDB and MySQL versions 5.0, 5.1, 5.5, 5.6, 5.7 and 8.0. The results here don't always match the results from in-memory sysbench. One source of variance is the differing performance of the SSDs used on the i3 and i5 NUC.

There are four types of tests: write-heavy, scan-heavy, point-query and inlist-query. Performance within each group is usually similar. Reading the summary below the results for modern InnoDB are mostly an improvement. This is good news because these tests are limited to low-concurrency and the many improvements in modern InnoDB don't help here.

tl;dr - performance for InnoDB in 8.0.3 compared to 5.0.96
  • Full index scans were faster on the i5 NUC and both slower & faster on the i3 NUC. Shorter scans from read-write and read-only were always faster.
  • QPS was about the same on update-inlist, but decreased on update-one, update-index, update-nonindex and insert.
  • QPS increased on point-query
  • QPS increased on inlist-query for the i5 NUC and frequently increased for the i3 NUC
tl;dr - performance for InnoDB in 8.0.3 compared to 5.6.35
  • Full index scans were faster on the i5 NUC and slower on the i3 NUC. Shorter scans from read-write and read-only were always faster.
  • QPS was about the same on update-inlist, decreased on update-one, increased on update-index, about the same on update-nonindex and decreased on insert.
  • QPS increased on point-query
  • QPS increased on inlist-query for the i5 NUC and frequently increased for the i3 NUC

Configuration

The tests used InnoDB from upstream MySQL versions 5.0.96, 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.3. All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines including: set innodb_purge_threads=1 to reduce mutex contention, disabled SSL for MySQL 5.7, used the same charset and collation. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. I compiled all of the MySQL versions on the test servers and did not use binaries from upstream. The built-in InnoDB is used for MySQL 5.0 and 5.1.

Sysbench is run with 2 tables, 80M rows/table on the i3 NUC and 160M rows/table on the i5 NUC. The database is larger than RAM. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database is larger than RAM.

The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is much smaller. Assuming I will always disable turbo boost in the future I might as well stick with an i3 NUC for my next purchase.

Results


All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below.The tests are explained here.

The graphs and tables that follow present the relative QPS.  The relative QPS is the QPS for the test divided by the QPS for the base case. The base case is the QPS for InnoDB from either MySQL 5.0.96 or 5.6.35. When the relative QPS is less than one than the engine is slower than the base case.

Graphs

There are 4 types of tests and I provided a graph for each type: write-heavy, scan-heavy, point-query, inlist-query. The results within each group are not as similar as for the in-memory tests, so I provide extra graphs here. The tests are explained here. The graphs have the QPS relative to InnoDB 5.0.96.

The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. The graphs are for update-nonindex and update-index. To keep this from getting out of hand I save the analysis for the per-test sections.
The scan-heavy group includes a full scan of the PK index, read-write with range-size set to 100 and 10,000 and then read-only with range-size set to 100 and 10,000. The graphs are for read-write with range-size=100, read-only with range-size=10,000 and the full scan with the QPS relative to InnoDB  5.0.96. The results for read-only and the full scan are from the tests run after the write-heavy tests. 

Scan performance for InnoDB improved a lot starting in MySQL 5.7 but QPS and throughput weren't always better in modern InnoDB. The per-test sections have more details.
The point-query group includes the point-query test run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests using the QPS relative to InnoDB 5.0.96.

The inlist-query group includes the hot-points test and the random-points tests run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests using the QPS relative to InnoDB 5.0.96.
full scan

The full scan of the PK index is done before and after the write-heavy tests. Full scan on the i5 NUC is much faster starting with InnoDB 5.7. For the i3 NUC that isn't true. Results for InnoDB 5.5 are an outlier, and 5.5 is odd on many tests. Overheads that impact this include whether page write-back is in progress and fragmentation that causes random IO and/or prevents read ahead. This impact might be larger on the i3 NUC because it has a slower SSD and is less likely to finish write-back before the scan test starts, and more sensitive to the loss of IO capacity from write-back.

The scan throughput for InnoDB in MySQL 8.0 relative to 5.0:
  • Is 3.47 on the i5 NUC and 0.44 on the i3 NUC for the scan before write-heavy tests
  • Is 2.78 on the i5 NUC and 1.41 on the i3 NUC for the scan after write-heavy tests
The scan throughput for InnoDB in MySQL 8.0 relative to 5.6:
  • Is 1.40 on the i5 NUC and 0.32 on the i3 NUC for the scan before write-heavy tests
  • Is 1.40 on the i5 NUC and 0.82 on the i3 NUC for the scan after write-heavy tests
full scan - before write-heavy
i3 NUC          i5 NUC
Mrps    ratio   Mrps    ratio   engine
0.737   1.00    0.714   1.00    5.0.96
0.758   1.03    0.718   1.01    5.1.72
1.280   1.74    2.176   3.05    5.5.51
1.019   1.38    1.766   2.47    5.6.35
0.320   0.43    2.424   3.39    5.7.17
0.327   0.44    2.480   3.47    8.0.3

full scan - after write-heavy
Mrps    ratio   Mrps    ratio   engine
0.533   1.00    0.898   1.00    5.0.96
0.583   1.09    0.910   1.01    5.1.72
1.523   2.86    2.132   2.37    5.5.51
0.914   1.71    1.786   1.99    5.6.35
0.829   1.56    2.406   2.68    5.7.17
0.751   1.41    2.500   2.78    8.0.3

update-inlist

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is ~0.94. Relative to MySQL 5.6 it is 1.08 on the i5 NUC and 0.94 on the i3 NUC. Compared to the other update-only tests, this one spends more time in the storage engine per update statement and it is less sensitive to new overheads in parse and optimize.

The regression here for InnoDB 5.7 and 8.x is smaller than for in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
211     1.00    359     1.00    5.0.96
208     0.99    378     1.05    5.1.72
190     0.90    342     0.95    5.5.51
212     1.00    307     0.86    5.6.35
213     1.01    392     1.09    5.7.17
199     0.94    335     0.93    8.0.3

update-one

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 0.52 on the i5 NUC and 0.42 on the i3 NUC. Relative to MySQL 5.6 it is 0.73 on the i5 NUC and 0.66 on the i3 NUC. While this is an IO-bound benchmark, this test is not read IO-bound because it updates the same row and that data should remain cached.

The result here for InnoDB 5.7 and 8.x is similar to the result for in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
14308   1.00    15488   1.00    5.0.96
12052   0.84    13224   0.85    5.1.72
 8584   0.60    10431   0.67    5.5.51
 9120   0.64    10988   0.71    5.6.35
 7839   0.55     9570   0.62    5.7.17
 5992   0.42     8046   0.52    8.0.3

update-index

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 0.80 on the i5 NUC and 0.59 on the i3 NUC. Relative to MySQL 5.6 it is 1.04 on the i5 NUC and 1.44 on the i3 NUC. I assume that new CPU overhead in parse and optimize explains the regression from 5.0.96 to 8.03. This test is more IO-bound than the update-nonindex test that follows because it must do secondary index maintenance. The i3 NUC has a slower SSD and that might explain why the i3 NUC regression here is worse than the i5 NUC, but they are similar on update-nonindex.

The result here for InnoDB 5.7 and 8.x is similar to the result for in-memory sysbench on the i5 NUC. But the i3 NUC regression is larger here.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
537     1.00    1103    1.00    5.0.96
518     0.96    1076    0.98    5.1.72
186     0.35     585    0.53    5.5.51
220     0.41     850    0.77    5.6.35
312     0.58     924    0.84    5.7.17
319     0.59     883    0.80    8.0.3

update-nonindex

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 0.88. Relative to MySQL 5.6 it is ~0.95. The small regression might be explained by new CPU overheads in modern MySQL.

The regression here for InnoDB 5.7 and 8.x is smaller than for in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1665    1.00    2882    1.00    5.0.96
1616    0.97    2759    0.96    5.1.72
1471    0.88    2582    0.90    5.5.51
1526    0.92    2683    0.93    5.6.35
1556    0.93    2773    0.96    5.7.17
1458    0.88    2541    0.88    8.0.3

delete

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.36 on the i5 NUC and 1.15 on the i3 NUC. Relative to MySQL 5.6 it is 0.94. The small regression after 5.6.35 might be explained by new CPU overheads in modern MySQL.

Modern InnoDB was also faster than older InnoDB on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2242    1.00    3527    1.00    5.0.96
2256    1.01    3551    1.01    5.1.72
2055    0.92    3334    0.95    5.5.51
2729    1.22    5068    1.44    5.6.35
2763    1.23    5115    1.45    5.7.17
2576    1.15    4794    1.36    8.0.3

read-write with --range-size=100

The QPS for InnoDB in MySQL 8.0 relative to both 5.0 and 5.6 is ~1.08 on the i5 NUC and ~1.02 on the i3 NUC.

Modern InnoDB was also faster than older InnoDB on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2089    1.00    2826    1.00    5.0.96
2043    0.98    2695    0.95    5.1.72
1810    0.87    2606    0.92    5.5.51
2028    0.97    2799    0.99    5.6.35
2208    1.06    3077    1.09    5.7.17
2135    1.02    3038    1.08    8.0.3

read-write with --range-size=10000

The QPS for InnoDB is ~1.50 relative to both MySQL 5.0 and 5.6. InnoDB range scans are faster starting in MySQL 5.7. The benefit here is larger than for the test above because the range scan here is longer (10,000 rows vs 100).

Modern InnoDB was also faster than older InnoDB on in-memory sysbench but the improvement here is larger.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
168     1.00    219     1.00    5.0.96
167     0.99    216     0.99    5.1.72
170     1.01    214     0.98    5.5.51
169     1.01    217     0.99    5.6.35
257     1.53    335     1.53    5.7.17
247     1.47    328     1.50    8.0.3

read-only with --range-size=100

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.32 on the i5 NUC and 1.10 on the i3 NUC. Relative to MySQL 5.6 it is 1.32 on the i5 NUC and 1.10 on the i3 NUC.  The speedup here for InnoDB 5.7 is larger than the speedup for InnoDB 5.7 on the read-write test with range-size=100.

Modern InnoDB was slower than older InnoDB on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2903    1.00    3640    1.00    5.0.96
2831    0.98    3638    1.00    5.1.72
2892    1.00    3668    1.01    5.5.51
2885    0.99    3687    1.01    5.6.35
3272    1.13    4954    1.36    5.7.17
3207    1.10    4794    1.32    8.0.3

read-only.pre with --range-size=10000

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is ~1.40. Relative to MySQL 5.6 it is ~1.50. InnoDB range scans became faster starting in MySQL 5.7. The benefit here is larger than for the test above because the range scan here is longer (10,000 rows vs 100).

Modern InnoDB was also faster than older InnoDB on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
149     1.00    182     1.00    5.0.96
148     0.99    181     0.99    5.1.72
143     0.96    178     0.98    5.5.51
139     0.93    173     0.95    5.6.35
212     1.42    273     1.50    5.7.17
206     1.38    264     1.45    8.0.3

read-only with --range-size=10000

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.48 on the i5 NUC and 1.40 on the i3 NUC. Relative to MySQL 5.6 it is ~1.50. This is similar to the result above for read-write with range-size=10000.

This test is run after the write-heavy tests. The test in the previous section was run before the write-heavy tests. They have similar QPS and b-tree fragmentation didn't reduce QPS on this test.

Modern InnoDB was also faster than older InnoDB on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
149     1.00    181     1.00    5.0.96
149     1.00    179     0.99    5.1.72
143     0.96    178     0.98    5.5.51
140     0.94    178     0.98    5.6.35
210     1.41    275     1.52    5.7.17
208     1.40    267     1.48    8.0.3

point-query.pre

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.07 on the i5 NUC and 1.03 on the i3 NUC. Relative to MySQL 5.6 it is 1.11 on the i5 NUC and 1.06 on the i3 NUC.

Modern InnoDB is slightly faster than older InnoDB here but was much slower than it on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3986    1.00    5478    1.00    5.0.96
3839    0.96    5276    0.96    5.1.72
3854    0.97    5092    0.93    5.5.51
3879    0.97    5250    0.96    5.6.35
4264    1.07    6198    1.13    5.7.17
4124    1.03    5873    1.07    8.0.3

point-query

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.08 on the i5 NUC and 0.98 on the i3 NUC. Relative to MySQL 5.6 it is 1.11 on the i5 NUC and 1.08 on the i3 NUC. It is good to see a small improvement since 5.6.35.

This test is run after the write-heavy tests. The test in the previous section was run before the write-heavy tests. They have similar QPS and b-tree fragmentation didn't reduce QPS on this test.

Modern InnoDB is about as fast as older InnoDB here but was much slower than it on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
4278    1.00    5464    1.00    5.0.96
4127    0.96    5256    0.96    5.1.72
3878    0.91    5022    0.92    5.5.51
3896    0.91    5310    0.97    5.6.35
4332    1.01    6155    1.13    5.7.17
4180    0.98    5891    1.08    8.0.3

random-points.pre

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.51 on the i5 NUC and 1.35 on the i3 NUC. Relative to MySQL 5.6 it is 2.13 on the i5 NUC and 1.18 on the i3 NUC.

The QPS improvement from 5.6.35 to 5.7.17 on the i5 NUC is large. Looking at vmstat and iostat output I see InnoDB 5.6 used ~33% more storage reads and and almost 3X more CPU per fetched row. There is also QPS improvement on the i3 NUC from 5.6 to 5.7 but it isn't as large.

This test is run before the write-heavy tests. The next section has results for the test run after write-heavy tests. They get similar QPS which means that b-tree fragmentation isn't an issue here for an in-memory workload.

Modern InnoDB is faster than older InnoDB here but was slower than it on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
49      1.00     70     1.00    5.0.96
49      1.00     64     0.91    5.1.72
53      1.08     51     0.73    5.5.51
56      1.14     50     0.71    5.6.35
69      1.41    105     1.50    5.7.17
66      1.35    106     1.51    8.0.3

random-points

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.69 on the i5 NUC and 1.00 on the i3 NUC. Relative to MySQL 5.6 it is 1.56 on the i5 NUC and 0.94 on the i3 NUC. I don't yet understand the large improvement starting in MySQL 5.7 both here and in the previous section.

Modern InnoDB is faster than older InnoDB here but was slower than it on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
53      1.00     59     1.00    5.0.96
53      1.00     55     0.93    5.1.72
53      1.00     59     1.00    5.5.51
56      1.06     64     1.08    5.6.35
51      0.96    100     1.69    5.7.17
53      1.00    100     1.69    8.0.3

hot-points

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 0.78 on the i5 NUC and 0.66 on the i3 NUC. Relative to MySQL 5.6 it is 0.88 on the i5 NUC and 0.90 on the i3 NUC. This test is always in-memory as it fetches the same data per query. The results are similar to the results for in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
4925    1.00    4704    1.00    5.0.96
4352    0.88    4450    0.95    5.1.72
3982    0.81    4591    0.98    5.5.51
3609    0.73    4167    0.89    5.6.35
3455    0.70    3771    0.80    5.7.17
3246    0.66    3690    0.78    8.0.3

insert

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 0.86. Relative to MySQL 5.6 it is 0.81 on the i5 NUC and 0.99 on the i3 NUC. The regression since MySQL 5.0.96 is likely from new code above the storage engine (optimizer, parser).

The regression here for modern InnoDB is similar to the results on in-memory sysbench. My guess is that this test doesn't do many storage reads per insert so the new CPU overheads in modern MySQL are significant.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
5868    1.00    7535    1.00    5.0.96
5665    0.97    7431    0.99    5.1.72
4479    0.76    6063    0.80    5.5.51
5097    0.87    7991    1.06    5.6.35
5730    0.98    7234    0.96    5.7.17
5043    0.86    6468    0.86    8.0.3

Thursday, November 16, 2017

A new optimization in gcc 5.x and MySQL

We were testing the impact of changing from gcc 4.9 to 5.x and from older to newer jemalloc. For one of the in-memory sysbench tests the QPS at high-concurrency dropped by 10%. The test was read-only with range-size set to 1000. If the test was limited to the order-ranges query then the QPS dropped by ~25%. This wasn't good.

I repeated the test with newer jemalloc and gcc 4.9 and there was no loss of QPS. So now it looked like a problem with gcc 5.x and not jemalloc. I then did a build with tcmalloc, but at startup mysqld would get an illegal free error from tcmalloc. After finding what I think is a RocksDB bug and then another problem, I found a workaround and soon discovered that for tcmalloc and glibc malloc there was also a decrease in QPS for gcc 5.x but not 4.9. Now I was worried that I might lose the debugging expertise of the internal jemalloc team at work, but fortunately they found the problem while collaborating with the MyRocks team.

AFAIK this isn't a MyRocks-only problem because it comes from the allocation done to sort for the order by clause. But I am tired of running tests and won't test it for InnoDB. Good news for the rest of the world. This is an issue for MySQL 5.6 but probably not for 5.7 and 8.x.

The problem is a gcc5 optimization (see gcc issues 67618 and 83022) that transforms the call sequence malloc, memset into a call to calloc. This appears to be done even for 2mb allocations (my.cnf had sort_buffer_size=2m). The output from perf wasn't clear about the problem this creates. For jemalloc it reported a new CPU overhead from smp_call_function_interrupt calling flush_tlb_func and all of that is kernel code. I was told that was from jemalloc zero-ing pages. A workaround that doesn't require a code change is to compile with -fno-builtin-malloc. There are workarounds that require code changes that I won't list here.

Here are performance results from mysqld compiled with gcc 4.9 vs 5.x and linked with different allocators (jemalloc, tcmalloc, glibc malloc). The test is in-memory sysbench read-only with range-size=1000, 8 tables and 1M rows/table. The test uses 48 concurrent connections and the server has 48 HW threads (24 cores, HT enabled).

QPS     configuration
78622   gcc4.9, jemalloc
76787   gcc4.9, tcmalloc
73340   gcc4.9, glibc malloc
-
65673   gcc5.x, jemalloc
58958   gcc5.x, tcmalloc
48750   gcc5.x, glibc malloc
-
78028   gcc5.x, jemalloc, -fno-builtin-malloc
78135   gcc5.x, tcmalloc, -fno-builtin-malloc
78207   gcc5.x, glibc malloc, -fno-builtin-malloc


Tuesday, November 14, 2017

Sysbench, in-memory, small server: MyRocks over time

In this post I compare four MyRocks releases from February to October using in-memory sysbench and a small server. The goal is understand where we have made MyRocks faster and slower this year.

tl;dr
  • For many tests there is no decrease in QPS from February to October
  • For some tests the QPS decreased by 3% to 8%
  • The largest regression is for read-heavy tests that run after write-heavy tests. Querying an LSM tree after many updates usually costs more in CPU and/or IO compared to querying it before the updates. But the CPU overhead might have increased since February.

Configuration

The tests used MyRocks from FB MySQL which is currently based on 5.6.35. Builds were done using HEAD from February 10, April 14, June 16, August 15 and October 16. The git hashes for these builds are:

  • February 10 - FB MySQL f3019b, RocksDB c2ca7a
  • April 14 - FB MySQL e28823, RocksDB 9300ef
  • June 16 - FB MySQL 52e058, RocksDB 7e5fac
  • August 15 - FB MySQL 0d76ae, RocksDB 50a969
  • October 16 - FB MySQL 1d0132, RocksDB 019aa7
All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines but there are a few new & changed options in that time. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. Compression was not used.

Sysbench is run with 2 tables and 2M rows per table. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database fits in RAM.

I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is smaller.

Results

All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. The results below have the QPS for the test with 1 client relative to the QPS for the February 10 build. The tests are explained here.

Graphs

No graphs this time. The results aren't that interesting.

update-inlist

Here and the sections that follow have the QPS and relative QPS. The relative QPS is the QPS for the test with 1 client relative to the QPS for the February 10 build. Values are provided for the i3 and i5 NUC.

There is no regression.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1713    1.00    2002    1.00    feb10
1826    1.07    2133    1.07    apr14
1605    0.94    1987    0.99    jun16
1698    0.99    2017    1.01    aug15
1761    1.03    2087    1.04    oct16

update-one

QPS decreased by ~8%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8375    1.00    9295    1.00    feb10
8036    0.96    9151    0.98    apr14
7774    0.93    8602    0.93    jun16
7705    0.92    8555    0.92    aug15
7732    0.92    8620    0.93    oct16

update-index

There is no regression on the i3 NUC and QPS has decreased by 7% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
5981    1.00    6861    1.00    feb10
5799    0.97    6722    0.98    apr14
5678    0.98    6300    0.92    jun16
5809    1.00    6306    0.92    aug15
6022    1.04    6392    0.93    oct16

update-nonindex

QPS decreased by 3% on the i3 NUC and 6% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
6521    1.00    7184    1.00    feb10
6346    0.97    7127    0.99    apr14
5913    0.91    6516    0.91    jun16
6066    0.93    6565    0.91    aug15
6310    0.97    6724    0.94    oct16

delete

QPS decreased by 7% on the i3 NUC and 5% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
15301   1.00    16361   1.00    feb10
14714   0.96    16552   1.01    apr14
13973   0.91    15736   0.96    jun16
14216   0.93    15447   0.94    aug15
14233   0.93    15515   0.95    oct16

read-write with range-size=100

QPS decreased by 7%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8101    1.00    8583    1.00    feb10
7700    0.95    8162    0.95    apr14
7333    0.91    7812    0.91    jun16
7366    0.91    7747    0.90    aug15
7532    0.93    7967    0.93    oct16

read-write with range-size=10000

QPS decreased by 5%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
262     1.00    308     1.00    feb10
253     0.97    300     0.97    apr14
241     0.92    290     0.94    jun16
246     0.94    285     0.93    aug15
249     0.95    294     0.95    oct16

read-only with range-size=100

QPS decreased by 8% on the i3 NUC and 21% on the i5 NUC. I suspect that the 21% regression on i5 NUC is an outlier and unlikely to repeat, but I will find out when I test a new build. I think it is an outlier because there is variance with MyRocks for read-heavy tests that follow write-heavy tests. The state of the LSM tree (number of entries in the memtable, number of files in L0) is not deterministic across test runs and that impacts read performance. This is made worse because the LSM tree state can remain in that state for the duration of the read-only test. I prefer to run read-write tests when evaluating MyRocks to avoid this variance.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8610    1.00    9272    1.00    feb10
7818    0.91    8223    0.89    apr14
7506    0.87    8463    0.91    jun16
7604    0.88    7660    0.93    aug15
7920    0.92    7321    0.79    oct16

read-only.pre with range-size=10000

QPS decreased by 4% on the i3 NUC and 3% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
215     1.00    261     1.00    feb10
213     0.99    256     0.98    apr14
203     0.94    261     1.00    jun16
208     0.97    256     0.98    aug15
207     0.96    254     0.97    oct16

read-only with range-size=100000

QPS decreased by 6% on the i3 NUC and 8% on the i5 NUC. The decrease here is larger than for the previous test. The difference is that this test is run after write-heavy tests while the previous test is run before them. It costs more to search the LSM structures after random updates (compare the QPS here with the QPS in the previous test), and that cost may have increased. I have written more about mistakes to avoid when doing a benchmark with an LSM and if you only do read-only tests before fragmenting the LSM tree you might be an optimist.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
214     1.00    257     1.00    feb10
203     0.95    242     0.94    apr14
194     0.91    240     0.93    jun16
197     0.92    230     0.89    aug15
201     0.94    237     0.92    oct16

point-query.pre

QPS decreased by 3% on the i3 NUC and 1% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
15775   1.00    16190   1.00    feb10
15314   0.97    16035   0.99    apr14
14504   0.92    15218   0.94    jun16
14627   0.93    15462   0.96    aug15
15277   0.97    16022   0.99    oct16

point-query

QPS decreased by 7%. The decrease here is larger than for the previous test. See the comment two sections about about read-only tests that follow write-heavy tests.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
15326   1.00    15900   1.00    feb10
14126   0.92    14556   0.92    apr14
13612   0.89    15030   0.95    jun16
13557   0.88    13721   0.86    aug15
14328   0.93    14801   0.93    oct16

random-points.pre

QPS decreased by 3% on the i3 NUC and 2% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1450    1.00    1527    1.00    feb10
1459    1.01    1499    0.98    apr14
1301    0.90    1360    0.89    jun16
1374    0.95    1394    0.91    aug15
1401    0.97    1502    0.98    oct16

random-points

QPS decreased by 8% on the i3 NUC and 12% on the i5 NUC. See the comment two sections about about read-only tests that follow write-heavy tests.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1063    1.00    1151    1.00    feb10
 928    0.87     940    0.82    apr14
 952    0.90     947    0.82    jun16
 962    0.90     847    0.74    aug15
 973    0.92    1008    0.88    oct16

hot-points

QPS decreased by 15%. This is like random-points except it fetches the same values for every query. It is run after the write-heavy tests. The regression is similar to random-points.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1565    1.00    1809    1.00    feb10
1384    0.88    1531    0.85    apr14
1239    0.79    1422    0.79    jun16
1334    0.85    1341    0.74    aug15
1329    0.85    1535    0.85    oct16

insert

QPS decreased by 8% on the i3 NUC and 7% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8337    1.00    9102    1.00    feb10
8377    1.00    9086    1.00    apr14
7871    0.94    8723    0.96    jun16
8074    0.97    8785    0.97    aug15
7650    0.92    8446    0.93    oct16

Monday, November 13, 2017

Sysbench, in-memory, small server: InnoDB & MyRocks

In this post I compare MyRocks and InnoDB using in-memory sysbench and a small server. Previous posts explained performance for MySQL from 5.0 to 8.0 for MyISAM and InnoDB. In this post I limit the results to MySQL 5.6.35 for MyRocks and both 5.6.35 and 5.7.17 for InnoDB.

tl;dr - the QPS for MyRocks relative to InnoDB in MySQL 5.6 where 1.0 means they are equal
  • For write heavy tests MyRocks gets ~0.80 vs InnoDB except on the update-index test where it benefits from read-free secondary index maintenance
  • For scan heavy tests MyRocks gets between 0.70 and 0.94 vs InnoDB
  • For point query tests MyRocks gets between 0.71 and 0.80 vs InnoDB
  • For in-list query tests MyRocks gets between 0.26 and 0.45 vs InnoDB
Not really tl;dr
  • Range scans are much faster for InnoDB in 5.7 than in 5.6
  • QPS for InnoDB in MySQL 5.7 is frequently worse than in 5.6 because there is more CPU overhead per query. I assume this is the cost from more features. Most of the overhead appears to be in code above the storage engine (optimizer, parser, etc). This test uses low concurrency and doesn't highlight great improvements at high concurrency in InnoDB 5.7. The random-points section below has more details.
  • MyRocks read performance suffers more than InnoDB when the database is subject to a write-heavy workload prior to the read-heavy tests. It costs more CPU time to search the LSM structures after random changes. This is more true for in-memory workloads than for IO-bound. Note that many bulk-loads done by benchmarks don't show this cost because they insert data in key order. Benchmarks that do bulk-load and then query can be misleading, especially if the bulk-load is in key order.

From the ratios above you might ask why I like MyRocks. Some of the performance differences are things we need to fix and work has been in progress, so better results are coming. But I like it because it is more efficient than InnoDB for two reasons. First, it uses less space than InnoDB (less space-amplification) so you need less SSD to store the same data. Second, it writes less to storage per transaction so SSD devices last longer. InnoDB, WiredTiger and mmapv1 are all increasing the rate at which I must replace SSDs on my test servers. Finally, this workload is in-memory and MyRocks is usually better for IO-bound than for in-memory. 

Configuration

The tests used InnoDB from upstream MySQL 5.6.35 and 5.7.17 and then MyRocks from FB MySQL 5.6.35. The MyRocks build is from August 15 with git hash 0d76ae. All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines including: disabled SSL for MySQL 5.7, used the same charset and collation, set innodb_purge_threads=1 to reduce mutex contention. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. I compiled all of the MySQL versions on the test servers and did not use binaries from upstream. Compression was not used for InnoDB or MyRocks.

Sysbench is run with 2 tables and 2M rows per table. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database fits in RAM.

I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is much smaller.

Results

All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. For all tests but scan the result has the QPS for the test with 1 client relative to the QPS for InnoDB in MySQL 5.6.35. The tests are explained here.

Graphs

There are 4 types of tests and I provided a graph for each type: write-heavy, scan-heavy, point-query, inlist-query. Fortunately the results within each group are similar and one graph per group is sufficient. The tests are explained here.

The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. For write-heavy InnoDB does better than MyRocks on all of the tests except update-index where MyRocks benefits from read-free secondary index maintenance. The workload is low-concurrency so InnoDB in 5.7 suffers from more CPU overhead without getting a return from its high-concurrency improvements. The write-heavy tests do a read-modify-write per query, InnoDB is more efficient for the reads so it gets better QPS than MyRocks. There are two numbers for each test - the first is for the i5 NUC, the second is for the i3 NUC and each is the QPS for MyRocks relative to InnoDB 5.6.
  • update-inlist - 0.60 & 0.77
  • update-one - 0.78 & 0.80
  • update-index - 1.15 & 1.85
  • update-nonindex - 0.65 & 0.77
  • delete - 0.78 & 0.97
  • insert - 0.78 & 0.86
For scan-heavy InnoDB does better on all of the tests and given the range scan improvements to InnoDB in MySQL 5.7 the comparisons will get harder. MyRocks suffers from more CPU overhead in the merge iterator. There are two numbers for each test - the first is for the i5 NUC, the second is for the i3 NUC and each is the QPS for MyRocks relative to InnoDB 5.6. The graph is for read-write with range-size=100.
  • read-write with range-size=100 - 0.67 & 0.75
  • read-write with range-size=10000 - 0.85 & 0.87
  • read-only with range-size=100 - 0.69 & 0.79
  • read-only.pre with range-size=10000 - 0.94 & 0.92
  • read-only with range-size=10000 - 0.85 & 0.89
The point-query group includes the point-query test run before and then after the write-heavy tests. InnoDB does better on both of the tests because MyRocks uses more CPU per query. There are two numbers for each test - the first is for the i5 NUC, the second is for the i3 NUC and each is the QPS for MyRocks relative to InnoDB 5.6. The graph is for the test run after the write-heavy tests.
  • point-query run before - 0.75 & 0.80
  • point-query run after - 0.71 & 0.73
The inlist-query group includes the hot-points test and the random-points tests run before and after the write-heavy tests. InnoDB does better on all tests because MyRocks uses more QPS per query. There are two numbers for each test - the first is for the i5 NUC, the second is for the i3 NUC and each is the QPS for MyRocks relative to InnoDB 5.6. The graph is for the test run after the write-heavy tests.
  • hot-points - 0.32 & 0.36
  • random-points run before - 0.44 & 0.45
  • random-points run after - 0.26 & 0.32
update-inlist

Here and the sections that follow have the QPS and relative QPS. The relative QPS is the QPS for the engine relative to the QPS for InnoDB in MySQL 5.6.35 from the result with one client. Values are provided for the i3 and i5 NUC.

InnoDB loses ~5% of QPS from 5.6 to 5.7. InnoDB does better than MyRocks.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2201    1.00    3349    1.00    InnoDB-5.6
2050    0.93    3192    0.95    InnoDB-5.7.17
1698    0.77    2017    0.60    MyRocks-5.6

update-one

InnoDB loses ~12% of QPS from 5.6 to 5.7. InnoDB does better than MyRocks.

i3 NUC          i5 NUC

QPS     ratio   QPS     ratio   engine
9650    1.00    10969   1.00    InnoDB-5.6
8097    0.84     9648   0.88    InnoDB-5.7.17
7705    0.80     8555   0.78    MyRocks-5.6

update-index

MyRocks does much better than InnoDB because secondary index maintenance is read free.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3134    1.00     5489   1.00    InnoDB-5.6
2983    0.95     5861   1.07    InnoDB-5.7.17
5809    1.85     6306   1.15    MyRocks-5.6

update-nonindex

See comments in the update-index section above. MyRocks QPS here is similar to the QPS for update-index, but InnoDB QPS is ~2X better here than for update-index.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
7860    1.00    10148   1.00    InnoDB-5.6
6337    0.81     9135   0.90    InnoDB-5.7.17
6066    0.77     6565   0.65    MyRocks-5.6

delete

Not much to add here.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
14590   1.00    19900   1.00    InnoDB-5.6
12758   0.87    17621   0.89    InnoDB-5.7.17
14216   0.97    15447   0.78    MyRocks-5.6

read-write with range-size=100

Range scans are more efficient in InnoDB 5.7 than in 5.6. MyRocks gets less QPS because it uses more CPU per query

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
 9844   1.00    11591   1.00    InnoDB-5.6
10152   1.03    12290   1.06    InnoDB-5.7.17
 7366   0.75     7747   0.67    MyRocks-5.6

read-write with range-size=10000

Range scans are more efficient in InnoDB 5.7 than in 5.6 and the benefit is larger here than the previous test because the range scans are larger (10,000 rows vs 100). MyRocks gets less QPS because it uses more CPU per query

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
283     1.00    335     1.00    InnoDB-5.6
442     1.56    500     1.49    InnoDB-5.7.17
246     0.87    285     0.85    MyRocks-5.6

read-only with range-size=100

Range scans are more efficient in InnoDB 5.7 than in 5.6. MyRocks gets less QPS because it uses more CPU per query.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
 9644   1.00    11120   1.00    InnoDB-5.6
11369   1.18    12247   1.10    InnoDB-5.7.17
 7604   0.79     7660   0.69    MyRocks-5.6

read-only.pre with range-size=10000

Range scans are more efficient in InnoDB 5.7 than in 5.6 and the benefit is larger here than the previous test because the range scans are larger (10,000 rows vs 100). MyRocks gets less QPS because it uses more CPU per query.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
226     1.00    272     1.00    InnoDB-5.6
358     1.58    394     1.45    InnoDB-5.7.17
208     0.92    256     0.94    MyRocks-5.6

read-only with range-size=100000

See the comment in the previous section. This test was run after the write-heavy tests. The test for the previous section was run before the write-heavy tests. QPS for InnoDB here and above is similar, but QPS for MyRocks here is 5% to 10% less because it costs more to search the LSM structures after random updates. I have written more about mistakes to avoid when doing a benchmark with an LSM and if you only do read-only tests before fragmenting the LSM tree you might be an optimist.

One thing I didn't notice until now is that QPS for InnoDB on read-write is larger than for read-only for tests with range-size=10000 but not for range-size=100. I don't know why yet.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
221     1.00    272     1.00    InnoDB-5.6
359     1.62    393     1.44    InnoDB-5.7.17
197     0.89    230     0.85    MyRocks-5.6

point-query.pre

InnoDB uses more CPU/query in 5.7 than in 5.6 so QPS is smaller. MyRocks has the same problem.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
18329   1.00    20505   1.00    InnoDB-5.6
16592   0.91    18126   0.88    InnoDB-5.7.17
14627   0.80    15462   0.75    MyRocks-5.6

point-query

See the comment in the previous section. This test was run after the write-heavy tests. The test for the previous section was run before the write-heavy tests. QPS for InnoDB here and above is similar, but QPS for MyRocks here is 7% to 11% less because it costs more to search the LSM structures after random updates.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
18554   1.00    19257   1.00    InnoDB-5.6
16883   0.91    17723   0.92    InnoDB-5.7.17
13557   0.73    13721   0.71    MyRocks-5.6

random-points.pre

The QPS decrease for InnoDB from 5.6 to 5.7 is much smaller here than for point-query.pre above. I assume that most of the QPS decrease from 5.6 to 5.7 is from code above InnoDB (parse, optimize, etc) and the query here fetches 100 rows by PK using an in-list. So the per-query overhead here is amortized over 100 rows versus 1 row above.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3040    1.00    3203    1.00    InnoDB-5.6
2931    0.96    3121    0.97    InnoDB-5.7.17
1374    0.45    1394    0.44    MyRocks-5.6

random-points

See the comment in the previous section. This test was run after the write-heavy tests. The test for the previous section was run before the write-heavy tests. QPS for InnoDB here and above is similar, but QPS for MyRocks here is 30% to 40% less because it costs more to search the LSM structures after random updates. The cost here is greater than the cost above for read-only and point-query because this test spends a larger fraction of the per-query time in the storage engine, so it is more sensitive to storage engine overheads. This is the same reason that the 5.6 to 5.7 regression is smaller here for InnoDB than for the tests that preceded random-points.pre.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3036    1.00    3223    1.00    InnoDB-5.6
2947    0.97    3123    0.97    InnoDB-5.7.17
 962    0.32     847    0.26    MyRocks-5.6

hot-points

This is like random-points except it fetches the same values. It is run after the write-heavy tests. It is always in-memory. InnoDB can benefit from the adaptive hash index. But I am curious why the QPS here for MyRocks is closer to the random-points test run before the write-heavy tests.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3666    1.00    4242    1.00    InnoDB-5.6
3458    0.94    3898    0.92    InnoDB-5.7.17
1334    0.36    1341    0.32    MyRocks-5.6

insert

InnoDB loses 12% to 15% of the insert rate in MySQL 5.7 because the per-insert CPU overhead is larger. MyRocks suffers the same problem.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
9340    1.00    11244   1.00    InnoDB-5.6
7853    0.84     9892   0.88    InnoDB-5.7.17
8074    0.86     8785   0.78    MyRocks-5.6