About Online DDL

DDL(Data Definition Language), that is, the data definition language, including new table, table add fields, change fields and other operations.
Large table DDL at will execute dozens of minutes, even a few hours, such a long time to lock the table for the business is obviously unacceptable.
Online DDL means that the append, delete, update, and search (DML) operations are not blocked during the execution of DDL.

Mysql Online DDL

For historical reasons, mysql5.5 and below do not support native Online DDL, which locks tables and blocks DML during DDL execution. For a long time, mysql users implemented Online DDL using third-party tools.
Since mysql5.6, mysql finally supports native Online DDL. However, in the past, third-party tools have also been greatly developed and verified in a large number of production environments, whether to fully use Online DDL still needs to be evaluated.

Mysql native Online DDL

Introduction

Mysql has supported Online DDL since 5.6, which initially had very limited operations; mysql5.7 supports more operations, which can cover most DDL operations. And in the latest 8.0, part of DDL begin to support second level operations.
In general, Online DDL has been performing very well since 5.7, and can replace third-party tools in most scenarios. After 8.0, part of DDL supports second-level changes, which has obvious advantages over other tools.
Mysql DDL has the following algorithms:
● copy - not Online DDL, locks the table
● inplace – 5.6 + supported, only partial operations, operates at the engine layer, is about 10 times faster than pt-osc, consumes very little resources (when not refactoring tables)
● inplace(rebuild table) - When the inplace algorithm reconstructs the table, it is about 3 times faster than pt-osc, but may incur large master-slave latency
● instant – supported by version 8.0 and above, only a few operations are supported. The operation of table metadata is basically second level operations

Advantages

● Simple operation – no need to install third party tools, login to the mysql command line to operate, only need to add the ALTER statement after the algorithm specified and the lock level specified to execute
● The speed of DDL is fast and the resource consumption is small. inplace algorithm operates in the engine layer, the speed is basically better than the third-party tools of copy table, and the speed of some DDL is more than 10 times of the third-party tools. The instant algorithm operates on the order of seconds

The Cons

● Limited support for operations - while most DDLS are supported, some DDLS are still not supported to date, such as changing column data types
● There is some risk of execution - locking the table for a period of time may occur when there are slow queries or long transactions on the table. pt-osc has the same problem
● There may be long master-slave latency – slave DDL is played back as a single thread, no other DML operations can be played back during playback, and the master-slave latency is theoretically at least twice the execution time of the DDL (one DDL each for the master and slave)

Rationale

Each table has a MDL(metadata lock), where the session gets read MDL(shared MDL) when performing DML and writes MDL(exclusive MDL) when performing DDL. Read MDL allows other DMLS to execute, while write MDL does not allow other DMLS, DDLS to execute. In other words, MDL implements preventing other DDLS from operating with DML while executing DDLS.
In the past when DDL was executed (copy algorithm), DDL would fetch the write MDL until the end of the DDL, and the DDL process did not allow any DML operations.
Since mysql5.6, DDL(inplace algorithm) only fetches write MDL(start) at the beginning and end, and the execution part which takes the longest time only fetches read MDL, which is the implementation of Online DDL for users, DDL and DML are executed concurrently.

Using the

Here, take mysql5.7 as an example, list the following common DDL operations:
● Add column – inplace rebuild table(8.0 support Instant algorithm)
ALTER TABLE tbl_name ADD COLUMN column_name column_definition, ALGORITHM=INPLACE, LOCK=NONE;
● Add a secondary index – inplace not rebuild table
ALTER TABLE tbl_name ADD INDEX name (col_list), ALGORITHM=INPLACE, LOCK=NONE;
● Alter column data type - online DDL is still not supported today

Default algorithm

When a DDL statement does not declare an algorithm or lock level, Mysql defaults to the most efficient algorithm supported by the DDL, and the lock level specifies the lowest level.
For example, the inplace algorithm is used by default in 5.7 and the instant algorithm is used by default in 8.0
Progress monitoring
When a native Online DDL executes a process, there is no feedback until it is complete, and it is a bit tricky to see the progress of the DDL:

  1. Enable performance_schema
    This is a low-level monitoring module that comes with mysql. Various cloud manufacturers close this module by default (there are some performance losses). You need to go to the console to manually open it, and the configuration requires a restart to take effect.
    show variables like ‘performance_schema’;
  2. Start alter statement monitoring
    UPDATE performance_schema.setup_instruments SET ENABLED = ‘YES’ WHERE NAME LIKE ‘stage/innodb/alter%’;
    UPDATE performance_schema.setup_consumers SET ENABLED = ‘YES’ WHERE NAME LIKE ‘%stages%’;
  3. Check the DDL progress
    Online DDL is divided into 7 phases, 7 phases share WORK_COMPLETED/WORK_ESTIMATED to measure progress.
    Since WORK_ESTIMATED will increase with the increase of stages, the actual progress may be partially biased, See https://dev.mysql.com/doc/refman/5.7/en/monitor-alter-table-performance-schema.html
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    SELECT STMT.SQL_TEXT,
          STAGE.EVENT_NAME,
          CONCAT(WORK_COMPLETED, '/', WORK_ESTIMATED) AS PROGRESS,
          CONCAT(ROUND(100 * WORK_COMPLETED / WORK_ESTIMATED, 2), ' %') AS PROCESSING_PCT,
          SYS.FORMAT_TIME(STAGE.TIMER_WAIT) AS TIME_COSTS,
          CONCAT(ROUND((STAGE.TIMER_END - STMT.TIMER_START) / 1E12 *
                      (WORK_ESTIMATED - WORK_COMPLETED) / WORK_COMPLETED,
                       2),
                 ' s') AS REMAINING_SECONDS
    FROM PERFORMANCE_SCHEMA.EVENTS_STAGES_CURRENT     STAGE,
          PERFORMANCE_SCHEMA.EVENTS_STATEMENTS_CURRENT STMT
    WHERE STAGE.THREAD_ID = STMT.THREAD_ID
      AND STAGE.NESTING_EVENT_ID = STMT.EVENT_ID\G

Considerations

● Explicitly declare algorithms and locks when using - ALGORITHM=INPLACE, LOCK=NONE; mysql will give an error if the algorithm and lock level are not supported. When the declaration algorithm and lock level are not shown in the DDL statement, Mysql will use the copy algorithm by default for some operations that do not support Online DDL, which will eventually lead to a long time lock table, and there is a certain risk. At the same time, there are a small number of DDLS that support inplace algorithm but do not support it completely

● The operations supported by 5.6, 5.7, 8.0 are all different. Please check the official website for the supported operations

● Online DDL start and end with short lock table (1 s)

● There may be large master-slave latency - DDL needs to be executed on the master before it can be executed on the slave (this is achieved by binlog replay of alter statements, which is single-threaded), so concurrent DML executed on the master during DDL needs to wait for the slave to finish playing back the DDL before it can be played back. In summary, when the master executes the DDL, the slave node’s delay (Seconds behind master) will gradually increase to a peak of twice the DDL execution time, and then quickly fall back to 0.

● When there is a long transaction (or slow query) in the current action table, the Online DDL will be blocked by the transaction, and the DML after the Online DDL will be blocked by the Online DDL blocking. The details are as follows:

  • Long transaction A acquires the read MDL of table and holds the read MDL until the end of the transaction, assuming that the transaction lasts for 10 minutes

  • The Online DDL operation needs to fetch the write MDL for A short time (1s), so the DDL is blocked until the end of the long transaction A

  • After the DDL is blocked, transaction B(or normal DML) arrives and needs to share the read MDL of transaction A. However, because the write MDL lock has higher priority (MDL lock is write first, if A session requests to write MDL lock, all subsequent readers will be blocked), transaction B cannot share the read MDL of transaction A, so it is blocked. Wait for the Online DDL to acquire and release the write MDL

  • Then both Online DDL and transaction B are blocked for 10 minutes, after which Online DDL briefly fetches the write DML and releases it, and transaction B runs normally. (There is also a second possibility, when Online DDL blocks for lock_wait_timeout, DDL will timeout. However, pt-osc will set this parameter to 60s when running.)

  • After blocking occurs, the show processlist command can be used to view all session status, or the DDL can be terminated in time to stop the loss