MySQL

The world's most popular open source database

Contact a MySQL Representative


  • MySQL.com
  • Developer Zone
  • Partners & Solutions
  • Customer Login
  • DevZone
  • Downloads
  • Documentation
  • Articles
  • Forums
  • Bugs
  • Forge
  • Blogs
 
  • You are currently browsing the archives for the MySQL 6.0 New Features category.

  • Pages

    • About
    • Find and store the error return value in procedures or functions
  • Archives

    • August 2009
    • July 2009
    • June 2009
    • May 2009
    • April 2009
    • March 2009
    • February 2009
    • January 2009
    • December 2008
    • November 2008
    • October 2008
    • August 2008
    • July 2008
    • June 2008
    • May 2008
    • April 2008
  • Categories

    • MySQL 5.1 Features (3)
    • MySQL 5.4 New Features (2)
    • MySQL 6.0 New Features (5)
    • MySQL 6.x New Features (5)
    • News (8)
    • Personal Opinion (4)
    • Tiny Tweaks (10)
    • Uncategorized (19)



New Features In MySQL 6.x

Archive for the ‘MySQL 6.0 New Features’ Category

SIGNAL and RESIGNAL are in 6.0 main tree

Tuesday, March 3rd, 2009

I wrote before about SIGNAL and RESIGNAL, and I wrote these worklog tasks that you can see on forge:
WL#2110: Stored Procedures: Implement SIGNAL http://forge.mysql.com/worklog/task.php?id=2110
WL#2265: Stored Procedures: Implement RESIGNAL http://forge.mysql.com/worklog/task.php?id=2265
This is the feature that lets you write your own errors, or change texts in existing error messages.

Today’s news is: my colleague Marc Alff has put the feature in the “main” 6.0 tree. This means it will
be downloadable from source within a few days, and will be part of the next 6.0 binary download.

Posted in MySQL 6.0 New Features | No Comments »

Online Backup: Encryption

Sunday, October 12th, 2008

By Peter Gulutzan

A new addition inside the new “MySQL Online Backup” feature is encryption. Australian Senior Software Engineer Stewart Smith has already mentioned it a few months ago in his blog. There’s a preview that you can download on the online backup pages. It’s probably going to get architecture-reviewer approval next week. Probably it will be part of MySQL-6.0 source downloads soon. I extract some paragraphs from what I wrote in a non-public worklog task, WL#4271.

For BACKUP:
BACKUP DATABASE database_name TO ‘image-file-name’
ENCRYPTION_ALGORITHM = { AES | 3DES }
[ ENCRYPTION_KEYSIZE = { 128 | 192 | 256 } ]
PASSWORD = ‘password’;

For RESTORE:
RESTORE FROM ‘image-file-name’
ENCRYPTION_ALGORITHM = { AES | 3DES }
[ ENCRYPTION_KEYSIZE = { 128 | 192 | 256 } ]
PASSWORD = ‘password’;

According to WL#4037 “Online backup: Use zlib compression to reduce backup file size”, there might be compression clauses, for example “WITH COMPRESSION COMPRESSION_ALGORITHM = gzip”. The order does not matter, that is, this worklog task does not attempt to specify the relative positions of the compression and encryption clauses. But in any case compression occurs before encryption.

If ENCRYPTION_ALGORITHM = AES then ENCRYPTION_KEYSIZE may be specified, or it may be omitted (default = 128). If ENCRYPTION_ALGORITHM = 3DES then ENCRYPTION_KEYSIZE may not be specified.

The minimum length of password is zero. The maximum length of password is implementor-defined.

There are no new privilege requirements.

Example:

BACKUP DATABASE d TO ‘1′ ENCRYPTION_ALGORITHM=AES ENCRYPTION_KEYSIZE=128
PASSWORD=”;
RESTORE FROM ‘1′ ENCRYPTION_ALGORITHM=3DES PASSWORD=”;

Effect

MySQL encrypts with the specified algorithm and password, for example something analogous to AES_ENCRYPT(,’password’);
MySQL decrypts with the specified algorithm and password, for example something analogous to AES_DECRYPT(,’password’);

Therefore the RESTORE statement will fail if the algorithm and password differ in any way from what was used in BACKUP. For example, if backup was done with BACKUP DATABASE d TO ‘1′
ENCRYPTION_ALGORITHM=AES ENCRYPTION_KEYSIZE=256 PASSWORD=”, these statements will fail:
RESTORE FROM ‘1′;
RESTORE FROM ‘1′ ENCRYPTION_ALGORITHM=AES PASSWORD=”;
RESTORE FROM ‘1′ ENCRYPTION_ALGORITHM=AES ENCRYPTION_KEYSIZE=256 PASSWORD=’a';

Encryption is of the whole image, not per-database.

Assumptions

Assume password interception won’t occur, because there is some protection from an SSL connection or due to a later worklog task, WL#1054 “Pluggable authentication support”.

Assume logs which would contain passwords are stored in secure places. The PASSWORD clause is logged, replicated, and apparent in columns like mysql.online_backup.command — but protected by privileges on such metadata.

Assume PROCESS privilege is rare so don’t worry that SHOW PROCESSLIST exposes the IDENTIFIED BY clause.

Assume AES and 3DES are all we need for now.

New Reserved words

None.

Encryption Algorithms

The encryption algorithms are in the yassl library. If mysqld is built without the yassl library, then the encryption/decryption algorithms won’t work, at least for now.

Observations on the preview

There are a few anomalies that might be bugs. Generally I don’t report bugs to bugs.mysql.com until/unless they’re in a main tree.

1. This statement causes a crash:
restore from ‘5.bak’ password=’pants’;

2. This statement causes no error although keysize should not be 0 or have a default:
backup database test to ‘5.bak’ encryption_algorithm=aes encryption_keysize=0 password=”The Rain In Spain”;

3. This statement causes a crash:
backup database test to ‘4.bak’ encryption_keysize=128;

4. This series of statements causes a crash:
use test
drop table if exists t;
create table t (s1 varchar(1000)) engine=falcon;
insert into t values (0×1);
set names utf8;
backup database test to ‘1.bak’ encryption_algorithm=aes password=’Γ’;
/* The Γ character is GREEK CAPITAL LETTER GAMMA U+0393 */

5. Nothing happens to backup_history or backup_progress.

6. This statement causes no error although AES should not be inside “s:
BACKUP DATABASE test TO ‘8.bak’ ENCRYPTION_ALGORITHM=`aes` PASSWORD=”;

Posted in MySQL 6.0 New Features | 2 Comments »

Bazaar Branches of MySQL 6.0

Sunday, June 29th, 2008

Bazaar Branches of MySQL 6.0

Now that MySQL has switched to using bazaar for source code repositories, an in place to go for 6.0 public code is Canonical’s “launchpad” site, specifically “Bazaar Branches related to Sun/MySQL Engineering” at https://code.launchpad.net/~mysql. The Bazaar Branches page has a list. Each row in the list is for a source-code package with all the MySQL files necessary to build the server, as described in the MySQL Reference Manual chapter MySQL Installation Using a Source Distribution. For 6.0 lovers, the interesting entries in the Bazaar Branches list are mysql-6.0, mysql-6.0-backup, mysql-6.0-falcon, mysql-6.0-rpl, mysql-6.0-wl3220, mysql-maria.

how to read the list
The Bazaar Branches page’s instructions look good enough to me. But comments in the “Last commit” column might have MySQL-specific abbreviations that I’d like to warn about. Specifically:
* A comment like “Fix to Bug#37679 …” will not refer to the bugs in Canonical’s launchpad.net site, but to MySQL’s own database of bugs: http://bugs.mysql.com.
* A comment containing the initials “WL”, for example “WL#4398 fix compile error …” or “WL3220: Loose index scan …”, is referring to a WorkLog task number. The public worklog tasks are on http://forge.mysql.com. Not all worklog tasks are public.

mysql-6.0
We call mysql-6.0 the “main” tree. When 6.0 becomes generally available, every feature will be in mysql-6.0. Probably this is the only branch you’ll need, unless you are a fanatical follower of a particular team’s progress on a particular day. The code here might be more recent, and more experimental, than what’s in the tar file on the “MySQL 6.0 Downloads” page.

mysql-6.0-backup
I had a long article on BACKUP and RESTORE a few weeks ago. A few changes since then: (a) some people from “Sun Classic” have joined the team (yay, Sun integration!); (b) the progress data is now in true “logs” so there are some options for viewing them as tables or files, turning them on and off, etc., the same way that it’s done with query log or general log.

mysql-6.0-rpl
The letters “rpl” stand for “replication”. Don’t ask me why, maybe this is the way that Swedes do abbreviations. (MySQL AB was originally a Swedish company, and two of the prominent replication people, Mats Kindahl and Lars Thalmann, are from places near Stockholm and Uppsala.)

mysql-6.0-falcon
Elsewhere I’ve seen statements like “Falcon is currently only supported within a special fork of the MySQL release, termed MySQL-6.0-falcon.” That’s false. All mysql-6.0 packages include Falcon. The distinguishing feature of mysql-6.0-falcon is that it has recent changes done by Falcon team members which will be merged with the other trees after a delay of perhaps a few weeks.

mysql-6.0-wl3220
The suffix ‘wl3220″ means “this branch contains the experimental code for worklog task 3220″. I mentioned before that some worklog tasks are public. You can find the description for this one at this location:
http://forge.mysql.com/worklog/task.php?id=3220.
The task is to speed up COUNT(DISTINCT column_name) and similar functions. At the moment that I’m writing this, it isn’t working yet.

mysql-maria
I included this in a list of 6.0 “sites worth seeing”, even though, if you say
SELECT VERSION();
you’ll see “5.1.25-maria-alpha”. I’m not sure that the maria storage engine is to be released along with version 5.1, it now seems more likely that it will be in 6.0 or 6.1. But it’s a little early to state what the Maria storage engine will be part of.

Other branches
Other branches exist internally, for example
mysql-6.0-wl2110 SIGNAL/RESIGNAL (all parts working now)
mysql-6.1-fk Foreign keys (parser and metadata storage working now)
Eventually these branches will either appear on the Bazaar Branches page, or get merged into one of the other branches before becoming public. Keep watching the page. You’ll probably see frequent changes.

Posted in MySQL 6.0 New Features | No Comments »

MySQL 6.0 Feature #2: Online Backup

Monday, May 19th, 2008

MySQL 6.0 Feature #2: Online Backup

Online Backup
MySQL 6.0 New Features Document#2
2008-05-18

By Alexander Nozdrin, Chuck Bell, Lars Thalmann, Peter Gulutzan, Rafal Somla

High Executive Summary

Online backup is a new feature in MySQL 6.0.
It protects from database loss.
It does not block all other concurrent MySQL connections.
It will interest Database Administrators.

Database Administrators Summary

BACKUP DATABASE copies all data and metadata in one or more MySQL databases, into an “image file”. RESTORE reads an image file and rewrites all the data and metadata in one or more MySQL databases. So if you lose a database, you can recover all of it as of the time of the last BACKUP DATABASE statement.And then you can re-run the statements in MySQL’s binary log to recover “from the time of the last BACKUP DATABASE statement”, “to the time that the database loss occurred”. In other words … Careful Use of Online Backup will protect from database loss.

The word “online” means “without blocking”. So, unlike earlier backup solutions which could make all other jobs hang, the Online Backup statements will allow certain Database Manipulation Language statements to go on concurrently. Some locking still occurs, but it’s minimal.

To complete the package, there are new tools for monitoring and recording each online backup process or subprocess.

BACKUP DATABASE Syntax

BACKUP DATABASE database_name_list TO ‘image-file-name’;

It is possible to use SCHEMA as a synonym for DATABASE, that is, “BACKUP SCHEMA …” instead of “BACKUP DATABASE …”.

The database_name_list may be a list of databases separated by commas, or “*” meaning “all databases except mysql or information_schema”.
There is no predefined maximum number of databases.
It is illegal to use the same database name more than once.
All databases in the list must exist.
The user must have SELECT or equivalent privileges for all objects in all databases.
The use of file names is affected by the security startup option –secure-file-priv
http://dev.mysql.com/doc/refman/6.0/en/server-options.html#option_mysqld_secure-file-priv.

The ‘image-file-name’ is the name of the image file.
This must be specified as a string literal, not as a variable.
Only one file name is allowed.
The path may be absolute, or relative to ‘backupdir’ (see section titled ‘backupdir’).
The file is always on the server, there is no LOCAL option to force the image file to go to a client computer.
There must not already be a file with the same name.
The server must have the ability to create files and write on the directory.

Examples:
BACKUP DATABASE * TO ‘/usr/local/mysql/var/t1′;
BACKUP DATABASE `C#` TO ‘t2′;
BACKUP SCHEMA a,b,c TO ‘d’;

Effects:
* All data and metadata in all listed databases is copied to the image file.
* While BACKUP DATABASE is going on, it is possible to monitor progress with a new mysql table, mysql.online_backup_progress.
* After BACKUP DATABASE completes successfully, there is a record in a new mysql table, mysql.online_backup. (The record is created when the backup starts, is updated as the backup proceeds, and reaches its final state when the backup ends.)

Minor notes: BACKUP DATABASE is classed as a “Data Manipulation Language” statement because it reads.
It does not lock table files in a way that would prevent other jobs from reading and writing, but it does block other jobs from performing any “Database Definition Language” with any object in any database in the list.
Upon completion, BACKUP DATABASE returns a result set with the backup number.
It causes an automatic “commit”.

Terminology notes: MySQL uses the word “image” for the data that BACKUP DATABASE produces. At present the “image” is always a file so ‘image-file’ is the only possibility. This has no relation to the way that the word ‘image’ is used by other DBMS vendors.

RESTORE syntax

RESTORE FROM ‘image-file-name’;

The ‘image-file-name’ is the name of an image file that was made with BACKUP DATABASE. It must be specified as a string literal.

The user must have CREATE or equivalent privileges for all objects in all databases that were backed up.

Examples:
BACKUP DATABASE test TO ‘/usr/local/mysql/var/t3′;
RESTORE FROM ‘/usr/local/mysql/var/t3′;

Effects:
Everything that was backed up with BACKUP DATABASE is restored with RESTORE. At time of writing, existing databases are overwritten. Later, it will be necessary to drop databases before restoring. Upon completion, RESTORE returns a result set with the backup number.

Minor notes: RESTORE is classed as a “Data Definition Language” statement because it alters. It causes some table locking because it will write to each table in each database, so “online restore” does not exist.

backupdir

The ‘backupdir’ i.e. ‘backup directory’ variable is new. You can see its value with “SELECT @@backupdir” but it is read only so you may not set it while the server is up. You may specify –backupdir on the mysqld command line, for example:
mysqld –backupdir=’/’

If the ‘image-file-name’ in BACKUP or RESTORE has a relative path (for example ‘x’ instead of ‘/usr/local/mysql/var/x’), then the DBMS will put the backupdir value before image-file-name. For example, if backupdir is ‘/usr/local/mysql/var’, the statement
RESTORE FROM ‘x’
will open the file ‘/usr/local/mysql/var/x’.

By default, backupdir is the same as datadir, that is, if datadir = ‘/usr/local/mysql/var’ then backupdir = ‘/usr/local/mysql/var’. But it is not good policy to put the database and the backups on the same device! MySQL recommends that you change backupdir if you have another device that backups can go to.

What is backed up

BACKUP DATABASE backups the database and all objects that are within the database:

persistent base tables created with engine = memory, archive, myisam, innodb, or falcon (including indexes, foreign-key constraints, partitions)

and

views, stored procedures, stored functions (except UDFs), triggers, events.

What is not backed up

Some items are not backed up because they are not in databases, or are not persistent, or are not currently supportable by the backup system. MySQL recommends that users take other measures, such as file copy, to ensure that all parts of the system are restorable after RESTORE.

Not-backed-up items are:
Temporary tables
Session objects
Contents of INFORMATION_SCHEMA.GLOBAL_STATUS
Contents of INFORMATION_SCHEMA.GLOBAL_VARIABLES
Contents of INFORMATION_SCHEMA.PROCESSLIST
Contents of INFORMATION_SCHEMA.PROFILING (community only)
Contents of INFORMATION_SCHEMA.SESSION_STATUS
Contents of INFORMATION_SCHEMA.SESSION_VARIABLES
Contents of INFORMATION_SCHEMA.FALCON*
Tablespaces
Plugins
UDF files

Global objects:
- USER
- privileges which are not “per-database”
- PLUGIN (e.g. parser plugin) not including code
- plugin code
- UDF (i.e. function defined in shared libraries) not including code
- UDF code
- LOGFILE GROUP
- TABLESPACE
- SERVER i.e. connections (for federated tables)
- logs (general log, slow query log, storage-engine logs)
- contents of mysql.host
- time zone
- configuration files
- index.xml

Objects outside all databases:
- master.info
- relay-log.info
- Storage engine logs
(Some storage engines have their own log
files. MySQL will not back them up, and
will not restore them. The plan is that
BACKUP will take place ‘as of’ a moment
of validity. Uncommitted data at the
validity-moment time is not restorable.)
- binary log

Static objects:
- Static files that come as part of the MySQL installation,
for example (Windows) mysqld-nt.exe
- User programs
- Operating system shared libraries
- Scripts for starting MySQL server when booting operating system
- Environment initializations for $MYSQL_HOST or similar variables
- Option files (my.cnf, my.ini, mysql-service-agent.ini,
/etc/init.d, /var/lib/mysql-cluster/config.ini)

Plugins:
(The plugins are all the files that can be read in the directory that you see when you say SHOW VARIABLES LIKE ‘plugin_dir’; for example /usr/local/mysql/lib/mysql.)

Files in the basedir ’shared’ directory:
(These files only change if users customize, for example by adding a new collation or translating an error message.
Examples:
/usr/local/mysql/share/mysql/charsets/*.xml
/usr/local/mysql/share/mysql/charsets/*.conf
/usr/local/mysql/share/mysql/english/errmsg.txt)

mysql.online_backup

A new table in the mysql database, mysql.online_backup, has a record of the backups.

It has these columns:

backup_id bigint unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY
This is the ‘Surrogate key for (near) duplicate entries’. This will correspond to the number that is returned by the BACKUP DATABASE statement.

process_id int unsigned NOT NULL
This is ‘The process id that operation ran as’.

binlog_pos int unsigned DEFAULT ‘0′
This is ‘The recorded binlog position of backup/restore’.

binlog_file char(64) DEFAULT NULL
This is ‘The recorded binlog filename at time of backup/restore’.

backup_state enum(’complete’,’starting’,'validity point’,'running’,'error’,'cancel’) NOT NULL
This is the ‘Status of current operation’.

operation enum(’backup’,'restore’) NOT NULL
This is the ‘Type of operation’.

error_num int(11) NOT NULL DEFAULT ‘0′
This is ‘The error from this run 0 == none’.

num_objects int unsigned NOT NULL DEFAULT ‘0′
This is ‘The number of objects in the backup’.

total_bytes bigint unsigned DEFAULT NULL
This is ‘The size of the backup repository in bytes’.

validity_point_time datetime DEFAULT NULL
This is ‘The time of the validity point.’. Times are in the system time zone, which may be UTC. Changes re time zones are happening as we write this.

start_time datetime DEFAULT NULL
This is ‘The date/time of start of operation’.

stop_time datetime DEFAULT NULL
This is ‘The date/time of end of operation’.

host_or_server_name char(30) DEFAULT NULL
This is ‘The server name where operation ran’.

username char(30) DEFAULT NULL
This is ‘The user name who ran the operation’.

backup_file char(100) DEFAULT NULL
This is ‘The name of the file’.

user_comment varchar(200) DEFAULT NULL
This is ‘The comment from user entered at command line’.

command varchar(512) DEFAULT NULL
This is ‘The command used to issue operation’.

engines varchar(100) DEFAULT NULL
This is ‘The name of the storage engines used in the operation’. Soon we’ll change it to ‘drivers’ instead of ‘engines’.

With this table, it’s possible to keep track of what backups exist. You need SELECT privilege to see it.

mysql.online_backup_progress

Another new table in the mysql database, mysql.online_backup_progress, has temporary information while BACKUP or RESTORE is happening.

It has these columns:

backup_id bigint unsigned NOT NULL
This is the ‘Key for online_backup table entries’.

object char(30) NOT NULL
This is ‘The object being operated on’.

start_time datetime DEFAULT NULL
This is ‘The date/time of start of operation’.

stop_time datetime DEFAULT NULL
This is ‘The date/time of end of operation’.

total_bytes bigint DEFAULT NULL
This is ‘The size of the object in bytes’.

progress bigint unsigned DEFAULT NULL
This is ‘The number of bytes processed’.

error_num int NOT NULL DEFAULT ‘0′
This is ‘The error from this run 0 == none’.

notes char(100) DEFAULT NULL
This is the ‘Commentary from the backup engine’.

With this table, it’s possible to monitor progress. Another way to monitor progress is with SHOW PROCESSLIST. You need SELECT privilege to see it.

A session

Let us walk through a session, using all the statements and tools that we have described so far. The following is an actual copy of the mysql client screen output, after clearing the database directory and running mysql_install_db. The server was started with a –log-bin switch.


pgulutzan@linux:~> /usr/local/mysql/bin/mysql --user=root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 6.0.5-alpha-debug-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> /* Create database. Create table. Populate table. */
mysql> CREATE DATABASE dw;
Query OK, 1 row affected (0.05 sec)

mysql> USE dw
Database changed
mysql> CREATE TABLE t (s1 INT, s2 CHAR(5)) engine=falcon;
Query OK, 0 rows affected (0.28 sec)

mysql> INSERT INTO t VALUES (1,'a'),(2,'b'),(3,'c');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> COMMIT;
Query OK, 0 rows affected (0.00 sec)

mysql> /* Backup the database to file 'dw.dmp' on data directory. */
mysql> BACKUP DATABASE dw TO 'dw.dmp';
+-----------+
| backup_id |
+-----------+
| 1         |
+-----------+
1 row in set (2.61 sec)

mysql> /* Pretend to monitor progress. In fact online_backup_progress
   /*>    will show nothing new, and SHOW PROCESSLIST will show nothing,
   /*>    because the BACKUP DATABASE statement has finished.
   /*>    But these statements would show interesting things
   /*>    if we were running them on another connection before
   /*>    BACKUP DATABASE finished. */
mysql> SHOW PROCESSLIST;
+----+------+-----------+------+---------+------+-------+------------------+
| Id | User | Host      | db   | Command | Time | State | Info
|
+----+------+-----------+------+---------+------+-------+------------------+
|  2 | root | localhost | dw   | Query   |    0 | NULL  | SHOW
PROCESSLIST |
+----+------+-----------+------+---------+------+-------+------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM mysql.online_backup_progress\G
*************************** 1. row ***************************
  backup_id: 1
     object: backup kernel
 start_time: NULL
  stop_time: NULL
total_bytes: 0
   progress: 0
  error_num: 0
      notes: starting
*************************** 2. row ***************************
  backup_id: 1
     object: backup kernel
 start_time: NULL
  stop_time: NULL
total_bytes: 0
   progress: 0
  error_num: 0
      notes: running
*************************** 3. row ***************************
  backup_id: 1
     object: backup kernel
 start_time: NULL
  stop_time: NULL
total_bytes: 0
   progress: 0
  error_num: 0
      notes: validity point
*************************** 4. row ***************************
  backup_id: 1
     object: backup kernel
 start_time: NULL
  stop_time: NULL
total_bytes: 0
   progress: 0
  error_num: 0
      notes: running
*************************** 5. row ***************************
  backup_id: 1
     object: backup kernel
 start_time: NULL
  stop_time: NULL
total_bytes: 0
   progress: 0
  error_num: 0
      notes: complete
5 rows in set (0.00 sec)

mysql> /* "Lose" the database. */
mysql> DROP DATABASE dw;
Query OK, 1 row affected (0.00 sec)

mysql> /* Restore the database from the image file 'x.dmp'. */
mysql> RESTORE FROM 'dw.dmp';
+-----------+
| backup_id |
+-----------+
| 2         |
+-----------+
1 row in set (1.77 sec)

mysql> /* Confirm that table t in databasee dw is restored. */
mysql> SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES
    ->   WHERE table_schema = 'dw' AND table_name = 't';
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
1 row in set (0.03 sec)

mysql> /* Confirm that table t has the data that it should have. */
mysql> SELECT * FROM dw.t;
+------+------+
| s1   | s2   |
+------+------+
|    1 | a    |
|    2 | b    |
|    3 | c    |
+------+------+
3 rows in set (0.00 sec)

mysql> /* See what it is in the new mysql.online_backup table. */
mysql> SELECT * FROM mysql.online_backup\G
*************************** 1. row ***************************
          backup_id: 1
         process_id: 0
         binlog_pos: 520
        binlog_file: ./linux-bin.000001
       backup_state: complete
          operation: backup
          error_num: 0
        num_objects: 1
        total_bytes: 24
validity_point_time: 2008-04-10 19:12:00
         start_time: 2008-04-10 19:12:00
          stop_time: 2008-04-10 19:12:00
host_or_server_name: localhost
           username: root
        backup_file: dw.dmp
       user_comment:
            command: BACKUP DATABASE dw TO 'dw.dmp'
            engines: Snapshot
*************************** 2. row ***************************
          backup_id: 2
         process_id: 0
         binlog_pos: 0
        binlog_file: NULL
       backup_state: complete
          operation: restore
          error_num: 0
        num_objects: 1
        total_bytes: 24
validity_point_time: NULL
         start_time: 2008-04-10 19:13:17
          stop_time: 2008-04-10 19:13:17
host_or_server_name: localhost
           username: root
        backup_file: dw.dmp
       user_comment:
            command: RESTORE FROM 'dw.dmp'
            engines: Snapshot
2 rows in set (0.00 sec)

mysql>

Consistent Read

WARNING: This section contains information about the Falcon engine which is not true at time of writing. Since it is an essential part of what makes online backup “online”, we include it anyway.

Suppose user#1 says “BACKUP DATABASE dw …” and then user#2 comes along and says “UPDATE dw.t …”. What happens?

With a “non-online backup”, we would expect that the UPDATE has to wait until the BACKUP DATABASE finishes (the database’s tables are locked), or we would expect that the image file has some rows from table that were changed, and some that were not — an “inconsistent backup”. Both these expectations are gloomy.

With an “online backup”, on the other hand, we would expect that the UPDATE does not have to wait, and yet the backup is guaranteed to be consistent. This depends on a feature that both InnoDB and Falcon storage engines provide: “consistent read”.

By asking for “consistent read” (or “snapshot”), BACKUP DATABASE ensures that everything the DBMS reads, in all tables, in all databases in the database list, is seen “as of” the same moment. Changes that happen during BACKUP DATABASE are ignored. With the InnoDB and Falcon engines, this happens due to their support for consistent read. With MyISAM and MEMORY engines, this cannot happen, so if there are tables created with those storage engines, any UPDATE or INSERT or REPLACE or DELETE or TRUNCATE statements are blocked during copy of those tables. For example:
* Suppose database d contains three tables: t1,t2,t3.
Table t1 is Falcon, t2 is MyISAM, t3 is MEMORY.
* User says “BACKUP DATABASE d;”.
* The DBMS does not need to lock t1 which has consistent read.
The DBMS locks t2, copies all its rows, and unlocks t2.
The DBMS locks t3, copies all its rows, and unlocks t3.

In any case, any DDL (CREATE/ALTER/DROP) statements are always blocked. The word “online” only means that DML statements can take place at the same time as the backup.

A plan

An effective backup plan would then look something like this:

1. Always start the mysqld server with –log-bin, so that there will be a log file. It’s this that allows for backups with log updates, which MySQL calls “incremental backups”. (Incremental backups are not planned for the beta version and may be available as enterprise features.)

2. Regularly back up with BACKUP DATABASE. Monitor progress with mysql.online_backup_progress table or SHOW PROCESSLIST.

3. Regularly audit that backups have happened, by looking at the mysql.online_backup table.

4. Copy image files to safe places.

Occasionally you should keep in shape by testing the RESTORE procedure. It could go like this — on a spare machine or with a set of test databases:

1. Re-read this document. You must be aware that this test can be destructive

2. Collect: your image file, and a copy of the binlog file(s).

3. Read the mysql.online_backup row that corresponds to this image file, with
SELECT * FROM mysql.online_backup
WHERE operation = ‘backup’
AND backup_file = ‘dw.dmp’; /* if ‘dw.dmp’ = image-file-name*/
Make sure that column `backup_state` says ‘complete’.
Make sure that column `command` has a list of the databases that you want to restore.
Make a note of the `binlog_pos` and `binlog_file` column values, which you will need for Step 8.

4. Drop the databases that were in the BACKUP DATABASE statement.

5. Stop the mysqld server. Restart with only one user (max-connections = 1, skip-networking).
Log in as a user with all the privileges needed for restoring (BACKUP, FILE, etc.).

6. RESTORE FROM ‘image-file’;

7. Re-run the binary log from the time of the destruction.
Use mysqlbinlog –start-position=(what you got in Step 3)

It might be possible to restore to a different place by changing @@datadir, but nobody has tried that yet.

Replacing mysqldump

The image file is also useful for copying databases from one server to another, as in mysqldump.

However, MySQL is not deprecating mysqldump and we continue to recommend it for basic import/export. In that environment, the advantages of mysqldump over Online Backup are:
It is readable by humans.
It is possible to be specific about the objects to backup/restore.
It might be faster — we haven’t done many speed tests yet.

New error messages

The following error messages are new.

ER_BACKUP_BACKUP_START 1617 HY000 Starting backup process
ER_BACKUP_BACKUP_DONE 1618 HY000 Backup completed
ER_BACKUP_RESTORE_START 1619 HY000 Starting restore process
ER_BACKUP_RESTORE_DONE 1620 HY000 Restore completed
ER_BACKUP_NOTHING_TO_BACKUP 1621 HY000 Nothing to backup
ER_BACKUP_CANNOT_INCLUDE_DB 1622 HY000 Database ‘%-.64s’ cannot be included in a backup
ER_BACKUP_BACKUP 1623 HY000 Error during backup operation - server’s error log contains more information about the error
ER_BACKUP_RESTORE 1624 HY000 Error during restore operation - server’s error log contains more information about the error
ER_BACKUP_RUNNING 1625 HY000 Can’t execute this command because another BACKUP/RESTORE operation is in progress
ER_BACKUP_BACKUP_PREPARE 1626 HY000 Error when preparing for backup operation
ER_BACKUP_RESTORE_PREPARE 1627 HY000 Error when preparing for restore operation
ER_BACKUP_INVALID_LOC 1628 HY000 Invalid backup location ‘%-.64s’
ER_BACKUP_READ_LOC 1629 HY000 Can’t read backup location ‘%-.64s’
ER_BACKUP_WRITE_LOC 1630 HY000 Can’t write to backup location ‘%-.64s’
ER_BACKUP_LIST_DBS 1631 HY000 Can’t enumerate server databases
ER_BACKUP_LIST_TABLES 1632 HY000 Can’t enumerate server tables
ER_BACKUP_LIST_DB_TABLES 1633 HY000 Can’t enumerate tables in database %-.64s
ER_BACKUP_SKIP_VIEW 1634 HY000 Skipping view %-.64s in database %-.64s
ER_BACKUP_NO_ENGINE 1635 HY000 Skipping table %-.64s since it has no valid storage engine
ER_BACKUP_TABLE_OPEN 1636 HY000 Can’t open table %-.64s
ER_BACKUP_READ_HEADER 1637 HY000 Can’t read backup archive preamble
ER_BACKUP_WRITE_HEADER 1638 HY000 Can’t write backup archive preamble
ER_BACKUP_NO_BACKUP_DRIVER 1639 HY000 Can’t find backup driver for table %-.64s
ER_BACKUP_NOT_ACCEPTED 1640 HY000 %-.64s backup driver was selected for table %-.64s but it rejects to handle this table
ER_BACKUP_CREATE_BACKUP_DRIVER 1641 HY000 Can’t create %-.64s backup driver
ER_BACKUP_CREATE_RESTORE_DRIVER 1642 HY000 Can’t create %-.64s restore driver
ER_BACKUP_TOO_MANY_IMAGES 1643 HY000 Found %d images in backup archive but maximum %d are supported
ER_BACKUP_WRITE_META 1644 HY000 Error when saving meta-data of %-.64s
ER_BACKUP_READ_META 1645 HY000 Error when reading meta-data list
ER_BACKUP_CREATE_META 1646 HY000 Can’t create %-.64s
ER_BACKUP_GET_BUF 1647 HY000 Can’t allocate buffer for image data transfer
ER_BACKUP_WRITE_DATA 1648 HY000 Error when writing %-.64s backup image data (for table #%d)
ER_BACKUP_READ_DATA 1649 HY000 Error when reading data from backup stream
ER_BACKUP_NEXT_CHUNK 1650 HY000 Can’t go to the next chunk in backup stream
ER_BACKUP_INIT_BACKUP_DRIVER 1651 HY000 Can’t initialize %-.64s backup driver
ER_BACKUP_INIT_RESTORE_DRIVER 1652 HY000 Can’t initialize %-.64s restore driver
ER_BACKUP_STOP_BACKUP_DRIVER 1653 HY000 Can’t shut down %-.64s backup driver
ER_BACKUP_STOP_RESTORE_DRIVERS 1654 HY000 Can’t shut down %-.64s backup driver(s)
ER_BACKUP_PREPARE_DRIVER 1655 HY000 %-.64s backup driver can’t prepare for synchronization
ER_BACKUP_CREATE_VP 1656 HY000 %-.64s backup driver can’t create its image validity point
ER_BACKUP_UNLOCK_DRIVER 1657 HY000 Can’t unlock %-.64s backup driver after creating the validity point
ER_BACKUP_CANCEL_BACKUP 1658 HY000 %-.64s backup driver can’t cancel its backup operation
ER_BACKUP_CANCEL_RESTORE 1659 HY000 %-.64s restore driver can’t cancel its restore operation
ER_BACKUP_GET_DATA 1660 HY000 Error when polling %-.64s backup driver for its image data
ER_BACKUP_SEND_DATA 1661 HY000 Error when sending image data (for table #%d) to %-.64s restore driver
ER_BACKUP_SEND_DATA_RETRY 1662 HY000 After %d attempts %-.64s restore driver still can’t accept next block of data
ER_BACKUP_OPEN_TABLES 1663 HY000 Open and lock tables failed in %-.64s
ER_BACKUP_THREAD_INIT 1664 HY000 Backup driver’s table locking thread can not be initialized
ER_BACKUP_PROGRESS_TABLES 1665 HY000 Can’t open the online backup progress tables

Known Bugs

This list excludes bugs which are ‘patch pending’ or ‘patch queued’. To be sure you get an up-to-date list, check the MySQL bugs database http://bugs.mysql.com and click Category = Backup.

32499 BACKUP DATABASE crashes server if database has 600 tables (any engine)
33019 Online backup process can consume too much processor time
33022 Online backup does not preserve charsets for object names
33173 Error log polluted by backup debug info
33352 Backup: crash if I use old set of mysql files.
33354 Backup: restore changes current database to null
33355 Backup: hang if database is mysql
33364 Online Backup: Purge statement missing
33414 Backup: DDL hangs indefinitely if ongoing backup
33564 Backup: deadlock during restore
33566 Backup: crash with partitions and Falcon
33567 Backup: duplicate database names allowed
33568 Backup: crash if many databases in list
33569 Backup: tablespace not restored
33570 Backup: restore failure if sql_mode change
33571 Backup: restore failure if sql_mode=ansi
33573 Backup: timestamp column gets current default
33574 Backup: restore failure if temporary table exists
33575 Backup: crash on shutdown if Falcon table
33836 Backup: tables have default character set latin1
34065 Backup: file names may be too short
34067 Backup: restore failure if directory is not empty
34171 Backup: ignoring –secure-file-priv
34180 Backup: restore fails if CREATE EVENT fails
34181 Backup: crash if @@max_heap_table_size is small
34202 Backup: RESTORE turns off the slow log
34205 Backup: restore fails if Falcon and no autocommit
34210 Backup: lost InnoDB rows if concurrent restore
34480 Backup: Backup to a file copies redundantly if database name is repeated.
34721 Backup driver can’t refuse to provide driver
34758 Server crashes if database with views backed up using CS driver
34759 Server crash when backing-up database with trigger
34766 No error message for Full disk while performing network or local backup
34767 Existing DB contents are lost, if I attempt to restore a corrupted backup file.
34867 Backup: crash if altered view
34868 Backup: restore failure if two procedures
34902 Backup: crash if view depends on dropped object
34903 Backup: restore failure if view and temporary
34915 Backup: Not able to backup view.
35079 It is not possible to interrupt long BACKUP or RESTORE operation
35117 Backup: Server crash for backup of CSV engine with CHAR data type
35118 Backup:Restore fails if previous create Procedure/Function/Trigger command
35230 Backup: no backupdir
35240 RESTORE command doesn’t write binlog position to online_backup table
35249 Mysql server crash for delete operation followed by backup in memory SE
35347 Mysql Server crash while doing restore with views for default driver
35585 Online backup: Apparent synonyms for binlog start_position
35595 Backup crash server
35659 Falcon crashes on shutdown after backup+restore+drop DB
35698 Trying to backup a Falcon database leads to assertion
35806 The current system time becomes stagnant after Restore operation is complete.
35951 Backup: restore fails after directory removal
35967 Backup: can’t delete from mysql.online_backup_progress
36213 Restore fails for a database that has views created using another database .
36257 Backup project missing dependencies
36265 Views not backed up if they are dependent on tables from different Database
36480 Backup: restore fails with changed Falcon tablespace
36530 Backup: Restore can fail due to enabled triggers, events
36531 Backup: Objects can be restored in wrong order
36533 Backup and replication don’t work together
36571 Restore hangs for database size of 33K and more
36572 During Restore, select operation from another connection hangs.
36585 Trigger hangs, if fired on database being restored.
36586 Online backup stream library can miss end of a stream.
36624 Backup stream library works wrong on 64bit platform
36635 Backup: errors should be warnings
36664 Server crash in backup_no_be test on Solaris
36749 Data Loss after Restore, if Trigger fired on the table that is being Restored.
36778 Read operation during RESTORE leads to data loss
36782 View and its related table data lost after Restore.
Comparison with Oracle

Some people, especially migrators, will wonder how MySQL’s “Online Backup” compares with Oracle 11g. This chart shows the essentials. Notice that we compare, not with Oracle Recovery Manager (RMAN), but with Oracle Data Pump export (expdp).

We do this because our backup facility has two drivers that use a table scan algorithm to read the rows thereby making backups taken with the default and consistent snapshot drivers a ‘logical’ backup. As native drivers that use a binary copy algorithm are added, our backup facility will give the user more freedom to choose the type of backup to use. However, for this release, only ‘logical’ backups are available.

              Oracle 11g                    MySQL 6.0
              Data Pump                     "Online Backup"
              ==========                    ===============

Type          "logical database backup"     "logical database backup"
                                            with "physical" aspects

File is       on server?                    on server

Monitoring    "Status" after ctrl+c         tables in mysql database
              or DBA_DATAPUMP_JOBS view     or SHOW PROCESSLIST
              or V$SESSION_LONGOPS view

Preparable    Yes, DBMS_DATAPUMP API        no

Metadata      yes                           database-level metadata
                                            is saved except where noted

Compression   yes, in Enterprise Edition    no

Encryption    yes, in Enterprise Edition    no

Parallel      yes, in Enterprise Edition    no

Data Transforms yes (remap to obfuscate)    no

XML           "supported"                   sure

By Partition  You can dump/restore single   no
              partitions.

Re-Use        Yes, you can overwrite        yes, unavoidably

Exclude/Include Yes, you can skip or add    no
                specified objects

Diff OSs        Yes, you can change to a    Yes, but mixed
                different operating system. endianness has
                                            not been tested

Stop+Restart    Yes, you can stop an export No
                and resume it later.

Methods         File Copy and other ways.   File Copy.

Privileges      CREATE TABLE privilege (for See section titled
                master table), and access   "Coming Soon".
                to a directory object.

Log File        Optional.                   Always.

Remap           Yes e.g. change names of    No.
                tables during re-import.

Flashback       Yes, to a particular SCN.   Yes, "as of" start time.

Read Access     Yes, read-without-RESTORE   No.
                is possible by creating an
                external table.

Example       "expdp                        "BACKUP
               system/password@db10g        DATABASE
               full=Y                       TO
               directory=TEST_DIR           'DB10G.dmp'"
               dumpfile=DB10G.dmp
               logfile=expdpDB10G.log"

Feature#1      Multiple Image Files e.g.    No.
               "DUMPFILE=x1.dmp, x2.dmp".

Feature#2      "ESTIMATE_ONLY"              No.

Feature#3      Picking what to dump or      No.
               restore:
               "EXCLUDE=TABLE:EMP"
               "INCLUDE=TABLE_DATA"
               ALL|DATA_ONLY|METADATA_ONLY
               QUERY='...' predicate clause
               TABLES=...
               TABLESPACES=...

Feature#4      FLASHBACK_SCN|FLASHBACK_TIME No.

Feature#5      parameter file, PARFILE=xxx  No.

Feature#6      "version of objects to       No.
               export" =
               COMPATIBLE | LATEST | or
               "any database version"

Replication

What about Backup and Replication working together? This section contains only tentative suggestions, it is not useful for reference.

If BACKUP DATABASE is done on the master:
it will not be logged.

If RESTORE is done on a master:
it will not be logged.
In future version, RESTORE statement will (possibly) be logged together with the backup image provided that the user uses a special option.

If BACKUP DATABASE is done on a master:
Backup will not be synchronized with replication position. This means that it is hard to know what position one should use on the master after doing a restore. In later versions, the slave position will be synched with the backup image and thus the restore will set the replication state correctly so that it may continue replication. See MySQL’s WL#4105 worklog task description.

If RESTORE is done on a slave:
Replication shall not be allowed to execute while doing restore.

BACKUP DATABASE on a master does not affect a slave.

RESTORE on master will not affect slaves if no slaves are allowed during restore — that’s MySQL’s recommendation.

After RESTORE, the step of ‘replaying the binlog’ should normally be with binlogging off.

At the time of this writing, the work for synchronization with replication is being completed, for the first quarter of 2008 (using Sun’s system of quarters). Some of the strategies being considered include:

The BACKUP DATABASE statement is:
() replicated and the slave also makes an image file
() replicated but the slave does not make an image file
() not replicated

The RESTORE statement is:
() replicated and the slave also restores from an image file
() not replicated
() not even noted in bin log

The mysql database

There is a database named mysql which contains descriptions of objects and persistent information about all other databases. You cannot backup this database –
BACKUP DATABASE mysql TO ‘file’; /* This causes an error */
BACKUP DATABASE * TO ‘file’; /* This doesn’t back up mysql */

Therefore a backup image is not complete protection against a media crash, because the tables in mysql are necessary for reading tables in other databases. Therefore database administrators must take additional precautions with mysql database tables.

One possible set of additional precautions is:
1. LOCK TABLES for all tables in the mysql database.
2. Copy (with SQL or with operating system) all contents of mysql tables to a file, for example ‘mysql-image.bak’.
3. UNLOCK TABLES.
4. Preserve ‘mysql-image.bak’ along with the latest backup image.
Then, after a media crash and a RESTORE, a person with root privileges can add the data back with INSERT statements, using the data in the mysql-image.bak file for inspiration.

At time of writing the items in this category are:

Users and global-level privileges
Everything that’s in the table mysql.user.

Plugins
Everything that’s in the table mysql.plugin.
This does not include code. The code is in a different category, ‘Belonging but not exclusively belonging’.

Connections for federated tables
Everything that’s in the table mysql.servers.

Help
Everything that’s in the table help_category.
Everything that’s in the table help_keyword.
Everything that’s in the table help_relation.
Everything that’s in the table help_topic.

Time zone
Everything that’s in the table time_zone.
Everything that’s in the table time_zone_leap_second.
Everything that’s in the table time_zone_name.
Everything that’s in the table time_zone_transition.
Everything that’s in the table time_zone_transition_type.

Logs
general_log
slow_log

UDFs
LOGFILE GROUPS

mysql.host
mysql.db
mysql.host
mysql.ndb_binlog_index
mysql.online_backup

Known restrictions

Deliberate restrictions which will not change:
1. It is not possible to have BACKUP DATABASE or RESTORE in a function, stored procedure, event, or trigger.
2. It is not possible to PREPARE a ‘BACKUP DATABASE’ statement or a ‘RESTORE’ statement.
3. It is not possible to back up the mysql database (see section “The mysql database”) or the information_schema database.

Accidental restrictions which may someday change:
1. The maximum backup length may be restricted due to the maximum file length with some file systems. For example, file sizes may be limited for 4GB for the file system in certain releases of the Microsoft Windows operating system.

Coming Soon

Online Backup is not a completed feature. This document has only said what is actually here today. By the time that the the feature is “beta”, you should also see:

Syntax for BACKUP DATABASE statement changes:

1. There is an additional optional BACKUP DATABASE clause: [ COMMENT 'string' ]. For example:
BACKUP DATABASE d TO ‘file’ COMMENT ‘x’;
This is useful for putting a ‘tag’ on an image file, and for describing the contents.

2. There is an additional optional SKIP DEPENDENCY CHECKING clause: [ SKIP DEPENDENCY CHECKING ]. For example:
BACKUP DATABASE d to ‘file’ SKIP DEPENDENCY CHECKING;
This clause has the following effect …
If database X refers to database Y (for example because there is a view in X that selects from Y), and you are backing up database X but you are not backing up database Y, and MySQL determines that the dependency is serious enough, then you will see this error:
xxxx Database %s contains a reference to database %s
But if SKIP DEPENDENCY CHECKING is true, you will not see an error. Instead you will see this warning:
yyyy Warning: Database %s contains a reference to database %s

3. Privilege requirements for BACKUP DATABASE will be:
BACKUP, a new global privilege
FILE
SELECT or equivalent privilege on all objects in database.

4. Privilege requirements for RESTORE will be:
RESTORE, a new global privilege
FILE
CREATE USER
Further privileges depending on what you are restoring:
database: CREATE, DROP, GRANT OPTION, EVENT
table: CREATE, ALTER, DELETE, DROP, GRANT OPTION, INSERT, SELECT, UPDATE, TRIGGER
index: CREATE
routine: ALTER ROUTINE, CREATE ROUTINE, EXECUTE
view: CREATE VIEW, SHOW VIEW
You may ask: how can I be expected to have a privilege on table t if I haven’t yet restored table t’s database? Answer: your privileges are part of what is restored. Thus for RESTORE which restores database1, I need all the “databases” privileges on database1, I need all the “tables” privileges on every table in database1, I need all the “stored routines” privileges on every function in database1, I need all the “views” privileges on every view in database1, I need FILE privilege always. I need CREATE USER always.

5. Privilege checking must happen before the first database is backed up or restored. This will ensure that, if there is a privilege problem, you will find out at the start, before any data is backed up or restored.

6. A “native driver” for MyISAM will reduce blocking when a MyISAM table is being backed up.

Enterprise

Users of MySQL “Enterprise” will receive “native drivers” for major MySQL storage engines including Falcon. This will reduce blocking and put the “online” into “online backup for Falcon”.

There was talk at one time about arranging for some advance features (compression, encryption) to be deleivered only as part of the MySQL Enterprise subscription. That will not be the case, according to the current plan, as far as we know. But we’re writing the technical article not the explanation of how GPL works.

Further reading

MySQL wiki “OnlineBackup”

Zmanda blog “MySQL Online Backup”

public worklog tasks containing “backup” in title/description

Slides from presentation at MySQL User Conference 2008

Posted in MySQL 6.0 New Features | No Comments »

MySQL 6.0 Feature #1: Supplementary Characters

Sunday, April 20th, 2008

Supplementary Characters
MySQL 6.0 New Features Document#1
2008-04-20

MySQL 6.0.5 is about to appear on the “MySQL 6.0 Downloads” page http://dev.mysql.com/downloads/mysql/6.0.html

This is MySQL’s alpha version, not to be confused with MySQL 5.1, which has been ‘release candidate’ for a while. The big 6.0 features are “online backup”, “Falcon”, and “supplementary characters”. This document is about the new supplementary characters feature.

What

Since MySQL 4.1, MySQL has supported two Unicode character sets, ucs2 and utf8. Luckily we don’t have to explain all about “Unicode” because there’s good definition and explanation. So if you need help with Unicode in general, read here:
http://www.unicode.org (Unicode reference)
or here:
http://dev.mysql.com/doc/refman/5.1/en/charset-unicode.html (MySQL manual)
We can concentrate in this document on just those aspects that affect the MySQL 6.0 implementation.

Specifically MySQL has supported in the past the characters of Unicode Version 3.0, in the Basic Multilingual Plane (BMP). The BMP characters have these characteristics:
* their code values are between 0 and 65535 (or U+0000 … U+FFFF)
* they can be encoded with a fixed 16-bit word, as in ucs2
* they can be encoded with variable 8, 16, or 24 bits, as in utf8
* they’re good enough for almost all characters in major languages.

So almost all MySQL users are almost always happy, and that’s no longer good enough. In the latest Unicode version 5.0, there are 99089 defined characters. They can’t all fit in the BMP. Characters that aren’t in the BMP are called “Supplementary Characters”. Most of the new characters are rare ideographs in Chinese or Japanese, some others are for languages that weren’t in Unicode before.

To support supplementary characters, MySQL has to introduce new character sets which have a broader range than, and therefore can take more space than, the old ucs2 and utf8.

The task description for supplementary characters went up on the MySQL Forge site in January 2007, with the title
WL#1213: Implement 4-byte UTF8, UTF16 and UTF32
http://forge.mysql.com/worklog/task.php?id=1213
(MySQL Forge is where MySQL occasionally publishes specifications for public review. It shows some but not all of the “Work Log” tasks that MySQL may or may not be introducing in coming versions.) The “Architecture Review” stage was passed in July. The coding is now complete except for fixing the inevitable bugs.

Comparing old and new versions, see these major differences:

MySQL 5.1                                MySQL 6.0

---------                                ---------

all Unicode 3.0 characters               all Unicode 5.0 characters

no supplementary characters              with supplementary characters

ucs2 character set                       no change

utf8 character set for up to 3 bytes     utf8 character set for up to 4 bytes

                                         utf16 character set

                                         utf32 character set

                                         utf8mb3 character set for up to 3 bytes

The changes are upward compatible, except that the maximum length of a utf8 column or index is, in terms of characters, smaller.

Why

MySQL always pays attention to Unicode improvements, so it was inevitable that this change would occur eventually. The reason that it has to happen now is: Microsoft Windows Vista. Vista supports supplementary characters for display, and the Windows API has the necessary functionality for manipulating them.

In particular, Vista allows all the characters defined by JIS X 0213:2004 (”Japanese Industrial Standard 0213 as revised in 2004″), which is often referred to as “JIS:2004″. The new characters of JIS:2004, although rare, might appear in Japanese texts created in the future. Some of the JIS:2004 characters are in the BMP, but many are supplementary. For a quick introduction to JIS:2004 considerations, see
http://en.wikipedia.org/wiki/JIS_X_0213
paying especial attention to the link about “Mapping Tables between JIS X 0213 and Unicode”
http://x0213.org/codetable/index.en.html
Similar considerations apply for the Chinese Industrial Standard, gb18030, which supersedes gb2312 and gbk. See
http://en.wikipedia.org/wiki/GB18030
It’s important to emphasize here that the MySQL 6.0 “Supplementary Characters” feature is adding support for more Japanese and Chinese characters, but not adding support for new Japanese and Chinese character sets. The new characters will only exist in MySQL’s Unicode character sets. Getting into Unicode is an essential first step.

Particularly, some users objected to this behaviour:
User defines a MySQL table with a UTF8 column.
User enters a string containing two characters: an invalid supplementary character, and a valid BMP character.
MySQL does not recognize the supplementary-character code.
But MySQL doesn’t know where the ‘invalid’ character ends. (UTF8 characters are variable length.)
So MySQL skips the rest of the string too.
The second character, the valid BMP character, is lost.
See Bug#9337 and Bug#12600 in MySQL’s bugs database, bugs.mysql.com. Users had to avoid trying to insert supplementary characters, or redefine the column as a BLOB.

Apart from East Asian language considerations, MySQL’s new feature will also interest users of languages or writing systems which were not in Unicode 3.0, such as: Gothic, Tagalog, Glagolitic, Runic, Balinese, and Cuneiform. See also
http://babelstone.blogspot.com/2005/11/whats-new-in-unicode-50.html
Sorry, folks, no Egyptian hieroglyphs until Unicode 5.2 arrives.

How Much

There is a Unicode page which shows when characters first got Unicode values
http://www.unicode.org/Public/UNIDATA/DerivedAge.txt
By counting the items in “Newly assigned” lists on this page, we see that there are 2816 BMP characters which are “new” since version 3.0 and 47046 supplementary characters which are “new” since version 3.0. (Prior to Unicode 3.0, there were no supplementary characters.)

                      BMP        Supplementary Total

                      ---------  ------------- -----

In 3.1: March 2001       2               44976 44978

In 3.2: March 2002    1014                   2  1016

In 4.0: April 2003     449                 777  1226

In 4.1: March 2005    1011                 262  1273

In 5.0: July 2006      340                1029  1369

                      ----               ----- -----

                      2816               47046 49862

Most of the “new BMP” characters are already okay in MySQL 5.1, and all of them will be okay in all Unicode character sets in MySQL 6.0.

None of the “new supplementary” characters are okay in MySQL 5.1, but all of them will be okay in most of the Unicode character sets in MySQL 6.0.

Relating the Unicode Standard to the SQL Standard

MySQL’s goal is to “follow” the 2003 version of the SQL standard (SQL:2003), like the MySQL Reference Manual says. But the next version of the SQL standard, tentatively known as SQL:2008, is already circulating in draft form and is unlikely to change much between now and its introduction, which is imminent.

SQL:2008 refers many times to the Unicode standard, and depends on it for definitions, repertoires, and names. It does not name the Unicode version that vendors will have to support, but does refer to the “Latest Unicode Version”, which means it refers to Unicode 5.0
http://www.unicode.org/versions/enumeratedversions.html

A subtle point is that Unicode has an “encoding” named “UTF-8″, while SQL:2008 has a “character set” (as well as an encoding) named “UTF8″. MySQL of course follows the SQL standard for definitions and names. In documentation, we try to remember to put the “-” in when we specifically mean Unicode, and not Unicode as seen through SQL lenses.

The ucs2 character set is not in SQL:2008.

The utf16 and utf32 and utf8 character sets are in SQL:2008, and correspond of course to Unicode encodings UTF-16, UTF-32, and UTF-8.

Old Character set ucs2

The ucs2 character set does not change in essence.

There are some Unicode-5.0 characters which are inside the BMP, which MySQL will support with ucs2. But ucs2 remains what it always has been: a fixed 16-bit encoding for Unicode BMP characters.

New Character set utf16

The utf16 character set is the ucs2 character set, with an extension that’s necessary for supplementary characters.

For a BMP character, utf16 and ucs2 have identical storage characteristics — same code values, same encoding, same length.

For a supplementary character, utf16 has a special sequence, reminiscent of an ‘escape sequence’, for representing the character using 32 bits. This is called the “surrogate” mechanism. Briefly it works like this:
for a number greater than 0xffff, take 10 bits and add them to 0xd800 and put them in the first 16-bit word, take 10 more bits and add them to 0xdc00 and put them in the next 16-bit word.
So all supplementary characters require 32 bits in all, and the first 16 bits are a number between 0xd800 and 0xdbff, and the last 16 bits are a number between 0xdc00 and 0xdfff. Examples are in “15.5 Surrogates Area” in the Unicode 4.0 document
http://unicode.org/versions/Unicode4.0.0/ch15.pdf

For example, suppose you have the three supplementary characters illustrated in Figure 1.


Supplementary

As the figure shows, 100CC becomes D800 DCCC, and so on. You can confirm this by doing the calculation as we have described, or by using a handy online converter which lets you input code points and get UTF16 or UTF8 values:
http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=ff9d&mode=hex

Because utf16 supports surrogates and ucs2 doesn’t, there is a validity check which applies only in utf16: you can’t insert a top surrogate without a bottom surrogate, or vice versa. For example:

INSERT INTO t (ucs2_column) VALUES (0xd800); /* legal */INSERT INTO t (utf16_column)VALUES (0xd800); /* illegal */

There is no validity check for characters which are technically valid but are not true Unicode (that is, Unicode considers them to be ‘unassigned code points’ or ‘private use’ characters or even ‘illegals’ like 0xffff). For example, since U+F8FF is the Apple Logo:

INSERT INTO t (utf16_column)VALUES (0xf8ff); /* legal */

Don’t expect such characters to mean the same thing to everyone.

Since MySQL must allow for the worst case — that 1 character requires 4 bytes — the maximum length of a utf16 column or index is only one half of the maximum length for a ucs2 column or index. For example, since at the time of writing the maximum length of a Falcon index key is 1100 bytes, this is legal:

CREATE TABLE tf (s1 VARCHAR(550) CHARACTER SET ucs2) engine=falcon;

CREATE INDEX i ON tf (s1);

CREATE TABLE tg (s1 VARCHAR(275) CHARACTER SET utf16) engine=falcon;

CREATE INDEX i ON tg (s1);

You cannot use utf16 as a client character set.

Revised character set utf8

We changed the characteristics of the utf8 character set, from (maximum 3 bytes per character) to (maximum 4 bytes per character). We had reasons:
* utf8 is the ‘SQL:2003 standard’ name for the full-repertoiree character set
* because the new is ‘upward compatible’ with the old, with respect to repertoire, switching to MySQL 6.0 will involve no conversion of existing data or definitions, unless columns are very long.

Let’s not minimize that subordinate clause “unless columns are very long”. The catch, as with utf16 versus ucs2, is that the maximum length of a column or index key is unchanged in terms of bytes. Therefore it is smaller in terms of characters, because the maximum length of a character is 4 bytes instead of 3. The three things to watch for are:
(1) look at all application programs, and make sure that buffer lengths aren’t calculated with an algorithm like “number of characters times 3″
(2) look at all indexes on utf8 columns, and make sure they won’t exceed the maximum length for the storage engine. Sometimes the maximum can change due to storage engine enhancements.
(3) look at the definitions of utf8 columns, and make sure they won’t exceed the maximum length for the storage engine.
(4) do the same for TEXT as well as CHAR or VARCHAR.
The MySQL Reference Manual will call this an INCOMPATIBLE CHANGE.

If the above situations apply for you, then you will have to reduce the defined length, or you will have to change the column definition from “CHARACTER SET utf8″ to “CHARACTER SET utf8mb3″ (see utf8mb3 description in next section). Other than that, we currently expect that the transition from 5.1 to 6.0 will be painless.

For a BMP character, utf8 (5.1) and utf8 (6.0) have identical storage characteristics — same code values, same encoding, same length.

For a supplementary character, utf8 (5.1) can’t store the character at all, while utf8 (6.0) requires 4 bytes to store. Since utf8 (5.1) can’t store the character at all, you don’t have any supplementary characters in UTF8 columns in your 5.1 database, so you don’t have to worry about converting characters or losing data.

Renamed character set utf8mb3

For the few cases where it’s desirable to have complete compatibility with the old utf8 character set, you can define with “CHARACTER SET utf8mb3″, that is, “utf8 with multibyte 3″, with the maximum number of bytes = 3. The utf8mb3 character set has these characteristics:
* no support for supplementary characters
* maximum 3 bytes per character
In other words, version-6.0 utf8mb3 has exactly the same characteristics as version-5.1 utf8. We expect that utf8mb3 will only be needed when updating from MySQL version 5.1, as described in the previous section. In a few years we’ll probably deprecate utf8mb3.

We know that there will be some people who say: “MySQL should allow utf8mb3 as an alias for utf8 in MySQL 5.1, to make replication easier.” We won’t accommodate those people; sorry.

Exactly the same set of characters is available in utf8mb3 as in ucs2, that is, they have the same repertoire.

New character set utf32

The utf32 character set is fixed length (like ucs2), but allows 32 bits for every character (unlike ucs2 which uses 16 bits for every character).

Probably most people will avoid utf32 for space reasons. But it does have two advantages: it’s predictable for storage (number of bytes = number of characters times 4), and there are no tricks in encoding so the stored value = the code value.

To show how the latter advantage is useful, here’s a trick that shows how a user can figure out the utf8 value given the code value.

/* Assume code value = 100cc LINEAR B CHARIOT WHEEL */

CREATE TABLE tmp (utf32 CHAR(1) CHARACTER SET utf32,utf8 CHAR(1) CHARACTER SET utf8);

INSERT INTO tmp VALUES (0x000100cc,NULL);

UPDATE tmp SET utf8 = utf32;

SELECT HEX(utf32),HEX(utf8) FROM tmp;

Storage is big-endian, that is, the highest byte comes first and there is no need for a mark to specify the byte ordering.

MySQL is very forgiving about additions of unassigned Unicode characters, private-use-area characters, and other code values that you won’t find in the official Unicode 5.0 Character Database (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt). There is in fact only one validity check for utf32: no code value may be greater than 0×10ffff. For example this is illegal:

INSERT INTO t (utf32_column) VALUES (0x110000); /* illegal */

You cannot use utf32 as a client character set.

It should be especially useful in prepared statements.

New collations, same as the old collations

With all these changes to character sets, it would be disruptive to change all the collations too. We made the absolute minimum of changes. To wit:

The new character sets all have collations with the same naming conventions as the old ones. The complete list of Unicode collations is:

ucs2_bin           utf16_bin           utf8_bin           utf8mb3_bin_ci        utf32_bin

ucs2_czech_ci      utf16_czech_ci      utf8_czech_ci      utf8mb3_czech_ci      utf32_czech_ci

ucs2_danish_ci     utf16_danish_ci     utf8_danish_ci     utf8mb3_danish_ci     utf32_danish_ci

ucs2_esperanto_ci  utf16_esperanto_ci  utf8_esperanto_ci  utf8mb3_esperanto_ci  utf32_esperanto_ci

ucs2_estonian_ci   utf16_estonian_ci   ut8_estonian_ci    utf8mb3_estonian_ci   utf32_estonian_ci

ucs2_general_ci    utf16_general_ci    utf8_general_ci    utf8mb3_general_ci    utf32_general_ci

ucs2_hungarian_ci  utf16_hungarian_ci  utf8_hungarian_ci  utf8mb3_hungarian_ci  utf32_hungarian_ci

ucs2_icelandic_ci  utf16_icelandic_ci  utf8_icelandic_ci  utf8mb3_icelandic_ci  utf32_icelandic_ci

ucs2_latvian_ci    utf16_latvian_ci    utf8_latvian_ci    utf8mb3_latvian_ci    utf32_latvian_ci

ucs2_lithuanian_ci utf16_lithuanian_ci utf8_lithuanian_ci utf8mb3_lithuanian_ci ut32_lithuanian_ci

ucs2_persian_ci    utf16_persian_ci    utf8_persian_ci    utf8mb3_persian_ci    utf32_persian_ci

ucs2_polish_ci     utf16_polish_ci     utf8_polish_ci     utf8mb3_polish_ci     utf32_polish_ci

ucs2_roman_ci      utf16_roman_ci      utf8_roman_ci      utf8mb3_roman_ci      utf32_roman_ci

ucs2_romanian_ci   utf16_romanian_ci   utf8_romanian_ci   utf8mb3_romanian_ci   utf32_romanian_ci

ucs2_slovak_ci     utf16_slovak_ci     utf8_slovak_ci     utf8mb3_slovak_ci     utf32_slovak_ci

ucs2_slovenian_ci  utf16_slovenian_ci  utf8_slovenian_ci  utf8_slovenian_ci     utf32_slovenian_ci

ucs2_spanish_ci    utf16_spanish_ci    utf8_spanish_ci    utf8mb3_spanish_ci    utf32_spanish_ci

ucs2_spanish2_ci   utf16_spanish2_ci   utf8_spanish2_ci   utf8mb3_spanish2_ci   utf32_spanish2_ci

ucs2_swedish_ci    utf16_swedish_ci    utf8_swedish_ci    utf8mb3_swedish_ci    utf32_swedish_ci

ucs2_turkish_ci    utf16_turkish_ci    utf8_turkish_ci    utf8mb3_turkish_ci    utf32_turkish_ci

ucs2_unicode_ci    utf16_unicode_ci    utf8_unicode_ci    utf8mb3_unicode_ci    utf32_unicode_ci

The ‘general’ collations are still default but Unicode collations are recommended.

How MySQL collates new characters

For all collations except the “binary” (_bin) collations, MySQL performs a table lookup to find a character’s collating weight. This weight is visible with the new weight_string() function. (See here for description of weight_string: http://forge.mysql.com/worklog/task.php?id=3716)
But what if a character isn’t in the table, for example because it’s a “new” character? In that case things get more complex.

For BMP characters in general collations, e.g. utf8_general_ci, weight = code point.

For BMP characters in uca collations, e.g. utf8_unicode_ci,
if (code >= 0×3400 && code <= 0×4DB5)
base= 0xFB80; /* CJK Ideograph Extension */
else if (code >= 0×4E00 && code <= 0×9FA5)
base= 0xFB40; /* CJK Ideograph */
else
base= 0xFBC0; /* All other characters */
aaaa= base + (code >> 15);
bbbb= (code & 0×7FFF) | 0×8000;
For example,

mysql> SELECT HEX(weight_string(_ucs2 0x04CF COLLATE ucs2_unicode_ci));

+-----------------------------------------------------------+

| hex(weight_string(_ucs2 0x04CF collate ucs2_unicode_ci))  |

+-----------------------------------------------------------+

| FBC084CF                                                  |

+-----------------------------------------------------------+

1 row in set (0.00 sec)

Thus, U+04cf CYRILLIC SMALL LETTER PALOCHKA is, with all uca collations, greater than U+04c0 CYRILLIC LETTER PALOCHKA. Only later, after MySQL upgrades the collations according to WL#2673, will all palochkas sort together.

For supplementary characters:
the weight is the weight for 0xfffd REPLACEMENT CHARACTER.
That is, to MySQL, all supplementary characters are equal to each other, and greater than almost all BMP characters.

An example with Deseret characters and COUNT(DISTINCT):

CREATE TABLE t (s1 VARCHAR(5) CHARACTER SET utf32 COLLATE utf32_unicode_ci);

INSERT INTO t VALUES (0xfffd); /* REPLACEMENT CHARACTER */

INSERT INTO t VALUES (0x010412); /* DESERET CAPITAL LETTER BEE */

INSERT INTO t VALUES (0x010413); /* DESERET CAPITAL LETTER TEE */

SELECT COUNT(DISTINCT s1) FROM t;

The result is: 1, because Deseret Bee = Deseret Tee = Replacement Character, in the MySQL Unicode collation.

An example with cuneiform characters and WEIGHT_STRING():

/*

The four characters in the INSERT string are

00000041 # LATIN CAPITAL LETTER A

0001218F # CUNEIFORM SIGN KAB

000121A7 # CUNEIFORM SIGN KISH

00000042 # LATIN CAPITAL LETTER B

*/

CREATE TABLE t (s1 CHAR(4) CHARACTER SET utf32 COLLATE utf32_unicode_ci);

INSERT INTO t VALUES (0x000000410001218f000121a700000042);

SELECT HEX(WEIGHT_STRING(s1)) FROM t;

/*

The result is:

0E33 FFFD FFFD 0E4A

*/

where 0E33 and 0E4a are primary weights as in UCA 4.0.0
ftp://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt
where FFFD is the weight KAB and also the weight for KISH.

The “supplementary character collation” rule is non-wonderful. It was necessary, so that we wouldn’t have to rewrite the collation algorithms at the same time that we were making possibly disruptive changes to the character sets. We don’t expect the rule to cause trouble, because the characters are very rare, so it will be very rare that a multi-character string consists entirely of supplementary characters. In Japan, since the supplementary characters are obscure Kanji ideographs, the typical user doesn’t care what order they’re in anyway. However, if you really want to get rows sorted by MySQL’s rule and secondarily by code point value, it’s easy:

ORDER BY s1 COLLATE utf32_unicode_ci, s1 COLLATE utf32_bin

The utf16_bin collation

There is a difference between “ordering by the character’s code value” and “ordering by the character’s binary representation”, a difference which only appears with utf16_bin, because of surrogates.

Suppose utf16_bin was a binary comparison “byte by byte” rather than “character by character”. If that were so, then the order of characters in utf16_bin (binary collation for utf16) would differ from the order in utf8_bin. For example:

Here is a chart showing two rare characters. The first character is in the range E000-FFFF, so it is greater than a surrogate but less than a supplementary. The second character is a supplementary.

Code point  Character                    utf8         utf16

----------  ---------                    ----         -----

0FF9D       HALFWIDTH KATAKANA LETTER N  EF BE 9D     FF 9D

10384       UGARITIC LETTER DELTA        F0 90 8E 84  D8 00 DF 84

The two characters in the chart are in order by code point value, because 0xff9d < 0×10384. And they are in order by utf8 value, because 0xef < 0xf0. But they are not in order by utf16 value, if we use byte-by-byte comparison, because 0xff > 0xd8.

So MySQL’s utf16_bin collation is not “byte by byte”. It is “code point by code point”. When MySQL sees a supplementary-character encoding in utf16, it converts to the character’s code-point value, and then compares. Therefore utf8_bin and utf16_bin are the same ordering. This is consistent with the SQL:2008 standard requirement for a UCS_BASIC collation: “UCS_BASIC is a collation in which the ordering is determined entirely by the Unicode scalar values of the characters in the strings being sorted. It is applicable to the UCS character repertoire. Since every character repertoire is a subset of the UCS repertoire, the UCS_BASIC collation is potentially applicable to every character set. NOTE 11 — The Unicode scalar value of a character is its code point treated as an unsigned integer.”

If the character set is ucs2, then comparison is byte by byte, but ucs2 strings shouldn’t contain surrogates anyway.

Comparing Unicode to Unicode

Generally, if you try to compare two character columns and the character sets are different and neither character set is a proper superset of the other one, this error message appears:

"ERROR 1267 (HY000): Illegal mix of collations (...,IMPLICIT) and (...,IMPLICIT) for operation '...'")

But the mix may be legal if one character set is a subset of the other. For example:

CREATE TABLE t1 (

  latin1_column VARCHAR(5) CHARACTER SET latin1,

  utf8_column VARCHAR(5) CHARACTER SET utf8);

...

SELECT * FROM t1 WHERE latin1_column = utf8_column; /* legal */

The superset (utf8 in this case) takes precedence.

Now, it is clear that the set of characters in utf8mb3/ucs2 is a proper subset of the set of characters in utf8/utf16/utf32. So the intent is that comparisons will be legal for all collations except binary collations. However, that part of the implementation has not begun.

Not to worry. Just cast to the character set whose rules you prefer:

CREATE TABLE t1 (

  utf16_column VARCHAR(5) CHARACTER SET utf16,

  utf8_column VARCHAR(5) CHARACTER SET utf8);

...

SELECT *

FROM t1

WHERE CAST(utf16_column AS CHAR(2) CHARACTER SET utf8) = utf8_column;

Identifiers

Currently all identifiers (names of tables or columns or triggers or variables and so on) are UTF8. In the new version they will also be UTF8. If you’ve followed so far, you know that means that, technically, identifiers are in the “UTF8 with maximum 4 bytes” character set.

However, that does not mean that MySQL will allow use of supplementary characters for identifiers. If you try to use such a character in a name, you will get
an error message. For compatibility reasons, MySQL will restrict such use, at least in version 6.0.

For the moment, because the identifier character set is restricted even though it’s officially termed utf8, the maximum length of a name is the same as it used to be.
For example,

SELECT CHARACTER_MAXIMUM_LENGTH

FROM INFORMATION_SCHEMA.COLUMNS

WHERE column_name='TABLE_NAME'

AND table_schema = 'information_schema'

AND table_name='columns';

gives us ‘64′, in both MySQL 5.1 and MySQL 6.0.

That’s another step that we took to make upgrade easier, although we think it is a trifle odd.

Update

You should run mysql_upgrade whenever you change versions. That is still true when you move from MySQL 5.1 to MySQL 6.0. However, you won’t have problems that are due to existing data containing too-long index keys or existing metadata containing too-long identifiers. (You might have to look at any scripts that have long index keys or long identifiers or mention UTF8, as mentioned elsewhere, but that’s not strictly speaking an ‘upgrade’ problem.)

It’s possible to go from MySQL 5.1 to 6.0 and back again to 5.1 if you do not run mysql_upgrade. You might have these problems:
* Your database contains a UTF8 column value with supplementary characters. The supplementary characters, and everything following them in the same column, will be truncated.
* Your database contains utf8mb3, utf32, or utf16 columns. The version-5.1 DBMS will not recognize those character sets and will return an error.
It means that, if you’re using MySQL Replication and you have a 5.1 master for a 6.0 slave, failure can happen.

Update, Names Change but Numbers Stay the Same

You can see the collation number by selecting ID from INFORMATION_SCHEMA.COLLATIONS. For example (with MySQL 5.1):

mysql> select * from information_schema.collations

-> where collation_name = 'utf8_general_ci'\G

*************************** 1. row ***************************

COLLATION_NAME: utf8_general_ci

CHARACTER_SET_NAME: utf8

ID: 33

IS_DEFAULT: Yes

IS_COMPILED: Yes

SORTLEN: 1

1 row in set (0.00 sec)

But in MySQL 6.0, you will see different numbers:

mysql> SELECT * FROM information_schema.collations

-> WHERE collation_name = 'utf8_general_ci'

-> OR collation_name = 'utf8mb3_general_ci'\G

*************************** 1. row ***************************

COLLATION_NAME: utf8mb3_general_ci

CHARACTER_SET_NAME: utf8mb3

ID: 33

IS_DEFAULT: Yes

IS_COMPILED: Yes

SORTLEN: 1

*************************** 2. row ***************************

COLLATION_NAME: utf8_general_ci

CHARACTER_SET_NAME: utf8

ID: 45

IS_DEFAULT: Yes

IS_COMPILED: Yes

SORTLEN: 1

2 rows in set (0.01 sec)

What’s going on? Has the ID changed?
No, the COLLATION_NAME has changed.

Numbering for utf8 collations does not change. So when MySQL sees (for example) ‘collation id = 33′ in internal tables, it considers the column to be utf8mb3. And, since MySQL-5.1 utf8/utf8_general_ci columns were created with id=33, that means MySQL will see any ‘created-by-5.1′ utf8 columns as utf8mb3.

For example:

/* in MySQL 5.1 */

CREATE TABLE t (s1 CHAR(5) CHARACTER SET UTF8);

SHOW CREATE TABLE t;

mysql> SHOW CREATE TABLE t;

+-------+------------------------------------------+

| Table | Create Table |

+-------+------------------------------------------+

| t | CREATE TABLE `t` (

`s1` char(5) CHARACTER SET utf8 DEFAULT NULL

) ENGINE=MyISAM DEFAULT CHARSET=latin1 |

+-------+------------------------------------------+

1 row in set (0.01 sec)/* in MySQL 6.0 */

mysql> SHOW CREATE TABLE t;

+-------+------------------------------------------+

| Table | Create Table |

+-------+------------------------------------------+

| t | CREATE TABLE `t` (

`s1` char(5) CHARACTER SET utf8mb3 DEFAULT NULL

) ENGINE=MyISAM DEFAULT CHARSET=latin1 |

+-------+------------------------------------------+

1 row in set (0.01 sec)

To change any column to utf8, the user will have to say:

ALTER TABLE t MODIFY COLUMN s1 CHAR(5) CHARACTER SET UTF8;

If you don’t ALTER any columns to utf8, or create new ones with utf8, downgrade is still possible.

But there is a downside:
* Outside column definitions, ‘utf8′ remains unchanged.
So there will be _utf8 introducers, utf8 stored-procedure variables, utf8 function returns, utf8 defaults, utf8 targets in “SET NAMES” scripts, utf8 in mysqldump, etc.
* Users won’t be able to get the switch over with in a few key strokes.

So some things may change while 6.0 is still alpha.

Update, conclusion, two concepts

Coming to a conclusion about all I’ve just said re updating: conceptually there are two ways to go from 5.1 to 6.0.

1. If you start 6.0 over an existing 5.1 database, it will work without changes in keys!
However SHOW commands will report “utf8mb3″.
So will INFORMATION_SCHEMA queries.
In short:
* there are no key differences
* there are name differences.

2. If you dump a 5.1 database and then start 6.0 and load the dump using 6.0, character set data will be reported as utf8, as in 5.1.
However it will already mean “new” utf8, and that’s why I said earlier there are no big problems “unless columns are very long”. The “unless columns are very
long” constraint is applicable only in this case.
In short:
* there are key differences
* there are no name differences.

Performance

It is well known that sorting with Unicode collations takes longer than sorting with 8-bit-byte collations. However, our tests so far indicate that:
* sorting with 6.0 is slower than sorting with 5.0 anyway, as one would expect in an early product version
* sorting with utf8_unicode_ci is only about 10% slower than sorting the same values with utf8mb3_unicode_ci.

At this stage, we can only say: test before converting, but, well, you won’t be converting immediately anyway.

Converting between Unicode character sets

Here the word “conversion” includes the explicit functions CONVERT() and CAST(), and the implicit conversion that happens when you assign a source
string with CHARACTER SET x to a target with CHARACTER SET y.

CONVERTING TO UTF16:
MySQL does little validity checking. Users are able to insert 0xd800 (half a surrogate) in ucs2 columns. That is illegal in utf16 columns, where surrogates have meaning. So conversion from ucs2 to utf16 can fail. For example:

CREATE TABLE t1 (s1 VARCHAR(1) CHARACTER SET ucs2);

CREATE TABLE t2 (s1 VARCHAR(1) CHARACTER SET utf16);

NSERT INTO t1 VALUES (0xd800);     /* legal */

INSERT INTO t2 VALUES (0xd800);    /* illegal */

INSERT INTO t2 SELECT * FROM t1;   /* fails */

But valid BMP characters are okay (for valid BMP characters the repertoire of ucs2 and utf16 is the same). And valid surrogate characters are okay. For example:

CREATE TABLE t1 (s1 VARCHAR(1) CHARACTER SET ucs2);

CREATE TABLE t2 (s1 VARCHAR(1) CHARACTER SET utf16);

INSERT INTO t1 VALUES (0xD801DC12);     /* legal */

INSERT INTO t2 VALUES (0xD801DC12);     /* legal */

INSERT INTO t2 SELECT * FROM t1;        /* succeeds */

In general: bad values (values which would make a partial surrogate) are legal in utf8 and utf8mb3 and utf32, but not in utf16. Therefore conversion to utf16 may fail if there are bad values.

CONVERTING TO UCS2 OR UTF8MB3:
Since utf8 and utf16 and utf32 strings may contain supplementary characters, and since ucs2 or utf8mb3 characters may not contain them, conversion to ucs2
or utf8mb3 may fail if there are supplementary values.

Converting to other character sets

You can put supplementary characters in Unicode columns, but not in non-Unicode columns. If you convert to sjis or gb2312 or gbk or etc., the supplementary characters become ‘?’s and you see a warning. For example:

mysql> CREATE TABLE linear_b (s1 VARCHAR(5) CHARACTER SET utf8);

Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO linear_b VALUES (0xf090838c); /* wheeled chariot, again */

Query OK, 1 row affected (0.01 sec)

mysql> ALTER TABLE linear_b MODIFY COLUMN s1 CHAR(5) CHARACTER SET gbk;

Query OK, 1 row affected, 1 warning (0.03 sec)

Records: 1  Duplicates: 0  Warnings: 0

mysql> SHOW WARNINGS;

+---------+------+---------------------------------------------------------------------+

| Level   | Code | Message                                                             |

+---------+------+---------------------------------------------------------------------+

| Warning | 1366 | Incorrect string value: '\xF0\x90\x83\x8C' for column 's1' at row 1 |

+---------+------+---------------------------------------------------------------------+

1 row in set (0.00 sec)

It would be nice to say that we hope to support gb18030 and jis2004 soon, but that depends on lots of things. We’ll say nothing re roadmaps here.

Some Examples

Because your browser might not display supplementary characters correctly, we’ll use hex encoding for these examples and we won’t try to show results.

CREATE TABLE t1 (s1 CHAR(1) CHARACTER SET utf32);

INSERT INTO t1 VALUES ('a');

SELECT OCTET_LENGTH(s1) FROM t1; /* result = 4 */

SELECT _utf8mb3 'x'; /* introducers are still legal */

/* Create table, use all Unicode character sets */

CREATE TABLE t (ucs2 VARCHAR(2) CHARACTER SET ucs2,

                utf8mb3 VARCHAR(2) CHARACTER SET utf8mb3,

                utf8 VARCHAR(2) CHARACTER SET utf8,

                utf16 VARCHAR(2) CHARACTER SET utf16,

                utf32 VARCHAR(2) CHARACTER SET utf32);

/* Insert using a literal with an introducer. */

INSERT INTO t (utf8mb3) VALUES (_utf8mb3 'A');

/* Update with hex value of 'A', specifying lead 0s. */

UPDATE t SET utf32 = 0x00000041;

/* Update with hex value of 'BB', assuming lead 0s. */

UPDATE t SET utf32 = 0x4200000042;

/* Update, expecting no change of encoded values. */

UPDATE t SET utf8 = utf8mb3;

/* Update, putting half-surrogate in ucs2 column. */

UPDATE t SET ucs2 = 0xd801;

/* Update, with invalid character, so this will fail. */

UPDATE t SET utf16 = ucs2;

/* Update, with invalid character, but it won't fail. */

UPDATE t SET utf8 = ucs2;

/* Update, putting full surrogate in utf16 column.

   It stands for code point value 0x20000, which is

   the first "CJK Ideograph Extension B". */

UPDATE t SET utf16 = 0xD840DC00;

/* Select, converting to utf32. Result = 0x00020000. */

SELECT HEX(CAST(utf16 AS CHAR(1) CHARACTER SET utf32)) FROM t;

/* Select, converting to utf8mb3. Result = '?'.

   (For a conversion failure, result is always '?'. */

SELECT CONVERT(utf16 USING utf8mb3) FROM t;

/* Select, if utf16 value is exactly equal to code point

   value 0x20000. Result: 1 row. */

SELECT * FROM t WHERE

 CAST(utf16 AS CHAR(1) CHARACTER SET utf32) COLLATE utf32_bin

 = 0x00020000;

/* Select, if utf16 value is 0xfffd with general collation. */

SELECT * FROM t WHERE utf16 = 0xfffd;

Changes to associated products

Not everything happens at once. Different MySQL products work on different schedules. Some products are from other companies so they’ll naturally take a bit longer. On the other hand, features closely connected with the server are all finished / available starting with this version. Here is a partial list of MySQL products, associated products, storage engines, and dates or version numbers when they will support supplementary characters.

Connector / Java                       NOT READY YET

Connector / NET                        NOT READY YET

Connector / PHP                        NOT READY YET

Documentation / Reference Manual       NOT READY YET

Documentation / Expert Guides          NOT READY YET

Front End: mysql                       OKAY

Front End: Monitor and Advisors        NOT READY YET

Front End: MySQL Workbench             NOT READY YET

Front End: SQLyog                      NOT READY YET

Storage Engine: Falcon                 OKAY

Storage Engine: InnoDB                 OKAY

Storage Engine: Memory                 OKAY

Storage Engine: MyISAM                 OKAY

Storage Engine: Solid                  NOT READY YET

We will update this list as products change. In the meantime …
You risk data loss if you have utf8 storage with engine=myisam and you change to a storage engine which knows nothing of the change, e.g. with

ALTER TABLE t ENGINE=soliddb;

Compatibility with other DBMSs

When migrating to or from DBMS products of other vendors, keep in mind that:

DB2 may support supplementary characters with an encoding known as CESU-8. This, and the very similar “Modified UTF-8″ encodings of Sun and Java, are not true UTF-8 so MySQL will not recognize hex-encoded INSERT statements containing such supplementary characters.

SQL Server 2005, using a Microsoft employee’s words, is “surrogate safe” but not “surrogate aware”. This should mean that it’s possible to transfer Unicode data from MySQL to SQL Server without error, but doesn’t necessarily mean that SQL Server will understand it.

Oracle 10g supports Unicode well. The only thing to watch out for is the naming. These character sets approximately correspond:

MySQL 6.0        Oracle 10g

---------        ----------

UTF8             AL32UTF8 (not "UTF8", Oracle UTF8 is CESU-8!!)

UTF16            AL16UTF16UTF32 [ we think there's no exact equivalent ]

Questions or Comments

Before contacting us about a problem with supplementary characters in MySQL 6.0, check that your operating system and browser and application program handle supplementary characters and associated fonts. If the problem comes from outside MySQL, please contact the appropriate provider.

To comment about this feature, use the MySQL “Character Sets, Collation, Unicode” Forum:
http://forums.mysql.com/list.php?103

To report bugs in a alpha product, or comment on feature requests, use the MySQL Bugs Database:
http://bugs.mysql.com

This article was written before all bugs were fixed, and they may still be unfixed at the time you read this. To find the latest active bugs according to bugs.mysql.com, click Advanced Search and ask for category = Character Sets, sorted by = ID, in descending order = yes.

Posted in MySQL 6.0 New Features | No Comments »

New Features In MySQL 6.x is proudly powered by WordPress MU running on Blogs.mysql.com.
Entries (RSS) and Comments (RSS).