Which CHARACTER SET and COLLATION?

Created: November-22, 2018

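There are dozens of character sets and hundreds of collations. (A given collation belongs to exactly one character set.) See the output of SHOW COLLATION; for the full list.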

In practice, only 4 CHARACTER SETs usually matter:

ascii -- basic 7-bit codes.
latin1 -- ascii, plus most characters needed for Western European languages.
utf8 -- the 1-, 2-, and 3-byte subset of UTF-8. This excludes Emoji and some Chinese characters.
utf8mb4 -- the full set of UTF-8 characters, covering all current languages.

All four include the English characters, encoded identically. utf8 is a subset of utf8mb4.
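The byte-length distinction above can be checked outside MySQL. The following is a minimal Python sketch, not MySQL code: it assumes that Python's "utf-8" codec corresponds to MySQL's utf8mb4, and the helper names utf8_len and fits_mysql_utf8 are made up for illustration.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes one character occupies in real UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes, i.e. would fit
    in MySQL's 3-byte utf8 (hypothetical helper, for illustration)."""
    return all(utf8_len(ch) <= 3 for ch in text)


# One sample per byte length: ASCII letter, accented Latin letter,
# common Chinese character, Emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
for ch in samples:
    verdict = "fits utf8" if fits_mysql_utf8(ch) else "needs utf8mb4"
    print(repr(ch), utf8_len(ch), verdict)

# English text has identical bytes in ascii, latin1, and UTF-8.
word = "MySQL"
assert word.encode("ascii") == word.encode("latin-1") == word.encode("utf-8")
```

A check like fits_mysql_utf8 could be run over existing data before deciding whether a legacy utf8 column is already losing characters.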

Best practice...

Use utf8mb4 for any TEXT or VARCHAR column that can hold text in a variety of languages.
Use ascii (latin1 is also fine) for hex strings (UUID, MD5, etc.) and simple codes (country_code, postal_code, etc.).
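As a rough illustration of that rule of thumb, the sketch below inspects sample values in Python and suggests a character set. The function suggest_charset is a hypothetical helper, not part of MySQL or any connector, and it only looks at the sample it is given; the real decision depends on what the column may hold in the future.

```python
import hashlib
import uuid


def suggest_charset(value: str) -> str:
    """Hypothetical helper: suggest 'ascii' when the value is pure
    7-bit ASCII, otherwise 'utf8mb4'."""
    return "ascii" if value.isascii() else "utf8mb4"


print(suggest_charset(str(uuid.uuid4())))                      # UUID text
print(suggest_charset(hashlib.md5(b"example").hexdigest()))    # MD5 hex digest
print(suggest_charset("PL"))                                   # country code
print(suggest_charset("Za\u017c\u00f3\u0142\u0107"))           # Polish text
```

The first three values contain only hex digits, hyphens, or ASCII letters, so they are candidates for an ascii column; the last one contains non-ASCII letters and calls for utf8mb4.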

utf8mb4 did not exist before version 5.5.3, so utf8 was the best choice available until then.

Outside of MySQL, "UTF8" means the same thing as MySQL's utf8mb4, not MySQL's utf8.

A collation name starts with the name of its charset and usually ends with either _ci, meaning "case and accent insensitive", or _bin, meaning "simply compare the bits".
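The difference between the two suffixes can be approximated in Python. This is only a sketch under stated assumptions: MySQL's _ci collations use their own weight tables, so ci_key below (accent stripping plus case folding) is a stand-in rather than MySQL's algorithm, and all three function names are invented for illustration.

```python
import unicodedata


def charset_of(collation: str) -> str:
    """Charset prefix of a collation name, e.g. 'utf8mb4_polish_ci' -> 'utf8mb4'."""
    return collation.split("_", 1)[0]


def bin_key(s: str) -> bytes:
    """_bin-style key: compare the raw encoded bytes."""
    return s.encode("utf-8")


def ci_key(s: str) -> str:
    """Rough _ci-style key: drop combining accents, then fold case.
    An approximation only, not MySQL's actual collation weights."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


a, b = "Resume", "r\u00e9sum\u00e9"
print(charset_of("utf8mb4_unicode_520_ci"))  # utf8mb4
print(bin_key(a) == bin_key(b))              # False: the bytes differ
print(ci_key(a) == ci_key(b))                # True: same letters once case and accents are ignored
```

Sorting with key=bin_key places every uppercase ASCII letter before every lowercase one, which is the kind of ordering a _bin collation gives.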

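The "latest" utf8mb4 collation is utf8mb4_unicode_520_ci, based on Unicode 5.2.0. If you work in a single language, you might prefer a language-specific collation such as utf8mb4_polish_ci, which rearranges the letters slightly to follow Polish conventions.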