facebook twitter youtube
Replication Query Optimization DB Tuning Server Load and Query Locking Linux Download Interview Questions Basic DB Commands Mysql
in Mysql - 13 Feb, 2014
by Zafar Malik - no comments
latin1 Vs utf8 Charset and index usage

In simple wording, latin supports only latin characters (like english) but utf8 supports some special characters used in other international languages like french, chineese, arabic etc. As per standard, latin1 stores 1 character in 1 byte while utf8 1 character in 1-3 bytes. But if we store all characters in latin even in a utf8 type column then it will store 1 char in 1 byte.

If you want to store some special characters (like chineese or french language) in your column and you set table/column property as latin1 then it will not show these characters properly. But if your table/column property set as utf8 then it will occupy approx. 3 bytes (it can be 1 to 4 byte) for single character and it will store them in its original format.
For more clarification please see below query and answer:

Query:
Suppose there is a title varchar(250) column and there is an index on it as Alter table mytable add index (title(16)) in utf8 charset type table;

If this columns contains a sting “This is my Title”, which contains 16 character and all latin1 type. then clear below queries:

1) As string contains 16 chars and all are latin type means it should stores only 16 bytes even table charset is utf8 or else.

2) Index on 16 bytes is sufficient to take care this 16 character string or else.

 

 

Answer:
1) Yes, latin1 (ISO-8859-1) characters can be more than 1 byte in utf8. If the characters are ASCII (as in your example string), then it would only need 1 byte for each character in utf8. If they’re non-ASCII but still latin1, then more bytes would be needed.

2) Yes, Again, assuming the characters in the 16 byte string are always ASCII, then 16 bytes in the utf8 index would cover it. However, note that for indexes on a char/varchar/text column, the index length is characters not bytes. So (16) would mean that your index could be up to 48 bytes for utf8. Also, your column definition is the same (so varchar(250) is 250 characters which is up to 750 bytes for utf8).

Leave a Reply