Question on UTF-8
JeffreyAshby
Community Edition User
Hello folks.
I am actually stumped here, so I look forward to learning from all of the nice and kind people here. Could you please tell me which format I should use for UTF-8 characters?
Thanks so much for your time.
Answers
UTF-8 is a variable-length character encoding that can represent every character in the Unicode standard.
Vertica expects to receive all data in UTF-8, and also outputs all data in UTF-8.
Choose the smallest data type that matches the kind of data you plan to store and that can hold every value that data may take.
For example, if you expect only whole numbers in the range -2^63+1 to 2^63-1, prefer the INTEGER data type.
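If it helps, here is a rough sketch of checking sample values against that range before settling on INTEGER. It is plain Python, not anything Vertica ships; the helper name `fits_vertica_integer` and the sample values are made up for illustration.

```python
# Range quoted above for INTEGER: -2^63+1 .. 2^63-1.
INT_MIN = -(2**63) + 1
INT_MAX = 2**63 - 1


def fits_vertica_integer(values):
    """Return True if every value is a whole number inside the quoted range.

    Hypothetical helper for illustration only; not a Vertica API.
    """
    return all(
        isinstance(v, int) and not isinstance(v, bool) and INT_MIN <= v <= INT_MAX
        for v in values
    )


print(fits_vertica_integer([0, 42, -7]))    # True
print(fits_vertica_integer([2**63]))        # False: one past the maximum
print(fits_vertica_integer([-(2**63)]))     # False: one below the minimum
```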
When you define a column for alphanumeric values, specify the maximum size, in octets (bytes), of any string to be stored in that column.
For example, to store strings up to 10 octets in length, use one of the following definitions:
CHAR(10) -- fixed-length
VARCHAR(10) -- variable-length
See: https://www.vertica.com/docs/12.0.x/HTML/Content/Authoring/SQLReferenceManual/DataTypes/SQLDataTypes.htm
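One way to pick the size is to measure some representative sample data before defining the column. The sketch below is plain Python and assumes the samples really are representative; `max_octets` is a hypothetical helper name, not a Vertica function.

```python
def max_octets(samples):
    """Largest UTF-8 encoded size, in bytes, among the sample strings."""
    return max((len(s.encode("utf-8")) for s in samples), default=0)


# Sample values written with escapes so the exact code points are unambiguous.
samples = [
    "Z\u00fcrich",   # 'Zürich': 6 characters, 7 bytes
    "Krak\u00f3w",   # 'Kraków': 6 characters, 7 bytes
    "Tokyo",         # 5 characters, 5 bytes
]

n = max_octets(samples)
print(n)                  # 7
print(f"VARCHAR({n})")    # VARCHAR(7)
```

Note that the longest value here is 6 characters but 7 octets, so sizing the column by character count would be too small.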
As @mosheg said, any
column_name VARCHAR(n)
or
code_name CHAR(2)
is always a variable-length or fixed-length string encoded in UTF-8, with its length counted in bytes rather than characters. This means that a string containing the Euro sign four times,
'€€€€'
is 12 bytes long (3 bytes per '€') and needs a [VAR]CHAR(12), while
'££££'
is 8 bytes long (2 bytes per '£') and needs a [VAR]CHAR(8).
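To double-check byte counts like these outside the database, a small Python sketch is enough. The helper name `utf8_bytes_per_char` is made up for illustration. In UTF-8, '€' (U+20AC) takes 3 bytes and '£' (U+00A3) takes 2 bytes.

```python
def utf8_bytes_per_char(text):
    """Map each distinct character to the number of bytes UTF-8 uses for it."""
    return {ch: len(ch.encode("utf-8")) for ch in text}


euro = "\u20ac" * 4     # '€€€€'
pound = "\u00a3" * 4    # '££££'

print(utf8_bytes_per_char(euro))     # {'€': 3}
print(utf8_bytes_per_char(pound))    # {'£': 2}
print(len(euro.encode("utf-8")))     # 12
print(len(pound.encode("utf-8")))    # 8
```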