Question on UTF-8

JeffreyAshby, Community Edition User

Hello folks.

I am stumped here, so I look forward to learning from everyone. Could you please tell me which data type I should use to store UTF-8 characters?

Thanks so much for your time.

Answers

  • mosheg, Vertica Employee, Administrator
    edited December 2022

    UTF-8 is a variable-length character encoding that can represent every character in the Unicode standard.
    Vertica expects to receive all data in UTF-8, and also outputs all data in UTF-8.
    Select the smallest data type that matches the kind of data you plan to store and that still allows every feasible value of that data.
    For example, if you plan to store only whole numbers in the range -2^63+1 to 2^63-1, prefer the INTEGER data type.
    When you define a column for alphanumeric values, specify the maximum size, in octets, of any string to be stored in that column.
    For example, to store strings up to 10 octets in length, use one of the following definitions:
    CHAR(10) --- fixed-length
    VARCHAR(10) --- variable-length

    See: https://www.vertica.com/docs/12.0.x/HTML/Content/Authoring/SQLReferenceManual/DataTypes/SQLDataTypes.htm
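
Because the declared length is counted in octets rather than characters, it can help to measure sample data before choosing n. The following is a minimal Python sketch, not a Vertica tool; the helper names `octet_length` and `min_varchar_size` and the sample strings are hypothetical and only illustrate the arithmetic.

```python
def octet_length(text: str) -> int:
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def min_varchar_size(samples) -> int:
    """Return the largest UTF-8 byte length found in the samples (hypothetical helper).

    Falls back to 1 for an empty collection so the result is a usable length.
    """
    return max((octet_length(s) for s in samples), default=1)


# Hypothetical sample values; replace with strings from your own data.
samples = ["hello", "na\u00efve caf\u00e9", "\u65e5\u672c\u8a9e"]

for s in samples:
    # Character count and byte count differ as soon as non-ASCII characters appear.
    print(f"{s!r}: {len(s)} characters, {octet_length(s)} octets")

print("smallest n that holds every sample:", min_varchar_size(samples))
```

In this sample, the second string has 10 characters but needs 12 octets, so it would not fit into a column declared with a length of 10.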

  • marcothesane, Administrator

    As @mosheg said, any column declared as column_name VARCHAR(n) or code_name CHAR(2) always holds a variable-length or fixed-length string encoded in UTF-8, and its length is counted in bytes, not characters.

    This means that a string containing the Euro sign four times, '€€€€', is 12 bytes long and needs a [VAR]CHAR(12), while '££££' is eight bytes long and needs a [VAR]CHAR(8), because '£' takes two bytes in UTF-8.
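
The byte counts for these two strings can be checked outside the database. This is a small Python sketch under the assumption that the column length limits the UTF-8 byte count; the function name `fits_in_column` is hypothetical and does not correspond to any Vertica function.

```python
def fits_in_column(text: str, declared_length: int) -> bool:
    """Return True if the UTF-8 encoding of text needs at most declared_length bytes.

    Hypothetical helper: it models a [VAR]CHAR(n) length as a limit on octets.
    """
    return len(text.encode("utf-8")) <= declared_length


euros = "\u20ac" * 4   # Euro sign, U+20AC, three bytes each in UTF-8
pounds = "\u00a3" * 4  # pound sign, U+00A3, two bytes each in UTF-8

for label, text in [("euros", euros), ("pounds", pounds)]:
    encoded = text.encode("utf-8")
    # Show the character count, the byte count, and the raw bytes in hex.
    print(f"{label}: {len(text)} characters, {len(encoded)} bytes, hex: {encoded.hex(' ')}")

print("euros fit in 4:", fits_in_column(euros, 4))
print("euros fit in 12:", fits_in_column(euros, 12))
print("pounds fit in 4:", fits_in_column(pounds, 4))
print("pounds fit in 8:", fits_in_column(pounds, 8))
```

Only pure ASCII strings have the same length in characters and in bytes, so four characters fit into a length of 4 only when all of them are ASCII.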
