CAST SAP HANA NVARCHAR(X) to Vertica VARCHAR(Y)

HI,
I am importing SAP HANA tables to vertica and I am encounting problems with string type conversions. In SAP HANA a column has type NVARCHAR(35), but I cannot import it into Vertica VARCHAR(35), because in some rows I have special character, like chinese and korean letters. I can partly circumvent this issue by making the vertica colum large enough like varchar(1023). However, I don't know in advance how large the column in vertica should be. Is there a way to cast/convert the SAP HANA NVARCAHR(35) such that it fits into a vertica VARCHAR of a specified length?

Best Answers

  • marcothesanemarcothesane - Select Field - Administrator
    Answer ✓

    I would not leave it at that, @joergschaber .

    To illustrate the question: An 'à' sign uses 2 bytes in a Vertica UTF-8 string. The 'a' sign uses 1 byte. The Euro sign - '€' - uses 3 bytes. Do you now create a VARCHAR(12) for a string with a maximum of 4 letters? 'Maja' uses 4 bytes. 'Künz' uses 5 bytes.
    But '€€€€' does use 12 bytes. How high is the probability to find that string in your table?

    That does not hurt when you encode in Vertica.

    But if ever you sort, hash-group-by or hash-join by an over-sized column, you will have, in the worst case, to allocate 4 times as much memory as actually needed for the hash table, and will spill to disk during the query.

    The clean way of going about this is to oversize by factor 4 as @SergeB suggested. But then, to run a
    SELECT MAX(OCTET_LENGTH(<the_big_varchar_column>)) to find out how much you actually need for it, and then to re-size the column in an ALTER TABLE statement accordingly.

Answers

  • That was the info I was looking for. Thanks!

  • Sounds reasonable. Is there a way to get the MAX(OCTET_LENGTH()) for all columns at once? OCTET_LENGTH takes only one argument.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file