CAST SAP HANA NVARCHAR(X) to Vertica VARCHAR(Y)
Hi,
I am importing SAP HANA tables into Vertica and am encountering problems with string type conversions. In SAP HANA a column has type NVARCHAR(35), but I cannot import it into a Vertica VARCHAR(35), because some rows contain special characters, such as Chinese and Korean letters. I can partly circumvent this issue by making the Vertica column large enough, e.g. VARCHAR(1023). However, I don't know in advance how large the column in Vertica should be. Is there a way to cast/convert the SAP HANA NVARCHAR(35) so that it fits into a Vertica VARCHAR of a specified length?
Best Answers
SergeB (Employee)
It seems that HANA's NVARCHAR(X) means up to X characters, each character taking up to 4 bytes when stored as UTF-8 in Vertica.
In Vertica, VARCHAR(Y) means up to Y bytes.
So Y = X * 4 would be a safe number when converting HANA's NVARCHAR to Vertica's VARCHAR.
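A minimal sketch of that sizing rule, with hypothetical table and column names (customer_stg / customer_name are assumptions, not from this thread): a HANA NVARCHAR(35) source column is mapped to a Vertica VARCHAR(140), i.e. 35 * 4 bytes, so any 35-character Unicode string still fits after UTF-8 encoding.

CREATE TABLE customer_stg (
    customer_id   INT,
    customer_name VARCHAR(140)   -- HANA source: NVARCHAR(35), sized as 35 chars * 4 bytes
);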
marcothesane (Administrator)
I would not leave it at that, @joergschaber.
To illustrate the question: an 'à' uses 2 bytes in a Vertica UTF-8 string. The 'a' uses 1 byte. The Euro sign '€' uses 3 bytes. Do you now create a VARCHAR(12) for a string with a maximum of 4 letters? 'Maja' uses 4 bytes. 'Künz' uses 5 bytes. But '€€€€' does use 12 bytes. How high is the probability of finding that string in your table? Over-sizing does not hurt the encoded storage in Vertica.
But if you ever sort, hash-group-by or hash-join on an over-sized column, you will, in the worst case, have to allocate 4 times as much memory as actually needed for the hash table, and will spill to disk during the query.
The clean way of going about this is to oversize by a factor of 4, as @SergeB suggested. But then, run a
SELECT MAX(OCTET_LENGTH(<the_big_varchar_column>))
to find out how much you actually need for it, and then re-size the column with an ALTER TABLE statement accordingly.
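For example, a sketch of that two-step resize with hypothetical names (customer_stg.customer_name again), assuming the measured maximum turns out to be 52 bytes:

-- 1. Measure the real byte width of the over-sized column
SELECT MAX(OCTET_LENGTH(customer_name)) FROM customer_stg;   -- suppose this returns 52

-- 2. Shrink the column to what is actually needed
ALTER TABLE customer_stg ALTER COLUMN customer_name SET DATA TYPE VARCHAR(52);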
marcothesane (Administrator)
The least tiresome way I can think of is my SQL-based profiler on GitHub: https://github.com/marco-the-sane/sqlprofile . It builds a sort of request table containing the schema and table names of the tables to be profiled, then runs SQL generating SQL for a two-pass full-table profiling exercise, coming up with a CREATE TABLE statement with type-wise optimal data types ...
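As a rough sketch of the "SQL generating SQL" idea (the schema and table names my_schema / my_table are hypothetical; v_catalog.columns is Vertica's column catalog), this emits one measuring statement per VARCHAR column, which you would then run as the second pass:

SELECT 'SELECT ''' || column_name || ''' AS column_name, '
    || 'MAX(OCTET_LENGTH(' || column_name || ')) AS max_bytes '
    || 'FROM my_schema.my_table;'
FROM v_catalog.columns
WHERE table_schema = 'my_schema'
  AND table_name   = 'my_table'
  AND data_type LIKE 'varchar%';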
Answers
That was the info I was looking for. Thanks!
Sounds reasonable. Is there a way to get the MAX(OCTET_LENGTH()) for all columns at once? OCTET_LENGTH takes only one argument.