Vertica Collation Problem
Hi everyone,
We have a problem with data integration. We are trying to put data to Vertica from MS SQL by using SAP DataServices, but we are getting an error(I shared in below) and whenever we try to change data types to varchar(255), we can handle that error.
We tried to change Vertica's collation but it didn't work. Before that we were seeing special characters(ş, ç eg.) as special icons like '?' or 'triangle' in SAP DataServices preview, we changed ODBC connection settings to handle that problem and it worked but still, we are getting an error if don't change default varchar values to 255.
Is there any known solution for this error?
Error
Possible causes: (1) Error in the SQL
syntax; (2) Database connection is broken; (3) Database related errors such as transaction log is full, etc.; (4) The user
defined in the datastore has insufficient privileges to execute the SQL. If the error is for preload or postload operation, or
if it is for regular load operation and load triggers are defined, check the SQL. Otherwise, for (3) and (4), contact your
local DBA
Thank you.
Best Answer
-
marcothesane - Select Field - Administrator
The error message is not very informative. I would also blame SAP Data Service for not retrieving the error message from the ODBC/JDBC driver.
What is the codepage in MS SQL? UTF-8 -> you should not have to change anything - just make sure that there is no conversion between MS SQL (and, if the data flows through it, SAP DataServices) and Vertica takes place. This is usually done by configuring the ODBC/JDBC data source. Same byte length and character length, in that case, should work.
If you have single-byte strings in MS SQL (for example, ISO 8859-1/Western European), then double the length of the character fields in Vertica as a first step:
VARCHAR(32)
should becomeVARCHAR(64)
. Then, transfer the data, they should go in without truncations or data errors. Even if the Euro sign '€' is 3 bytes in UTF-8 and one byte in ISO8859-1, Western European alphabets never use double the number of bytes than characters.Once loaded, run a
SELECT MAX(OCTET_LENGTH(affected_column)) FROM affected_table;
on each affected column, and resize the VARCHAR length accordingly1
Answers
Vertica database servers expect to receive all data in UTF-8, and Vertica outputs all data in UTF-8.
The maximum length parameter for VARCHAR and CHAR data type refers to the number of octets (bytes) that can be stored in that field, not the number of characters. When using multi-byte UTF-8 characters, make sure to size fields to accommodate from 1 to 4 bytes per character, depending on the data.
See: https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/AdministratorsGuide/ConfiguringTheDB/ImplementLocalesForInternationalDataSets.htm
Hi,
Thanks for your comments, I tried to change the code page and server code page to UTF-8 in data store settings for source and target in SAP DataServices but it didn't work.
I also tried to double the varchar lengths of fields and it worked for default code page settings. Then, I ran the select statements that @marcothesane shared for varchar columns. I realized few things, some of the columns are returning higher lengths then their default varchar lengths. For example I have a column named "LOCAL_ADDRESS_001" and this column has varchar 35 value by default, when I ran the query it returns 38, I guess that's the problem but how can I describe that kind of situation?
Thanks.
Take a French first name as an example.
'Jérôme'
is 6 characters long.CHARACTER_LENGTH('Jérôme')
will accordingly return 6.OCTET_LENGTH('Jérôme')
, in Vertica, however, will return 8, as'é'
and'ô'
take two bytes in UTF-8 - while they take 1 byte in ISO 8859-1/Western European encoding.VARCHAR(35) in SQL Server will mean character length. VARCHAR(35) in Vertica always means length in bytes.
Hello,
Thank you so much @marcothesane, I'm really appreciated.
Thanks.
Hi again,
I want to ask one more thing if it's possible
How can I solve this problem? Do I need to change code page in source system or can I handle this problem in SAP DataServices by changing some settings(except changing varchar value manually)?
Thank you.
How would you regard your problem as solved?
sys.tables
andsys.columns
, while multiplyingcolumns.max_length
by 2 for all string types, and add the formatting to make aCREATE TABLE
out of it - and run thatCREATE TABLE
against Vertica. Then, get the data across, and profile for maximum lengths.There is an automatic data type profiler in GitHub: https://github.com/marco-the-sane/sqlprofile .
You can't change the encoding from single-byte to UTF-8 in MS SQL without rebuilding all tables in that database.
Or is it something else you're after?
Hi,
I was thinking about second option that you shared, I think second option is the most optimum choice for us.
Thank you.