Handling special/UTF-8 characters
Hi All,
I have a table with a varchar(32) column that holds the value
'Movies & TV'. The data was loaded with the COPY command. When I query the table
with select * from activity where name='Movies & TV', it returns no records.
I believe this is because of the & character; something odd is going on with it.
When I run
select ISUTF8(name) from activity, it returns true,
which means the data is stored as valid UTF-8.
select length(name) and select length('Movies & TV') also return the same value.
However, when I paste both values into the vi editor, I see an extra space in the string from the DB.
In addition, the name field in the activity table can contain Chinese characters too, and those are currently stored correctly in the DB.
Any idea what is going on here? Should I explicitly specify UTF-8 when loading the data?
Please advise.
Thanks
Comments
Hi,
Vertica database servers expect to receive all data in UTF-8, and Vertica outputs all data in UTF-8. Client drivers automatically convert data to and from UTF-8 when sending to and receiving data from Vertica using API calls. The drivers do not transform data loaded by executing a COPY or COPY LOCAL statement.
See:
https://my.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/ConceptsGuide/Other/UnicodeCharacterEncoding.htm
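Since COPY loads the bytes as they are, one way to see why two strings that look the same do not compare equal is to list their code points side by side. The Python sketch below is only an illustration: the `stored` value is a hypothetical stand-in that assumes the file contained a no-break space (U+00A0) after the `&`, which is one possible cause and has not been confirmed for this table. The helper names `describe` and `first_difference` are made up for the example.

```python
import unicodedata


def describe(text):
    """Return a (code point, Unicode name) pair for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def first_difference(a, b):
    """Return the index of the first differing character, or None if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


# The literal typed into the WHERE clause.
typed = "Movies & TV"
# ASSUMPTION: hypothetical stored value with a no-break space after '&'.
stored = "Movies &\u00a0TV"

idx = first_difference(typed, stored)
print("character lengths:", len(typed), len(stored))
print("byte lengths:", len(typed.encode("utf-8")), len(stored.encode("utf-8")))
print("first difference at index", idx, describe(stored)[idx])
```

Under that assumption both strings have the same character length, which would match what length() reported, yet they differ in one character and in byte length, so an equality comparison fails.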
Before running the COPY command, make sure the file is in the format Vertica expects (UTF-8).
See:
https://my.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/AdministratorsGuide/BulkLoadCOPY/CheckingDataFormatBeforeOrAfterLoading.htm
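As a rough pre-load check, a small script can confirm that the file decodes as UTF-8 and flag characters that look like ordinary spaces but are not. This is a minimal sketch and not a Vertica tool: the function name `check_utf8` and the sample data are hypothetical, and the choice to flag Unicode categories Zs (space separators) and Cf (format characters such as a byte order mark) is an assumption about what is worth reporting.

```python
import unicodedata


def check_utf8(data: bytes):
    """Decode data as UTF-8 and list non-ASCII space/format characters.

    Returns (ok, findings). ok is False when the bytes are not valid UTF-8.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return False, [f"invalid UTF-8 at byte offset {exc.start}"]

    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Zs: space separators such as U+00A0; Cf: format chars such as U+FEFF.
            if ord(ch) > 127 and unicodedata.category(ch) in ("Zs", "Cf"):
                findings.append(f"line {line_no}, col {col}: U+{ord(ch):04X}")
    return True, findings


# Hypothetical sample rows: a no-break space in row 1, Chinese text in row 2.
sample = "1|Movies &\u00a0TV\n2|电影\n".encode("utf-8")
ok, findings = check_utf8(sample)
print(ok, findings)
```

For a real load file, the bytes would come from reading the file in binary mode. Ordinary non-ASCII text such as Chinese characters is not flagged; only the look-alike whitespace and format characters are reported.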