RJDBC UTF-8 encoding
Hi all,
Not sure if this is related specifically to Vertica or not, but I'm out of options, so let's see.
I'm creating a Shiny app in R that needs to read data from Vertica and load data back into Vertica. When I started on this app, I was using RODBC (which I normally use when connecting to Vertica from R). The app works fine locally, but after deploying it to Shiny Server (which runs on Debian) the app crashes with a segfault. I found this article, which unfortunately did not give me an answer.
After some testing and reading, I finally switched to RJDBC, which works fine with Shiny Server. The only problem I have now is the encoding of the data. Some of the data that I read from Vertica contains emojis. Somehow these get scrambled with RJDBC, which causes my app to crash. I suspect this is related to the fact that the JDBC drivers convert everything from UTF-8 to UTF-16 (see also documentation here).
Ideally I would use RODBC (I tested locally, and all emojis look fine with this method), but due to the segfault issue I cannot use it.
Any pointers on how to solve this are appreciated, because this problem is currently blocking any further progress on my app.
Kind Regards,
Derek
Comments
To determine whether the encoding issue is a JDBC driver problem or an R problem, I executed the same query in Python using the jaydebeapi package, which also uses the JDBC driver. There I get the exact same problem:
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 767-768: surrogates not allowed
This means the conversion goes wrong somewhere in the JDBC driver. I also tried the native client (vertica_python), which gives correct results.
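For anyone hitting the same error, here is a minimal sketch of a possible client-side workaround in Python. It assumes (this is not confirmed for the Vertica JDBC driver) that the emoji arrive as two separate UTF-16 surrogate code points instead of one character, which is what the "surrogates not allowed" message suggests. The helper name repair_surrogates and the sample string are made up for illustration.

```python
def repair_surrogates(text: str) -> str:
    """Recombine UTF-16 surrogate halves that arrived as separate code points.

    Assumption: the surrogates come in valid high/low pairs. A lone,
    unpaired surrogate would still raise an error on decode.
    """
    # 'surrogatepass' lets the lone surrogates through as raw UTF-16 code
    # units; decoding as UTF-16 then merges each pair into one character.
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


# Hypothetical value as it might come back through the JDBC bridge:
# U+1F600 split into its high and low surrogate.
broken = "ok \ud83d\ude00"

try:
    broken.encode("utf-8")
except UnicodeEncodeError as exc:
    # Same failure as in the traceback above.
    print("encode failed:", exc.reason)

fixed = repair_surrogates(broken)
utf8_bytes = fixed.encode("utf-8")  # no longer raises
print(len(broken), len(fixed))
```

Strings without surrogates pass through unchanged, so the helper could be applied to every text column after fetching. It only repairs the symptom on the client; it does not explain why the driver hands back split pairs in the first place.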