How to load US7_ASCII strings with FAVROPARSER
I am loading "many" TB of data daily.
It appears that the AVRO file format, used as an intermediary for Vertica loading, would be quite an efficient way to load very large data volumes.
I can see that FAVROPARSER has documented support for all Vertica data types.
How about loading strings that are US7_ASCII? AVRO by definition has strings as Unicode.
My concern with Unicode strings is that they can be at least twice as large as US7_ASCII strings.
My ballpark estimate is that around 30-40% of my data is strings or fixed chars.
If I switch to AVRO as the intermediate file format for Vertica loading, I will see an additional daily data size increase in the range of a few TB (uncompressed). Not to mention the conversion from ASCII chars to Unicode and back during AVRO binary encoding/decoding.
That said, the AVRO format would be at least on par with, or better than, the data format currently used for loading, even with the Unicode string size increase.
I would be glad if you could post here how to load US7_ASCII strings through the AVRO format with FAVROPARSER (into a columnar table).
It would be extremely nice if FAVROPARSER added support for a logical data type "US7_ASCII" annotation on top of the bytes and fixed types. That would arguably make AVRO the most efficient and best intermediary format for Vertica data loading.
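To make the request concrete, here is a rough sketch in Python of what such an annotated schema might look like. The logical type name "us7_ascii", the record name and the field names are all hypothetical: this logical type is not defined in the AVRO specification, and I am not aware of FAVROPARSER recognizing it. As far as I understand the specification, readers that do not know a logical type are supposed to ignore it and fall back to the underlying bytes or fixed type.

```python
import json

# Hypothetical logical type name; it is NOT part of the AVRO specification
# and is not known to be supported by FAVROPARSER.
US7_ASCII = "us7_ascii"

# Illustrative record with one fixed-width and one variable-length ASCII column.
schema = {
    "type": "record",
    "name": "LoadRow",  # hypothetical record name
    "fields": [
        {
            "name": "code",
            "type": {"type": "fixed", "name": "Code8", "size": 8,
                     "logicalType": US7_ASCII},
        },
        {
            "name": "note",
            "type": {"type": "bytes", "logicalType": US7_ASCII},
        },
    ],
}


def to_ascii_bytes(text: str) -> bytes:
    """Encode text as 7-bit ASCII; raises UnicodeEncodeError for anything else."""
    return text.encode("ascii")


# The schema as it would appear in an AVRO file header or .avsc file.
schema_json = json.dumps(schema, indent=2)
```

A writer would pass values through something like to_ascii_bytes before serializing, so that invalid characters fail early instead of ending up in the load file.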
With AVRO Unicode strings, I am scratching my head over whether it is worth the effort.
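One way to put a number on that doubt is to measure the encoded size directly. The sketch below is plain Python with no AVRO library. It hand-rolls the binary encoding as I read it in the AVRO specification: a string is a zigzag varint length followed by UTF-8 bytes, and bytes is the same length prefix followed by the raw bytes. If that reading is right, pure 7-bit ASCII text takes one byte per character either way, and only non-ASCII characters cost more. The sample value is made up, and this says nothing about what FAVROPARSER does after decoding.

```python
def encode_long(n: int) -> bytes:
    """AVRO long: zigzag encoding, then base-128 varint, low group first."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # set continuation bit
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """AVRO string: length prefix (in bytes) plus UTF-8 data."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_bytes(b: bytes) -> bytes:
    """AVRO bytes: length prefix plus raw data."""
    return encode_long(len(b)) + b


sample = "ORDER-2024-000123"  # made-up ASCII-only value

as_string = encode_string(sample)
as_bytes = encode_bytes(sample.encode("ascii"))
as_utf16_payload = sample.encode("utf-16-le")  # for comparison only

print(len(sample), len(as_string), len(as_bytes), len(as_utf16_payload))
```

Running this over a representative sample of real column values, rather than one made-up string, would give a better estimate of the daily size difference.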