How can a native ARRAY[FLOAT] be created from values in a group (in Vertica 10.0.1)?
So far is seems that there's no obvious way to convert large (50+ billion rows) table to another Vertica table with an ARRAY[FLOAT] column using SELECT ... GROUP BY key ... query (which should create a row with an array of values for all 10,000+ rows with the same key for each distinct key) given that:
1) STRING_TO_ARRAY argument can't be longer than 65,000 bytes (i.e LONG VARCHAR isn't supported - for non-obvious reasons) while required limit in this case is ~500KB (that size is supported by LISTAGG but somehow not by STRING_TO_ARRAY)
2) ARRAY_CAT is limited to only 2 arguments (each being 1D array or NULL)
3) EXPORT TO PARQUET apparently can't export VARCHARs longer than 64KB
4) EXPORT TO PARQUET apparently can't export native 1D arrays - like ARRAY[FLOAT]
5) UDx can't handle complex types (including ARRAY)
6) saving output of a SELECT query using vsql and then using COPY ... CSV (which can parse sufficiently long strings and convert those to required ARRAYs) isn't feasible due to the size of output of that SELECT query (several TB).
Any suggestions on how to do it with Vertica would be appreciated.
Answers
Are you asking if you can create an ARRAY of 10,000+ elements where each element of the ARRAY is a row of data from a 50+ billion row table?
Each element of an array should be a single FLOAT value from another (not a grouping) FLOAT column in that table. So for any given key (out of several million different keys) it's group contains 10,000+ rows and all values in one column for that group should become elements in the array corresponding to that key - for example, if we have 12,345 rows in a group (key1, val1), (key1, val2) ... (key1, val12345) initially then we should get in a converted table 1 row (key1, [val1, val2, .. val12345]).
Hi,
Something like the following kind of works...
Uh oh. So turns out the max numer of elements (as of Vertica 10.0.1) for an ARRAY of FLOATs is 8,125 elements.
Why? The current max binary size of an ARRAY is 65K. Floats are 8 bytes.
If we limit the number of elements to 8125, we can load the data...