How can a native ARRAY[FLOAT] be created from values in a group (in Vertica 10.0.1)?

IgorM_1IgorM_1
edited September 2020 in General Discussion

So far is seems that there's no obvious way to convert large (50+ billion rows) table to another Vertica table with an ARRAY[FLOAT] column using SELECT ... GROUP BY key ... query (which should create a row with an array of values for all 10,000+ rows with the same key for each distinct key) given that:
1) STRING_TO_ARRAY argument can't be longer than 65,000 bytes (i.e LONG VARCHAR isn't supported - for non-obvious reasons) while required limit in this case is ~500KB (that size is supported by LISTAGG but somehow not by STRING_TO_ARRAY)
2) ARRAY_CAT is limited to only 2 arguments (each being 1D array or NULL)
3) EXPORT TO PARQUET apparently can't export VARCHARs longer than 64KB
4) EXPORT TO PARQUET apparently can't export native 1D arrays - like ARRAY[FLOAT]
5) UDx can't handle complex types (including ARRAY)
6) saving output of a SELECT query using vsql and then using COPY ... CSV (which can parse sufficiently long strings and convert those to required ARRAYs) isn't feasible due to the size of output of that SELECT query (several TB).
Any suggestions on how to do it with Vertica would be appreciated.

Answers

  • Jim_KnicelyJim_Knicely Administrator

    Are you asking if you can create an ARRAY of 10,000+ elements where each element of the ARRAY is a row of data from a 50+ billion row table?

  • IgorM_1IgorM_1
    edited September 2020

    Each element of an array should be a single FLOAT value from another (not a grouping) FLOAT column in that table. So for any given key (out of several million different keys) it's group contains 10,000+ rows and all values in one column for that group should become elements in the array corresponding to that key - for example, if we have 12,345 rows in a group (key1, val1), (key1, val2) ... (key1, val12345) initially then we should get in a converted table 1 row (key1, [val1, val2, .. val12345]).

  • Jim_KnicelyJim_Knicely Administrator
    edited September 2020

    Hi,

    Something like the following kind of works...

    dbadmin=> \d big_float
                                      List of Fields by Tables
     Schema |   Table   | Column | Type  | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-----------+--------+-------+------+---------+----------+-------------+-------------
     public | big_float | a      | int   |    8 |         | f        | f           |
     public | big_float | b      | float |    8 |         | f        | f           |
    (2 rows)
    
    dbadmin=> SELECT a, COUNT(b) FROM big_float GROUP BY a ORDER BY a;
     a | COUNT
    ---+-------
     1 | 10000
     2 | 10000
     3 | 10000
    (3 rows)
    
    dbadmin=> SELECT a, octet_length(LISTAGG(b USING PARAMETERS max_length=320000)) FROM big_float GROUP BY a ORDER BY a;
     a | octet_length
    ---+--------------
     1 |       171054
     2 |       171110
     3 |       171140
    (3 rows)
    
    dbadmin=> CREATE TABLE big_float_array(a INT, b ARRAY[FLOAT]);
    CREATE TABLE
    
    dbadmin=> \! vsql -Atc "SELECT a, '[' || LISTAGG(b USING PARAMETERS max_length=320000) || ']' FROM big_float GROUP BY a ORDER BY a;" -o /home/dbadmin/big_float_array.txt
    
    dbadmin=> COPY big_float_array FROM '/home/dbadmin/big_float_array.txt' REJECTED DATA TABLE big_float_array_bad;
     Rows Loaded
    -------------
               0
    (1 row)
    
    dbadmin=> SELECT rejected_reason FROM big_float_array_bad;
                        rejected_reason
    --------------------------------------------------------
     Too many array elements for type (limit 8125 elements)
     Too many array elements for type (limit 8125 elements)
     Too many array elements for type (limit 8125 elements)
    (3 rows)
    

    Uh oh. So turns out the max numer of elements (as of Vertica 10.0.1) for an ARRAY of FLOATs is 8,125 elements.

    Why? The current max binary size of an ARRAY is 65K. Floats are 8 bytes.

    dbadmin=> SELECT 65000 // 8 "Max Number of Array Elements";
     Max Number of Array Elements
    ------------------------------
                             8125
    (1 row)
    

    If we limit the number of elements to 8125, we can load the data...

    dbadmin=> \! vsql -Atc "SELECT a, '[' || LISTAGG(b USING PARAMETERS max_length=320000) || ']' FROM (SELECT a, b FROM big_float LIMIT 8125 OVER (PARTITION BY a ORDER BY b)) foo GROUP BY a ORDER BY 1;" -o /home/dbadmin/big_float_array.txt
    
    dbadmin=> COPY big_float_array FROM '/home/dbadmin/big_float_array.txt' REJECTED DATA TABLE big_float_array_bad2;
     Rows Loaded
    -------------
               3
    (1 row)
    
    dbadmin=> SELECT a, b[0], APPLY_COUNT(b) FROM big_float_array;
     a |          b           | APPLY_COUNT
    ---+----------------------+-------------
     1 | 0.000534397549927235 |        8125
     2 |  0.00119081977754831 |        8125
     3 | 0.000372976996004581 |        8125
    (3 rows)
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file