Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted)

edited December 2021 in General Discussion

What I have:

Vertica Community Edition on Ubuntu (Single node, 32 GB)


Hello,

I am a Community Edition user of Vertica. I am loading a JSON file like this:

CREATE EXTERNAL TABLE customers(id INT, address VARCHAR, transactions ARRAY[INT])
    AS COPY FROM 'cust.json' PARSER fjsonparser();

The table is created successfully, with all of the 6 million rows.

When I run a simple SELECT COUNT(*) statement, I get the error above (in the title). The error does not occur on tables with 200,000 or 800,000 rows.

I have also referred to this link regarding the same error:

Even after increasing the memory size of the failing query's resource pool, I am facing the same error.
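
For reference, this is roughly the kind of change I made (the pool name and size here are illustrative):

```sql
-- Sketch: raise the memory cap on the pool the query runs in
ALTER RESOURCE POOL general MAXMEMORYSIZE '24G';
```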

Is there anything else I need to configure to get a successful run?

Any help will be appreciated.

Thanks

Answers

  • Jim_Knicely Administrator
    edited December 2021

    @svijay - Interesting. What version of Vertica are you running? See if you can size the array column appropriately; by default, arrays are pretty big.

    Example:

    dbadmin=> CREATE EXTERNAL TABLE customers(id INT, address VARCHAR, transactions ARRAY[INT])
    dbadmin-> AS COPY FROM 'cust.json' PARSER fjsonparser()
    dbadmin-> ;
    CREATE TABLE
    
    dbadmin=> \d customers
                                                List of Fields by Tables
     Schema |   Table   |    Column    |        Type        | Size  | Default | Not Null | Primary Key | Foreign Key
    --------+-----------+--------------+--------------------+-------+---------+----------+-------------+-------------
     public | customers | id           | int                |     8 |         | f        | f           |
     public | customers | address      | varchar(80)        |    80 |         | f        | f           |
     public | customers | transactions | array[int8](65000) | 65000 |         | f        | f           |
    (3 rows)
    

    I can make it smaller by specifying the max number of expected elements in the array.

    dbadmin=> DROP TABLE customers;
    DROP TABLE
    
    dbadmin=> CREATE EXTERNAL TABLE customers(id INT, address VARCHAR, transactions ARRAY[INT, 10])
    dbadmin-> AS COPY FROM 'cust.json' PARSER fjsonparser();
    CREATE TABLE
    
    dbadmin=> \d customers;
                                              List of Fields by Tables
     Schema |   Table   |    Column    |      Type       | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-----------+--------------+-----------------+------+---------+----------+-------------+-------------
     public | customers | id           | int             |    8 |         | f        | f           |
     public | customers | address      | varchar(80)     |   80 |         | f        | f           |
     public | customers | transactions | array[int8, 10] |   80 |         | f        | f           |
    (3 rows)
    

    See: Syntax for Column Definition

    BTW: When you create an external table via the CREATE EXTERNAL TABLE command, no data is loaded into Vertica. In fact, with external tables, data is never loaded into Vertica; it is always read from the source, wherever that might be :)

  • edited December 2021

    Thanks for your response Jim. These are some helpful tips. I am currently using Vertica_11.0.0-1

    However, none of my fields are of the 'array' type, as that was just an example I had listed :smile:

    My fields are of the NUMBER, VARCHAR, and string types, and I have been specifying the size for each of them.

    How would you recommend I load the JSON file into Vertica? I am currently using WinSCP to transfer the file to the VM and then running the COPY command:

    CREATE EXTERNAL TABLE customers(id INT, address VARCHAR)
    AS COPY FROM '/home/cust.json' PARSER fjsonparser();
    

    Looking forward to hearing from you.

    Thank you!

  • Jim_Knicely Administrator
    edited December 2021

    Hi,

    Instead of an External Table you could use a Columnar (managed) Table, then load the data using the COPY command.

    Example:

    dbadmin=> \! cat /home/dbadmin/cust.json
    {"id":1,"name":"Santa","address":"North Pole"}
    
    dbadmin=> CREATE TABLE customers (id INT, address VARCHAR);
    CREATE TABLE
    
    dbadmin=> COPY customers FROM '/home/dbadmin/cust.json' PARSER fjsonparser();
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM customers;
     id |  address
    ----+------------
      1 | North Pole
    (1 row)
    
    dbadmin=> DROP TABLE customers;
    DROP TABLE
    
    dbadmin=> CREATE TABLE customers (id INT, name VARCHAR, address VARCHAR);
    CREATE TABLE
    
    dbadmin=> COPY customers FROM '/home/dbadmin/cust.json' PARSER fjsonparser();
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM customers;
     id | name  |  address
    ----+-------+------------
      1 | Santa | North Pole
    (1 row)
    

    If you want to skip the WinSCP step, you can install vsql on your local system (i.e., where the JSON data files reside).

    For example, I can load a JSON file from my Windows Laptop to my Vertica database running in the cloud. But this time I have to use COPY LOCAL to load the data.

    C:\Users\knicely>vsql -h remote.vertica.site -U dbadmin
    Password:
    Welcome to vsql, the Vertica Analytic Database interactive terminal.
    
    Type:  \h or \? for help with vsql commands
           \g or terminate with semicolon to execute query
           \q to quit
    
    Warning: Console code page (437) differs from Windows code page (1252)
             8-bit characters may not work correctly. See vsql reference
             page "Notes for Windows users" for details.
    
    dbadmin=> \! more cust.json
    {"id":1,"name":"Santa","address":"North Pole"}
    
    dbadmin=> TRUNCATE TABLE customers;
    TRUNCATE TABLE
    
    dbadmin=> COPY customers FROM LOCAL 'cust.json' PARSER fjsonparser();
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM customers;
     id | name  |  address
    ----+-------+------------
      1 | Santa | North Pole
    (1 row)
    
    
  • Hey Jim,

    I dropped the external table and created it again as:

    CREATE TABLE CUSTOMERS (
        C_CUSTKEY NUMBER(38,0),
        C_NAME VARCHAR(25),
        C_ADDRESS VARCHAR(40),
        C_NATIONKEY NUMBER(38,0),
        C_PHONE VARCHAR(15),
        C_ACCTBAL NUMBER(12,2),
        C_MKTSEGMENT VARCHAR(10),
        C_COMMENT VARCHAR(117)
    )
    

    But when I tried to copy the data from the VM as you suggested, I got an error:

    COPY CUSTOMERS FROM '/home/dbadmin/customers.json' PARSER fjsonparser();
    
    

    Syntax error at or near "COPY" at character 215

    Thanks

  • Jim_Knicely Administrator
    edited December 2021

    Did you have a semi-colon after the CREATE TABLE statement?

    Like this:

    CREATE TABLE CUSTOMERS (
        C_CUSTKEY NUMBER(38,0),
        C_NAME VARCHAR(25),
        C_ADDRESS VARCHAR(40),
        C_NATIONKEY NUMBER(38,0),
        C_PHONE VARCHAR(15),
        C_ACCTBAL NUMBER(12,2),
        C_MKTSEGMENT VARCHAR(10),
        C_COMMENT VARCHAR(117)
    );
    
  • Jim_Knicely Administrator
    edited December 2021

    One more thing: try not to use the data type NUMERIC(38,0) when you can use INT instead, because an INT has a smaller footprint!

    dbadmin=> CREATE TABLE an_int_is_smaller(c1 NUMERIC(38,0), c2 INT);
    CREATE TABLE
    
    dbadmin=> \d an_int_is_smaller
                                  List of Fields by Tables
     Schema |       Table       | Column |     Type      | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-------------------+--------+---------------+------+---------+----------+-------------+-------------
     public | an_int_is_smaller | c1     | numeric(38,0) |   24 |         | f        | f           |
     public | an_int_is_smaller | c2     | int           |    8 |         | f        | f           |
    (2 rows)
    
  • Hello, when you get the error, do you also see the following hint?

    HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file

    As suggested in the hint, you might want to double-check your record terminators.
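
    One generic way to see which terminator a file actually uses is to dump its first bytes with od; this is just a sketch using a throwaway demo file, not your actual data:

    ```shell
    # Create a demo file with CRLF terminators, then inspect its raw bytes.
    # od -c prints control characters, so a \r \n pair is easy to spot.
    printf '{"id":1}\r\n{"id":2}\r\n' > demo.json
    head -c 200 demo.json | od -c
    ```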

  • Thanks @Jim_Knicely and @SergeB. Adding the missing semicolon resolved the syntax error.

    The COPY command also seems to be running now, where it didn't before.

    I am using the TPC-H SF1 data, and the tables are being created and the data copied. However, for the lineitem and orders tables, I get the error in this thread's title. It does not occur for any of the other 6 tables.

    Here is a link for your reference on TPCH data:

    https://docs.snowflake.com/en/user-guide/sample-data-tpch.html

    Since I have already tried the first comment in this thread (mine), is there anything else that needs to be done to remediate this?

    Thanks

  • Jim_Knicely Administrator
    edited January 1

    @svijay - Here are a few suggestions...

    1. Try adding the record_terminator parameter:

    COPY CUSTOMERS FROM '/home/dbadmin/customers.json' PARSER fjsonparser(record_terminator=E'\n');

    Make sure to actually use the record terminator that is in the files. :)

    2. Split the JSON files into smaller files and load those.

    3. Fiddle with the UDLMaxDataBufferSize parameter.

    dbadmin=> SELECT current_value, default_value, description FROM vs_configuration_parameters WHERE parameter_name = 'UDLMaxDataBufferSize';
     current_value | default_value |                                         description
    ---------------+---------------+----------------------------------------------------------------------------------------------
     268435456     | 268435456     | The maximum buffer size of a data stream for a user-defined load function (should be >= 1MB)
    (1 row)
    

    The default might be a bit low for your needs. It is recommended to change its value only under support supervision, but you can make the change at the session level so as not to affect the entire DB.

    Example:

    dbadmin=> SELECT 268435456*4; -- What is 4x the default?
      ?column?
    ------------
     1073741824
    (1 row)
    
    dbadmin=> ALTER SESSION SET UDLMaxDataBufferSize = 1073741824;
    ALTER SESSION
    
    dbadmin=> SELECT current_value, default_value, description FROM vs_configuration_parameters WHERE parameter_name = 'UDLMaxDataBufferSize';
     current_value | default_value |                                         description
    ---------------+---------------+----------------------------------------------------------------------------------------------
     1073741824    | 268435456     | The maximum buffer size of a data stream for a user-defined load function (should be >= 1MB)
    (1 row)
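
    For the file-splitting suggestion, if the files are newline-delimited JSON (one record per line), the standard split utility can chunk them. This is a generic sketch with a tiny demo file, not the actual TPC-H data:

    ```shell
    # Demo: split a newline-delimited JSON file into 2-record chunks.
    # In practice you'd use a much larger chunk size (e.g. -l 1000000).
    printf '{"id":1}\n{"id":2}\n{"id":3}\n' > cust_demo.json
    split -l 2 cust_demo.json cust_part_
    ls cust_part_*
    ```

    Each resulting cust_part_* file can then be loaded with its own COPY ... PARSER fjsonparser() statement.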
    
  • SergeB Employee

    @svijay Could you share a snippet (first 3-4 records) of your JSON file?
