
Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted)

svijay Community Edition User
edited December 2021 in General Discussion

What I have:

Vertica Community Edition on Ubuntu (Single node, 32 GB)


Hello,

I am a Community Edition user of Vertica. I am loading a JSON file like this:

    CREATE EXTERNAL TABLE customers(id INT, address VARCHAR, transactions ARRAY[INT])
        AS COPY FROM 'cust.json' PARSER fjsonparser();

The table is created successfully and contains all 6 million rows.

However, when I run a simple SELECT COUNT(*) statement, I get the error in the title. The error does not occur on a table with 200,000 rows or 800,000 rows.
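
For reference, the failing statement is nothing more than a plain count against that table:

    SELECT COUNT(*) FROM customers;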

I have also referred to this link regarding the same error:

Even when I increase the memory size of the resource pool that the failing query runs in, I get the same error.
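
For reference, this is roughly what I tried (the pool name and size here are only placeholders for my actual settings):

    -- check the current pool settings
    SELECT name, memorysize, maxmemorysize FROM resource_pools WHERE name = 'general';

    -- give the pool more memory (the size is only an example)
    ALTER RESOURCE POOL general MAXMEMORYSIZE '24G';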

Is there anything additional that needs to be ensured in order to have a successful run?

Any help will be appreciated.

Thanks

Answers

  • Jim_Knicely - Select Field - Administrator
    edited December 2021

    @svijay - Interesting. What version of Vertica are you running? See if you can size the array column appropriately. By default, arrays are pretty big.

    Example:

    dbadmin=> CREATE EXTERNAL TABLE customers(id INT, address VARCHAR, transactions ARRAY[INT])
    dbadmin-> AS COPY FROM 'cust.json' PARSER fjsonparser()
    dbadmin-> ;
    CREATE TABLE
    
    dbadmin=> \d customers
                                                List of Fields by Tables
     Schema |   Table   |    Column    |        Type        | Size  | Default | Not Null | Primary Key | Foreign Key
    --------+-----------+--------------+--------------------+-------+---------+----------+-------------+-------------
     public | customers | id           | int                |     8 |         | f        | f           |
     public | customers | address      | varchar(80)        |    80 |         | f        | f           |
     public | customers | transactions | array[int8](65000) | 65000 |         | f        | f           |
    (3 rows)
    

    I can make it smaller by specifying the max number of expected elements in the array.

    dbadmin=> DROP TABLE customers;
    DROP TABLE
    
    dbadmin=> CREATE EXTERNAL TABLE customers(id INT, address VARCHAR, transactions ARRAY[INT,10])
    dbadmin-> AS COPY FROM 'cust.json' PARSER fjsonparser();
    CREATE TABLE
    
    dbadmin=> \d customers;
                                              List of Fields by Tables
     Schema |   Table   |    Column    |      Type       | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-----------+--------------+-----------------+------+---------+----------+-------------+-------------
     public | customers | id           | int             |    8 |         | f        | f           |
     public | customers | address      | varchar(80)     |   80 |         | f        | f           |
     public | customers | transactions | array[int8, 10] |   80 |         | f        | f           |
    (3 rows)
    

    See: Syntax for Column Definition

    BTW: When you create an external table via the CREATE EXTERNAL TABLE command, no data is loaded into Vertica. In fact, with external tables, data is never loaded into Vertica; it is always read from the source, wherever that might be :)

  • svijay Community Edition User
    edited December 2021

    Thanks for your response, Jim. These are some helpful tips. I am currently using Vertica 11.0.0-1.

    However, none of my fields are of the 'array' type; that was just an example I had listed :smile:

    My fields are mostly of the 'number', 'varchar', and 'string' types, and I have been specifying the size for each of them.

    How would you recommend I load the JSON file into Vertica? I am currently using WinSCP to copy the file onto the VM and then running this command:

    CREATE EXTERNAL TABLE customers(id INT, address VARCHAR)
    AS COPY FROM '/home/cust.json' PARSER fjsonparser();
    

    Looking forward to hearing from you.

    Thank you!

  • Jim_Knicely - Select Field - Administrator
    edited December 2021

    Hi,

    Instead of an External Table you could use a Columnar (managed) Table, then load the data using the COPY command.

    Example:

    dbadmin=> \! cat /home/dbadmin/cust.json
    {"id":1,"name":"Santa","address":"North Pole"}
    
    dbadmin=> CREATE TABLE customers (id INT, address VARCHAR);
    CREATE TABLE
    
    dbadmin=> COPY customers FROM '/home/dbadmin/cust.json' PARSER fjsonparser();
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM customers;
     id |  address
    ----+------------
      1 | North Pole
    (1 row)
    
    dbadmin=> DROP TABLE customers;
    DROP TABLE
    
    dbadmin=> CREATE TABLE customers (id INT, name VARCHAR, address VARCHAR);
    CREATE TABLE
    
    dbadmin=> COPY customers FROM '/home/dbadmin/cust.json' PARSER fjsonparser();
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM customers;
     id | name  |  address
    ----+-------+------------
      1 | Santa | North Pole
    (1 row)
    

    If you want to skip the WinSCP step, you can install vsql on your local system (i.e., where the JSON data files reside).

    For example, I can load a JSON file from my Windows Laptop to my Vertica database running in the cloud. But this time I have to use COPY LOCAL to load the data.

    C:\Users\knicely>vsql -h remote.vertica.site -U dbadmin
    Password:
    Welcome to vsql, the Vertica Analytic Database interactive terminal.
    
    Type:  \h or \? for help with vsql commands
           \g or terminate with semicolon to execute query
           \q to quit
    
    Warning: Console code page (437) differs from Windows code page (1252)
             8-bit characters may not work correctly. See vsql reference
             page "Notes for Windows users" for details.
    
    dbadmin=> \! more cust.json
    {"id":1,"name":"Santa","address":"North Pole"}
    
    dbadmin=> TRUNCATE TABLE customers;
    TRUNCATE TABLE
    
    dbadmin=> COPY customers FROM LOCAL 'cust.json' PARSER fjsonparser();
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM customers;
     id | name  |  address
    ----+-------+------------
      1 | Santa | North Pole
    (1 row)
    
    
  • svijay Community Edition User

    Hey Jim,

    I dropped the external table and created it again as:

    CREATE TABLE CUSTOMERS (
        C_CUSTKEY NUMBER(38,0),
        C_NAME VARCHAR(25),
        C_ADDRESS VARCHAR(40),
        C_NATIONKEY NUMBER(38,0),
        C_PHONE VARCHAR(15),
        C_ACCTBAL NUMBER(12,2),
        C_MKTSEGMENT VARCHAR(10),
        C_COMMENT VARCHAR(117)
    )
    

    But when I try the COPY from the VM as you mentioned, I'm getting an error:

    COPY CUSTOMERS FROM '/home/dbadmin/customers.json' PARSER fjsonparser();
    
    

    Syntax error at or near "COPY" at character 215

    Thanks

  • Jim_Knicely - Select Field - Administrator
    edited December 2021

    Did you have a semi-colon after the CREATE TABLE statement?

    Like this:

    CREATE TABLE CUSTOMERS (
        C_CUSTKEY NUMBER(38,0),
        C_NAME VARCHAR(25),
        C_ADDRESS VARCHAR(40),
        C_NATIONKEY NUMBER(38,0),
        C_PHONE VARCHAR(15),
        C_ACCTBAL NUMBER(12,2),
        C_MKTSEGMENT VARCHAR(10),
        C_COMMENT VARCHAR(117)
    );
    
  • Jim_Knicely - Select Field - Administrator
    edited December 2021

    One more thing: try not to use the data type NUMERIC(38,0) when you can use INT instead, because an INT has a smaller footprint!

    dbadmin=> CREATE TABLE an_int_is_smaller(c1 NUMERIC(38,0), c2 INT);
    CREATE TABLE
    
    dbadmin=> \d an_int_is_smaller
                                              List of Fields by Tables
     Schema |       Table       | Column |     Type      | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-------------------+--------+---------------+------+---------+----------+-------------+-------------
     public | an_int_is_smaller | c1     | numeric(38,0) |   24 |         | f        | f           |
     public | an_int_is_smaller | c2     | int           |    8 |         | f        | f           |
    (2 rows)
    
  • SergeB - Select Field - Employee

    Hello, when you get the error, do you also see the following hint?

    HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file

    As suggested in the hint, you might want to double-check your record terminators.
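
    For example (just one way to check; the path here is illustrative), you can dump the bytes of the first record to see what it actually ends with, assuming one record per line:

    dbadmin=> \! head -n 1 /home/dbadmin/customers.json | od -c | tail -n 3

    If you see \r \n in that output, the records end in CRLF rather than a plain newline.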

  • svijay Community Edition User

    Thanks @Jim_Knicely and @SergeB. Adding the missing semicolon cleared that error, and the COPY command now runs where it didn't before.

    I am using the TPCH SF1 data, and the tables are being created and loaded. However, for the lineitem and orders tables I get the error in the title of this thread; it does not occur for any of the other 6 tables.

    Here is a link for your reference on TPCH data:

    https://docs.snowflake.com/en/user-guide/sample-data-tpch.html

    Since I have already tried the first comment in this thread (mine), is there anything else that needs to be done to remediate this?

    Thanks

  • Jim_Knicely - Select Field - Administrator
    edited January 2022

    @svijay - Here are a few suggestions...

    1. Try adding the record_terminator parameter:

    COPY CUSTOMERS FROM '/home/dbadmin/customers.json' PARSER fjsonparser(record_terminator=E'\n');

    Make sure to actually use the record terminator that is in the files. :)

    2. Split the JSON files into smaller files and load those (see the sketch at the end of this post).

    3. Fiddle with the UDLMaxDataBufferSize parameter.

    dbadmin=> SELECT current_value, default_value, description FROM vs_configuration_parameters WHERE parameter_name = 'UDLMaxDataBufferSize';
     current_value | default_value |                                          description
    ---------------+---------------+-----------------------------------------------------------------------------------------------
     268435456     | 268435456     | The maximum buffer size of a data stream for a user-defined load function (should be >= 1MB)
    (1 row)
    

    The default might be a bit low for your needs. It is recommended to change this value only under support supervision, but you can change it at the session level so as not to affect the entire DB.

    Example:

    dbadmin=> SELECT 268435456*4; -- What is 4x the default?
      ?column?
    ------------
     1073741824
    (1 row)
    
    dbadmin=> ALTER SESSION SET UDLMaxDataBufferSize = 1073741824;
    ALTER SESSION
    
    dbadmin=> SELECT current_value, default_value, description FROM vs_configuration_parameters WHERE parameter_name = 'UDLMaxDataBufferSize';
     current_value | default_value |                                          description
    ---------------+---------------+-----------------------------------------------------------------------------------------------
     1073741824    | 268435456     | The maximum buffer size of a data stream for a user-defined load function (should be >= 1MB)
    (1 row)
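
    Here is a rough sketch of suggestion 2 (splitting the file). It assumes the JSON file has one record per line, so it can be split on line boundaries; the chunk size, paths, and connection details below are only placeholders:

    # split into ~1M-line chunks named customers_part_aa, customers_part_ab, ...
    split -l 1000000 /home/dbadmin/customers.json /home/dbadmin/customers_part_

    # load each chunk separately
    for f in /home/dbadmin/customers_part_*; do
      vsql -c "COPY CUSTOMERS FROM '$f' PARSER fjsonparser();"
    done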
    
  • SergeB - Select Field - Employee

    @svijay Could you share a snippet (first 3-4 records) of your JSON file?
