Vertica COPY command bug?

I'm getting this error while trying to run a COPY command to load a JSON file. I also tried creating a resource pool with higher MAXQUERYMEMORYSIZE, QUERY_BUDGET_KB, and MEMORYSIZE values, but I still get the same error.
Below is the command:

copy_user=> copy test_load from local '/home/dbadmin/test.json' PARSER fjsonparser() rejected data as table test_load_rejected;
ERROR 2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted) HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file

 pool_name | memory_size_kb | max_memory_size_kb | planned_concurrency | max_concurrency | query_budget_kb
-----------+----------------+--------------------+---------------------+-----------------+-----------------
 copy_pool |              0 |          173225413 |                   1 |                 |       164564128

The interesting thing is that even if I change the size of the JSON file, the number of bytes requested in the error stays the same:
(536870912 requested, 268435456 permitted)

This looks like a bug to me, since I even ran the query in a pool with a much higher query budget. The JSON file is only about 300 MB.
Any help would be appreciated.

Best Answer

  • Bryan_H Vertica Employee Administrator
    Answer ✓

    I am able to reproduce the issue:

    dbadmin=> create flex table forumtest();
    CREATE TABLE
    dbadmin=> copy forumtest from local 'test_file.json' parser fjsonparser();
    ERROR 2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted)
    HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file

    However, the issue is that the script creates one very large JSON array. The test works when I remove the square brackets and the trailing commas, so that the file contains one JSON object per line:

    #!/bin/bash
    # Write one small JSON object per line instead of a single large array
    for ((i = 0; i < 10000000; i++))
    do
        echo "{\"name\" : \"john\", \"age\" : 24}" >> test_file_1.json
    done
    echo "{\"name\" : \"john\", \"age\" : 24}" >> test_file_1.json

    dbadmin=> copy forumtest from local 'test_file_1.json' parser fjsonparser();

     Rows Loaded
    -------------
        10000001
    (1 row)
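
    If you already have a single large array file like the one above, and assuming jq is installed, it can be rewritten as one object per line before loading:

    jq -c '.[]' test_file.json > test_file_1.json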

Answers

  • Bryan_H Vertica Employee Administrator

    What exact version of Vertica is this?
    Are there any other messages in vertica.log or the UDx logs on the node that receives the request?
    How big is each JSON object in the file, or is it one big JSON object?

  • SruthiA Vertica Employee Administrator

    @rajatpaliwal86 We had this issue once in the past, and it was fixed in 9.1.1. If your Vertica version is 9.2 or 9.3, please open a support case; it looks like a bug.
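
    For reference, you can check the exact build from vsql with the built-in version() function:

    SELECT version();
    -- returns the full build string, e.g. a line beginning "Vertica Analytic Database v9.2..."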

  • rajatpaliwal86 Vertica Customer

    @Bryan_H said:
    What exact version of Vertica is this?
    Are there any other messages in vertica.log or the UDx logs on the node that receives the request?
    How big is each JSON object in the file, or is it one big JSON object?

    It is Vertica 9.2.
    The file contains an array of very small JSON objects.

  • rajatpaliwal86 Vertica Customer

    @SruthiA said:
    @rajatpaliwal86 We had this issue once in the past, and it was fixed in 9.1.1. If your Vertica version is 9.2 or 9.3, please open a support case; it looks like a bug.

    It is Vertica 9.2.
    Where can I open the case?

  • SruthiA Vertica Employee Administrator

    Please click on "Service Requests" at the link below and open a support case by logging in to the portal:

    https://softwaresupport.softwaregrp.com/

  • rajatpaliwal86 Vertica Customer

    @Bryan_H
    This is the log I can see during the error:

    2020-02-07 06:21:46.458 Subsession:7fac911f8700 [Txn] Commit Complete: Txn: a0000000001ec1 at epoch 0x1a and new global catalog version 1458
    2020-02-07 06:21:46.817 EEcmdq:7fac7872a700 [UserMessage] COPY - Tearing down ContinuousUDL coroutine stack
    2020-02-07 06:21:46.907 Init Session:7faca11dd700-a0000000001e9f @v_athenadb_node0001: 00000/5077: Total memory required by query: [725092 KB]
    2020-02-07 06:21:46.929 Init Session:7faca11dd700-a0000000001e9f @v_athenadb_node0001: 53200/2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted)
    HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file
    LOCATION: alloc, /data/qb_workspaces/jenkins2/ReleaseBuilds/Grader/REL-9_2_0-x_grader/build/vertica/EE/UDL/LoadBufferManager.cpp:129
    2020-02-07 06:22:08.016 DiskSpaceRefresher:7fac92ffd700 [Util] Task 'DiskSpaceRefresher' enabled
    2020-02-07 06:22:17.009 ManageEpochs:7fac92ffd700-a0000000001ec2 [Txn] Begin Txn: a0000000001ec2 'Manage Epoch'

  • SergeB Employee

    Does your JSON file contain multiple JSON records?

  • rajatpaliwal86 Vertica Customer

    @SergeB said:
    Does your JSON file contain multiple JSON records?

    Yes, the JSON is created by a bash script. The file contains an array of small JSON objects.

  • rajatpaliwal86 Vertica Customer
    edited February 2020

    To reproduce the issue, you can use the script below to create the sample JSON:

    echo "[" > test_file.json;
    for((i=0;i<10000000;i++))
    do
    echo "{\"name\" : \"john\", \"age\" : 24}," >> test_file.json;
    done

    echo "{\"name\" : \"john\", \"age\" : 24}" >> test_file.json;
    echo "]" >> test_file.json;

  • SergeB Employee

    So it does look like the file contains only one JSON record (which happens to be an array). If so, the error is probably expected.
    If you want to process each array element as one row, you should look at using the start_point parameter.
    For instance, take this file:
    {"mynamearray" : [ {"name": "john", "age": 24},{"name": "john", "age": 24},{"name": "john", "age": 24}]}
    create flex table xfoo();
    COPY xfoo from '/tmp/array.json' PARSER FJSONPARSER();

     Rows Loaded
    -------------
               1
    (1 row)

    COPY xfoo from '/tmp/array.json' PARSER FJSONPARSER(start_point='mynamearray');

     Rows Loaded
    -------------
               3
    (1 row)
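
    As an aside (not part of the original reply): once rows are in a flex table, you can inspect what was loaded with the maptostring() function on the flex table's __raw__ column:

    SELECT maptostring(__raw__) FROM xfoo LIMIT 1;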
  • RasoolShaik Community Edition User
    I am trying to load data using the INSERT command, and it is taking too much time.

    The COPY command works fine.

    My problem is that I don't want to load data into some columns, and those columns are not present in the file.

    How do I form a COPY query to load the data quickly?

    Please do the needful.

    Thanks.
  • RasoolShaik Community Edition User
    edited February 2020
    Jim, my issue is:
    In the file, the columns are firstname, lastname, age.
    The table columns are:
    Firstname, lastname, age, filename, key_1, key_2, key_3, insertdt, updatedt

    Now I don't want to load data into the key_x columns.
    How do I form a COPY query using fillers?
    Please help.
  • Jim_Knicely Administrator
    edited February 2020

    Oh. In this particular case, you do this:

    dbadmin=> CREATE TABLE t (Firstname VARCHAR, lastname VARCHAR, age INT, filename VARCHAR, key_1 INT, key_2 INT, key_3 INT, insertdt DATE, updatedt DATE);
    CREATE TABLE
    
    dbadmin=> \! cat /home/dbadmin/t.txt
    peter|griffin|43
    brian|griffin|8
    
    dbadmin=> COPY t (Firstname, lastname, age) FROM '/home/dbadmin/t.txt';
     Rows Loaded
    -------------
               2
    (1 row)
    
    dbadmin=> SELECT * FROM t;
     Firstname | lastname | age | filename | key_1 | key_2 | key_3 | insertdt | updatedt
    -----------+----------+-----+----------+-------+-------+-------+----------+----------
     brian     | griffin  |   8 |          |       |       |       |          |
     peter     | griffin  |  43 |          |       |       |       |          |
    (2 rows)
    
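
    A note on the fillers the question asked about (an aside; the nickname field here is hypothetical): FILLER covers the opposite case, where the source file contains a column you want to parse but not load. For example, if the file also carried a nickname between lastname and age:

    dbadmin=> COPY t (Firstname, lastname, nickname FILLER VARCHAR, age) FROM '/home/dbadmin/t.txt';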
  • RasoolShaik Community Edition User
    edited February 2020
    Thanks. It helps.

    Another question: how do we track when the data was inserted? We generally use now() for insertdt and the same for updatedt.

    How will that work with the COPY command, and how do we populate the file name t.csv in the filename column?

    Ex:

    firstname|lastname|age|filename|key1|key2|key3|insertdt|updatedt|language
    Rasool|shaik|25|t.csv||||2020:02:18 11:44:49|2020:02:18 11:44:49|English

    The file columns are:
    Firstname, lastname, age, language
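
    An aside for later readers (a sketch, not a reply from the thread): Vertica's COPY can compute target columns with AS expressions, and the CURRENT_LOAD_SOURCE() function reports the name of the file being loaded. Assuming Vertica 9.x and a table that also has the language column shown above, something close to this should work (insertdt and updatedt are cast to DATE to match the earlier table definition):

    COPY t (firstname, lastname, age, language,
            filename AS CURRENT_LOAD_SOURCE(),
            insertdt AS now()::DATE,
            updatedt AS now()::DATE)
    FROM '/home/dbadmin/t.csv' DELIMITER '|';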
