Vertica COPY command bug?

I'm getting the error below while trying to run a COPY command to load a JSON file. I also tried creating a resource pool with higher MAXQUERYMEMORYSIZE, QUERY_BUDGET_KB, and MEMORYSIZE values, but I still get the same error.
Below is the command:

copy_user=> copy test_load from local '/home/dbadmin/test.json' PARSER fjsonparser() rejected data as table test_load_rejected;
ERROR 2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted) HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file

 pool_name | memory_size_kb | max_memory_size_kb | planned_concurrency | max_concurrency | query_budget_kb
-----------+----------------+--------------------+---------------------+-----------------+-----------------
 copy_pool |              0 |          173225413 |                   1 |                 |       164564128
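
For reference, the pool was created along these lines (a sketch reconstructed from the settings above, not the exact statement):

CREATE RESOURCE POOL copy_pool
    MEMORYSIZE '0%'           -- nothing reserved up front; borrows from the GENERAL pool
    MAXMEMORYSIZE '165G'      -- roughly the 173225413 KB shown above
    PLANNEDCONCURRENCY 1;     -- a single slot, so the whole budget goes to one query

SET SESSION RESOURCE_POOL = copy_pool;  -- run the COPY from this session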

The interesting thing is that even if I change the size of the JSON file, the bytes requested in the error remain the same:
(536870912 requested, 268435456 permitted), i.e., 512 MB requested versus 256 MB permitted.

This looks like a bug to me, since I even ran the query in a pool with a much higher query budget. The JSON file is only ~300 MB.
Any help would be appreciated.

Answers

  • Bryan_H (Employee)

    What exact version of Vertica is this?
    Are there any other messages in vertica.log or the UDx logs on the node that receives the request?
    How big is each JSON object in the file, or is it one big JSON object?

  • SruthiA (Employee)

    @rajatpaliwal86 We had this issue once in the past, and it was fixed in 9.1.1. If your Vertica version is 9.2 or 9.3, please open a support case. It looks like a bug.

  • @Bryan_H said:
    What exact version of Vertica is this?
    Are there any other messages in vertica.log or the UDx logs on the node that receives the request?
    How big is each JSON object in the file, or is it one big JSON object?

    It is from Vertica 9.2.
    The file contains an array of very small JSON objects.

  • @SruthiA said:
    @rajatpaliwal86 We had this issue once in the past, and it was fixed in 9.1.1. If your Vertica version is 9.2 or 9.3, please open a support case. It looks like a bug.

    It is from Vertica 9.2.
    Where can I open the case?

  • SruthiA (Employee)

    Please click on "Service Requests" at the link below and open a support case by logging in to the portal:

    https://softwaresupport.softwaregrp.com/

  • @Bryan_H
    This is what I see in the log at the time of the error:

    2020-02-07 06:21:46.458 Subsession:7fac911f8700 [Txn] Commit Complete: Txn: a0000000001ec1 at epoch 0x1a and new global catalog version 1458
    2020-02-07 06:21:46.817 EEcmdq:7fac7872a700 [UserMessage] COPY - Tearing down ContinuousUDL coroutine stack
    2020-02-07 06:21:46.907 Init Session:7faca11dd700-a0000000001e9f @v_athenadb_node0001: 00000/5077: Total memory required by query: [725092 KB]
    2020-02-07 06:21:46.929 Init Session:7faca11dd700-a0000000001e9f @v_athenadb_node0001: 53200/2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted)
    HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file
    LOCATION: alloc, /data/qb_workspaces/jenkins2/ReleaseBuilds/Grader/REL-9_2_0-x_grader/build/vertica/EE/UDL/LoadBufferManager.cpp:129
    2020-02-07 06:22:08.016 DiskSpaceRefresher:7fac92ffd700 [Util] Task 'DiskSpaceRefresher' enabled
    2020-02-07 06:22:17.009 ManageEpochs:7fac92ffd700-a0000000001ec2 [Txn] Begin Txn: a0000000001ec2 'Manage Epoch'

  • SergeB (Employee)

    Does your JSON file contain multiple JSON records?

  • @SergeB said:
    Does your JSON file contain multiple JSON records?

    Yes, the JSON is created by a bash script. The file contains an array of small JSON objects.

  • edited February 8

    To reproduce the issue, you can use the script below to create a sample JSON file:

    echo "[" > test_file.json;
    for((i=0;i<10000000;i++))
    do
    echo "{\"name\" : \"john\", \"age\" : 24}," >> test_file.json;
    done

    echo "{\"name\" : \"john\", \"age\" : 24}" >> test_file.json;
    echo "]" >> test_file.json;

  • SergeB (Employee)

    So it does look like the file contains only one JSON record (which happens to be an array). If so, the error is probably expected.
    If you want to process each array element as one row, you should look at using start_point.
    For instance, take this file:

    {"mynamearray" : [ {"name": "john", "age": 24}, {"name": "john", "age": 24}, {"name": "john", "age": 24} ]}

    create flex table xfoo();
    COPY xfoo from '/tmp/array.json' PARSER FJSONPARSER();

     Rows Loaded
    -------------
               1
    (1 row)

    COPY xfoo from '/tmp/array.json' PARSER FJSONPARSER(start_point='mynamearray');

     Rows Loaded
    -------------
               3
    (1 row)
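
    Applied to the reproduction script above: if that script is changed to open the file with {"records": [ and close it with ]} (the key name "records" is an arbitrary choice), each array element can then be loaded as its own row. A sketch, untested on 9.2:

    copy test_load from local '/home/dbadmin/test_file.json' PARSER fjsonparser(start_point='records') rejected data as table test_load_rejected;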
    
  • I am trying to load data using the INSERT command, and it is taking too much time.

    Using the COPY command instead works fine.

    My problem is that I don't want to load data into some columns, and those columns need not be present in the file.

    How do I form a COPY query to load the data quickly?

    Please advise.

    Thanks.
  • edited February 18
    Jim, my issue is:
    The file columns are firstname, lastname, age.
    The table columns are Firstname, lastname, age, filename, key_1, key_2, key_3, insertdt, updatedt.

    Now I don't want to load data into the key_x columns.
    How do I form a COPY query using fillers?
    Please help.
  • Jim_Knicely (Administrator)
    edited February 18

    Oh. In this particular case, you do this:

    dbadmin=> CREATE TABLE t (Firstname VARCHAR, lastname VARCHAR, age INT, filename VARCHAR, key_1 INT, key_2 INT, key_3 INT, insertdt DATE, updatedt DATE);
    CREATE TABLE
    
    dbadmin=> \! cat /home/dbadmin/t.txt
    peter|griffin|43
    brian|griffin|8
    
    dbadmin=> COPY t (Firstname, lastname, age) FROM '/home/dbadmin/t.txt';
     Rows Loaded
    -------------
               2
    (1 row)
    
    dbadmin=> SELECT * FROM t;
     Firstname | lastname | age | filename | key_1 | key_2 | key_3 | insertdt | updatedt
    -----------+----------+-----+----------+-------+-------+-------+----------+----------
     brian     | griffin  |   8 |          |       |       |       |          |
     peter     | griffin  |  43 |          |       |       |       |          |
    (2 rows)
    
  • edited February 18
    Thanks, that helps.

    Another question: to track when the data is inserted, we generally use now() for insertdt and likewise for updatedt.

    How can that be done with the COPY command, and how can the file name (t.csv) be populated into the filename column?

    Ex:

    firstname|lastname|age|filename|key1|key2|key3|insertdt|updatedt|language
    Rasool|shaik|25|t.csv||||2020:02:18 11:44:49|2020:02:18 11:44:49|English

    The file columns are firstname, lastname, age, language.
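
    The thread ends here, but one common approach is to compute those columns in the COPY statement itself: NOW() for the timestamp columns, CURRENT_LOAD_SOURCE() for the file name, and a FILLER to skip the language field, which has no matching table column. A sketch (not from the thread; the filler name language_src is made up, and the ::DATE casts follow the table created above):

    COPY t (Firstname, lastname, age,
            language_src FILLER VARCHAR(20),    -- hypothetical filler; discards the file's language field
            filename AS CURRENT_LOAD_SOURCE(),  -- name of the file being loaded, e.g. t.csv
            insertdt AS NOW()::DATE,            -- ::DATE matches the table definition above
            updatedt AS NOW()::DATE)
    FROM '/home/dbadmin/t.csv';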
