Vertica COPY command bug?
I'm getting the error below while trying to run a COPY command to load a JSON file. I also tried creating a resource pool with higher MAXQUERYMEMORYSIZE, QUERY_BUDGET_KB, and MEMORYSIZE values, but I still get the same error.
Below is the command:
copy_user=> copy test_load from local '/home/dbadmin/test.json' PARSER fjsonparser() rejected data as table test_load_rejected;
ERROR 2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted) HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file
 pool_name | memory_size_kb | max_memory_size_kb | planned_concurrency | max_concurrency | query_budget_kb
-----------+----------------+--------------------+---------------------+-----------------+-----------------
 copy_pool |              0 |          173225413 |                   1 |                 |       164564128
The interesting thing is that even if I change the size of the JSON file, the number of bytes requested in the error remains the same:
(536870912 requested, 268435456 permitted)
This looks like a bug to me, since I even ran the query in a pool with a much higher query budget. The JSON file is only ~300 MB.
Any help would be appreciated.
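For reference, such a pool might be created and assigned roughly like this (a sketch only; the size values shown are illustrative, not the exact ones used):

CREATE RESOURCE POOL copy_pool MAXMEMORYSIZE '165G' MAXQUERYMEMORYSIZE '165G' PLANNEDCONCURRENCY 1;
ALTER USER copy_user RESOURCE POOL copy_pool;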
Best Answer
Bryan_H Vertica Employee Administrator
I am able to reproduce the issue:
dbadmin=> create flex table forumtest();
CREATE TABLE
dbadmin=> copy forumtest from local 'test_file.json' parser fjsonparser();
ERROR 2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted)
HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file

However, the issue is that the script creates one very large JSON array. The test script works when I remove the square brackets and the commas at the end of each line:
#!/bin/bash
for ((i=0; i<10000000; i++))
do
  echo "{\"name\" : \"john\", \"age\" : 24}" >> test_file_1.json
done
echo "{\"name\" : \"john\", \"age\" : 24}" >> test_file_1.json

dbadmin=> copy forumtest from local 'test_file_1.json' parser fjsonparser();
 Rows Loaded
-------------
    10000001
(1 row)
Answers
What exact version of Vertica is this?
Are there any other messages in vertica.log or the UDx logs on the node that receives the request?
How big is each JSON object in the file, or is it one big JSON object?
@rajatpaliwal86 We had this issue once in the past and it was fixed in 9.1.1. If your Vertica version is 9.2 or 9.3, please open a support case; it looks like a bug.
It is from Vertica 9.2.
The file contains an array of very small JSON objects.
Where can I open the case?
Please click on "Service Requests" at the link below and open a support case by logging in to the portal:
https://softwaresupport.softwaregrp.com/
@Bryan_H
This is the log I can see during the error:
2020-02-07 06:21:46.458 Subsession:7fac911f8700 [Txn] Commit Complete: Txn: a0000000001ec1 at epoch 0x1a and new global catalog version 1458
2020-02-07 06:21:46.817 EEcmdq:7fac7872a700 [UserMessage] COPY - Tearing down ContinuousUDL coroutine stack
2020-02-07 06:21:46.907 Init Session:7faca11dd700-a0000000001e9f @v_athenadb_node0001: 00000/5077: Total memory required by query: [725092 KB]
2020-02-07 06:21:46.929 Init Session:7faca11dd700-a0000000001e9f @v_athenadb_node0001: 53200/2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted)
HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file
LOCATION: alloc, /data/qb_workspaces/jenkins2/ReleaseBuilds/Grader/REL-9_2_0-x_grader/build/vertica/EE/UDL/LoadBufferManager.cpp:129
2020-02-07 06:22:08.016 DiskSpaceRefresher:7fac92ffd700 [Util] Task 'DiskSpaceRefresher' enabled
2020-02-07 06:22:17.009 ManageEpochs:7fac92ffd700-a0000000001ec2 [Txn] Begin Txn: a0000000001ec2 'Manage Epoch'
Does your JSON file contain multiple JSON records?
Yes, the JSON is created by a bash script. The file contains an array of small JSON objects.
To reproduce the issue, you can use the script below to create a sample JSON file:
echo "[" > test_file.json
for ((i=0; i<10000000; i++))
do
  echo "{\"name\" : \"john\", \"age\" : 24}," >> test_file.json
done
echo "{\"name\" : \"john\", \"age\" : 24}" >> test_file.json
echo "]" >> test_file.json
So it does look like the file contains only one JSON record (which happens to be an array)? If so, the error is probably expected...
If you want to process each array element as one ROW, you should look at using the start_point parameter.
For instance, take this file:
{"mynamearray" : [ {"name": "john", "age": 24},{"name": "john", "age": 24},{"name": "john", "age": 24}]}
create flex table xfoo();
COPY xfoo from '/tmp/array.json' PARSER FJSONPARSER();
 Rows Loaded
-------------
           1
(1 row)
COPY xfoo from '/tmp/array.json' PARSER FJSONPARSER(start_point='mynamearray');
 Rows Loaded
-------------
           3
(1 row)
I used the COPY command and it works fine.
My problem is that I don't want to load data into some columns, and those columns need not be present in the file.
How do I form a COPY query to load the data quickly? Please advise.
Thanks.
Check out the FILLER parameter of the COPY command!
See:
https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/AdministratorsGuide/BulkLoadCOPY/IgnoringColumnsAndFieldsInTheLoadFile.htm
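For illustration, a minimal sketch with hypothetical names: a middlename field that exists in the file but should not be loaded is declared as a FILLER with a data type, and is simply never mapped to a table column:

COPY names (firstname, middlename FILLER VARCHAR(20), lastname) FROM LOCAL '/tmp/names.csv' DELIMITER ',';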
The file columns are: firstname, lastname, age.
The table columns are: firstname, lastname, age, filename, key_1, key_2, key_3, insertdt, updatedt.
Now I don't want to load data into the key_x columns.
How do I form a COPY query using fillers? Please help.
Oh. In this particular case, you do this:
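A minimal sketch, assuming the table is named names and the file is pipe-delimited: since the file holds fewer fields than the table has columns, no FILLER is needed here; listing just the target columns is enough, and the unlisted columns receive their defaults (or NULL):

COPY names (firstname, lastname, age) FROM LOCAL '/tmp/t.csv' DELIMITER '|';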
Another question: to track when the data is inserted, we generally use now() for insertdt and likewise for updatedt.
How can that be done with the COPY command, and how do I populate the file name (t.csv) in the filename column?
Example:
firstname|lastname|age|filename|key1|key2|key3|insertdt|updatedt|language
Rasool|shaik|25|t.csv||||2020:02:18 11:44:49|2020:02:18 11:44:49|English
The file columns are: firstname, lastname, age, language.
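One way to combine these pieces (a sketch, assuming the table is named names): the language field is parsed as a FILLER and dropped, the timestamp columns are computed with now(), and the filename column is filled with a literal; if this version supports CURRENT_LOAD_SOURCE(), that could supply the file name automatically instead:

COPY names (
    firstname,
    lastname,
    age,
    language FILLER VARCHAR(20),   -- present in the file, not loaded
    filename AS 't.csv',           -- or: filename AS CURRENT_LOAD_SOURCE()
    insertdt AS now(),
    updatedt AS now()
) FROM LOCAL '/tmp/t.csv' DELIMITER '|';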