Vertica COPY command bug?
I'm getting the error below while trying to run a COPY command to load a JSON file. I also tried creating a resource pool with higher MAXQUERYMEMORYSIZE, QUERY_BUDGET_KB, and MEMORYSIZE values, but I still get the same error.
Below is the command:
copy_user=> copy test_load from local '/home/dbadmin/test.json' PARSER fjsonparser() rejected data as table test_load_rejected;
ERROR 2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted) HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file
 pool_name | memory_size_kb | max_memory_size_kb | planned_concurrency | max_concurrency | query_budget_kb
-----------+----------------+--------------------+---------------------+-----------------+-----------------
 copy_pool |              0 |          173225413 |                   1 |                 |       164564128
The interesting thing is that even if I change the size of the JSON file, the number of bytes requested in the error remains the same:
(536870912 requested, 268435456 permitted)
This looks like a bug to me, since I even ran the query in a pool with a much higher query budget. The JSON file is only ~300 MB.
Any help would be appreciated.
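For reference, such a pool might be created and assigned roughly like this (a sketch only; the size values shown are illustrative, not the exact ones used):

CREATE RESOURCE POOL copy_pool MAXMEMORYSIZE '165G' MAXQUERYMEMORYSIZE '165G' PLANNEDCONCURRENCY 1;
ALTER USER copy_user RESOURCE POOL copy_pool;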
Best Answer
Bryan_H Vertica Employee Administrator
I am able to reproduce the issue:
dbadmin=> create flex table forumtest();
CREATE TABLE
dbadmin=> copy forumtest from local 'test_file.json' parser fjsonparser();
ERROR 2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted)
HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file

However, the issue is that the script creates one very large JSON array. The test script works when I remove the square brackets and the commas at the end of each line:
#!/bin/bash
for ((i=0; i<10000000; i++))
do
  echo "{\"name\" : \"john\", \"age\" : 24}" >> test_file_1.json
done
echo "{\"name\" : \"john\", \"age\" : 24}" >> test_file_1.json

dbadmin=> copy forumtest from local 'test_file_1.json' parser fjsonparser();
 Rows Loaded
-------------
    10000001
(1 row)
Answers
What exact version of Vertica is this?
Are there any other messages in vertica.log or the UDx logs on the node that receives the request?
How big is each JSON object in the file, or is it one big JSON object?
@rajatpaliwal86 We had this issue once in the past and it was fixed in 9.1.1. If your Vertica version is 9.2 or 9.3, please open a support case; it looks like a bug.
It is from Vertica 9.2.
The file contains an array of very small JSON objects.
Where can I open the case?
Please click on "Service Requests" at the link below and open a support case by logging in to the portal:
https://softwaresupport.softwaregrp.com/
@Bryan_H
This is the log I can see during the error:
2020-02-07 06:21:46.458 Subsession:7fac911f8700 [Txn] Commit Complete: Txn: a0000000001ec1 at epoch 0x1a and new global catalog version 1458
2020-02-07 06:21:46.817 EEcmdq:7fac7872a700 [UserMessage] COPY - Tearing down ContinuousUDL coroutine stack
2020-02-07 06:21:46.907 Init Session:7faca11dd700-a0000000001e9f @v_athenadb_node0001: 00000/5077: Total memory required by query: [725092 KB]
2020-02-07 06:21:46.929 Init Session:7faca11dd700-a0000000001e9f @v_athenadb_node0001: 53200/2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted)
HINT: This can happen if you have very wide rows, or if you have specified an incorrect record terminator. (Check your escaping -- for example, "'\t'" is not a tab; you must use "E'\t'".) It can also happen if you are reading from a corrupted compressed file
LOCATION: alloc, /data/qb_workspaces/jenkins2/ReleaseBuilds/Grader/REL-9_2_0-x_grader/build/vertica/EE/UDL/LoadBufferManager.cpp:129
2020-02-07 06:22:08.016 DiskSpaceRefresher:7fac92ffd700 [Util] Task 'DiskSpaceRefresher' enabled
2020-02-07 06:22:17.009 ManageEpochs:7fac92ffd700-a0000000001ec2 [Txn] Begin Txn: a0000000001ec2 'Manage Epoch'
Does your JSON file contain multiple JSON records?
Yes, the JSON is created by a bash script. The file contains an array of small JSON objects.
To reproduce the issue, you can use the script below to create a sample JSON file:
echo "[" > test_file.json
for ((i=0; i<10000000; i++))
do
  echo "{\"name\" : \"john\", \"age\" : 24}," >> test_file.json
done
echo "{\"name\" : \"john\", \"age\" : 24}" >> test_file.json
echo "]" >> test_file.json
So it does look like the file contains only one JSON record (which happens to be an array)? If so, the error is probably expected...
If you want to process each array element as one ROW, you should look at using the start_point parameter.
For instance, take this file:
{"mynamearray" : [ {"name": "john", "age": 24},{"name": "john", "age": 24},{"name": "john", "age": 24}]}
create flex table xfoo();
COPY xfoo from '/tmp/array.json' PARSER FJSONPARSER();
 Rows Loaded
-------------
           1
(1 row)
COPY xfoo from '/tmp/array.json' PARSER FJSONPARSER(start_point='mynamearray');
 Rows Loaded
-------------
           3
(1 row)
I used the COPY command and it works fine.
My problem is that I don't want to load data into some columns, and those columns need not be present in the file.
How do I form a COPY query to load the data quickly? Please advise.
Thanks.
Check out the FILLER parameter of the COPY command!
See:
https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/AdministratorsGuide/BulkLoadCOPY/IgnoringColumnsAndFieldsInTheLoadFile.htm
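For illustration, a minimal sketch with hypothetical names: a middlename field that exists in the file but should not be loaded is declared as a FILLER with a data type, and is simply never mapped to a table column:

COPY names (firstname, middlename FILLER VARCHAR(20), lastname) FROM LOCAL '/tmp/names.csv' DELIMITER ',';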
The file columns are: firstname, lastname, age.
The table columns are: firstname, lastname, age, filename, key_1, key_2, key_3, insertdt, updatedt.
Now I don't want to load data into the key_x columns.
How do I form a COPY query using fillers? Please help.
Oh. In this particular case, you do this:
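A minimal sketch, assuming the table is named names and the file is pipe-delimited: since the file holds fewer fields than the table has columns, no FILLER is needed here; listing just the target columns is enough, and the unlisted columns receive their defaults (or NULL):

COPY names (firstname, lastname, age) FROM LOCAL '/tmp/t.csv' DELIMITER '|';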
Another question: to track when the data is inserted, we generally use now() for insertdt and likewise for updatedt.
How can that be done with the COPY command, and how do I populate the file name (t.csv) in the filename column?
Example:
firstname|lastname|age|filename|key1|key2|key3|insertdt|updatedt|language
Rasool|shaik|25|t.csv||||2020:02:18 11:44:49|2020:02:18 11:44:49|English
The file columns are: firstname, lastname, age, language.
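One way to combine these pieces (a sketch, assuming the table is named names): the language field is parsed as a FILLER and dropped, the timestamp columns are computed with now(), and the filename column is filled with a literal; if this version supports CURRENT_LOAD_SOURCE(), that could supply the file name automatically instead:

COPY names (
    firstname,
    lastname,
    age,
    language FILLER VARCHAR(20),   -- present in the file, not loaded
    filename AS 't.csv',           -- or: filename AS CURRENT_LOAD_SOURCE()
    insertdt AS now(),
    updatedt AS now()
) FROM LOCAL '/tmp/t.csv' DELIMITER '|';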