
I have a question about inserting my data into Vertica.

I want to insert data whose values can be very long, but I don't know their real length.

Most of the data fits in varchar(50), so I made the column varchar(60), but some values need more space. Vertica returns an error but doesn't tell me how much space is needed...

How can I find my data's maximum length?

Comments

  • Hi choisika,

     

    Using COPY will give you much better performance than doing bulk INSERTs, so I recommend you use COPY instead of INSERT. COPY also lets you get information about the rejected data.

     

    For example, this COPY statement will write out the reason data is rejected (known as Exceptions) as well as the rejected data to two separate files:

     

    COPY my_test_table (test_col)
    FROM LOCAL '/path/to/your_data.csv' DELIMITER ',' ENCLOSED BY '"'
    EXCEPTIONS '/home/username/except.out'
    REJECTED DATA '/home/username/reject.out';

    This example loads a CSV file, but it can be modified for any other supported Vertica data source.

     

    You could also load your data into a column that is large enough to accept all of your values, and then use MAX(LENGTH()) to find the maximum string length. Once you know the maximum, you can create a properly sized table or column, move the data over, and drop the old column.
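    Here is a minimal sketch of that approach; the staging table, column, and target names (my_staging, test_col, my_test_table) are placeholders:

    -- Load everything into a deliberately oversized staging column first.
    CREATE TABLE my_staging (test_col VARCHAR(65000));
    COPY my_staging FROM LOCAL '/path/to/your_data.csv' DELIMITER ',';

    -- Find the longest value actually present.
    SELECT MAX(LENGTH(test_col)) FROM my_staging;

    -- Size the real column to fit; an existing VARCHAR can also be widened in place:
    ALTER TABLE my_test_table ALTER COLUMN test_col SET DATA TYPE VARCHAR(120);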

     

    Let me know if this helps solve your problem.

     

     

    -C

     

     

  • OK, first you need to clean up your data.

     

    Do you need this empty space to be inserted into the database?

    No, I don't need it; the file just comes like this!

    OK, fix it like this:

    - Replace every run of multiple spaces with a single one:

     

    sed 's/ \+/ /g' filename > newfilename  # replaces each run of one or more spaces with a single space

    Check the length of your columns before inserting into the database:

    [aodba]# cat bla
    12345,abcdsdsdsdsd sdsd sdsd esdfsfsefrs sdfsfs sdfsf dfs ,sdsfds123 3123 1 13124
    asass, 34234 sdad ,34234 sdad ads adsad
    wadads,asda,d
    sas asdad a,asd 5453 345 ,dfsf

    [aodba]# cat bla | awk -F',' '{print length($1)" , "length($2)" , "length($3)}'
    5 , 52 , 22
    5 , 12 , 20
    6 , 4 , 1
    11 , 14 , 5

    - You can also take a single column and sort its lengths with the largest value first:

    [aodba]# cat bla | awk -F',' '{print length($1)}' | sort -nr
    11
    6
    5
    5

    - You can do this for every column; the script can be extended to cover all columns and order them, but you will have to do that yourself :). The same check can also be done inside Vertica, as in the sketch below.
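    A quick SQL sketch, assuming the data was first loaded into an oversized staging table (the table and column names are placeholders):

    SELECT MAX(LENGTH(col1)) AS max_col1,
           MAX(LENGTH(col2)) AS max_col2,
           MAX(LENGTH(col3)) AS max_col3
    FROM my_staging;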

     

     

    For more info, post your COPY command so some of the experts here can help you :)

     

    Thanks - hope this was helpful.

  • Sorry, my real problem is actually something else:

     

    "Too many ROS containers exist for the following projections" its my real problem

     

    Thank you for your kindness, Casey_s and Adrian_Oprea_1.

     

    Have a nice day!!! 

  • Are you using serial inserts?

     

    Serial inserts will cause too many ROS containers to come up.

    Try to use the COPY command to insert data into your table.

     

    Also try running a mergeout on the table with the ROS issue:

    select do_tm_task('MergeOut','<PROJECTION SCHEMA>.<PROJECTION NAME>');

    Find the projections with the most ROS containers:

    SELECT projection_schema || '.' || projection_name AS projection, SUM(ros_count) AS ros_count
    FROM projection_storage
    GROUP BY 1
    ORDER BY ros_count DESC
    LIMIT 10;
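    For example, if the top entry in that list is public.my_test_table_super (a made-up projection name), the mergeout call would look like:

    SELECT do_tm_task('MergeOut', 'public.my_test_table_super');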

     

  • Hi Adrian_Oprea_1,

     

    Thank you!! You are very kind!!

    I have one more question...

     

    I load my data into Vertica using Pentaho.

    So if I want to use the mergeout command, where do I write it?

     

     

  • In Pentaho there is an option for this. See this link for more detail: http://wiki.pentaho.com/display/EAI/Vertica+Bulk+Loader
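    The mergeout call itself is plain SQL, so you can also run it from vsql or any SQL client connected to Vertica, using the placeholder form from earlier in the thread:

    SELECT do_tm_task('MergeOut', '<PROJECTION SCHEMA>.<PROJECTION NAME>');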

     

    Also see -- 

    Best Practices for Loading and Updating Data

    To avoid creating too many or too few ROS containers, follow these recommendations:

    • Load and delete data less frequently in larger batches.

    • If you are working with many small files of raw data, use a single COPY statement to load them all at once. Use wildcard support to COPY from all the files (see the sketch after this list), or concatenate them to a named pipe and have COPY load from the pipe.

    • Use a single DELETE/UPDATE statement to delete or update batches of rows. When possible, work in batches to avoid creating too many delete-vector ROS containers.

    There are many possible reasons why a load might not perform to expectations; both Pentaho and HP Vertica provide documents and best practices that can help you troubleshoot data-load issues.
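    As a sketch of the wildcard approach (the path and table name here are placeholders):

    -- Load every matching file on the cluster in one statement.
    COPY my_test_table FROM '/data/raw/*.csv' ON ANY NODE DELIMITER ',';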

     

    Also see Vertica Docs on TM management:

    https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/AdministratorsGuide/TupleMover/TuningTheTupleMover.htm

     

    This will help you better understand how your TM is doing and what needs to be done.

     

     

     
