When I insert my data to vertica

I have one question 

 

I want to insert my data that is too long length

 

but I don't know it's real length

 

so, 

 

a lot of data use just varchart(50)  so

 

i made the space varchart(60)

 

but some word need more space. vertica return error but don't tell me the space...

 

how i catch my data's max length...

Comments

  • Hi choisika,

     

    Using COPY will get you much better performance than doing bulk INSERTs. So I recommend you use COPY instead of INSERT. Using COPY will also allow you to get information about the rejected data.

     

    For example, this COPY statement will write out the reason data is rejected (known as Exceptions) as well as the rejected data to two separate files:

     

    COPY my_test_table (test_col varchar(50)) FROM LOCAL '/path/to/your_data.csv' DELIMITER ',' EXCEPTIONS '/home/username/except.out' REJECTED DATA '/home/username/reject.out' ENCLOSED BY '"';

    This example loads a CSV file, but it can be modified for any other supported Vertica data source.

     

    You could also load your data into Vertica with a column that is large enough to accept all of your data, and then use the MAX() function to figure out the maximum string length. Once you have the maximum string length you can create a new table or column and add the data, dropping the old column.

     

    Let me know if this helps solve your problem.

     

     

    -C

     

     

  • Ok , first you need to cleanup your data.

     

    Do you need this empty space to be inserted into the database ? 

     

    No i don`t need to is just the file comes like this  ! 

    Ok fix this :

    - remove all spaces(multiple space) and add only one.

     

    sed 's/ \+/ /g' filename > newfilename  ##replaces each instance of one or more spaces with a single space

    See the lenght of your columns before inserting into the database:

    [aodba]# cat bla
    12345,abcdsdsdsdsd sdsd sdsd esdfsfsefrs sdfsfs sdfsf dfs ,sdsfds123 3123 1 13124
    asass, 34234 sdad ,34234 sdad ads adsad
    wadads,asda,d
    sas asdad a,asd 5453 345 ,dfsf

    [aodba]# cat bla | awk -F',' '{print length($1)" , "length($2)" , "length($3)}'
    5 , 52 , 22
    5 , 12 , 20
    6 , 4 , 1
    11 , 14 , 5

    - you can get only one columns and sort it by max val first :

    [aodba]# cat bla | awk -F',' '{print length($1)}'|sort
    11
    5
    5
    6

    - you can do this for all columns , this script can be incremented to fit all columns and order them, but you will have to do this :) .

     

     

    For more info can you put in your copy command , so some of the experts will be able to help you :)

     

    thx - hope this was helpful

  •  

    Sorry, my real problem is another

     

    "Too many ROS containers exist for the following projections" its my real problem

     

    thank you for your kindness "Casey_s" and "Adrian_Oprea_1" 

     

    Have a nice day!!! 

  •  Are you using serial inserts ? 

     

    This will trigger to many ROS containers to come up . 

    Try to use the COPY command to insert data into your table.

     

    Also try to run merge on the table with the ROS issue:

    select do_tm_task('MergeOut','<PROJECTION SCHEMA>.<PROJECTION NAME>');

     Find the tables with most ROS containers :

    SELECT projection_schema || '.' || projection_name, ros_count FROM projection_storage
    ORDER BY ros_count DESC
    LIMIT 10;

     

  • Hi Adrian_Oprea_1

     

    Thank you!!

     

    you are very very kind!!!!

     

     I have one more question,,,

     

    I carried my data to vertica use pentaho,,

     

    then.. If I use mergeout code, where i write?

     

     

  •  

    In pentaho there is an option See this link for more in detail http://wiki.pentaho.com/display/EAI/Vertica+Bulk+Loader

     

    Also see -- 

    Best Practices for Loading and Updating Data

    To avoid creating too many or too few ROS containers, follow these recommendations:

    • Load and delete data less frequently in larger batches.

    • If you are working with many small files of raw data, use a single COPY statement to load these files at one time. Use the wildcard support to COPY from all files, or concatenate all files to a named pipe and have COPY load from the pipe.

    • Use a single DELETE/UPDATE statement to delete or update batches of rows. When possible, delete or update batches of rows at a time to avoid creating too many delete vector ROS containers. There are many possible reasons as to why a load is not performing to expectations. Both Pentaho and HP Vertica provide a number of documents and best practices that can help you troubleshoot data load issues.

     

    Also see Vertica Docs on TM management:

    https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/AdministratorsGuide/TupleMover/TuningTheTupleMover.htm

     

    -- this will help you understand better how yor TM is doing and what needs to be done. 

     

     

     

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file