ERROR 2245: Attempted to create too many ROS containers

Hi,

We have multiple partition tables being loaded sequentially by copy command from hdfs. However, after multiple runs we start getting the below error.

ERROR 2245:  Attempted to create too many ROS containers

We are using copy and delete command simultaneously. we can't use drop partition as we might have to rollback the loading incase it fails in between.


Using manual mergeout or SET_CONFIG_PARAMETER is not helping out. We can't use purge as well as we might need to roll back the load incase it fails in between.

This page states that: https://community.vertica.com/vertica/topics/tuple_mover_falls_behind_resulting_in_constant_growth_o...

The issue has been acknowlegded as a bug by Vertica and they had hoped to release a fix in the next version.

For now we have dropped partition for all our table(which is not suggested) in order to avoid 
"ERROR 2245:  Attempted to create too many ROS containers".


Can you please suggest a any alternative plan for this kind of scenario?

Regards,
-Amit

Comments

  • Hi Amit,

    Could you clarify why you can't use PURGE?  The typical approach in this type of scenario is to perform your inserts and deletes, then commit (or roll back), then purge.  This way, you can roll back your transaction if you need to; purge just cleans up deleted rows.

    Also -- are you using COPY .. DIRECT?  Does each individual load fit completely within the WOS?  (Could you easily tune either the WOS size or your ETL batch size s.t. each individual load does fit within the WOS?)  If possible, you will avoid this issue, and probably see much better performance overall, if you are able to perform all of your inserts and deletes in WOS; then commit and do a moveout (not mergeout) to simultaneously clean up after all of the deletes and compress the results of your ETL into a single on-disk ROS container without any delete-vector files.  (Many, though not all, mergeout problems are caused by moveout problems; overflowing and/or not using the WOS.)

    Adam
  • Hi Adam,

    Even though I am not using Delete or Purge..if I use only Copy(without direct) multiple times still I get this error. 

    Regards,
    -Amit
  • Hi Amit,

    I also asked "Does each individual load fit completely within the WOS?"  Any thoughts there?  (You can see WOS usage by looking at the "wosdata" pool in the "resource_pool_status" table, I believe.)

    Another question -- out of curiosity, how many partitions do your tables have?  And how many of them receive new data for each new COPY statement?  For example, if you're partitioned by date and loading chronologically, you'll only be loading into the most recent partitions; but if you're partitioned by (for example) username and loading chronologically, any given day will probably have data for each of many usernames, so, each of many partitions.

    Thanks,
    Adam
  • Hi Adam,

    The WOS threshold size is around 2 GB, my each copy command definitely loads much much less size  than that.  But I run multiple copy commands immediately one after another or sequentially in different session, then I encounter this error.

    Yes, My data always has new partitions as its partitioned by Date.

    Sometimes, it is partitioned by multiple columns. Varies from table to table.
    Let me know if you need any further details.

    Regards,
    -Amit
  • Also I tried doing manual merge out multiple times: 

    SELECT DO_TM_TASK('mergeout','database.schema.table');

    It does not have any impact on the total number of ROS count, when I check the ros count in projection_storage table  , it still looks the same.



  • Hi Amit,

    Thanks for the update.  I've asked a few questions which you haven't answered yet.  I've also suggested a few lines of investigation which you haven't looked at closely yet.  If you have any more information, let me know.

    If you're having trouble figuring out what's going on here, I would encourage you to get in touch with our Support or Professional Services teams.  Usually this error indicates that you are either loading in an unusual way or you have projections that are not configured according to our recommendations, such that you have ended up with far too many small ROS containers.  This can happen in various different ways -- partitions that are too granular, small COPYs into a WOS that's already full, etc.  Our Professional Services folks are very good at helping with this sort of tuning.

    You could also try installing and looking at the Vertica Management Console.  It will warn you about various potential configuration issues.  (Not necessarily all possible issues, but it might help.)

    Adam
  • Hi Adam,

    I went through your replies above: Below are the questions you have asked:


    You asked me  "Does each individual load fit completely within the WOS?" 

    My Answer to that was(above):

    The WOS threshold size is around 2 GB, my each copy command definitely loads much much less size  than that.  But I run multiple copy commands immediately one after another or sequentially in different session, then I encounter this error.

    You asked me  "Could you clarify why you can't use PURGE? 

    My answer to that was:

    I am getting the same error, even though I use only copy commands. I can try using Purge though once this issue with Copy is fixed.

    You asked me :
    Also -- are you using COPY .. DIRECT?

    My Answer:
    No, I am using Copy with webhdfs.

    You asked me:

    Another question -- out of curiosity, how many partitions do your tables have?  And how many of them receive new data for each new COPY statement?  

    To which I had answered above:

    Yes, My data always has new partitions  every time its loaded as its partitioned by Date.


    Please let me know if you need any further information.

    I tried doing manual merge out multiple times: 

    SELECT DO_TM_TASK('mergeout','database.schema.table');

    It does not have any impact on the total number of ROS count, when I check the ros count in projection_storage table  , it still looks the same.

    I also tried doing Merge_Partition, this works for me as I can see the ros count decreasing.

    Could you please suggest me a appropriate way. Whether using Merge_Partition  will have any other impact on our database? 

    Please let me know if you need any further information.

    Regards,-Amit
  • increase the ContainersPerProjectionLimit worked for me.

    check with below command.

    SELECT * FROM CONFIGURATION_PARAMETERS WHERE parameter_name = 'ContainersPerProjectionLimit' ;

    change/increase by below command.

    SELECT SET_CONFIG_PARAMETER('ContainersPerProjectionLimit', 2048);

  • Applies to Vertica 7.2.x and earlier> @Jim_Knicely said:

    Second link says: Applies to Vertica 7.2.x and earlier. Is that a typo or is there an updated version of article?

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file