Avoiding the "too many ROS containers" error

Hello, my name is Katrin and I work for 24/7 media in Germany. We recently switched from InfiniDB to Vertica, and at first everything was fine. A few weeks ago, however, we started seeing an error during our load processes:

    ERROR 5065: Too many ROS containers exist for the following projections ...

The first time, a manual mergeout solved the problem, but now the error occurs more often and for more projections.

My question: we have an aggregation table called pts_agg_url_report, partitioned by month, with two projections segmented by a hash. Every hour we delete the last 4 days with a statement like

    delete from pts_agg_url_report where agg_date >= (current_date - 4)

and then re-insert 4 days of data with a statement like

    insert into pts_agg_url_report (agg_date, ...)
    select view_date, ...
    from pts_raw_data_delivery_detail
    where view_date >= (current_date - 4)
    group by view_date

Is it possible to avoid the too-many-rows error, or at least reduce how often it occurs, by doing the following?

1. Partitioning pts_agg_url_report by date (not by day of year, because we need to store 2 years of data).
2. Every hour, dropping the relevant partitions and then inserting as before.

A mergeout for the aggregation table takes a few hours, which leads to another question: can we still load data into these tables during a mergeout, or do we have to wait until the mergeout is finished?

Thank you. Kind regards, Katrin
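[In SQL, the plan in steps 1 and 2 might look like the sketch below. This is only an illustration: it assumes Vertica's ALTER TABLE ... PARTITION BY ... REORGANIZE statement and the DROP_PARTITION meta-function, whose exact syntax and partition-key formatting should be verified against the documentation for your Vertica version.]

    -- Sketch of the proposed approach (verify syntax for your version).

    -- 1. Repartition the aggregation table by day instead of by month.
    ALTER TABLE pts_agg_url_report PARTITION BY agg_date REORGANIZE;

    -- 2. Every hour: drop the partitions covering the last 4 days,
    --    one DROP_PARTITION call per day value in
    --    current_date - 4 .. current_date ...
    SELECT DROP_PARTITION('pts_agg_url_report', (CURRENT_DATE - 4)::varchar);
    SELECT DROP_PARTITION('pts_agg_url_report', (CURRENT_DATE - 3)::varchar);
    -- ... and so on for the remaining days, then re-run the hourly
    -- INSERT ... SELECT in place of the old DELETE.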

Comments

  • Hi Katrin,

    Regarding the "too many ROS containers" issue, you might want to follow this thread: https://community.vertica.com/vertica/topics/tuple_mover_falls_behind_resulting_in_constant_growth_of_ros_containers (Note that ROS containers are different from rows; this isn't a "too many rows" error. You can look up ROS containers in our documentation if you're curious to learn more.)

    Regarding partitioning, I would definitely recommend using DROP PARTITION rather than DELETE for this. Not necessarily because it will fix the ROS-containers issue (though it might), but because keeping that many deleted rows in a Vertica table will give you extraordinarily bad query performance, and you will probably not be able to recover from a crash in a timely manner. (This is true in other databases as well; it's just more dramatic in Vertica, since we've optimized heavily for workloads based on SELECT, COPY (or batch INSERT), and INSERT ... SELECT, at the cost of UPDATE and DELETE.)

    That said, you will find that partitioning by hour over two years gives you far too many partitions, leading to both slow query performance and errors. It sounds to me like you want a staging table: keep the last 4 days, partitioned by hour, in a separate table and use that as the working space for all of these modifications of the data; then, every hour, move data that is more than 4 days old into the big table that is archived for two years. If you want to be able to query the two tables together, you can put a view on top that UNIONs them.

    Adam
  • Also, yes, you can continue to use (load into and query from) tables during a mergeout, though you may see lower query performance while the cluster is busy doing the mergeout.
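[The staging-table idea above, plus a way to check whether the Tuple Mover is keeping up, might be sketched as follows. This is a sketch under assumptions, not a tested solution: every column except agg_date is a hypothetical placeholder, the _recent/_archive table names are invented, and the meta-functions (DROP_PARTITION, DO_TM_TASK) and the v_monitor.storage_containers system table should be checked against your Vertica version's documentation. Since agg_date is a date, the working table is partitioned per day here; with only ~4 days live, the partition count stays tiny either way.]

    -- Working table: holds only the volatile last ~4 days.
    CREATE TABLE pts_agg_url_report_recent (
        agg_date date NOT NULL,
        url      varchar(2048),   -- placeholder column
        views    int              -- placeholder column
    ) PARTITION BY agg_date;

    -- Archive table: two years of history, partitioned by month, rarely touched.
    CREATE TABLE pts_agg_url_report_archive (
        agg_date date NOT NULL,
        url      varchar(2048),
        views    int
    ) PARTITION BY date_trunc('month', agg_date)::date;

    -- View so that queries can see both tables as one.
    CREATE VIEW pts_agg_url_report_all AS
        SELECT agg_date, url, views FROM pts_agg_url_report_recent
        UNION ALL
        SELECT agg_date, url, views FROM pts_agg_url_report_archive;

    -- Hourly age-out: copy the oldest day into the archive, then drop
    -- that day's partition from the working table.
    INSERT INTO pts_agg_url_report_archive
        SELECT agg_date, url, views
        FROM pts_agg_url_report_recent
        WHERE agg_date = CURRENT_DATE - 4;
    SELECT DROP_PARTITION('pts_agg_url_report_recent',
                          (CURRENT_DATE - 4)::varchar);

    -- Monitoring: ROS container count per projection.
    SELECT node_name, projection_name, COUNT(*) AS ros_count
    FROM v_monitor.storage_containers
    GROUP BY node_name, projection_name
    ORDER BY ros_count DESC;

    -- If the Tuple Mover falls behind, trigger a mergeout manually:
    SELECT DO_TM_TASK('mergeout', 'pts_agg_url_report_recent');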
