Resource Management best practices for the “WOS Full; failover to DIRECT” message

Hi, what is the best-practice configuration for Resource Management when the vertica.log file shows many “WOS Full; failover to DIRECT” messages, assuming a trickle load whose frequency cannot be adjusted?
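One way to confirm that the wosdata pool is actually the bottleneck is to watch its usage directly. A minimal sketch, querying Vertica's resource_pool_status system table (column names may vary slightly by version):

```sql
-- Inspect current usage of the built-in WOS resource pool (wosdata).
-- If memory_inuse_kb keeps hitting memory_size_kb during loads,
-- the "WOS Full; failover to DIRECT" messages are expected.
SELECT pool_name, memory_size_kb, memory_inuse_kb
FROM resource_pool_status
WHERE pool_name = 'wosdata';
```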

Comments

  • The switch to ROS dramatically degrades the load rate.
  • It has a clear negative impact.
  • Hi, how big is your WOS pool? Vertica recommends 2GB. If it gets full because you are doing a single load bigger than that, you should load DIRECT; that is too much data to lose in case of a power failure. If it gets full because you are doing too many loads, perhaps you can lower the MoveOutInterval so the data is moved out more often, but consider the MergeOutInterval too so you don't end up with ROS pushback. However, be careful not to make those parameters too small, or the Tuple Mover ends up processing the files too many times. (I hope that makes sense; there are many details to consider.) I recommend the section of the documentation on tuning the Tuple Mover; it could help shed some light: https://my.vertica.com/docs/6.1.x/HTML/index.htm#14361.htm Hope this helps, Eugenia
  • Thanks. To minimize ROS pushback I will change WOSDATA to 4G.
  • Hi, sorry, I think I was not clear. I do not recommend increasing WOSDATA to 4GB; it is too big, and Vertica recommends 2GB. ROS pushback is what happens when you reach the maximum number of ROS containers, which by default is 1024. One way to get ROS pushback is moving data from WOS to ROS too often: that can create too many ROS containers, because the smaller files do not merge out fast enough. ROS pushback is not the WOS spilling to disk. Hope this is clearer. If you want the WOS to empty more often, you can lower the MoveOutInterval (5 minutes by default), but also change the MergeOutInterval so the new ROS containers get merged. However, do not set those parameters too small. You can check resource_pool_status to see how much space is being used: select * from resource_pool_status; Eugenia
  • It was very clear before! When I changed the moveout and mergeout frequency, performance dropped, probably as a result of TM locks. See below the resource_pool_status output captured during the “WOS Full; failover to DIRECT” messages.
  • Hi, since changing to 4G there have been no more messages like this. The right moveout and mergeout settings depend heavily on the load type; for my load scenario I find 4G for wosdata more useful.
  • Hi, what are your moveout and mergeout intervals? select * from configuration_parameters where parameter_name ilike '%outinterval%'; Eugenia
  • MergeOutInterval = 300, MoveOutInterval = 100
  • Hi, of course if you increase it to 4GB the message will stop, but be aware that if for some reason Vertica ends abruptly or the machine powers off, you can lose up to 4GB of data. In addition, you have 4GB of memory held for the WOS and not for your queries. The change is OK, but I want you to know the implications so you can make an informed decision. Eugenia
  • Thanks, I understand that, but the loss should only occur in the case of a single-node implementation.
  • Well, if you lose power to one node in a cluster (and you have at least K=1 safety, which is the default), then you're correct that you won't lose data. But when was the last time you lost power to just one machine in a rack?
  • Sure. Anyway, you do not provide a solution for the data loss, only advice on how to minimize it; it looks like something that should be taken into consideration. Thanks a lot.
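Pulling the thread together, the knobs discussed above can be inspected and changed roughly as follows. This is a sketch only; the parameter names and trade-offs are those mentioned in the thread, but verify the syntax against your Vertica version's documentation before running anything:

```sql
-- Current Tuple Mover intervals (in seconds).
SELECT parameter_name, current_value
FROM configuration_parameters
WHERE parameter_name ILIKE '%outinterval%';

-- Move data out of the WOS more often, and merge the resulting
-- ROS containers aggressively enough to avoid ROS pushback.
SELECT SET_CONFIG_PARAMETER('MoveOutInterval', 100);
SELECT SET_CONFIG_PARAMETER('MergeOutInterval', 300);

-- Alternatively, grow the wosdata pool to 4G, accepting up to 4G
-- of data at risk on an abrupt shutdown (the trade-off noted above).
ALTER RESOURCE POOL wosdata MAXMEMORYSIZE '4G';
```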
