How to guard against EC2 failures / EBS corruption
For folks running on AWS / EC2: from time to time a node fails because of a transient communication error with EBS. This can also leave some data files corrupted, in which case the Vertica process asks for a forced restart with data file cleanup.
(1) Is it advisable to change the Vertica daemon script that starts Vertica on host reboot so that it always passes the force flag ("--force"), making a forced restart the default? Can a --force node restart be harmful?
(2) Is it possible to detect that a node is down and trigger a host restart (plus a Vertica process restart with forced data file cleanup, as in point (1))?
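For the detection half of (2), this is a rough sketch of what I have in mind, not a tested tool. It assumes the script runs on a node that is still UP, that vsql and admintools sit in the default /opt/vertica/bin location, that credentials are already configured for non-interactive vsql, and that the NODES system table reports the failed node with node_state = 'DOWN'. The helper names are my own (hypothetical), and whether restart_node accepts --force in your version is an assumption to check against the admintools docs.

```python
import subprocess
from typing import List, Tuple

# Query the Vertica NODES system table; node_state is e.g. 'UP' or 'DOWN'.
NODE_QUERY = "SELECT node_name, node_address, node_state FROM nodes;"


def fetch_node_states(user: str) -> str:
    """Run vsql in unaligned, tuples-only mode (-At) on a surviving node.

    Not called at import time; assumes vsql can log in without a prompt.
    """
    result = subprocess.run(
        ["/opt/vertica/bin/vsql", "-U", user, "-At", "-c", NODE_QUERY],
        capture_output=True, text=True, check=True)
    return result.stdout


def parse_node_states(vsql_output: str) -> List[Tuple[str, str, str]]:
    """Parse 'name|address|state' lines produced by vsql -At."""
    rows = []
    for line in vsql_output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, address, state = line.split("|")
        rows.append((name, address, state))
    return rows


def down_node_addresses(rows: List[Tuple[str, str, str]]) -> List[str]:
    """Return addresses of nodes reported DOWN (skip RECOVERING etc.)."""
    return [address for _, address, state in rows if state == "DOWN"]


def restart_command(db_name: str, hosts: List[str],
                    force: bool = False) -> List[str]:
    """Build (but do not run) a hypothetical admintools restart_node call."""
    cmd = ["/opt/vertica/bin/admintools", "-t", "restart_node",
           "-d", db_name, "-s", ",".join(hosts)]
    if force:
        # Assumption: restart_node accepts --force in your version.
        # Whether it is safe to use routinely is exactly question (1).
        cmd.append("--force")
    return cmd
```

Force is kept opt-in on purpose, since question (1) is still open. With output like "v_db_node0001|10.0.0.1|UP" and "v_db_node0002|10.0.0.2|DOWN", down_node_addresses returns ["10.0.0.2"], and that list is what would be passed to restart_command.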
Any recommendations on how to make the system more resilient on AWS?