If the DB crashes suddenly, are records that are in the WOS but not yet flushed to the ROS lost?
Suppose I insert a record into a Vertica database with a SQL statement such as “insert into xxx values(….)” and then commit the transaction. The record will be located in the WOS, right?
Now, if the database host server crashes suddenly before the record is flushed to the ROS, will this record be lost? I can't find any redo log that guarantees the committed data.
Comments
For a single-node database, you are correct. This is why Vertica issues regular warnings if you try to run a single-node database in production. Single-node Vertica databases are for development purposes only.
For a multi-node database, you are still partially correct: Vertica does not trust disk to maintain consistency. Physical disk (even with RAID) can be frustratingly fallible. Many lower-end drives and RAID cards cache and re-order writes even when instructed not to, which can corrupt redo logs in the event of a hard crash. Even with good hardware, drives do periodically fail. Most big Vertica clusters do have RAID arrays, and you would expect that they could recover from losing a drive. But classic hard drives are fundamentally electromechanical devices, they do have occasional read errors, and a RAID rebuild requires correctly reading a huge number of bytes. Look up some numbers here; you might be surprised.
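To give a rough feel for those numbers, here is a back-of-the-envelope sketch in Python. It assumes an unrecoverable read error rate of 1 per 10^14 bits, a figure often quoted on consumer drive datasheets, and it assumes errors are independent; neither assumption comes from Vertica, and real drives vary. The function name `p_read_error` is made up for this illustration.

```python
import math

def p_read_error(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while reading bytes_read bytes.

    ure_per_bit is an ASSUMED per-bit error rate (1e-14 is a commonly quoted
    datasheet figure), and errors are assumed to be independent.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, written with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 1e12  # decimal terabyte, as drive vendors count it
for size_tb in (1, 4, 12):
    print(f"{size_tb:>2} TB read during rebuild: {p_read_error(size_tb * TB):.1%}")
```

Under these assumptions the chance of hitting a read error grows quickly with the amount of data a rebuild has to read, which is the point being made above.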
So, what do we do? On clusters of at least 3 nodes (the minimum recommended size for production deployments), when INSERT completes, we guarantee that the data is present on *at least 2 nodes*. If one node crashes, Vertica just fails over to that node's buddy and keeps going. When that node comes back, it recovers data from its buddy. "RAIN" (redundant array of independent nodes), rather than RAID.
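The buddy idea can be illustrated with a toy model. This is only a sketch: the `place` and `surviving_rows` helpers and the ring-neighbour placement rule are hypothetical and do not describe how Vertica actually assigns data to nodes. It only tracks which rows still have a live copy.

```python
def place(row_id: int, n_nodes: int) -> set[int]:
    """Hypothetical placement: each row lives on one node and its ring neighbour (the "buddy")."""
    primary = row_id % n_nodes
    return {primary, (primary + 1) % n_nodes}

def surviving_rows(placement: dict[int, set[int]], down: set[int]) -> set[int]:
    """Rows that still have at least one copy on a node that is not in `down`."""
    return {row for row, nodes in placement.items() if not nodes <= down}

# Six rows spread over a 3-node cluster, two copies each.
placement = {row: place(row, 3) for row in range(6)}

print(sorted(surviving_rows(placement, down={0})))     # one node lost
print(sorted(surviving_rows(placement, down={0, 1})))  # a node and its buddy lost
```

With one node down every row still has a copy on its buddy. Once a node and its buddy are both down, the rows held only by that pair have no live copy left.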
If too many of your nodes fail at once, same as if you lose multiple physical drives in a RAID, then yes, WOS data will be lost. Vertica does have some redo logging on its catalog; as a result, if you are able to recover data from enough of the disabled nodes, while some commits may have become corrupt and been lost, Vertica can at least roll back to a consistent state. This enables users who use Vertica in its traditional usage model as an analytics system, feeding it information from other automated systems, to backfill recent data from those systems.
Adam
Hi Adam,
So these inserts into WOS are replicated across 2 nodes, is that correct? That provides redundancy if one of those nodes fails. So presumably, in the event of the entire system being brought down (all nodes), it's important to ensure nothing is stored in WOS at that time, or all that data would be lost?
Thanks,
Anthony