Recovering a large fact table with a corrupted data file
Querying a large fact table (~1 PB), partitioned by month, on a 12-node AWS cluster returned the following error:
SELECT s3_bucket_month, s3_bucket_date, COUNT(*) AS event_cnt FROM mlog.f_event_log_b0 WHERE s3_bucket_month='2018-10-01' GROUP BY 1, 2 ORDER BY 1 DESC, 2 DESC;
[Error] Script lines: 37-53 ------------------------
[Vertica][VJDBC](4519) ERROR: Read failed in FileColumnReader: /vertica/data/a4db/v_a4db_node0008_data/277/025058844f8c6b7301ff124850f95d710110000000301bcd_0.gt Input/output error
The table is partitioned by s3_bucket_month. I managed to identify that the _b0 buddy projection has the corrupt file; the _b1 buddy works fine. We are running Vertica v9.2.1-6.
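For reference, one way to confirm which buddy is damaged is to list the table's projections and then run the same aggregate against each buddy by name. This is only a sketch: the anchor table name `f_event_log` is an assumption inferred from the `_b0`/`_b1` projection names, not something confirmed in the post.

```sql
-- List the projections anchored on the fact table.
-- ASSUMPTION: the anchor table is mlog.f_event_log (inferred from the _b0/_b1 names).
SELECT projection_name, is_super_projection, is_up_to_date
FROM v_catalog.projections
WHERE projection_schema = 'mlog'
  AND anchor_table_name = 'f_event_log';

-- Re-run the failing aggregate against the other buddy only.
-- Naming a projection in FROM makes Vertica read that projection.
SELECT s3_bucket_month, s3_bucket_date, COUNT(*) AS event_cnt
FROM mlog.f_event_log_b1
WHERE s3_bucket_month = '2018-10-01'
GROUP BY 1, 2
ORDER BY 1 DESC, 2 DESC;
```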
1) What is the best practice for dealing with this issue? If I delete this file and recover the node, the recovery might take a long time. Is there a way to estimate how long it would take?
2) Would it be faster to create another table with the same structure, insert the data from the good projection into it, and swap the partitions?
3) The node is a d2.8xlarge instance on AWS. Would you recommend replacing the entire node?
4) How likely are errors of this type on PB-scale installations?
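A rough sketch of what questions 1 and 2 would involve, untested against v9.2.1-6 and resting on several assumptions: the anchor table is `mlog.f_event_log`, the node name is `v_a4db_node0008` (inferred from the data path in the error), and `f_event_log_b1` is a superprojection whose columns are in table order. The staging table name `f_event_log_fix` is hypothetical. The first query only measures how much data a full node recovery would have to rebuild; dividing it by recovery throughput observed on this cluster would give a ballpark time, not a prediction.

```sql
-- (1) Data volume stored on the affected node: a rough proxy for full-node recovery work.
-- ASSUMPTION: node name inferred from /vertica/data/a4db/v_a4db_node0008_data.
SELECT node_name,
       ROUND(SUM(used_bytes) / 1099511627776.0, 2) AS used_tib  -- bytes / 1024^4
FROM v_monitor.projection_storage
WHERE node_name = 'v_a4db_node0008'
GROUP BY node_name;

-- (2a) Empty staging table with the same columns, partitioning and projections.
-- HYPOTHETICAL name: mlog.f_event_log_fix.
CREATE TABLE mlog.f_event_log_fix LIKE mlog.f_event_log INCLUDING PROJECTIONS;

-- (2b) Reload the affected month from the healthy buddy projection.
-- ASSUMPTION: f_event_log_b1 is a superprojection with columns in table order;
-- otherwise list the columns explicitly.
INSERT /*+ DIRECT */ INTO mlog.f_event_log_fix
SELECT * FROM mlog.f_event_log_b1
WHERE s3_bucket_month = '2018-10-01';
COMMIT;

-- (2c) Swap the month between the two tables: the clean copy moves into the
-- fact table, and the old partition (with the corrupt file) moves into the staging table.
SELECT SWAP_PARTITIONS_BETWEEN_TABLES('mlog.f_event_log_fix',
                                      '2018-10-01', '2018-10-01',
                                      'mlog.f_event_log');

-- (2d) Drop the staging table, which now holds the damaged partition.
DROP TABLE mlog.f_event_log_fix;
```

The appeal of this route is that it rewrites one month rather than a whole node; it does not check whether other files on the same disk are also damaged, and whether the final drop cleans up the corrupt file without further I/O errors would need to be verified.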
Thanks for any suggestions.