Recovering a large fact table with a corrupted file
Hello,
While querying a large fact table (~1 PB) that is partitioned by month on a 12-node AWS cluster, we got the following error:
SELECT s3_bucket_month, s3_bucket_date, COUNT(*) AS event_cnt FROM mlog.f_event_log_b0 WHERE s3_bucket_month='2018-10-01' GROUP BY 1, 2 ORDER BY 1 DESC, 2 DESC;
[Error] Script lines: 37-53 ------------------------
[Vertica][VJDBC](4519) ERROR: Read failed in FileColumnReader: /vertica/data/a4db/v_a4db_node0008_data/277/025058844f8c6b7301ff124850f95d710110000000301bcd_0.gt Input/output error
The table itself is partitioned by s3_bucket_month. We managed to identify that the _b0 buddy projection has the corrupt file; _b1 works fine. We are running Vertica v9.2.1-6.
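For reference, this is roughly how we confirmed which projection owns the bad file. My understanding (which may be wrong) is that the hex prefix of the .gt file name matches sal_storage_id in v_monitor.storage_containers:
-- Map the corrupt .gt file back to its projection by its storage id.
SELECT node_name, schema_name, projection_name, sal_storage_id
FROM v_monitor.storage_containers
WHERE node_name ILIKE '%node0008%'
  AND sal_storage_id ILIKE '025058844f8c6b7301ff124850f95d710110000000301bcd%';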
Questions:
1) What is the best practice for dealing with this issue? If I delete this file and recover the node, the recovery might take a long time. Is there a way to estimate how long it would take? (See the sizing query after this list.)
2) Would it be faster to create another table with the same structure, insert the data from the good projection into it, and swap the partitions?
3) The node is a D2.8xl node on AWS. Do you recommend replacing the entire node?
4) How likely are these types of errors on PB-scale installations?
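For question 1, the best I could come up with for a rough sizing is to total what the affected node currently holds, via v_monitor.projection_storage (node name taken from the data path in the error above):
-- Rough proxy for recovery effort: projection data held by the affected node.
SELECT node_name, SUM(used_bytes) / (1024*1024*1024) AS used_gib
FROM v_monitor.projection_storage
WHERE node_name = 'v_a4db_node0008'
GROUP BY node_name;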
Thanks for any suggestions.
Best Answer
emoreno Employee
Hi,
The simplest option should be 2: less risk and easy to do. A rough sketch of that approach is below.
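Something along these lines. This is a sketch only: I am assuming the anchor table is mlog.f_event_log, that it is range-partitioned by s3_bucket_month, and that you can read the healthy buddy by querying the _b1 projection directly; the staging table name is made up, so adjust everything to your schema and verify the syntax against the 9.2 docs:
-- 1) Staging table with the same structure and projections as the original.
CREATE TABLE mlog.f_event_log_repair LIKE mlog.f_event_log INCLUDING PROJECTIONS;

-- 2) Reload the damaged month, reading from the healthy buddy projection.
INSERT /*+ direct */ INTO mlog.f_event_log_repair
SELECT * FROM mlog.f_event_log_b1 WHERE s3_bucket_month = '2018-10-01';

-- 3) Swap the rebuilt partition back into the original table.
SELECT SWAP_PARTITIONS_BETWEEN_TABLES(
    'mlog.f_event_log_repair', '2018-10-01', '2018-10-01', 'mlog.f_event_log');
Note that the swap moves partitions both ways, so afterwards the corrupt month sits in the staging table, which you can then drop.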
About question 3: you should check whether you have more corrupted files. Maybe a disk failed, in which case replacing the node may be the right option. You could run the CheckCRC option of the Index tool:
https://www.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/AdministratorsGuide/OperatingTheDatabase/IndexCRC/RunningTheCheckCRCOption.htm?Highlight=CRC
About question 4: I don't think this is related to Vertica but rather to a disk failure. Recovery should take care of corrupted files, but as you said in point 1, recovery may take some time; that is why I think option 2 is the best, as you can do it while the database is up. Both 1 and 2 should work, though.
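Back on running CheckCRC: if I remember the docs right (please verify against the documentation for your exact release), on a running database the Index tool is exposed as a SQL meta-function, and with the node down the same check can be run from the Linux command line with the standalone vertica binary.
-- Per-block CRC check via the Index tool meta-function (name from my
-- recollection of the docs; confirm it exists in your 9.2.1 release):
SELECT RUN_INDEX_TOOL('checkcrc');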
Hope this helps.
Eugenia
Answers
Thanks Eugenia!