Recovering large fact table with a corrupted file

atomix · November 2019

Hello,
Querying a large fact table (~1 PB), partitioned by month, on a 12-node AWS cluster returned the following error:

    SELECT
      s3_bucket_month,
      s3_bucket_date,
      COUNT(*) AS event_cnt
    FROM
      mlog.f_event_log_b0
    WHERE
      s3_bucket_month='2018-10-01'
    GROUP BY
      1, 2
    ORDER BY
      1 DESC, 2 DESC;

    [Error] Script lines: 37-53
    [Vertica][VJDBC](4519) ERROR: Read failed in FileColumnReader: /vertica/data/a4db/v_a4db_node0008_data/277/025058844f8c6b7301ff124850f95d710110000000301bcd_0.gt Input/output error

The table is partitioned by s3_bucket_month. I managed to identify that the _b0 buddy projection contains the corrupt file; the _b1 projection works fine. We are running Vertica 9.2.1-6.
Questions:
1) What is the best practice for dealing with this issue? If I delete this file and recover the node, the recovery might take a long time. Is there a way to estimate how long it would take?
2) Would it be faster to create another table with the same structure, insert the data from the good projection into it, and swap the partitions?
3) The node is a d2.8xlarge instance on AWS. Do you recommend replacing the entire node?
4) How likely are these types of errors on PB-scale installations?

Thanks for any suggestions.

emoreno · November 2019

Hi,
The simplest option should be 2: less risk, and easy to do.
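
A minimal sketch of option 2 in Vertica SQL. It assumes the anchor table is mlog.f_event_log (inferred from the projection name f_event_log_b0), that f_event_log_b1 is a superprojection returning every column in table order (otherwise list the columns explicitly), and it uses a hypothetical staging table named mlog.f_event_log_fix. Try it on a small test table first:

    -- Assumptions: anchor table mlog.f_event_log; f_event_log_b1 is a
    -- superprojection with columns in table order; f_event_log_fix is a
    -- hypothetical staging table name.

    -- 1. Empty staging table with the same columns, partitioning and projections.
    CREATE TABLE mlog.f_event_log_fix LIKE mlog.f_event_log INCLUDING PROJECTIONS;

    -- 2. Copy the affected month from the healthy buddy projection only.
    INSERT /*+ DIRECT */ INTO mlog.f_event_log_fix
    SELECT * FROM mlog.f_event_log_b1
    WHERE s3_bucket_month = '2018-10-01';
    COMMIT;

    -- 3. Exchange the 2018-10-01 partition between staging and fact table.
    SELECT SWAP_PARTITIONS_BETWEEN_TABLES(
        'mlog.f_event_log_fix', '2018-10-01', '2018-10-01', 'mlog.f_event_log');

    -- 4. Staging now holds the old partition (with the corrupt file).
    --    Drop it only after the original COUNT(*) query on _b0 succeeds.
    DROP TABLE mlog.f_event_log_fix CASCADE;

SWAP_PARTITIONS_BETWEEN_TABLES expects the two tables to have matching definitions and projections, which is why the staging table is created with LIKE ... INCLUDING PROJECTIONS. The database stays up throughout; only the one month is rewritten.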

About question 3: you should check whether you have more corrupted files. If a disk failed, replacing the node may be the right option. You could run the CRC check described here:
https://www.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/AdministratorsGuide/OperatingTheDatabase/IndexCRC/RunningTheCheckCRCOption.htm?Highlight=CRC

About question 4: I don't think this is related to Vertica; it is more likely a disk failure. Recovery should take care of corrupted files, but as you said in point 1, it may take some time. That is why I think option 2 is the best: you can do it while the database is up. Both 1 and 2 should work, though.
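
For question 1, one rough way to size a full recovery is to check how much data the affected node stores (the node name below is read from the data path in the error message) and divide that by a recovery throughput measured on your own cluster. This gives only a ballpark figure, since network and disk speed dominate:

    -- Data volume stored on the affected node, in TiB.
    -- Node name taken from /vertica/data/a4db/v_a4db_node0008_data/...
    SELECT node_name,
           SUM(used_bytes) / 1099511627776 AS used_tib
    FROM v_monitor.projection_storage
    WHERE node_name = 'v_a4db_node0008'
    GROUP BY node_name;

    -- While a recovery is running, check its progress.
    SELECT * FROM v_monitor.recovery_status;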

Hope this helps.
Eugenia

atomix · November 2019

Thanks Eugenia!
