Recovering large fact table with a corrupted file

atomix
edited November 2019 in General Discussion

Hello,
Querying a large fact table (~1 PB) that is partitioned by month, on a 12-node AWS cluster, I got the following error:

    SELECT
      s3_bucket_month,
      s3_bucket_date,
      COUNT(*) AS event_cnt
    FROM
      mlog.f_event_log_b0
    WHERE
      s3_bucket_month='2018-10-01'
    GROUP BY
      1, 2
    ORDER BY
      1 DESC, 2 DESC;

[Error] Script lines: 37-53 ------------------------

 [Vertica][VJDBC](4519) ERROR: Read failed in FileColumnReader: /vertica/data/a4db/v_a4db_node0008_data/277/025058844f8c6b7301ff124850f95d710110000000301bcd_0.gt Input/output error 

The table itself is partitioned by s3_bucket_month. I managed to identify that the _b0 buddy projection has the corrupt file; _b1 works fine. We are using Vertica v9.2.1-6.
Questions:
1) What is the best practice for dealing with this issue? If I delete this file and recover the node, the recovery might take a long time. Is there a way to estimate how long it would take?
2) Would it be faster to create another table with the same structure, insert the data from the good projection into it, and swap the partitions?
3) The node is a D2.8xl node on AWS. Do you recommend replacing the entire node?
4) How likely are these types of errors on PB-scale installations?
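
To make question 2 concrete, here is a rough sketch of what I have in mind. It is untested. The anchor table name mlog.f_event_log and the staging table name mlog.f_event_log_staging are assumptions for illustration (only the projection names appear above), and I do not know whether the swap step will succeed while one of the files in the partition is unreadable.

    -- Assumption: the anchor table is mlog.f_event_log (hypothetical name).
    -- Assumption: mlog.f_event_log_staging is a hypothetical staging table.

    -- 1. Create an empty table with the same definition and projections.
    CREATE TABLE mlog.f_event_log_staging
      LIKE mlog.f_event_log INCLUDING PROJECTIONS;

    -- 2. Copy the affected month, reading explicitly from the healthy
    --    buddy projection (_b1) rather than from the table. If the
    --    projection's column order differs from the table's, list the
    --    columns explicitly instead of using SELECT *.
    INSERT INTO mlog.f_event_log_staging
    SELECT *
    FROM mlog.f_event_log_b1
    WHERE s3_bucket_month = '2018-10-01';
    COMMIT;

    -- 3. Swap the 2018-10-01 partition between the staging table and the
    --    fact table; the damaged copy ends up in the staging table.
    SELECT SWAP_PARTITIONS_BETWEEN_TABLES(
      'mlog.f_event_log_staging',
      '2018-10-01',
      '2018-10-01',
      'mlog.f_event_log'
    );

    -- 4. Re-run the failing check against _b0 before dropping the staging table.
    SELECT COUNT(*)
    FROM mlog.f_event_log_b0
    WHERE s3_bucket_month = '2018-10-01';

The idea is that only one month of data is rewritten, instead of recovering every projection on the node.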

Thanks for any suggestions.
