Strange Error message during query

When I run a particular query, I get this error message:

ERROR 3409: FileColumnReader: block /Fast/vertica/testdb1/testdb1/v_testdb1_node0003_data/375/54043195528488375/54043195528488375_0.fdb @ 1384889305 's CRC 3c98b7217e1eaaa5 doesn't match record 70a4b7291ed9714a
HINT: Data file may be corrupt. Ensure that all hardware (disk and memory) is working properly. Possible solutions are to delete the file /Fast/vertica/testdb1/testdb1/v_testdb1_node0003_data/375/54043195528488375/54043195528488375_0.fdb while the node is down, and then allow the node to recover, or truncate the table data

The database had been fine until this point. Since then, all queries against the same tables fail with errors. There is nothing in the logs, and all nodes are still running fine. Where should I look to figure out what is going wrong?

Setup is a 3-node cluster on Community Edition 6.1.2.

Comments

  • Hi Dwayne,

    This is a CRC error, and the file in question is a ROS file (i.e., Vertica's data storage). CRC is a form of checksum. In short, the message means what it says: one of Vertica's underlying data files has become corrupt. This typically happens due either to a hardware failure in the drive housing the data, or to a bug in the filesystem on the machine in question.

    The error message correctly informs you of your options for how to proceed. If you have K-safety (K=1 or greater on this table), and if only one of the two superprojections on the table is affected, the simplest option is to delete the file; Vertica will recover from the buddy projection. (Though this can take a long time if you have customized your buddy projections to have different sort orders.) A third option, if you have a backup, would be to restore from that backup.

    If you would like to mitigate the possibility of this happening in the future, please make sure to follow Vertica's best practices regarding hard-disk configuration. In particular, we recommend using ext3 (or ext4 against a recent kernel; various issues have been fixed in ext4 since its initial release by the major distributions) running directly on top of a RAID array from a reputable vendor. Avoid LVM, or any other layer of indirection; we've seen bugs in those. Definitely beware of any kind of network-attached storage; we've seen lots of issues with those, and they're often not designed to handle the kinds of loads that Vertica throws at them.

    Even with all of that, sometimes this just happens. The typical modern consumer hard drive is rated for an error rate of 10^-14: one in a hundred trillion bits will be incorrect. (You can check the spec sheet for your drives; some enterprise-class drives are better.) A hundred trillion bits is roughly 12 trillion bytes, or 12 TB. So if you rebuild a 12 TB RAID array, even with zero bugs anywhere and everything operating as specified (which is a pretty optimistic outlook on things), it's reasonable to expect some bad bit of data to have been copied from the old drives to the new drive. This is why keeping multiple backups is important :-)

    Adam
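    The 12 TB arithmetic above can be checked with a short Python sketch. It assumes bit errors are independent and occur at exactly the rated 10^-14 per bit, which is a simplification of how real drives fail; the function names are made up for illustration.

```python
import math

# Rated error rate quoted above: one bad bit per 10^14 bits read.
RATED_ERROR_RATE_PER_BIT = 1e-14


def expected_bit_errors(bytes_read, rate=RATED_ERROR_RATE_PER_BIT):
    """Expected number of bad bits when reading `bytes_read` bytes."""
    return bytes_read * 8 * rate


def prob_at_least_one_error(bytes_read, rate=RATED_ERROR_RATE_PER_BIT):
    """Chance of hitting one or more bad bits, assuming independent errors.

    Computes 1 - (1 - rate)^bits using log1p/expm1 so that the tiny
    per-bit rate does not get lost to floating-point rounding.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-rate))


if __name__ == "__main__":
    twelve_tb = 12e12  # 12 trillion bytes, as in the comment above
    print("expected bad bits:", expected_bit_errors(twelve_tb))
    print("P(at least one):  ", prob_at_least_one_error(twelve_tb))
```

    Under these assumptions, reading 12 TB gives an expected count of just under one bad bit and a better-than-even chance of at least one, which matches the point that a full-array rebuild can plausibly copy a bad bit.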
  • Hi, I am pretty new to Vertica and am getting the same error. I have no idea how to solve it. Could you please help me resolve this?

  • We found a two-step solution:
    1. Perform a disk scan to check for bad blocks or any other hardware issues.
    2. Restore from backup.
    If you skip the first step, you might miss important problems.
