Reindexing After 6.0 Upgrade

The "Working with the Vertica Index Tool" section, deep in the "Vertica® Enterprise Edition 6.0 Administrator's Guide", discusses reindexing after upgrading to Vertica 6.0. The "Vertica® Enterprise Edition 6.0 Installation Guide" makes no mention of reindexing, so I didn't do it. Have I made a fatal error?

Comments

  • After upgrading to 6.0, I upgraded to 6.1.1. Vertica seems to be working properly.
  • Hi Jack, as you've noticed, you have not made a fatal error :-) Running the index tool will, however, improve query performance for some queries that make use of the new metadata fields that are computed through the reindexing process.
  • That's a relief. Re-indexing 22T could be time-consuming.
  • Hi Jack: I'm opening a doc request to a) add a note to the install guide about 'optionally' reindexing, and b) describe why you might want to reindex, as Adam describes. Thanks for asking about this.
  • I was told Vertica does not maintain any indexes, and that this was one of its attractive features compared to legacy DBs. It's not just the overhead of maintaining indexes; what concerns me is having to reindex with the database cluster DOWN. Can you please provide some estimates of how long this should take for 1 TB, 10 TB and 100 TB DBs?
  • I would expect it to take approximately as long as a single simple table scan on the table in question. Of course, how long a table scan takes depends entirely on your hardware: largely your disk I/O performance relative to the size of the table on disk.
    The update will, eventually, happen online too. You don't need to perform an offline reindex if you don't want to. You won't see any performance loss compared to the older Vertica version, nor will you see any expensive background update task. You just won't see the performance and reliability gains right away.
    The "index" in question is not at all an index in the traditional sense. (It's not accessible from SQL, you don't run queries against an index, etc.) Vertica, as you know, uses projections exclusively. The projection data is written out to multiple large, sorted ROS files. This "index" is simply a very small (a few bytes of header per many, many records) table of contents of what's in a given ROS file, which we need in order to parse that file.
    The "reindex" is needed because we added a couple of fields to that header, to enable new optimizations and to add some protection against disk corruption. For example, the index now contains a "CRC" checksum for the ROS file, so that you (and we) can detect disk corruption. But checksums only work if you know what they're supposed to be; we have to read and checksum existing ROS files to figure that out for them.
    The checksum and the other new fields are automatically added to new ROS containers. So as you add new data, and as Vertica merges old ROS files internally, they will automatically gain the new fields. Old files will not be updated in place; there is no need to spend system resources rewriting a perfectly good ROS. They'll be used as-is until they're next merged or otherwise rewritten.
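To make the checksum point in the last comment concrete, here is a minimal Python sketch of the general idea. It is not Vertica's actual file format or index tool: the file name, the `verify` helper, and the "stored CRC" value are all hypothetical, and plain CRC-32 from `zlib` stands in for whatever checksum Vertica really uses. It shows why a file written before checksums existed has to be read once in full (the "reindex") before corruption in it can be detected.

```python
import os
import tempfile
import zlib
from typing import List, Optional


def crc32_of_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Read the whole file once and return its CRC-32 (one full scan)."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def verify(path: str, stored_crc: Optional[int]) -> str:
    """Compare the file against a previously recorded checksum.

    stored_crc is None for a hypothetical 'old' file whose header
    predates the checksum field: nothing to compare against.
    """
    if stored_crc is None:
        return "unknown"
    return "ok" if crc32_of_file(path) == stored_crc else "corrupt"


def demo() -> List[str]:
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a data file written by an older version.
        path = os.path.join(tmp, "container.bin")
        with open(path, "wb") as f:
            f.write(b"sorted column data " * 1000)

        # Before "reindexing": no recorded checksum, so nothing can be checked.
        results.append(verify(path, None))

        # "Reindex": scan the file once and record its checksum.
        stored = crc32_of_file(path)
        results.append(verify(path, stored))

        # Simulate disk corruption by flipping the bits of one byte.
        with open(path, "r+b") as f:
            f.seek(10)
            original = f.read(1)
            f.seek(10)
            f.write(bytes([original[0] ^ 0xFF]))
        results.append(verify(path, stored))
    return results


if __name__ == "__main__":
    print(demo())
```

In this toy model the cost of recording the checksum is exactly one sequential read of the file, which matches the comment's estimate that reindexing should take about as long as a simple scan of the data.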
