Can Smallish ref tables have full data on each node? To improve join performance?

What is the best way to define tables (specifically the partitions) for reference tables. Since these tables will be used often in joins on tables with various (mostly date based) partition clause. I worry about the effect on performance. Many times we try to flatten tables in vertica but sometimes this is impractical and we want a join.

Comments

  • Hi Jim, If you want a table to be stored in full on every node, simply create an UNSEGMENTED projection for it. Note that Vertica's Database Designer (in adminTools) will set these up for you automatically/correctly, based on a sample of your queries. We generally discourage creating projections manually: Unless you have a whole lot of experience with the technology behind column-store databases, the DBD will likely automatically come up with a much better projection design than you'd do manually. There are many little subtle bits to get right, that can have large performance impacts. Also, regarding flattening tables: In Vertica, particularly if you've run the DBD and/or manually gotten your table sort orders right, joins can be dramatically faster than in other database systems. So, if you have some free time, I would encourage you to try out a combination of the DBD and some quick performance tests to see how much you're really getting by flattening your tables. Adam
  • Just to confirm. You are saying that the table partition will not affect the join performance. While an unsegmented projection can benefit join performance. DBD is a good idea and will gen these for me (or I should expect that it may often do so). FYI one reason we flatten tables is to lock in the join results. We keep data for a long time and changes in schema and join values are problematic for older data. Its not just for performance.
  • Table partitioning may also affect join performance, but the effect would be very data-dependent. For example, if you have a predicate on your JOIN and Vertica can easily prove (based on the partitioning clause) that no records in a given partition will match the predicate, then Vertica won't evaluate that partition. (Of course, if you make your partitions too small, then running this proof becomes more expensive than just scanning the partition. Particularly since partitions are also sorted, so we can check the contents with a quick binary search. Partitioning is intended to be a very coarse mechanism; sorting gets the more fine-grained stuff.) An unsegmented projection can help with join performance in some cases, though, yes. Again, depends on the data and query. If you want to lock in join results, you might want to look into Pre-Join Projections. They effectively store data the way that you're storing it now, but present it as separate tables (so your application logic doesn't have to change). The DBD will not automatically create these because they have a significant impact on load performance, and the improvement in query performance is not always significant. But if you're only using them for historic data, it's possible that they'd be of use to you.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file