Are queries against default super projections distributed?

The documentation seems to indicate that those queries can still be distributed: "ALL NODES—Automatically replicates the unsegmented projection on each node. To perform distributed query execution, Vertica requires an unsegmented copy of each small table superprojection on each node."

However, we've noticed that when the query planner uses a default super projection (UNSEGMENTED ALL NODES) for a query, the I/O and CPU load don't appear to be distributed across multiple nodes in the cluster. Is this expected? Can queries only be parallelized when they hit a projection that has been segmented by hash or range?

Thanks,
Emanuel Pordes

Comments

  • Hi Emanuel, that's correct: queries that run exclusively against unsegmented projections are not parallelized. The key word in that bit of documentation is "small". In data warehousing, it's common to think of a schema as having one very large table (the "fact" table) and a variety of very small tables (the "dimension" tables). The large table is traditionally always segmented; it's too big not to segment. The small tables are then stored unsegmented, so that scans across the large table can run in parallel on all nodes and each node always has all of the small-table data available locally, regardless of which joins the query performs. Of course, this is just one possible layout of a database. If you want your tables optimized for parallel execution, Vertica can automatically update auto-projections (including adding segmentation as needed). Just run the "Database Designer" tool in adminTools. Running it tells the server that you have loaded a representative sample of your data and have run a representative sample of typical queries, and instructs it to optimize accordingly.
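To make the fact/dimension layout above more concrete, here is a toy Python model of it. This is only a sketch of the idea, not how Vertica's execution engine works: the table names, the three-node cluster size, and the use of Python's built-in `hash` as the segmentation function are all assumptions made for illustration. The large table is split across nodes by hashing a key, the small table is copied in full to every node, and each node joins its own slice without fetching rows from anywhere else.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size

# Small "dimension" table: every node holds a full copy (unsegmented).
dim_product = {1: "widget", 2: "gadget", 3: "gizmo"}

# Large "fact" table: (sale_id, product_id, amount) rows, made up for the demo.
fact_sales = [(sale_id, (sale_id // 4) % 3 + 1, 10 * sale_id)
              for sale_id in range(1, 13)]


def segment_by_hash(rows, num_nodes):
    """Place each fact row on exactly one node by hashing its key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[hash(row[0]) % num_nodes].append(row)
    return nodes


def local_join_and_aggregate(node_rows, dim):
    """Work done by a single node: join its slice to its local dimension copy."""
    totals = defaultdict(int)
    for _sale_id, product_id, amount in node_rows:
        totals[dim[product_id]] += amount
    return totals


def distributed_total(rows, dim, num_nodes):
    """Run the per-node work on every slice, then merge the partial results."""
    segments = segment_by_hash(rows, num_nodes)
    merged = defaultdict(int)
    for node_id in range(num_nodes):
        partial = local_join_and_aggregate(segments[node_id], dim)
        for name, subtotal in partial.items():
            merged[name] += subtotal
    rows_per_node = {n: len(segments[n]) for n in range(num_nodes)}
    return dict(merged), rows_per_node


if __name__ == "__main__":
    totals, rows_per_node = distributed_total(fact_sales, dim_product, NUM_NODES)
    print("rows scanned per node:", rows_per_node)
    print("merged totals:", totals)
```

In this model, a query that touches only `dim_product` has nothing to split up, because every node already holds the same complete copy. That is the toy-model counterpart of the behavior described in the question.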
