Query distribution

I'm running some intensive queries that take over a minute to complete on a 4-node cluster. When I look at the CPU usage on each node, I notice that only one node is really being used. What can I do to make sure my query gets distributed to all nodes?

Things I've done:
- ANALYZE_STATISTICS on all involved tables
- UNSEGMENTED ALL NODES for small dimension tables
- created a SEGMENTED BY HASH projection for the big fact table (1 billion rows)
- looked at the EXPLAIN output; I don't see any BROADCAST or REDISTRIBUTE of data

Any help would be appreciated.

Comments

  • Hi Kris, Hm... I'm afraid this sort of question usually needs a little more detail to get to an answer. Would you be willing to share examples of your queries (indicating which tables in each query are UNSEGMENTED and which are SEGMENTED ALL NODES), or their EXPLAIN plans?

    Also, I'm assuming that the one busy node is the query initiator node (the node you connected to in order to submit the query)? That's usually where bottlenecks occur. If not, it would be quite interesting.

    I would not expect to see any BROADCAST or REDISTRIBUTE of data in the setup you describe -- I'm assuming your queries include the big fact table? That table is, as you note, already distributed; no need to spend cycles shipping it around the network again, we just make use of the existing distribution.

    Some algorithms are fundamentally serial; even if the data is distributed, we have to pull it back to the initiator. The most common case is window functions -- functions with an "OVER ()" clause (with nothing inside the parentheses) will always run serially on one node. The fix there is to put a PARTITION BY expression inside the OVER clause -- search the documentation for "ANALYTIC_FUNCTION" for details if this is relevant to you and you'd like to learn more.

    Also -- Vertica does assume that all nodes in the cluster have equivalent computational power. If this is not the case, there's no easy solution. (The fundamental problem is that different queries bottleneck on different system resources, so we don't know how to distribute the data up front to compensate, and redistributing at runtime can be more expensive than just waiting for the slow node to finish.) But you can manually tweak the segmentation expression to adjust which node has how much data.

    Adam
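The OVER() fix Adam describes might look roughly like this -- a hedged sketch with hypothetical table and column names (big_fact, region_id), not the poster's actual schema:

```sql
-- Runs serially: an empty OVER () forces all rows through one node.
SELECT fact_id, ROW_NUMBER() OVER () AS rn
FROM big_fact;

-- Can run in parallel: PARTITION BY lets each node handle its own groups.
SELECT fact_id,
       ROW_NUMBER() OVER (PARTITION BY region_id ORDER BY fact_id) AS rn
FROM big_fact;
```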
  • Hey Adam, thanks for the reply. By the way, we briefly met at VLDB in Italy a couple of weeks ago -- I was the Belgian guy. Don't know if you remember me.

    I'll post the exact query I ran when I'm back in the office on Monday. I changed my design a bit so I don't need joins any more for the moment, but queries are still mainly run on one machine. It's something along the lines of: SELECT DISTINCT field1, field2, field3 FROM big_input_table; I use these kinds of queries to build the fact table and dimension tables, and I typically run 10 of them in parallel.

    When firing 10 queries and looking at 'top' output, one node has 1000% CPU usage, then 900%, 800%, ... 100%. It looks like each query uses one core on one node. And yes, all hardware is exactly the same for all the nodes. Strangely enough, it's not the initiator node that is doing all the work. I'll double-check that on Monday.

    Earlier, I did use an OVER() query, but now I don't need it any more when I query the big input table.
  • Ah, hey Kris! Right, good to hear from you.

    Hm... Definitely interesting that it's not the initiator doing the work. I'd be curious to hear more information. (Is it always the same node?)

    It's a known issue that some queries only use one CPU core per node (though typically on all nodes at once). A few other threads on these forums discuss it, including some things to try to work around it; the last several releases have steadily been cutting down on the set of queries affected by it.

    Adam
  • I think I found the problem: my data segmentation is skewed. I looked into some v_monitor tables and bumped into 'system_resource_usage'. I noticed that node 4 is constantly broadcasting data while the other 3 nodes are receiving data. At the same time, I see a 'LOCAL RESEGMENT GROUPS' step in my EXPLAIN plan.

    I segmented my table on a hash of a column with around 30K distinct string values, for a 1-billion-row table. Is that not enough? Should I ideally hash on a value that is unique per row?

    Thanks for all the help so far!
  • Hi Kris, Hm... Well, first I'd suggest verifying that your data is actually skewed (and by how much). Eugenia, a co-worker here, actually just posted on another thread about how to do that (it's just a SQL query): https://community.vertica.com/vertica/topics/how_many_nodes_in_an_ideal_cluster#reply_12934369

    As for hashing, yes, it is generally preferable to hash on a value that is unique per row. Hash distribution simply computes the hash and sends each record to the corresponding node. The hash function tries to give an even distribution over input values, but if you have ten million rows with the same string value, a hash distribution will be forced to put all ten million rows on the same node. (Otherwise we no longer have data locality, which is what the hash distribution enforces.) You can hash-segment on multiple columns, if that helps.

    Adam
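Segmenting on multiple columns, as Adam suggests, might look like this -- a sketch assuming hypothetical names (big_fact, customer_id, order_ts); the point is that hashing several columns together raises the effective cardinality of the segmentation key:

```sql
-- Hypothetical projection: hashing two columns together spreads rows
-- more evenly than hashing one low-cardinality or skewed column.
CREATE PROJECTION big_fact_seg AS
SELECT * FROM big_fact
SEGMENTED BY HASH(customer_id, order_ts) ALL NODES;
```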
  • select node_name, sum(used_bytes) from projection_storage group by 1;

        v_db_node0001  383457989898
        v_db_node0002  383535148925
        v_db_node0003  460492774037
        v_db_node0004  460420355372

    There are plenty of tables and schemas in our DB. Can I somehow see how one specific table is distributed?

    I'm recreating the input table with an IDENTITY column. I couldn't just add the IDENTITY column to the existing table, by the way. Strange limitation.
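One way to narrow the same system-table query down to a single table -- a sketch assuming projection_storage exposes anchor_table_name and row_count in this Vertica version (check your release's system-table reference):

```sql
-- Per-node row counts for one table's projections only.
SELECT node_name, SUM(row_count) AS rows
FROM projection_storage
WHERE anchor_table_name = 'big_input_table'
GROUP BY 1
ORDER BY 1;
```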
  • I just answered my own question:

        v_db_node0001    156249782
        v_db_node0002    207114654
        v_db_node0004    124035303
        v_db_node0003  36596843000

    That might be an issue :D
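The rebuild Kris mentions (recreating the table with an IDENTITY column, since one can't be added to an existing table) might be sketched like this -- hypothetical column names and types, since the real schema isn't shown:

```sql
-- Hedged sketch: a new table segmented on the hash of an IDENTITY
-- column, which is unique per row and so spreads evenly across nodes.
CREATE TABLE big_input_table_new (
    row_id IDENTITY,
    field1 VARCHAR(100),
    field2 VARCHAR(100),
    field3 VARCHAR(100)
)
SEGMENTED BY HASH(row_id) ALL NODES;

-- Copy the data; the IDENTITY column is populated automatically.
INSERT /*+ DIRECT */ INTO big_input_table_new (field1, field2, field3)
SELECT field1, field2, field3 FROM big_input_table;
```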
