Best practices to deal with event – “GROUP BY key set did not fit in memory, using external sort gro
Hi , What is the best practices to deal with event – “GROUP BY key set did not fit in memory, using external sort grouping” the suggested_action column include “Consider a sorted projection. Increase memory available to the plan.” , I am running Trickle Load using java into WOS with parallel 10 threads , only super projection existing . Thanks
0
Comments
In our vertica database, for one of the join query fired between a 100M fact table and 12k dimesion table, i can see the following message "SPI is disabled due to ineffectiveness" in the query_events table. Does it interpret, that it is disabled for this query or is it the case "it is disabled for the Vertica Cluster". If it is disabled for the cluster, is there a way we can enable it in order to check join performance? Also, is there a way through which we can check that SIP is being applied apart from the explain plan?
If you're seeing that message, that just means that it's disabled for the query. Other queries will continue to use SIP.
In general, Vertica will use SIP if it thinks it's useful, and it will disable SIP if it's just getting in the way. The EXPLAIN plan is probably the best way to tell whether a given query would try to use SIP if it were executed. For running queries, as you've noted, the query_events table also works.
Adam
For query:
select a.col1 from a,b where a.col2=b.col2;
explain=> Uses: Runtime Filter: (SIP4(HashJoin): a.col2),
So, the thing is my explain shows that it is using SIP:"Runtime Filters: (SIP1(HashJoin)SIP(" .But, in
query_events it is saying "it is disabled due to ineffectiveness". So, can we somehow check, that if SIP is being actually used.
Also, can some changes be made at the projection specific to the fact table to enforce SIP and then check it?
In order to do check SIP as per documentation or available material on web, i tried the following query:
select a.col1 from a,b where a.col2=b.col2 where b.col2=5; --passing additional predicate for less rows in hash table.
explain=> Instead of SIP hash join it did SIP Merge join, Any reasons why?
Query events table showed-> The optimizer created a transitive predicate due to a join condition: Informational; No user action is necessary.
Also is there any useful link for SIP Merge join. Or, is it similarly implemented for other DB's like Teradata?
Hm... I'm not familiar with SIP in Teradata. (Vertica's SIP doesn't describe how a join works; it is a mechanism that joins and other operators can use.) SIP is not something that you would turn on or not for a particular query; that doesn't even make sense for our implementation... These messages are informational only (they are sometimes used by our Customer Support to investigate certain types of issues); they're not something that you need to act on.
Perhaps you could explain your interest in SIP in more detail? Maybe in other systems it is important to tune their SIP mechanism manually?
Regarding checking whether SIP was actually used, you did check :-) As the message indicates, SIP was turned off at runtime in this case.
Vertica doesn't stop optimizing once it has a query plan. In this case, our optimizer decided that SIP might help, so it tried it initially; but the optimizer determined while running the query that SIP filtering was not accelerating the query in practice (given the actual data being fetched by the query), so it decided to stop using SIP.
Regarding the choice of hash vs merge join -- it sounds like this was a merge join, not a sort-merge join? (If you are coming from another database system, what you think of as a "merge join" is what we call a "sort-merge join", or a merge join with a sort operation attached.) Vertica stores its data pre-sorted. A merge join that doesn't have to sort the data first (because it is already sorted on disk) is very fast, much faster than a hash join; it also uses dramatically less memory. So we generally prefer it. In some situations, we may prefer a sort-merge join to a hash join too; generally if it is advantageous for some reason to have the data in sorted order after the join.
Adam
There is undocumented method to disable SIP ( use it only for testing ) on session level , so you will be able to know how SIP impact on your query performance
Disable
SELECT add_vertica_options('BASIC', 'DISABLE_SIPS');
Enable
SELECT clr_vertica_options('BASIC', 'DISABLE_SIPS');
Is there a way to find such functions? I have seen many such non documented (in vertica documentation) functions on forums. Would be really helpful, if you could share some links?