vertica joins weird behaviour
I have two tables
facts_ns_events with around 250 million rows
rule_results with around 5 million rows
both these tables JOINed on a common field flow_id. when I'm trying to do an INNER JOIN query it's very slow.
select rule_results.new_item_value from rule_results JOIN facts_ns_events ON rule_results.flow_id = facts_ns_events.flow_id and rule_results.new_item_type='xyz';
took around 100 seconds
However if in the same query I refer a different column then its get executed extremely fast (.3 seconds)
select rule_results.flow_id from rule_results JOIN facts_ns_events ON rule_results.flow_id = facts_ns_events.flow_id and rule_results.new_item_type='xyz';
How can a different column refer can make such a huge difference?
flow_id is int and new_item_value is varchar(1024).
varchar(1024) is going to be inherently slower than an int. That's true in any database. But the more likely reason is that it's not sorted on that column, and it is sorted on flow_id. We'd need to know the projection definitions to know for sure. If you want to boost the performance of this, try adding new_item_value into the order by clause of the projection definition. Or just run the query through database designer, and see what it suggests.5
Thanks for the reply
Ok, I'll try creating a projection but still, 100 seconds difference is very huge for such small tables. I'll update here again
I tried creating a projection which database designer suggested, now the query returns very fast.
Thank You for your help.
I was thinking about this, and I think I know what's happening. Run an EXPLAIN on both versions. In a columnar database, there is a cost for selecting columns. The cost of selecting an int column is far less than selecting a varchar(1024) column. What's probably happening here is that it may be reversing the inner/outer join between the two version. The column is coming from the same table, soit might not be, but it's possible. An explain would show you the difference between the two.
Yes, Vertica is reversing the inner/outer join between the two versions according to the EXPLAIN command. What is the difference between the two? Is there any option to hint the Vertica to use the same strategy(int) for the varchar version?