How does vertica perform when there are multiple conditions in the where statement?
- filter in order like a funnel, the result that satisfies the latter condition must satisfy the former condition, and the latter is a subset of the former
- or get the results of each condition separately and then take the intersection of them?
This is a sample. From the results, it seems that the second screening.
https://drive.google.com/file/d/1vftzq4ZVHv4m1LerQDjPZmhV6Oj-vxon/view?usp=share_link
Best Answers
-
Bryan_H Vertica Employee Administrator
Vertica will attempt to prune predicates and reduce the number of rows as early as possible in the query plan. Check the query plan to see order of operations.
In this case, there are two subqueries present. Thus it is necessary to exercise option 2 and get results of each subquery.
There may be an advantage: each subquery can use a separate projection. It may be possible to improve overall query time with a projection on tb_lac_cell_code order by up_area_code, or a projection on tb_ads_timely with msisdn RLE encoded and order by msisdn, report_time to improve select/group and min/max calculations.
Another possible optimization might be to join the tables on lac_cell and select on up_area_code that way to eliminate a subquery, or flatten the table to add up_area_code to tb_ads_timely and avoid a subquery or join.0 -
Bryan_H Vertica Employee Administrator
Vertica filters by pushing predicates as close to the scan operator as possible and also operating on smallest to largest table based on statistics. Other cost optimizations are also applied. The best way to identify Vertica optimizations is with EXPLAIN. It is usually also possible to identify further optimizations by examining plan operators.
Without seeing the plan for this query, I can't see additional improvements in the SQL, but I would guess the following projection designs would help:- ORDER BY a.msisdn and b.serv_number to improve JOIN performance
- ORDER BY a.report_time to improve GROUP BY / HAVING performance
0
Answers
I saw a statement that MySQL will do a funnel filter from top to bottom, if there is no subquery in the where clause, will vertica filter like MySQL does?
Ask more, is there space for optimization in the following statement?
SELECT a.msisdn
FROM cloud_financial.tb_ads_timely_user_today_943 a
JOIN cloud_financial.tb_ele_return_list_d_943 b
ON a.msisdn = b.serv_number
GROUP BY a.msisdn
HAVING DATEDIFF(second, MIN(a.report_time), MAX(a.report_time)) > 14400;