Why does the query slow down after partitioning?
jiahe1224
Vertica Customer ✭
Best Answers
-
Bryan_H Vertica Employee Administrator
EXPLAIN query plan might show a difference in processing. However, I see two factors that generally slow down queries:
Column is not compressed or encoded to optimize storage ("ZSTD_FAST_COMP" is recommended for varchar)
Column is not ORDER BY to optimize predicate search ("ORDER BY lac_cell" will improve WHERE clause)
EXPLAIN may show additional factors that affect run time.0 -
SruthiA Vertica Employee Administrator
@jiahe1224 yes. better to use RLE encoding if deal_date column is a low cardinality column
0
Answers
I did not write order by lac_cell when the first table was created, and I wrote order by lac_cell for the next two tables, but the projection shows that the lac_cell of the three tables are all sorted. In addition, for encoding, all super projections all three columns are auto encoded, my deal_date is low cardinality, maybe I should use RLE encoding?
Thanks again for your answer! This forum is great, I don't have any friends around me who can ask questions.
deal_date column is a low cardinality integer column
Thank you for your answer, I would like to ask again, what encoding is suitable for unsorted char fields with high cardinality? Actually the field is phone number
ZSTD_FAST_COMP is recommended for unsorted types.
I would like to ask you another question, when do I need to put the column in the order by? Suppose I have a query like this,
SELECT col3
FROM table_tmp
WHERE col1 = value1
AND col2 IN (....);
col1 is the column used for partitioning, and table_tmp only reserves one partition(In this case, is col1=value1 unnecessary in the where statement? Because when I insert the new partition data, the original partition will be deleted immediately. These are automatically executed by bash, and the query request is unlikely to encounter the situation that there are two partitions in the table.). When I manually create a projection, should I put col1 and col2 or only col2 in the order by statement?
It might not be necessary in all cases, however I would include the partition column in the sort order because this will help the optimizer when planning operations across partitions, for example date ranges or multiple predicates, and will also help in the case where the partition column is used for JOIN.
In my specific business scenario, if there are multiple partitions in the table, col1=value1 needs to be added, because only the data of the latest partition is required. Generally, it is better to add the partition column to the order by statement of projection, but only one partition is reserved in my table, is it still necessary?
SELECT col3
FROM table_tmp
WHERE col1 = value1
AND col2 IN (....);
Sorting is not required in actual query, but there must be sorting in projection. As you said, the value of col1 does not need to be limited, only the value of col2 needs to be limited
Projection sorting also helps with other filters, join and group by operations.
Considering your table is partitioned, did you take the chance to look at our range partition projection feature?
It builds the projection only on the last XX partitions, where you define the XX, could be a relative time from now.