Live Aggregate Projection Limitations
mferrall
Vertica Customer ✭
I have a table with the following schema
utc_time timestamptz, -- this is always stored at the hourly grain in UTC dim1 int, dim2 varchar(150), metric int
Data notes
- Dim 2 is very high cardinality
Use cases
- Frequently summarizing data for Dim1 by day at arbitrary time zones
- Less often, drilling down into data at the day > dim1 > dim 2 level
I created a Live Aggregate Projection, as it seemed like it'd serve use cases 1 and 2 pretty well. Its defined
select utc_time, dim1 sum(metric) as metric group by utc_time, dim1
However, when I query it like below, Vertica says it is infeasible to use the LAP (tested by providing a hint). Use of the data looks like
select date(utc_time at ?timezone) as day, dim1, sum(metric) as total group by 1, 2
Question
- Is Vertica able to use Live Aggregate Projections when you group by an expression like aggregating at a date level?
- If it is, what am I doing wrong?
- If it can not, would you model the data differently? What other tooling does Vertica offer to support a use case like this?
0
Answers
I'm wondering if the timezone designation is throwing it off. Timezone isn't immutable, so it could be problematic here.
Is it possible to store the data with the timezone so that you don't have to do the conversion during the query?
I just ran a quick test to see if timestamptz is whats throwing it off, converting it to timestamp. I found the same behavior, when I group by
utc_time, dim1
, it uses the LAP, when I group bydate(utc_time),dim1
, it does not. The time is the first column in the LAP group by. When I provide a hint that it should use the LAP, it says its infeasible. When I specify the LAP directly (in the from, like a table) the query runs with either grouping.To the second part of your comment, are you suggesting that I store date and timezone as separate columns, so it'd be a matter of filtering to pre-derived dates for a time zone of interest? That does make sense, we just have a huge number of timezones that we support. That might be our next step, I was just hoping for something more flexible, and was hoping I had more tooling to support this. If you (or others) have other thoughts on how to test this further, I'm all ears!
It's a little hard to debug over a forum like this. I'd need to dig deeper to figure out what's going on here, but I would suggest you open a support case to ensure there's nothing going on behind that could be causing it not to behave as you expect.
Thank you for your input, I'll look into opening a ticket
If the query groups by something else than the grouping expression of the Live Aggregate Projection, then the LAP won't be used.
The LAP groups by:
utc_time
, dim1
The query groups by:
date(utc_time at ?timezone)
, dim1
The optimiser has no reason to believe that the data in the LAP is partially pre-aggregated to satisfy the query.
@marcothesane This remains true even when the field I've aggregated away,
dim2
in this example, is very high cardinality? This isn't intuitive to me, but I'll think about it more.Beyond that, does Vertica offer any tooling to improve performance for a use case like I've described, repeated below