Can this be completed in a single query?
I have two tables. It is much more complex data than the example but I am trying to simplify it.
LUNCH - DINNER with the same columns.
LUNCH Table:
Date | LogTime | Who | Color | Food | Started | Finished
2014-01-01 | 09:00:00 | Bill | Red | Beets | 09:01:00 | 10:00:00
2014-01-01 | 09:15:00 | Tom | Red | Tomatos | 09:16:00 | 09:45:00
2014-02-02 | 10:00:00 | Bill | Blue | Beets | 10:15:00 | 10:45:00
DINNER Table:
Date | LogTime | Who | Color | Food | Started | Finished
2014-01-01 | 22:00:00 | Bill | Red | Chicken | 22:01:00 | 23:00:00
2014-01-01 | 21:15:00 | Tom | Red | Turkey | 21:16:00 | 21:45:00
2014-02-03 | 13:00:00 | Bill | Red | Stew | 14:15:00 |14:45:00
I am trying to output data grouped by WHO column. Example of what I am trying to achieve.
WHO | Total | Total_red | Total_Blue | Total_Beets | Total_Tomatos | Total_Turkey
Bill 4 3 1 2 0 0
Total_Chicken | Last_Lunch_Date| Last_Lunch_Color | Last_Dinner_Date | Last_Dinner_Color
1 2014-02-02 Blue 2014-02-03 Red
Can this be done in a single query?
Actual Table is about 600 Million rows
LUNCH - DINNER with the same columns.
LUNCH Table:
Date | LogTime | Who | Color | Food | Started | Finished
2014-01-01 | 09:00:00 | Bill | Red | Beets | 09:01:00 | 10:00:00
2014-01-01 | 09:15:00 | Tom | Red | Tomatos | 09:16:00 | 09:45:00
2014-02-02 | 10:00:00 | Bill | Blue | Beets | 10:15:00 | 10:45:00
DINNER Table:
Date | LogTime | Who | Color | Food | Started | Finished
2014-01-01 | 22:00:00 | Bill | Red | Chicken | 22:01:00 | 23:00:00
2014-01-01 | 21:15:00 | Tom | Red | Turkey | 21:16:00 | 21:45:00
2014-02-03 | 13:00:00 | Bill | Red | Stew | 14:15:00 |14:45:00
I am trying to output data grouped by WHO column. Example of what I am trying to achieve.
WHO | Total | Total_red | Total_Blue | Total_Beets | Total_Tomatos | Total_Turkey
Bill 4 3 1 2 0 0
Total_Chicken | Last_Lunch_Date| Last_Lunch_Color | Last_Dinner_Date | Last_Dinner_Color
1 2014-02-02 Blue 2014-02-03 Red
Can this be done in a single query?
Actual Table is about 600 Million rows
0
Comments
>>Can this be done in a single query?
Take a look here: http://www.vertica-forums.com/viewtopic.php?f=48&t=849&p=2648
PS
I didn't tested but looks like Sajan's solution will be better (faster).
Sajan, I had exactly what you had MINUS the 'lunch/dinner' as Meal. Something so simple and it solved a lot! Thank you!
Going to make it harder.. I guess this was too easy.
Add another column called "Max_Consecutive_Times_WHO_ate_Beets" - Can that be done adding it to this one query?
Create view vw_lunch_dinner as
select Date,LogTime ,Who,Color,Food ,Started ,Finished,'lunch' as meal from lunch,,case when food='Beets' then (count(1) over (partition by who,food order by dt,logtime)) else 0 end as consec_beets_count
union all
select Date,LogTime ,Who,Color,Food ,Started ,Finished,'dinner' as meal,case when food='Beets' then (count(1) over (partition by who,food order by dt,logtime)) else 0 end as consec_beets_count from dinner;
Now that the analytic count function has the consecutive occurrence calculated as part of the view statement, run the following to get the max alongside other values:
Select who, count(*) as total, sum(case when colow='Red' then 1 else 0 end) as total_red, sum(case when food='Beets' then 1 else 0 end) as Total_Beets, max(case when meal='lunch' then date else null end) as last_lunch_date, max(consec_beets_count) as max_consec_beets_count
from vw_lunch_dinner
group by who;
A couple of points of note:
- If you want to account for the consecutive counts between the two tables that make up the view, this approach will not work, and you will have to nest an additional layer to run the analytic count on the unioned view and then do a max.
- The Analytical function is likely to slow down your query at the scale you want to run. If you expect to run this frequently, you should consider creating a projection to optimize this function.
/SajanJust 1 more... promise!
Adding three columns. Consecutive_Times_at_Dinner and Consecutive_Times_at_Lunch
Last (DESC)
Example
Lunch
01-01 Bill
01-02 Bill
01-04 Bill
Dinner
01-03 Bill
01-05 Bill
01-06 Bill
Query result would be:
Consecutive_Times_at_Dinner = 2
Consecutive_Times_at_Lunch = 1
Last = Dinner
vs.
Lunch
01-02 Bill
01-04 Bill
01-06 Bill
01-08 Bill
Dinner
01-01 Bill
01-03 Bill
01-05 Bill
01-07 Bill
Query result would be:
Consecutive_Times_at_Dinner = 1
Consecutive_Times_at_Lunch = 1
Last = Lunch