Can this be completed in a single query?

I have two tables. It is much more complex data than the example but I am trying to simplify it.

LUNCH - DINNER with the same columns.

LUNCH Table:
Date             | LogTime  | Who | Color | Food       | Started   | Finished
2014-01-01  | 09:00:00  | Bill    | Red   | Beets      | 09:01:00 | 10:00:00
2014-01-01  | 09:15:00  | Tom |  Red   | Tomatos | 09:16:00 | 09:45:00
2014-02-02  | 10:00:00  | Bill   |  Blue   | Beets     |  10:15:00 | 10:45:00

DINNER Table:
Date             | LogTime  | Who | Color | Food       | Started   | Finished
2014-01-01  | 22:00:00  | Bill    | Red   | Chicken  | 22:01:00 | 23:00:00
2014-01-01  | 21:15:00  | Tom |  Red   | Turkey    | 21:16:00 | 21:45:00
2014-02-03  | 13:00:00  | Bill   |  Red   | Stew       |  14:15:00 |14:45:00

I am trying to output data grouped by WHO column. Example of what I am trying to achieve.

WHO | Total | Total_red | Total_Blue | Total_Beets | Total_Tomatos | Total_Turkey
Bill        4              3                1                    2                          0                     0

Total_Chicken | Last_Lunch_Date| Last_Lunch_Color | Last_Dinner_Date | Last_Dinner_Color
           1                    2014-02-02           Blue                       2014-02-03              Red


Can this be done in a single query?
Actual Table is about 600 Million rows

Comments

  • Since the two tables have the same columns, you can create a view with a union of the two tables and then run your queries on the view. The view is just a convenience here, you can put the union or union all inline as well. Once you have the view, you can use aggregate functions to get the required results: Create view vw_lunch_dinner as select Date,LogTime ,Who,Color,Food ,Started ,Finished,'lunch' as meal from lunch union all select Date,LogTime ,Who,Color,Food ,Started ,Finished,'dinner' as meal from dinner; Now that you have the data as a single view, you can query as if the data was in a single table: Select who, count(*) as total, sum(case when colow='Red' then 1 else 0 end) as total_red, sum(case when food='Beets' then 1 else 0 end) as Total_Beets, max(case when meal='lunch' then date else null end) as last_lunch_date from vw_lunch_dinner group by who; /Sajan
  • Hi!

    >>Can this be done in a single query?
    Take a look here: http://www.vertica-forums.com/viewtopic.php?f=48&t=849&p=2648

    PS
    I didn't tested but looks like Sajan's solution will be better (faster).


  • Thank you for the replies!!

    Sajan, I had exactly what you had MINUS the 'lunch/dinner' as Meal. Something so simple and it solved a lot! Thank you!

    Going to make it harder.. I guess this was too easy.
    Add another column called "Max_Consecutive_Times_WHO_ate_Beets" - Can that be done adding it to this one query?



  • You've definitely up'ed the level :). Change the view to add an extra column:


    Create view vw_lunch_dinner as 
    select Date,LogTime ,Who,Color,Food ,Started ,Finished,'lunch' as meal from lunch,,case when food='Beets' then (count(1) over (partition by who,food order by dt,logtime)) else 0 end as consec_beets_count 
    union all 
    select Date,LogTime ,Who,Color,Food ,Started ,Finished,'dinner' as meal,case when food='Beets' then (count(1) over (partition by who,food order by dt,logtime)) else 0 end as consec_beets_count from dinner; 

    Now that the analytic count function has the consecutive occurrence calculated as part of the view statement, run the following to get the max alongside other values:

    Select who, count(*) as total, sum(case when colow='Red' then 1 else 0 end) as total_red, sum(case when food='Beets' then 1 else 0 end) as Total_Beets, max(case when meal='lunch' then date else null end) as last_lunch_date, max(consec_beets_count) as max_consec_beets_count 
    from vw_lunch_dinner 
    group by who; 


    A couple of points of note:
    • If you want to account for the consecutive counts between the two tables that make up the view, this approach will not work, and you will have to nest an additional layer to run the analytic count on the unioned view and then do a max.
    • The Analytical function is likely to slow down your query at the scale you want to run. If you expect to run this frequently, you should consider creating a projection to optimize this function.
    /Sajan


  • Awesome Sajan!! You are on a ROLL!
    Just 1 more... promise!

    Adding three columns. Consecutive_Times_at_Dinner and Consecutive_Times_at_Lunch
    Last (DESC)

    Example

    Lunch
    01-01 Bill
    01-02 Bill
    01-04 Bill

    Dinner
    01-03 Bill
    01-05 Bill
    01-06 Bill

    Query result would be:
    Consecutive_Times_at_Dinner = 2
    Consecutive_Times_at_Lunch = 1
    Last = Dinner

    vs.

    Lunch
    01-02 Bill
    01-04 Bill
    01-06 Bill
    01-08 Bill

    Dinner
    01-01 Bill
    01-03 Bill
    01-05 Bill
    01-07 Bill

    Query result would be:
    Consecutive_Times_at_Dinner = 1
    Consecutive_Times_at_Lunch = 1
    Last = Lunch
  • Anyone for the last bit of the question?

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file