We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Live Aggregate Projections to calculate EPS in a system — Vertica Forum

Live Aggregate Projections to calculate EPS in a system

Hi,
I am trying to create live aggregate projections with a group by clause and then call a select to optimize the query execution. I can see that the projection gets created but when I call the EXPLAIN on my select query, I dont see it is using the created projection. Could you please help.

PROJECTION Statement:
create projection events_eps (id, deviceReceiptTime, count) as select id, deviceReceiptTime, count(id) from events group by id, deviceReceiptTime;

SELECT statement:
EXPLAIN select cast(deviceReceiptTime / 3600000 as integer) as time, count(id) from events where deviceReceiptTime>(EXTRACT(EPOCH from '2019-04-06T15:05:18.659Z'::TIMESTAMPTZ) - 1366518)1000 and deviceReceiptTime<(EXTRACT(EPOCH from '2019-04-06T15:05:18.659Z'::TIMESTAMPTZ)) 1000 group by deviceReceiptTime/3600000;

EXPLAIN Result:
Access Path:
+-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 8M, Rows: 3M] (PATH ID: 1)
| Aggregates: count(events.id)
| Group By: (events.deviceReceiptTime / 3600000)
| +---> STORAGE ACCESS for events [Cost: 7M, Rows: 1B] (PATH ID: 2)
| | Projection: events_super
| | Materialize: events.deviceReceiptTime, events.id
| | Filter: ((events.deviceReceiptTime > 1553196600659.000000) AND (events.deviceReceiptTime < 1554563118659.000000))

Comments

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Hi,

    The problem is your GROUP BY in the SELECT is different than the GROUP BY in the LAP.

    Example:

    dbadmin=> create schema investigation;
    CREATE SCHEMA
    
    dbadmin=> create table investigation.events ( id int, deviceReceiptTime number(25, 6));
    CREATE TABLE
    
    dbadmin=> insert into investigation.events select 1, 1553196600.659000;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> insert into investigation.events select 1, 1553196600.659000;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> insert into investigation.events select 1, 1553196600.659000;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> commit;
    COMMIT
    
    dbadmin=> create projection events_eps (id, deviceReceiptTime, count) as select id, deviceReceiptTime, count(id) from investigation.events group by id, deviceReceiptTime;
    WARNING 6852:  Live Aggregate Projection "events_eps" will be created for "events". Data in "events" will be neither updated nor deleted
    WARNING 4468:  Projection <investigation.events_eps> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
              The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
    CREATE PROJECTION
    dbadmin=> select refresh('investigation.events');
                                                                                                                                                    refresh                    
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Refresh completed with the following outcomes:
    Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
    ----------------------------------------------------------------------------------------
    "investigation"."events_eps": [events] [refreshed] [scratch] [0] [0]
    
    (1 row)
    
    dbadmin=> select analyze_statistics('investigation.events');
     analyze_statistics
    --------------------
                      0
    (1 row)
    

    The follow SQL statements will use the LAP:

    dbadmin=> explain select id, deviceReceiptTime, count(id) from investigation.events group by id, deviceReceiptTime;
    
     ------------------------------
     QUERY PLAN DESCRIPTION:
     ------------------------------
    
     explain select id, deviceReceiptTime, count(id) from investigation.events group by id, deviceReceiptTime;
    
     Access Path:
     +-GROUPBY PIPELINED [Cost: 2, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
     |  Aggregates: count(events_eps.count)
     |  Group By: events_eps.id, events_eps.deviceReceiptTime
     | +---> STORAGE ACCESS for investigation.events_eps (Rewritten LAP) [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
     | |      Projection: investigation.events_eps
     | |      Materialize: events_eps.id, events_eps.deviceReceiptTime, events_eps.count
     ------------------------------
    
    dbadmin=> explain select deviceReceiptTime, count(id) from investigation.events group by deviceReceiptTime;
     ------------------------------
     QUERY PLAN DESCRIPTION:
     ------------------------------
    
     explain select deviceReceiptTime, count(id) from investigation.events group by deviceReceiptTime;
    
     Access Path:
     +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 2, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
     |  Aggregates: count(events_eps.count)
     |  Group By: events_eps.deviceReceiptTime
     | +---> STORAGE ACCESS for investigation.events_eps (Rewritten LAP) [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
     | |      Projection: investigation.events_eps
     | |      Materialize: events_eps.deviceReceiptTime, events_eps.count
     ------------------------------
    

    But the next one will not as it has a calculation on a GROUP BY column:

    dbadmin=> explain select deviceReceiptTime / 3600000, count(id) from investigation.events group by deviceReceiptTime / 3600000;
     ------------------------------
     QUERY PLAN DESCRIPTION:
     ------------------------------
    
     explain select deviceReceiptTime / 3600000, count(id) from investigation.events group by deviceReceiptTime / 3600000;
    
     Access Path:
     +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 35, Rows: 1] (PATH ID: 1)
     |  Aggregates: count(events.id)
     |  Group By: (events.deviceReceiptTime / 3600000)
     | +---> STORAGE ACCESS for events [Cost: 34, Rows: 3] (PATH ID: 2)
     | |      Projection: investigation.events_super
     | |      Materialize: events.id, events.deviceReceiptTime
     ------------------------------
    

    You need to create a LAP with a matching GROUP BY:

    dbadmin=> create projection events_eps2 (id, deviceReceiptTime, count) as select id, deviceReceiptTime / 3600000 deviceReceiptTime, count(id) from investigation.events group by id, deviceReceiptTime / 3600000;
    WARNING 6852:  Live Aggregate Projection "events_eps2" will be created for "events". Data in "events" will be neither updated nor deleted
    WARNING 4468:  Projection <investigation.events_eps2> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
              The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
    CREATE PROJECTION
    
    dbadmin=> select refresh('investigation.events');
                                                                                                                                                     refresh                   
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Refresh completed with the following outcomes:
    Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
    ----------------------------------------------------------------------------------------
    "investigation"."events_eps2": [events] [refreshed] [scratch] [0] [0]
    
    (1 row)
    
    dbadmin=> select analyze_statistics('investigation.events');
     analyze_statistics
    --------------------
                      0
    (1 row)
    

    Now the query will use the new LAP:

    dbadmin=> explain select deviceReceiptTime / 3600000, count(id) from investigation.events group by deviceReceiptTime / 3600000;
     ------------------------------
     QUERY PLAN DESCRIPTION:
     ------------------------------
    
     explain select deviceReceiptTime / 3600000, count(id) from investigation.events group by deviceReceiptTime / 3600000;
    
     Access Path:
     +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 2, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
     |  Aggregates: count(events_eps2.count)
     |  Group By: events_eps2.deviceReceiptTime
     | +---> STORAGE ACCESS for investigation.events_eps2 (Rewritten LAP) [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
     | |      Projection: investigation.events_eps2
     | |      Materialize: events_eps2.deviceReceiptTime, events_eps2.count
     ------------------------------
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file