Live Aggregate Projections to calculate EPS in a system

akumargaraiakumargarai Registered User
edited April 7 in Vertica Forum

Hi,
I am trying to create live aggregate projections with a group by clause and then call a select to optimize the query execution. I can see that the projection gets created but when I call the EXPLAIN on my select query, I dont see it is using the created projection. Could you please help.

PROJECTION Statement:
create projection events_eps (id, deviceReceiptTime, count) as select id, deviceReceiptTime, count(id) from events group by id, deviceReceiptTime;

SELECT statement:
EXPLAIN select cast(deviceReceiptTime / 3600000 as integer) as time, count(id) from events where deviceReceiptTime>(EXTRACT(EPOCH from '2019-04-06T15:05:18.659Z'::TIMESTAMPTZ) - 1366518)1000 and deviceReceiptTime<(EXTRACT(EPOCH from '2019-04-06T15:05:18.659Z'::TIMESTAMPTZ)) 1000 group by deviceReceiptTime/3600000;

EXPLAIN Result:
Access Path:
+-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 8M, Rows: 3M] (PATH ID: 1)
| Aggregates: count(events.id)
| Group By: (events.deviceReceiptTime / 3600000)
| +---> STORAGE ACCESS for events [Cost: 7M, Rows: 1B] (PATH ID: 2)
| | Projection: events_super
| | Materialize: events.deviceReceiptTime, events.id
| | Filter: ((events.deviceReceiptTime > 1553196600659.000000) AND (events.deviceReceiptTime < 1554563118659.000000))

Comments

  • Jim_KnicelyJim_Knicely Administrator, Moderator, Employee, Registered User, VerticaExpert

    Hi,

    The problem is your GROUP BY in the SELECT is different than the GROUP BY in the LAP.

    Example:

    dbadmin=> create schema investigation;
    CREATE SCHEMA
    
    dbadmin=> create table investigation.events ( id int, deviceReceiptTime number(25, 6));
    CREATE TABLE
    
    dbadmin=> insert into investigation.events select 1, 1553196600.659000;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> insert into investigation.events select 1, 1553196600.659000;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> insert into investigation.events select 1, 1553196600.659000;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> commit;
    COMMIT
    
    dbadmin=> create projection events_eps (id, deviceReceiptTime, count) as select id, deviceReceiptTime, count(id) from investigation.events group by id, deviceReceiptTime;
    WARNING 6852:  Live Aggregate Projection "events_eps" will be created for "events". Data in "events" will be neither updated nor deleted
    WARNING 4468:  Projection <investigation.events_eps> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
              The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
    CREATE PROJECTION
    dbadmin=> select refresh('investigation.events');
                                                                                                                                                    refresh                    
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Refresh completed with the following outcomes:
    Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
    ----------------------------------------------------------------------------------------
    "investigation"."events_eps": [events] [refreshed] [scratch] [0] [0]
    
    (1 row)
    
    dbadmin=> select analyze_statistics('investigation.events');
     analyze_statistics
    --------------------
                      0
    (1 row)
    

    The follow SQL statements will use the LAP:

    dbadmin=> explain select id, deviceReceiptTime, count(id) from investigation.events group by id, deviceReceiptTime;
    
     ------------------------------
     QUERY PLAN DESCRIPTION:
     ------------------------------
    
     explain select id, deviceReceiptTime, count(id) from investigation.events group by id, deviceReceiptTime;
    
     Access Path:
     +-GROUPBY PIPELINED [Cost: 2, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
     |  Aggregates: count(events_eps.count)
     |  Group By: events_eps.id, events_eps.deviceReceiptTime
     | +---> STORAGE ACCESS for investigation.events_eps (Rewritten LAP) [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
     | |      Projection: investigation.events_eps
     | |      Materialize: events_eps.id, events_eps.deviceReceiptTime, events_eps.count
     ------------------------------
    
    dbadmin=> explain select deviceReceiptTime, count(id) from investigation.events group by deviceReceiptTime;
     ------------------------------
     QUERY PLAN DESCRIPTION:
     ------------------------------
    
     explain select deviceReceiptTime, count(id) from investigation.events group by deviceReceiptTime;
    
     Access Path:
     +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 2, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
     |  Aggregates: count(events_eps.count)
     |  Group By: events_eps.deviceReceiptTime
     | +---> STORAGE ACCESS for investigation.events_eps (Rewritten LAP) [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
     | |      Projection: investigation.events_eps
     | |      Materialize: events_eps.deviceReceiptTime, events_eps.count
     ------------------------------
    

    But the next one will not as it has a calculation on a GROUP BY column:

    dbadmin=> explain select deviceReceiptTime / 3600000, count(id) from investigation.events group by deviceReceiptTime / 3600000;
     ------------------------------
     QUERY PLAN DESCRIPTION:
     ------------------------------
    
     explain select deviceReceiptTime / 3600000, count(id) from investigation.events group by deviceReceiptTime / 3600000;
    
     Access Path:
     +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 35, Rows: 1] (PATH ID: 1)
     |  Aggregates: count(events.id)
     |  Group By: (events.deviceReceiptTime / 3600000)
     | +---> STORAGE ACCESS for events [Cost: 34, Rows: 3] (PATH ID: 2)
     | |      Projection: investigation.events_super
     | |      Materialize: events.id, events.deviceReceiptTime
     ------------------------------
    

    You need to create a LAP with a matching GROUP BY:

    dbadmin=> create projection events_eps2 (id, deviceReceiptTime, count) as select id, deviceReceiptTime / 3600000 deviceReceiptTime, count(id) from investigation.events group by id, deviceReceiptTime / 3600000;
    WARNING 6852:  Live Aggregate Projection "events_eps2" will be created for "events". Data in "events" will be neither updated nor deleted
    WARNING 4468:  Projection <investigation.events_eps2> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
              The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
    CREATE PROJECTION
    
    dbadmin=> select refresh('investigation.events');
                                                                                                                                                     refresh                   
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Refresh completed with the following outcomes:
    Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
    ----------------------------------------------------------------------------------------
    "investigation"."events_eps2": [events] [refreshed] [scratch] [0] [0]
    
    (1 row)
    
    dbadmin=> select analyze_statistics('investigation.events');
     analyze_statistics
    --------------------
                      0
    (1 row)
    

    Now the query will use the new LAP:

    dbadmin=> explain select deviceReceiptTime / 3600000, count(id) from investigation.events group by deviceReceiptTime / 3600000;
     ------------------------------
     QUERY PLAN DESCRIPTION:
     ------------------------------
    
     explain select deviceReceiptTime / 3600000, count(id) from investigation.events group by deviceReceiptTime / 3600000;
    
     Access Path:
     +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 2, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
     |  Aggregates: count(events_eps2.count)
     |  Group By: events_eps2.deviceReceiptTime
     | +---> STORAGE ACCESS for investigation.events_eps2 (Rewritten LAP) [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
     | |      Projection: investigation.events_eps2
     | |      Materialize: events_eps2.deviceReceiptTime, events_eps2.count
     ------------------------------
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file