Optimize projection for delete
Hi, I have a (big) table (23B rows right now) which is meant to store only 3 months of data, without duplicates. We thus run 2 delete statements daily or more: - delete where date < 90 days - delete duplicates I am trying to optimise those steps which are taking way too long. Info: Vertica 6.1.0, 5 nodes, k-safe=2 Table (with only the relevant columns displayed to limit the size of the text):
CREATE TABLE analytics.set_pageview ( visitorId int, sessionId int, pageInSession int, pageType varchar(32), siteId int, channelId int, pageviewDate date, [14 extra rows] ); CREATE PROJECTION analytics.set_pageview_date /*+createtype(L)*/ ( visitorId, sessionId, pageInSession, pageType, siteId ENCODING RLE, channelId ENCODING RLE pageviewDate ENCODING RLE, [14 extra columns] ) AS SELECT set_pageview.visitorId, set_pageview.sessionId, set_pageview.pageInSession, set_pageview.pageType, set_pageview.siteId, set_pageview.channelId, set_pageview.pageviewDate, [14 extra columns] FROM analytics.set_pageview ORDER BY set_pageview.pageviewDate, set_pageview.visitorId, set_pageview.sessionId, set_pageview.pageInSession, set_pageview.siteId, set_pageview.channelId, set_pageview.pageType SEGMENTED BY hash(set_pageview.pageviewDate, set_pageview.channelId, set_pageview.siteId, set_pageview.pageType, set_pageview.pageInSession, set_pageview.visitorId, set_pageview.sessionId) ALL NODES KSAFE 2;As you can see, I forced the RLE on a few columns which I know are low cardinality (less than 1000 rows, 90 only for page date). Then the 2 statements I am running are:
explain delete from analytics.set_pageview WHERE pageviewDate < CURRENT_DATE() - 90; Access Path: +-DML DELETE [Cost: 0, Rows: 0] | Target Projection: analytics.set_pageview_gui_date_b2 | Target Projection: analytics.set_pageview_gui_date_b1 | Target Projection: analytics.set_pageview_gui_date_b0 | Target Prep: | Execute on: All Nodes | +---> STORAGE ACCESS for <NO alias=""> [Cost: 8M, Rows: 921M (NO STATISTICS)] (PATH ID: 1) | | Projection: analytics.set_pageview_gui_date_b1 | | Materialize: set_pageview_gui.pageviewDate, set_pageview_gui.visitorId, set_pageview_gui.sessionId, set_pageview_gui.pageInSession, set_pageview_gui.siteId, set_pageview_gui.channelId, set_pageview_gui.pageType, set_pageview_gui.globalUserId, set_pageview_gui.portalVersion, set_pageview_gui.pageId, set_pageview_gui.pageTypeDetail, set_pageview_gui.pageviewDts, set_pageview_gui.referrerTag, set_pageview_gui.campaignMedium, set_pageview_gui.campaignSource, set_pageview_gui.campaignName, set_pageview_gui.campaignElement, set_pageview_gui.adwordsAdgroupId, set_pageview_gui.adwordsKeyword, set_pageview_gui.isAuthorized, set_pageview_gui.searchTerm, set_pageview_gui.timeOnPage, set_pageview_gui.epoch | | Filter: (set_pageview_gui.pageviewDate < (('now()')::date - 90)) | | Execute on: All Nodes </NO>Note that I just ran analyze_statistics, so I am not sure what the 'no statistics' means. Q: how can I have the filter applied before the materialisation? I though the projection was perfectly optimised for this case. This is the second delete statement:
explain delete from analytics.set_pageview_gui where (visitorId , sessionId, pageInSession, pageviewDts) in ( select visitorId, sessionId, pageInSession, min(pageviewDts) from analytics.set_pageview_gui group by visitorId, sessionId, pageInSession having count(*) > 1 ); Access Path: +-DML DELETE [Cost: 0, Rows: 0] | Target Projection: analytics.set_pageview_gui_date_b2 | Target Projection: analytics.set_pageview_gui_date_b1 | Target Projection: analytics.set_pageview_gui_date_b0 | Target Prep: | Execute on: All Nodes | +---> JOIN HASH [Semi] [Cost: 262M, Rows: 12B] (PATH ID: 1) Inner (BROADCAST) | | Join Cond: (set_pageview_gui.visitorId = VAL(2)) AND (set_pageview_gui.sessionId = VAL(2)) AND (set_pageview_gui.pageInSession = VAL(2)) AND (set_pageview_gui.pageviewDts = VAL(2)) | | Materialize at Output: set_pageview_gui.globalUserId, set_pageview_gui.portalVersion, set_pageview_gui.pageId, set_pageview_gui.pageType, set_pageview_gui.pageTypeDetail, set_pageview_gui.referrerTag, set_pageview_gui.campaignMedium, set_pageview_gui.campaignSource, set_pageview_gui.campaignName, set_pageview_gui.campaignElement, set_pageview_gui.adwordsAdgroupId, set_pageview_gui.adwordsKeyword, set_pageview_gui.isAuthorized, set_pageview_gui.searchTerm, set_pageview_gui.siteId, set_pageview_gui.channelId, set_pageview_gui.timeOnPage, set_pageview_gui.pageviewDate, set_pageview_gui.epoch | | Execute on: All Nodes | | +-- Outer -> STORAGE ACCESS for <NO alias=""> [Cost: 43M, Rows: 23B] (PATH ID: 2) | | | Projection: analytics.set_pageview_gui_date_b2 | | | Materialize: set_pageview_gui.visitorId, set_pageview_gui.sessionId, set_pageview_gui.pageInSession, set_pageview_gui.pageviewDts | | | Execute on: All Nodes | | | Runtime Filters: (SIP1(HashJoin): set_pageview_gui.visitorId), (SIP2(HashJoin): set_pageview_gui.sessionId), (SIP3(HashJoin): set_pageview_gui.pageInSession), (SIP4(HashJoin): set_pageview_gui.pageviewDts), (SIP5(HashJoin): set_pageview_gui.visitorId, set_pageview_gui.sessionId, set_pageview_gui.pageInSession, set_pageview_gui.pageviewDts) | | +-- Inner -> SELECT [Cost: 146M, Rows: 53M] (PATH ID: 3) | | | Execute on: All Nodes | | | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 146M, Rows: 53M] (PATH ID: 4) | | | | Aggregates: count(*), min(set_pageview_gui.pageviewDts) | | | | Group By: set_pageview_gui.visitorId, set_pageview_gui.sessionId, set_pageview_gui.pageInSession | | | | Filter: (<SVAR> > 1) | | | | Execute on: All Nodes | | | | +---> STORAGE ACCESS for set_pageview_gui [Cost: 43M, Rows: 23B] (PATH ID: 5) | | | | | Projection: analytics.set_pageview_gui_date_b1 | Target Projection: analytics.set_pageview_gui_date_b2 | Target Projection: analytics.set_pageview_gui_date_b1 | Target Projection: analytics.set_pageview_gui_date_b0 | Target Prep: | Execute on: All Nodes | +---> JOIN HASH [Semi] [Cost: 262M, Rows: 12B] (PATH ID: 1) Inner (BROADCAST) | | Join Cond: (set_pageview_gui.visitorId = VAL(2)) AND (set_pageview_gui.sessionId = VAL(2)) AND (set_pageview_gui.pageInSession = VAL(2)) AND (set_pageview_gui.pageviewDts = VAL(2)) | | Materialize at Output: set_pageview_gui.globalUserId, set_pageview_gui.portalVersion, set_pageview_gui.pageId, set_pageview_gui.pageType, set_pageview_gui.pageTypeDetail, set_pageview_gui.referrerTag, set_pageview_gui.campaign Medium, set_pageview_gui.campaignSource, set_pageview_gui.campaignName, set_pageview_gui.campaignElement, set_pageview_gui.adwordsAdgroupId, set_pageview_gui.adwordsKeyword, set_pageview_gui.isAuthorized, set_pageview_gui.searchTerm, se t_pageview_gui.siteId, set_pageview_gui.channelId, set_pageview_gui.timeOnPage, set_pageview_gui.pageviewDate, set_pageview_gui.epoch | | Execute on: All Nodes | | +-- Outer -> STORAGE ACCESS for <NO alias=""> [Cost: 43M, Rows: 23B] (PATH ID: 2) | | | Projection: analytics.set_pageview_gui_date_b2 | | | Materialize: set_pageview_gui.visitorId, set_pageview_gui.sessionId, set_pageview_gui.pageInSession, set_pageview_gui.pageviewDts | | | Execute on: All Nodes | | | Runtime Filters: (SIP1(HashJoin): set_pageview_gui.visitorId), (SIP2(HashJoin): set_pageview_gui.sessionId), (SIP3(HashJoin): set_pageview_gui.pageInSession), (SIP4(HashJoin): set_pageview_gui.pageviewDts), (SIP5(HashJoin): set_pageview_gui.visitorId, set_pageview_gui.sessionId, set_pageview_gui.pageInSession, set_pageview_gui.pageviewDts) | | +-- Inner -> SELECT [Cost: 146M, Rows: 53M] (PATH ID: 3) | | | Execute on: All Nodes | | | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 146M, Rows: 53M] (PATH ID: 4) | | | | Aggregates: count(*), min(set_pageview_gui.pageviewDts) | | | | Group By: set_pageview_gui.visitorId, set_pageview_gui.sessionId, set_pageview_gui.pageInSession | | | | Filter: (<SVAR> > 1) | | | | Execute on: All Nodes | | | | +---> STORAGE ACCESS for set_pageview_gui [Cost: 43M, Rows: 23B] (PATH ID: 5) | | | | | Projection: analytics.set_pageview_gui_date_b1 | | | | | Materialize: set_pageview_gui.visitorId, set_pageview_gui.sessionId, set_pageview_gui.pageInSession, set_pageview_gui.pageviewDts | | | | | Execute on: All Nodes </SVAR></NO></SVAR></NO>I wonder how to optimise the deletes for those 2 statements together. Thanks for any help,
0
Comments