The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.
Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!
How to use live aggregated projection ?
I have situation where I need to use aggregated query within a sub query.
select
x.col1,
x.col2,
x.col3,
b.col5,
b.col6
(
select
a.col1,
sum(a.col2) as col2,
sum(a.col3) as col3,
sum(nvl(col3,0)) as col4
from
table1 a
where
col4 between to_date('09/01/2016', 'MM/dd/yyyy') AND to_date('08/31/2018', 'MM/dd/yyyy')
group by
col1
) x
join table2 b on x.col1 = b.col1
Here in this query, inner sub-query is taking to long time. So I try to use live aggregated projection. After creating a projection like:
CREATE PROJECTION table1_lap AS
SELECT
col1
SUM(col2) col2,
SUM(col3) col3
FROM table1
GROUP BY col1;
But execution time did not reduce. Please tell me the solution.
Thank you....
Bimod
0
Answers
Hi bdpanta,
I think you are close. The LAP projection needs to specify columns to store the selected column data in. So your create would be more like:
CREATE PROJECTION table1_lap (col1, col2, col3) AS
SELECT
col1
SUM(col2) col2,
SUM(col3) col3
FROM table1
GROUP BY col1;
If it syntactically qualifies as a LAP definition you should get a couple WARNING messages similar to below:
WARNING 6852: Live Aggregate Projection "table1_lap" will be created for "table1". Data in "table1" will be neither updated nor deleted
WARNING 4468: Projection is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
Once refreshed the LAP should get used if the optimizer determines it satisfies the query. Here's a very simple example.
dbadmin=> create table table1 (col1 int, col2 int, col3 int);
CREATE TABLE
dbadmin=> insert /+direct/ into table1 values('1','1','1');
OUTPUT
dbadmin=> insert /+direct/ into table1 values('2','2','2');
OUTPUT
dbadmin=> commit;
COMMIT
note the projection used is the default super projection
dbadmin=> explain select col2,col3 from table1;
QUERY PLAN
QUERY PLAN DESCRIPTION:
explain select col2,col3 from table1;
Access Path:
+-STORAGE ACCESS for table1 [Cost: 18, Rows: 2 (NO STATISTICS)] (PATH ID: 1)
| Projection: public.table1_super
| Materialize: table1.col2, table1.col3
create the LAP projection
dbadmin=> create projection table1_lap (col2, col3) as select col2,col3 from table1;
WARNING 4468: Projection <public.table1_lap> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
refresh
dbadmin=> select refresh();
refresh
---------------------------------------------------------------------------------------------------------------- Refresh completed with the following outcomes:
Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
"public"."table1_lap": [table1] [refreshed] [scratch] [0] [0]
rerun the explain select col2,col3 and notice it now chose the LAP projection
also notice the Cost: is 10 vs 18 with the super projection, indicating it should be faster
dbadmin=> explain select col2,col3 from table1;
QUERY PLAN
QUERY PLAN DESCRIPTION:
explain select col2,col3 from table1;
Access Path:
+-STORAGE ACCESS for table1 [Cost: 10, Rows: 2 (NO STATISTICS)] (PATH ID: 1)
| Projection: public.table1_lap
| Materialize: table1.col2, table1.col3
I hope it helps,