Prejoin projection is deprecated > Flattened Tables

I was asking in a other discussion but since there is no discussion focused on Prejoin Projection and the replacement for this since v8.1 Flattened Tables, I created this one.

Basically, what is the best practice to improve the performance of a query with joins in the V9.X?

Before V8.1, we would create a projection (for example: CREATE PROJECTION prj_fact_dimensions as select fact.f1, fact.f2, ..., fact.f50, dimension1.d1, dimension1.d2, ...., dimension1.d50, dimension2.di1, dimension2.di2, ..., dimension2.di50 from fact, dimension1, dimension2 where fact.id_di1 = dimension1.id_di1 and fact.id_di2 = dimension2.id_di2 order by f1), now this would result in a error: "Prejoin projection is deprecated".

With this example what is the best solution on V9.1?

Comments

  • thiago_rodrigothiago_rodrigo Registered User

    Bellow is the post I made on other discussion, with my problem with more details:

    Hello, Ben_Vandiver! I'm having a similar problem here. I have a fact table with millions of registers with some dimensions. And I need to improve a performance of a query, initially I was trying to create a projection and end up with the error "Prejoin projection is deprecated", then I looked up more information about flattened tables that you suggested as a solution.
    However, I have some doubts about this:

    1- I was planning on creating a flattened table that used the fact table and the dimensions, but do not alter them, and them be able to create other flattened tables to aggregate data from fact and dimension tables.

    2- Is there a way to create a flattened table totally based on the query I was using to create a projection? I was expecting something like create table fact_wide set using [query], is there a way to do it?

    3- I only found 2 suggestions of use of flattened tables, creating a table based on other (like: CREATE TABLE fact_wide as select a, b,c from fact; and later add dimension columns one by one with a query related), or altering the fact table adding the dimension columns. Then I come to questions, if I create a flattened table like fact_wide, how the fact data can be refreshed on fact_wide?

    4- Also if I alter the fact table (which I'm not sure is a good practice), do I really have to add a query column by column even if its the same dimension table (in my case there are a lot of columns in each dimension)

  • Jim_KnicelyJim_Knicely Administrator, Moderator, Employee, Registered User, VerticaExpert
    edited October 19

    Hi,

    You do need to add a "set using" or "default" query for each dimension table you want in the flattened table. God news, you should be able to generate some ALTER TABLE commands to add the columns to the fact table for you:

    Simple Example:

    dbadmin=> \d dim1;
                                        List of Fields by Tables
     Schema | Table | Column |     Type     | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-------+--------+--------------+------+---------+----------+-------------+-------------
     public | dim1  | c1     | int          |    8 |         | t        | t           |
     public | dim1  | c2     | varchar(100) |  100 |         | f        | f           |
     public | dim1  | c3     | varchar(100) |  100 |         | f        | f           |
     public | dim1  | c4     | varchar(100) |  100 |         | f        | f           |
    (4 rows)
    
    dbadmin=> \d fact;
                                      List of Fields by Tables
     Schema | Table | Column | Type | Size | Default | Not Null | Primary Key |   Foreign Key
    --------+-------+--------+------+------+---------+----------+-------------+-----------------
     public | fact  | pk     | int  |    8 |         | t        | f           |
     public | fact  | c1     | int  |    8 |         | t        | f           | public.dim1(c1)
    (2 rows)
    
    dbadmin=> SELECT * FROM dim1;
     c1 |  c2   |  c3   |  c4
    ----+-------+-------+-------
      1 | TEST1 | TEST2 | TEST3
    (1 row)
    
    dbadmin=> SELECT * FROM fact;
     pk | c1
    ----+----
      1 |  1
    (1 row)
    
    dbadmin=> SELECT pk, c2, c3, c4 FROM fact JOIN dim1 USING (c1);
     pk |  c2   |  c3   |  c4
    ----+-------+-------+-------
      1 | TEST1 | TEST2 | TEST3
    (1 row)
    
    dbadmin=> CREATE PROJECTION fact_dim1_pr AS SELECT pk, c2, c3, c4 FROM fact JOIN dim1 USING (c1);
    ROLLBACK 8678:  Prejoin projection is deprecated
    
    dbadmin=> SELECT 'ALTER TABLE ' || cc.table_schema || '.' || cc.table_name || ' ADD COLUMN ' || c.column_name || ' ' || data_type ||
    dbadmin->        ' SET USING (SELECT ' || c.table_schema || '.' || c.table_name || '.' || c.column_name || ' FROM ' || c.table_schema || '.' || c.table_name ||
    dbadmin->        ' WHERE ' || cc.table_schema || '.' || cc.table_name || '.' || cc.column_name || ' = ' || c.table_schema || '.' || c.table_name || '.' || cc.column_name || ');'
    dbadmin->   FROM constraint_columns cc
    dbadmin->   JOIN columns c
    dbadmin->     ON c.table_id = cc.reference_table_id
    dbadmin->    AND c.column_name <> cc.column_name
    dbadmin->  WHERE cc.table_schema = 'public' AND cc.table_name = 'fact'
    dbadmin->    AND cc.constraint_type = 'f';
                                                                       ?column?
    ----------------------------------------------------------------------------------------------------------------------------------------------
     ALTER TABLE public.fact ADD COLUMN c2 varchar(100) SET USING (SELECT public.dim1.c2 FROM public.dim1 WHERE public.fact.c1 = public.dim1.c1);
     ALTER TABLE public.fact ADD COLUMN c3 varchar(100) SET USING (SELECT public.dim1.c3 FROM public.dim1 WHERE public.fact.c1 = public.dim1.c1);
     ALTER TABLE public.fact ADD COLUMN c4 varchar(100) SET USING (SELECT public.dim1.c4 FROM public.dim1 WHERE public.fact.c1 = public.dim1.c1);
    (3 rows)
    
    dbadmin=> ALTER TABLE public.fact ADD COLUMN c2 varchar(100) SET USING (SELECT public.dim1.c2 FROM public.dim1 WHERE public.fact.c1 = public.dim1.c1);
    ALTER TABLE
    
    dbadmin=> ALTER TABLE public.fact ADD COLUMN c3 varchar(100) SET USING (SELECT public.dim1.c3 FROM public.dim1 WHERE public.fact.c1 = public.dim1.c1);
    ALTER TABLE
    
    dbadmin=> ALTER TABLE public.fact ADD COLUMN c4 varchar(100) SET USING (SELECT public.dim1.c4 FROM public.dim1 WHERE public.fact.c1 = public.dim1.c1);
    ALTER TABLE
    
    dbadmin=> SELECT refresh_columns('public.fact', '', 'REBUILD');
          refresh_columns
    ---------------------------
     refresh_columns completed
    (1 row)
    
    dbadmin=> SELECT * FROM fact;
     pk | c1 |  c2   |  c3   |  c4
    ----+----+-------+-------+-------
      1 |  1 | TEST1 | TEST2 | TEST3
    (1 row)
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file