the default data sorting doesn't work on projection

I want to create a projection to sort the data with DESC mode, due to the projection (v7.2) doesn't support the DESC sorting. so I created a calculate field which based on the Dim_Reg_Date field. The logic is below:
datediff(day,Dim_Reg_Date,'9999-12-30') as sfRegDate

and then I create a new projection to apply sorting by the new calculated field:

create projection dMart.p_t_trade_info_us_regDate
(
Dim_Reg_Date ENCODING RLE
,sfRegDate ENCODING RLE
,DIM_CountryCode ENCODING RLE
)
AS
select
T_Trade_Info_US.dim_reg_date
,T_Trade_Info_US.sfRegDate
,T_Trade_Info_US.dim_countrycode
from dMart.T_Trade_Info_US
order by sfRegDate
UNSEGMENTED ALL NODES;
select refresh('dmart.t_trade_info_us');

after I done the preparation, I want to query the data with the following script:

select distinct Dim_Reg_Date,sfRegDate,DIM_CountryCode
from dMart.p_t_trade_info_us_regDate
limit 4;

the result still doesn't sorted. and the result data is randomly changed while I run the select query many times.

first time:
2016-02-29 2916035 US
2017-06-23 2915555 US
2017-06-04 2915574 US
2017-07-12 2915536 US

second time:
2016-03-14 2916021 US
2017-07-07 2915541 US
2017-06-18 2915560 US
2017-07-26 2915522 US

so I have to use the sort as below:
select distinct Dim_Reg_Date,sfRegDate,DIM_CountryCode
from dMart.p_t_trade_info_us_regDate
order by sfRegDate
limit 4;

the following result is correct.
2017-11-06 2915419 US
2017-11-05 2915420 US
2017-11-04 2915421 US
2017-11-03 2915422 US

(I need to waiting for 20+ sec while I appended order by clause, if not, just 1 second completed, so I won't to use the order by clause in my data query script.)
any one can help me on the issue?

thanks!

Comments

  • Jim_KnicelyJim_Knicely Administrator

    Run an EXPLAIN on the SQL to make sure your projection is being used. I bet the default SUPER projection is being used and it's probably segmented.

    EXPLAIN select distinct Dim_Reg_Date,sfRegDate,DIM_CountryCode
    from dMart.p_t_trade_info_us_regDate
    limit 4;

  • thanks, Jim
    Yes, I did it, the new created projection had been used, not the default super projection.

    As I mentioned, the result data doesn't be sorted, even i sorted the data in related projection already.

    I don't know why.

  • Jim_KnicelyJim_Knicely Administrator

    Hi,

    Take a look at the attachment. I create a simple example that I believe mimics what you are trying to do and it works okay for me. Am I doing something differently?

  • Hello, Jim
    Appreciate for your detail instruction.
    I follow your instruction to test it, but unfortunately, the result is against with yours.
    Before I applied the customized projection, the result data is same for each run, and after I applied the projection, the data sorting is different for each times.

    and most important, the default sorting which defined in projection doesn't work.

    for details, please reference the attachment.

    looking forward your great support!

  • Jim_KnicelyJim_Knicely Administrator
    edited November 2017

    Hmm. I did my test on a three node cluster. It works there. But when I do the same test on a single node, as you did, I see the same as you. That is, the projection we created returns a different result set for each run (because of the DISTINCT clause).

    So, to work around that, try this SQL:

    with /*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/ test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;

    Works for me on my single node :)

    Not sure how well the query above will work with your data set.

    I tried it on the TEST table we are using with 10,354,984 records:

    dbadmin=> select count(*) from test;
      count
    ----------
     10354984
    (1 row)
    
    dbadmin=> \timing
    Timing is on.
    dbadmin=> with /*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/ test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;
     c1 | c2
    ----+----
      0 |  0
      0 |  1
      0 |  2
      0 |  3
    (4 rows)
    
    Time: First fetch (4 rows): 2571.909 ms. All rows formatted: 2571.995 ms
    
  • Hi, Jim

    I have tried the new statement, but still doesn't work.
    and I got one warning, not sure whether that impacts my data result?

    OneDB=> with /+ENABLE_WITH_CLAUSE_MATERIALIZATION/ test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;
    WARNING 4486: Projections are always created and persisted in the default Vertica locale. The current locale is LEN_S1
    c1 | c2
    ----+----
    0 | 3
    1 | 2
    1 | 3
    1 | 5
    (4 rows)

    Time: First fetch (4 rows): 33.529 ms. All rows formatted: 33.579 ms

  • my main concern is the lower performance while I applied the sort by clause.

    I have one table contains more than 21,000,000 records. and the result data need to be sorted by registered date.

    If just query the data without order by clause, 547 ms only, after i applied order by, more than 17s. so that will impact the user experience extremely.

    So i want to set up the default sorting in projection, and just query the data from projection without order by clause.

  • Jim_KnicelyJim_Knicely Administrator
    edited November 2017

    Fyi ... the SQL command you used was incorrect:

    Use:

    /*+ ENABLE_WITH_CLAUSE_MATERIALIZATION */

    Not:

    /+ENABLE_WITH_CLAUSE_MATERIALIZATION/

    You need this as "When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires. Vertica drops the temporary table after primary query execution completes."

    See:
    https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AnalyzingData/Queries/Subqueries/WithClauseMaterialization.htm

  • Jim_KnicelyJim_Knicely Administrator

    Here is a test against 21,000,000 records on a single node VM (Very weak machine):

    dbadmin=> select count(*) from test;
      count
    ----------
     21000000
    (1 row)
    
    dbadmin=> \timing
    Timing is on.
    
    dbadmin=> with /*+ ENABLE_WITH_CLAUSE_MATERIALIZATION */ test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;
     c1 | c2
    ----+----
      0 |  0
      0 |  1
      0 |  2
      0 |  3
    (4 rows)
    
    Time: First fetch (4 rows): 4616.003 ms. All rows formatted: 4616.050 ms
    
  • Hi, Jim
    Our running script is correct while I used it, just post the incorrect one.

    but i still got the incorrect sorting after I try your script.

    how about the warning message? does the locale impact the default sorting?
    WARNING 4486: Projections are always created and persisted in the default Vertica locale. The current locale is LEN_S1

    or my VERTICA need to be upgraded?

  • Jim_KnicelyJim_Knicely Administrator
    edited November 2017

    Hi,

    The warning just tells you that you are using the locale LEN_S1 at the session levels which is different than the default. IMHO using LEN_S1 is a bad idea in Vertica as un-optimized query plans can result. However, in our test we are dealing with integers so it shouldn't matter.

    But you can try to set the session locale to the default. In vsql run:

    \locale en_US@collation=binary;

    I tried LEN_S1 and it had no effect on my end.

    Anyway, did you try running the statement...

    with /+ENABLE_WITH_CLAUSE_MATERIALIZATION/ test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;

    ... a few times? Do you get different results each time?

    If so, can you post the explain plan?

    explain with /+ENABLE_WITH_CLAUSE_MATERIALIZATION/ test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;

    We need to make sure the WITH clause materialization is happening.

    If not, try running this:

    ALTER SESSION SET PARAMETER EnableWithClauseMaterialization=1;

    Then run the query below a few times...

    with test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;

    Thanks!

  • Hi, Jim

    that's pretty good to change the locale. thanks for your idea.

    for our original question, the default sorting doesn't work still now.

    by the way, i have inserted one new record, strange, i can't find it by the query what you provided.

  • Jim_KnicelyJim_Knicely Administrator
    edited November 2017

    Hi,

    I think what is messing things up is the DISTINCT key word. It seems to work otherwise. If you need distinct values, I believe the only way to get the order by to work is to use an ORDER BY. We can tune the underlying projection for a GROUP BY.

    Take a peek at the attached example. of how I did that. Querying 33,554,432 records returns the correct data from our sample query in < 1 second.

    So if you follow the example, for your actual table, the tuned projection might look like:

    create projection dMart.p_t_trade_info_us_regDate
    (
    Dim_Reg_Date ENCODING RLE
    ,sfRegDate
    ,DIM_CountryCode
    )
    AS
    select
    T_Trade_Info_US.dim_reg_date
    ,T_Trade_Info_US.sfRegDate
    ,T_Trade_Info_US.dim_countrycode
    from dMart.T_Trade_Info_US
    order by
    T_Trade_Info_US.dim_reg_date
    ,T_Trade_Info_US.dim_countrycode
    UNSEGMENTED ALL NODES;
    

    And the SQL would be:

    select dim_reg_date, dim_countrycode
    from (
    select T_Trade_Info_US.dim_reg_date, T_Trade_Info_US.dim_countrycode, max(T_Trade_Info_US.sfRegDate) sfRegDate
    from dMart.T_Trade_Info_US
    group by T_Trade_Info_US.dim_reg_date, T_Trade_Info_US.dim_countrycode
    order by sfRegDate) foo
    limit 4;
    limit 4;
    dMart.T_Trade_Info_US;
    

    To get the best projection design, run the above query through database designer!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file