the default data sorting doesn't work on projection
I want to create a projection to sort the data with DESC mode, due to the projection (v7.2) doesn't support the DESC sorting. so I created a calculate field which based on the Dim_Reg_Date field. The logic is below:
datediff(day,Dim_Reg_Date,'9999-12-30') as sfRegDate
and then I create a new projection to apply sorting by the new calculated field:
create projection dMart.p_t_trade_info_us_regDate
(
Dim_Reg_Date ENCODING RLE
,sfRegDate ENCODING RLE
,DIM_CountryCode ENCODING RLE
)
AS
select
T_Trade_Info_US.dim_reg_date
,T_Trade_Info_US.sfRegDate
,T_Trade_Info_US.dim_countrycode
from dMart.T_Trade_Info_US
order by sfRegDate
UNSEGMENTED ALL NODES;
select refresh('dmart.t_trade_info_us');
after I done the preparation, I want to query the data with the following script:
select distinct Dim_Reg_Date,sfRegDate,DIM_CountryCode
from dMart.p_t_trade_info_us_regDate
limit 4;
the result still doesn't sorted. and the result data is randomly changed while I run the select query many times.
first time:
2016-02-29 2916035 US
2017-06-23 2915555 US
2017-06-04 2915574 US
2017-07-12 2915536 US
second time:
2016-03-14 2916021 US
2017-07-07 2915541 US
2017-06-18 2915560 US
2017-07-26 2915522 US
so I have to use the sort as below:
select distinct Dim_Reg_Date,sfRegDate,DIM_CountryCode
from dMart.p_t_trade_info_us_regDate
order by sfRegDate
limit 4;
the following result is correct.
2017-11-06 2915419 US
2017-11-05 2915420 US
2017-11-04 2915421 US
2017-11-03 2915422 US
(I need to waiting for 20+ sec while I appended order by clause, if not, just 1 second completed, so I won't to use the order by clause in my data query script.)
any one can help me on the issue?
thanks!
Comments
Run an EXPLAIN on the SQL to make sure your projection is being used. I bet the default SUPER projection is being used and it's probably segmented.
EXPLAIN select distinct Dim_Reg_Date,sfRegDate,DIM_CountryCode
from dMart.p_t_trade_info_us_regDate
limit 4;
thanks, Jim
Yes, I did it, the new created projection had been used, not the default super projection.
As I mentioned, the result data doesn't be sorted, even i sorted the data in related projection already.
I don't know why.
Hi,
Take a look at the attachment. I create a simple example that I believe mimics what you are trying to do and it works okay for me. Am I doing something differently?
Hello, Jim
Appreciate for your detail instruction.
I follow your instruction to test it, but unfortunately, the result is against with yours.
Before I applied the customized projection, the result data is same for each run, and after I applied the projection, the data sorting is different for each times.
and most important, the default sorting which defined in projection doesn't work.
for details, please reference the attachment.
looking forward your great support!
Hmm. I did my test on a three node cluster. It works there. But when I do the same test on a single node, as you did, I see the same as you. That is, the projection we created returns a different result set for each run (because of the DISTINCT clause).
So, to work around that, try this SQL:
with /*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/ test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;
Works for me on my single node
Not sure how well the query above will work with your data set.
I tried it on the TEST table we are using with 10,354,984 records:
Hi, Jim
I have tried the new statement, but still doesn't work.
and I got one warning, not sure whether that impacts my data result?
OneDB=> with /+ENABLE_WITH_CLAUSE_MATERIALIZATION/ test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;
WARNING 4486: Projections are always created and persisted in the default Vertica locale. The current locale is LEN_S1
c1 | c2
----+----
0 | 3
1 | 2
1 | 3
1 | 5
(4 rows)
Time: First fetch (4 rows): 33.529 ms. All rows formatted: 33.579 ms
my main concern is the lower performance while I applied the sort by clause.
I have one table contains more than 21,000,000 records. and the result data need to be sorted by registered date.
If just query the data without order by clause, 547 ms only, after i applied order by, more than 17s. so that will impact the user experience extremely.
So i want to set up the default sorting in projection, and just query the data from projection without order by clause.
Fyi ... the SQL command you used was incorrect:
Use:
/*+ ENABLE_WITH_CLAUSE_MATERIALIZATION */
Not:
/+ENABLE_WITH_CLAUSE_MATERIALIZATION/
You need this as "When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires. Vertica drops the temporary table after primary query execution completes."
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AnalyzingData/Queries/Subqueries/WithClauseMaterialization.htm
Here is a test against 21,000,000 records on a single node VM (Very weak machine):
Hi, Jim
Our running script is correct while I used it, just post the incorrect one.
but i still got the incorrect sorting after I try your script.
how about the warning message? does the locale impact the default sorting?
WARNING 4486: Projections are always created and persisted in the default Vertica locale. The current locale is LEN_S1
or my VERTICA need to be upgraded?
Hi,
The warning just tells you that you are using the locale LEN_S1 at the session levels which is different than the default. IMHO using LEN_S1 is a bad idea in Vertica as un-optimized query plans can result. However, in our test we are dealing with integers so it shouldn't matter.
But you can try to set the session locale to the default. In vsql run:
\locale en_US@collation=binary;
I tried LEN_S1 and it had no effect on my end.
Anyway, did you try running the statement...
with /+ENABLE_WITH_CLAUSE_MATERIALIZATION/ test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;
... a few times? Do you get different results each time?
If so, can you post the explain plan?
explain with /+ENABLE_WITH_CLAUSE_MATERIALIZATION/ test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;
We need to make sure the WITH clause materialization is happening.
If not, try running this:
ALTER SESSION SET PARAMETER EnableWithClauseMaterialization=1;
Then run the query below a few times...
with test_q as (select c1, c2 from test /*+ PROJS('public.test_pr') */) select distinct c1, c2 from test_q limit 4;
Thanks!
Hi, Jim
that's pretty good to change the locale. thanks for your idea.
for our original question, the default sorting doesn't work still now.
by the way, i have inserted one new record, strange, i can't find it by the query what you provided.
Hi,
I think what is messing things up is the DISTINCT key word. It seems to work otherwise. If you need distinct values, I believe the only way to get the order by to work is to use an ORDER BY. We can tune the underlying projection for a GROUP BY.
Take a peek at the attached example. of how I did that. Querying 33,554,432 records returns the correct data from our sample query in < 1 second.
So if you follow the example, for your actual table, the tuned projection might look like:
And the SQL would be:
To get the best projection design, run the above query through database designer!