Minus operator
We have diferences when query a view and the source select. We tried even
select * from view
minus
select * from original_query
The result is not empty, even each time the query returns different number of rows.
We tried also select * from view MINUS select * from view and the result is not empty.
Is that the normal behaviour in Vertica?
Regards.
0
Comments
What is the SQL for the view?
If you have differences when you query a view and the source select, check the following:
1. Does the view and the source query tables are in the same schema?
2. Check your SCHEMA path:
SHOW SEARCH_PATH;
SELECT CURRENT_SCHEMA;
3. Just for troubleshooting, does the view owner, the source tables owner, and the user who run the query are different?
Or have any ACCESS POLICY definitions?
Try to query the view and afterwards the source SELECT from DB Super User, like dbadmin.
4. What is your Vertica version? SELECT version();
Do VER-77394 or VER-78130 is installed?
5. Like Jim asked, what is your view definition?
SELECT view_definition FROM views WHERE table_name ILIKE '%somthing%';
Or check specific view definition:
SELECT export_objects('','MY_SCHEMA.MY_VIEW');
Or if it is a system view, check your system view definition:
SELECT * from vs_system_views where view_name ilike '%partitions%';
Hi:
show search path:
"$user", public, v_catalog, v_monitor, v_internal, v_func
select current_schema:
public
select version:
Vertica Analytic Database v10.0.1-6
This is the view:
CREATE OR REPLACE VIEW dwh_ins_hub.dim_entity_cumulative_sum AS
SELECT entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate ),
sum(count (DISTINCT entity.entityReference))
OVER(
partition BY NULL
ORDER BY DATE(entity.creationDate ) ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum"
FROM (dwh_ins_hub.entity JOIN dwh_ins_hub.entity_last_version elv ON (((elv.entityReference = entity.entityReference) AND (elv.last_entityVersion = entity.entityVersion))))
group by entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate )
And if we do
select * from dwh_ins_hub.dim_entity_cumulative_sum
minus
select * from dwh_ins_hub.dim_entity_cumulative_sum
the result is not null nor always the same (the number of rows changes each time).
The same if we do:
SELECT entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate ),
sum(count (DISTINCT entity.entityReference))
OVER(
partition BY NULL
ORDER BY DATE(entity.creationDate ) ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum"
FROM (dwh_ins_hub.entity JOIN dwh_ins_hub.entity_last_version elv ON (((elv.entityReference = entity.entityReference) AND (elv.last_entityVersion = entity.entityVersion))))
group by entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate )
minus
SELECT entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate ),
sum(count (DISTINCT entity.entityReference))
OVER(
partition BY NULL
ORDER BY DATE(entity.creationDate ) ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum"
FROM (dwh_ins_hub.entity JOIN dwh_ins_hub.entity_last_version elv ON (((elv.entityReference = entity.entityReference) AND (elv.last_entityVersion = entity.entityVersion))))
group by entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate )
Regards and TIA
At this moment, I can only speculate.
to confirm - what you do is:
What I suspect (but cannot prove) is that the same full select (the one coming from the view) is fired two times, but at two different points in time, and some new rows are in one select that are not in the other.
Two diagnostic steps could help to shed some light on the matter:
a) add the pseudo column
epochfrom all original source tables, sodwh_ins_hub.entity.epochanddwh_ins_hub.entity_last_version.epochto the view, and get the max(epoch) from bothb) rewrite your query to:
WITH /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ dim_entity_cumulative_sum AS ( SELECT entity.tenant_id , entity.entitystatus , date(entity.creationdate) , sum(COUNT (DISTINCT entity.entityreference)) over( PARTITION BY NULL ORDER BY date(entity.creationdate) ASC ROWS UNBOUNDED preceding ) "Cumulative Sum" FROM dwh_ins_hub.entity JOIN dwh_ins_hub.entity_last_version elv ON elv.entityreference = entity.entityreference AND elv.last_entityversion = entity.entityversion GROUP BY entity.tenant_id , entity.entitystatus , date(entity.creationdate) ) SELECT * FROM dim_entity_cumulative_sum MINUS SELECT * FROM dim_entity_cumulative_sum ;Then, run it once with the
/*+ENABLE_WITH_CLAUSE_MATERIALIZATION */hint, and once without.The base tables does not (in theory) suffer changes. I even do CTAS to another names to fix the data and had the same problem.
If I run the query with the hint, the result is null (expected).
Regards.
I now reverse-engineered the query with @mosheg ' s help in providing test data:
Data preparation:
Then, I run the same query (one branch of your MINUS query, but rewritten in my stile for efficiency - and better readability for me, twice.
I had initially thought that
PARTITION BY NULLcould be the culprit, as NULL always risks to yield unpredictable results, and is not needed at all in this case, but then I realised that two tenants have the samecreationDate, and that's where your issue comes from.So, with the data filled as above, I run this query several times:
SELECT entity.tenant_id , entity.entitystatus , entity.creationdate , sum(COUNT (DISTINCT entity.entityreference)) OVER( PARTITION BY 1 ORDER BY date(entity.creationdate) ASC ROWS UNBOUNDED PRECEDING ) "Cumulative Sum" FROM dwh_ins_hub.entity JOIN dwh_ins_hub.entity_last_version elv ON elv.entityreference = entity.entityreference AND elv.last_entityversion = entity.entityversion GROUP BY entity.tenant_id , entity.entitystatus , entity.creationdate;It returns one of these two versions of the result table randomly:
Note that
creationDateis the same:2021-09-07for both tenant_id3and5. And as only theentity.creationdateis in the ORDER BY subclause of the OVER() windowing clause, the two rows of tenant_id3and5can indeed appear in any order they want. So the coder, here, is rather the victim than the master of what the query does. SUM() OVER() does exactly what it's supposed to do, only the ORDER criterion turns out to return a partially random order.Materialising the WITH clause would only hide the ambiguity of the result of
Cumulative Sum. I regard that as counter-productive.ORDER BYa unique criterion - for example,entity.tenant_idandentity.creationdate, and the problem will go away.Hi: thanks for the answer I will try it.