Minus operator
mguelfi
Community Edition User ✭
We have diferences when query a view and the source select. We tried even
select * from view
minus
select * from original_query
The result is not empty, even each time the query returns different number of rows.
We tried also select * from view MINUS select * from view and the result is not empty.
Is that the normal behaviour in Vertica?
Regards.
0
Comments
What is the SQL for the view?
If you have differences when you query a view and the source select, check the following:
1. Does the view and the source query tables are in the same schema?
2. Check your SCHEMA path:
SHOW SEARCH_PATH;
SELECT CURRENT_SCHEMA;
3. Just for troubleshooting, does the view owner, the source tables owner, and the user who run the query are different?
Or have any ACCESS POLICY definitions?
Try to query the view and afterwards the source SELECT from DB Super User, like dbadmin.
4. What is your Vertica version? SELECT version();
Do VER-77394 or VER-78130 is installed?
5. Like Jim asked, what is your view definition?
SELECT view_definition FROM views WHERE table_name ILIKE '%somthing%';
Or check specific view definition:
SELECT export_objects('','MY_SCHEMA.MY_VIEW');
Or if it is a system view, check your system view definition:
SELECT * from vs_system_views where view_name ilike '%partitions%';
Hi:
show search path:
"$user", public, v_catalog, v_monitor, v_internal, v_func
select current_schema:
public
select version:
Vertica Analytic Database v10.0.1-6
This is the view:
CREATE OR REPLACE VIEW dwh_ins_hub.dim_entity_cumulative_sum AS
SELECT entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate ),
sum(count (DISTINCT entity.entityReference))
OVER(
partition BY NULL
ORDER BY DATE(entity.creationDate ) ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum"
FROM (dwh_ins_hub.entity JOIN dwh_ins_hub.entity_last_version elv ON (((elv.entityReference = entity.entityReference) AND (elv.last_entityVersion = entity.entityVersion))))
group by entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate )
And if we do
select * from dwh_ins_hub.dim_entity_cumulative_sum
minus
select * from dwh_ins_hub.dim_entity_cumulative_sum
the result is not null nor always the same (the number of rows changes each time).
The same if we do:
SELECT entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate ),
sum(count (DISTINCT entity.entityReference))
OVER(
partition BY NULL
ORDER BY DATE(entity.creationDate ) ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum"
FROM (dwh_ins_hub.entity JOIN dwh_ins_hub.entity_last_version elv ON (((elv.entityReference = entity.entityReference) AND (elv.last_entityVersion = entity.entityVersion))))
group by entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate )
minus
SELECT entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate ),
sum(count (DISTINCT entity.entityReference))
OVER(
partition BY NULL
ORDER BY DATE(entity.creationDate ) ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum"
FROM (dwh_ins_hub.entity JOIN dwh_ins_hub.entity_last_version elv ON (((elv.entityReference = entity.entityReference) AND (elv.last_entityVersion = entity.entityVersion))))
group by entity.tenant_id,
entity.entityStatus,
DATE(entity.creationDate )
Regards and TIA
At this moment, I can only speculate.
to confirm - what you do is:
What I suspect (but cannot prove) is that the same full select (the one coming from the view) is fired two times, but at two different points in time, and some new rows are in one select that are not in the other.
Two diagnostic steps could help to shed some light on the matter:
a) add the pseudo column
epoch
from all original source tables, sodwh_ins_hub.entity.epoch
anddwh_ins_hub.entity_last_version.epoch
to the view, and get the max(epoch) from bothb) rewrite your query to:
Then, run it once with the
/*+ENABLE_WITH_CLAUSE_MATERIALIZATION */
hint, and once without.The base tables does not (in theory) suffer changes. I even do CTAS to another names to fix the data and had the same problem.
If I run the query with the hint, the result is null (expected).
Regards.
I now reverse-engineered the query with @mosheg ' s help in providing test data:
Data preparation:
Then, I run the same query (one branch of your MINUS query, but rewritten in my stile for efficiency - and better readability for me, twice.
I had initially thought that
PARTITION BY NULL
could be the culprit, as NULL always risks to yield unpredictable results, and is not needed at all in this case, but then I realised that two tenants have the samecreationDate
, and that's where your issue comes from.So, with the data filled as above, I run this query several times:
It returns one of these two versions of the result table randomly:
Note that
creationDate
is the same:2021-09-07
for both tenant_id3
and5
. And as only theentity.creationdate
is in the ORDER BY subclause of the OVER() windowing clause, the two rows of tenant_id3
and5
can indeed appear in any order they want. So the coder, here, is rather the victim than the master of what the query does. SUM() OVER() does exactly what it's supposed to do, only the ORDER criterion turns out to return a partially random order.Materialising the WITH clause would only hide the ambiguity of the result of
Cumulative Sum
. I regard that as counter-productive.ORDER BY
a unique criterion - for example,entity.tenant_id
andentity.creationdate
, and the problem will go away.Hi: thanks for the answer I will try it.