Minus operator

mguelfimguelfi Community Edition User

We have diferences when query a view and the source select. We tried even
select * from view
minus
select * from original_query

The result is not empty, even each time the query returns different number of rows.

We tried also select * from view MINUS select * from view and the result is not empty.

Is that the normal behaviour in Vertica?

Regards.

Comments

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    What is the SQL for the view?

  • moshegmosheg Vertica Employee Administrator
    edited September 2021

    If you have differences when you query a view and the source select, check the following:
    1. Does the view and the source query tables are in the same schema?
    2. Check your SCHEMA path:
    SHOW SEARCH_PATH;
    SELECT CURRENT_SCHEMA;
    3. Just for troubleshooting, does the view owner, the source tables owner, and the user who run the query are different?
    Or have any ACCESS POLICY definitions?
    Try to query the view and afterwards the source SELECT from DB Super User, like dbadmin.
    4. What is your Vertica version? SELECT version();
    Do VER-77394 or VER-78130 is installed?
    5. Like Jim asked, what is your view definition?
    SELECT view_definition FROM views WHERE table_name ILIKE '%somthing%';
    Or check specific view definition:
    SELECT export_objects('','MY_SCHEMA.MY_VIEW');
    Or if it is a system view, check your system view definition:
    SELECT * from vs_system_views where view_name ilike '%partitions%';

  • mguelfimguelfi Community Edition User

    Hi:

    show search path:
    "$user", public, v_catalog, v_monitor, v_internal, v_func

    select current_schema:
    public

    select version:
    Vertica Analytic Database v10.0.1-6

    This is the view:
    CREATE OR REPLACE VIEW dwh_ins_hub.dim_entity_cumulative_sum AS
    SELECT entity.tenant_id,
    entity.entityStatus,
    DATE(entity.creationDate ),
    sum(count (DISTINCT entity.entityReference))
    OVER(
    partition BY NULL
    ORDER BY DATE(entity.creationDate ) ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum"
    FROM (dwh_ins_hub.entity JOIN dwh_ins_hub.entity_last_version elv ON (((elv.entityReference = entity.entityReference) AND (elv.last_entityVersion = entity.entityVersion))))
    group by entity.tenant_id,
    entity.entityStatus,
    DATE(entity.creationDate )

    And if we do

    select * from dwh_ins_hub.dim_entity_cumulative_sum
    minus
    select * from dwh_ins_hub.dim_entity_cumulative_sum

    the result is not null nor always the same (the number of rows changes each time).

    The same if we do:
    SELECT entity.tenant_id,
    entity.entityStatus,
    DATE(entity.creationDate ),
    sum(count (DISTINCT entity.entityReference))
    OVER(
    partition BY NULL
    ORDER BY DATE(entity.creationDate ) ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum"
    FROM (dwh_ins_hub.entity JOIN dwh_ins_hub.entity_last_version elv ON (((elv.entityReference = entity.entityReference) AND (elv.last_entityVersion = entity.entityVersion))))
    group by entity.tenant_id,
    entity.entityStatus,
    DATE(entity.creationDate )
    minus
    SELECT entity.tenant_id,
    entity.entityStatus,
    DATE(entity.creationDate ),
    sum(count (DISTINCT entity.entityReference))
    OVER(
    partition BY NULL
    ORDER BY DATE(entity.creationDate ) ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum"
    FROM (dwh_ins_hub.entity JOIN dwh_ins_hub.entity_last_version elv ON (((elv.entityReference = entity.entityReference) AND (elv.last_entityVersion = entity.entityVersion))))
    group by entity.tenant_id,
    entity.entityStatus,
    DATE(entity.creationDate )

    Regards and TIA

  • marcothesanemarcothesane - Select Field - Administrator

    At this moment, I can only speculate.

    to confirm - what you do is:

    SELECT
      *
    FROM the_same_complex_view
    MINUS
    SELECT
      *
    FROM the_same_complex_view
    

    What I suspect (but cannot prove) is that the same full select (the one coming from the view) is fired two times, but at two different points in time, and some new rows are in one select that are not in the other.

    Two diagnostic steps could help to shed some light on the matter:
    a) add the pseudo column epoch from all original source tables, so dwh_ins_hub.entity.epoch and dwh_ins_hub.entity_last_version.epoch to the view, and get the max(epoch) from both
    b) rewrite your query to:

    WITH /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */
    dim_entity_cumulative_sum AS (
      SELECT 
        entity.tenant_id
      , entity.entitystatus
      , date(entity.creationdate)
      , sum(COUNT (DISTINCT entity.entityreference)) over(
         PARTITION BY NULL
         ORDER BY date(entity.creationdate) ASC 
         ROWS UNBOUNDED preceding
        ) "Cumulative Sum"
      FROM dwh_ins_hub.entity
      JOIN dwh_ins_hub.entity_last_version elv 
        ON elv.entityreference = entity.entityreference
       AND elv.last_entityversion = entity.entityversion
      GROUP BY  
        entity.tenant_id
      , entity.entitystatus
      , date(entity.creationdate)
    )
    SELECT
      *
    FROM dim_entity_cumulative_sum
    MINUS
    SELECT
      *
    FROM dim_entity_cumulative_sum
    ;
    

    Then, run it once with the /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ hint, and once without.

  • mguelfimguelfi Community Edition User

    @marcothesane said:
    At this moment, I can only speculate.

    to confirm - what you do is:

    SELECT
      *
    FROM the_same_complex_view
    MINUS
    SELECT
      *
    FROM the_same_complex_view
    

    What I suspect (but cannot prove) is that the same full select (the one coming from the view) is fired two times, but at two different points in time, and some new rows are in one select that are not in the other.

    The base tables does not (in theory) suffer changes. I even do CTAS to another names to fix the data and had the same problem.

    Two diagnostic steps could help to shed some light on the matter:
    a) add the pseudo column epoch from all original source tables, so dwh_ins_hub.entity.epoch and dwh_ins_hub.entity_last_version.epoch to the view, and get the max(epoch) from both
    b) rewrite your query to:

    WITH /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */
    dim_entity_cumulative_sum AS (
      SELECT 
        entity.tenant_id
      , entity.entitystatus
      , date(entity.creationdate)
      , sum(COUNT (DISTINCT entity.entityreference)) over(
         PARTITION BY NULL
         ORDER BY date(entity.creationdate) ASC 
         ROWS UNBOUNDED preceding
        ) "Cumulative Sum"
      FROM dwh_ins_hub.entity
      JOIN dwh_ins_hub.entity_last_version elv 
        ON elv.entityreference = entity.entityreference
       AND elv.last_entityversion = entity.entityversion
      GROUP BY  
        entity.tenant_id
      , entity.entitystatus
      , date(entity.creationdate)
    )
    SELECT
      *
    FROM dim_entity_cumulative_sum
    MINUS
    SELECT
      *
    FROM dim_entity_cumulative_sum
    ;
    

    Then, run it once with the /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ hint, and once without.

    If I run the query with the hint, the result is null (expected).

    Regards.

  • marcothesanemarcothesane - Select Field - Administrator

    I now reverse-engineered the query with @mosheg ' s help in providing test data:
    Data preparation:

    CREATE TABLE dwh_ins_hub.entity (
      tenant_id INT
    , entitystatus INT
    , creationdate DATE
    , entityreference INT
    , entityversion INT
    );
    CREATE TABLE dwh_ins_hub.entity_last_version( 
      tenant_id INT
    , entityStatus INT
    , creationDate DATE
    , entityReference INT
    ,  last_entityVersion INT
    );
    
    COPY dwh_ins_hub.entity FROM STDIN DELIMITER ',' ABORT ON ERROR;
    1,3,2021-09-11,3,3
    2,4,2021-08-24,4,4
    3,5,2021-09-07,5,5
    4,6,2021-07-17,6,6
    5,7,2021-09-07,7,7
    \.
    
    INSERT INTO dwh_ins_hub.entity_last_version SELECT * FROM entity;
    COMMIT;
    

    Then, I run the same query (one branch of your MINUS query, but rewritten in my stile for efficiency - and better readability for me, twice.

    I had initially thought that PARTITION BY NULL could be the culprit, as NULL always risks to yield unpredictable results, and is not needed at all in this case, but then I realised that two tenants have the same creationDate, and that's where your issue comes from.
    So, with the data filled as above, I run this query several times:

    SELECT 
       entity.tenant_id
    ,  entity.entitystatus
    ,  entity.creationdate
    ,  sum(COUNT (DISTINCT entity.entityreference)) OVER(
          PARTITION BY 1 ORDER BY date(entity.creationdate) ASC ROWS UNBOUNDED PRECEDING
       ) "Cumulative Sum"
    FROM dwh_ins_hub.entity
    JOIN dwh_ins_hub.entity_last_version elv 
      ON elv.entityreference = entity.entityreference 
     AND elv.last_entityversion = entity.entityversion
    GROUP BY entity.tenant_id
    ,  entity.entitystatus
    ,  entity.creationdate;
    

    It returns one of these two versions of the result table randomly:

    tenant_id entitystatus creationdate Cumulative Sum
    4 6 2021-07-17 1
    2 4 2021-08-24 2
    5 7 2021-09-07 3
    3 5 2021-09-07 4
    1 3 2021-09-11 5
    tenant_id entitystatus creationdate Cumulative Sum
    4 6 2021-07-17 1
    2 4 2021-08-24 2
    3 5 2021-09-07 3
    5 7 2021-09-07 4
    1 3 2021-09-11 5

    Note that creationDate is the same: 2021-09-07 for both tenant_id 3 and 5. And as only the entity.creationdate is in the ORDER BY subclause of the OVER() windowing clause, the two rows of tenant_id 3 and 5 can indeed appear in any order they want. So the coder, here, is rather the victim than the master of what the query does. SUM() OVER() does exactly what it's supposed to do, only the ORDER criterion turns out to return a partially random order.

    Materialising the WITH clause would only hide the ambiguity of the result of Cumulative Sum. I regard that as counter-productive.

    ORDER BY a unique criterion - for example, entity.tenant_id and entity.creationdate, and the problem will go away.

  • mguelfimguelfi Community Edition User

    Hi: thanks for the answer I will try it.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file