SQL Issue: Question about the MINUS / INTERSECT operators

neo_i_am_the_oneneo_i_am_the_one Community Edition User
edited June 2022 in General Discussion

Hi All,

Calling all SQL gurus: I have a question about the MINUS / INTERSECT operators and was hoping to get help on a tricky SQL issue.

ASK 1: Run the two sub-queries below and do a diff or intersection on the tifa column as shown. A standard MINUS / INTERSECT operator does the job as follows.

QUERY 1

SELECT COUNT(e.tifa)
FROM (
          (SELECT DISTINCT e.tifa
           FROM ctv_app_launch e
           WHERE (e.country_code = 'US'
                  AND e.app_id = '11101200001'
                  AND e.event_time BETWEEN '2020-07-26 00:00:00.0' AND '2020-10-01 23:59:59.999')
           GROUP BY e.tifa
           HAVING COUNT(*) > '0' 

           MINUS -- (or INTERSECT)

           SELECT DISTINCT e.tifa
           FROM exposure_events e 
           WHERE (e.type = '1'
                  AND e.timestamp BETWEEN '2021-01-17 00:00:00.0' AND '2021-02-15 23:59:59.999'
                  AND e.country_code = 'CA'
                  AND e.content_id = '78354867415943449')
           GROUP BY e.tifa)) e

ASK 2: Do the same query as ASK 1 but also keep track of the associated "maximum timestamp" per e.tifa

QUERY 2 (not working as expected)

SELECT DISTINCT e.tifa,
                MAX(ts) AS max_timestamp
FROM (
          (SELECT DISTINCT e.tifa,
                           MAX(e.event_time) AS ts
           FROM ctv_app_launch e
           WHERE (e.country_code = 'US'
                  AND e.app_id = '11101200001'
                  AND e.event_time BETWEEN '2020-07-26 00:00:00.0' AND '2020-10-01 23:59:59.999')
           GROUP BY e.tifa
           HAVING COUNT(*) > '0' 

           MINUS -- (or INTERSECT)

           SELECT DISTINCT e.tifa,
                           MAX(e.timestamp) AS ts
           FROM exposure_events e
           WHERE (e.type = '1'
                  AND e.timestamp BETWEEN '2021-01-17 00:00:00.0' AND '2021-02-15 23:59:59.999'
                  AND e.country_code = 'CA'
                  AND e.content_id = '78354867415943449')
           GROUP BY e.tifa)) e
GROUP BY e.tifa

PROBLEM: Even though the number of columns in the two sub-queries is the same, what will be MINUS-ed or INTERSECT-ed is not the same. I want the MINUS or INTERSECT to happen on the e.tifa (from both sub-queries) and then intelligently pick where the ts comes from, either e.event_time or e.timestamp depending on where the e.tifa came from.

I hope I was able to explain my situation and that some gurus can help me out!

Answers

  • neo_i_am_the_oneneo_i_am_the_one Community Edition User

    So if Query 1 returns 100 records (=100 e.tifa), Query 2 should also return 100 records (pair of tifa and timestamp). As of now, query 2 does not return 100 records - it's a lot more.

  • Bryan_HBryan_H Vertica Employee Administrator

    MINUS and INTERSECT are different in Vertica. q1 MINUS q2 removes results from q1 that are present in q2. What you want is the common records between q1 and q2, which is q1 INTERSECT q2.

  • Bryan_HBryan_H Vertica Employee Administrator

    I'm not sure INTERSECT will work here though, since it won't match rows from q1 and q2 with different timestamps. I think you'll have better luck with a construct like this pseudo-SQL to join the two source tables/subqueries and use a CASE statement to pick the newer timestamp:
    SELECT tifa, (CASE WHEN q1max > q2max THEN q1max ELSE q2max END) AS max_ts FROM
    (SELECT table1.tifa, max(table1.ts) q1max, max(table2.ts) FROM table1 JOIN table2 USING (tifa));
    (where table1 and table2 should be replaced by subqueries)

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file