SQL Issue: Question about the MINUS / INTERSECT operators
Hi All,
Calling all SQL gurus: I have a question about the MINUS / INTERSECT operators and was hoping to get help on a tricky SQL issue.
ASK 1: Run the two sub-queries below and do a diff or intersection on the tifa
column as shown. A standard MINUS
/ INTERSECT
operator does the job as follows.
QUERY 1
SELECT COUNT(e.tifa) FROM ( (SELECT DISTINCT e.tifa FROM ctv_app_launch e WHERE (e.country_code = 'US' AND e.app_id = '11101200001' AND e.event_time BETWEEN '2020-07-26 00:00:00.0' AND '2020-10-01 23:59:59.999') GROUP BY e.tifa HAVING COUNT(*) > '0' MINUS -- (or INTERSECT) SELECT DISTINCT e.tifa FROM exposure_events e WHERE (e.type = '1' AND e.timestamp BETWEEN '2021-01-17 00:00:00.0' AND '2021-02-15 23:59:59.999' AND e.country_code = 'CA' AND e.content_id = '78354867415943449') GROUP BY e.tifa)) e
ASK 2: Do the same query as ASK 1 but also keep track of the associated "maximum timestamp" per e.tifa
QUERY 2 (not working as expected)
SELECT DISTINCT e.tifa, MAX(ts) AS max_timestamp FROM ( (SELECT DISTINCT e.tifa, MAX(e.event_time) AS ts FROM ctv_app_launch e WHERE (e.country_code = 'US' AND e.app_id = '11101200001' AND e.event_time BETWEEN '2020-07-26 00:00:00.0' AND '2020-10-01 23:59:59.999') GROUP BY e.tifa HAVING COUNT(*) > '0' MINUS -- (or INTERSECT) SELECT DISTINCT e.tifa, MAX(e.timestamp) AS ts FROM exposure_events e WHERE (e.type = '1' AND e.timestamp BETWEEN '2021-01-17 00:00:00.0' AND '2021-02-15 23:59:59.999' AND e.country_code = 'CA' AND e.content_id = '78354867415943449') GROUP BY e.tifa)) e GROUP BY e.tifa
PROBLEM: Even though the number of columns in the two sub-queries is the same, what will be MINUS
-ed or INTERSECT
-ed is not the same. I want the MINUS
or INTERSECT
to happen on the e.tifa
(from both sub-queries) and then intelligently pick where the ts
comes from, either e.event_time
or e.timestamp
depending on where the e.tifa
came from.
I hope I was able to explain my situation and that some gurus can help me out!
Answers
So if Query 1 returns 100 records (=100
e.tifa
), Query 2 should also return 100 records (pair of tifa and timestamp). As of now, query 2 does not return 100 records - it's a lot more.MINUS and INTERSECT are different in Vertica. q1 MINUS q2 removes results from q1 that are present in q2. What you want is the common records between q1 and q2, which is q1 INTERSECT q2.
I'm not sure INTERSECT will work here though, since it won't match rows from q1 and q2 with different timestamps. I think you'll have better luck with a construct like this pseudo-SQL to join the two source tables/subqueries and use a CASE statement to pick the newer timestamp:
SELECT tifa, (CASE WHEN q1max > q2max THEN q1max ELSE q2max END) AS max_ts FROM
(SELECT table1.tifa, max(table1.ts) q1max, max(table2.ts) FROM table1 JOIN table2 USING (tifa));
(where table1 and table2 should be replaced by subqueries)