Logically right, but "strange" MERGE behavior.
Hello,
Let's suppose that we have an empty super projection with four fields: dateTime, bannerId, geo, count. We're going to bulk import some data from external CSV file using MERGE.
Here is example of such CSV file (dateTime,bannerId,geo,impressions):
Here is piece of SQL to make MERGE (taken from http://www.vertica.com/2012/08/27/load-new-data-and-modify-data-simultaneously/):
Now in our super projection we have three lines:
Is there any way to get only 2 lines as result of MERGEing without pre MERGE CSV file processing?
Let's suppose that we have an empty super projection with four fields: dateTime, bannerId, geo, count. We're going to bulk import some data from external CSV file using MERGE.
Here is example of such CSV file (dateTime,bannerId,geo,impressions):
2014-01-01,1,US,50Please pay attention to the first two lines of CSV file. First two lines have equal "key" (dateTime,bannerId,geo) - this is very important!
2014-01-01,1,US,10
2014-01-01,1,DE,75
Here is piece of SQL to make MERGE (taken from http://www.vertica.com/2012/08/27/load-new-data-and-modify-data-simultaneously/):
MERGE INTO impressions tgt
USING new_impressions src
ON src.dateTime = tgt.dateTime
AND src.bannerId = tgt.bannerId
AND src.geo = tgt.geo
WHEN MATCHED THEN
UPDATE SET count = tgt.count + src.count
WHEN NOT MATCHED THEN
INSERT VALUES (src.dateTime, src.bannerId, src.geo, src.count);
Now in our super projection we have three lines:
2014-01-01,1,US,50But first two of them have similar "key" (dateTime,bannerId,geo) despite the fact we were using "WHEN MATCHED THEN UPDATE SET count = tgt.count + src.count" in our MERGE query.
2014-01-01,1,US,10
2014-01-01,1,DE,75
Is there any way to get only 2 lines as result of MERGEing without pre MERGE CSV file processing?
2014-01-01,1,US,60 (sum of lines with equal keys)Thank you!
2014-01-01,1,US,10
0
Comments
If you are expecting sum of lines with equal keys, then you should use a group by clause and sum the count.
Other then that I don't think the behavior of Merge is "Strange".
Can you show what is ddl for impressions and new_impressions and what is the data inside them referring to this example
Short answer: No!
Longer answer: No, you can't achieve it with MERGE.
https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/MERGE.htm
Notes and Restrictions
- An optimized query plan for a
...MERGE
statement assumes that the data you are trying to merge conforms with any unique or primary key constraints you declare.PS
MERGE should fail on dups or you defined DISABLE_DUPLICATE_KEY_ERROR, so no dups detection. So logically and practically its a right MERGE.
https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/SQLReferenceManual/Functions/VerticaFunct...
https://community.vertica.com/vertica/topics/_duplicate_merge_key_detected_in_join_error_on_tables_w...
select dateTime, bannerId, geo, count, sum(count) OVER (PARTITION BY dateTime, bannerId, geo) as dbgsumcount from <table> ;
select dateTime, bannerId, geo, count from <table> GROUP BY dateTime, bannerId, geo;