Fill Missing Dates for Each Group in an existing table
Hello All,
I have a Table having 3 columns Date, Branch and Business Unit.
Date Branch Business Unit
12/1/2020 CE01 15001
12/5/2020 CE01 15001
12/2/2020 CE03 15002
12/8/2020 CE03 15002
For each Branch and Business Unit Combination, I want to insert the missing dates from 1st December'20 to 10th December'20.
The result should look like below..
Date Branch Business Unit
12/1/2020 CE01 15001
12/2/2020 CE01 15001
12/3/2020 CE01 15001
12/4/2020 CE01 15001
12/5/2020 CE01 15001
12/6/2020 CE01 15001
12/7/2020 CE01 15001
12/8/2020 CE01 15001
12/9/2020 CE01 15001
12/10/2020 CE01 15001
12/1/2020 CE03 15002
12/2/2020 CE03 15002
12/3/2020 CE03 15002
12/4/2020 CE03 15002
12/5/2020 CE03 15002
12/6/2020 CE03 15002
12/7/2020 CE03 15002
12/8/2020 CE03 15002
12/9/2020 CE03 15002
12/10/2020 CE03 15002
Just wanted to know what will be the best possible approach to tackle this.
Thank You in Advance.
Warm Regards,
Jayanta
Best Answer
-
marcothesane - Select Field - Administrator
Vertica has a clause that has been invented for something like what you ask the
TIMESERIES
clause.
In your case, you would be tempted to use it as:TIMESERIES gen_date AS '1 DAY' OVER(PARTITION BY branch,business_unit ORDER BY "date")
But then, it would start and end the series for'CE01'
and15001
with '2020-12-01' and '2020-12-10', and the other with '2020-12-02' and '2020-12-08'.So you have practially no other choice than to add a distinct timeseries in an in-line table, and CROSS JOIN that with a SELECT DISTINCT of
branch
andbusiness_unit
from your base table.What you might lose with this approach is any automatic constant or linear interpolation of any measures (which you're not including in your example though) ...
Here goes:
WITH -- your input ... indata(dt,brnch,bus_unit) AS ( SELECT DATE '2020-12-01','CE01',15001 UNION ALL SELECT DATE '2020-12-05','CE01',15001 UNION ALL SELECT DATE '2020-12-02','CE03',15002 UNION ALL SELECT DATE '2020-12-08','CE03',15002 ) -- real query starts here; replace "," with "WITH" , l(l_dt) AS ( SELECT MIN(dt)::TIMESTAMP FROM indata UNION ALL SELECT MAX(dt)::TIMESTAMP FROM indata ) , ts AS ( SELECT dt::DATE AS dt FROM l TIMESERIES dt AS '1 DAY' OVER(ORDER BY l_dt) ) , dist AS ( SELECT DISTINCT brnch , bus_unit FROM indata ) SELECT dt , brnch , bus_unit FROM dist CROSS JOIN ts ORDER BY 2,1 ; -- out dt | brnch | bus_unit -- out ------------+-------+---------- -- out 2020-12-01 | CE01 | 15001 -- out 2020-12-02 | CE01 | 15001 -- out 2020-12-03 | CE01 | 15001 -- out 2020-12-04 | CE01 | 15001 -- out 2020-12-05 | CE01 | 15001 -- out 2020-12-06 | CE01 | 15001 -- out 2020-12-07 | CE01 | 15001 -- out 2020-12-08 | CE01 | 15001 -- out 2020-12-01 | CE03 | 15002 -- out 2020-12-02 | CE03 | 15002 -- out 2020-12-03 | CE03 | 15002 -- out 2020-12-04 | CE03 | 15002 -- out 2020-12-05 | CE03 | 15002 -- out 2020-12-06 | CE03 | 15002 -- out 2020-12-07 | CE03 | 15002 -- out 2020-12-08 | CE03 | 15002
1
Answers
Thank You @marcothesane . I used the Cross Join logic and it worked. Thanks a lot!!!