Fill Missing Dates for Each Group in an existing table

JayantaBJayantaB Community Edition User

Hello All,
I have a Table having 3 columns Date, Branch and Business Unit.
Date Branch Business Unit
12/1/2020 CE01 15001
12/5/2020 CE01 15001
12/2/2020 CE03 15002
12/8/2020 CE03 15002

For each Branch and Business Unit Combination, I want to insert the missing dates from 1st December'20 to 10th December'20.
The result should look like below..
Date Branch Business Unit
12/1/2020 CE01 15001
12/2/2020 CE01 15001
12/3/2020 CE01 15001
12/4/2020 CE01 15001
12/5/2020 CE01 15001
12/6/2020 CE01 15001
12/7/2020 CE01 15001
12/8/2020 CE01 15001
12/9/2020 CE01 15001
12/10/2020 CE01 15001
12/1/2020 CE03 15002
12/2/2020 CE03 15002
12/3/2020 CE03 15002
12/4/2020 CE03 15002
12/5/2020 CE03 15002
12/6/2020 CE03 15002
12/7/2020 CE03 15002
12/8/2020 CE03 15002
12/9/2020 CE03 15002
12/10/2020 CE03 15002

Just wanted to know what will be the best possible approach to tackle this.
Thank You in Advance.

Warm Regards,
Jayanta

Best Answer

  • marcothesanemarcothesane - Select Field - Administrator
    Answer ✓

    Vertica has a clause that has been invented for something like what you ask the TIMESERIES clause.
    In your case, you would be tempted to use it as:
    TIMESERIES gen_date AS '1 DAY' OVER(PARTITION BY branch,business_unit ORDER BY "date")
    But then, it would start and end the series for 'CE01' and 15001 with '2020-12-01' and '2020-12-10', and the other with '2020-12-02' and '2020-12-08'.

    So you have practially no other choice than to add a distinct timeseries in an in-line table, and CROSS JOIN that with a SELECT DISTINCT of branch and business_unit from your base table.

    What you might lose with this approach is any automatic constant or linear interpolation of any measures (which you're not including in your example though) ...

    Here goes:

    WITH
    -- your input ...
    indata(dt,brnch,bus_unit) AS (
              SELECT DATE '2020-12-01','CE01',15001
    UNION ALL SELECT DATE '2020-12-05','CE01',15001
    UNION ALL SELECT DATE '2020-12-02','CE03',15002
    UNION ALL SELECT DATE '2020-12-08','CE03',15002
    )
    -- real query starts here; replace "," with "WITH"
    ,
    l(l_dt) AS (
                SELECT MIN(dt)::TIMESTAMP FROM indata
      UNION ALL SELECT MAX(dt)::TIMESTAMP FROM indata
    )
    ,
    ts AS (
      SELECT
        dt::DATE AS dt
      FROM l
      TIMESERIES dt AS '1 DAY' OVER(ORDER BY l_dt)
    )
    ,
    dist AS (
      SELECT DISTINCT
        brnch
      , bus_unit
      FROM indata
    )
    SELECT
      dt
    , brnch
    , bus_unit
    FROM dist
    CROSS JOIN ts
    ORDER BY 2,1
    ;
    -- out      dt     | brnch | bus_unit 
    -- out ------------+-------+----------
    -- out  2020-12-01 | CE01  |    15001
    -- out  2020-12-02 | CE01  |    15001
    -- out  2020-12-03 | CE01  |    15001
    -- out  2020-12-04 | CE01  |    15001
    -- out  2020-12-05 | CE01  |    15001
    -- out  2020-12-06 | CE01  |    15001
    -- out  2020-12-07 | CE01  |    15001
    -- out  2020-12-08 | CE01  |    15001
    -- out  2020-12-01 | CE03  |    15002
    -- out  2020-12-02 | CE03  |    15002
    -- out  2020-12-03 | CE03  |    15002
    -- out  2020-12-04 | CE03  |    15002
    -- out  2020-12-05 | CE03  |    15002
    -- out  2020-12-06 | CE03  |    15002
    -- out  2020-12-07 | CE03  |    15002
    -- out  2020-12-08 | CE03  |    15002
    

Answers

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file