Hi, i am trying to calculate a rolling twelve month average for BMI.

estoneestone Vertica Customer

Right now, all i have is a column containing the BMI and a Date_Column. Looks like bellow screenshot.
Service Date BMI Person
1/1/2020 16 x
2/1/2020 18 y
3/1/2020 41 z
4/1/2020 42 z
5/1/2020 23 y
6/1/2020 32 z
7/1/2020 31 z
8/1/2020 30 y
9/1/2020 15 y
10/1/2020 11 y
11/1/2020 22 y
12/1/2020 25 z
What i need to do is grab the most recent value for all people. you see that person x only has one value during 1/1/20. In my new data set, what should show across the board for a one year span should be 16. If x does not receive another bmi by 1/1/2021 they will be dropped form the data set. Until that date x should should show a bmi of 16 for all months from 1/1/2020 - 1/1/2021. So if a person recieved a BMI test up to two years ago to date then that will show for for that person bmi value otherwise the person is dropped. So it should be a rolling twelve month data set grabbing up to 2 years ago to date of the person BMI.

Answers

  • Vertica_CurtisVertica_Curtis Employee
    edited August 2020

    `
    create table bmi (service_date date, bmi int, person char(1) ) ;

    insert into bmi values ('1/1/2020',16,'x') ;
    insert into bmi values ('2/1/2020',18,'y') ;
    insert into bmi values ('3/1/2020',41,'z') ;
    insert into bmi values ('4/1/2020',42,'z') ;
    insert into bmi values ('5/1/2020',23,'y') ;
    insert into bmi values ('6/1/2020',32,'z') ;
    insert into bmi values ('7/1/2020',31,'z') ;
    insert into bmi values ('8/1/2020',30,'y') ;
    insert into bmi values ('9/1/2020',15,'y') ;
    insert into bmi values ('10/1/2020',11,'y') ;
    insert into bmi values ('11/1/2020',22,'y') ;
    insert into bmi values ('12/1/2020',25,'z') ;
    commit ;

    select person, service_date, bmi, avg(bmi) over (partition by person order by service_date range between '2 years' preceding and current row) as rolling_Bmi
    from bmi
    order by person, service_date ;
    `
    Your other requirements will need to be solved separately. So, if I understand, you're wanting to see person 'x' represent a '16' every month until they no longer have a value available. Is that right? To do that, you'll need to cartesian the results of this to some sort of calendar table. SQL isn't good at just making up values, so you need to join it somewhere with a cross join in order to create those records. You can them use a time series interpolation (use constant interpolation) to fill in the missing values with the bmi scores.

  • estoneestone Vertica Customer

    So basically i have a table already existing in vertica that contains my persons and their BMI values, and the date it was calculated all in separate columns. What i need to do is create a table that is pulling from my original table that is housing those column values in another table view. So it needs to be a rolling twelve months that grabs the most recent bmi value available from that other data set. So if x patient had a bmi of 19 in July of 2020 and no bmi value since then, it would show that patients of having a bmi in July and August. If a patient had a bmi of 22 during Aug of 2019 and has not had one since then, it would still show as 22 this month, but when this month is over, the patient would have a null value for September because they have not had a BMI done over a year. I would like the table to look similar to the below. But for an entire year.
    1month_ago_bmi 2month_ago_bmi 3month_ago_bmi 4month_ago_bmi 5month_ago_bmi 6month_ago_bmi
    33.48897959 33.48897959 33.48897959 33.48897959 33.48897959 32.99918367
    null null null null null null
    18.43535343 18.43535343 18.43535343 18.10870088 18.10870088 17.53870228
    26.22874135 26.22874135 26.22874135 26.22874135 26.22874135 26.22874135
    31.98144174 31.98144174 31.98144174 31.98144174 31.98144174 31.98144174
    28.25 28.25 28.25 28.25 28.25 28.25
    22.09562632 22.09562632 22.09562632 22.09562632 22.09562632 22.09562632
    20.3147449 20.3147449 20.3147449 20.3147449 20.3147449 20.3147449
    39.72433206 39.72433206 39.72433206 39.72433206 39.72433206 39.72433206
    15.89175615 15.89175615 15.89175615 15.89175615 15.89175615 15.89175615

  • estoneestone Vertica Customer

    At the moment i am doing this for twelve different data sets to grab the rolling twelve months and increasing that >=1 value up to >=12 and the <=13 to <=24 for each data set.
    Where
    DATEDIFF(mm, service_date, current_date()) >=1 AND (DATEDIFF(mm, service_date, current_date()) <=13)

  • estoneestone Vertica Customer

    @Vertica_Curtis ^ sorry this is the first time i have used this questions so i did not no that i could @ you. Thank you for your help in advance!

  • That's a time-series problem.
    `create table dim_months (report_month date) ;

    insert into dim_months values('01/01/2019') ;
    insert into dim_months values('02/01/2019') ;
    insert into dim_months values('03/01/2019') ;
    insert into dim_months values('04/01/2019') ;
    insert into dim_months values('05/01/2019') ;
    insert into dim_months values('06/01/2019') ;
    insert into dim_months values('07/01/2019') ;
    insert into dim_months values('08/01/2019') ;
    insert into dim_months values('09/01/2019') ;
    insert into dim_months values('10/01/2019') ;
    insert into dim_months values('11/01/2019') ;
    insert into dim_months values('12/01/2019') ;
    insert into dim_months values('01/01/2020') ;
    insert into dim_months values('02/01/2020') ;
    insert into dim_months values('03/01/2020') ;
    insert into dim_months values('04/01/2020') ;
    insert into dim_months values('05/01/2020') ;
    insert into dim_months values('06/01/2020') ;
    insert into dim_months values('07/01/2020') ;
    insert into dim_months values('08/01/2020') ;
    insert into dim_months values('09/01/2020') ;
    insert into dim_months values('10/01/2020') ;
    insert into dim_months values('11/01/2020') ;
    insert into dim_months values('12/01/2020') ;
    commit ;

    select person, slice_time::date, ts_first_value (bmi ignore nulls, 'CONST') as bmi
    from (
    select distinct person, report_month, case when report_month = bmi.service_date then bmi else null end as bmi
    from dim_months m
    full outer join bmi on 1=1
    where m.report_month <= (Select max(service_date) from bmi)
    and m.report_month >= (select min(Service_date) from bmi)
    order by person, report_month ) a
    timeseries slice_time as '1 month' over (partition by person order by report_month::timestamp) ;

    `
    A very challenging problem!
    Timeseries is useful here, since I can use it for the constant interpolation, but it also seems to want to move your months around to some ambiguous day of the month. That's a weird side-effect of the ambiguity of the windowing of time series, I think. So, this is close, but it's not exact.
    Unfortunately, someone else might have to chime in here if they can find a viable solution, as I have other things I need to address. But hopefully this gets you close.

  • marcothesanemarcothesane - Select Field - Administrator

    A slightly whacky way of doing this.
    Force a TIMESERIES clause, but using hours instead of months, then translating back to months.
    But letting TIMESERIES functions do the job ...

    WITH
    -- your input ...
    bmi(srvc_dt,bmi,pers) AS (
              SELECT DATE '2020-01-01',16,'x'
    UNION ALL SELECT DATE '2020-02-01',18,'y'
    UNION ALL SELECT DATE '2020-03-01',41,'z'
    UNION ALL SELECT DATE '2020-04-01',42,'z'
    UNION ALL SELECT DATE '2020-05-01',23,'y'
    UNION ALL SELECT DATE '2020-06-01',32,'z'
    UNION ALL SELECT DATE '2020-07-01',31,'z'
    UNION ALL SELECT DATE '2020-08-01',30,'y'
    UNION ALL SELECT DATE '2020-09-01',15,'y'
    UNION ALL SELECT DATE '2020-10-01',11,'y'
    UNION ALL SELECT DATE '2020-11-01',22,'y'
    UNION ALL SELECT DATE '2020-12-01',25,'z'
    )
    ,
    padding AS (
      SELECT
        DATE '2021-01-01' AS srvc_dt
      , bmi
      , pers
      FROM bmi
      LIMIT 1 OVER(PARTITION BY pers ORDER BY srvc_dt DESC)
    )
    ,
    padded AS (
      SELECT
        *
      , TIMESTAMPADD(
          hh
        , TIMESTAMPDIFF(mm,TIMESTAMP '2020-01-01',srvc_dt)-1
        , TIMESTAMP '2020-01-01'
        ) AS h2m
      FROM bmi
      UNION ALL SELECT
        *
        , TIMESTAMPADD(
          hh
        , TIMESTAMPDIFF(mm,TIMESTAMP '2020-01-01',srvc_dt)-1
        , TIMESTAMP '2020-01-01'
        ) AS h2m
      FROM padding
    )
    SELECT  
      TIMESTAMPADD(mm,HOUR(hh), TIMESTAMP '2020-01-01')::DATE AS srvc_dt
    , TS_FIRST_VALUE(bmi ,'CONST') AS bmi
    , pers
    FROM padded
    TIMESERIES hh AS '1 HOUR' OVER(PARTITION BY pers ORDER BY h2m);
    -- out   srvc_dt   | bmi | pers 
    -- out ------------+-----+------
    -- out  2021-12-01 |  16 | x
    -- out  2020-01-01 |  16 | x
    -- out  2020-02-01 |  16 | x
    -- out  2020-03-01 |  16 | x
    -- out  2020-04-01 |  16 | x
    -- out  2020-05-01 |  16 | x
    -- out  2020-06-01 |  16 | x
    -- out  2020-07-01 |  16 | x
    -- out  2020-08-01 |  16 | x
    -- out  2020-09-01 |  16 | x
    -- out  2020-10-01 |  16 | x
    -- out  2020-11-01 |  16 | x
    -- out  2020-12-01 |  16 | x
    -- out  2020-01-01 |  18 | y
    -- out  2020-02-01 |  18 | y
    -- out  2020-03-01 |  18 | y
    -- out  2020-04-01 |  23 | y
    -- out  2020-05-01 |  23 | y
    -- out  2020-06-01 |  23 | y
    -- out  2020-07-01 |  30 | y
    -- out  2020-08-01 |  15 | y
    -- out  2020-09-01 |  11 | y
    -- out  2020-10-01 |  22 | y
    -- out  2020-11-01 |  22 | y
    -- out  2020-12-01 |  22 | y
    -- out  2020-02-01 |  41 | z
    -- out  2020-03-01 |  42 | z
    -- out  2020-04-01 |  42 | z
    -- out  2020-05-01 |  32 | z
    -- out  2020-06-01 |  31 | z
    -- out  2020-07-01 |  31 | z
    -- out  2020-08-01 |  31 | z
    -- out  2020-09-01 |  31 | z
    -- out  2020-10-01 |  31 | z
    -- out  2020-11-01 |  25 | z
    -- out  2020-12-01 |  25 | z
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file