Hi, i am trying to calculate a rolling twelve month average for BMI.
Right now, all i have is a column containing the BMI and a Date_Column. Looks like bellow screenshot.
Service Date BMI Person
1/1/2020 16 x
2/1/2020 18 y
3/1/2020 41 z
4/1/2020 42 z
5/1/2020 23 y
6/1/2020 32 z
7/1/2020 31 z
8/1/2020 30 y
9/1/2020 15 y
10/1/2020 11 y
11/1/2020 22 y
12/1/2020 25 z
What i need to do is grab the most recent value for all people. you see that person x only has one value during 1/1/20. In my new data set, what should show across the board for a one year span should be 16. If x does not receive another bmi by 1/1/2021 they will be dropped form the data set. Until that date x should should show a bmi of 16 for all months from 1/1/2020 - 1/1/2021. So if a person recieved a BMI test up to two years ago to date then that will show for for that person bmi value otherwise the person is dropped. So it should be a rolling twelve month data set grabbing up to 2 years ago to date of the person BMI.
Answers
`
create table bmi (service_date date, bmi int, person char(1) ) ;
insert into bmi values ('1/1/2020',16,'x') ;
insert into bmi values ('2/1/2020',18,'y') ;
insert into bmi values ('3/1/2020',41,'z') ;
insert into bmi values ('4/1/2020',42,'z') ;
insert into bmi values ('5/1/2020',23,'y') ;
insert into bmi values ('6/1/2020',32,'z') ;
insert into bmi values ('7/1/2020',31,'z') ;
insert into bmi values ('8/1/2020',30,'y') ;
insert into bmi values ('9/1/2020',15,'y') ;
insert into bmi values ('10/1/2020',11,'y') ;
insert into bmi values ('11/1/2020',22,'y') ;
insert into bmi values ('12/1/2020',25,'z') ;
commit ;
select person, service_date, bmi, avg(bmi) over (partition by person order by service_date range between '2 years' preceding and current row) as rolling_Bmi
from bmi
order by person, service_date ;
`
Your other requirements will need to be solved separately. So, if I understand, you're wanting to see person 'x' represent a '16' every month until they no longer have a value available. Is that right? To do that, you'll need to cartesian the results of this to some sort of calendar table. SQL isn't good at just making up values, so you need to join it somewhere with a cross join in order to create those records. You can them use a time series interpolation (use constant interpolation) to fill in the missing values with the bmi scores.
So basically i have a table already existing in vertica that contains my persons and their BMI values, and the date it was calculated all in separate columns. What i need to do is create a table that is pulling from my original table that is housing those column values in another table view. So it needs to be a rolling twelve months that grabs the most recent bmi value available from that other data set. So if x patient had a bmi of 19 in July of 2020 and no bmi value since then, it would show that patients of having a bmi in July and August. If a patient had a bmi of 22 during Aug of 2019 and has not had one since then, it would still show as 22 this month, but when this month is over, the patient would have a null value for September because they have not had a BMI done over a year. I would like the table to look similar to the below. But for an entire year.
1month_ago_bmi 2month_ago_bmi 3month_ago_bmi 4month_ago_bmi 5month_ago_bmi 6month_ago_bmi
33.48897959 33.48897959 33.48897959 33.48897959 33.48897959 32.99918367
null null null null null null
18.43535343 18.43535343 18.43535343 18.10870088 18.10870088 17.53870228
26.22874135 26.22874135 26.22874135 26.22874135 26.22874135 26.22874135
31.98144174 31.98144174 31.98144174 31.98144174 31.98144174 31.98144174
28.25 28.25 28.25 28.25 28.25 28.25
22.09562632 22.09562632 22.09562632 22.09562632 22.09562632 22.09562632
20.3147449 20.3147449 20.3147449 20.3147449 20.3147449 20.3147449
39.72433206 39.72433206 39.72433206 39.72433206 39.72433206 39.72433206
15.89175615 15.89175615 15.89175615 15.89175615 15.89175615 15.89175615
At the moment i am doing this for twelve different data sets to grab the rolling twelve months and increasing that >=1 value up to >=12 and the <=13 to <=24 for each data set.
Where
DATEDIFF(mm, service_date, current_date()) >=1 AND (DATEDIFF(mm, service_date, current_date()) <=13)
@Vertica_Curtis ^ sorry this is the first time i have used this questions so i did not no that i could @ you. Thank you for your help in advance!
That's a time-series problem.
`create table dim_months (report_month date) ;
insert into dim_months values('01/01/2019') ;
insert into dim_months values('02/01/2019') ;
insert into dim_months values('03/01/2019') ;
insert into dim_months values('04/01/2019') ;
insert into dim_months values('05/01/2019') ;
insert into dim_months values('06/01/2019') ;
insert into dim_months values('07/01/2019') ;
insert into dim_months values('08/01/2019') ;
insert into dim_months values('09/01/2019') ;
insert into dim_months values('10/01/2019') ;
insert into dim_months values('11/01/2019') ;
insert into dim_months values('12/01/2019') ;
insert into dim_months values('01/01/2020') ;
insert into dim_months values('02/01/2020') ;
insert into dim_months values('03/01/2020') ;
insert into dim_months values('04/01/2020') ;
insert into dim_months values('05/01/2020') ;
insert into dim_months values('06/01/2020') ;
insert into dim_months values('07/01/2020') ;
insert into dim_months values('08/01/2020') ;
insert into dim_months values('09/01/2020') ;
insert into dim_months values('10/01/2020') ;
insert into dim_months values('11/01/2020') ;
insert into dim_months values('12/01/2020') ;
commit ;
select person, slice_time::date, ts_first_value (bmi ignore nulls, 'CONST') as bmi
from (
select distinct person, report_month, case when report_month = bmi.service_date then bmi else null end as bmi
from dim_months m
full outer join bmi on 1=1
where m.report_month <= (Select max(service_date) from bmi)
and m.report_month >= (select min(Service_date) from bmi)
order by person, report_month ) a
timeseries slice_time as '1 month' over (partition by person order by report_month::timestamp) ;
`
A very challenging problem!
Timeseries is useful here, since I can use it for the constant interpolation, but it also seems to want to move your months around to some ambiguous day of the month. That's a weird side-effect of the ambiguity of the windowing of time series, I think. So, this is close, but it's not exact.
Unfortunately, someone else might have to chime in here if they can find a viable solution, as I have other things I need to address. But hopefully this gets you close.
A slightly whacky way of doing this.
Force a TIMESERIES clause, but using hours instead of months, then translating back to months.
But letting TIMESERIES functions do the job ...