Grouping with moving_average or ARIMA

Trying to use Vertica ML to do some forecasting on some infrastructure metrics. In the documentation for moving_average and ARIMA, there is a somewhat simple example of using this with a table consisting of "time" and "value", the example is supposed to be for temperatures in Melbourne, like

    time         | Temperature

1981-01-01 00:00:00 | 20.7
1981-01-02 00:00:00 | 17.9

Tried this with sample data and it works fine. But in my environment, my data contains data from several different entities that I would like to forecast individually. For easy comparison with the example, we could say a table that contained temperatures for different cities.

time, city, temperature

How would I use moving_average or ARIMA to forecast on this, but separate this based on "city"?

The input-relation field, does not seem to support any grouping.


  • Options
    VValdarVValdar Vertica Employee Employee
    edited November 2023

    Hi lrng,

    Unfortunately, you will need to loop through all your distinct cities values and build a model for each.

    Something like this - untested and you'll probably need version 23.4 to train a model inside a stored procedure, but that's the logic:

    create or replace procedure sp_train_model ()
    language PLvSQL
    security definer
        v_city   varchar;
        v_cnt    integer not null :=0;
        for v_city in query select distinct city from MyTable order by 1
            perform create or replace view v_MyTable_City as select * from MyTable where city = v_city;
            execute 'select arima(''arima_MyTable_' || v_city || ''', ''v_MyTable_City'', ''temperature'', ''time'' using parameters p=2, q=2)';
            v_cnt := v_cnt+1;
        end loop;
        raise info '% models has been trained', v_cnt;

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file