Median in the moving window

Hello,

 

I recently started to use Vetrica and it was great so far! I was curious if there is a way to calculate MEDIAN value in the moving window. For example, if I want to compute 20 day average volume, it can be done as:

 

AVG(volume) OVER (PARTITION BY symbol ORDER BY date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW) as adv20

...is there a way to compute 20 day median volume instead? From documentation it looks like that it is not possible: 'MEDIAN() does not allow the window_order_clause or window_frame_clause'., but was wondering if there is any workaround.

 

Thanks much in advance,

Konstantin

Comments

  • You are correct that median() won't work. 

     

    From the documentation:

    MEDIAN() does not allow the window-order-clause or window-frame-clause.

     

    You'd have to code something manually.  Statistically speaking, I'm not sure a rolling median would differ significantly from a rolling average, especially if you're only looking at 20 rows.  But then, I'm not a statistician.

     

  • Thanks for your response. If you think from trading perspective, 1 heavy day can easily skew your average, while median would be still in line with reality. Not sure, if it would be possible for Vertica to start supporting it, as I feel that it is a really useful feature (actually from the statistical perspective).

  • Ralf_SRalf_S Community Edition User

    Hello -
    just picking up on that,
    nth_value is a good idea but ordering the values in the window frame seems not be possible.
    I guess it comes down to writing a function, or similar. There is a postgres equivalent in this post
    I tried to translate into vertica syntax but not luck so far.
    Anyone would have advice on how to implement a rolling median?
    Thanks.

  • VValdarVValdar Vertica Employee Employee
    edited November 2022

    Hi Ralf_S,

    You can do the trick involving a complementary self-join.

    This is far from perfect I agree but still working:

      select t1.symbol, t1.date, t1.volume
           , approximate_median(t2.volume)
        from MyTable as t1
        join MyTable as t2  on t2.symbol  = t1.symbol
                           and t2.date   >= t1.date - interval '20 days'
                           and t2.date   <= t1.date
    group by t1.symbol, t1.date, t1.volume
    order by 1, 2;
    
    
      select distinct
             t1.symbol, t1.date, t1.volume
           , median(t2.volume) over(partition by t1.symbol, t1.date)
        from MyTable as t1
        join MyTable as t2  on t2.symbol  = t1.symbol
                           and t2.date   >= t1.date - interval '20 days'
                           and t2.date   <= t1.date
    order by 1, 2;
    

    Beware of the volume generated by the join.

  • Ralf_SRalf_S Community Edition User

    Hello VValdar,

    Thanks for the response - that does work for me.
    Initially, I thought solutions like self-join would blow up the data/computation too much, but it works fine with the data I have.
    Thank you!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file