We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Using RSQUARED function over multiple data sets — Vertica Forum

Using RSQUARED function over multiple data sets

I have the need to use the rsquared(x,y) analytic function over many data sets in order to determine the those that have a good correlation, however it seems that you can only run RSQUARED(x,y) for one data set at a time to get one row of data (one r2 value).

As the OVER() clause can't have a partition value in it, it seems fairly limiting that you can't use this over lots for sets of targets and predictions

Does anyone know of a way around this at all?

The sort of thing I would have hoped to have done would be:

select data_set_id,
RSQUARED ( target_values, predition_values ) OVER(data_set_id)
from target t
join predition p on p.set_ref = c.set_ref
group by data_set_id

https://www.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/SQLReferenceManual/Functions/MachineLearning/RSQUARED.htm

Regards

Answers

  • If anyone is interested. I decided to sidestep the RSQUARED function and just write it in SQL manually to offer more flexibility.

    1-(SUM(POWER(target_value - predition_value,2)))/(SUM(POWER(target_value - [AVG_VALUE] ,2))) as rsquared

    [AVG_VALUE] = "select avg(target_value) from target_value "

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file