Options

Using RSQUARED function over multiple data sets

I have the need to use the rsquared(x,y) analytic function over many data sets in order to determine the those that have a good correlation, however it seems that you can only run RSQUARED(x,y) for one data set at a time to get one row of data (one r2 value).

As the OVER() clause can't have a partition value in it, it seems fairly limiting that you can't use this over lots for sets of targets and predictions

Does anyone know of a way around this at all?

The sort of thing I would have hoped to have done would be:

select data_set_id,
RSQUARED ( target_values, predition_values ) OVER(data_set_id)
from target t
join predition p on p.set_ref = c.set_ref
group by data_set_id

https://www.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/SQLReferenceManual/Functions/MachineLearning/RSQUARED.htm

Regards

Answers

  • Options

    If anyone is interested. I decided to sidestep the RSQUARED function and just write it in SQL manually to offer more flexibility.

    1-(SUM(POWER(target_value - predition_value,2)))/(SUM(POWER(target_value - [AVG_VALUE] ,2))) as rsquared

    [AVG_VALUE] = "select avg(target_value) from target_value "

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file