Using RSQUARED function over multiple data sets
I have the need to use the rsquared(x,y) analytic function over many data sets in order to determine the those that have a good correlation, however it seems that you can only run RSQUARED(x,y) for one data set at a time to get one row of data (one r2 value).
As the OVER() clause can't have a partition value in it, it seems fairly limiting that you can't use this over lots for sets of targets and predictions
Does anyone know of a way around this at all?
The sort of thing I would have hoped to have done would be:
select data_set_id,
RSQUARED ( target_values, predition_values ) OVER(data_set_id)
from target t
join predition p on p.set_ref = c.set_ref
group by data_set_id
Regards
0
Answers
If anyone is interested. I decided to sidestep the RSQUARED function and just write it in SQL manually to offer more flexibility.
1-(SUM(POWER(target_value - predition_value,2)))/(SUM(POWER(target_value - [AVG_VALUE] ,2))) as rsquared
[AVG_VALUE] = "select avg(target_value) from target_value "