REGR_SLOPE PRECISION
REGR_SLOPE function returns the slope of the regression line, determined by a set of expression pairs. The return value is of type DOUBLE PRECISION. It is possible that you will get different results in different clusters with various types of cloud deployments such as cloud, on premises and so on… let us review it with an example.
We have a table named cpu in with the following records in 2 different clusters.
dbadmin=> select * from cpu;
cpu_util | timestamp_utc
----------+---------------
1.86 | 1539251662
1.83 | 1539251662
1.8 | 1539251700
2.06 | 1539251700
1.2 | 1539252100
0.91 | 1539252102
0.63 | 1539252200
1.35 | 1539252400
1.13 | 1539252700
0.97 | 1539252800
1.53 | 1539252800
1.6 | 1539252800
1.02 | 1539252900
1.11 | 1539252900
1.28 | 1539253100
0.87 | 1539253500
0.96 | 1539253500
1.03 | 1539336600
1.13 | 1539336900
1.31 | 1539337200
(20 rows)
dbadmin=>
Results in cluster 1(3 node cluster): It is consistent for every run and it resulted same output as follows.
dbadmin=> select regr_slope (cpu_util,timestamp_utc/86400) AS cpu_util_slope from cpu;
cpu_util_slope
-0.158649414072066
(1 row)
Results in cluster 2(2 node cluster): Results varied for every run. Please find the sample outputs below.
dbadmin=> select regr_slope (cpu_util,timestamp_utc/86400) AS cpu_util_slope from cpu;
cpu_util_slope
-0.158649476255395
(1 row)
dbadmin=> select regr_slope (cpu_util,timestamp_utc/86400) AS cpu_util_slope from cpu;
cpu_util_slope
-0.158649414024238
(1 row)
dbadmin=> select regr_slope (cpu_util,timestamp_utc/86400) AS cpu_util_slope from cpu;
cpu_util_slope
-0.158649476255395
(1 row)
Why is it so?
64-bit float is accurate to 15-16 digits. Summations are going to lose a digit or two from the most significant digit in any of the summed items. Multiplications are also present in regr_slope, which basically cuts the number of digits in half. So you'd expect about 7 digits of precision here, and we have same result till 7 digit precision. Hence all the results are accurate.