Inquiries on the LINEAR_REG function

jquanjquan Community Edition User

I am interested in using the machine learning functions for linear regression, mainly:

  • LINEAR_REG
  • PREDICT_LINEAR_REG

I have a table that has these columns:

  • time_epoch (numeric) – denoting a "x" the independent variable
  • score (numeric) - denoting a "y" the dependent variables
  • location (VARCHAR) – for each value of this variable, the values of (x,y) are the linear relationship I want to model and predict on
    Example:
dbadmin=> select * from test.my_table;
    time_epoch (x) |   location    |    score (y)
-------------------+------------------------------------+
 1591754400.000000 |       AAAA    |  0.0427034187223203
 1591758000.000000 |       AAAA    |  0.0447975381705089
 1591761600.000000 |       AAAA    |  0.0444945871539881
 1591765200.000000 |       AAAA    |  0.0368878016429624
 1591768800.000000 |       AAAA    |  0.0478755857529369

 1591754400.000000 |       BBBB    |  0.0297034187223203
 1591758000.000000 |       BBBB    |  0.0283424324324324
 1591761600.000000 |       BBBB    |  0.0274945871539881
 1591765200.000000 |       BBBB    |  0.0268878016429624
 1591768800.000000 |       BBBB    |  0.0268755857529369

 1591754400.000000 |       CCCC    |  0.0392133432432432
 1591758000.000000 |       CCCC    |  0.0372343243243234
 1591761600.000000 |       CCCC    |  0.0363143243243243
 1591765200.000000 |       CCCC    |  0.0354382943203243
 1591768800.000000 |       CCCC    |  0.0342343243243243

 1591754400.000000 |       DDDD    |  0.0213213213213213
 1591758000.000000 |       DDDD    |  0.0202132132132132
 1591761600.000000 |       DDDD    |  0.0181321321321321
 1591765200.000000 |       DDDD    |  0.0162132132132132
 1591768800.000000 |       DDDD    |  0.0123213213232321

I want to use LINEAR_REG function to predict the "score" column (i.e., "y" values) at several time_epochs (i.e., "x" values) for each location.
Note: The values of "x" for which we have data for AND for what to predict "y" for will be the same for all locations.

I have 2 questions:
Question 1
From documentation, I cannot use non-numeric values, but is there a way around it? I would like to do something like:

  • **score **column as the "response column"
  • time_epoch, and location as the "predictor columns"
    Invocation example:
SELECT LINEAR_REG('my_prediction', 'test.my_table', 'score', 'time_epoch,location' USING PARAMETERS optimizer='BFGS');

Question 2
Is there a function that can predict linear regression without having to perform two calls (LINEAR_REG and PREDICT_LINEAR_REG). Ideally, I am looking for a single function call for both.

Thanks for your help!

Answers

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file