Vertica ML GET_MODEL_SUMMARY got error
I am new to Vertica ML model, I have try LOGISTIC_REG model, i can create the model. but when I do
select GET_MODEL_SUMMARY(USING PARAMETERS model_name='iot_test');
got the error said:
ODBC Database Error: ERROR 3457: Function GET_MODEL_SUMMARY() does not exist, or permission is denied for GET_MODEL_SUMMARY()
Best Answers
-
Vertica_Curtis Employee
The V8.1.1 version of RF_Classifier didn't support NUMERIC types in the independent variable list. That's supported in 9.2. I don't know which version that changed in - clearly somewhere between 8.1.x and 9.2.
In your version, best option is to just to multiply it by 100 and convert it to an int. Numeric(2,2) only supports values from 0 to .99. Are these probability columns? You should be fine just expressing them as integers in your model.
5 -
Vertica_Curtis Employee
In some models, you can use things like p-value (which you get from the model summary) to show you how much relevance an independent variable has on a model. You could try running a logistic regression, instead of a random forest, and evaluate the logistic regression's P-value scores from the model summary. Alternatively, version 8 supports CORR() which shows column pair correlations. You could look to see how columns in your set of independent variables correlate to your dependent variable. Otherwise, I'd recommend upgrading to 9.2.
5 -
Vertica_Curtis Employee
I don't think v8 is going to have what you need. Random Forest was added in v8, so they added in the support functions over the next several releases. What you have is sort of the beginning of the addition of RF into the product. I would say it's probably not fully supported with the addition of those ancillary functions until 9.x.
As to your other question, I'd say that's more of a data science problem than a Vertica one. p-values and z-scores, which you can get from the model summaries of several classification functions, can be used to ascertain the usefulness of an independent variable. If you have a lot of features, you can reduce the feature set while still retaining the numerical significance of the feature set with tools like PCA and SVD, which Vertica does support start in v9. But judging by your set of data, it doesn't seem like you have a lot of independent variables. So those might not be necessary. But I'm wondering if these forums aren't exactly the best place to answer all your questions, as it seems there are larger things at play here. We'd be happy to work with you guys to help you develop a solution to help solve your ML use-case. We've done it for a number of clients who are actively trying to use the predictive capabilities in Vertica to help them do that. If that sounds appealing, I'd suggest you reach up through your management chain to get us involved directly. I don't actively support your account, but I'm happy to help if that's necessary.
5 -
Vertica_Curtis Employee
Great! Don't hesitate to reach out again if you run into questions. Glad to help.
1
Answers
what is your vertica version?
V8
v8.1.1. thanks for quick response
GET_MODEL_SUMMARY was introduced in 9.0. Hence what you are seeing is expected as vertica 8.x doesn't contain that function.
thanks!
SELECT apn,is_iot, predict_logistic_reg(bytesup,bytesdown
USING PARAMETERS model_name='iot_test_ds'
)
AS Prediction
FROM tmp_iot_train;
got the error:
ERROR 3457: Function predict_logistic_reg( int, int) does not exist, or permission is denied for predict_logistic_reg(int, int)
any thought on this?
Are apn, is_iot columns from tmp_iot_train table? Can you share me the output of following
SELECT SUMMARIZE_MODEL('iot_test_ds');
Hi, Sruthia: I sent the result to you via email last Friday, if you did not receive it please let me know
thanks
I found the issue: need to put "public" before procedure. it works now.. thanks all your great support!
Hi,
now I tried to first balance data then normalize data, all these steps work fine.
but when I tried to create the model and I got error: "Unsupported column type [Numeric(2,2)] for column [bytesup_home]"
what is data type need for this model?
Thanks
SELECT BALANCE('balance_iot_balance_data', 'tmp_iot_wa_cls_train_data', 'is_iot', 'under_sampling'
USING PARAMETERS sampling_ratio = 0.05);
SELECT NORMALIZE('normalized_iot_balance_data', 'balance_iot_balance_data', 'bytesup_home,bytesup_work,bytesdown_home,bytesdown_work,dur_home,dur_work', 'minmax');
-----------format column
create or replace view normalized_iot_balance_data_v as
select imei,firstcgi,apn,cfrc
,bytesup_home::NUMERIC(2, 2) bytesup_home
,bytesup_work::NUMERIC(2, 2) bytesup_work
,bytesdown_home::NUMERIC(2, 2) bytesdown_home
,bytesdown_work::NUMERIC(2, 2) bytesdown_work
,dur_home::NUMERIC(2, 2) dur_home
,dur_work::NUMERIC(2, 2) dur_work
,record_cnt
,cft
,bearer_type
,is_iot
,part
from normalized_iot_balance_data;
SELECT public.RF_CLASSIFIER(
'iot_rf_test_ds',
'sandbox.normalized_iot_balance_data_v',
'is_iot',
'cfrc,bytesup_home,bytesup_work,bytesdown_home,bytesdown_work,dur_home,dur_work,record_cnt,cft,bearer_type'
USING PARAMETERS ntree=50, sampling_size=0.2
);
DBC Database Error: ERROR 8135: Problem in rf_classifier.
Detail: Cannot compute input column list.
Detail: Unsupported column type [Numeric(2,2)] for column [bytesup_home]
What you are seeing is expect. We don't support NUMERIC type for RF_CLASSIFER in 8.1.1. We support it in 9.2. please check predictor_columns Argument Description in the below links
https://www.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Functions/MachineLearning/RANDOM_FOREST_CLASSIFIER.htm
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/MachineLearning/RF_CLASSIFIER.htm
great, Sruthia.. thanks for your lighting response....
Thanks!
In V8, any function can be used for evaluating feature importance? like RF_PREDICTOR_IMPORTANCE in v9
In V9, has any function to identify similar variables and Removing redundant features so we can increase accuracy?
totally agree with you! thanks... based on my current study, RF already gave me pretty accurate predication and I just want to explore more possible feature in V8. we will upgrade to V9 soon, we will test all these new features.
thanks again, all your feedback were very helpful!!
actually this is a first time I used Vertica support forum, you guys did awesome job! was much and much better than oracle support, lol
thanks again and have a good weekend....