Error when review kmeans result
select kmeans('timeout_kmeans_model_1', 'timeout_anomaly_detect_view', '*', 4
using parameters max_iterations=20,
output_view='myKmeansTimeoutView1',
exclude_columns = 'sql_interval, my_node_address, partner_carrier_no',
key_columns= 'sql_interval, my_node_address, partner_carrier_no');
Finished in 6 iterations
select * from myKmeansTimeoutView1;
Got the below error:
ERROR 5861: Error calling setup() in User Function apply_kmeans at [src/Common/GPredict.cpp:42], error code: 0, message: Error in setup: [Error in setup: Invalid column '']
0
Comments
Which version of Vertica are you using?
Vertica Analytic Database v8.1.1-10
The view, myKmeansTimeoutView1, is created. I can see it from the catalog.
Hi,
Can you run the following and post the results?
select export_objects('','myKmeansTimeoutView1');
Yes.
Here is the result.
export_objects|
CREATE VIEW ML.myKmeansTimeoutView1 AS
SELECT timeout_anomaly_detect_view.sql_interval,
timeout_anomaly_detect_view.my_node_address,
timeout_anomaly_detect_view.partner_carrier_no,
public.apply_kmeans(timeout_anomaly_detect_view.hist_comparator, timeout_anomaly_detect_view.peer_comparator USING PARAMETERS model_name='ML.timeout_kmeans_model_1'::varchar(386)) AS cluster_id
FROM ML.timeout_anomaly_detect_view;
SELECT MARK_DESIGN_KSAFE(1);
|
Thank you for your help.
@lijiang
Your problem in
USING PARAMETERS
definition:exclude_columns = 'sql_interval, my_node_address, partner_carrier_no'
key_columns= 'sql_interval, my_node_address, partner_carrier_no'
You excluded all columns, thats why you got error:
Invalid column ''
. Do you understand what dokmeans
and how to use it?@Jim_Knicely
Will Vertica open API for "how to create db objects like view with UDF"?
kmeans
is an UDF Transform function and it createsVIEW
, but how(its not documented)?(Answer: No, its for internal usage - also accepted).
Hi,
I'm using Vertica 9. It works okay:
I will check if there is a bug in 8.1.1.
@lijiang Can you run the following and post the results?
select export_objects('','timeout_anomaly_detect_view');
@sKwa - Will Vertica open API for "how to create db objects like view with UDF"?
No, it's for internal usage
I did not exclude all columns from the training data.
There are 5 columns, sql_interval, my_node_address, partner_carrier_no, hist_comparator, and peer_comparator, in the data view. The three columns specified are the key columns. I'm asking KMEAN to label the data in 4 clusters based on hist_comparator and peer_comparator for each key value, which is the combination of the three columns specified in the select statement.
select export_objects('','timeout_anomaly_detect_view');
returns the results below:
export_objects|
CREATE VIEW ML.timeout_anomaly_detect_view AS
SELECT a.sql_interval,
a.my_node_address,
a.partner_carrier_no,
round(CASE WHEN (a.timeout_msg_stddev = 0::numeric(18,0)) THEN 0::numeric(18,0) ELSE (abs((a.timeout_tdr_count - a.timeout_msg_avg)) / a.timeout_msg_stddev) END, 0) AS hist_comparator,
(a.timeout_rate / a.timeout_rate_all) AS peer_comparator
FROM ( SELECT kmean_training_data_hr.sql_interval,
kmean_training_data_hr.my_node_address,
kmean_training_data_hr.partner_carrier_no,
kmean_training_data_hr.timeout_tdr_count,
CASE WHEN (kmean_training_data_hr.total_tdr_count = 0) THEN 0::numeric(18,0) ELSE (kmean_training_data_hr.timeout_tdr_count / kmean_training_data_hr.total_tdr_count) END AS timeout_rate,
kmean_training_data_hr.timeout_msg_avg,
kmean_training_data_hr.timeout_msg_stddev,
round((kmean_training_data_hr.total_msg_avg / sum(kmean_training_data_hr.total_msg_avg) OVER (PARTITION BY kmean_training_data_hr.sql_interval)), 2) AS traffic_percentage,
CASE WHEN (sum(kmean_training_data_hr.total_tdr_count) OVER (PARTITION BY kmean_training_data_hr.sql_interval) = 0) THEN 0::numeric(18,0) ELSE (sum(kmean_training_data_hr.timeout_tdr_count) OVER (PARTITION BY kmean_training_data_hr.sql_interval) / sum(kmean_training_data_hr.total_tdr_count) OVER (PARTITION BY kmean_training_data_hr.sql_interval)) END AS timeout_rate_all
FROM ML.kmean_training_data_hr) a
WHERE ((a.traffic_percentage > 0.001) AND (a.timeout_rate_all > 0::numeric(18,0)));
SELECT MARK_DESIGN_KSAFE(1);
|
@Jim_Knicely
So can you give a link to a source code of
kmeans
?@lijiang - Sorry, one more:
select export_objects('','ML.kmean_training_data_hr');
I want to see the data types.
@lijiang
Don't get me wrong, I just asked do you understand what you do. Question do not includes all info(probably @Jim_Knicely has access to more info).
here is formatted output:
select export_objects('','ML.kmean_training_data_hr');
Returns:
export_objects|
CREATE TABLE ML.kmean_training_data_hr
(
sql_interval timestamp NOT NULL,
my_carrier_no int NOT NULL,
message_type varchar(20) NOT NULL,
my_node_address varchar(32) NOT NULL,
partner_carrier_no int NOT NULL,
failure_tdr_count int NOT NULL,
timeout_tdr_count int NOT NULL,
failure_imsi_count int NOT NULL,
timeout_imsi_count int NOT NULL,
success_imsi_count int NOT NULL,
total_tdr_count int NOT NULL,
total_msg_avg int NOT NULL,
total_msg_stddev numeric(16,6) NOT NULL,
success_msg_avg int NOT NULL,
success_msg_stddev numeric(16,6) NOT NULL,
failure_msg_avg int NOT NULL,
failure_msg_stddev numeric(16,6) NOT NULL,
timeout_msg_avg int NOT NULL,
timeout_msg_stddev numeric(16,6) NOT NULL,
total_roamer_avg int NOT NULL,
total_roamer_stddev numeric(16,6) NOT NULL,
success_roamer_avg int NOT NULL,
success_roamer_stddev numeric(16,6) NOT NULL,
failure_roamer_avg int NOT NULL,
failure_roamer_stddev numeric(16,6) NOT NULL,
timeout_roamer_avg int NOT NULL,
timeout_roamer_stddev numeric(16,6) NOT NULL
)
PARTITION BY ((kmean_training_data_hr.sql_interval)::date);
CREATE PROJECTION ML.kmean_training_data_hr_super
(
sql_interval ENCODING RLE,
my_carrier_no ENCODING RLE,
message_type ENCODING RLE,
my_node_address,
partner_carrier_no,
failure_tdr_count,
timeout_tdr_count,
failure_imsi_count,
timeout_imsi_count,
success_imsi_count,
total_tdr_count,
total_msg_avg,
total_msg_stddev,
success_msg_avg,
success_msg_stddev,
failure_msg_avg,
failure_msg_stddev,
timeout_msg_avg,
timeout_msg_stddev,
total_roamer_avg,
total_roamer_stddev,
success_roamer_avg,
success_roamer_stddev,
failure_roamer_avg,
failure_roamer_stddev,
timeout_roamer_avg,
timeout_roamer_stddev
)
AS
SELECT kmean_training_data_hr.sql_interval,
kmean_training_data_hr.my_carrier_no,
kmean_training_data_hr.message_type,
kmean_training_data_hr.my_node_address,
kmean_training_data_hr.partner_carrier_no,
kmean_training_data_hr.failure_tdr_count,
kmean_training_data_hr.timeout_tdr_count,
kmean_training_data_hr.failure_imsi_count,
kmean_training_data_hr.timeout_imsi_count,
kmean_training_data_hr.success_imsi_count,
kmean_training_data_hr.total_tdr_count,
kmean_training_data_hr.total_msg_avg,
kmean_training_data_hr.total_msg_stddev,
kmean_training_data_hr.success_msg_avg,
kmean_training_data_hr.success_msg_stddev,
kmean_training_data_hr.failure_msg_avg,
kmean_training_data_hr.failure_msg_stddev,
kmean_training_data_hr.timeout_msg_avg,
kmean_training_data_hr.timeout_msg_stddev,
kmean_training_data_hr.total_roamer_avg,
kmean_training_data_hr.total_roamer_stddev,
kmean_training_data_hr.success_roamer_avg,
kmean_training_data_hr.success_roamer_stddev,
kmean_training_data_hr.failure_roamer_avg,
kmean_training_data_hr.failure_roamer_stddev,
kmean_training_data_hr.timeout_roamer_avg,
kmean_training_data_hr.timeout_roamer_stddev
FROM ML.kmean_training_data_hr
ORDER BY kmean_training_data_hr.my_carrier_no,
kmean_training_data_hr.message_type,
kmean_training_data_hr.sql_interval,
kmean_training_data_hr.my_node_address,
kmean_training_data_hr.partner_carrier_no
SEGMENTED BY hash(kmean_training_data_hr.sql_interval, kmean_training_data_hr.total_msg_avg) ALL NODES KSAFE 1;
SELECT MARK_DESIGN_KSAFE(1);
|
Thank you for your help.
@Jim_Knicely
FYI, the training data view, timeout_anomaly_detect_view, returns 402 rows. Some of the values of hist_comparator and peer_comparator are very small. For example, 0E-54. Could this be the problem?
I'll replace the small values by zero and give it a try.
No luck. Still get the same error. Training data are attached.
Can you try creating the model specifying the schema name ML for the objects?
run the following and got the same error.
select kmeans('ML.timeout_kmeans_model_3', 'ML.timeout_anomaly_detect_view', '*', 4
using parameters max_iterations=20,
output_view='ML.myKmeansTimeoutView4',
exclude_columns = 'sql_interval, my_node_address, partner_carrier_no',
key_columns= 'sql_interval, my_node_address, partner_carrier_no');
select * from ML.myKmeansTimeoutView4;
Ok, thanks! Let me look into this a little deeper for you.
Much appreciated.
@lijiang - I believe that you are hitting a bug in 8.1.1 where column aliases were not being passed to UDXs (i.e. apply_kmeans). This is fixed in Vertica 9.0.1.
If you can't upgrade, see if this works:
@lijiang - Another work around might be to create a top level view that does not have column aliases for calculated fields...
Example:
If the view does not help, creating a table will work for sure...