We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Error when review kmeans result — Vertica Forum

Error when review kmeans result

select kmeans('timeout_kmeans_model_1', 'timeout_anomaly_detect_view', '*', 4
using parameters max_iterations=20,
output_view='myKmeansTimeoutView1',
exclude_columns = 'sql_interval, my_node_address, partner_carrier_no',
key_columns= 'sql_interval, my_node_address, partner_carrier_no');
Finished in 6 iterations
select * from myKmeansTimeoutView1;
Got the below error:

ERROR 5861: Error calling setup() in User Function apply_kmeans at [src/Common/GPredict.cpp:42], error code: 0, message: Error in setup: [Error in setup: Invalid column '']

Comments

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Which version of Vertica are you using?

  • Vertica Analytic Database v8.1.1-10

  • The view, myKmeansTimeoutView1, is created. I can see it from the catalog.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Hi,

    Can you run the following and post the results?

    select export_objects('','myKmeansTimeoutView1');

  • Yes.
    Here is the result.
    export_objects|

    CREATE VIEW ML.myKmeansTimeoutView1 AS
    SELECT timeout_anomaly_detect_view.sql_interval,
    timeout_anomaly_detect_view.my_node_address,
    timeout_anomaly_detect_view.partner_carrier_no,
    public.apply_kmeans(timeout_anomaly_detect_view.hist_comparator, timeout_anomaly_detect_view.peer_comparator USING PARAMETERS model_name='ML.timeout_kmeans_model_1'::varchar(386)) AS cluster_id
    FROM ML.timeout_anomaly_detect_view;

    SELECT MARK_DESIGN_KSAFE(1);
    |

  • Thank you for your help.

  • edited February 2018

    @lijiang
    Your problem in USING PARAMETERS definition:

    • exclude_columns = 'sql_interval, my_node_address, partner_carrier_no'
    • key_columns= 'sql_interval, my_node_address, partner_carrier_no'

    You excluded all columns, thats why you got error: Invalid column ''. Do you understand what do kmeans and how to use it?

    @Jim_Knicely
    Will Vertica open API for "how to create db objects like view with UDF"? kmeans is an UDF Transform function and it creates VIEW, but how(its not documented)?
    (Answer: No, its for internal usage - also accepted).

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Hi,

    I'm using Vertica 9. It works okay:

    dbadmin=> select kmeans('timeout_kmeans_model_1', 'timeout_anomaly_detect_view', '*', 4
    dbadmin(> using parameters max_iterations=20,
    dbadmin(> output_view='myKmeansTimeoutView1',
    dbadmin(> exclude_columns = 'sql_interval, my_node_address, partner_carrier_no',
    dbadmin(> key_columns= 'sql_interval, my_node_address, partner_carrier_no');
    WARNING 7858:  Only found 2 non-empty clusters. You may want to try again or use a better set of initial centers; or there may be fewer than k distinct datapoints in the table
              kmeans
    ---------------------------
     Finished in 1 iterations
    
    (1 row)
    
    dbadmin=> select * from myKmeansTimeoutView1;
     sql_interval | my_node_address | partner_carrier_no | cluster_id
    --------------+-----------------+--------------------+------------
     TEST         | TEST            |                  1 |          1
     TEST2        | TEST2           |                  2 |          0
     TEST2        | TEST2           |                  2 |          0
     TEST2        | TEST2           |                  2 |          0
                  |                 |                    |
     TEST2        | TEST2           |                  2 |          0
    (6 rows)
    

    I will check if there is a bug in 8.1.1.

    @lijiang Can you run the following and post the results?

    select export_objects('','timeout_anomaly_detect_view');

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited February 2018

    @sKwa - Will Vertica open API for "how to create db objects like view with UDF"?

    No, it's for internal usage ;)

  • I did not exclude all columns from the training data.
    There are 5 columns, sql_interval, my_node_address, partner_carrier_no, hist_comparator, and peer_comparator, in the data view. The three columns specified are the key columns. I'm asking KMEAN to label the data in 4 clusters based on hist_comparator and peer_comparator for each key value, which is the combination of the three columns specified in the select statement.

  • select export_objects('','timeout_anomaly_detect_view');
    returns the results below:
    export_objects|
    CREATE VIEW ML.timeout_anomaly_detect_view AS
    SELECT a.sql_interval,
    a.my_node_address,
    a.partner_carrier_no,
    round(CASE WHEN (a.timeout_msg_stddev = 0::numeric(18,0)) THEN 0::numeric(18,0) ELSE (abs((a.timeout_tdr_count - a.timeout_msg_avg)) / a.timeout_msg_stddev) END, 0) AS hist_comparator,
    (a.timeout_rate / a.timeout_rate_all) AS peer_comparator
    FROM ( SELECT kmean_training_data_hr.sql_interval,
    kmean_training_data_hr.my_node_address,
    kmean_training_data_hr.partner_carrier_no,
    kmean_training_data_hr.timeout_tdr_count,
    CASE WHEN (kmean_training_data_hr.total_tdr_count = 0) THEN 0::numeric(18,0) ELSE (kmean_training_data_hr.timeout_tdr_count / kmean_training_data_hr.total_tdr_count) END AS timeout_rate,
    kmean_training_data_hr.timeout_msg_avg,
    kmean_training_data_hr.timeout_msg_stddev,
    round((kmean_training_data_hr.total_msg_avg / sum(kmean_training_data_hr.total_msg_avg) OVER (PARTITION BY kmean_training_data_hr.sql_interval)), 2) AS traffic_percentage,
    CASE WHEN (sum(kmean_training_data_hr.total_tdr_count) OVER (PARTITION BY kmean_training_data_hr.sql_interval) = 0) THEN 0::numeric(18,0) ELSE (sum(kmean_training_data_hr.timeout_tdr_count) OVER (PARTITION BY kmean_training_data_hr.sql_interval) / sum(kmean_training_data_hr.total_tdr_count) OVER (PARTITION BY kmean_training_data_hr.sql_interval)) END AS timeout_rate_all
    FROM ML.kmean_training_data_hr) a
    WHERE ((a.traffic_percentage > 0.001) AND (a.timeout_rate_all > 0::numeric(18,0)));

    SELECT MARK_DESIGN_KSAFE(1);
    |

  • @Jim_Knicely

    So can you give a link to a source code of kmeans? o:)

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    @lijiang - Sorry, one more:

    select export_objects('','ML.kmean_training_data_hr');

    I want to see the data types.

  • edited February 2018

    @lijiang

    I did not exclude all columns from the training data.
    There are 5 columns [...]

    Don't get me wrong, I just asked do you understand what you do. Question do not includes all info(probably @Jim_Knicely has access to more info).

    here is formatted output:

    CREATE VIEW ml.timeout_anomaly_detect_view
    AS
      SELECT a.sql_interval,
             a.my_node_address,
             a.partner_carrier_no,
             Round(CASE
                     WHEN ( a.timeout_msg_stddev = 0 :: Numeric(18, 0) ) THEN 0 ::
                     Numeric(18, 0)
                     ELSE ( Abs(( a.timeout_tdr_count - a.timeout_msg_avg )) /
                            a.timeout_msg_stddev )
                   END, 0)                           AS hist_comparator,
             ( a.timeout_rate / a.timeout_rate_all ) AS peer_comparator
      FROM   (SELECT kmean_training_data_hr.sql_interval,
                     kmean_training_data_hr.my_node_address,
                     kmean_training_data_hr.partner_carrier_no,
                     kmean_training_data_hr.timeout_tdr_count,
                     CASE
                       WHEN ( kmean_training_data_hr.total_tdr_count = 0 ) THEN
                       0 :: Numeric(18, 0)
                       ELSE ( kmean_training_data_hr.timeout_tdr_count /
                                     kmean_training_data_hr.total_tdr_count )
                     END
                            AS timeout_rate,
                     kmean_training_data_hr.timeout_msg_avg,
                     kmean_training_data_hr.timeout_msg_stddev,
                     Round(( kmean_training_data_hr.total_msg_avg / Sum(
                             kmean_training_data_hr.total_msg_avg)
                                     OVER (
                                       PARTITION BY
                                     kmean_training_data_hr.sql_interval) ), 2
                            ) AS
                     traffic_percentage,
                     CASE
                       WHEN ( Sum(kmean_training_data_hr.total_tdr_count)
                                OVER (
                                  PARTITION BY kmean_training_data_hr.sql_interval)
                              =
                              0 )
                            THEN 0 ::
                       Numeric(18, 0)
                       ELSE ( Sum(kmean_training_data_hr.timeout_tdr_count)
                                OVER (
                                  PARTITION BY kmean_training_data_hr.sql_interval)
                              /
                              Sum(
                                     kmean_training_data_hr.total_tdr_count)
                                       OVER (
                                         PARTITION BY
                                       kmean_training_data_hr.sql_interval) )
                     END
                            AS timeout_rate_all
              FROM   ml.kmean_training_data_hr) a
      WHERE  ( ( a.traffic_percentage > 0.001 )
    
  • select export_objects('','ML.kmean_training_data_hr');
    Returns:
    export_objects|

    CREATE TABLE ML.kmean_training_data_hr
    (
    sql_interval timestamp NOT NULL,
    my_carrier_no int NOT NULL,
    message_type varchar(20) NOT NULL,
    my_node_address varchar(32) NOT NULL,
    partner_carrier_no int NOT NULL,
    failure_tdr_count int NOT NULL,
    timeout_tdr_count int NOT NULL,
    failure_imsi_count int NOT NULL,
    timeout_imsi_count int NOT NULL,
    success_imsi_count int NOT NULL,
    total_tdr_count int NOT NULL,
    total_msg_avg int NOT NULL,
    total_msg_stddev numeric(16,6) NOT NULL,
    success_msg_avg int NOT NULL,
    success_msg_stddev numeric(16,6) NOT NULL,
    failure_msg_avg int NOT NULL,
    failure_msg_stddev numeric(16,6) NOT NULL,
    timeout_msg_avg int NOT NULL,
    timeout_msg_stddev numeric(16,6) NOT NULL,
    total_roamer_avg int NOT NULL,
    total_roamer_stddev numeric(16,6) NOT NULL,
    success_roamer_avg int NOT NULL,
    success_roamer_stddev numeric(16,6) NOT NULL,
    failure_roamer_avg int NOT NULL,
    failure_roamer_stddev numeric(16,6) NOT NULL,
    timeout_roamer_avg int NOT NULL,
    timeout_roamer_stddev numeric(16,6) NOT NULL
    )
    PARTITION BY ((kmean_training_data_hr.sql_interval)::date);

    CREATE PROJECTION ML.kmean_training_data_hr_super
    (
    sql_interval ENCODING RLE,
    my_carrier_no ENCODING RLE,
    message_type ENCODING RLE,
    my_node_address,
    partner_carrier_no,
    failure_tdr_count,
    timeout_tdr_count,
    failure_imsi_count,
    timeout_imsi_count,
    success_imsi_count,
    total_tdr_count,
    total_msg_avg,
    total_msg_stddev,
    success_msg_avg,
    success_msg_stddev,
    failure_msg_avg,
    failure_msg_stddev,
    timeout_msg_avg,
    timeout_msg_stddev,
    total_roamer_avg,
    total_roamer_stddev,
    success_roamer_avg,
    success_roamer_stddev,
    failure_roamer_avg,
    failure_roamer_stddev,
    timeout_roamer_avg,
    timeout_roamer_stddev
    )
    AS
    SELECT kmean_training_data_hr.sql_interval,
    kmean_training_data_hr.my_carrier_no,
    kmean_training_data_hr.message_type,
    kmean_training_data_hr.my_node_address,
    kmean_training_data_hr.partner_carrier_no,
    kmean_training_data_hr.failure_tdr_count,
    kmean_training_data_hr.timeout_tdr_count,
    kmean_training_data_hr.failure_imsi_count,
    kmean_training_data_hr.timeout_imsi_count,
    kmean_training_data_hr.success_imsi_count,
    kmean_training_data_hr.total_tdr_count,
    kmean_training_data_hr.total_msg_avg,
    kmean_training_data_hr.total_msg_stddev,
    kmean_training_data_hr.success_msg_avg,
    kmean_training_data_hr.success_msg_stddev,
    kmean_training_data_hr.failure_msg_avg,
    kmean_training_data_hr.failure_msg_stddev,
    kmean_training_data_hr.timeout_msg_avg,
    kmean_training_data_hr.timeout_msg_stddev,
    kmean_training_data_hr.total_roamer_avg,
    kmean_training_data_hr.total_roamer_stddev,
    kmean_training_data_hr.success_roamer_avg,
    kmean_training_data_hr.success_roamer_stddev,
    kmean_training_data_hr.failure_roamer_avg,
    kmean_training_data_hr.failure_roamer_stddev,
    kmean_training_data_hr.timeout_roamer_avg,
    kmean_training_data_hr.timeout_roamer_stddev
    FROM ML.kmean_training_data_hr
    ORDER BY kmean_training_data_hr.my_carrier_no,
    kmean_training_data_hr.message_type,
    kmean_training_data_hr.sql_interval,
    kmean_training_data_hr.my_node_address,
    kmean_training_data_hr.partner_carrier_no
    SEGMENTED BY hash(kmean_training_data_hr.sql_interval, kmean_training_data_hr.total_msg_avg) ALL NODES KSAFE 1;

    SELECT MARK_DESIGN_KSAFE(1);
    |

    Thank you for your help.

  • @Jim_Knicely

    FYI, the training data view, timeout_anomaly_detect_view, returns 402 rows. Some of the values of hist_comparator and peer_comparator are very small. For example, 0E-54. Could this be the problem?

    I'll replace the small values by zero and give it a try.

  • No luck. Still get the same error. Training data are attached.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Can you try creating the model specifying the schema name ML for the objects?

    select kmeans('ML.timeout_kmeans_model_1', 'ML.timeout_anomaly_detect_view', '*', 4 
    using parameters max_iterations=20, 
    output_view='ML.myKmeansTimeoutView1',
    exclude_columns = 'sql_interval, my_node_address, partner_carrier_no', 
    key_columns= 'sql_interval, my_node_address, partner_carrier_no');
    
    select * from ML.myKmeansTimeoutView1;
    
  • run the following and got the same error.
    select kmeans('ML.timeout_kmeans_model_3', 'ML.timeout_anomaly_detect_view', '*', 4
    using parameters max_iterations=20,
    output_view='ML.myKmeansTimeoutView4',
    exclude_columns = 'sql_interval, my_node_address, partner_carrier_no',
    key_columns= 'sql_interval, my_node_address, partner_carrier_no');

    select * from ML.myKmeansTimeoutView4;

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Ok, thanks! Let me look into this a little deeper for you.

  • Much appreciated.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    @lijiang - I believe that you are hitting a bug in 8.1.1 where column aliases were not being passed to UDXs (i.e. apply_kmeans). This is fixed in Vertica 9.0.1.

    If you can't upgrade, see if this works:

    dbadmin=> SELECT set_vertica_options('OPT', 'DISABLE_FLATTEN_SUBQUERY');
                             set_vertica_options
    ----------------------------------------------------------------------
    
    Opt Vertica Options
    --------------------
    DISABLE_FLATTEN_SUBQUERY
    
    
    (1 row)
    
    dbadmin=> SELECT * FROM ML.myKmeansTimeoutView1;
            sql_interval        | my_node_address | partner_carrier_no | cluster_id
    ----------------------------+-----------------+--------------------+------------
     2018-02-07 11:39:58.281504 | T               |                  1 |          0
     2018-02-07 11:40:02.270718 | T               |                  1 |          0
     2018-02-08 11:40:51.948821 | T               |                  2 |          0
     2018-02-08 11:40:54.343808 | T               |                  2 |          0
     2018-02-10 11:41:07.615797 | T               |                  2 |          0
     2018-02-10 11:41:08.799745 | T               |                  2 |          0
    (6 rows)
    
    dbadmin=> SELECT clr_vertica_options('OPT', 'DISABLE_FLATTEN_SUBQUERY');
     clr_vertica_options
    ---------------------
    
    
    (1 row)
    
  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited February 2018

    @lijiang - Another work around might be to create a top level view that does not have column aliases for calculated fields...

    Example:

    dbadmin=> CREATE OR REPLACE VIEW ML.timeout_anomaly_detect_view_top AS SELECT * FROM ML.timeout_anomaly_detect_view;
    CREATE VIEW
    
    dbadmin=> SELECT sql_interval, my_node_address, partner_carrier_no,
    dbadmin->        public.apply_kmeans(hist_comparator, peer_comparator USING PARAMETERS model_name='ML.timeout_kmeans_model_1'::varchar(386)) AS cluster_id
    dbadmin->   FROM ML.timeout_anomaly_detect_view_top;
            sql_interval        | my_node_address | partner_carrier_no | cluster_id
    ----------------------------+-----------------+--------------------+------------
     2018-02-07 11:39:58.281504 | T               |                  1 |          0
     2018-02-07 11:40:02.270718 | T               |                  1 |          0
     2018-02-08 11:40:51.948821 | T               |                  2 |          0
     2018-02-08 11:40:54.343808 | T               |                  2 |          0
     2018-02-10 11:41:07.615797 | T               |                  2 |          0
     2018-02-10 11:41:08.799745 | T               |                  2 |          0
    (6 rows)
    

    If the view does not help, creating a table will work for sure...

    dbadmin=> CREATE TABLE ML.timeout_anomaly_detect_view_table AS SELECT * FROM ML.timeout_anomaly_detect_view;
    CREATE TABLE
    
    dbadmin=> SELECT sql_interval, my_node_address, partner_carrier_no,
    dbadmin->        public.apply_kmeans(hist_comparator, peer_comparator USING PARAMETERS model_name='ML.timeout_kmeans_model_1'::varchar(386)) AS cluster_id
    dbadmin->   FROM ML.timeout_anomaly_detect_view_table;
            sql_interval        | my_node_address | partner_carrier_no | cluster_id
    ----------------------------+-----------------+--------------------+------------
     2018-02-07 11:39:58.281504 | T               |                  1 |          0
     2018-02-07 11:40:02.270718 | T               |                  1 |          0
     2018-02-08 11:40:51.948821 | T               |                  2 |          0
     2018-02-08 11:40:54.343808 | T               |                  2 |          0
     2018-02-10 11:41:07.615797 | T               |                  2 |          0
     2018-02-10 11:41:08.799745 | T               |                  2 |          0
    (6 rows)
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file