Need to compare column value and identify outlier records which shows huge spike in data

Dear All,

Can you please help me develop query to solve below cases

Below is one sample data set

CUST_ID BATCH1 BATCH2 BATCH3 BATCH4 DATE
2573 265 265 265 265 9/18/2018
3205 261 261 261 261 9/18/2018
3211 375 375 375 375 9/18/2018
3292 207 207 207 207 9/18/2018
3755 203 8996 203 203 9/18/2018
3640 6244 216 216 216 9/18/2018

Here I need to identify all those cust_id where among (col2 ,col3 ,col4,col5 i.e Batch1 , Batch2 , Batch3 , Batch 3 , Batch 4 values has huge spike

So Here I am expecting output like below

CUST_ID BATCH1 BATCH2 BATCH3 BATCH4 DATE
3755 203 8996 203 203 9/18/2018
3640 6244 216 216 216 9/18/2018
I found DETECT_OUTLIERS function but it seems in my environment this function is not supported . In my current environment Vertica 8.1 version is there

Comments

  • Jim_KnicelyJim_Knicely Administrator, Moderator, Employee, Registered User, VerticaExpert
    edited September 25

    Hi,

    The DETECT_OUTLIERS function is available in Vertica 8.1.

    See:

    https://www.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/AnalyzingData/MachineLearning/DataPreparation/DetectingOutliers.htm

    Example:

    dbadmin=> SELECT version();
                   version
    -------------------------------------
     Vertica Analytic Database v8.1.1-18
    (1 row)
    
    dbadmin=> SELECT * FROM test_nutty_values;
     cust_id | batch1 | batch2 | batch3 | batch4 |    DATE
    ---------+--------+--------+--------+--------+------------
        2573 |    265 |    265 |    265 |    265 | 2018-09-18
        3205 |    261 |    261 |    261 |    261 | 2018-09-18
        3211 |    375 |    375 |    375 |    375 | 2018-09-18
        3292 |    207 |    207 |    207 |    207 | 2018-09-18
        3755 |    203 |   8996 |    203 |    203 | 2018-09-18
        3640 |   6244 |    216 |    216 |    216 | 2018-09-18
    (6 rows)
    
    dbadmin=> SELECT DETECT_OUTLIERS('test_nutty_values_outliers', 'test_nutty_values', 'cust_id, batch1, batch2, batch3, batch4', 'robust_zscore' USING PARAMETERS outlier_threshold=4.0);
       DETECT_OUTLIERS
    ----------------------
     Detected 2 outliers
    
    (1 row)
    
    dbadmin=> SELECT * FROM test_nutty_values_outliers;
     cust_id | batch1 | batch2 | batch3 | batch4 |    DATE
    ---------+--------+--------+--------+--------+------------
        3640 |   6244 |    216 |    216 |    216 | 2018-09-18
        3755 |    203 |   8996 |    203 |    203 | 2018-09-18
    (2 rows)
    

    Maybe have the DBA reinstall the Machine Learning package if you cannot access the function.

    Example:

    [[email protected] ~]$ admintools -t install_package -d test_db -P MachineLearning --force-reinstall
    Installing package MachineLearning...
    ...Success!
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file