IsNumeric function(UDF)

Hi!

Question: I want to know if a string value is numeric only. How can I do this?

Its very easy with Vertica to test if a string value is a number:

daniel=> select to_number('0x0abc');
 to_number 
-----------
      2748
(1 row)

but a problem, that it throws exception on non-numeric values:

daniel=> select to_number('1 egg');
ERROR 2826:  Could not convert "1 egg" to a float8

and thats why you can't use it with CASE clause.

So I created UDF Scalar function - is_numeric, that do it:

daniel=> select str, DECODE(is_numeric(str), 't', 'TRUE', 'f', 'FALSE') from nums ;
    str    | case
-----------+-------
 0X0DEAD   | TRUE
 0x0abc    | TRUE
 1 foo bar | FALSE
 1P10      | FALSE    <----- fail
 1e10      | TRUE
 1p10      | FALSE    <----- fail
 40        | TRUE
 45.0      | TRUE
 999.999   | TRUE
 INFINITY  | TRUE
 Infinity  | TRUE
 NAN       | TRUE
 bar       | FALSE
 egg 4     | FALSE
 iNf       | TRUE
 nAn       | TRUE
(16 rows)

FYA: As you can see, my function fails on 1 case - 1P10(1024 in decimal representation).

Tested on Vertica 7,8,9.
Source code here.

Comments

  • You can use a nifty type of cast (::!) for this, which instead of erroring, returns null if the cast fails. (it also works for non-numeric types)
    select str, str::!int is not null from nums;
    (it will error on scalars so don't test "select 'foo'::!int;" and conclude I'm smoking something)

  • Hi!

    Oops... :D thanks.
    So this is an unnecessary function => delete :p

  • You are definitely not the first person to have written this function! (I saw it in our customer diagnostics just last week) And the doc on ::! could be much more notable.
    Also, I'd recommend against the std::string in the while loop - it likely does memory allocation. I think you can get a bar const char * from the BlockReader.
    I sound like a broken record, but we have seen a 7x perf drop due to allocation in the inner loop of a UDx. I believe this was due to a glibc performance bug, but once bitten twice (7x?) shy.

  • edited February 2018

    It also fails on p notation :D:D:D
    ::!float works like a charm

    daniel=> select value as str, value::!int as 'int', value::!float as 'float', value::!numeric(15,8) as 'numeric' from T;
        str     | int |    float    |    numeric 
    ------------+-----+-------------+----------------
     0X0DEAD    |     |       57005 | 57005.00000000
     0x0abc     |     |        2748 |  2748.00000000
     1 foo bar  |     |             |           
     1P10       |     |             |  1024.00000000
     1e10       |     | 10000000000 |           
     1p10       |     |             |  1024.00000000
     40         |  40 |          40 |    40.00000000
     45.0       |     |          45 |    45.00000000
     999.999    |     |     999.999 |   999.99900000
     INFINITY   |     |    Infinity |           
     Infinity   |     |    Infinity |           
     NAN        |     |         NaN |           
     bar        |     |             |           
     egg 4      |     |             |           
     iNf        |     |    Infinity |           
     nAn        |     |         NaN |           
    (16 rows)
    

    Why this notation exists? :D

  • That's a pretty mixed bag, actually. I'm surprised that it fails to parse hex as an int.

  • Also, I'd recommend against the std::string in the while loop - it likely does memory allocation. I think you can get a bar const char * from the BlockReader.

    hm... I tried const char* from BlockReader and it doesn't work as expected, it returns whole column and not a single value.

                do {
                    const char* value = argReader.getStringRef(0).data();
                    char* pointer;
                    strtod(value, &pointer);
                    vbool isNum = (*pointer == 0 || pointer == value);
                    resWriter.setBool(isNum);
                    resWriter.next();
                    srvInterface.log("VALUE: %s",value);
                } while (argReader.next());
    

    On single value(on scalar) it works.

    Also LIMIT clause applied after and not before, i.e. UDF still gets and processes all values from column. Why?

  • The string is not stored null-terminated. Hence strtod is reading additional bytes off the end. There doesn't appear to be an easy c method for number conversion with a length (https://stackoverflow.com/questions/16412840/strtod-with-limited-string-length)
    When we run a UDx, we send it a batch of rows to reduce the overhead of control & data flow to/from the UDx. This can mean the UDx runs on 1000 records despite the LIMIT right above.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    @sKwa - Thanks for making the code available for your function. It'll definitely help out the community with their own code writing endeavors!

  • Actually its unnecessary function :'(.
    As @Ben_Vandiver pointed I used it with vertica7.0 to cast a column.
    But important lessons learned from my conversation with @Ben_Vandiver.


    Say me if you need some functionality that I can implement on C++/Java/Python. I like to develop. BTW: check out my ListAgg function, its analytical function and not transform as GROUP_CONCAT(I think my implementation is little better than GROUP_CONCAT from point of view of performance )

  • edited February 2018

    [DELETED]

  • Use this:
    REGEXP_LIKE(score, '^[0-9]\d*(\".\d+)"?$')
    result is a boolean is_numeric function

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file