We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


IsNumeric function(UDF) — Vertica Forum

IsNumeric function(UDF)

Hi!

Question: I want to know if a string value is numeric only. How can I do this?

Its very easy with Vertica to test if a string value is a number:

daniel=> select to_number('0x0abc');
 to_number 
-----------
      2748
(1 row)

but a problem, that it throws exception on non-numeric values:

daniel=> select to_number('1 egg');
ERROR 2826:  Could not convert "1 egg" to a float8

and thats why you can't use it with CASE clause.

So I created UDF Scalar function - is_numeric, that do it:

daniel=> select str, DECODE(is_numeric(str), 't', 'TRUE', 'f', 'FALSE') from nums ;
    str    | case
-----------+-------
 0X0DEAD   | TRUE
 0x0abc    | TRUE
 1 foo bar | FALSE
 1P10      | FALSE    <----- fail
 1e10      | TRUE
 1p10      | FALSE    <----- fail
 40        | TRUE
 45.0      | TRUE
 999.999   | TRUE
 INFINITY  | TRUE
 Infinity  | TRUE
 NAN       | TRUE
 bar       | FALSE
 egg 4     | FALSE
 iNf       | TRUE
 nAn       | TRUE
(16 rows)

FYA: As you can see, my function fails on 1 case - 1P10(1024 in decimal representation).

Tested on Vertica 7,8,9.
Source code here.

Comments

  • You can use a nifty type of cast (::!) for this, which instead of erroring, returns null if the cast fails. (it also works for non-numeric types)
    select str, str::!int is not null from nums;
    (it will error on scalars so don't test "select 'foo'::!int;" and conclude I'm smoking something)

  • Hi!

    Oops... :D thanks.
    So this is an unnecessary function => delete :p

  • You are definitely not the first person to have written this function! (I saw it in our customer diagnostics just last week) And the doc on ::! could be much more notable.
    Also, I'd recommend against the std::string in the while loop - it likely does memory allocation. I think you can get a bar const char * from the BlockReader.
    I sound like a broken record, but we have seen a 7x perf drop due to allocation in the inner loop of a UDx. I believe this was due to a glibc performance bug, but once bitten twice (7x?) shy.

  • edited February 2018

    It also fails on p notation :D:D:D
    ::!float works like a charm

    daniel=> select value as str, value::!int as 'int', value::!float as 'float', value::!numeric(15,8) as 'numeric' from T;
        str     | int |    float    |    numeric 
    ------------+-----+-------------+----------------
     0X0DEAD    |     |       57005 | 57005.00000000
     0x0abc     |     |        2748 |  2748.00000000
     1 foo bar  |     |             |           
     1P10       |     |             |  1024.00000000
     1e10       |     | 10000000000 |           
     1p10       |     |             |  1024.00000000
     40         |  40 |          40 |    40.00000000
     45.0       |     |          45 |    45.00000000
     999.999    |     |     999.999 |   999.99900000
     INFINITY   |     |    Infinity |           
     Infinity   |     |    Infinity |           
     NAN        |     |         NaN |           
     bar        |     |             |           
     egg 4      |     |             |           
     iNf        |     |    Infinity |           
     nAn        |     |         NaN |           
    (16 rows)
    

    Why this notation exists? :D

  • That's a pretty mixed bag, actually. I'm surprised that it fails to parse hex as an int.

  • Also, I'd recommend against the std::string in the while loop - it likely does memory allocation. I think you can get a bar const char * from the BlockReader.

    hm... I tried const char* from BlockReader and it doesn't work as expected, it returns whole column and not a single value.

                do {
                    const char* value = argReader.getStringRef(0).data();
                    char* pointer;
                    strtod(value, &pointer);
                    vbool isNum = (*pointer == 0 || pointer == value);
                    resWriter.setBool(isNum);
                    resWriter.next();
                    srvInterface.log("VALUE: %s",value);
                } while (argReader.next());
    

    On single value(on scalar) it works.

    Also LIMIT clause applied after and not before, i.e. UDF still gets and processes all values from column. Why?

  • The string is not stored null-terminated. Hence strtod is reading additional bytes off the end. There doesn't appear to be an easy c method for number conversion with a length (https://stackoverflow.com/questions/16412840/strtod-with-limited-string-length)
    When we run a UDx, we send it a batch of rows to reduce the overhead of control & data flow to/from the UDx. This can mean the UDx runs on 1000 records despite the LIMIT right above.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    @sKwa - Thanks for making the code available for your function. It'll definitely help out the community with their own code writing endeavors!

  • Actually its unnecessary function :'(.
    As @Ben_Vandiver pointed I used it with vertica7.0 to cast a column.
    But important lessons learned from my conversation with @Ben_Vandiver.


    Say me if you need some functionality that I can implement on C++/Java/Python. I like to develop. BTW: check out my ListAgg function, its analytical function and not transform as GROUP_CONCAT(I think my implementation is little better than GROUP_CONCAT from point of view of performance )

  • edited February 2018

    [DELETED]

  • Use this:
    REGEXP_LIKE(score, '^[0-9]\d*(\".\d+)"?$')
    result is a boolean is_numeric function

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file