IsNumeric function(UDF)

sKwasKwa Registered User

Hi!

Question: I want to know if a string value is numeric only. How can I do this?

Its very easy with Vertica to test if a string value is a number:

daniel=> select to_number('0x0abc');
 to_number 
-----------
      2748
(1 row)

but a problem, that it throws exception on non-numeric values:

daniel=> select to_number('1 egg');
ERROR 2826:  Could not convert "1 egg" to a float8

and thats why you can't use it with CASE clause.

So I created UDF Scalar function - is_numeric, that do it:

daniel=> select str, DECODE(is_numeric(str), 't', 'TRUE', 'f', 'FALSE') from nums ;
    str    | case
-----------+-------
 0X0DEAD   | TRUE
 0x0abc    | TRUE
 1 foo bar | FALSE
 1P10      | FALSE    <----- fail
 1e10      | TRUE
 1p10      | FALSE    <----- fail
 40        | TRUE
 45.0      | TRUE
 999.999   | TRUE
 INFINITY  | TRUE
 Infinity  | TRUE
 NAN       | TRUE
 bar       | FALSE
 egg 4     | FALSE
 iNf       | TRUE
 nAn       | TRUE
(16 rows)

FYA: As you can see, my function fails on 1 case - 1P10(1024 in decimal representation).

Tested on Vertica 7,8,9.
Source code here.

Comments

  • Ben_VandiverBen_Vandiver Employee, Registered User, VerticaExpert

    You can use a nifty type of cast (::!) for this, which instead of erroring, returns null if the cast fails. (it also works for non-numeric types)
    select str, str::!int is not null from nums;
    (it will error on scalars so don't test "select 'foo'::!int;" and conclude I'm smoking something)

  • sKwasKwa Registered User

    Hi!

    Oops... :D thanks.
    So this is an unnecessary function => delete :p

  • Ben_VandiverBen_Vandiver Employee, Registered User, VerticaExpert

    You are definitely not the first person to have written this function! (I saw it in our customer diagnostics just last week) And the doc on ::! could be much more notable.
    Also, I'd recommend against the std::string in the while loop - it likely does memory allocation. I think you can get a bar const char * from the BlockReader.
    I sound like a broken record, but we have seen a 7x perf drop due to allocation in the inner loop of a UDx. I believe this was due to a glibc performance bug, but once bitten twice (7x?) shy.

  • sKwasKwa Registered User
    edited February 7

    It also fails on p notation :D :D :D
    ::!float works like a charm

    daniel=> select value as str, value::!int as 'int', value::!float as 'float', value::!numeric(15,8) as 'numeric' from T;
        str     | int |    float    |    numeric 
    ------------+-----+-------------+----------------
     0X0DEAD    |     |       57005 | 57005.00000000
     0x0abc     |     |        2748 |  2748.00000000
     1 foo bar  |     |             |           
     1P10       |     |             |  1024.00000000
     1e10       |     | 10000000000 |           
     1p10       |     |             |  1024.00000000
     40         |  40 |          40 |    40.00000000
     45.0       |     |          45 |    45.00000000
     999.999    |     |     999.999 |   999.99900000
     INFINITY   |     |    Infinity |           
     Infinity   |     |    Infinity |           
     NAN        |     |         NaN |           
     bar        |     |             |           
     egg 4      |     |             |           
     iNf        |     |    Infinity |           
     nAn        |     |         NaN |           
    (16 rows)
    

    Why this notation exists? :D

  • Ben_VandiverBen_Vandiver Employee, Registered User, VerticaExpert

    That's a pretty mixed bag, actually. I'm surprised that it fails to parse hex as an int.

  • sKwasKwa Registered User

    Also, I'd recommend against the std::string in the while loop - it likely does memory allocation. I think you can get a bar const char * from the BlockReader.

    hm... I tried const char* from BlockReader and it doesn't work as expected, it returns whole column and not a single value.

                do {
                    const char* value = argReader.getStringRef(0).data();
                    char* pointer;
                    strtod(value, &pointer);
                    vbool isNum = (*pointer == 0 || pointer == value);
                    resWriter.setBool(isNum);
                    resWriter.next();
                    srvInterface.log("VALUE: %s",value);
                } while (argReader.next());
    

    On single value(on scalar) it works.

    Also LIMIT clause applied after and not before, i.e. UDF still gets and processes all values from column. Why?

  • Ben_VandiverBen_Vandiver Employee, Registered User, VerticaExpert

    The string is not stored null-terminated. Hence strtod is reading additional bytes off the end. There doesn't appear to be an easy c method for number conversion with a length (https://stackoverflow.com/questions/16412840/strtod-with-limited-string-length)
    When we run a UDx, we send it a batch of rows to reduce the overhead of control & data flow to/from the UDx. This can mean the UDx runs on 1000 records despite the LIMIT right above.

  • Jim_KnicelyJim_Knicely Employee, Registered User, VerticaExpert

    @sKwa - Thanks for making the code available for your function. It'll definitely help out the community with their own code writing endeavors!

  • sKwasKwa Registered User

    Actually its unnecessary function :'(.
    As @Ben_Vandiver pointed I used it with vertica7.0 to cast a column.
    But important lessons learned from my conversation with @Ben_Vandiver.


    Say me if you need some functionality that I can implement on C++/Java/Python. I like to develop. BTW: check out my ListAgg function, its analytical function and not transform as GROUP_CONCAT(I think my implementation is little better than GROUP_CONCAT from point of view of performance )

  • sKwasKwa Registered User
    edited February 9

    [DELETED]

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file