IsNumeric function(UDF)
Hi!
Question: I want to know if a string value is numeric only. How can I do this?
Its very easy with Vertica to test if a string value is a number:
daniel=> select to_number('0x0abc');
to_number
-----------
2748
(1 row)
but a problem, that it throws exception on non-numeric values:
daniel=> select to_number('1 egg');
ERROR 2826: Could not convert "1 egg" to a float8
and thats why you can't use it with CASE clause.
So I created UDF Scalar function - is_numeric, that do it:
daniel=> select str, DECODE(is_numeric(str), 't', 'TRUE', 'f', 'FALSE') from nums ;
str | case
-----------+-------
0X0DEAD | TRUE
0x0abc | TRUE
1 foo bar | FALSE
1P10 | FALSE <----- fail
1e10 | TRUE
1p10 | FALSE <----- fail
40 | TRUE
45.0 | TRUE
999.999 | TRUE
INFINITY | TRUE
Infinity | TRUE
NAN | TRUE
bar | FALSE
egg 4 | FALSE
iNf | TRUE
nAn | TRUE
(16 rows)
FYA: As you can see, my function fails on 1 case - 1P10(1024 in decimal representation).
Tested on Vertica 7,8,9.
Source code here.
0
Comments
You can use a nifty type of cast (::!) for this, which instead of erroring, returns null if the cast fails. (it also works for non-numeric types)
select str, str::!int is not null from nums;
(it will error on scalars so don't test "select 'foo'::!int;" and conclude I'm smoking something)
Hi!
Oops...
thanks.
So this is an unnecessary function => delete
You are definitely not the first person to have written this function! (I saw it in our customer diagnostics just last week) And the doc on ::! could be much more notable.
Also, I'd recommend against the std::string in the while loop - it likely does memory allocation. I think you can get a bar const char * from the BlockReader.
I sound like a broken record, but we have seen a 7x perf drop due to allocation in the inner loop of a UDx. I believe this was due to a glibc performance bug, but once bitten twice (7x?) shy.
It also fails on


pnotation::!floatworks like a charmdaniel=> select value as str, value::!int as 'int', value::!float as 'float', value::!numeric(15,8) as 'numeric' from T; str | int | float | numeric ------------+-----+-------------+---------------- 0X0DEAD | | 57005 | 57005.00000000 0x0abc | | 2748 | 2748.00000000 1 foo bar | | | 1P10 | | | 1024.00000000 1e10 | | 10000000000 | 1p10 | | | 1024.00000000 40 | 40 | 40 | 40.00000000 45.0 | | 45 | 45.00000000 999.999 | | 999.999 | 999.99900000 INFINITY | | Infinity | Infinity | | Infinity | NAN | | NaN | bar | | | egg 4 | | | iNf | | Infinity | nAn | | NaN | (16 rows)Why this notation exists?
That's a pretty mixed bag, actually. I'm surprised that it fails to parse hex as an int.
hm... I tried
const char*fromBlockReaderand it doesn't work as expected, it returns whole column and not a single value.do { const char* value = argReader.getStringRef(0).data(); char* pointer; strtod(value, &pointer); vbool isNum = (*pointer == 0 || pointer == value); resWriter.setBool(isNum); resWriter.next(); srvInterface.log("VALUE: %s",value); } while (argReader.next());On single value(on scalar) it works.
Also
LIMITclause applied after and not before, i.e. UDF still gets and processes all values from column. Why?The string is not stored null-terminated. Hence strtod is reading additional bytes off the end. There doesn't appear to be an easy c method for number conversion with a length (https://stackoverflow.com/questions/16412840/strtod-with-limited-string-length)
When we run a UDx, we send it a batch of rows to reduce the overhead of control & data flow to/from the UDx. This can mean the UDx runs on 1000 records despite the LIMIT right above.
@sKwa - Thanks for making the code available for your function. It'll definitely help out the community with their own code writing endeavors!
Actually its unnecessary function :'(.
As @Ben_Vandiver pointed I used it with vertica7.0 to cast a column.
But important lessons learned from my conversation with @Ben_Vandiver.
Say me if you need some functionality that I can implement on C++/Java/Python. I like to develop. BTW: check out my
ListAggfunction, its analytical function and not transform asGROUP_CONCAT(I think my implementation is little better thanGROUP_CONCATfrom point of view of performance )[DELETED]
Use this:
REGEXP_LIKE(score, '^[0-9]\d*(\".\d+)"?$')
result is a boolean is_numeric function