IsNumeric function(UDF)
Hi!
Question: I want to know if a string value is numeric only. How can I do this?
Its very easy with Vertica to test if a string value is a number:
daniel=> select to_number('0x0abc'); to_number ----------- 2748 (1 row)
but a problem, that it throws exception on non-numeric values:
daniel=> select to_number('1 egg'); ERROR 2826: Could not convert "1 egg" to a float8
and thats why you can't use it with CASE
clause.
So I created UDF Scalar function - is_numeric, that do it:
daniel=> select str, DECODE(is_numeric(str), 't', 'TRUE', 'f', 'FALSE') from nums ; str | case -----------+------- 0X0DEAD | TRUE 0x0abc | TRUE 1 foo bar | FALSE 1P10 | FALSE <----- fail 1e10 | TRUE 1p10 | FALSE <----- fail 40 | TRUE 45.0 | TRUE 999.999 | TRUE INFINITY | TRUE Infinity | TRUE NAN | TRUE bar | FALSE egg 4 | FALSE iNf | TRUE nAn | TRUE (16 rows)
FYA: As you can see, my function fails on 1 case - 1P10
(1024 in decimal representation).
Tested on Vertica 7,8,9.
Source code here.
0
Comments
You can use a nifty type of cast (::!) for this, which instead of erroring, returns null if the cast fails. (it also works for non-numeric types)
select str, str::!int is not null from nums;
(it will error on scalars so don't test "select 'foo'::!int;" and conclude I'm smoking something)
Hi!
Oops... thanks.
So this is an unnecessary function => delete
You are definitely not the first person to have written this function! (I saw it in our customer diagnostics just last week) And the doc on ::! could be much more notable.
Also, I'd recommend against the std::string in the while loop - it likely does memory allocation. I think you can get a bar const char * from the BlockReader.
I sound like a broken record, but we have seen a 7x perf drop due to allocation in the inner loop of a UDx. I believe this was due to a glibc performance bug, but once bitten twice (7x?) shy.
It also fails on
p
notation::!float
works like a charmWhy this notation exists?
That's a pretty mixed bag, actually. I'm surprised that it fails to parse hex as an int.
hm... I tried
const char*
fromBlockReader
and it doesn't work as expected, it returns whole column and not a single value.On single value(on scalar) it works.
Also
LIMIT
clause applied after and not before, i.e. UDF still gets and processes all values from column. Why?The string is not stored null-terminated. Hence strtod is reading additional bytes off the end. There doesn't appear to be an easy c method for number conversion with a length (https://stackoverflow.com/questions/16412840/strtod-with-limited-string-length)
When we run a UDx, we send it a batch of rows to reduce the overhead of control & data flow to/from the UDx. This can mean the UDx runs on 1000 records despite the LIMIT right above.
@sKwa - Thanks for making the code available for your function. It'll definitely help out the community with their own code writing endeavors!
Actually its unnecessary function .
As @Ben_Vandiver pointed I used it with vertica7.0 to cast a column.
But important lessons learned from my conversation with @Ben_Vandiver.
Say me if you need some functionality that I can implement on C++/Java/Python. I like to develop. BTW: check out my
ListAgg
function, its analytical function and not transform asGROUP_CONCAT
(I think my implementation is little better thanGROUP_CONCAT
from point of view of performance )[DELETED]
Use this:
REGEXP_LIKE(score, '^[0-9]\d*(\".\d+)"?$')
result is a boolean is_numeric function