How to search particular string in long varchar column?
SELECT query ,user_name FROM analytics.query_tracker where REGEXP_LIKE(query, 'junk');
(Also long datatype does not support simple like and ilike)
This query is returning error-
[Code: 4551, SQL State: 22021] [Vertica]VJDBC ERROR: Regexp encountered an invalid UTF-8 character
Best Answers
-
Jim_Knicely - Select Field - Administrator
REGEXP_LIKE works fine on LONG data types like LONG VARCHAR.
Example:
dbadmin=> CREATE TABLE t (c LONG VARCHAR); CREATE TABLE dbadmin=> INSERT INTO t SELECT 'Some junk'; OUTPUT -------- 1 (1 row) dbadmin=> SELECT * FROM t WHERE REGEXP_LIKE(c, 'junk'); c ----------- Some junk (1 row)
The error you are getting is because you've stored some non UTF-8 data in the column in some record(s).
Take a look here:
https://forum.vertica.com/discussion/240991/remove-non-utf-8-characters-from-varchar-column1 -
Jim_Knicely - Select Field - Administrator
Like this:
dbadmin=> INSERT INTO t SELECT E'\xa0' || 'Test!' || E'\xa0'; OUTPUT -------- 1 (1 row) dbadmin=> SELECT * FROM t; c ----------- Some junk ▒Test!▒ (2 rows) dbadmin=> SELECT * FROM t WHERE REGEXP_LIKE(c, '[^\t\r\n\x20-\x7E]+', 'b'); c --------- ▒Test!▒ (1 row)
But, you can probably just run your original query with the B option of the REGEXP_LIKE function...
SELECT query, user_name FROM analytics.query_tracker where REGEXP_LIKE(query, 'junk', 'b');
1
Answers
But my table has 50 million records how do I identify non utf data.
@Jim_Knicely you are awesome !