We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Remove non UTF-8 characters from varchar column — Vertica Forum

Remove non UTF-8 characters from varchar column

Does anyone know how to remove non UTF-8 characters from string? Could not find any relevant info on web/forum.

I need to remove (replace) question mark in a diamond. Found out that ascii 56480 corresponds to E'/xa0' value and was able to remove it; but would prefer to have one code to remove all instances of these non printable characters.
Any ideas?

Answers

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2019

    Hi,
    You should remove non UTF-8 characters using an ETL tool or process. But, if the data is already in Vertica, you can use a regular expression to remove all non UTF-8 characters...

    Example:

    dbadmin=> \d nonutf_test;
                                           List of Fields by Tables
     Schema |    Table    | Column |     Type     | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-------------+--------+--------------+------+---------+----------+-------------+-------------
     public | nonutf_test | c      | varchar(100) |  100 |         | f        | f           |
    (1 row)
    
    dbadmin=> INSERT INTO nonutf_test SELECT E'\xa0' || 'Test!' || E'\xa0';
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> SELECT c, REGEXP_REPLACE(c, '[^\t\r\n\x20-\x7E]+', '', 1, 0, 'b') FROM nonutf_test ;
        c    | REGEXP_REPLACE
    ---------+----------------
     ▒Test!▒ | Test!
    (1 row)
    
    dbadmin=> SELECT * FROM nonutf_test;
        c
    ---------
     ▒Test!▒
    (1 row)
    
    dbadmin=> UPDATE /*+ DIRECT */ nonutf_test SET c = REGEXP_REPLACE(c, '[^\t\r\n\x20-\x7E]+', '', 1, 0, 'b');
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> SELECT * FROM nonutf_test;
       c
    -------
     Test!
    (1 row)
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file