Delete specific word using regexp_replace

shashank_pshashank_p Vertica Customer

Hi All,
I'm using below command to delete .co but it is not removing.
SELECT REGEXP_REPLACE('110 Companies .co' ,('^.co$'), '',1,0,'i')
Instead I tried SELECT REGEXP_REPLACE('110 Companies .co' ,'.co', '',1,0,'i') but ended up in wrong result.
I could not find much info about regexp_replace and it is really hard time for me.

Answers

  • DaveTDaveT Vertica Employee Employee
    edited May 2020

    Try using a backslash in your search string, as in '\.co'

  • moshegmosheg Vertica Employee Administrator
    edited May 2020
        SELECT REGEXP_REPLACE('[110 Companies .co]','\.co');
          REGEXP_REPLACE
        ------------------
         [110 Companies ]
        (1 row)
    

    And if you want to delete only the last occurrence of '.co' do:

    SELECT REGEXP_REPLACE('1st.co 2nd.co 3rd.co','\.co$');
      REGEXP_REPLACE
    -------------------
     1st.co 2nd.co 3rd
    (1 row)
    
  • marcothesanemarcothesane - Select Field - Administrator

    If you want to be - at least slightly - more performant, try:

    $ vsql -c "SELECT REPLACE('110 Companies .co' ,'.co','')"
    SET
        REPLACE     
    ----------------
     110 Companies 
    
    

    You are not loading all the REGEX environment, and "just" perform a string operation .....

  • What might be more useful here instead, is to just return the integer portion of that string - the 110. This function will do that.

    CREATE OR REPLACE FUNCTION fn_GetOnlyNumeric (Source varchar(255))
    return varchar(50)
    AS BEGIN
    -- use regex to take all the character fields in a string, leaving only numeric fields, and return that. \D says find all digits, and just return those. super simple.
    RETURN REGEXP_REPLACE(Source, '\D') ;
    END ;

  • shashank_pshashank_p Vertica Customer

    ok, it seems for one particular pattern string I can use REGEXP_REPLACE, but here I need to operate on dynamic pattern
    something like this; REGEXP_REPLACE(column1, '\'||column2 , '') but it throws error. Is there a limitation only to use pattern as string?
    Where as for I'm able to do above operation using REPLACE(column1, column2 , '') but again REPLACE has limitations like case sensitivity, replace only for exact match etc. Obviously I can use LOWER() for comparing the strings but final output case is getting changed.
    Eg.
    Get Well Soon --> Get
    SELECT REPLACE('Get Well Soon', 'Well.' , '') -- This is not replacing and I need to trim 'Well Soon' by using 'Well'
    REGEXP_REPLACE('Get Well Soon', 'Well.
    ' , '') --> Get (But I want to use a column for second argument)
    What is the best way to perform this operation where argument 1 and argument 2 are columns (dynamic values).

  • moshegmosheg Vertica Employee Administrator
    edited May 2020

    Try this:

    CREATE TABLE A (f1 varchar(50));
    
    COPY A (f1) FROM STDIN DELIMITER ',' ABORT ON ERROR;
    Get Well Soon,
    gET wELL sOON,
    GET WELL SOON,
    \.
    
    SELECT f1, REPLACE(f1,' Well Soon','') FROM A;
          f1       |    REPLACE
    ---------------+---------------
     gET wELL sOON | gET wELL sOON
     Get Well Soon | Get
     GET WELL SOON | GET WELL SOON
    (3 rows)
    
    SELECT f1, REGEXP_REPLACE(f1,' Well Soon','',1,1,'c')  -- c force the match to be case sensitive (the default)
    FROM A;
          f1       | REGEXP_REPLACE
    ---------------+----------------
     gET wELL sOON | gET wELL sOON
     Get Well Soon | Get
     GET WELL SOON | GET WELL SOON
    (3 rows)
    
    SELECT f1, REGEXP_REPLACE(f1,' Well Soon','',1,1,'i')  -- i force the match to be case insensitive
    FROM A;
          f1       | REGEXP_REPLACE
    ---------------+----------------
     gET wELL sOON | gET
     Get Well Soon | Get
     GET WELL SOON | GET
    (3 rows)
    
  • marcothesanemarcothesane - Select Field - Administrator

    You can also try this:
    You have the same string, and different parts to remove (or replace by an empty string):
    Play with different values in the initial WITH clause to find out if you get the right behaviour ....

    WITH
    input(s,to_remove) AS (
                SELECT 'get well soon','soon'
      UNION ALL SELECT 'get well soon','get'
      UNION ALL SELECT 'get well soon','well'
      UNION ALL SELECT 'GET WELL SOON','soon'
      UNION ALL SELECT 'GET WELL SOON','get'
      UNION ALL SELECT 'GET WELL SOON','well'
    )
    SELECT
      *
    , REGEXP_REPLACE(s,to_remove||'\W*','',1,1,'i')
    FROM input;
    -- out        s       | to_remove | REGEXP_REPLACE 
    -- out ---------------+-----------+----------------
    -- out  get well soon | soon      | get well 
    -- out  get well soon | get       | well soon
    -- out  get well soon | well      | get soon
    -- out  GET WELL SOON | soon      | GET WELL 
    -- out  GET WELL SOON | get       | WELL SOON
    -- out  GET WELL SOON | well      | GET SOON
    
  • shashank_pshashank_p Vertica Customer

    The solution is feasible for single pattern, but as I mentioned Requirement is for entire column which consists thousands of pattern which I need to match with one more column and remove the sub string.

  • moshegmosheg Vertica Employee Administrator
    edited May 2020

    The following example on one million rows takes 32 ms.
    Does it address your need?

    CREATE TABLE A (f1 varchar(50), f2 varchar(50)) ORDER BY f1;
    
    -- The following generates a 10 char random string
    CREATE OR REPLACE FUNCTION randomstring() RETURN VARCHAR
    AS
    BEGIN
       RETURN  CHR(RANDOMINT(26) + 65) || CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97) ||
               CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97) ||
               CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97);
    END;
    
    -- The following generates one million rows in table A
    \set DEMO_ROWS 1000000
    INSERT INTO A
    with myrows as (select randomstring() as f1
    from ( select 1 from ( select now() as se union all
    select now() + :DEMO_ROWS - 1 as se) a timeseries ts as '1 day' over (order by se)) b)
    select f1, right(f1,2) as f2
    from myrows;
     OUTPUT
    ---------
     1000000
    (1 row)
    
    \timing on
    SELECT f1, f2, REGEXP_REPLACE(f1,f2,'',1,1,'i')
    FROM A limit 10;
         f1     | f2 | REGEXP_REPLACE
    ------------+----+----------------
     Aaaavpejhr | hr | Aaaavpej
     Aaaerkrmds | ds | Aaaerkrm
     Aaamuokksi | si | Aaamuokk
     Aaaobononi | ni | Aaaobono
     Aaaqczihnt | nt | Aaaqczih
     Aaaqfryryb | yb | Aaaqfryr
     Aaaqvscxuf | uf | Aaaqvscx
     Aaargzuonb | nb | Aaargzuo
     Aaatjuyhjg | jg | Aaatjuyh
     Aaauczaiah | ah | Aaauczai
    (10 rows)
    
    Time: First fetch (10 rows): 32.226 ms. All rows formatted: 32.372 ms
    
  • shashank_pshashank_p Vertica Customer

    I think this should help, let me try and confirm you!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file