Delete specific word using regexp_replace

shashank_p · May 2020

Hi All,
I'm using below command to delete .co but it is not removing.
SELECT REGEXP_REPLACE('110 Companies .co' ,('^.co$'), '',1,0,'i')
Instead I tried SELECT REGEXP_REPLACE('110 Companies .co' ,'.co', '',1,0,'i') but ended up in wrong result.
I could not find much info about regexp_replace and it is really hard time for me.

DaveT · May 2020

Try using a backslash in your search string, as in '\.co'

mosheg · May 2020

    SELECT REGEXP_REPLACE('[110 Companies .co]','\.co');
      REGEXP_REPLACE
    ------------------
     [110 Companies ]
    (1 row)

And if you want to delete only the last occurrence of '.co' do:

SELECT REGEXP_REPLACE('1st.co 2nd.co 3rd.co','\.co$');
  REGEXP_REPLACE
-------------------
 1st.co 2nd.co 3rd
(1 row)

marcothesane · May 2020

If you want to be - at least slightly - more performant, try:

$ vsql -c "SELECT REPLACE('110 Companies .co' ,'.co','')"
SET
    REPLACE     
----------------
 110 Companies

You are not loading all the REGEX environment, and "just" perform a string operation .....

Vertica_Curtis · May 2020

What might be more useful here instead, is to just return the integer portion of that string - the 110. This function will do that.

CREATE OR REPLACE FUNCTION fn_GetOnlyNumeric (Source varchar(255))
return varchar(50)
AS BEGIN
-- use regex to take all the character fields in a string, leaving only numeric fields, and return that. \D says find all digits, and just return those. super simple.
RETURN REGEXP_REPLACE(Source, '\D') ;
END ;

shashank_p · May 2020

ok, it seems for one particular pattern string I can use REGEXP_REPLACE, but here I need to operate on dynamic pattern
something like this; REGEXP_REPLACE(column1, '\'||column2 , '') but it throws error. Is there a limitation only to use pattern as string?
Where as for I'm able to do above operation using REPLACE(column1, column2 , '') but again REPLACE has limitations like case sensitivity, replace only for exact match etc. Obviously I can use LOWER() for comparing the strings but final output case is getting changed.
Eg.
Get Well Soon --> Get
SELECT REPLACE('Get Well Soon', 'Well.' , '') -- This is not replacing and I need to trim 'Well Soon' by using 'Well'
REGEXP_REPLACE('Get Well Soon', 'Well.' , '') --> Get (But I want to use a column for second argument)
What is the best way to perform this operation where argument 1 and argument 2 are columns (dynamic values).

mosheg · May 2020

Try this:

CREATE TABLE A (f1 varchar(50));

COPY A (f1) FROM STDIN DELIMITER ',' ABORT ON ERROR;
Get Well Soon,
gET wELL sOON,
GET WELL SOON,
\.

SELECT f1, REPLACE(f1,' Well Soon','') FROM A;
      f1       |    REPLACE
---------------+---------------
 gET wELL sOON | gET wELL sOON
 Get Well Soon | Get
 GET WELL SOON | GET WELL SOON
(3 rows)

SELECT f1, REGEXP_REPLACE(f1,' Well Soon','',1,1,'c')  -- c force the match to be case sensitive (the default)
FROM A;
      f1       | REGEXP_REPLACE
---------------+----------------
 gET wELL sOON | gET wELL sOON
 Get Well Soon | Get
 GET WELL SOON | GET WELL SOON
(3 rows)

SELECT f1, REGEXP_REPLACE(f1,' Well Soon','',1,1,'i')  -- i force the match to be case insensitive
FROM A;
      f1       | REGEXP_REPLACE
---------------+----------------
 gET wELL sOON | gET
 Get Well Soon | Get
 GET WELL SOON | GET
(3 rows)

marcothesane · May 2020

You can also try this:
You have the same string, and different parts to remove (or replace by an empty string):
Play with different values in the initial WITH clause to find out if you get the right behaviour ....

WITH
input(s,to_remove) AS (
            SELECT 'get well soon','soon'
  UNION ALL SELECT 'get well soon','get'
  UNION ALL SELECT 'get well soon','well'
  UNION ALL SELECT 'GET WELL SOON','soon'
  UNION ALL SELECT 'GET WELL SOON','get'
  UNION ALL SELECT 'GET WELL SOON','well'
)
SELECT
  *
, REGEXP_REPLACE(s,to_remove||'\W*','',1,1,'i')
FROM input;
-- out        s       | to_remove | REGEXP_REPLACE 
-- out ---------------+-----------+----------------
-- out  get well soon | soon      | get well 
-- out  get well soon | get       | well soon
-- out  get well soon | well      | get soon
-- out  GET WELL SOON | soon      | GET WELL 
-- out  GET WELL SOON | get       | WELL SOON
-- out  GET WELL SOON | well      | GET SOON

shashank_p · May 2020

The solution is feasible for single pattern, but as I mentioned Requirement is for entire column which consists thousands of pattern which I need to match with one more column and remove the sub string.

mosheg · May 2020

The following example on one million rows takes 32 ms.
Does it address your need?

CREATE TABLE A (f1 varchar(50), f2 varchar(50)) ORDER BY f1;

-- The following generates a 10 char random string
CREATE OR REPLACE FUNCTION randomstring() RETURN VARCHAR
AS
BEGIN
   RETURN  CHR(RANDOMINT(26) + 65) || CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97) ||
           CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97) ||
           CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97) || CHR(RANDOMINT(26) + 97);
END;

-- The following generates one million rows in table A
\set DEMO_ROWS 1000000
INSERT INTO A
with myrows as (select randomstring() as f1
from ( select 1 from ( select now() as se union all
select now() + :DEMO_ROWS - 1 as se) a timeseries ts as '1 day' over (order by se)) b)
select f1, right(f1,2) as f2
from myrows;
 OUTPUT
---------
 1000000
(1 row)

\timing on
SELECT f1, f2, REGEXP_REPLACE(f1,f2,'',1,1,'i')
FROM A limit 10;
     f1     | f2 | REGEXP_REPLACE
------------+----+----------------
 Aaaavpejhr | hr | Aaaavpej
 Aaaerkrmds | ds | Aaaerkrm
 Aaamuokksi | si | Aaamuokk
 Aaaobononi | ni | Aaaobono
 Aaaqczihnt | nt | Aaaqczih
 Aaaqfryryb | yb | Aaaqfryr
 Aaaqvscxuf | uf | Aaaqvscx
 Aaargzuonb | nb | Aaargzuo
 Aaatjuyhjg | jg | Aaatjuyh
 Aaauczaiah | ah | Aaauczai
(10 rows)

Time: First fetch (10 rows): 32.226 ms. All rows formatted: 32.372 ms

shashank_p · June 2020

I think this should help, let me try and confirm you!

We're Moving!

Create My New Community Account Now

Delete specific word using regexp_replace

Answers

Leave a Comment