how to make vertica case insensitive
Hi Expert,
I have a report application connecting to Sqlserver. Now I want to change the DB to vertica.
But I face case sensitive issue. for example:
when I run the following sql in SQLserver(notes: the DB is case insensitive)
select username from user where username='James'.
username
------------------------------------------------------
James
JAMES
james
I got three records. But in vertica I got only one record running the same sql
select username from user where username='James'.
username
------------------------------------------------------
James
I known that I can use upper, lower function or Ilike to workaround.
but if will take effort to change a lot of sql in the application.
Is there any other way to make vertica case insensitive?
I have a report application connecting to Sqlserver. Now I want to change the DB to vertica.
But I face case sensitive issue. for example:
when I run the following sql in SQLserver(notes: the DB is case insensitive)
select username from user where username='James'.
username
------------------------------------------------------
James
JAMES
james
I got three records. But in vertica I got only one record running the same sql
select username from user where username='James'.
username
------------------------------------------------------
James
I known that I can use upper, lower function or Ilike to workaround.
but if will take effort to change a lot of sql in the application.
Is there any other way to make vertica case insensitive?
0
Comments
Set locale to LEN_S1 & then try out. It will make vertica to ignore cases in queries search.
On VSQL prompt issue below command & then try your sql's:
SET LOCALE to LEN_S1
Regards'
Abhishek
Too hard, use in ilike or function lower() or upper().
"ilike" can be very slow for join operator ( a.a ilike b.a) , try to avoid it and use upper() or lower () ( upper(a.a) = upper(b.a) ) functions if you have join’s on your cases .
One of the results for slowdown is the fact that Vertica is not use its column sparse index for access the column data and will do some kind of full column scan .
Thank you all.
@Abhishek, it works with solution you provided.
Thank you again.
two more questions:
1)Is there any performance impact if changing locale to LEN_S1?
2) where can I find the locale description such as LEN_S1 in the doc?
attach the test for set locale to LEN_S1:
------------------------------------------
vertica=> create table vincent(username varchar(100));
CREATE TABLE
vertica=> insert into vincent values('James');
OUTPUT
--------
1
(1 row)
vertica=> insert into vincent values('james');
OUTPUT
--------
1
(1 row)
vertica=> insert into vincent values('JAMES');
OUTPUT
--------
1
(1 row)
vertica=> commit;
COMMIT
vertica=> select * from vincent;
username
----------
JAMES
James
james
(3 rows)
vertica=> select * from vincent where username like 'JAMES';
username
----------
JAMES
(1 row)
vertica=> select * from vincent where username ilike 'JAMES';
username
----------
JAMES
James
james
(3 rows)
vertica=> set locale to LEN_S1;
INFO 2567: Canonical locale: 'en'
Standard collation: 'LEN_S1'
English
SET
vertica=> select * from vincent where username like 'JAMES';
username
----------
JAMES
James
james
(3 rows)
If in your case all the tables are segmented on int column then setting the default session locale to LEN_S1 will help, but If tables are segmented on a varchar column, I am sure, it will show different results.
Setting the locale is a temporary solution and does not fix everything.
As eli and Daniel , UPPER and LOWER would be the best solution for this scenario.
Hope this helps.
NC
I will add my 2 cents to Navin's comment (in addition to segmentation) - Vertica sorts it input/output and order also depends on locale(its possible that Vertica will sort data twice - while data loaded in locale A and will sort for output in locale . So setting a locale not a good choice.