Options

how to make vertica case insensitive

Hi Expert,
    I have a report application connecting to Sqlserver. Now I want to change the DB to vertica.
But I face case sensitive issue.  for example:
    when I run the following sql in SQLserver(notes: the DB is case insensitive)
    select username from user where username='James'.
  username
  ------------------------------------------------------
  James
  JAMES
   james

I got three records. But in vertica  I got only one record running the same sql
    select username from user where username='James'.
username
------------------------------------------------------
James

I known that I can use upper, lower function or Ilike to workaround.
but if will take effort to change a lot of sql in the application.

Is there any other way to make vertica case insensitive?

Comments

  • Options
    Abhishek_RanaAbhishek_Rana Vertica Employee Employee
    Hi,

    Set locale to LEN_S1 & then try out. It will make vertica to ignore cases in queries search.

    On VSQL prompt issue below command & then try your sql's:

    SET LOCALE to LEN_S1

    Regards'
    Abhishek
  • Options
    Hi!

    Too hard, use in ilike or function lower() or upper().
    daniel=> create table vincent (user_name varchar);
    CREATE TABLE
    daniel=> copy vincent from stdin direct;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> John
    >> James
    >> Jane
    >> Jim
    >> jac
    >> \.
    daniel=> select lower(user_name) from vincent ;
    lower
    -------
    james
    jane
    jim
    john
    jac
    (5 rows)

    daniel=> select upper(user_name) from vincent ;
    upper
    -------
    JAMES
    JANE
    JIM
    JOHN
    JAC
    (5 rows)

    daniel=> select * from vincent where user_name ilike 'jac';
    user_name
    -----------
    jac
    (1 row)

    daniel=> select * from vincent where user_name ilike 'JAC';
    user_name
    -----------
    jac
    (1 row)


    daniel=> select lower(user_name) from vincent ;
    lower
    -------
    james
    jane
    jim
    john
    jac
    (5 rows)

    daniel=> select upper(user_name) from vincent ;
    upper
    -------
    JAMES
    JANE
    JIM
    JOHN
    JAC
    (5 rows)

    daniel=> select * from vincent where user_name ilike 'jac';
    user_name
    -----------
    jac
    (1 row)

    daniel=> select * from vincent where user_name ilike 'JAC';
    user_name
    -----------
    jac
    (1 row)

  • Options
    GetSatisfaction - die!
  • Options
    Hi 
    "ilike"   can be very slow for  join operator ( a.a ilike b.a)  , try to avoid  it  and use upper() or  lower ()  ( upper(a.a) = upper(b.a)  ) functions  if you have join’s on your cases    .  
    One of the results for slowdown is the fact that  Vertica is not use its column  sparse index  for access the column data  and will do some kind of full column scan   .   
  • Options

    Thank you all.

    @Abhishek,  it works with solution you provided.

    Thank you again.
    two more questions:

    1)Is there any performance impact if changing locale to LEN_S1?

    2) where can I find the locale description such as LEN_S1 in the doc?



    attach the test for set locale to LEN_S1:

    ------------------------------------------
    vertica=> create table vincent(username varchar(100));
    CREATE TABLE
    vertica=> insert into vincent values('James');
    OUTPUT
    --------
    1
    (1 row)

    vertica=> insert into vincent values('james');
    OUTPUT
    --------
    1
    (1 row)

    vertica=> insert into vincent values('JAMES');
    OUTPUT
    --------
    1
    (1 row)

    vertica=> commit;
    COMMIT
    vertica=> select * from vincent;
    username
    ----------
    JAMES
    James
    james
    (3 rows)

    vertica=> select * from vincent where username like 'JAMES';
    username
    ----------
    JAMES
    (1 row)

    vertica=> select * from vincent where username ilike 'JAMES';
    username
    ----------
    JAMES
    James
    james
    (3 rows)

    vertica=> set locale to LEN_S1;
    INFO 2567: Canonical locale: 'en'
    Standard collation: 'LEN_S1'
    English
    SET
    vertica=> select * from vincent where username like 'JAMES';
    username
    ----------
    JAMES
    James
    james
    (3 rows)

  • Options
    Navin_CNavin_C Vertica Customer
    Hi Vincent,

    If in your case all the tables are segmented on int column then setting the default session locale to LEN_S1 will help, but If tables are segmented on a varchar column, I am sure, it will show different results.

    Setting the locale is a temporary solution and does not fix everything.

    As eli and Daniel , UPPER and LOWER would be the best solution for this scenario.

    Hope this helps.
    NC


  • Options
    Hi!

    I will add my 2 cents to Navin's comment (in addition to segmentation) - Vertica sorts it input/output and order also depends on locale(its possible that Vertica will sort data twice - while data loaded in locale A and will sort for output in locale B). So setting a locale not a good choice.
  • Options
    thanks a lot

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file