If I don't care about the cases, i.e. I know for sure whether my string is in upper case or lower case, would LIKE be faster, or ILIKE, or both are about the same?
Good question. You could benchmark this, but if LIKE is a possibility, then use it because it would be faster or equally fast as ILIKE that has to do more work. When parsing log files I noticed grep -i ( case insensitive) being substantially slower than grep, so i would not be surprised to see a similar finding here. The differences are probably very small if the look up space for string searching is limited.
Since Vertica's niche is all about fast queries, you're also best off normalizing your data on the way into the database, bringing it to a consistent case. Similarly if there are expressions that you evaluate often with LIKE/ILIKE, consider pulling them out during ETL.
Guys, Vertica is Case-sensitive!! so, ILIKE makes your search bit easier , by not typing the exact word as it is in the database while searching for a record.. Eg; say there is a name like 'raj' in the database table select * from table where name LIKE '%Raj%'; It wont fetch my record. select * from table where name ILIKE '%Raj%'; It fetches my record.
Hi! Case-sensitive search ALWAYS will be faster than non sensitive, because it compares chars by their order in characters table. For example ASCII table comparison: small "a" = 97, while capital "A" = 65.
So non-case sensitive search require 2 comparison for decision: character compared to "small" AND "capital" letter. Can be another strategy: convert all strings to same case (lower or upper) and prepare case-sensitive comparison. I don't know what strategy Vertica choose, both strategies has same complexity in worst case ==> O(length of string). FYI: For UTF/Unicode its a big problem/challenge - case sensitive comparison, because same letters in different languages can has different order. Or: how to compare chars from different languages? How to compare Cyrillic "T" to latin "T"? In UTF/Unicode such comparison is possible (and Vertica data encoded in UTF-8). So even here no stable solutions!
Comments