The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.

Several String functions showing "stable" behavior, despite documentation listing as "immutable".

Per 7.0.1 documentation, functions such as UPPER should be immutable.  However, I see two indications that many of these functions are actually treated as "stable" in my 7.0.1 installation. 

 

Details below... but what I'm wondering:

  • Is this a database setting that can be changed (i.e. some setting that's forcing the database to run at compatibility level of pre-5.1 release)?
  • Is it a bug (and if so, is it fixed in later version)?
  • For current state, unless a database setting exists to fix this, I'm planning to do blanket rewrite of all UDFs to use the octet-version (immutable version) of all string functions.  Given that I'm only processing ASCII data, are there any theoretical risks that this could change results?

DETAILS:

It seems the impacted functions are only those with following note in their documentation:

 

Starting in Release 5.1, this function treats the string argument as a UTF-8 encoded string, rather than depending on the collation setting of the locale (for example, collation=binary) to identify the encoding. Prior to Release 5.1, the behavior type of this function was stable .

 

The general pattern is that the UTF-8 version of string functions have that note (and seem to be "stable" despite documentation listing as "immutable").  So, UPPER is impacted, UPPERB is not.  INSTR is impacted, INSTRB is not. 

 

Conveniently, there's one (useful for testing) exception to that pattern: the UTF-8 version of SUBSTR does not have the reference to Release 5.1 *and* does not exhibit the problem. 

 

OBSERVATION 1: PERFORMANCE

I've found that for every single pair of UTF-8 vs Bytewise/octect string functions (e.g. UPPER vs UPPERB), the UTF-8 version performs measurably slower (for high-volume use-cases, or when multiple functions are used in complex UDFs, I've measured over 20-fold increases in runtimes).

 

As mentioned above, it's very useful that the UTF-8 version of SUBSTR did NOT have a reference to the 5.1 caveat in documentation.  When the UTF-8 version of SUBSTR was compared to Bytewise version, SUBSTRB, there was little measureable difference in performance.  This indicates that the performance impact results from stable vs. immutable, not from the UTF-8 logic being slightly more complex than Byte-logic.

 

 

BASELINE:

SELECT MAX(RANDOM_CHARACTERS) FROM PUBLIC.VERY_BIG_TABLE;
Runtimes: 4.4 sec, 4.6 sec, 4.6 sec

UPPERB vs UPPER:

SELECT MAX(UPPERB(RANDOM_CHARACTERS)) FROM PUBLIC.VERY_BIG_TABLE;
Runtimes: 8.1 sec, 8.1 sec, 8.0 sec

SELECT MAX(UPPER(RANDOM_CHARACTERS)) FROM PUBLIC.VERY_BIG_TABLE;
Runtimes: 27.3 sec, 27.8 sec, 26.9 sec

SUBSTRB vs SUBSTR:

SELECT MAX(SUBSTRB(RANDOM_CHARACTERS,1,10)) FROM PUBLIC.VERY_BIG_TABLE;
Runtimes: 6.0 sec, 5.8 sec, 5.7 sec

SELECT MAX(SUBSTR(RANDOM_CHARACTERS,1,10)) FROM PUBLIC.VERY_BIG_TABLE;
Runtimes: 7.7 sec, 8.2 sec, 7.9 sec

 

 

OBSERVATION 2: USER-DEFINED-FUNCTION VOLATILITY
I've written a basic "immutable" UDF below.  When UPPER is used in the UDF, the UDF is downgraded to "stable".  But if UPPERB is used, the UDF remains "immutable":

 

Base case:

testuser=> CREATE OR REPLACE FUNCTION PUBLIC.VOLATILITY_TEST (P1 VARCHAR) RETURN VARCHAR AS BEGIN RETURN P1; END;
CREATE FUNCTION
testuser=> SELECT SCHEMA_NAME, FUNCTION_NAME, VOLATILITY FROM USER_FUNCTIONS WHERE FUNCTION_NAME = 'VOLATILITY_TEST';
SCHEMA_NAME | FUNCTION_NAME | VOLATILITY
-------------+-----------------+------------
public | VOLATILITY_TEST | immutable
(1 row)

 

Using UPPER:

testuser=> CREATE OR REPLACE FUNCTION PUBLIC.VOLATILITY_TEST (P1 VARCHAR) RETURN VARCHAR AS BEGIN RETURN UPPER(P1); END;
CREATE FUNCTION
testuser=> SELECT SCHEMA_NAME, FUNCTION_NAME, VOLATILITY FROM USER_FUNCTIONS WHERE FUNCTION_NAME = 'VOLATILITY_TEST';
SCHEMA_NAME | FUNCTION_NAME | VOLATILITY
-------------+-----------------+------------
public | VOLATILITY_TEST | stable
(1 row)

Using UPPERB:

testuser=> CREATE OR REPLACE FUNCTION PUBLIC.VOLATILITY_TEST (P1 VARCHAR) RETURN VARCHAR AS BEGIN RETURN UPPERB(P1); END;
CREATE FUNCTION
testuser=> SELECT SCHEMA_NAME, FUNCTION_NAME, VOLATILITY FROM USER_FUNCTIONS WHERE FUNCTION_NAME = 'VOLATILITY_TEST';
SCHEMA_NAME | FUNCTION_NAME | VOLATILITY
-------------+-----------------+------------
public | VOLATILITY_TEST | immutable

 

Appreciate any direction/feedback anyone might have....

 

-Kevin

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.