Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Avoid Using Functions on Very Large Data Sets

This blog post was authored by Jim Knicely.

You can store billions and billions and billions (i.e., a lot) of records in your Vertica tables. When querying these large data sets, try to avoid using database functions like TO_DATE, TO_CHAR, NVL, etc. when unnecessary.

Example:

A table named BIG_DATE_TABLE has 1 billion rows and a column named THE_DATE, which stores a date value as an integer in the format YYYYMMDD.

dbadmin=> SELECT COUNT(*) FROM big_date_table;
   COUNT
------------
1000000000
(1 row)

I want to know how many records there are in the table where THE_DATE column is equal to the current SYSDATE.

My first attempt at getting that answer is the following SQL statement:

dbadmin=> SELECT COUNT(*) FROM big_date_table WHERE TO_DATE(the_date::VARCHAR, 'YYYYMMDD') = TRUNC(SYSDATE);
  COUNT
--------
134976
(1 row)
Time: First fetch (1 row): 78354.856 ms. All rows formatted: 78354.900 ms

That query took a whopping 80 seconds to run!

Fortunately I realized a better solution is to avoid using the TO_DATE function directly against the billion records!

dbadmin=> SELECT COUNT(*) FROM big_date_table WHERE the_date = TO_CHAR(SYSDATE, 'YYYYMMDD')::INT;
  COUNT
--------
134976
(1 row)
Time: First fetch (1 row): 782.399 ms. All rows formatted: 782.440 ms

That’s better! Have Fun!

Sign In or Register to comment.

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.