Avoid Using Functions on Very Large Data Sets

This blog post was authored by Jim Knicely.

You can store billions and billions and billions (i.e., a lot) of records in your Vertica tables. When querying these large data sets, try to avoid using database functions like TO_DATE, TO_CHAR, NVL, etc. when unnecessary.

Example:

A table named BIG_DATE_TABLE has 1 billion rows and a column named THE_DATE, which stores a date value as an integer in the format YYYYMMDD.

dbadmin=> SELECT COUNT(*) FROM big_date_table;
   COUNT
------------
1000000000
(1 row)

I want to know how many records there are in the table where THE_DATE column is equal to the current SYSDATE.

My first attempt at getting that answer is the following SQL statement:

dbadmin=> SELECT COUNT(*) FROM big_date_table WHERE TO_DATE(the_date::VARCHAR, 'YYYYMMDD') = TRUNC(SYSDATE);
  COUNT
--------
134976
(1 row)
Time: First fetch (1 row): 78354.856 ms. All rows formatted: 78354.900 ms

That query took a whopping 80 seconds to run!

Fortunately I realized a better solution is to avoid using the TO_DATE function directly against the billion records!

dbadmin=> SELECT COUNT(*) FROM big_date_table WHERE the_date = TO_CHAR(SYSDATE, 'YYYYMMDD')::INT;
  COUNT
--------
134976
(1 row)
Time: First fetch (1 row): 782.399 ms. All rows formatted: 782.440 ms

That’s better! Have Fun!

Sign In or Register to comment.