We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Avoid Using Functions on Very Large Data Sets — Vertica Forum

Avoid Using Functions on Very Large Data Sets

This blog post was authored by Jim Knicely.

You can store billions and billions and billions (i.e., a lot) of records in your Vertica tables. When querying these large data sets, try to avoid using database functions like TO_DATE, TO_CHAR, NVL, etc. when unnecessary.

Example:

A table named BIG_DATE_TABLE has 1 billion rows and a column named THE_DATE, which stores a date value as an integer in the format YYYYMMDD.

dbadmin=> SELECT COUNT(*) FROM big_date_table;
   COUNT
------------
1000000000
(1 row)

I want to know how many records there are in the table where THE_DATE column is equal to the current SYSDATE.

My first attempt at getting that answer is the following SQL statement:

dbadmin=> SELECT COUNT(*) FROM big_date_table WHERE TO_DATE(the_date::VARCHAR, 'YYYYMMDD') = TRUNC(SYSDATE);
  COUNT
--------
134976
(1 row)
Time: First fetch (1 row): 78354.856 ms. All rows formatted: 78354.900 ms

That query took a whopping 80 seconds to run!

Fortunately I realized a better solution is to avoid using the TO_DATE function directly against the billion records!

dbadmin=> SELECT COUNT(*) FROM big_date_table WHERE the_date = TO_CHAR(SYSDATE, 'YYYYMMDD')::INT;
  COUNT
--------
134976
(1 row)
Time: First fetch (1 row): 782.399 ms. All rows formatted: 782.440 ms

That’s better! Have Fun!

Sign In or Register to comment.