We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Query performance — Vertica Forum

Query performance

Table has day, campaign_id and various data columns.
If the table has NULL value in data columns, does that effect on query performance/speed,( like query need to filter on campaign_id in last n days) ?

Best Answers

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    Answer ✓

    Not for a columnar DB like Vertica (See attachment). But be careful when segmenting on columns that have alot of NULLs as that data may skew to a particular node.

  • marcothesanemarcothesane - Select Field - Administrator
    Answer ✓

    No. the NULL value , for querying, filtering, sorting, is to be preferred.
    I always say - with one of my buddies - that the introduction of the NULL value , with the meaning "I don't know", in relational databases, was practically as important as the introduction of the number 0 by the Arabs (the Romans did not have it): it opened a completely new way of calculating or evaluating truth or falseness of a statement.

    On "Do not use columns with a lot of NULL values to segment data": If you have 4 nodes; and 4 million rows in a table; and segment by , say, vendor_id; and vendor_id is NULL in 50% of the cases, and, otherwise, evenly distributed.
    Then, you can expect to have 500,000 rows each on 3 nodes, and 2,500,000 rows on the one node that is assigned to the NULL value.
    You will have, to over-simplify, all 4 nodes busy until the first 500,000 rows are processed, and then three nodes turning their thumbs while waiting for the fourth to process 2 million more rows all by itself.

Answers

  • hqVerthqVert Vertica Customer
    edited February 2021

    Thanks. Then If the column has same value (lke substitute NULL as "unknown") a lot, does the segmentation not to skew to a particular node (we are creating projections based on query to evenly distribute data across nodes); in other words, do I need to remove/substitute the NULL in table in order to have it to improve querying data out of table more efficiently (again the query is to get all data columns filtered on a/list campaign_id within last n days) ?

  • hqVerthqVert Vertica Customer

    Like do not use columns with lot null to segment data?

  • hqVerthqVert Vertica Customer

    Alright that helpful. Thanks

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file