Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Query performance

Table has day, campaign_id and various data columns.
If the table has NULL value in data columns, does that effect on query performance/speed,( like query need to filter on campaign_id in last n days) ?

Best Answers

  • Jim_KnicelyJim_Knicely Administrator
    Accepted Answer

    Not for a columnar DB like Vertica (See attachment). But be careful when segmenting on columns that have alot of NULLs as that data may skew to a particular node.

  • marcothesanemarcothesane Employee
    Accepted Answer

    No. the NULL value , for querying, filtering, sorting, is to be preferred.
    I always say - with one of my buddies - that the introduction of the NULL value , with the meaning "I don't know", in relational databases, was practically as important as the introduction of the number 0 by the Arabs (the Romans did not have it): it opened a completely new way of calculating or evaluating truth or falseness of a statement.

    On "Do not use columns with a lot of NULL values to segment data": If you have 4 nodes; and 4 million rows in a table; and segment by , say, vendor_id; and vendor_id is NULL in 50% of the cases, and, otherwise, evenly distributed.
    Then, you can expect to have 500,000 rows each on 3 nodes, and 2,500,000 rows on the one node that is assigned to the NULL value.
    You will have, to over-simplify, all 4 nodes busy until the first 500,000 rows are processed, and then three nodes turning their thumbs while waiting for the fourth to process 2 million more rows all by itself.


  • edited February 4

    Thanks. Then If the column has same value (lke substitute NULL as "unknown") a lot, does the segmentation not to skew to a particular node (we are creating projections based on query to evenly distribute data across nodes); in other words, do I need to remove/substitute the NULL in table in order to have it to improve querying data out of table more efficiently (again the query is to get all data columns filtered on a/list campaign_id within last n days) ?

  • Like do not use columns with lot null to segment data?

  • Alright that helpful. Thanks

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.