Is a Union faster than multiple inserts?

If I have a table that is both segmented and ordered on a very high cardinality column, should I try to combine inserts into into a single Union statement for performance?  The inserts are queries from other tables, involving joins, aggregation, and analytic functions.

Comments

  •  Segmenting a table on a very high cardinality column souds like a great idea, but ordering it by the same "very high cardinality column" that is a bad idea !! Unless this is the only column used as a predicate.

     

    Anyway 

    - to improve the perf use the /*+direct*/ hint to avoid WOS. 

     

  • Thank you for your response and concern.  That high-cardinality column is used to join to another very large table that is segmented and ordered on the analogous column in that table, resulting in a nice merge join.

     

    I'm more concerned with the performance of Unions versus separate, multiple inserts.  Do I gain anything by UNIONing the multiple inserts into the same statement?  It's just a guess, but it would seem that any inserts after the first one might cause the whole table to be resorted, to integrate the additional records.  On the other hand, during a Union, the optimizer might be able to plan accordingly for the additional data from the subsequent subqueries.

  • Hi 

    See below some answers  :

     

    Do I gain anything by UNIONing the multiple inserts into the same statement?


    Sure you are , manly single INSERTs has many overhaded , below few of them :


    1) In cluster database like vertica (same for none cluster DB , in cluster DB the problem is worse ) , each time SQL statement is being submitted , the execution engin (EE) parse the statement and build execution plan , in cluster database this task is more time consuming than on a regular none cluster DB , as the query initiator node need to comunicate it with all the nodes , the communication is take place using internal network protocol ,with many small insert statements you end up with many parsing validation and extra network round trips , parsing is runing on top of vertica distributed catalog , any distributed call has its overheads , for bulk load you do it only םnce.
    .
    2) Small inserts into WOS will create many WOS objects , queries top of many WOS objects it will work harder than one big WOS object .

     

    3)Small insert to ROS will end up with many small files in the file system , will increase the chance for hitting two many ROS files exception , MergeOut will work harder and the total resource consumption will increase .

     

    4) In term of performance you will gain dramatically performance improvements using large bulk loads

     

    It's just a guess, but it would seem that any inserts after the first one might cause the whole table to be resorted, to integrate the additional records

     

    Data is not being resorted each time you isert data , i recomend to read this thread https://community.dev.hpe.com/t5/Vertica-Forum/Partitions-and-ROS-containers/m-p/234214#M11465


    it will gave you some background on how Vertica is managing the data and when the data is being sorted

     

    I hope you will find it useful 

     

    Thanks 

  • Very interesting -- thank you!

     

    I definitely see how one bulk insert would have a huge advantage over numerous small inserts.  What do you think would be the performance difference between one large, bulk insert consisting of union of two subqueries, versus two bulk inserts each of half the size, each originating from a somewhat complex subquery?

     

    If the second of two somewhat large inserts doesn't cause the existing data in the table to change at all, then it sounds like there may not be much benefit in union'ing them together.

  • Hi
    Your question is very generic , there are many parameters that can impact the answers , what can i say in short is that the scanning part of each query will not run in parallel in the  case of union  query , so it may run less good than two different parallel bulk loads .


    But as i always say , benchmark it and benchmark it  and benchmark it  , you have your data .
    I hope it finally answer your question

     

    Thanks

  •  I have to say that this was the best answer i have read on this forum ! 

    Thank you Eli ! 

    Multiverse make sure you mark this as the soution 

  • Thanks Adrian

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file