Is a Union faster than multiple inserts?

Multiverse · March 2016

If I have a table that is both segmented and ordered on a very high cardinality column, should I try to combine inserts into into a single Union statement for performance? The inserts are queries from other tables, involving joins, aggregation, and analytic functions.

Adrian_Oprea_1 · March 2016

Segmenting a table on a very high cardinality column souds like a great idea, but ordering it by the same "very high cardinality column" that is a bad idea !! Unless this is the only column used as a predicate.

Anyway

- to improve the perf use the /*+direct*/ hint to avoid WOS.

Multiverse · March 2016

Thank you for your response and concern. That high-cardinality column is used to join to another very large table that is segmented and ordered on the analogous column in that table, resulting in a nice merge join.

I'm more concerned with the performance of Unions versus separate, multiple inserts. Do I gain anything by UNIONing the multiple inserts into the same statement? It's just a guess, but it would seem that any inserts after the first one might cause the whole table to be resorted, to integrate the additional records. On the other hand, during a Union, the optimizer might be able to plan accordingly for the additional data from the subsequent subqueries.

eli_revach · March 2016

Hi

See below some answers :

Do I gain anything by UNIONing the multiple inserts into the same statement?

Sure you are , manly single INSERTs has many overhaded , below few of them :

1) In cluster database like vertica (same for none cluster DB , in cluster DB the problem is worse ) , each time SQL statement is being submitted , the execution engin (EE) parse the statement and build execution plan , in cluster database this task is more time consuming than on a regular none cluster DB , as the query initiator node need to comunicate it with all the nodes , the communication is take place using internal network protocol ,with many small insert statements you end up with many parsing validation and extra network round trips , parsing is runing on top of vertica distributed catalog , any distributed call has its overheads , for bulk load you do it only םnce.
.
2) Small inserts into WOS will create many WOS objects , queries top of many WOS objects it will work harder than one big WOS object .

3)Small insert to ROS will end up with many small files in the file system , will increase the chance for hitting two many ROS files exception , MergeOut will work harder and the total resource consumption will increase .

4) In term of performance you will gain dramatically performance improvements using large bulk loads

It's just a guess, but it would seem that any inserts after the first one might cause the whole table to be resorted, to integrate the additional records

Data is not being resorted each time you isert data , i recomend to read this thread https://community.dev.hpe.com/t5/Vertica-Forum/Partitions-and-ROS-containers/m-p/234214#M11465

it will gave you some background on how Vertica is managing the data and when the data is being sorted

I hope you will find it useful

Thanks

Multiverse · March 2016

Very interesting -- thank you!

I definitely see how one bulk insert would have a huge advantage over numerous small inserts. What do you think would be the performance difference between one large, bulk insert consisting of union of two subqueries, versus two bulk inserts each of half the size, each originating from a somewhat complex subquery?

If the second of two somewhat large inserts doesn't cause the existing data in the table to change at all, then it sounds like there may not be much benefit in union'ing them together.

eli_revach · March 2016

Hi
Your question is very generic , there are many parameters that can impact the answers , what can i say in short is that the scanning part of each query will not run in parallel in the case of union query , so it may run less good than two different parallel bulk loads .

But as i always say , benchmark it and benchmark it and benchmark it , you have your data .
I hope it finally answer your question

Thanks

Adrian_Oprea_1 · March 2016

I have to say that this was the best answer i have read on this forum !

Thank you Eli !

Multiverse make sure you mark this as the soution

eli_revach · March 2016

Thanks Adrian

Is a Union faster than multiple inserts?

Comments

Leave a Comment