insert performance difference between a single node and a three-nodes cluster

consider a typical insert statement,
insert into xxxxx select * from yyyyy

if running on a single node setup, it takes 6 sec
if running on a three-nodes cluster setup, it takes 17 sec; for the cluster, xxxxx and yyyyy are both segmented by hashing on the primary key column.

can someone describe or point me some direction why on cluster setup, it is slower this much.

Comments

  • Navin_CNavin_C Vertica Customer
    Hi XiangLi,

    The explanation would be :

    Since your hare hash segmenting the values on three node cluster the nodes(initiator) has to perform hashing on every value  and then has to distribute over every projection in other nodes .

    With one node, the initiator does the hashing, but does not distribute as there are no other nodes.
    This is how you save time and the query is faster,

    Hope this helps.
    NC 
  • Hi!

    Agree with Navin.
    6 [sec/node] * 3 [nodes] = 18 [secs] 
    @XiangLi
    To confirm answer: run another query against single node and run it against a 3 node cluster. If execution time on single and cluster is linear, so Navin is correct.
  • thanks you all for the quick reply. One more question, for single setup there is no backup, while for cluster setup, there is 1 set of backup. will it contribute the slow-down as well?
  • Navin_CNavin_C Vertica Customer
    Hi XiangLi,

    If you are referring buddy projections as backup here, this is the explanation:

    With Single node installation Vertica will be set to k=0 (No buddy projections)
    With 3 or more node installation Vertica will set k=1 or k=2( depends on you)

    In a single node installation, since vertica does not have to copy data or I would say load data into buddy projections as well at same time of loading your data, it will always be faster.

    In a multiple node installation , Vertica has to load data into every buddy projects while loading data to your table. This includes loading into two objects and this will be slower than the one node installation node.

    With K=1 you will see better load time when compared with K=2

    Hope this helps
    NC





  • Hi!
    Im not sure about k-safety=2 on 3 nodes cluster.
  • Yes it is not possible.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file