root operator

I'm profiling a query across two systems (same number of nodes, but different hardware). 

 

On my old system, this query runs in 30 minutes.  On the new system, it takes 1+ hours.  The new system was COPY CLUSTER'd from the old, and the explain plans are identical, so I decided to run a profile

 

I took a look at various counters, as well as avg/max execution times between the two, and it seems like the Root operator is the most dramatic.  The average execution time for the Root operator almost makes up the entire difference in total execution time.  What exactly is happening in the root operator?

 

Thanks!

Comments

  • The root operator sends the data back to the client.  You said different hardware - are the CPU speeds very different between the two clusters?  Different network speeds?

     

      --Sharon

     

     

  • Thanks for the response Sharon.  The new cluster has more CPU cores (48/node vs 32/node) but similar clock speeds (new cluster is 300mhz slower, but again, more cores).  Network speeds on the new cluster are much faster (10gig-e vs 1gig-e), which is why it's rather perplexing that this would be a bottleneck. 

     

    So - the root operator sends data back to the client.  I presume that when you profile INSERT statements, the DataTarget operator is what's responsible for writing data to the nodes?

  • Yes - for loads the DataTarget operator writes data to the projections.

     

    That is curious that you would see such a degradation.  Have you done a comparison of the numbers reported by vnetperf, vcpuperf, and vioperf?

     

    Also when comparing performance between these two clusters, be sure that you have the same "query budget" used by your resource pools for an apples to apples comparison.  See the resource_pool_status table.  The query budget by default is memorysize or maxmemorysize divided by plannedconcurrency, which is AUTO by default and would vary based on the number of cores.  On the higher core systems, if the memory is the same as the original system, you'd want to set the planned concurrency to 32 to get the same query budget.  

     

      --Sharon

     

     

  • Vnetperf yes, vcpuperf or vioperf no, but I'll take a look at the last two.

     

    Thanks for the tip on query budget.  The memorysize and planned concurrency on the systems are indeed identical, and I've verified that the initial memory reported by the profile is more or less the same.

     

    It's curious indeed!

     

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file