root operator

sjyu · October 2015

I'm profiling a query across two systems (same number of nodes, but different hardware).

On my old system, this query runs in 30 minutes. On the new system, it takes 1+ hours. The new system was COPY CLUSTER'd from the old, and the explain plans are identical, so I decided to run a profile

I took a look at various counters, as well as avg/max execution times between the two, and it seems like the Root operator is the most dramatic. The average execution time for the Root operator almost makes up the entire difference in total execution time. What exactly is happening in the root operator?

Thanks!

Sharon_Cutter · October 2015

The root operator sends the data back to the client. You said different hardware - are the CPU speeds very different between the two clusters? Different network speeds?

--Sharon

sjyu · October 2015

Thanks for the response Sharon. The new cluster has more CPU cores (48/node vs 32/node) but similar clock speeds (new cluster is 300mhz slower, but again, more cores). Network speeds on the new cluster are much faster (10gig-e vs 1gig-e), which is why it's rather perplexing that this would be a bottleneck.

So - the root operator sends data back to the client. I presume that when you profile INSERT statements, the DataTarget operator is what's responsible for writing data to the nodes?

Sharon_Cutter · November 2015

Yes - for loads the DataTarget operator writes data to the projections.

That is curious that you would see such a degradation. Have you done a comparison of the numbers reported by vnetperf, vcpuperf, and vioperf?

Also when comparing performance between these two clusters, be sure that you have the same "query budget" used by your resource pools for an apples to apples comparison. See the resource_pool_status table. The query budget by default is memorysize or maxmemorysize divided by plannedconcurrency, which is AUTO by default and would vary based on the number of cores. On the higher core systems, if the memory is the same as the original system, you'd want to set the planned concurrency to 32 to get the same query budget.

--Sharon

sjyu · November 2015

Vnetperf yes, vcpuperf or vioperf no, but I'll take a look at the last two.

Thanks for the tip on query budget. The memorysize and planned concurrency on the systems are indeed identical, and I've verified that the initial memory reported by the profile is more or less the same.

It's curious indeed!

We're Moving!

Create My New Community Account Now

root operator

Comments

Leave a Comment