INSERT SELECT * FROM queries single-threaded ?

Grega_Kešpret · November 2013

Hi!

Are INSERT SELECT * FROM ... queries single-threaded? We have tried executing exactly the same query two times on a computer with 24 cores, where the first time EXECUTIONPARALLELISM was set to 24 and the second time EXECUTIONPARALLELISM was set to 1. The query execution time was the same in both cases.

If that is the case, is there a way to run this query to use all cores? Something like Parallel Load Library (PLoad) for copy statements?

Thanks,
Grega

[Deleted User] · November 2013

Hi Grega,

I don't know the answer to this. I also don't believe there is a single answer; it will certainly depend on the layout of your data and on the body of your "SELECT *" statement, as different operators parallelize differently.

But if you are up for an experiment, I would be curious what happens for you if you turn on Local Segmentation (and rebalance or re-create the table so that it has local segments). Splitting data streams is expensive (you may have noticed that pload is not 24x faster on a 24-core machine...); having them pre-split makes a lot of things easier.

You should also check what your system is actually doing. Depending on the remainder of the "SELECT * FROM ..." clause, it may be that CPU isn't your limitation at all; it may be that you're constrained by I/O or by your network. There are various tables in the "V_MONITOR" schema that can give you this information, as can the Management Console, and command-line tools like "iotop" and "iftop" if you have shell access to your Vertica servers. 1 core is just as good at sitting around waiting as 24 cores are :-)

Adam

[Deleted User] · November 2013

Also, there are known issues with parallelism in certain operators. (For example, COPY, prior to pload :-) ) Make sure to try with the latest Vertica version, and with newer versions as they come out.

Viljem_Skornik · November 2013

Hi Adam,

I'll add some details to Grega's question and hopefully answer yours.

We’re importing cca 13 million rows which take about 3GB in storage per projection, destination table has 40 columns. I've checked net/cpu/disk usage via Management Console and it's showing cca 20% max usage. I've also checked it via top/iftop/iotop and none of them are maxed out.

We've had local segmentation before but we were hiting ROS limits when inserting into temporary tables so we disabled it for the time being. I can enabled it again and recheck.

Little bit of background:

destination_table has 40 columns and two super projections, superA with offset 0 and superB with offset 1 and different sort order. This means at insert time it's going to have to sort and resegement. This takes about 220 seconds for 13 million rows.

I've tried also with different projections and I found out that two simple identical projections on destination table are fastest, taking about 35 seconds to insert. If I add offset to one of them, but keeping sort order it's going to take about 220 seconds. So I'm guessing it's the RESEGMENT part that's slowing things down. Nodes are on 1gb switch and I've verified that with vnetperf. I've checked network with iftop and I don't see any real bottlenecks same as in MC. I see some spikes but nothing serious and cca 3GB of data transfered via network on one node. I guess I'll go play with Ganglia and get some proper high-resolution charts.

Jaka_Jančar · November 2013

If destination projections are pre-joins between large tables, the JOIN operator is single-threaded (VER-27781), so the SELECT part will be single-threaded and consequentially also INSERT ... SELECT.

INSERT SELECT * FROM queries single-threaded ?

Comments

Leave a Comment