Why is this query much slower than in PostgreSQL?

This query (joining two tables) takes about 250 seconds in PostgreSQL 9.1 and 28 minutes (consistently) in Vertica 6.1.3 on Linux, on a host with 15 GB of memory:

select
  lead.score > 0,
  count(*) as effort,
  count(distinct lead.id) as leads,
  count(distinct case lead.isconverted when false then lead.id end) as unconv,
  count(nullif(lead.isconverted, true)) as unconv_effort
from lead
left outer join task on lead.id = task.whoid
where not lead.isdeleted
  and lead.createddate between '2013-01-01' and '2013-11-01'
group by lead.score > 0;
The schemas are simple:
CREATE TABLE lead (
    id varchar(65000) NOT NULL,
    score real,
    isconverted boolean,
    isdeleted boolean,
    createddate timestamp without time zone,
    ... 328 columns total ...
);

CREATE TABLE task (
    id varchar(65000) NOT NULL,
    whoid varchar(65000),
    ... 88 columns total ...
);
EXPLAIN shows nothing outrageous:
 Access Path:
 +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 447M, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
 |  Aggregates: count(DISTINCT lead.id), sum_of_count(*), sum_of_count(), count(DISTINCT )
 |  Group By:
 | +---> GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 447M, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: count(*), count(CASE WHEN (lead.isconverted = true) THEN NULL::boolean ELSE lead.isconverted END)
 | |      Group By: (lead.score > 0), lead.id, CASE lead.isconverted WHEN false THEN lead.id ELSE NULL END
 | |      Grouping Sets: (, lead.id, , ), (, )
 | | +---> JOIN HASH [RightOuter] [Cost: 437M, Rows: 6M (NO STATISTICS)] (PATH ID: 3)
 | | |      Join Cond: (lead.id = task.whoid)
 | | | +-- Outer -> STORAGE ACCESS for task [Cost: 19M, Rows: 6M (NO STATISTICS)] (PATH ID: 4)
 | | | |      Projection: task_super
 | | | |      Materialize: task.whoid
 | | | |      Runtime Filter: (SIP1(HashJoin): task.whoid)
 | | | +-- Inner -> STORAGE ACCESS for lead [Cost: 1M, Rows: 433K (NO STATISTICS)] (PATH ID: 5)
 | | | |      Projection: lead_super
 | | | |      Materialize: lead.id, lead.isconverted, lead.score
 | | | |      Filter: (NOT lead.isdeleted)
 | | | |      Filter: ((lead.createddate >= '2013-01-01 00:00:00'::timestamp) AND (lead.createddate <= '2013-11-01 00:00:00'::timestamp))

(EXPLAIN also printed the same plan in GraphViz format; see the pastebin link in the comments for the full output.)
I also don't think I'm grokking the cost numbers because when I run the following query, it finishes in a couple seconds, which is closer to the performance I'm expecting:
select id, whoid from task order by id, whoid offset 5946020 limit 10;

 Access Path:
 +-SELECT  LIMIT 10 OFFSET 6M [Cost: 915M, Rows: 10 (NO STATISTICS)] (PATH ID: 0)
 |  Output Only: 5946030 tuples
 | +---> SORT [Cost: 915M, Rows: 6M (NO STATISTICS)] (PATH ID: 1)
 | |      Order: task.id ASC, task.whoid ASC
 | |      Output Only: 5946030 tuples
 | | +---> STORAGE ACCESS for task [Cost: 38M, Rows: 6M (NO STATISTICS)] (PATH ID: 2)
 | | |      Projection: task_super
 | | |      Materialize: task.id, task.whoid
Note: I care explicitly about performance without projections; these are examples of ad-hoc queries, and I'm looking for query latencies on the order of several seconds.

Thanks for any answers!

Comments

  • I don't know why Get Satisfaction is formatting my post like that (with the repeated blob at the end and the double spacing) but this may be more readable:

    http://pastebin.com/Nix7FCQz
  • Hi,

    I'm not sure what the cause of this slowdown is.  Could you post more about what PostgreSQL is doing with the query?

    One immediate thing that you could experiment with -- there's a known performance issue in current versions of Vertica with multiple COUNT(DISTINCT [...]) aggregates in a single query.  We're quite aware of the issue; if you have the Enterprise Edition, you can contact Vertica Support about it if you have questions.  I don't personally know of a good universal workaround in currently-released versions of Vertica, though specific queries can sometimes be rewritten to avoid using multiple distinct aggregates.
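
    For example, one rewrite that sometimes works (this is just an untested sketch of the general idea, not something specific to your schema -- names like per_lead and score_positive are placeholders, and you'd want to verify it returns the same results) is to pre-aggregate per lead.id in a subquery so the outer query needs no DISTINCT aggregates at all:

    -- Sketch: group the join result per lead first, then the outer query
    -- only needs plain counts/sums over one row per lead.id.
    select
      score_positive,
      sum(effort)                                            as effort,   -- total joined rows
      count(*)                                               as leads,    -- one row per distinct lead.id
      count(case when not isconverted then 1 end)            as unconv,   -- distinct unconverted leads
      sum(case when not isconverted then effort else 0 end)  as unconv_effort
    from (
      select
        (lead.score > 0) as score_positive,
        lead.id,
        lead.isconverted,
        count(*)         as effort
      from lead
      left outer join task on lead.id = task.whoid
      where not lead.isdeleted
        and lead.createddate between '2013-01-01' and '2013-11-01'
      group by (lead.score > 0), lead.id, lead.isconverted
    ) per_lead
    group by score_positive;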

    Also, I notice that your 'id' column is a VARCHAR(65000).  Are your 'id' values really tens of thousands of characters?  If not, Vertica tends to do much better with smaller VARCHAR fields.  I would expect the same from PostgreSQL, though likely to a lesser degree.

    One other note, regarding projections -- are you possibly used to optimizing using indices?  It's true that indices probably won't help random ad-hoc queries all that much, so it may not be worth adding them.  However, projections are different -- you say that you're explicitly caring about performance without projections, yet you do have projections here, and as you can see from the EXPLAIN plans, they are being used.  (You always have at least one projection, usually two; Vertica can't store data without them.  "SELECT * FROM PROJECTIONS;" will display more details.  Note further that all projections are sorted, have some compression options set, etc.  There really is no such thing as an unoptimized table in Vertica; just a poorly-optimized one.)

    And there are lots of optimizations that can be performed on that storage that are workload-independent, that will help any query.  For example, the raw data can be reorganized so that it compresses better on disk; this means that doing a full table scan is cheaper.  Unless you're only querying a table a few times before dropping it (in which case load time isn't small as compared to query time), we always recommend at least running the Database Designer to automatically tune projection design.  Note that you do not need to give it a list of queries to work with; if you don't, it will still perform the types of optimizations described above.
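
    For example (a minimal sketch -- the table names are from your post, adjust the schema as needed):

    -- Give the optimizer real row counts; the plan above shows NO STATISTICS on every path.
    select analyze_statistics('lead');
    select analyze_statistics('task');

    -- See which projections actually exist for these tables.
    select projection_name, anchor_table_name, is_super_projection
    from projections
    where anchor_table_name in ('lead', 'task');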

    Adam
  • Hi!

    What do you want? You aren't working by the Vertica guidelines: no statistics, no DBD run, no optimal projections built -- everything goes against the super projection. You don't know how to work with Vertica, and you expect good results?

    Learn how to work with Vertica (read the documentation); it isn't PostgreSQL. Don't be lazy.
  • The topic starter should create good projections, run the DBD, and collect statistics before any analysis of the "problem".

  • All of those things are certainly important.  But there are some workflows where they are difficult, or where you'd really like them to be unnecessary.  Really, it's a fair question (of any DBMS, not just Vertica): why do you have to learn all of these knobs?  Why can't the database always simply be fast?  Sometimes there's a good reason why not; sometimes there's an opportunity for us to do better.  (Nothing's perfect; not even Vertica.  If we were already perfect, we wouldn't be hiring :-) )

    Relatedly, no need to call anyone lazy -- the topic starter has clearly done some looking around, posted a bunch of information, etc.  (As they note, the query plan looks reasonable, so statistics probably wouldn't help.)  Pretty much everyone I've seen migrating to Vertica has fallen into the "all databases are the same, right?" trap at some point or another.  A good response (and I think mine was too verbose, sorry about that!) will teach something about Vertica; give them a fact they didn't know, a reference to some bit of the docs that they may have missed, etc.  Helpful to both the topic starter and to other new Vertica users who might stumble across this question while learning how we work.
  • Learning about Vertica is certainly good -- relevant links here might include the Administrator's Guide:

    https://my.vertica.com/docs/6.1.x/PDF/HP_Vertica_6.1.x_AdminGuide.pdf

    (The DBD section starts on page 77; note that it does a lot of things, only one of which needs to know the queries that you most-often run.)

    I also feel like I should plug our standard training pages, just for good measure; probably overkill for this specific issue but maybe interesting:

    http://www.vertica.com/customer-experience/vertica-101/
    http://www.vertica.com/customer-experience/training/

    Anyway, as I mentioned, this could be a real performance bug.  If you have further information, feel free to post it!  We are, of course, always working to improve our performance; I can't speak to specific roadmaps or features or anything like that, but if you have the chance, I'd encourage you to try this again on new Vertica releases as they come out.

    Adam
  • Hi Adam—

    PG is just scanning the full datasets, doing a hash join, and sorting for the group by:
    GroupAggregate  (cost=1127848.92..1160531.44 rows=2 width=24)
      ->  Sort  (cost=1127848.92..1132517.85 rows=1867571 width=24)
            Sort Key: ((lead.score > 0::double precision))
            ->  Hash Right Join  (cost=130322.56..818414.39 rows=1867571 width=24)
                  Hash Cond: (task.whoid = lead.id)
                  ->  Seq Scan on task  (cost=0.00..555988.84 rows=5269184 width=19)
                  ->  Hash  (cost=127493.49..127493.49 rows=154085 width=24)
                        ->  Seq Scan on lead  (cost=0.00..127493.49 rows=154085 width=24)
                              Filter: ((NOT isdeleted) AND (createddate >= '2013-01-01 00:00:00'::timestamp without time zone) AND (createddate <= '2013-11-01 00:00:00'::timestamp without time zone))
    Even without the count(distinct), the query is slow.  It seems to be in the join.  I can boil much of the slowness down to just:
    select count(*) from lead left outer join task on lead.id = task.whoid;
    On other points:

    Our text fields can in fact be longer than 65K, but most are short, and it just so happens that in this example dataset the IDs are <65K.

    Regarding projections: we understand the data representation can be optimized, but at this point we're just interested in performance-debugging the naive arrangement, even with unordered, uncompressed data, etc. (which is why I mentioned the time needed to sort a column of the task table above).  As a strawman, a sort-merge join should (theoretically :) take less time than half an hour.

    Just to verify that unused columns were not being read, or anything silly like that, I tried this on pared-down versions of the lead and task tables with all unused columns removed (same results).

    I'm happy to provide further details—feel free to let me know if there's anything else that would be helpful here.  Thanks in advance for any help.
  • Thanks for the reply!

    Hm...  Interesting that switching to a single COUNT(*) didn't help.

    From the EXPLAIN plan, it looks like "lead" has a little under 450k rows and "task" has about 6M rows.  Is that roughly correct?  Trying to get a sense of the size of the data.

    I'm also curious to get a better sense of where all that time is going.  When the system is executing the query, is it pegging one CPU core?  Several CPU cores?  Is it disk IO-bound?

    You could also poke at the "execution_engine_profiles" system table.  It's extremely verbose and its contents can, unfortunately, be a little hard to decipher.  But it will give you lots of stats about each operator in the query plan of the query as it's executing and after it has finished.  It would be interesting to know if one operator is running for dramatically longer than others, for example.
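
    Something along these lines (off the top of my head; counter names can vary a bit between versions) gives a per-operator breakdown after you run the query with PROFILE in front of it:

    -- PROFILE select ... ;   -- run the slow query under PROFILE first
    -- Then see which operator ate the time.  Normally you'd also filter on the
    -- transaction_id/statement_id that PROFILE reports.
    select node_name, path_id, operator_name,
           max(counter_value) as max_exec_time_us   -- max across threads for that operator
    from execution_engine_profiles
    where counter_name = 'execution time (us)'
    group by node_name, path_id, operator_name
    order by max_exec_time_us desc;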

    It would also be interesting to know if the join is spilling to disk.  (I believe you'll see a message about join spills in the Vertica log if this happens, though it's been a little while since I poked at join spills...)  I'm not sure why that would happen on a data set of this size; but I'm also not sure why it's slow, so, might as well ask.
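
    (If your build has the query_events system table, something like the following -- again, just a guess at the relevant event types, so treat it as a sketch -- may also surface spills without grepping the log:)

    -- Assumes query_events exists with these event types in this release.
    select event_type, event_description, suggested_action
    from query_events
    where event_type in ('JOIN_SPILLED', 'GROUP_BY_SPILLED');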

    Lastly, regarding projections, I feel like I should reiterate "no really you want optimized projections, it's always worth the time!"  But it sounds like you're trying to do something specific.  I also agree with you that, at first glance at least, it seems like you ought to get better performance than this even with a pessimal projection design; in which case something else is going on here.  (Though I'm sure there's a reason; just not sure what it is yet.)

    Thanks,
    Adam
  • Hi Adam—

    Quick answers to the first questions (I'll dig into the rest later): all four cores on the machine max out during query execution, dstat shows only a trickle of IO (50-100 KB per second), and yep, 450K and 6M are the table sizes.

    As for the sort spilling: I'll need to re-run with more verbose logging, but looking at the file handles the workers have open, there are files like Sort_Temp_358.dat, which suggests the sorts are spilling to disk (not surprising).

  • I was able to fix the problem by persuading the query planner to use merge joins, touched on in https://my.vertica.com/docs/6.1.x/HTML/index.htm#20001.htm.  Now the query takes a couple seconds.  For the given example join of two tables, the fix is simple enough:
    select
      lead.score > 0,
      count(*) as effort,
      count(distinct lead.id) as leads,
      count(distinct case lead.isconverted when false then lead.id end) as unconv,
      count(nullif(lead.isconverted, true)) as unconv_effort
    from (select * from lead order by id) lead
    left outer join (select * from task order by whoid) task
      on lead.id = task.whoid
    where not lead.isdeleted
      and lead.createddate between '2013-01-01' and '2013-11-01'
    group by lead.score > 0;
    Alternatively, you can create projections to encourage the merge joins (sketched at the end of this comment), but I don't think this would have helped in our real query, which is sadly more complex.  Essentially it's an n-way path join.  AFAICT you can't create projections on transient intermediate tables.  The ORDER BY trick required significant restructuring to make work.  Original:
    select
      group-by fields, aggregations of lead fields,
      aggregations of task fields, aggregations of opportunity fields, ...
    from lead left outer join task on lead.id = task.whoid
      left outer join opportunity on lead.convertedopportunityid = opportunity.id
      left outer join ...
    where ...
    group by group-by fields
    This had to be rewritten as:
    with
      lead_task as (
        select lead.id as lead_id, lead.isconverted as lead_isconverted, lead....
          task.id as task_id, task.whoid as task_whoid, task....
        from (select * from lead order by id) lead left outer join (select * from task order by whoid) task
          on lead.id = task.whoid
        where ...
      ),
      lead_task_opportunity as (
        select lead_task.*,
          opportunity.id as opportunity_id, opportunity....
        from (select * from lead_task order by lead_convertedopportunityid) lead_task
          left outer join (select * from opportunity order by id) opportunity
            on lead_task.lead_convertedopportunityid = opportunity.id
        where ...
      ),
    lead_task_opportunity_... as ...
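
    For completeness, the projection-based alternative mentioned above would look roughly like this (a hedged sketch: projection names and column lists are made up, and the projections need a refresh before the optimizer will use them):

    -- Pre-sort each table on its join key so the planner can choose a merge join.
    create projection lead_by_id (id, score, isconverted, isdeleted, createddate)
    as select id, score, isconverted, isdeleted, createddate
       from lead
       order by id
       segmented by hash(id) all nodes;

    create projection task_by_whoid (whoid, id)
    as select whoid, id
       from task
       order by whoid
       segmented by hash(whoid) all nodes;

    -- Populate the new projections (START_REFRESH() is the asynchronous variant).
    select refresh('lead');
    select refresh('task');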
