Mixed buddy projections breaking routable queries

Hi,

 

I am testing routable queries and running into problems when Vertica chooses different buddy projections for the tables. Here is the setup:

 

CREATE TABLE location
(
    objectid int NOT NULL
)
 ORDER BY objectid
SEGMENTED BY hash(objectid) ALL NODES KSAFE 1;

 

 

CREATE TABLE Response
(
    objectId int NOT NULL,

    beginDate timestamp NOT NULL,
    locationObjectId int /* foreign key for location.objectId */
)
ORDER BY locationObjectId, beginDate
SEGMENTED BY hash(locationObjectId) ALL NODES KSAFE 1;

 

The query:

 

select count(*) from Response r

join location l on r.locationObjectId = l.objectId

and r.locationObjectId = 210

and l.objectId = 210

 

The (filtered) explain plan:

 

Access Path:
 +-GROUPBY NOTHING [Cost: 206, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(*)
 |  Execute on: v_vert_warehouse_082115_node0003
 | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 205, Rows: 143] (PATH ID: 2) Inner (BROADCAST)
 | |      Join Cond: (r.locationObjectId = l.objectid)
 | |      Execute on: v_vert_warehouse_082115_node0003
 | | +-- Outer -> STORAGE ACCESS for r [Cost: 109, Rows: 28K] (PATH ID: 3)
 | | |      Projection: vertica_f.Response_b1
 | | |      Materialize: r.locationObjectId
 | | |      Filter: (r.locationObjectId = 210)
 | | |      Execute on: v_vert_warehouse_082115_node0003
 | | |      Runtime Filter: (SIP1(MergeJoin): r.locationObjectId)
 | | +-- Inner -> STORAGE ACCESS for l [Cost: 93, Rows: 6K] (PATH ID: 4)
 | | |      Projection: vertica_f.location_b0
 | | |      Materialize: l.objectid
 | | |      Filter: (l.objectid = 210)
 | | |      Execute on: v_vert_warehouse_082115_node0002

 

Notice the bolded parts. Vertica is choosing the b0 projection on node 0002 for location, and the b1 projection for Response on node 0003. When I try to run this as a routable query (from jdbc or vsql), I get the following error:

 

ROLLBACK: Attempt to run multi-node KV plan

 

The weird part is that this happens on one test instance of our database, but not on another that has the same schema / data loaded. On the other instance, the b0 projections are chosen for both tables and the routable query succeeds.

 

So my question is, how does Vertica choose which buddy projection to use? It seems like mixed buddy projections will always result in a failure for a routable query. Is there a way to specifically instruct Vertica that it should not mix the buddy projection offsets for a routable query?

Comments

  • Mmmm ..  

    Try to define one of your projection as unsegmented ?

     

    Thanks 

  • Hi Eli,

     

    Thanks for the reply. I think that your suggestion would solve the problem nicely for this two-table example. Unfortunately, the example that I posted is a snippet of a larger test query that we are using that involves joining 8-10 of our big fact tables. And while I could see us unsegmenting one of these tables (location in this example is not very big compared to the others), I don't think we could unsegment any of the other tables due to their size. It seems that if Vertica chooses incompatible (not on the same node) buddy projections for any two segmented tables in the query, the routable query is doomed.

     

    I did discover the query hints "+SYNTACTIC_JOIN" and "+DISTRIB" which seem to do exactly what I want as far as telling Vertica to only do joins on the same node. But it doesn't look like I can use this through the routable query JDBC interface, VerticaRoutableExecutor, because the class wraps my query in another query with the "+KV" hint and ignores the join hints.

  • Hi ,

    Hints are  good options to solve this problems . 

    Routable API was design to reduce the need for distributed parsing  and all other overhead which are related to distributed execution , and at the end increases your concurrency .

    If your projection design , and in your case Hint's will lead to execution on specific nodes , you already have Routable execution , and you no longer needs  Routable API (You can used regular JDBC interface )  , you have two options for validate that your query is  routable:

     

    1. Look for Remote initiator  on your execution plan .
    2. By querying dc_query_executions table  and filter its content for your query

    Example :   

     

    select

     node_name ,

    execution_step

    from dc_query_executions

    where transaction_id = 45035996316060382

    and statement_id = 442

    order by 3 desc

    ;

     

     

    I hope you will find it helpful

     

    Thanks

  • Hi,

     

      The Vertica query optimizer needs to be taught to prefer routeable queries if the KV hint is supplied. If the desired projection includes features like sort order or segmentation that would be preferable for the query, the optimizer will look for them.  All else being equal, the optimizer tends to pick the earlier defined projection.

     

    -Ben

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file