Avoiding SORT BY PROJECTION SORT ORDER and RESEGMENT

Is there a way to avoid (SORT BY PROJECTION SORT ORDER) (RESEGMENT) -- seen in output below -- when copying data from identically segmented projections to another identically segmented projection? The way I see it, doing a SELECT * FROM test_2; chooses one super projection and then when inserting into projections of test_1, one projection is already sorted like the one chosen in SELECT, but the other is not hence the SORT BY PROJECTION SORT ORDER. However, if you look at it, both projections with the same OFFSETs are already sorted by the same order, so there would be no need to sort it again. Is there something I am missing?
# Create tables CREATE TABLE test_1 ( a date NOT NULL, b char(8) NOT NULL, c char(8) NOT NULL, d char(8) NOT NULL, PRIMARY KEY (a, b, c, d), UNIQUE (d) ); CREATE PROJECTION test_1_super_by_a ( a ENCODING RLE, b ENCODING RLE, c ENCODING RLE, d ENCODING RLE ) AS SELECT * FROM test_1 ORDER BY a, b, c, d SEGMENTED BY MODULARHASH(d) ALL NODES; CREATE PROJECTION test_1_super_by_b ( a ENCODING RLE, b ENCODING RLE, c ENCODING RLE, d ENCODING RLE ) AS SELECT * FROM test_1 ORDER BY b, a, c, d SEGMENTED BY MODULARHASH(d) ALL NODES OFFSET 1; # Create identically segmented tables # https://my.vertica.com/docs/6.1.x/HTML/index.htm#10248.htm CREATE TABLE test_2 ( a date NOT NULL, b char(8) NOT NULL, c char(8) NOT NULL, d char(8) NOT NULL, PRIMARY KEY (a, b, c, d), UNIQUE (d) ); CREATE PROJECTION test_2_super_by_a ( a ENCODING RLE, b ENCODING RLE, c ENCODING RLE, d ENCODING RLE ) AS SELECT * FROM test_2 ORDER BY a, b, c, d SEGMENTED BY MODULARHASH(d) ALL NODES; CREATE PROJECTION test_2_super_by_b ( a ENCODING RLE, b ENCODING RLE, c ENCODING RLE, d ENCODING RLE ) AS SELECT * FROM test_2 ORDER BY b, a, c, d SEGMENTED BY MODULARHASH(d) ALL NODES OFFSET 1; EXPLAIN INSERT INTO test_1 SELECT * FROM test_2; ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------ QUERY PLAN DESCRIPTION: ------------------------------ EXPLAIN INSERT INTO test_1 SELECT * FROM test_2; Access Path: +-DML INSERT [Cost: 0, Rows: 0] | Target Projection: public.test_1_super_by_b (SORT BY PROJECTION SORT ORDER) (RESEGMENT) | Target Projection: public.test_1_super_by_a | Target Prep: | Execute on: All Nodes | Execute on: All Nodes | Execute on: All Nodes | +---> STORAGE ACCESS for test_2 [Cost: 805, Rows: 10K (NO STATISTICS)] (PATH ID: 3) | | Projection: public.test_2_super_by_a | | Materialize: test_2.a, test_2.b, test_2.c, test_2.d | | Execute on: All Nodes Best regards, Grega

Comments

  • It's possible that the lack of statistics is what's causing this.  Can you try running an analyze_statistics() on both tables to see if that alters the EXPLAIN behavior?
  • Thanks, but that doesn't change anything. I think the problem (as described) is that Vertica chooses _one_ projection as the source and when it starts to insert to the other table, it finds that there are 2 projections backing up that table: one has the same sort order as the selected projection and the other one has different sort order. For the former, no re-sort is needed, but for the latter, additional sort is needed which is seen by SORT BY PROJECTION SORT ORDER in the explain output.

    However, my question is if there is a way to circumvent this. If vertica would somehow be able to recognize that both tables have projections with the same orders at the same offsets, it wouldn't need to sort the data again.
  • Hi Grega,

    I think you have aptly described the problem and the surrounding scenario.  I unfortunately don't know any way around it in current versions of Vertica.  (Though Vertica's a big product; I don't claim to be an expert in every little corner of it, so someone else might post here with a creative workaround that I'm not aware of.)

    Adam
  • The performance loss actually does not come from RE-SORTING projection, but by doing a RESEGMENT. So, if the source projection is the one with OFFSET 0, when inserting into projection with OFFSET 1, it will have to do a RESEGMENT, which is costly. Can we somehow circumvent this? Can we do a manual copy from projection with the same offset to a different projection with the same offset?
  • *IF* your interpretation of events is correct, and Vertica will indeed always use just one source projection, then I don't see how there could possibly be a way around this. :/

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file