Avoiding SORT BY PROJECTION SORT ORDER and RESEGMENT
Is there a way to avoid (SORT BY PROJECTION SORT ORDER) (RESEGMENT) -- seen in output below -- when copying data from identically segmented projections to another identically segmented projection? The way I see it, doing a SELECT * FROM test_2; chooses one super projection and then when inserting into projections of test_1, one projection is already sorted like the one chosen in SELECT, but the other is not hence the SORT BY PROJECTION SORT ORDER. However, if you look at it, both projections with the same OFFSETs are already sorted by the same order, so there would be no need to sort it again. Is there something I am missing?
# Create tables CREATE TABLE test_1 ( a date NOT NULL, b char(8) NOT NULL, c char(8) NOT NULL, d char(8) NOT NULL, PRIMARY KEY (a, b, c, d), UNIQUE (d) ); CREATE PROJECTION test_1_super_by_a ( a ENCODING RLE, b ENCODING RLE, c ENCODING RLE, d ENCODING RLE ) AS SELECT * FROM test_1 ORDER BY a, b, c, d SEGMENTED BY MODULARHASH(d) ALL NODES; CREATE PROJECTION test_1_super_by_b ( a ENCODING RLE, b ENCODING RLE, c ENCODING RLE, d ENCODING RLE ) AS SELECT * FROM test_1 ORDER BY b, a, c, d SEGMENTED BY MODULARHASH(d) ALL NODES OFFSET 1; # Create identically segmented tables # https://my.vertica.com/docs/6.1.x/HTML/index.htm#10248.htm CREATE TABLE test_2 ( a date NOT NULL, b char(8) NOT NULL, c char(8) NOT NULL, d char(8) NOT NULL, PRIMARY KEY (a, b, c, d), UNIQUE (d) ); CREATE PROJECTION test_2_super_by_a ( a ENCODING RLE, b ENCODING RLE, c ENCODING RLE, d ENCODING RLE ) AS SELECT * FROM test_2 ORDER BY a, b, c, d SEGMENTED BY MODULARHASH(d) ALL NODES; CREATE PROJECTION test_2_super_by_b ( a ENCODING RLE, b ENCODING RLE, c ENCODING RLE, d ENCODING RLE ) AS SELECT * FROM test_2 ORDER BY b, a, c, d SEGMENTED BY MODULARHASH(d) ALL NODES OFFSET 1; EXPLAIN INSERT INTO test_1 SELECT * FROM test_2; ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------ QUERY PLAN DESCRIPTION: ------------------------------ EXPLAIN INSERT INTO test_1 SELECT * FROM test_2; Access Path: +-DML INSERT [Cost: 0, Rows: 0] | Target Projection: public.test_1_super_by_b (SORT BY PROJECTION SORT ORDER) (RESEGMENT) | Target Projection: public.test_1_super_by_a | Target Prep: | Execute on: All Nodes | Execute on: All Nodes | Execute on: All Nodes | +---> STORAGE ACCESS for test_2 [Cost: 805, Rows: 10K (NO STATISTICS)] (PATH ID: 3) | | Projection: public.test_2_super_by_a | | Materialize: test_2.a, test_2.b, test_2.c, test_2.d | | Execute on: All Nodes Best regards, Grega
0
Comments
However, my question is if there is a way to circumvent this. If vertica would somehow be able to recognize that both tables have projections with the same orders at the same offsets, it wouldn't need to sort the data again.
I think you have aptly described the problem and the surrounding scenario. I unfortunately don't know any way around it in current versions of Vertica. (Though Vertica's a big product; I don't claim to be an expert in every little corner of it, so someone else might post here with a creative workaround that I'm not aware of.)
Adam