groupby pipeline

vcarusi · June 2016

Hi,

I have a projection with ORDER BY/SEGMENTATION BY clauses on 16 fields:

a int,

b int,

c varchar(10),

d int

e varchar(10)

f int

g varchar(10)

h int

i varchar(10)

j int

l int

m varchar(10)

o varchar(3)

p int

r varchar(5)

s int

I noticed GROUPBY PIPELINE is used only for first 10 fields.

GROUP BY (a,b,c,d,e,f,g,h,i,j)

If I add 'l' then is used GROUPBY HASH.

There are limitations on number or length of the fields ?

Thank you,

Veronica

Adrian_Oprea_1 · June 2016

I suggest you reformulate your question and tidy-up your text !

vcarusi · June 2016

OK, I discovered myself.

Speaking about resegmentation, my question was: is the number/length of fields involved in GROUP BY (of SELECT)/ SEGMENTATION BY (projection) clauses modifing the resegmentation modality (GROUPBY HASH / GROUPBY PIPELINE) ?

My opinion is: NO

Reading the documentation:

https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/AnalyzingData/Optimizations/AvoidingGROUPBYHASHWithProjectionDesign.htm

https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/AnalyzingData/Optimizations/AvoidingResegmentationDuringGROUPBYOptimizationWithProjectionDesign.htm

I saw:

"To avoid resegmentation, the GROUP BY clause must contain all the segmentation columns of the projection."

So, I created for my table a projection with SEGMENTATED BY clause having 16 fields ( named : field_1...field_16), datatypes: int and varchar.

create projection...

select...

order by field_1....field_16

segmented by hash(field_1....field_16)

And then I started to test if/when GROUPBY HASH is avoided. I noticed that if I use few or all fields in GROUP BY of my query in same order they are put in SEGMENTATION BY clause, then GROUPBY PIPELINED is used.

select avg(field_290),

field_2

from my table

where field_1 = 0

group by field_2

GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 85K, Rows: 149] (PATH ID: 1)

select avg(field_290),

field_2,

field_3,

field_4,

field_5,

field_6,

field_7,

field_8,

field_9,

field_10,

field_11,

field_12,

field_13,

field_14,

field_15,

field_16

from my table

where field_1 = 0

group by field_2,

field_3,

field_4,

field_5,

field_6,

field_7,

field_8,

field_9,

field_10

field_11,

field_12,

field_13,

field_14,

field_15,

field_16

GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 474K, Rows: 2K] (PATH ID: 1)

But, if I delete , for example field_3, from my GROUPBY, then GROUPBY HASH is used

select avg(field_290),

field_2,

--field_3,

field_4,

field_5,

field_6,

field_7,

field_8,

field_9,

field_10,

field_11,

field_12,

field_13,

field_14,

field_15,

field_16

from my table

where field_1 = 0

group by field_2,

--field_3,

field_4,

field_5,

field_6,

field_7,

field_8,

field_9,

field_10

field_11,

field_12,

field_13,

field_14,

field_15,

field_16

GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 690K, Rows: 35K] (PATH ID: 1)

with GROUPBY HASH - Memory 6.5G

with GROUPBY PIPELINE - Memory 3.5G

Veronica

We're Moving!

Create My New Community Account Now

groupby pipeline

Comments

Leave a Comment