groupby pipeline
vcarusi
✭✭
Hi,
I have a projection with ORDER BY/SEGMENTATION BY clauses on 16 fields:
a int,
b int,
c varchar(10),
d int
e varchar(10)
f int
g varchar(10)
h int
i varchar(10)
j int
l int
m varchar(10)
o varchar(3)
p int
r varchar(5)
s int
I noticed GROUPBY PIPELINE is used only for first 10 fields.
GROUP BY (a,b,c,d,e,f,g,h,i,j)
If I add 'l' then is used GROUPBY HASH.
There are limitations on number or length of the fields ?
Thank you,
Veronica
0
Comments
I suggest you reformulate your question and tidy-up your text !
OK, I discovered myself.
Speaking about resegmentation, my question was: is the number/length of fields involved in GROUP BY (of SELECT)/ SEGMENTATION BY (projection) clauses modifing the resegmentation modality (GROUPBY HASH / GROUPBY PIPELINE) ?
My opinion is: NO
Reading the documentation:
https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/AnalyzingData/Optimizations/AvoidingGROUPBYHASHWithProjectionDesign.htm
https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/AnalyzingData/Optimizations/AvoidingResegmentationDuringGROUPBYOptimizationWithProjectionDesign.htm
I saw:
"To avoid resegmentation, the GROUP BY clause must contain all the segmentation columns of the projection."
So, I created for my table a projection with SEGMENTATED BY clause having 16 fields ( named : field_1...field_16), datatypes: int and varchar.
create projection...
select...
order by field_1....field_16
segmented by hash(field_1....field_16)
And then I started to test if/when GROUPBY HASH is avoided. I noticed that if I use few or all fields in GROUP BY of my query in same order they are put in SEGMENTATION BY clause, then GROUPBY PIPELINED is used.
select avg(field_290),
field_2
from my table
where field_1 = 0
group by field_2
GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 85K, Rows: 149] (PATH ID: 1)
select avg(field_290),
field_2,
field_3,
field_4,
field_5,
field_6,
field_7,
field_8,
field_9,
field_10,
field_11,
field_12,
field_13,
field_14,
field_15,
field_16
from my table
where field_1 = 0
group by field_2,
field_3,
field_4,
field_5,
field_6,
field_7,
field_8,
field_9,
field_10
field_11,
field_12,
field_13,
field_14,
field_15,
field_16
GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 474K, Rows: 2K] (PATH ID: 1)select avg(field_290),
field_2,
--field_3,
field_4,
field_5,
field_6,
field_7,
field_8,
field_9,
field_10,
field_11,
field_12,
field_13,
field_14,
field_15,
field_16
from my table
where field_1 = 0
group by field_2,
--field_3,
field_4,
field_5,
field_6,
field_7,
field_8,
field_9,
field_10
field_11,
field_12,
field_13,
field_14,
field_15,
field_16
GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 690K, Rows: 35K] (PATH ID: 1)
with GROUPBY HASH - Memory 6.5G
with GROUPBY PIPELINE - Memory 3.5G
Veronica