Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

groupby pipeline

Hi,

I have a projection with ORDER BY/SEGMENTATION BY clauses on 16 fields:

 

a int,

b int,

c varchar(10),

d int

e varchar(10)

f int

g varchar(10)

h int

i varchar(10)

j int

 

l int

m varchar(10)

o varchar(3)

p int

r varchar(5)

s int

 

I noticed GROUPBY PIPELINE is used only for first 10 fields.

GROUP BY (a,b,c,d,e,f,g,h,i,j)

 

If I add 'l' then is used GROUPBY HASH.

 

There are limitations on number or length of the fields ?

Thank you,

Veronica

Comments

  • I suggest you reformulate your question and tidy-up your text ! 

     

  • OK, I discovered myself.

     

    Speaking about resegmentation, my question was: is the number/length of fields involved in GROUP BY (of SELECT)/ SEGMENTATION BY (projection) clauses modifing the resegmentation modality (GROUPBY HASH / GROUPBY PIPELINE) ?

    My opinion is: NO

     

    Reading the documentation:

    https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/AnalyzingData/Optimizations/AvoidingGROUPBYHASHWithProjectionDesign.htm

    https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/AnalyzingData/Optimizations/AvoidingResegmentationDuringGROUPBYOptimizationWithProjectionDesign.htm

    I saw:

    "To avoid resegmentation, the GROUP BY clause must contain all the segmentation columns of the projection."

     

    So, I created for my table a projection with SEGMENTATED BY clause having 16 fields ( named : field_1...field_16), datatypes: int and varchar.

    create projection...

    select...

    order by field_1....field_16

    segmented by hash(field_1....field_16)

     

    And then I started to test if/when GROUPBY HASH is avoided. I noticed that if I use few or all fields in GROUP BY of my query in same order they are put in SEGMENTATION BY clause, then GROUPBY PIPELINED is used. 

     

    select avg(field_290),

                     field_2

    from my table

         where field_1 = 0

    group by field_2

    GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 85K, Rows: 149] (PATH ID: 1)

     
     

     

    select avg(field_290),

                   field_2,

                   field_3,

                   field_4,

                   field_5,

                   field_6,

                   field_7,

                   field_8,

                   field_9,

                   field_10,

                   field_11,

                   field_12,

                   field_13,

                   field_14,

                   field_15,

                   field_16

     from my table

         where field_1 = 0

    group by field_2,

                   field_3,

                   field_4,

                   field_5,

                   field_6,

                   field_7,

                   field_8,

                   field_9,

                   field_10

                   field_11,

                   field_12,

                   field_13,

                   field_14,

                   field_15,

                   field_16

    GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 474K, Rows: 2K] (PATH ID: 1)
     
     
     But, if I delete , for example field_3,  from my GROUPBY, then GROUPBY HASH is used

    select avg(field_290),

                   field_2,

                   --field_3,

                   field_4,

                   field_5,

                   field_6,

                   field_7,

                   field_8,

                   field_9,

                   field_10,

                   field_11,

                   field_12,

                   field_13,

                   field_14,

                   field_15,

                   field_16

     from my table

         where field_1 = 0

    group by field_2,

                   --field_3,

                   field_4,

                   field_5,

                   field_6,

                   field_7,

                   field_8,

                   field_9,

                   field_10

                   field_11,

                   field_12,

                   field_13,

                   field_14,

                   field_15,

                   field_16

    GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 690K, Rows: 35K] (PATH ID: 1)

     

     

    with GROUPBY HASH - Memory 6.5G

    with GROUPBY PIPELINE - Memory 3.5G 

     

    Veronica

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.