We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


groupby pipeline — Vertica Forum

groupby pipeline

Hi,

I have a projection with ORDER BY/SEGMENTATION BY clauses on 16 fields:

 

a int,

b int,

c varchar(10),

d int

e varchar(10)

f int

g varchar(10)

h int

i varchar(10)

j int

 

l int

m varchar(10)

o varchar(3)

p int

r varchar(5)

s int

 

I noticed GROUPBY PIPELINE is used only for first 10 fields.

GROUP BY (a,b,c,d,e,f,g,h,i,j)

 

If I add 'l' then is used GROUPBY HASH.

 

There are limitations on number or length of the fields ?

Thank you,

Veronica

Comments

  • I suggest you reformulate your question and tidy-up your text ! 

     

  • OK, I discovered myself.

     

    Speaking about resegmentation, my question was: is the number/length of fields involved in GROUP BY (of SELECT)/ SEGMENTATION BY (projection) clauses modifing the resegmentation modality (GROUPBY HASH / GROUPBY PIPELINE) ?

    My opinion is: NO

     

    Reading the documentation:

    https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/AnalyzingData/Optimizations/AvoidingGROUPBYHASHWithProjectionDesign.htm

    https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/AnalyzingData/Optimizations/AvoidingResegmentationDuringGROUPBYOptimizationWithProjectionDesign.htm

    I saw:

    "To avoid resegmentation, the GROUP BY clause must contain all the segmentation columns of the projection."

     

    So, I created for my table a projection with SEGMENTATED BY clause having 16 fields ( named : field_1...field_16), datatypes: int and varchar.

    create projection...

    select...

    order by field_1....field_16

    segmented by hash(field_1....field_16)

     

    And then I started to test if/when GROUPBY HASH is avoided. I noticed that if I use few or all fields in GROUP BY of my query in same order they are put in SEGMENTATION BY clause, then GROUPBY PIPELINED is used. 

     

    select avg(field_290),

                     field_2

    from my table

         where field_1 = 0

    group by field_2

    GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 85K, Rows: 149] (PATH ID: 1)

     
     

     

    select avg(field_290),

                   field_2,

                   field_3,

                   field_4,

                   field_5,

                   field_6,

                   field_7,

                   field_8,

                   field_9,

                   field_10,

                   field_11,

                   field_12,

                   field_13,

                   field_14,

                   field_15,

                   field_16

     from my table

         where field_1 = 0

    group by field_2,

                   field_3,

                   field_4,

                   field_5,

                   field_6,

                   field_7,

                   field_8,

                   field_9,

                   field_10

                   field_11,

                   field_12,

                   field_13,

                   field_14,

                   field_15,

                   field_16

    GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 474K, Rows: 2K] (PATH ID: 1)
     
     
     But, if I delete , for example field_3,  from my GROUPBY, then GROUPBY HASH is used

    select avg(field_290),

                   field_2,

                   --field_3,

                   field_4,

                   field_5,

                   field_6,

                   field_7,

                   field_8,

                   field_9,

                   field_10,

                   field_11,

                   field_12,

                   field_13,

                   field_14,

                   field_15,

                   field_16

     from my table

         where field_1 = 0

    group by field_2,

                   --field_3,

                   field_4,

                   field_5,

                   field_6,

                   field_7,

                   field_8,

                   field_9,

                   field_10

                   field_11,

                   field_12,

                   field_13,

                   field_14,

                   field_15,

                   field_16

    GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 690K, Rows: 35K] (PATH ID: 1)

     

     

    with GROUPBY HASH - Memory 6.5G

    with GROUPBY PIPELINE - Memory 3.5G 

     

    Veronica

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file