why group by hash has multiple activity id?
I know hash join has 2 activity id, one for inner table build, the other for output table probe. But for group by hash, why there are multiple activity id? What's the meaning of them? From below snapshot, there are activity id 1026,1023,1023 and 1016, each of have different start end and end time, they are based on the profile data
0
Answers
I could give you a more appropriate explanation if you had added the explain plan - to see if the group by only has LOCAL RESEGMENT or also GLOBAL RESEGMENT or no RESEGMENT at all.
But I would guess that, if you only have two starting parallel threads at the beginning, I would guess that that is the building of the hash table for the GROUP BY (in memory if it's big enough, or spilling to disk otherwise), followed by the reading from that hash table to consolidate the result(s) and channelling them to downstream ParallelUnion/ValExpr, and finally, via the Root and GroupByNothing to the DataTarget (I assume this is either a CREATE TABLE .. AS SELECT .. or an INSERT ... SELECT.)
Thanks for your reply, the plan is as below
| Target Projection: adhoc.test_v1_super (SORT BY PROJECTION SORT ORDER) (RESEGMENT)
| +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 1M, Rows: 1] (PATH ID: 4)
| | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 1M, Rows: 1] (PATH ID: 6)
| | | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 1M, Rows: 1] (PATH ID: 7)
| | | | +-- Outer -> STORAGE ACCESS for AEDE [Cost: 530K, Rows: 3B] (PATH ID: 8)
| | | | +-- Inner -> JOIN MERGEJOIN(inputs presorted) [Cost: 570K, Rows: 1] (PUSHED GROUPING) (PATH ID: 9)
| | | | | +-- Outer -> STORAGE ACCESS for AE [Cost: 147K, Rows: 307M] (PATH ID: 10)
| | | | | +-- Inner -> STORAGE ACCESS for AER [Cost: 324K, Rows: 718M] (PUSHED GROUPING) (PATH ID: 11)