MultiPhase Transform UDx are pointless; stacking separate Transform UDx provides more functionality
I took a look at using MultiPhase Transform UDx.
I am not sure why anybody would want to use them. They restrict functionality compared to stacking several regular Transform UDx.
Here is what I found:
The idea behind MultiPhase is to provide canned functionality, the same as you get when calling stacked Transforms:
select transform1(v1.) over() from (
select transform2(v2.) over() from (
The output of transform2 goes to the input of transform1, creating stacked Transforms.
The same call to a MultiPhase transform would be:
select MPtransform(*) over() from (xxxx);
With regular transforms in stacked calls, I can control where each stage is executed.
I can use over() for a single instance of the transform executed on the initiator node, over(partition nodes) for a single instance of the transform on each node, or over(partition best) for several instances of the transform per node. More importantly, I can specify where each transform is executed individually. For example, I can request that each transform is executed on every node in the cluster, and it works perfectly fine.
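To make the per-stage control concrete, here is a minimal sketch of a stacked call where each transform gets its own OVER clause. The names transform1, transform2, source_table, and the columns a, b, x, y are all hypothetical placeholders; the output column names of the inner transform depend on how that UDx declares its return type.

```sql
-- Sketch only: transform1, transform2, source_table and all column
-- names are hypothetical placeholders, not real objects.
SELECT transform1(x, y) OVER (PARTITION NODES)     -- outer stage: one instance per node
FROM (
    SELECT transform2(a, b) OVER (PARTITION BEST)  -- inner stage: several instances per node
    FROM source_table
) AS v2;
```

Replacing either OVER clause with over() would move only that stage to a single instance on the initiator node, leaving the other stage as it is.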
What I found is that a MultiPhase transform is not that flexible. The first stage of a MultiPhase transform can be configured like a regular transform, but all other stages are forced to execute in a single thread on the initiator node.
So... I do not see a reason to use UDx MultiPhase Transforms. Stacked transforms work better, are more flexible, and have more functionality. In my tests, performance is the same for stacked transforms and a MultiPhase transform.
And, looking at the bigger picture, I do not see why Vertica released MultiPhase transforms. Regular UDx transforms in a stacked configuration already exist, work just fine, and are flexible and configurable.
What is the point of MultiPhase UDx Transforms?
Looking forward to comments from a Vertica UDx developer.