How to enumerate concurrent instances of a UDx
Hello,
I have a Transform UDx that is highly parallelized and can run many concurrent instances on each node, across all nodes in the cluster.
I write the output of the UDx processing to a local file on the node.
It scales well on Vertica. I can run it with over() to produce a single output file on the initiator node, with over(partition nodes) to produce one output file per node, and with over(partition best) to produce 16 output files on each node in the cluster.
The problem is that with over(partition best) I cannot tell the UDx instances apart. I can generate one file name per node by adding node_name, which is available through srvInterface, but then all 16 instances of the UDx on each node write to the same file.
I add a random number to the file name, which gives each UDx instance a unique file name, and that works fine.
But the file names look ugly...
What I am looking for is a way to enumerate the instances of a UDx that are running concurrently.
A UDL source has a planning stage that lets me enumerate all instances of the source UDL and assign each instance to a specific node (I use it a lot!).
Is there something like this for a Transform UDx?
All I want is a unique identifier for each UDx instance, preferably a short one, so it doesn't harm my beautiful file names.
Maybe I can get the thread number? Would it be unique?
It should work in both fenced and unfenced modes.
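On the thread-number idea: standard C++ can report the calling thread's id, as in the sketch below (`currentThreadTag` is a hypothetical helper, not part of the Vertica SDK). The C++ standard only guarantees that ids differ among threads that exist at the same time in one process; an id may be reused after a thread finishes, and two processes can produce the same value, which matters if fenced and unfenced instances write to the same directory. It is also an assumption, not something the thread confirms, that Vertica runs each UDx instance on its own dedicated thread. On Linux the printed id is typically a long number, so it would not give short file names either.

```cpp
#include <sstream>
#include <string>
#include <thread>

// Returns the calling thread's id as text. Unique only among threads that are
// alive at the same time in the same process; an id can be reused after a
// thread finishes, and a different process may produce the same value.
std::string currentThreadTag() {
    std::ostringstream os;
    os << std::this_thread::get_id();
    return os.str();
}
```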
Answers
Unfortunately the way the UDT API is designed right now, there's not a reliable way to do this.
There is a clunky and unreliable way to do it, though. Your TransformFunctionFactory implementation is a singleton instance on each node. Putting fields in it isn't really recommended, because it can be used concurrently by multiple queries or by multiple threads within the same query, and because you would have to be careful about fenced mode. But it is an option: you could probably number the instances using some combination of atomic fields and session parameters.
Thanks for the feedback on our UDxes.
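A sketch of the atomic-field idea from this answer, using hypothetical stand-in classes rather than the real Vertica SDK types (`CountingFactory`, `InstanceWriter`, and `simulateInstances` are illustrative names only). Each instance takes the next number from a `std::atomic<unsigned>` held in the per-process singleton, so instances created concurrently never receive the same number. The caveats from the answer still apply: the counter is shared by every query in the process, so numbers keep growing across queries instead of restarting at 0; it resets if the process restarts, so a later run can reuse names; and if more than one process hosts the factory on a node (fenced versus unfenced), numbers can repeat. The session-parameter part of the suggestion is not shown.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one UDx instance (not a Vertica SDK class).
struct InstanceWriter {
    unsigned instanceNo;  // short number, unique within this process

    std::string fileName(const std::string& nodeName) const {
        return "export_" + nodeName + "_" + std::to_string(instanceNo) + ".out";
    }
};

// Hypothetical stand-in for a TransformFunctionFactory: one object per process,
// so this counter is shared by every query and every thread in that process.
class CountingFactory {
public:
    InstanceWriter createInstance() {
        // fetch_add is atomic: concurrent callers always receive distinct values.
        unsigned n = nextInstance_.fetch_add(1, std::memory_order_relaxed);
        return InstanceWriter{n};
    }

private:
    std::atomic<unsigned> nextInstance_{0};
};

// Illustration only: creates n instances concurrently, one per thread, and
// returns the number each one received (slot i belongs to thread i).
std::vector<unsigned> simulateInstances(CountingFactory& factory, unsigned n) {
    std::vector<unsigned> ids(n);
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i) {
        threads.emplace_back([&factory, &ids, i] {
            ids[i] = factory.createInstance().instanceNo;
        });
    }
    for (auto& t : threads) t.join();
    return ids;
}
```

With 16 simulated instances, `simulateInstances` returns the numbers 0 through 15 in some order, and the next instance created by the same factory gets 16. That ever-growing numbering is exactly why the answer calls this approach unreliable for naming per-query files.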
Thanks for answering!
I am not sure it is a good idea to use any of the methods you described. Everything looks like a hack that could cause unexpected behavior.
I will stick with the pseudo-random number I generate from a nanosecond timestamp.
Hopefully, a future version of the Vertica API will support enumerating UDx instances.
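A minimal sketch of the nanosecond-seeded file-name suffix discussed in this thread, assuming the node name is passed in as a plain string (in the UDx it would come from srvInterface, as the question describes). The helper name `makeOutputFileName` and the `<prefix>_<node>_<suffix>.out` pattern are hypothetical. The seed mixes the timestamp with the thread id, and only 24 bits are kept so the suffix is always six hex digits: short, but not guaranteed unique.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <random>
#include <string>
#include <thread>

// Hypothetical helper: returns "<prefix>_<nodeName>_<6 hex digits>.out".
// The suffix is pseudo-random, so collisions are unlikely but possible.
std::string makeOutputFileName(const std::string& prefix, const std::string& nodeName) {
    // Seed from a nanosecond timestamp, mixed with the thread id so that two
    // instances starting in the same nanosecond usually get different seeds.
    auto nanos = std::chrono::duration_cast<std::chrono::nanoseconds>(
                     std::chrono::system_clock::now().time_since_epoch()).count();
    std::uint64_t threadHash = std::hash<std::thread::id>{}(std::this_thread::get_id());
    std::mt19937_64 rng(static_cast<std::uint64_t>(nanos) ^ (threadHash << 1));

    // Keep 24 bits so the suffix is always exactly six hex digits.
    unsigned suffix = static_cast<unsigned>(rng() & 0xFFFFFFu);
    char buf[7];
    std::snprintf(buf, sizeof buf, "%06x", suffix);
    return prefix + "_" + nodeName + "_" + buf + ".out";
}
```

For example, `makeOutputFileName("export", "v_mydb_node0001")` returns `export_v_mydb_node0001_` followed by six hex digits and `.out`. With 16 instances per node, the birthday approximation puts the chance that two of them share a suffix at about (16·15/2) / 2^24, roughly 1 in 140,000 per node per run; keeping more bits lowers that at the cost of longer names.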