Query with LISTAGG function fails on secondary subcluster but passes on primary (Error loading file)
Hi,
We have a Vertica 12 Eon Mode database on Google Cloud. Users reported a strange problem with one of the reports, which uses a query containing the LISTAGG function.
We have 6 nodes in the primary subcluster, used for ETL/ELT, and 6 nodes in the secondary subcluster, used for analytics.
Vertica's native connection load balancing works great, and analytics users are routed to the secondary subcluster.
This is the error we get on secondary subcluster:
SQL Error [9082] [55000]: [Vertica]VJDBC ERROR: Error loading library file [/vertica/catalog/NessieGCP/v_nessiegcp_node0008_catalog/Libraries/02ffc9adc1834b5e837f96abfca0502d00a00000000003d6/VFunctionsLib_02ffc9adc1834b5e837f96abfca0502d00a00000000003d6.so]: Failure in UDx RPC call InvokeCheckLibrary(): Error calling setupExecContext() in User Defined Object [] at [/data/jenkins/workspace/RE-ReleaseBuilds/RE-Knuckleboom_2/server/vertica/OSS/UDxFence/vertica-udx-C++.cpp:241], error code: 0, message: Error happened in dlopen(): [/vertica/catalog/NessieGCP/v_nessiegcp_node0008_catalog/Libraries/02ffc9adc1834b5e837f96abfca0502d00a00000000003d6/VFunctionsLib_02ffc9adc1834b5e837f96abfca0502d00a00000000003d6.so: cannot open shared object file: No such file or directory]
The same query works on the primary subcluster with no problem.
Should I copy those files to all nodes of the secondary subcluster? Why aren't they already there? What's going on?
Thank you
Best Answer
Solved by copying the library files from node1 (primary) to all nodes of the secondary subcluster. It works with the files present only on node8, but I copied them to all nodes in case a particular node fails. By the way, the secondary nodes had only my custom UDx libraries, none of the Vertica-supplied ones that exist on the primary subcluster. Nice bug. Something to keep in mind when expanding subclusters with new nodes.
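Before copying files around, it can help to see exactly which libraries a secondary node is missing. Below is a minimal Python sketch, assuming you can read both nodes' `Libraries` directories from one machine (for example, copied over or mounted). The layout `Libraries/<id>/<LibName>_<id>.so` is inferred from the path in the error message, and the function names `library_ids` and `missing_libraries` are hypothetical helpers, not part of any Vertica tool.

```python
import sys
from pathlib import Path


def library_ids(libraries_dir: Path) -> dict:
    """Map each library subdirectory name to whether it contains a .so file.

    Assumed layout (inferred from the error message):
    Libraries/<id>/<LibName>_<id>.so
    """
    result = {}
    if not libraries_dir.is_dir():
        return result
    for sub in libraries_dir.iterdir():
        if sub.is_dir():
            # True if at least one shared object file exists in this directory.
            result[sub.name] = any(sub.glob("*.so"))
    return result


def missing_libraries(primary_dir: Path, secondary_dir: Path) -> list:
    """Return library ids that have a .so on the primary node but are
    absent, or lack their .so file, on the secondary node."""
    primary = library_ids(primary_dir)
    secondary = library_ids(secondary_dir)
    return sorted(
        lib_id
        for lib_id, has_so in primary.items()
        if has_so and not secondary.get(lib_id, False)
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_libs.py <primary Libraries dir> <secondary Libraries dir>
    for lib_id in missing_libraries(Path(sys.argv[1]), Path(sys.argv[2])):
        print(lib_id)
```

This only compares directory contents; it does not check whether the files are identical, and it says nothing about why Vertica did not recover them.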
Answers
In Eon Mode, newly added nodes don't have library files copied to them. This is expected and is done to speed up adding nodes to (sub)clusters. Library files on those nodes are recovered from communal storage the first time a library is needed to run a query. That didn't happen in your cluster.
I suggest looking into the vertica.log of the node that failed to run the query and searching for attempts to copy library files to that node's local library directory. Also, if you can, open a ticket with support to investigate further.
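As a rough illustration of that search, here is a Python sketch that pulls the library id (the hex directory name) out of the error message and then lists every vertica.log line mentioning it. It assumes only that relevant log lines contain that id; the exact wording Vertica uses when it copies or recovers library files is not assumed. The helpers `library_id_from_error` and `lines_mentioning` are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Iterator, Optional, Tuple


def library_id_from_error(message: str) -> Optional[str]:
    """Extract the id from a '.../Libraries/<id>/...' path in an error message."""
    match = re.search(r"/Libraries/([0-9a-f]+)/", message)
    return match.group(1) if match else None


def lines_mentioning(log_path: Path, needle: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for every log line that contains `needle`."""
    with log_path.open(encoding="utf-8", errors="replace") as log:
        for lineno, line in enumerate(log, start=1):
            if needle in line:
                yield lineno, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python scan_log.py /path/to/vertica.log <library id>
    for lineno, line in lines_mentioning(Path(sys.argv[1]), sys.argv[2]):
        print(f"{lineno}: {line}")
```

Searching for the id rather than a specific message keeps the sketch independent of log formats that may differ between Vertica versions.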
Thank you for the information Ariel. Good to know.