Creating external Python libraries
mgv12
Hello all,
I would like to use some external Python libraries to run a model inside Vertica. I have been looking at the documentation (Create Library). Apparently, it is possible to declare a library so that it is replicated to every node in Vertica. I suppose that works when the library does not need to compile extra packages, such as pandas. But the question arises when a package needs to be compiled and/or has extra C dependencies, such as TensorFlow or PyStan. Is it possible to share those libraries, which are installed locally?
Another question: if it is possible to share those libraries, will they work well in a multi-node Vertica cluster?
Thanks in advance,
Comments
Hi, I put together a demo implementing FFT from NumPy as a Python UDTF. In order to include compiled packages like NumPy, I created a virtualenv, installed the relevant packages using pip, and uploaded the entire site-packages from the virtualenv to Vertica as follows:
CREATE LIBRARY TransformFunctions AS :libfile DEPENDS '/home/bryan/udx/fft351/lib/python3.5/site-packages/' LANGUAGE 'Python';
where /home/bryan/udx/fft351 is the base of my virtualenv. Note that Vertica only provides the base Python language and vertica_sdk import; everything else has to be imported as DEPENDS.
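To avoid typing the site-packages path by hand, something like the following could generate the statement. This is only a sketch: the helper names site_packages_dir and create_library_sql are hypothetical, the script assumes it is run with the virtualenv's own python so that the reported path lies inside the virtualenv, and it keeps the :libfile vsql variable from the statement above rather than guessing the UDx file path.

```python
import sysconfig


def site_packages_dir() -> str:
    """Return the site-packages directory of the interpreter running this script.

    Run this with the virtualenv's python so the path points inside the virtualenv.
    """
    return sysconfig.get_paths()["purelib"]


def create_library_sql(name: str, depends_dir: str) -> str:
    """Build a CREATE LIBRARY statement shaped like the one in the post.

    The UDx file itself is left as the vsql variable :libfile (an assumption
    carried over from the original statement).
    """
    # Normalise to exactly one trailing slash, as in the original DEPENDS path.
    depends = depends_dir.rstrip("/") + "/"
    return (
        f"CREATE LIBRARY {name} AS :libfile "
        f"DEPENDS '{depends}' LANGUAGE 'Python';"
    )


if __name__ == "__main__":
    # Prints a statement that can be pasted into vsql.
    print(create_library_sql("TransformFunctions", site_packages_dir()))
```

Because compiled packages such as NumPy contain platform-specific binaries, the virtualenv presumably needs to be built on a machine whose OS and Python version match what the Vertica nodes run.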
As for performance, this will vary widely based on the functionality and implementation. My FFT function runs single-threaded, so it will not parallelize as written. For best performance, you'll want to implement processPartition() so that partitions are mapped to nodes for parallel processing.
Hopefully others will chime in with more details on how to implement TensorFlow with Vertica.
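As a rough illustration of the per-partition shape, here is a minimal sketch, not my actual demo code. The class name PartitionFFT is hypothetical, the reader/writer calls (getFloat, setFloat, next) are what I understand the vertica_sdk block reader and writer to expose, and a naive DFT from the standard library stands in for numpy.fft.fft so the logic can be tried outside Vertica. The matching factory class that a real UDTF also needs is omitted.

```python
import cmath

try:
    import vertica_sdk  # only importable inside Vertica
    _Base = vertica_sdk.TransformFunction
except ImportError:
    _Base = object  # fallback so the logic can be exercised locally


def dft(samples):
    """Naive O(n^2) discrete Fourier transform; a stand-in for numpy.fft.fft."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


class PartitionFFT(_Base):
    """Hypothetical UDTF body: reads one FLOAT column per partition and
    emits one magnitude per frequency bin."""

    def processPartition(self, server_interface, arg_reader, res_writer):
        # Collect every row of this partition (assumes the partition is non-empty).
        samples = []
        while True:
            samples.append(arg_reader.getFloat(0))
            if not arg_reader.next():
                break
        # Write one output row per frequency bin.
        for value in dft(samples):
            res_writer.setFloat(0, abs(value))
            res_writer.next()
```

Each call to processPartition() only sees the rows of one partition, so the PARTITION BY clause in the OVER() of the calling query decides how much work can run side by side.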