PARTITION BY with Transform UDFs in R

Hello,


I have an R UDTF in Vertica that takes a simple input and returns a matrix. I am trying to run this function with different sets of parameters (preferably in parallel) on the Vertica cluster.

I am trying something like the following, which fails with the error shown after the query:

SELECT myFunc(col1 USING PARAMETERS x=1,y=2) OVER(PARTITION BY col1) FROM mainTable;


[Vertica][VJDBC](3399) ERROR: Failure in UDx RPC call InvokeProcessPartition(): Error calling processPartition() in User Defined Object [myFunc] at [/scratch_a/release/vbuild/vertica/UDxFence/RInterface.cpp:1236], error code: 0, message: Exception in processPartitionForR: [package ‘rJava’ could not be loaded] [SQL State=VP001, DB Errorcode=3399]

I am using rJava because, within the UDTF, I need several other pieces of information that come from queries to Vertica.

Can anyone help?

Comments

  • Hi Said,

    Is the rJava package installed on Vertica's version of R?  The Vertica R binary is located in /opt/vertica/R/bin/R.  

    First make sure you have JDK 1.4+ installed on your system and the JAVA_HOME environment variable is set correctly.

    Then run the javareconf utility so R knows where Java is (you may have to be root or sudo to do this):
    /opt/vertica/R/bin/R CMD javareconf
    Then run the Vertica-packaged version of R and install the rJava package:
    /opt/vertica/R/bin/R
    install.packages("rJava");
    quit();
    Then try to run your R UDF.
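
    One way to confirm the installation outside the database is to ask Vertica's R to load rJava from the command line. The sketch below is only an illustration: the check_rjava function and the VERTICA_R variable are names made up for this example, and it assumes the R binary path given above. Because a PARTITION BY query may be executed on more than one node, it is probably worth running the check on every node, as the same OS user that runs Vertica.

    ```shell
    # Path to Vertica's bundled R (from the post above); override if yours differs.
    VERTICA_R="${VERTICA_R:-/opt/vertica/R/bin/R}"

    # check_rjava (hypothetical helper): try to load rJava and start a JVM
    # in the given R binary. Returns 2 if the binary is missing, otherwise
    # R's own exit status (non-zero when library(rJava) fails).
    check_rjava() {
        r_bin="$1"
        if [ ! -x "$r_bin" ]; then
            echo "R binary not found: $r_bin" >&2
            return 2
        fi
        "$r_bin" --no-save -e 'library(rJava); .jinit(); cat("rJava OK\n")'
    }
    ```

    For example, run check_rjava "$VERTICA_R" on each node; a non-zero exit status means rJava did not load in that node's R.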

    Please let us know if this helps!

  • My Vertica cluster admin has run javareconf successfully and has installed rJava as well. rJava fails to load when PARTITION BY is used inside the OVER() clause of the UDF call.

    If I execute the query without PARTITION BY, it runs successfully.

    I have removed this feature from the script and replaced its inner workings. I will research further and post here if new information arises.
  • Were you able to find a solution to this problem? I am facing the same issue: the R UDF runs fine without PARTITION BY, but when I use PARTITION BY I keep seeing "[vertica-udx-R] <defunct>" processes being created.

  • PRana (Employee)
    Hi Eli,

    The [vertica-udx-R] <defunct> processes are a known issue, but they don't interfere with query completion. The cleanup after the query finishes is simply not done properly. The defunct processes are cleaned up once the parent exits.
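
    To see whether such processes are present on a node, and which parent is holding them, the output of ps can be filtered for zombie entries. This is a minimal sketch, not a Vertica tool: list_defunct_udx is a hypothetical helper name, and it assumes a ps that supports the pid, ppid, stat and args columns, as on Linux.

    ```shell
    # list_defunct_udx (hypothetical helper): read the output of
    # `ps -eo pid,ppid,stat,args` on stdin and print "PID PPID" for every
    # zombie (state starting with Z) whose command mentions vertica-udx-R.
    list_defunct_udx() {
        awk '$3 ~ /^Z/ && /vertica-udx-R/ { print $1, $2 }'
    }
    ```

    Usage on a node would be: ps -eo pid,ppid,stat,args | list_defunct_udx. The second column is the parent PID, which is the process whose exit lets the defunct entries be reaped.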

    Are your queries completing fine or do you get an error?

    Thanks
    Pratibha


  • No error at the SQL level, but the UDx log file includes many messages. Please note that this issue is not specific to one R function; I get the same behavior when I run the Vertica SDK examples.
  • PRana (Employee)
    Hi Eli,

    We are working on fixing the defunct-process issue. We do print some messages to the logs as part of normal execution. Because R is interpreted, we have to parse the R library and look for certain optional functions such as returntypescallback; if we don't find them, we simply log that fact.

    Pratibha
  • For our implementation this is a show stopper. Can you advise what the ETA for the fix is?
  • Can you send an example of code that does not trigger this issue? It would be a good reference for us to check whether something is wrong in our code.
  • PRana (Employee)
    Hi Eli,

    The fix will be included in the next release. I don't have an ETA for that. What version are you running? You can also run an additional dummy call to the R function without the PARTITION BY clause, as suggested by another user here: https://community.vertica.com/vertica/topics/why_r_udf_work_on_single_node_but_not_work_on_cluster?u...

  • We are on 6.1.3. The workaround is not consistent (it does not always clear the defunct processes). I need a hot fix ASAP. I will take it up with support. Thanks.
  • Hi Eli!

    On this particular matter I can't remember what we did. But if you give me the last complete error line you encountered, I can try to help.

    As an aside, we had to increase the Java memory limit to give the query room to execute. This helped us temporarily.

    =)
  • Said,
    Can you verify that your R function does not create [vertica-udx-R] <defunct> processes on the Vertica nodes?
