Why does an R UDF work on a single node but not on a cluster? - Vertica Forum


vsql:RFunctions_test.sql:31: ERROR 3399:  Failure in UDx RPC call InvokeProcessPartition(): Error calling processPartition() in User Defined Object [cons] at [/scratch_a/release/vbuild/vertica/UDxFence/RInterface.cpp:1236], error code: 0, message: Exception in processPartitionForR: [cannot open the connection]

This R function was tested on a single node, where it worked fine.

Comments

  • Employee
    Hi Haifeng,

    UDxs written in R work on multi-node cluster setups. Please make sure that the Vertica R language pack is installed on all nodes. If the problem persists, please file a support case.


  • Below is the sample code. It worked on the single node but not on the cluster. I did install the Vertica R language pack on all nodes, and the examples shipped with Vertica work, but my code does not, and I don't know why. What does "cannot open the connection" mean? I can still access Vertica through vsql.

    The library and the transform function are created successfully, but when I invoke the UDF with the SQL query below, the "cannot open the connection" error shown above appears, and "ps -ef" shows that the UDx processes have died:

    dbadmin  31367 31361  0 Mar04 ?        00:00:00 [vertica-udx-R] <defunct>
    dbadmin  31384 31361  0 Mar04 ?        00:00:00 [vertica-udx-R] <defunct>
    dbadmin  31385 31361  0 Mar04 ?        00:00:00 [vertica-udx-R] <defunct>
    dbadmin  31386 31361  0 Mar04 ?        00:00:00 [vertica-udx-R] <defunct>
    dbadmin  31387 31361  0 Mar04 ?        00:00:00 [vertica-udx-R] <defunct>


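    # Collapse a partition into its first row, filling each NA in that row
    # with the first non-NA value found in the same column.
    cons <- function(x)
    {
        df <- data.frame(x)
        # Loop over the columns whose first-row value is NA
        for(j in which(is.na(df[1,]))){
             df[1,j] <- df[min(which(!is.na(df[,j]))),j]
        }
        # Return only the consolidated first row
        outdf <- df[1,]
        outdf
    }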
    consFactory <- function()
    {
        list(name=cons ,udxtype=c("transform"),intype=c("any"), outtype=c("any"),outtypecallback=outtype,strictness=c("CALLED_ON_NULL_INPUT") )
    }

    #outtypecallback function
    outtype <- function(x)
    {
         params <- NULL
         params <- data.frame(datatype=rep(NA, 1), length=rep(NA,1), scale=rep(NA,1), name=rep(NA,1) )
         for(i in 1:nrow(x))
         {
             params[i,1] <- "varchar"
         }
         params
    }



    DROP TABLE T;

    DROP LIBRARY rlib CASCADE;

    -- Step 1: Create LIBRARY
    \set libfile '\''`pwd`'/RFunctions/RFunctions_test.R\''
    CREATE LIBRARY rlib AS :libfile LANGUAGE 'R';

    -- Step 2: Create Function Factories
    CREATE TRANSFORM FUNCTION cons
    AS LANGUAGE 'R' NAME 'consFactory' LIBRARY rlib;

    /*** Test table and sample data ***/
    CREATE TABLE T(qualifier varchar(20) not null,priority int,value1 varchar(20),value2 varchar(20));
    COPY T FROM STDIN DELIMITER ',';
    qua1,1,,o
    qua1,2,b,
    qua2,1,,
    qua2,2,d,l
    \.

    -- Invoke the UDF
    SELECT  cons(qualifier,priority,value1,value2) OVER(partition by qualifier order by qualifier,priority) FROM T;
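The consolidation logic posted above can also be exercised in plain R, outside Vertica, which helps separate logic errors from deployment problems. The following is only a sketch: the data frame `part` is a hypothetical stand-in for the rows of one partition (`qua1`) of table T, and it assumes Vertica passes the partition to the function as a data frame with the columns in the same order.

```r
# Same logic as the cons() function posted above, repeated so this runs standalone.
cons <- function(x) {
  df <- data.frame(x)
  # For every column whose first-row value is NA, take the first non-NA value.
  for (j in which(is.na(df[1, ]))) {
    df[1, j] <- df[min(which(!is.na(df[, j]))), j]
  }
  df[1, ]
}

# Hypothetical stand-in for the 'qua1' partition, ordered by priority.
part <- data.frame(
  qualifier = c("qua1", "qua1"),
  priority  = c(1L, 2L),
  value1    = c(NA, "b"),
  value2    = c("o", NA),
  stringsAsFactors = FALSE
)

result <- cons(part)
print(result)
```

If this runs cleanly on every node's R installation, the failure is more likely to come from the environment (paths, permissions, packages) than from the function itself.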
  • Hi Pratibha, thanks for your reply. I have found the cause: I wrote a debug function in the R file that writes to a file, and when I copied the R file to the cluster I did not create the same directory path on the other nodes. When the UDF ran there, the debug function could not open its output file, so Vertica threw the error message above. :) Thanks anyway.
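The failure described above is R's standard error for a file that cannot be opened, and a debug helper can be written so that it never aborts the UDx. The sketch below is an illustration, not the poster's actual code: `debug_log` and the log file name are hypothetical, and it assumes `tempdir()` is writable by the UDx process on each node.

```r
# Hypothetical debug helper: appends a message to a log file and returns
# TRUE on success, FALSE if the file cannot be opened (for example because
# the directory does not exist on this node).
debug_log <- function(msg, path = file.path(tempdir(), "udx_debug.log")) {
  tryCatch(
    suppressWarnings({
      cat(format(Sys.time()), msg, "\n", file = path, append = TRUE)
      TRUE
    }),
    error = function(e) FALSE
  )
}

# Writing to the default location, which exists in any R session.
ok_good <- debug_log("entering cons()")

# Writing below a directory that does not exist: the error is swallowed
# instead of propagating up to the caller.
missing_path <- file.path(tempdir(), "no_such_dir", "udx_debug.log")
ok_bad <- debug_log("entering cons()", path = missing_path)

# Without the guard, the same write raises an error; capture its message.
raw_msg <- tryCatch(
  suppressWarnings(cat("x\n", file = missing_path)),
  error = function(e) conditionMessage(e)
)
print(raw_msg)
```

Returning a flag instead of raising keeps optional logging from taking down the function; creating the same directory on every node, as the poster did, is the other straightforward fix.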
  • Can you advise on what makes the <defunct> processes disappear?
  • Employee
    Hi Eli,

    This is a known issue and we are working on a fix. In the meantime, there are two workarounds: the defunct processes disappear when you exit the session, or when the vertica-udx-R process is killed.

    Thanks
    Pratibha

  • Thanks, that sounds good.

    We use connection pooling, so we can't just close the connection, and the kill option is not clean enough. I found that an additional dummy call to the R function without a PARTITION BY clause fixes it.
    Thanks anyway.
