UDF in R: subscript out of bounds error

Hi,

 

I wrote a User Defined Transform Function in R. Before setting up the UDF, I tested the core functions in client mode, i.e. using the vRODBC client to pull data from Vertica and then running my algorithm. Everything works fine in this mode.

 

However, when I run the function as a UDF, I sometimes get the following exception:

 

ERROR 3399:  Failure in UDx RPC call InvokeProcessPartition(): Error calling processPartition() in User Defined Object [myFunction] at [/scratch_a/release/vbuild/vertica/OSS/UDxFence/RInterface.cpp:1244], error code: 0, message: Exception in processPartitionForR: [subscript out of bounds]

 

This does not always happen.

 

My understanding is that a "subscript out of bounds" exception in R is basically an "array index out of bounds" error. The two places in my code where I access "arrays" are:

min_values <- lapply(1:n_rows, function(i) tmp[i,min_idx[i]])
min_domains <- lapply(1:n_rows, function(i) Domains[min_idx[i]])

One thing that I find baffling is that my input data is not changing but sometimes this exception happens and sometimes it does not.
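
One thing worth ruling out here (an assumption, not something confirmed for this UDF): a transform function is invoked once per partition, so a single invocation may receive rows from which no domains are extracted, even though the table as a whole has not changed. In that case n_rows is 0, and 1:n_rows counts down as 1, 0 instead of being empty, so the very first index already falls outside the matrix. A minimal sketch, using a hypothetical zero-row matrix in place of the real tmp:

```r
# Hypothetical stand-in for tmp when no domains were extracted: 0 rows, 2 columns.
tmp <- matrix(numeric(0), nrow = 0, ncol = 2)
n_rows <- nrow(tmp)

bad_idx  <- 1:n_rows          # counts down from 1 to 0, so it has two elements
safe_idx <- seq_len(n_rows)   # empty, so a loop over it does nothing

# Same access pattern as in the two lines above; column 1 stands in for min_idx[i].
res <- tryCatch(
  lapply(bad_idx, function(i) tmp[i, 1]),
  error = function(e) conditionMessage(e)
)
print(res)

# With seq_len() the function body never runs and the result is an empty list.
print(lapply(safe_idx, function(i) tmp[i, 1]))
```

If this is the cause, the error would depend on how rows are split across partitions rather than on the data as a whole, which would be consistent with it appearing only some of the time.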

 

Could it be some sort of memory allocation issue, i.e. the R code not being able to get its allocated memory?

 

Thanks,

Abhishek

Comments

  • SruthiA Administrator

    Hi,

     

    Can you share the code, if possible? I can take a look at it.

     

     

    Sruthi

  • # load data and functions
    source('R_scripts/tldextract.R')
    load('R_scripts/tldnames.rda')
    load('R_scripts/alexa.rda')

    extractFQDN <- function(attrbuf) {
      regexp <- "\fServer FQDN\t.*\n"
      m_tmp <- regexpr(regexp, attrbuf, perl = TRUE)
      s_tmp <- regmatches(attrbuf, m_tmp)
      split_tmp <- unlist(strsplit(s_tmp,"\t"))
      ServerFQDN <- sub("\n","",split_tmp[2])
      return(ServerFQDN)
    }

    getDomain <- function(x) {
      return(x$domain)
    }

    uniqueDomains <- function(df_domains) {
      tmp <- lapply(df_domains, tldextract)
      u_domains <- sapply(tmp, getDomain)
      u_domains <- unique(u_domains)
      u_domains <- u_domains[!is.na(u_domains)]
      return(u_domains)
    }

    minAdistfromAlexa <- function(domains, alexaDomains) {
      tmp <- adist(domains, alexaDomains)
      min_idx <- apply(tmp, 1, which.min)
      n_rows = nrow(tmp)
      min_values <- lapply(1:n_rows, function(i)tmp[i,min_idx[i]])
      min_alexa_domains <- lapply(1:n_rows, function(i)alexaDomains[min_idx[i]])
      t_df <- data.frame(cbind(domains, unlist(min_alexa_domains), unlist(min_values)), stringsAsFactors = FALSE)
      colnames(t_df) <- c("domain","alexaDomian","dist")
      t_df$dist <- as.numeric(t_df$dist)
      return(t_df)
    }

    # main function
    domainDistanceAlexa <- function(T) {
      names(T) <- c("attrbuf")
      domains <- lapply(T$attrbuf, extractFQDN)
      df_domains <- data.frame(unlist(domains), stringsAsFactors = FALSE)
      colnames(df_domains)[1] <- "domains"
      list_domains <- uniqueDomains(df_domains)
      df_domains_dist <- minAdistfromAlexa(list_domains, alexa_uniqueDomains)
      df_domains_dist
    }

    # factory function
    domainDistanceAlexaFactory <- function()
    {
      list(
        name=domainDistanceAlexa,
        udxtype=c("transform"),
        intype=c("varchar(4096)"),
        outtype=c("varchar(4096)", "varchar(4096)", "int")
      )
    }

    Here is the entire code. This function computes the edit distance between a reference list of domains (stored in alexa.rda) and domain names extracted from one of the columns in Vertica (using the extractFQDN function).
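
    Independently of any server-side bug, the distance step can be made robust to an empty set of domains. The following is only a sketch under assumptions: the name minAdistSafe is hypothetical, the sample domains are made up, and the corrected column name alexaDomain differs from the one in the posted code. It returns an empty data frame when there is nothing to compare and uses matrix indexing in place of the per-row lapply calls.

    ```r
    # Sketch of a guarded variant of minAdistfromAlexa; minAdistSafe is a hypothetical name.
    minAdistSafe <- function(domains, alexaDomains) {
      domains <- domains[!is.na(domains)]
      # Nothing to compare: return an empty data frame with the expected columns.
      if (length(domains) == 0L || length(alexaDomains) == 0L) {
        return(data.frame(domain = character(0), alexaDomain = character(0),
                          dist = numeric(0), stringsAsFactors = FALSE))
      }
      tmp <- adist(domains, alexaDomains)
      # Column index of the nearest reference domain for each row.
      min_idx <- apply(tmp, 1, which.min)
      # A two-column (row, column) index matrix picks one cell per row.
      min_values <- tmp[cbind(seq_len(nrow(tmp)), min_idx)]
      data.frame(domain = domains, alexaDomain = alexaDomains[min_idx],
                 dist = as.numeric(min_values), stringsAsFactors = FALSE)
    }

    # Made-up inputs for illustration only.
    out <- minAdistSafe(c("gogle.com", "yaho.com"), c("google.com", "yahoo.com"))
    print(out)
    empty <- minAdistSafe(character(0), c("google.com", "yahoo.com"))
    print(nrow(empty))
    ```

    Whether an empty result is acceptable as UDF output for a partition is a separate question that this sketch does not address.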

  • Hi,
    Which versions of Vertica and R are you using? I faced a similar problem in the past using Vertica 6.1.x.

    Thanks.
  • Hi, 

     

    I am using R 3.0.0 and Vertica 7.0.2

     

    Thanks

  • Hi,

    OK, in our case it was identified as a bug, and it was fixed only in 7.1.1.3 (Vertica server SW) and 7.1.1-5 (RPM SW).

     

    It looks like you are facing the same bug. I recommend testing your code on 7.2.

     

    Thanks 

  • Hi eli_revach,

     

    Thank you for your suggestion. I will upgrade to Vertica 7.2.

     

    What do you mean by 7.1.1-5 (RPM SW)? Is it the RPM for R?

     

    Thanks.

     
