UDF in R: subscript out of bounds error — Vertica Forum

Hi,

 

I wrote a User Defined Transform Function in R. Before setting up the UDF, I tested the core functions in client mode, i.e. using vRODBC client to pull data from Vertica and then run my algorithm. Everything works fine in this mode.

 

However, when I run the function as a UDF, sometimes I get the following exception.

 

ERROR 3399:  Failure in UDx RPC call InvokeProcessPartition(): Error calling processPartition() in User Defined Object [myFunction] at [/scratch_a/release/vbuild/vertica/OSS/UDxFence/RInterface.cpp:1244], error code: 0, message: Exception in processPartitionForR: [subscript out of bounds]

 

This does not always happen.

My understanding is that a "subscript out of bounds" exception in R is essentially an "array index out of bounds" error. The two places in my code where I access arrays are:

min_values <- lapply(1:n_rows, function(i) tmp[i,min_idx[i]])
min_domains <- lapply(1:n_rows, function(i) Domains[min_idx[i]])

One thing that I find baffling is that my input data is not changing but sometimes this exception happens and sometimes it does not.
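
One possible explanation, offered as a sketch rather than a confirmed diagnosis: if the partition handed to the UDx yields no domains, the distance matrix has zero rows, and `1:n_rows` then counts down from 1 to 0 instead of producing an empty sequence. The matrix below is a hypothetical stand-in for `tmp`, not data from the thread.

```r
# Hypothetical stand-in for the distance matrix when no domains were extracted.
tmp <- matrix(numeric(0), nrow = 0, ncol = 2)
n_rows <- nrow(tmp)

# 1:0 counts down, so a loop over it still visits i = 1 and i = 0.
bad_index <- 1:n_rows

# seq_len(0) is empty, so a loop over it does nothing.
good_index <- seq_len(n_rows)

# Asking for row 1 of a zero-row matrix raises the error seen in the UDx log.
msg <- tryCatch(tmp[1, 1], error = function(e) conditionMessage(e))
print(msg)

# The same lapply pattern with seq_len() returns an empty list instead of failing.
safe <- lapply(good_index, function(i) tmp[i, 1])
print(length(safe))
```

If this is what happens, the error would depend on how rows are split into partitions rather than on the data as a whole, which could explain why it never appears when the full data set is pulled through the client.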

 

Could it be some sort of memory allocation issue, i.e. the R code is not able to get its allocated memory?

 

Thanks,

Abhishek

Comments

  • SruthiA (Administrator)

    Hi,

    Can you share the code, if possible? I can take a look at it.

    Sruthi

  • # load data and functions
    source('R_scripts/tldextract.R')
    load('R_scripts/tldnames.rda')
    load('R_scripts/alexa.rda')

    extractFQDN <- function(attrbuf) {
      regexp <- "\fServer FQDN\t.*\n"
      m_tmp <- regexpr(regexp, attrbuf, perl = TRUE)
      s_tmp <- regmatches(attrbuf, m_tmp)
      split_tmp <- unlist(strsplit(s_tmp,"\t"))
      ServerFQDN <- sub("\n","",split_tmp[2])
      return(ServerFQDN)
    }

    getDomain <- function(x) {
      return(x$domain)
    }

    uniqueDomains <- function(df_domains) {
      tmp <- lapply(df_domains, tldextract)
      u_domains <- sapply(tmp, getDomain)
      u_domains <- unique(u_domains)
      u_domains <- u_domains[!is.na(u_domains)]
      return(u_domains)
    }

    minAdistfromAlexa <- function(domains, alexaDomains) {
      tmp <- adist(domains, alexaDomains)
      min_idx <- apply(tmp, 1, which.min)
      n_rows = nrow(tmp)
      min_values <- lapply(1:n_rows, function(i)tmp[i,min_idx[i]])
      min_alexa_domains <- lapply(1:n_rows, function(i)alexaDomains[min_idx[i]])
      t_df <- data.frame(cbind(domains, unlist(min_alexa_domains), unlist(min_values)), stringsAsFactors = FALSE)
      colnames(t_df) <- c("domain","alexaDomian","dist")
      t_df$dist <- as.numeric(t_df$dist)
      return(t_df)
    }

    # main function
    domainDistanceAlexa <- function(T) {
      names(T) <- c("attrbuf")
      domains <- lapply(T$attrbuf, extractFQDN)
      df_domains <- data.frame(unlist(domains), stringsAsFactors = FALSE)
      colnames(df_domains)[1] <- "domains"
      list_domains <- uniqueDomains(df_domains)
      df_domains_dist <- minAdistfromAlexa(list_domains, alexa_uniqueDomains)
      df_domains_dist
    }

    # factory function
    domainDistanceAlexaFactory <- function()
    {
      list(
        name=domainDistanceAlexa,
        udxtype=c("transform"),
        intype=c("varchar(4096)"),
        outtype=c("varchar(4096)", "varchar(4096)", "int")
      )
    }

    Here is the entire code. This function computes the edit distance between a reference list of domains (stored in alexa.rda) and domain names extracted from one of the columns in Vertica (using the extractFQDN function).
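
A guarded variant of the distance step might look like the sketch below. The name `minAdistGuarded` and the sample domains are hypothetical, and whether the UDx framework in this Vertica version accepts an empty data frame as a partition result is an assumption that would need checking. The sketch returns early when there are no domains and uses matrix indexing with `seq_len()` instead of looping over `1:n_rows`.

```r
# Sketch of a guarded variant of minAdistfromAlexa (hypothetical name).
minAdistGuarded <- function(domains, alexaDomains) {
  # Return an empty, correctly shaped result when there is nothing to compare.
  if (length(domains) == 0L) {
    return(data.frame(domain = character(0), alexaDomian = character(0),
                      dist = numeric(0), stringsAsFactors = FALSE))
  }
  # Edit distance between every domain (rows) and every reference domain (columns).
  tmp <- adist(domains, alexaDomains)
  # Column index of the closest reference domain for each row.
  min_idx <- apply(tmp, 1, which.min)
  # A two-column (row, column) index matrix picks one cell per row.
  min_values <- tmp[cbind(seq_len(nrow(tmp)), min_idx)]
  data.frame(domain = domains, alexaDomian = alexaDomains[min_idx],
             dist = as.numeric(min_values), stringsAsFactors = FALSE)
}

# Made-up inputs for illustration only.
alexa <- c("google.com", "example.org")
res <- minAdistGuarded(c("gogle.com", "exampel.org"), alexa)
print(res)

# An empty input yields a zero-row data frame instead of an error.
empty_res <- minAdistGuarded(character(0), alexa)
print(nrow(empty_res))
```

The column names match those in the posted code, including the original spelling `alexaDomian`, so the output shape stays the same.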

  • Hi,
    Which versions of Vertica and R are you using? I faced a similar problem in the past using Vertica 6.1.x.

    Thanks.
  • Hi,

    I am using R 3.0.0 and Vertica 7.0.2.

    Thanks

  • Hi,

    OK, in our case it was identified as a bug, and it was fixed only in 7.1.1.3 (Vertica server SW) and 7.1.1-5 (RPM SW).

    It looks like you are facing the same bug. I recommend testing your code in 7.2.

    Thanks

  • Hi eli_revach,

    Thank you for your suggestion. I will upgrade to Vertica 7.2.

    What do you mean by 7.1.1-5 (RPM SW)? Is it the RPM for R?

    Thanks.
