Returning a list in R UDF

vioravis · September 2013

I am working on a R UDF that requires returning a list of two data frames: I have given an example of what I am looking for below:

    df      <- data.frame(x1 = runif(1000),                          x2 = runif(1000),                          x3 = runif(1000),                          x4 = runif(1000))    returnList <- function(dataFrame)  {    x1 <- dataFrame[,1]    x2 <- dataFrame[,2]    x3 <- dataFrame[,3]    x4 <- dataFrame[,4]      dataFrame1 = data.frame(newX1 = x1[1:200],newX2 = x2[1:200])    dataFrame2 = data.frame(newX3 = x3[1:500],newX4 = x4[1:500])      return(list(dataFrame1,dataFrame2))    }    returned <- returnList(df)

Is there a way to accomplish this using Vertica R UDF i.e. return a list instead of a dataframe? Thank you. Ravi

Said_Montiel · February 2014

From my own experience you can't, but you can try to save your dataframes to a file and then read that file from the interface you need.

Or if it is possible to combine the data frames in a single one do so. And separate them manually on another module.

Daniel_Leybovic · February 2014

Hi!

Yes, its possible, use in transform function (transform function can be used as any function type - scalar, aggregative, analytic)
If you are looking to return list as columns so its also possible, but you need to define all columns in OUTTYPE.
FYI: table can has max 1600 columns.

Simple example:

valuesList <- function(x)
{
    result <- runif(5)
    result
}

valuesListFactory <- function()
{
    list(
            name=valuesList,
            udxtype=c("transform"),
            intype=c("int"),
            outtype=c("float")
        )
}

Usage:

daniel=> create library rList AS '/tmp/list.R' LANGUAGE 'R';
CREATE LIBRARY

daniel=> create transform function valuesList as LANGUAGE 'R' name 'valuesListFactory' library rList;
CREATE TRANSFORM FUNCTION

daniel=> select valuesList(5) over () from dual;
       col0        
-------------------
 0.707866848446429
 0.363775422330946
 0.267165389610454
  0.65500909416005
 0.883465250022709
(5 rows)

daniel=>

Daniel_Leybovic · February 2014

Or:

valuesList <- function(x)
{
    result <- iris[1:2,1:3]
    result
}

valuesListFactory <- function()
{
    list(
            name=valuesList,
            udxtype=c("transform"),
            intype=c("int"),
            outtype=c("float", "float", "float")
        )
}

daniel=> drop library rlist cascade;
DROP LIBRARY

daniel=> create library rList AS '/tmp/list.R' LANGUAGE 'R';
CREATE LIBRARY

daniel=> create transform function valuesList as LANGUAGE 'R' name 'valuesListFactory' library rList;
CREATE TRANSFORM FUNCTION

daniel=> select valuesList(5) over () from dual;
 col0 | col1 | col2 
------+------+------
  5.1 |  3.5 |  1.4
  4.9 |    3 |  1.4

Said_Montiel · February 2014

He meant an R list object as in the example he proposes he is sending two non-related data frames. That is not possible.

In your answer you're returning data.frames not R lists. A list can contain R objects, other lists or data frames without constraints.

Daniel_Leybovic · February 2014

and what is a list? sequence of values
So what the difference between list is "column of values" and list is "row of values"?
Do you need a row or values?

Said_Montiel · February 2014

a list is an array of objects which don't have to share characteristics. They may contain an R model object, a vector, a data frame, a matrix, a dist object, a sigle value or even sublists of different types and sizes.

In R you can return
list(obj1 = 1,
obj2 = data.frame,
obj3 = list(obj31 = single value,
objj32 = matrix)
obj4 = R's linear regression object
)

UDFs return a dataframe only because in the end they have to go through vertica.

A sequence of values, as you write it is not neccesarily a list in R, but a simple vector. A group of vector (of the same dimensions) can be coerced to a matrix or a data frame (With certain restrictions due to object type). A list has basically no restrictions.

Hope that helped

Said_Montiel · February 2014

Besides... notice how in the example given by vioravis. He has 1 dataframe with 2 columns and 200 rows, and the second data frame has 2 columns and 500 rows. The columns are not to be combined because they refer to different data.

Daniel_Leybovic · February 2014

>> Hope that helped

No. It doesn't helps at all. It you do not understands Vertica limitations.

1. Do you understands that RDBMS its strong typed and statically typed "query language"?
2. Also Vertica has limitation of 1600 columns per table, while list can be much more larger.
3. Also each field is basic built in Vertica data type, while object of list can be complicated, Vertica has no arrays like Postgres.

If you understands all this limitation so why you ask this question?

Said_Montiel · February 2014

I didn't asked the question originally. Check the names, I answered he can't create a list as he would expect within R. That's all. You misunderstood everything from the start.

[Deleted User] · February 2014

Hi all,

It is generally useful if you are able to transform your data into a form that Vertica understands. Vertica provides a pretty flexible set of semantics for dealing with data in all different forms. You can return multiple columns; you can effectively omit some columns in some cases (Vertica is very efficient with long strings of NULLs); etc.

One approach, with its own pro's an con's, to the original question might be to have output columns for each data frame, plus an additional output column saying which frame the current row corresponds to. Then write out frame 1 (and leave frame 2's columns as null); then vice versa. There are various other similar workarounds. Not ideal, granted; but it gives you the ability to write the data out in a form that's understood (and can be manipulated) by Vertica.

If you have an R structure that fundamentally doesn't map to something that can be represented in SQL, the only real workaround is to create a VARBINARY (or, in Vertica 7, LONG VARBINARY) column and use an R serialization library to turn your data structure into a byte-string that can be saved into the VARBINARY column. You can then read it out and deserialize it later with other R code, and get the original object back. This tends to be quite slow; it's also limited in the size of the objects you can write out per row. And you can't manipulate the resulting structure in SQL; not generally in interesting ways at least. But it lets you deal with arbitrary R types.

A possible alternative to all of this is to use our newly-released distributedR functionality. It hides a lot of this sort of stuff, and makes it possible to write distributed programs using various R data structures all in native R.

Adam

vioravis · February 2014

Hi Adam,

Could you please provide some information on the distributedR functionality? Is it available now in Vertica 7 (community edition)? Is it included in the R language pack? Is there any documentation that shows how to use the distributed feature in R? I don't see any mention of it on Vertica's website. Thank you.

On the original question that I asked, I found out a work around and worked with data frames instead

Thanks for your inputs.

[Deleted User] · February 2014

Hi Vioravis,

Sure -- take a look at:

https://vertica.hpwsportal.com/#/Product/{%22productId%22:%22420%22}/Show

DistributedR is a free Vertica add-on. I believe that it will work with the Vertica 7 CE. It is a separate download/installation. There's a YouTube video at that URL that describes some basic usage. If you download the package, it contains a README and an installation guide.

The URL above also contains a link to a forum topic about the feature. Feel free to post any questions or comments there. It's a brand-new feature; we're definitely interested in feedback.

Adam

Said_Montiel · October 2014

Just a suggestion I've using for the past few months but hadn't update this thread.

Depending on your requirements you could use a JSON string as output in a Varchar Column

I've been using either jsonlite or RJSON for object conversion (back and forth).

For this to work you need to do the following

1) Convert R object to JSON with toJSON()
2) Convert JSON Object to character with as.character()
3) Output as a Character Column in a data.frame

Wherever the output is used you can either reconvert into an R object or parse it as JSON.

Hope this helps you (at least in the future)

We're Moving!

Create My New Community Account Now

Returning a list in R UDF

Comments

Leave a Comment