Generating Row and Column Names when using R UDF
I am reposting the query I that started on vertica-forums.com (http://vertica-forums.com/viewtopic.php?f=62&t=1158) Hi, I am trying to use the R UDF to create a cross tab and have given the R function and the Vertica commands below. generateCrosstab <- function(x) { var1 = x[,1] var2 = x[,2] crossTab <- as.data.frame.matrix(table(var1,var2)) crossTab } crossTab_factory <- function() { inlist <- rep("int", 2) outlist <- c("int", "int") list(name=generateCrosstab, udxtype=c("transform"), intype=inlist, outtype=outlist) } DROP LIBRARY crossTab CASCADE; CREATE LIBRARY crossTab AS '/home/dbadmin/test/crossTab.r' LANGUAGE 'R'; CREATE TRANSFORM FUNCTION generateCrosstab AS NAME 'crossTab_factory' LIBRARY crossTab ; SELECT generateCrosstab(column1, column2) OVER() FROM testdb.table; This query generates the following output: col0 | col1 --------+-------- 466881 | 655735 21047 | 28336 (2 rows) Is there any way to generate the row names and the column names? I have given an example of what I am looking for: setosa versicolor virginica setosa 50 0 0 versicolor 0 50 0 virginica 0 0 50 Would appreciate any help on this? Thank you. Ravi
0
Comments
generateCrosstab <- function(x) { var1 = x[,1] var2 = x[,2] crossTab <- as.data.frame.matrix(table(var1,var2)) crossTab$rowname <- rownames(crossTab) save(file=modelfile, crossTab) crossTab } crossTab_factory <- function() { inlist <- rep("int", 2) outlist <- c("int","int","varchar") list(name=generateCrosstab, udxtype=c("transform"), intype=inlist, outtype=outlist) }However, when I try to add the column name (as a row), I am getting an error: I have given the code used below:generateCrosstab <- function(x) { var1 = x[,1] var2 = x[,2] crossTab <- as.data.frame.matrix(table(var1,var2)) crossTab$rowname <- rownames(crossTab) crossTab <- rbind(crossTab,colnames(crossTab)) save(file=modelfile, crossTab) crossTab } crossTab_factory <- function() { inlist <- rep("int", 2) outlist <- c("varchar","varchar","varchar") list(name=generateCrosstab, udxtype=c("transform"), intype=inlist, outtype=outlist) }I get the following error ERROR 3399: Failure in UDx RPC call InvokeProcessPartition(): Error calling processPartition() in User Defined Object [generateCrosstab] at [RInterface.cpp:1006], error code: 0, message: Exception in processPartitionForR: not compatible with REALSXP I have changed all the column types to varchar since the column names are character (though the other values in the cross tab are numeric this should return them as characters as well). This works fine in R though. Could you please provide further guidance? Thank you. RavicrossTab_factory <- function() { inlist <- rep("int", 2) outlist <- c("int","int","varchar") list(name=generateCrosstab, udxtype=c("transform"), intype=inlist, outtype=outlist,outnames = c("0","1","VarName")) }However, this option seems very rigid since we have to define the column variable levels upfront. Is there any way to automate this? Thank you. Ravi