Does Vertica have an equivalent to MySQL's GROUP_CONCAT function?

Does Vertica have an equivalent to MySQL's GROUP_CONCAT() function? (for reference: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat )

Comments

  • https://github.com/vertica/Vertica-Extension-Packages/tree/master/strings_package is where you can install a UDF to add the analytic GROUP_CONCAT() function to Vertica
  • Hi Eugene ,

    Have look at Concatenate UDAF in vertica examples which comes with vertica installation
    that's the mysql equivalent. you can just directly install it.

    more  /opt/vertica/sdk/examples/AggregateFunctions/Concatenate.cpp

    -- Shell comppile
    cd /opt/vertica/sdk/examples/AggregateFunctions/
    g++ -D HAVE_LONG_INT_64  -I /opt/vertica/sdk/include -Wall -shared -Wno-unused-value \
          -fPIC -o Concatenate.so Concatenate.cpp /opt/vertica/sdk/include/Vertica.cpp

    -- Create LIBRARY
    CREATE LIBRARY AggregateFunctionsConcatenate AS '/opt/vertica/sdk/examples/AggregateFunctions/Concatenate.so';
    CREATE AGGREGATE FUNCTION agg_group_concat AS LANGUAGE 'C++' NAME 'ConcatenateFactory' LIBRARY AggregateFunctionsConcatenate;






  • Hi Ismail -- thanks for  a fabulously helpful post. 

    Just noticed the the UDAF deployed with Vertica doesn't apply delimiters whereas the MySQL version allows specifying a SEPARATOR parameter.

    For expedience, I crudely modified the Concatenate.cpp file to add a pipe delimiter, and added +1 to the length check.  So lines 68-72 look like this:
    if ((myConcat.length() + otherConcat.length()) +1 <= maxSize) {  
      string word = myConcat.str();
      if (word.length()>0 && otherConcat.str().length()>0) word.append("|");
      word.append(otherConcat.str());
       myConcat.copy(word);
    }
    This works for me now, but wish I knew how to parameterize the UDAF to specify a separator.


  • can any body help me how to do the installation for UDF's in vertica? Is there any documentation for this
  • Navin_CNavin_C Vertica Customer
    Hi Sushmit,

    I you just want to install them.
    Go through this link, its a part of documentation.

    CREATE FUNCTION.

    Majorly, the packages available come with a install.sql script which does the installation for you.

    Hope this helps
    NC
  • The instructions that Ismail posted were the exact recipe I needed right out of a standard install.  To be able to follow them, I also needed to install the g++ compiler
    $> yum install g++
  • PRanaPRana Employee
    Hi Pieter,

    You can add parameters to a UDAF by overriding the getParameterType method and the you can access the parameter via the ServerInterface object. Here is an example.
    virtual void getParameterType(ServerInterface &srvInterface,                               SizedColumnTypes &parameterTypes)     {
          parameterTypes.addVarchar(10, "delimiter");     } virtual void initAggregate(ServerInterface &srvInterface,                           IntermediateAggs &intermediates)     {        ParamReader paramReader = srvInterface.getParamReader();        std::string delimiter = srvInterface.getParamReader().getStringRef("delimiter").str();     }    HTH Pratibha
  • Navin_CNavin_C Vertica Customer
    yes, g++ packages is the basic package which you will need to run comile and build all Vertica UDF's
  • So does it mean  UDF's are accessible  to particular USER?
  • Navin_CNavin_C Vertica Customer
    Hi sushmit,

    UDF's once built and deployed into Vertica are accessible to all users in form of Functons(access can be restricted).

    Coming to building a UDF.
    If you want to build a new UDF from scratch , you will have to write a new .cpp file for it, and then install it as described in link above.

    But there are already many udf's built by Vertica team as well as the Developers around the globe. You can start with installing them.

    You can access all UDF here.

    Vertica Marketplace

    Just download them, checkout the readme files and install them.
    I believe all the packages on this link are very much self explanatory of what a function does along with examples.

    Please detail your question, If I am getting you wrong.

    Hope this helps.
    NC

  • Thanks Navin for your help . Yes this answers 
  • I am having an issue registering the Concatenate Aggregate function in Vertica on Linux.  The shared library compiles fine but when I run the 'CREATE LIBRARY' function it complains about not being able to find the lib: 

    => CREATE LIBRARY AggregateFunctionsConcatenate AS '/opt/vertica/sdk/examples/build/AggregateFunctions.so';

    ROLLBACK 3399:  Failure in UDx RPC call InvokeSetExecContext(): Error calling setupExecContext() in User Defined Object [] at [/scratch_a/release/vbuild/vertica/OSS/UDxFence/vertica-udx-C++.cpp:180], error code: 0, message: Error happened in dlopen(): [/opt/vertica/sdk/examples/build/AggregateFunctions.so: cannot open shared object file: No such file or directory]

    I have verified the vertica user the process is running as is able to read the file, but no dice registering the lib.  I tried adding the path to ld.so.conf, restarting vertica, etc.  No luck.

    The last think I can think of is that perhaps the user vertica is running as does not have permissions to copy the shared lib to some path in the vertica install ?  

    Quite stuck at the moment, any insight would be appreciated


  • Can you check the permissions on the file,

     ls -la /opt/vertica/sdk/examples/build/AggregateFunctions.so 

    and if it is limited in scope,  grant read permissions?  As a temporary diagnostic hack, assuming the set of other users on the machine is known and trusted, perhaps...?

    chmod a+r /opt/vertica/sdk/examples/build/AggregateFunctions.so
  • Thanks for the suggestions.  

    It is/was 755.  I changed it to 777 and owned by the verica process user.  I tried moving it around, no dice.  I also compiled a short class that did a dlopen() on the library and had no troubles running it from the command line as the vertica user.

    I am left thinking the problem is that vertica was installed as root, but later changed to run as an unprivileged user.  I am guessing that the resulting error is a misleading message and the file permission issue has something to do with writing the shared lib to a location on either the initiating server or one of the other nodes in the cluster.

    Again, any insight would be appreciated.

    Thx
  • Sorry for hijakacing the thread.

     

    I have tried to modify the Concatenate.cpp file as you mentioned. It compiled successfully but it is not changing the delimeter.Below is my code.

    Please help me on this. Also it there a way I can test (compile and run)the code in my local machine without intalling the fucntion so that I can check the functionally it is working or not.

    /* Copyright (c) 2005 - 2014, Hewlett-Packard Development Co., L.P. -*- C++ -*- */
    /*
    * Description: Example User Defined Aggregate Function: Concatenate strings
    *
    */

    #include "Vertica.h"
    #include <time.h>
    #include <sstream>
    #include <iostream>

    using namespace Vertica;
    using namespace std;


    /**
    * User Defined Aggregate Function concatenate that takes in strings and concatenates
    * them together. Right now, the max length of the resulting string is ten times the
    * maximum length of the input string.
    */
    class Concatenate : public AggregateFunction
    {

    virtual void initAggregate(ServerInterface &srvInterface, IntermediateAggs &aggs)
    {
    try {
    VString &concat = aggs.getStringRef(0);
    concat.copy("");
    } catch(exception& e) {
    // Standard exception. Quit.
    vt_report_error(0, "Exception while initializing intermediate aggregates: [%s]", e.what());
    }
    }

    void aggregate(ServerInterface &srvInterface,
    BlockReader &argReader,
    IntermediateAggs &aggs)
    {
    try {
    VString &concat = aggs.getStringRef(0);
    string word = concat.str();
    uint32 maxSize = aggs.getTypeMetaData().getColumnType(0).getStringLength();
    do {
    const VString &input = argReader.getStringRef(0);

    if (!input.isNull()) {
    if ((word.length() + input.length()) > maxSize) break;
    word.append(input.str());
    }
    } while (argReader.next());
    concat.copy(word);
    } catch(exception& e) {
    // Standard exception. Quit.
    vt_report_error(0, "Exception while processing aggregate: [%s]", e.what());
    }
    }

    virtual void combine(ServerInterface &srvInterface,
    IntermediateAggs &aggs,
    MultipleIntermediateAggs &aggsOther)
    {
    try {
    uint32 maxSize = aggs.getTypeMetaData().getColumnType(0).getStringLength();
    VString myConcat = aggs.getStringRef(0);

    do {
    const VString otherConcat = aggsOther.getStringRef(0);
    if ((myConcat.length() + otherConcat.length())+1 <= maxSize) {
    string word = myConcat.str();
    if (word.length()>0 && otherConcat.str().length()>0) word.append(",");
    word.append(otherConcat.str());
    myConcat.copy(word);
    }
    } while (aggsOther.next());
    } catch(exception& e) {
    // Standard exception. Quit.
    vt_report_error(0, "Exception while combining intermediate aggregates: [%s]", e.what());
    }
    }

    virtual void terminate(ServerInterface &srvInterface,
    BlockWriter &resWriter,
    IntermediateAggs &aggs)
    {
    try {
    const VString &concat = aggs.getStringRef(0);
    VString &result = resWriter.getStringRef();

    result.copy(&concat);
    } catch(exception& e) {
    // Standard exception. Quit.
    vt_report_error(0, "Exception while computing aggregate output: [%s]", e.what());
    }
    }

    InlineAggregate()
    };


    class ConcatenateFactory : public AggregateFunctionFactory
    {
    virtual void getIntermediateTypes(ServerInterface &srvInterface, const SizedColumnTypes &inputTypes, SizedColumnTypes &intermediateTypeMetaData)
    {
    int input_len = inputTypes.getColumnType(0).getStringLength();
    intermediateTypeMetaData.addVarchar(input_len*10);
    }

    virtual void getPrototype(ServerInterface &srvfloaterface, ColumnTypes &argTypes, ColumnTypes &returnType)
    {
    argTypes.addVarchar();
    returnType.addVarchar();
    }

    virtual void getReturnType(ServerInterface &srvfloaterface,
    const SizedColumnTypes &inputTypes,
    SizedColumnTypes &outputTypes)
    {
    int input_len = inputTypes.getColumnType(0).getStringLength();
    outputTypes.addVarchar(input_len*10);
    }

    virtual AggregateFunction *createAggregateFunction(ServerInterface &srvfloaterface)
    { return vt_createFuncObject<Concatenate>(srvfloaterface.allocator); }

    };

    RegisterFactory(ConcatenateFactory);

     

     


  • Pieter_Sheth-Vo wrote:

    Just noticed the the UDAF deployed with Vertica doesn't apply delimiters whereas the MySQL version allows specifying a SEPARATOR parameter.

    For expedience, I crudely modified the Concatenate.cpp file to add a pipe delimiter, and added +1 to the length check.  So lines 68-72 look like this:
    if ((myConcat.length() + otherConcat.length()) +1 <= maxSize) {  
      string word = myConcat.str();
      if (word.length()>0 && otherConcat.str().length()>0) word.append("|");
      word.append(otherConcat.str());
       myConcat.copy(word);
    }
    This works for me now, but wish I knew how to parameterize the UDAF to specify a separator.



     

    The simpler thing to do is just to use SQL to add the delimiters.  You won't need to use GROUP_CONCAT (which is an analytical function with certain UDx limitations) either. Just the simple concatenate function you mention. 

     

    SELECT name, rtrim(agg_concatenate(id || ', '),', ') AS ids
    FROM table
    GROUP BY name

     

    You can append the delimeter to each field then strip the last one off with the less-often-used rtrim parameter.

     

    For anyone trying to compile the function from examples, you can: 

     

    -- Shell commands
    cd /opt/vertica/sdk/examples/AggregateFunctions/
    g++ -D HAVE_LONG_INT_64 -I /opt/vertica/sdk/include -Wall -shared -Wno-unused-value -fPIC -o Concatenate.so Concatenate.cpp /opt/vertica/sdk/include/Vertica.cpp

    -- vsql commands
    CREATE LIBRARY AggregateFunctionsConcatenate AS '/opt/vertica/sdk/examples/AggregateFunctions/Concatenate.so';
    CREATE AGGREGATE FUNCTION agg_concatenate AS LANGUAGE 'C++' NAME 'ConcatenateFactory' LIBRARY AggregateFunctionsConcatenate;

     

  • It looks like the Concatenate.cpp example has disappeard in v9.1.1-0.
    Trying to compile it from archived versions works fine, but on debian 9.6, the CREATE LIBRARY fails:

    dbadmin=> CREATE LIBRARY AggregateFunctionsConcatenate AS '/opt/vertica/sdk/examples/AggregateFunctions/Concatenate.so';
    ROLLBACK 3399:  Failure in UDx RPC call InvokeGetLibraryManifest(): Exception calling getLibraryManifest() in User Defined Object [], message: basic_string::_S_create
    
  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited November 2018

    Hi,

    I grabbed the code from here:

    https://github.com/vertica/UDx-Examples/blob/master/Java-and-C++/AggregateFunctions/Concatenate.cpp

    I'm using CentOS, but I was able to compile it and create the Library/Function in Vertica 9.1.1-4:

    [dbadmin@s18384357 ~]$ g++ -D HAVE_LONG_INT_64  -I /opt/vertica/sdk/include -Wall -shared -Wno-unused-value -fPIC -o Concatenate.so Concatenate.cpp /opt/vertica/sdk/include/Vertica.cpp
    
    [dbadmin@s18384357 ~]$ ls Concatenate.so
    Concatenate.so
    
    [dbadmin@s18384357 ~]$ vsql
    Welcome to vsql, the Vertica Analytic Database interactive terminal.
    
    Type:  \h or \? for help with vsql commands
           \g or terminate with semicolon to execute query
           \q to quit
    
    dbadmin=> SELECT version();
                  version
    ------------------------------------
     Vertica Analytic Database v9.1.1-4
    (1 row)
    
    dbadmin=> CREATE LIBRARY AggregateFunctionsConcatenate AS '/home/dbadmin/Concatenate.so';
    CREATE LIBRARY
    
    dbadmin=> CREATE AGGREGATE FUNCTION agg_group_concat AS LANGUAGE 'C++' NAME 'ConcatenateFactory' LIBRARY AggregateFunctionsConcatenate;
    CREATE AGGREGATE FUNCTION
    
    dbadmin=> SELECT * FROM colors;
     color
    -------
     BLUE
     GREEN
     RED
    (3 rows)
    
    dbadmin=> SELECT rtrim(agg_group_concat(color || ','), ',') AS colors FROM colors;
          colors
    ------------------
     BLUE,GREEN,RED
    (1 row)
    

    Note that Vertica 9.1.1-4 now has a built-in function called LISTAGG that does the same thing...

    dbadmin=> SELECT listagg(color) AS colors FROM colors;
         colors
    ----------------
     BLUE,GREEN,RED
    (1 row)
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file