
Are there sql commands to pivot data? — Vertica Forum


Comments

  • I would prefer an implementation like PostgreSQL's tablefunc module, if possible:
    http://www.postgresql.org/docs/9.1/static/tablefunc.html



  • Hi!

    You could open a feature request: post it under the IDEAS topic.

    Regards
  • Hm... I took a look at tablefunc; it's not that hard to implement with a UDF. Just for fun I implemented the simplest function, normal_rand(numrows, mean, stddev).

    Example
    dev=> select * from user_functions where function_name ilike '%normal%';
    -[ RECORD 1 ]----------+-----------------------------------------------------
    schema_name | public
    function_name | normal_rand
    procedure_type | User Defined Transform
    function_return_type | Float
    function_argument_type |
    function_definition | Class 'NormRandFactory' in Library 'public.Gaussian'
    volatility |
    is_strict | f
    is_fenced | t
    comment |


    dev=> select normal_rand(USING PARAMETERS rows=3, mean=3.3, stddev=6.3) over (PARTITION AUTO);
    RAND_VALUE
    -------------------
    2.53161555990794
    -3.54695367884655
    7.6110266459183
    (3 rows)
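    The UDTF source is on pastebin (linked later in the thread). As a rough sketch of what its core has to compute, the C++11 standard library already provides a Gaussian generator. The helper name `normal_rand_values` and the explicit `seed` parameter are assumptions for illustration; the Vertica SDK plumbing (factory, parameter parsing, writing rows out) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not the thread's actual code): produce `rows` samples
// from a normal distribution with the given mean and standard deviation,
// i.e. the values a normal_rand(rows, mean, stddev) call is expected to emit.
std::vector<double> normal_rand_values(std::size_t rows, double mean,
                                       double stddev, unsigned seed) {
    // std::normal_distribution requires a strictly positive stddev.
    assert(stddev > 0.0);
    std::mt19937_64 engine(seed);  // fixed seed -> reproducible output
    std::normal_distribution<double> dist(mean, stddev);
    std::vector<double> out;
    out.reserve(rows);
    for (std::size_t i = 0; i < rows; ++i) {
        out.push_back(dist(engine));
    }
    return out;
}
```

    In a real UDTF the seed would more likely come from the clock or a parameter, and each parallel instance would need a different seed so the partitions don't emit identical sequences.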

    Benchmark:
    dev=> \! lscpu | grep -P '(^CPU.s.)|(MHz)'
    CPU(s): 8
    CPU MHz: 1200.000
    dev=> \o /dev/null
    dev=> \timing
    Timing is on.


    1M rows:
    dev=> select normal_rand(USING PARAMETERS rows=1000000, mean=3.3, stddev=6.3) over (PARTITION AUTO);
    Time: First fetch (1000 rows): 72.207 ms. All rows formatted: 1405.582 ms
    2.5M rows:
    dev=> select normal_rand(USING PARAMETERS rows=2500000, mean=3.3, stddev=6.3) over (PARTITION AUTO);
    Time: First fetch (1000 rows): 68.825 ms. All rows formatted: 3521.669 ms
    5M rows:
    dev=> select normal_rand(USING PARAMETERS rows=5000000, mean=3.3, stddev=6.3) over (PARTITION AUTO);
    Time: First fetch (1000 rows): 72.705 ms. All rows formatted: 7135.159 ms
    Nice linear scalability:
    +---------+----------+
    |    ROWS | TIME (s) |
    +---------+----------+
    | 1000000 |      1.4 |
    | 2500000 |      3.5 |
    | 5000000 |      7.1 |
    +---------+----------+
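    One quick way to check the "linear" claim is to divide each row count by its measured "All rows formatted" time: if the rows-per-second figure stays roughly constant, the cost per row is constant. A small sketch using the three timings above (the `Run` struct is just a container for those numbers):

```cpp
#include <cassert>
#include <cstdio>

// One benchmark run: row count and "All rows formatted" time in ms.
struct Run { double rows; double ms; };

// Rows per second for a single run.
double rows_per_second(const Run& r) {
    assert(r.ms > 0.0);
    return r.rows / (r.ms / 1000.0);
}

// Print throughput for the three runs measured above;
// roughly equal numbers indicate linear scaling.
void print_throughput() {
    const Run runs[] = {{1000000.0, 1405.582},
                        {2500000.0, 3521.669},
                        {5000000.0, 7135.159}};
    for (const Run& r : runs) {
        std::printf("%9.0f rows: %.0f rows/s\n", r.rows, rows_per_second(r));
    }
}
```

    With these numbers the runs come out at roughly 711,000, 710,000 and 701,000 rows per second, so the per-row cost is essentially flat.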
    I can share the code if you are interested in this function. Maybe (also just for fun) I will implement other functions too. Can you tell me which function is most important to you? I will start with that one.

    best
  • Great!!

    crosstab!

    See you soon :-)

  • Hi!


    I will try, but the syntax will differ from the PG implementation.
    I will update you either way, success or failure.


    Regards.
  • I have to say, it would be important for people like me, who use the database as a data source for simulations and statistical machine-learning models, to have a function that lays out calculated metrics horizontally. Thank you, I appreciate your help, and I hope HP Vertica notices your work and adopts it, or helps you develop it, soon.
  • Hi!

    I failed. Let me explain where.

    Implementing PIVOT requires:
    • the table data
    • the cardinality of the pivoted column (it is dynamic and can change, and that is the problem)
    The only ways I can implement it are via an external procedure (EP) or a UDF.
    Unfortunately, a UDF is out of the question:
    • Vertica will try to parallelize the UDF
    • I need to know the cardinality of the pivoted data to define the pivoted table's columns. Of course, I could use ODBC/JDBC and create a Flex table so that the columns are dynamic, but Vertica will parallelize it and you will get garbage.
    How do I see it with an EP?
    vsql=> select pivot(src_table=<Table>, dest_table=<Table>, pivot_column=<column>);
    where
    • src_table - the original data to pivot
    • pivot_column - the column to pivot
    • dest_table - the table the EP will insert the results into
    Or suggest your own syntax (but keep in mind that I must know how many columns the new table will have, so I have to query the pivoted column for its cardinality).


    @massimo
    Would you accept a solution with an EP?

    PS
    The main problem is the cardinality of the pivoted column. I can limit query execution to a single node, but I can't create a table with dynamic columns.
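    To make the cardinality problem concrete: a pivot(src_table, dest_table, pivot_column) procedure would first have to scan the pivot column for its distinct values, and only then could it write the CREATE TABLE for dest_table. The sketch below shows those two steps on in-memory rows. The row layout (key, pivot value, measure), the helper names and the INT/FLOAT column types are assumptions for illustration, not the EP's real code; pivot values are quoted as SQL identifiers with embedded double quotes doubled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical source row: (row key, value of the pivot column, measure).
struct SrcRow { int key; std::string pivot; double value; };

// Step 1: the distinct pivot values. Their count is the cardinality that
// decides how many columns dest_table needs; it changes when the data does.
std::set<std::string> distinct_pivot_values(const std::vector<SrcRow>& rows) {
    std::set<std::string> values;
    for (const SrcRow& r : rows) values.insert(r.pivot);
    return values;
}

// Quote a value as a SQL identifier: a"b becomes "a""b".
std::string quote_ident(const std::string& s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '"') out += '"';  // double any embedded quote
        out += c;
    }
    return out + "\"";
}

// Step 2: only now can the destination table's DDL be generated.
std::string build_pivot_ddl(const std::string& dest_table,
                            const std::vector<SrcRow>& rows) {
    assert(!dest_table.empty());
    std::string ddl = "CREATE TABLE " + dest_table + " (row_key INT";
    for (const std::string& v : distinct_pivot_values(rows)) {
        ddl += ", " + quote_ident(v) + " FLOAT";
    }
    return ddl + ");";
}
```

    Because step 2 depends on the result of step 1, the column list cannot be fixed until the data has been read, which is exactly why the output shape of a parallel UDF can't be declared up front.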
  • Gaussian Distribution source code:
    http://pastebin.com/gDPcQTE5

    Compile:
    g++ -std=c++11 -D HAVE_LONG_INT_64  -I /opt/vertica/sdk/include -Wall -shared -Wno-unused-value -fPIC -o Gaussian.so NormalDistribution.cpp /opt/vertica/sdk/include/Vertica.cpp
    Deploy:
    CREATE LIBRARY Gaussian AS '/tmp/Gaussian.so';
    CREATE TRANSFORM FUNCTION normal_rand AS LANGUAGE 'C++' NAME 'NormalDistributionFactory' LIBRARY Gaussian;
    Regards.
  • Hello, genius! The name "id10t" doesn't seem to suit you.
    Do you think this could be a possible solution (from Oracle 11g)?

    select * from (select deptno, job, sal from emp) e
    pivot (sum(sal) for job in ('CLERK', 'SALESMAN', 'MANAGER', 'ANALYST', 'PRESIDENT'))
    order by deptno;

        DEPTNO    'CLERK' 'SALESMAN'  'MANAGER'  'ANALYST' 'PRESIDENT'
    ---------- ---------- ---------- ---------- ---------- -----------
            10       1300                  2450                   5000
            20       1900                  2975       6000
            30        950       5600       2850
    http://www.oracle.com/technetwork/issue-archive/2008/08-mar/o28asktom-087592.html


    Ciao
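    The Oracle form sidesteps the cardinality problem because the IN list fixes the output columns up front. The sketch below mimics `pivot (sum(sal) for job in (...))` on in-memory rows: jobs outside the list are ignored, every deptno still gets a row, and missing combinations stay empty (NULL in SQL). In Vertica SQL the same fixed-column pivot can typically be written with one `SUM(DECODE(job, 'CLERK', sal))` expression per listed value plus `GROUP BY deptno`. `EmpRow`, `Pivot` and `pivot_sum` are illustrative names, not part of any API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One row of the emp example: (deptno, job, sal).
struct EmpRow { int deptno; std::string job; double sal; };

// deptno -> (job -> SUM(sal)); a job with no rows for a deptno is absent,
// which corresponds to a NULL cell in the SQL result.
using Pivot = std::map<int, std::map<std::string, double>>;

// Fixed-column pivot: only jobs named in `in_list` become columns,
// mirroring PIVOT (SUM(sal) FOR job IN (...)).
Pivot pivot_sum(const std::vector<EmpRow>& rows,
                const std::vector<std::string>& in_list) {
    assert(!in_list.empty());  // the IN list defines the columns
    const std::set<std::string> wanted(in_list.begin(), in_list.end());
    Pivot out;
    for (const EmpRow& r : rows) {
        std::map<std::string, double>& cells = out[r.deptno];  // every deptno gets a row
        if (wanted.count(r.job) == 0) continue;                // not a requested column
        cells[r.job] += r.sal;                                 // new cells start at 0.0
    }
    return out;
}
```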

  • Hi!

    Q: How can you query the database from a UDF for the pivot?
    A: Only via an ODBC or JDBC connection, so I have to open a connection to fetch the data for the pivot. Vertica hasn't released a native connector, and a UDF can't fetch data from the database directly.


    Q: Is that a problem?
    A: Yes. Vertica parallelizes the UDF, so many connections are opened, and Vertica raises an exception because each connection tries to create the pivot table and insert the pivot data into it. The first thread succeeds, but the others throw an exception. I can't use "OVER (PARTITION AUTO)", because I need "OVER ()" to define a pivot column.


    Q: Can the Oracle approach be implemented with a UDF?
    A: Hm... interesting. I will try. It's more suitable, since I don't need to know the cardinality: it is defined in the query. Nice.


    Q: But you still need to fetch data from the database. How will you limit the query to a single execution?
    A:
    • I can insert the data into a temporary table (but I'm not sure that will solve the "many connections" problem)
    • I can use "CREATE TABLE IF NOT EXISTS", so a second thread will only throw a warning.
    • If these methods fail, I have one last option for limiting the query to single-threaded execution: I can specify which projections to use.
    Example:
    SELECT set_optimizer_directives('AvoidUsingProjections=prj_sup,prj_rep');
    I have no other options. Maybe someone can suggest something?
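    Why "CREATE TABLE IF NOT EXISTS" helps can be shown with a toy model: each parallel UDF instance issues the same DDL, so with a plain CREATE every instance after the first fails, while the IF NOT EXISTS form turns the repeats into no-ops. `ToyCatalog` is a hypothetical stand-in for the database catalog and `pivot_result` a made-up table name; the instances run one after another here rather than concurrently, so this illustrates the semantics only, not Vertica's locking. It addresses the exceptions only; whether the instances then insert duplicate data depends on what each one reads.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the database catalog (not a Vertica API).
class ToyCatalog {
public:
    // Plain CREATE TABLE: fails if the table already exists.
    void create_table(const std::string& name) {
        if (!tables_.insert(name).second)
            throw std::runtime_error("table " + name + " already exists");
    }
    // CREATE TABLE IF NOT EXISTS: true only for the call that created it.
    bool create_table_if_not_exists(const std::string& name) {
        return tables_.insert(name).second;
    }
private:
    std::set<std::string> tables_;
};

// Simulate `instances` UDF instances issuing a plain CREATE;
// returns how many of them hit an error.
int failures_with_plain_create(int instances) {
    assert(instances >= 0);
    ToyCatalog catalog;
    int failures = 0;
    for (int i = 0; i < instances; ++i) {
        try {
            catalog.create_table("pivot_result");
        } catch (const std::runtime_error&) {
            ++failures;
        }
    }
    return failures;
}

// Same simulation with IF NOT EXISTS; returns how many instances created it.
int creators_with_if_not_exists(int instances) {
    assert(instances >= 0);
    ToyCatalog catalog;
    int created = 0;
    for (int i = 0; i < instances; ++i) {
        if (catalog.create_table_if_not_exists("pivot_result")) ++created;
    }
    return created;
}
```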


    Q: And what if a projection is segmented?
    A: So far I can't answer that; I need to investigate it, but I'm afraid that without help from Vertica Support I can't implement it with a UDF if the table is segmented.

    Maybe Vertica Support can provide a hint that limits a query to a single thread?

    PS
    Hm... and what about CPU affinity? I will try that. It looks like "CREATE TABLE IF NOT EXISTS" solves the problem with the exceptions.
    I will post an update.


    Regards.
  • Hi Massimo.


    Yes, it is possible with a multi-phase UDF transform function, but when I compared its performance with the DECODE function, the performance was beneath criticism. So far my code makes some assumptions (read: bugs), because I simplified the problem; I just wanted to find out whether it could be done at all.

    ***************************

    When I fix the bugs I will publish it. Feel free to remind me.

    ***************************

    But I recommend comparing the performance of a UDF against a built-in function. For example, the Vertica SDK ships an example UDF aggregate that computes an average (AVG):
    /opt/vertica/sdk/examples/AggregateFunctions/Average.cpp
    and keep in mind that a multi-phase transform will be at least 100x slower than doing it with DECODE.


    Regards.
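    For readers wondering what the "multi-phase" part buys: the work splits into a first phase that pre-aggregates each partition independently and a second phase that merges those partial results, similar in spirit to the combine step of a UDx aggregate. The sketch below shows that split for a SUM pivot on in-memory data. `Row`, `Partial`, `phase1` and `phase2` are illustrative names, not Vertica SDK classes, and the sketch says nothing about the performance figures above, which come from the poster's own comparison.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input row: (row key, pivot value, measure).
struct Row { int key; std::string pivot; double value; };

// (row key, pivot value) -> partial SUM; one of these per partition.
using Partial = std::map<std::pair<int, std::string>, double>;

// Phase 1: runs independently on each partition (e.g. per node),
// pre-aggregating only its own rows.
Partial phase1(const std::vector<Row>& partition) {
    Partial p;
    for (const Row& r : partition) {
        p[std::make_pair(r.key, r.pivot)] += r.value;
    }
    return p;
}

// Phase 2: merges the partial sums from every partition into final cells.
Partial phase2(const std::vector<Partial>& partials) {
    Partial total;
    for (const Partial& p : partials) {
        for (const auto& cell : p) total[cell.first] += cell.second;
    }
    return total;
}
```

    The merge step is where the extra cost comes from: partial results have to be materialized and shipped to the second phase, whereas a `SUM(DECODE(...))` query lets the built-in aggregation do the same work natively.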
