How do I PIVOT in Vertica
Hi folks,
I have spend several days on this, and my boss is getting upset. ![]()
I am trying to PIVOT the following table in Vertica. I found several options online, but they all require that the Pivoted column has to be an INTEGER. I tried the DECODE function, but I need to Pivot on a string and not an INT.
With the example below, the final result set should be... ID (not pivoted),DESCRIPTION (Columns across), EMPLOYED (will be the values).
Any help would be greatly appreciated.
/BEGIN/
DROP TABLE IF EXISTS Pivot_TEST
CREATE TABLE Pivot_TEST
(
ID INT,
DESCRIPTION VARCHAR(25),
EMPLOYED VARCHAR(5)
);
INSERT INTO Pivot_TEST VALUES (1, 'Fulltime','YES');
INSERT INTO Pivot_TEST VALUES (2, 'Fulltime','YES');
INSERT INTO Pivot_TEST VALUES (3, 'Contract','YES');
INSERT INTO Pivot_TEST VALUES (4, 'Contract','YES');
INSERT INTO Pivot_TEST VALUES (5, 'Seasonal','YES');
INSERT INTO Pivot_TEST VALUES (6, 'Seasonal','YES');
INSERT INTO Pivot_TEST VALUES (7, 'Terminated','NO');
INSERT INTO Pivot_TEST VALUES (8, 'Suspended','NO');
INSERT INTO Pivot_TEST VALUES (9, 'Fulltime','YES');
INSERT INTO Pivot_TEST VALUES (10,'Fulltime','YES');
SELECT * From Pivot_TEST;
/* END */
Comments
If you just want to create column values using the actual values in the DESCRIPTION column you can write the SQL with DECODE or CASE and have one column for each distinct value of DESCRIPTION. If ID is unique you don't need any aggregate functions. If ID is not unique you don't have to use SUM; you can use MIN or MAX or anything that works with a string. It doesn't really matter what aggregate is used, the purpose in this case is just to bring the like ID values into a single row.
Of course, if you have conflicting information in different rows (such as both 1|Fulltime|YES and 1|Fulltime|NO) then then whether MIN or MAX does matter but which one is really correct?
Let me know if this is what you're looking for:
SELECT ID,
CASE WHEN EMPLOYED = 'YES' THEN MIN(DECODE(DESCRIPTION, 'Fulltime', 'Yes', 'No')) ELSE 'No' END AS "Fulltime",
CASE WHEN EMPLOYED = 'YES' THEN MIN(DECODE(DESCRIPTION, 'Contract', 'Yes', 'No')) ELSE 'No' END AS "Contract",
CASE WHEN EMPLOYED = 'YES' THEN MIN(DECODE(DESCRIPTION, 'Seasonal', 'Yes', 'No')) ELSE 'No' END AS "Seasonal",
CASE WHEN EMPLOYED = 'YES' THEN MIN(DECODE(DESCRIPTION, 'Terminated', 'Yes', 'No')) ELSE 'No' END AS "Terminated",
CASE WHEN EMPLOYED = 'YES' THEN MIN(DECODE(DESCRIPTION, 'Suspended', 'Yes', 'No')) ELSE 'No' END AS "Suspended"
FROM Pivot_TEST GROUP BY ID, EMPLOYED order by ID;
Thank you very much for this, but I think I am over complicating this. Here is a smaller table that I just want to PIVOT.
create table test2
(
master_country varchar(25)
);
INSERT INTO test2 values ('French Polynesia');
INSERT INTO test2 values ('Colombia');
INSERT INTO test2 values ('Brazil');
INSERT INTO test2 values ('Mauritius');
INSERT INTO test2 values ('Faroe Islands');
select * from test2;
Similar to the suggestion by Shayon_Sanyal...
dbadmin=> select * from test2;
master_country
French Polynesia
Colombia
Brazil
Mauritius
Faroe Islands
(5 rows)
dbadmin=> select
dbadmin-> max(decode(rn, 1, master_country, null)) || ', ' ||
dbadmin-> max(decode(rn, 2, master_country, null)) || ', ' ||
dbadmin-> max(decode(rn, 3, master_country, null)) || ', ' ||
dbadmin-> max(decode(rn, 4, master_country, null)) || ', ' ||
dbadmin-> max(decode(rn, 5, master_country, null)) master_country
dbadmin-> from (select master_country, row_number() over () rn from test2) foo;
master_country
French Polynesia, Colombia, Brazil, Mauritius, Faroe Islands
(1 row)
Nice! What do you do, when the number of rows to pivot is not known? Is there something generic?
We will soon publish a few use cases for Stored Procedures, which will soon be part of a Vertica 11 subversion.
I actually have a pivot function that would do that for you in that collection.
But it's early days just now, and won't work in the current version.
The principle is this:
With this table
sensors:You need this query:
You find that out by previously firing a:
.. and then generating as many
MAX(CASE msr WHEN '???' THEN val END) AS val_???, as you get distinct values - and incorporate that into the final query.I have developed the sample UDx to transform the long-form data to the wide-form data. Please try to use it if you can.
https://github.com/h-serizawa/vertica-extension-functions/tree/master/long2wide
Now that Vertica 11.0.1 is out, and with that, support for forward cursors in Stored procedures - with the same table
sensorsas above, try this:Create the test table:
DROP TABLE IF EXISTS sensors; CREATE TABLE sensors (id,msr,val) AS SELECT 41,'rpm' ,1000 UNION ALL SELECT 41,'tmp' , 25 UNION ALL SELECT 41,'ps_i', 26.32 UNION ALL SELECT 42,'rpm' ,1200 UNION ALL SELECT 42,'tmp' , 26 UNION ALL SELECT 42,'ps_i', 21.32 ;Then, with a stored procedure, you can pivot the table. Into a target (temporary) table that you can then select from:
CALL public.pivot( 'id' -- list of identifers -> GROUP BY , 'msr' -- key column , 'val' -- value column , 'sensors' -- in table name , 'horizontal'-- out, pivoted, temp table name ); -- out NOTICE 2005: -- out *** Running this Query *** -- out CREATE LOCAL TEMPORARY TABLE horizontal ON COMMIT PRESERVE ROWS AS -- out SELECT -- out id -- out , MAX(CASE msr WHEN 'ps_i' THEN val END) AS val_ps_i -- out , MAX(CASE msr WHEN 'rpm' THEN val END) AS val_rpm -- out , MAX(CASE msr WHEN 'tmp' THEN val END) AS val_tmp -- out FROM sensors -- out GROUP BY -- out id -- out pivot -- out ------- -- out 0 SELECT * FROM horizontal; -- out id | val_ps_i | val_rpm | val_tmp -- out ----+----------+---------+--------- -- out 41 | 26.32 | 1000.00 | 25.00 -- out 42 | 21.32 | 1200.00 | 26.00The stored procedure is in prototype mode, and actually outputs the query it has generated, and it is going to fire.
And just for playing purposes: Here's the full source code for the stored procedure:
DROP PROCEDURE IF EXISTS public.pivot( idlist VARCHAR , keyname VARCHAR , valname VARCHAR ,tbname VARCHAR ,pvtbname VARCHAR ) ; CREATE PROCEDURE public.pivot ( IN idlist VARCHAR -- list of identifers -> GROUP BY , IN keyname VARCHAR -- key column , IN valname VARCHAR -- value column , IN tbname VARCHAR -- in table name , IN pvtbname VARCHAR -- out, pivoted, temp table name ) LANGUAGE PLpgSQL SECURITY INVOKER AS $$ DECLARE i INTEGER := 0; sq VARCHAR; c VARCHAR; r VARCHAR; BEGIN i := 1; WHILE SPLIT_PART(idlist,',',i) <> '' LOOP IF i = 1 THEN sq:= CHR(10)||'CREATE LOCAL TEMPORARY TABLE '||pvtbname||' ON COMMIT PRESERVE ROWS AS'|| CHR(10)||'SELECT'||CHR(10)||' '|| SPLIT_PART(idlist,',',i); ELSE sq:= sq||CHR(10)||', '|| SPLIT_PART(idlist,',',i); END IF; i := i + 1; END LOOP ; FOR c IN EXECUTE 'SELECT DISTINCT '||keyname||' FROM '||tbname||' ORDER BY 1' LOOP sq := sq||CHR(10)||', MAX(CASE '||keyname||' WHEN '||QUOTE_LITERAL(c)||' THEN '||valname||' END) AS '||valname||'_'||c; END LOOP; i := 1; sq := sq ||CHR(10)||'FROM '||tbname; LOOP EXIT WHEN SPLIT_PART(idlist,',',i) = ''; IF i = 1 THEN sq:= sq||CHR(10)||'GROUP BY'||CHR(10)||' '|| SPLIT_PART(idlist,',',i); ELSE sq:= sq||CHR(10)||', '|| SPLIT_PART(idlist,',',i); END IF; i := i + 1; END LOOP ; -- vv test output ... RAISE NOTICE ' *** Running this Query *** %',sq; -- ^^ test output ... EXECUTE sq; END; $$;Happy playing ....