How to extract elements in repeated field into separate records (rows)?

ar · February 2015

Hi all!

I have JSON flex table with nested records as array.

Like this:

{
   "ID":"12345",
   "Update":{
      "Time":"20150227",
      "Type":"input",
      "Record":[
         {
            "PartID":"4",
            "PartType":"0"
         },
         {
            "PartID":"7",
            "PartType":"1"
         }
      ]
   }
}

I need to transform it in table like this:

 ID  | PartID | PartType
-----+--------+---------
12345    4         0
12345    7         1

In Apache Drill it could easily be done through FLATTEN function.
See https://cwiki.apache.org/confluence/display/DRILL/FLATTEN+Function for details.

Is that true that Vertica does not have such a great possibility? Or this could be done some other way?

NorbertK · February 2015

I've just started to explore the Vertica JSON functionality, but you will most likely need a script to format it in your desired output. You can specify to flatten_arrays=true in the fjsonparser to flatten the submaps:

dbadmin=> CREATE FLEX TABLE ar();
CREATE TABLE
dbadmin=> COPY ar FROM '/tmp/ar.json' PARSER fjsonparser(flatten_arrays=true);
 Rows Loaded
-------------
           1
(1 row)

dbadmin=> SELECT compute_flextable_keys_and_build_view('ar');
                          compute_flextable_keys_and_build_view
------------------------------------------------------------------------------------------
 Please see public.ar_keys for updated keys
The view public.ar_view is ready for querying
(1 row)

dbadmin=> \x
Expanded display is on.
dbadmin=> SELECT * FROM ar_view;
-[ RECORD 1 ]------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id                       | 12345
update.time              | 20150227
update.type              | input
update.record            | \001\000\000\000\030\000\000\000\004\000\000\000\024\000\000\000\025\000\000\000\026\000\000\000\027\000\000\0004071\004\000\000\000\024\000\000\000\034\000\000\000&\000\000\000.\000\000\0000.PartID0.PartType1.PartID1.PartType
update.record.0.partid   |
update.record.0.parttype |
update.record.1.partid   |
update.record.1.parttype |
-[ RECORD 2 ]------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id                       | 12345
update.time              | 20150227
update.type              | input
update.record            |
update.record.0.partid   | 4
update.record.0.parttype | 0
update.record.1.partid   | 7
update.record.1.parttype | 1

The closest I could get is:

dbadmin=> SELECT id, "update.record.0.partid", "update.record.0.parttype", "update.record.1.partid", "update.record.1.parttype" FROM ar_view;
  id   | update.record.0.partid | update.record.0.parttype | update.record.1.partid | update.record.1.parttype
-------+------------------------+--------------------------+------------------------+--------------------------
 12345 |                        |                          |                        |
 12345 | 4                      | 0                        | 7                      | 1
(2 rows)

HTH

ar · February 2015

Hi, norbert-krupa.

Thanks for your reply.

Unfortunately that is definitely what we try to avoid. We need to make selects based on Record fields PartID and PartType. Actually to aggregate them and count - how many pairs of "PartID - PartType" there is in the table. And while Record array splits to separate columns this could not be achived..

Typically we have hundreds of 'Record' in 'Update' array. So, in that case Vertica should create hundreds columns and that is not so useful.

NorbertK · February 2015

I'm going through a similar exercise in which I'm trying to normalize many nested arrays. If I'm successful I'll make sure to share my findings.

We're Moving!

Create My New Community Account Now

How to extract elements in repeated field into separate records (rows)?

Comments

Leave a Comment