Options

Import (complex) JSON to Vertica 7

Hi,

I have an issue that confuses me:

Your tutorial (Flex Tables Guide, 12/18/2013) explains how to load JSON files into Vertica 7. Your example (p. 7f., mountains.json) works fine.

Some days ago, I also tried to load the following example (from Standford Online) into Vertica. This didn't work (validated with www.jsonlint.com). This example is a bit more complex (e.g. "authors" array with 2/3 authors and an additional key "Remark") but my understanding of semi-structured data includes such complexicity.
{ "Books":[
   { "ISBN":"ISBN-0-13-713526-2", "Price":85, "Edition":3, "Title":"A First Course in Database Systems", "Authors":[
{"First_Name":"Jeffrey", "Last_Name":"Ullman"}, {"First_Name":"Jennifer", "Last_Name":"Widom"} ] },
    { "ISBN":"ISBN-0-13-815504-6", "Price":100, "Remark":"Buy this book bundled with 'A First Course' - a great deal!", "Title":"Database Systems:The Complete Book", "Authors":[
{"First_Name":"Hector", "Last_Name":"Garcia-Molina"}, {"First_Name":"Jeffrey", "Last_Name":"Ullman"}, {"First_Name":"Jennifer", "Last_Name":"Widom"} ] }
}]}
Today, I tried to import simplier JSON file. Unfortunately, the import wasn't successful neither. Although there was no error, "0 rows" were imported. I retried and rechecked it twice but the result was still the same. The JSON file contains also hierarchy which might be the problem?! This is a simplified extract of the GPS tracking file file (also validated):
{"track": [
  {"lon": "8.5987119674682617", "lat": "50.2353782653808594", "ele": "246.0", "time": "2014-01-17T15:57:28.000Z"},
  {"lon": "8.5986804962158203", "lat": "50.2353210449218750", "ele": "242.0", "time": "2014-01-17T15:57:30.000Z"}
] }
Consequently, I tried to find an error in my JSON file. But then, I realized that the syntax of the tutorial's example is not correct:

This is your example (p.7):
{"name": "Everest", "type":"mountain", "height":29029, "hike_safety": 34.1}
{"name": "Mt St Helens", "type":"volcano", "height":29029, "hike_safety": 15.4}
{"name": "Denali", "type":"mountain", "height":17000, "hike_safety": 12.2}
{"name": "Kilimanjaro", "type":"mountain", "height":14000 }
{"name": "Mt Washington", "type":"mountain", "hike_safety": 50.6}
The correct syntax should contain separators " , " within the arrays and an object, here "mountains". Hence, the syntax looks as follows, validated via jsonlint.com
{"mountains": [
   {"name": "Everest", "type":"mountain", "height":29029, "hike_safety": 34.1} , 
   {"name": "Mt St Helens", "type":"volcano", "height":29029, "hike_safety": 15.4} , 
   {"name": "Denali", "type":"mountain", "height":17000, "hike_safety": 12.2} ,
   {"name": "Kilimanjaro", "type":"mountain", "height":14000 } ,
   {"name": "Mt Washington", "type":"mountain", "hike_safety": 50.6}
] }
Since your example isn't proper JSON in my eyes, I'm really confused and don't know how to go on because testing the import of different "semi-structured data levels" of JSON to Vertica is the main goal of my current task.

The main purpose of this thread is not to tell you that your example is incorrect. It's rather to ask you how I can load my data (which is mainly hierarchical) into Vertica.
Finally, this is the question:
What kind of JSON files (hierarchy, arrays, also syntax) can be loaded into Vertica 7? Do you have other examples or maybe advices how to "modify" the JSON files before importing them?

If you answer this question to me, I'll be very happy :)

Thank you in advance.

Adrian

PS: I've also read this thread (https://community.vertica.com/vertica/topics/flex_table) but it didn't help :(

Comments

  • Options
    Greetings, sorry for the troubles you've been experiencing.


    To respond about sample JSON files, the Twitter Firehose is especially great.


    To respond about our sample files, you've hit on the difference between loading all of your JSON into a single row, and loading parts of your JSON into separate rows. Separate rows may be indicated either by loading set of JSON objects nested within a top-level list ala:
    [
      {"key": "value" },
      {"key": "value"}
    ]
    or by simply loading a set if JSON objects ala:
    {"key": "value" }
    {"key": "value"}
    Your sample:
    {"mountains": [
       {"name": "Everest", "type":"mountain", "height":29029, "hike_safety": 34.1} , 
       {"name": "Mt St Helens", "type":"volcano", "height":29029, "hike_safety": 15.4} , 
       {"name": "Denali", "type":"mountain", "height":17000, "hike_safety": 12.2} ,
       {"name": "Kilimanjaro", "type":"mountain", "height":14000 } ,
       {"name": "Mt Washington", "type":"mountain", "hike_safety": 50.6}
    ] }
    would load a single row rather than five separate rows.


    To respond about your JSON:
    Your snippet:
    {"track": [
      {"lon": "8.5987119674682617", "lat": "50.2353782653808594", "ele": "246.0", "time": "2014-01-17T15:57:28.000Z"},
      {"lon": "8.5986804962158203", "lat": "50.2353210449218750", "ele": "242.0", "time": "2014-01-17T15:57:30.000Z"}
    ] }
    looks good, and I was able to load it without problem.

    I'm getting parse errors, however, with the other JSON you posted. Your snippet:
    { "Books":[
       { "ISBN":"ISBN-0-13-713526-2", "Price":85, "Edition":3, "Title":"A First Course in Database Systems", "Authors":[
        {"First_Name":"Jeffrey", "Last_Name":"Ullman"}, {"First_Name":"Jennifer", "Last_Name":"Widom"} ] },
        { "ISBN":"ISBN-0-13-815504-6", "Price":100, "Remark":"Buy this book bundled with 'A First Course' - a great deal!", "Title":"Database Systems:The Complete Book", "Authors":[
        {"First_Name":"Hector", "Last_Name":"Garcia-Molina"}, {"First_Name":"Jeffrey", "Last_Name":"Ullman"}, {"First_Name":"Jennifer", "Last_Name":"Widom"} ] }
            }]}
    has an extra closing }, and can be fixed to look like:
    { "Books":[
       { "ISBN":"ISBN-0-13-713526-2", "Price":85, "Edition":3, "Title":"A First Course in Database Systems", "Authors":[
        {"First_Name":"Jeffrey", "Last_Name":"Ullman"}, {"First_Name":"Jennifer", "Last_Name":"Widom"} ] },
        { "ISBN":"ISBN-0-13-815504-6", "Price":100, "Remark":"Buy this book bundled with 'A First Course' - a great deal!", "Title":"Database Systems:The Complete Book", "Authors":[
        {"First_Name":"Hector", "Last_Name":"Garcia-Molina"}, {"First_Name":"Jeffrey", "Last_Name":"Ullman"}, {"First_Name":"Jennifer", "Last_Name":"Widom"} ] }
            ]}
    To give a concrete example of how you could use your JSON:
    dbadmin=> CREATE FLEX TABLE foo();
    CREATE TABLE
    dbadmin=> COPY foo FROM STDIN PARSER FJSONParser();
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> {"track": [
      {"lon": "8.5987119674682617", "lat": "50.2353782653808594", "ele": "246.0", "time": "2014-01-17T15:57:28.000Z"},
      {"lon": "8.5986804962158203", "lat": "50.2353210449218750", "ele": "242.0", "time": "2014-01-17T15:57:30.000Z"}
    ] }
    \.
    dbadmin=> COPY foo FROM STDIN PARSER FJSONParser();
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> { "Books":[
       { "ISBN":"ISBN-0-13-713526-2", "Price":85, "Edition":3, "Title":"A First Course in Database Systems", "Authors":[
        {"First_Name":"Jeffrey", "Last_Name":"Ullman"}, {"First_Name":"Jennifer", "Last_Name":"Widom"} ] },
        { "ISBN":"ISBN-0-13-815504-6", "Price":100, "Remark":"Buy this book bundled with 'A First Course' - a great deal!", "Title":"Database Systems:The Complete Book", "Authors":[
        {"First_Name":"Hector", "Last_Name":"Garcia-Molina"}, {"First_Name":"Jeffrey", "Last_Name":"Ullman"}, {"First_Name":"Jennifer", "Last_Name":"Widom"} ] }
            ]}
    \.
    dbadmin=> SELECT MapToString(__raw__) FROM foo;
                                                                                                                                                                                                                                                                                                                                                                                                             MapToString                                                                                                                                                                                                                                 
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     {
       "track" : {
          "0.ele" : "246.0",
          "0.lat" : "50.2353782653808594",
          "0.lon" : "8.5987119674682617",
          "0.time" : "2014-01-17T15:57:28.000Z",
          "1.ele" : "242.0",
          "1.lat" : "50.2353210449218750",
          "1.lon" : "8.5986804962158203",
          "1.time" : "2014-01-17T15:57:30.000Z"
       }
    }

     {
       "Books" : {
          "0.Authors" : {
             "0.First_Name" : "Jeffrey",
             "0.Last_Name" : "Ullman",
             "1.First_Name" : "Jennifer",
             "1.Last_Name" : "Widom"
          },
          "0.Edition" : "3",
          "0.ISBN" : "ISBN-0-13-713526-2",
          "0.Price" : "85",
          "0.Title" : "A First Course in Database Systems",
          "1.Authors" : {
             "0.First_Name" : "Hector",
             "0.Last_Name" : "Garcia-Molina",
             "1.First_Name" : "Jeffrey",
             "1.Last_Name" : "Ullman",
             "2.First_Name" : "Jennifer",
             "2.Last_Name" : "Widom"
          },
          "1.ISBN" : "ISBN-0-13-815504-6",
          "1.Price" : "100",
          "1.Remark" : "Buy this book bundled with 'A First Course' - a great deal!",
          "1.Title" : "Database Systems:The Complete Book"
       }
    }

    (2 rows)
    Thanks very much for posting, and post back if you continue to encounter issues.



Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file