Work with XML data structure in Vertica

I create several tables (about 60 tables) to store my XML data structure (exactly IDMEF rfc4765), after inserting about 1 million data in tables (the tables are related to each other), executing queries become very slow.
For example a query with 6 joins (because of XML structure and related tables) executed in 13 seconds. (I have some queries with more than 6 joins now)
Would you have a suggestion about storing this type of data in Vertica?
(I use Vertica Analytic Database v9.2.0-0)

Best Answer

  • Options
    Answer ✓

    @noap @emoreno @Vertica_Curtis
    Recently I found some optimizations, solve my problem:

    • When we have multiple join in a query and joined tables have many related columns (foreign key and repeated keys), we should use all of these columns in join condition
    • Using optimized projection: use correct encodings such as block_dict, rle, ..
    • Using optimized projection: use correct segmentation (I segment with root id, it means root of xml file that all of tables have this root)
    • Using correct partition: I use function DATE for my timestamp column
    • Use conditions in WHERE that limit the results, such as limitation in time
    • Not use LONG VARCHAR for columns, I find my longest value and after that update the length of my VARCHAR column


  • Options

    I'm confused here. Vertica isn't storing XML, and last I checked we don't even have a native parser for XML. That's not to say you can't load that stuff, you just have to parse it out somehow. I've done that myself before, and I assume you're doing something similar.

    Regardless of the source of your data, normalized tables aren't going to be very good for querying due to all the joins. If you want to scale that up, you'll want to consider some sort of flattened table structures. Flattened tables will likely help a lot here. You may want to stage them all with an ELT type process and then work to transform them into a more star schema state for reporting. Make sure you don't neglect good projection design as well.

  • Options
    edited August 2019

    I don't store pure XML in Vertica. You are right. I parse my IDMEF XML with PHP and after that I insert data of each tag in it's table.
    For example for below XML we have 2 tables (category and data)

    table "category " has 1 column: type
    table "data " has 3 columns: value, address and category_id (relation with category table)

  • Options
    emorenoemoreno Employee

    If you already parse the data and transform it in a start schema, you can improve the projections in favor of the Joins. You could use DBD.
    Here you can find some shorts videos of Vertica some of vertica features such as working with projections and DBD.


  • Options
    noapnoap Vertica Customer

    can you explain more regarding loading xml to vertica table?
    whuch data type do you use for this table and who the copy is performed?

  • Options

    Hi Noap. Nice thread necromancy. :)

    Last time I needed to load XML, I found a utility called "xmlstarlet" that was a command line tool for parsing out individual fields. This was the command I used to load the data I needed.
    xmlstarlet sel -t -m "//VDD_SOC" -v "ID" -o "," -v "Value" -o "," -n $xml 2>&1 |grep -v ",," | vsql -Xa $us $pw -c "COPY $schema.vdd_soc (id, value) from stdin delimiter ',';"

    Of course, everything after the | is Vertica-specific stuff. It's been a while since I've used this, but the xmlstarlet command parses out individual columns based on delimiters - "," in this case. and then passes it to the Vertica COPY command via stdin.

  • Options
    ManikandanSManikandanS Vertica Customer

    Actually trying to integrate xml value from SQL server to Vertica database. Is "XML" datatype available in Vertica database? In Vertica side "Longvarchar" have been used for xml datatype. Is "longvarchar" correct one for replacement of "XML"? or any other options are available in Vertica?

  • Options

    Vertica doesn't have an XML data type. I recommend you parse it all out and load it as structured data. It will be much easier to work with that way.
    We've also recently added support for array types, in case that's needed.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file