The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.

Work with XML data structure in Vertica

I create several tables (about 60 tables) to store my XML data structure (exactly IDMEF rfc4765), after inserting about 1 million data in tables (the tables are related to each other), executing queries become very slow.
For example a query with 6 joins (because of XML structure and related tables) executed in 13 seconds. (I have some queries with more than 6 joins now)
Would you have a suggestion about storing this type of data in Vertica?
(I use Vertica Analytic Database v9.2.0-0)

Best Answer

  • Accepted Answer

    @noap @emoreno @Vertica_Curtis
    Recently I found some optimizations, solve my problem:

    • When we have multiple join in a query and joined tables have many related columns (foreign key and repeated keys), we should use all of these columns in join condition
    • Using optimized projection: use correct encodings such as block_dict, rle, ..
    • Using optimized projection: use correct segmentation (I segment with root id, it means root of xml file that all of tables have this root)
    • Using correct partition: I use function DATE for my timestamp column
    • Use conditions in WHERE that limit the results, such as limitation in time
    • Not use LONG VARCHAR for columns, I find my longest value and after that update the length of my VARCHAR column

Answers

  • I'm confused here. Vertica isn't storing XML, and last I checked we don't even have a native parser for XML. That's not to say you can't load that stuff, you just have to parse it out somehow. I've done that myself before, and I assume you're doing something similar.

    Regardless of the source of your data, normalized tables aren't going to be very good for querying due to all the joins. If you want to scale that up, you'll want to consider some sort of flattened table structures. Flattened tables will likely help a lot here. You may want to stage them all with an ELT type process and then work to transform them into a more star schema state for reporting. Make sure you don't neglect good projection design as well.

  • edited August 2019

    I don't store pure XML in Vertica. You are right. I parse my IDMEF XML with PHP and after that I insert data of each tag in it's table.
    For example for below XML we have 2 tables (category and data)

    table "category " has 1 column: type
    table "data " has 3 columns: value, address and category_id (relation with category table)

  • emorenoemoreno Employee

    Hi,
    If you already parse the data and transform it in a start schema, you can improve the projections in favor of the Joins. You could use DBD.
    Here you can find some shorts videos of Vertica some of vertica features such as working with projections and DBD.
    https://www.vertica.com/web-based-training/

    Eugenia

  • hi,
    can you explain more regarding loading xml to vertica table?
    whuch data type do you use for this table and who the copy is performed?
    thanks

  • Hi Noap. Nice thread necromancy. :)

    Last time I needed to load XML, I found a utility called "xmlstarlet" that was a command line tool for parsing out individual fields. This was the command I used to load the data I needed.
    xmlstarlet sel -t -m "//VDD_SOC" -v "ID" -o "," -v "Value" -o "," -n $xml 2>&1 |grep -v ",," | vsql -Xa $us $pw -c "COPY $schema.vdd_soc (id, value) from stdin delimiter ',';"

    Of course, everything after the | is Vertica-specific stuff. It's been a while since I've used this, but the xmlstarlet command parses out individual columns based on delimiters - "," in this case. and then passes it to the Vertica COPY command via stdin.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.