Default Storage Layout: When does Vertica create two files per column?

We are using Vertica Analytic Database v5.1 Community Edition. Per the Vertica documentation [Ref# 1], we expect Vertica to create two files per column for a given projection/ROS container: one file containing the actual data (.fdb file) and another containing a position index (.pidx file). Additionally, Vertica supports column grouping, which stores the grouped columns in a single disk file.

The Vertica documentation on this topic says:

HP Vertica performs dynamic column grouping. For example, to provide better read and write efficiency for small loads, HP Vertica ignores any projection-defined column grouping (or lack thereof) and groups all columns together by default.

To verify this, we checked the files for tables ranging from small to very large. In each case, Vertica created two files per projection and/or ROS container, irrespective of the number of columns in the projection. The largest table tested contains 13 columns, 21485672 rows, and approximately 24 GB of raw data. We followed the steps below to identify the files:

1. Found the path where the files are stored, using the SQL below:

 select storage_path from v_monitor.disk_storage where storage_usage like 'DATA%';

[image: output of the query above]

2. Found the unique ID of the projection's ROS container, using the SQL below:

select storage_type, storage_oid from v_monitor.storage_containers where projection_name = 'Test_Files_super';

[image: output of the query above]

3. Using the ROS container's unique ID, located the directory for that ROS container.

[image: listing of the files in the ROS container's directory]

The above shows two files per ROS container, whereas we expected two files per column within each ROS container.
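As a rough cross-check of step 3, the file count per container directory can be tallied with a short script. This is only a minimal sketch: it assumes that each container's files sit together in one directory under the storage path and that they use the .fdb and .pidx extensions mentioned above. The function name `count_storage_files` is hypothetical and not part of Vertica.

```python
import os
import sys
from collections import Counter


def count_storage_files(root):
    """Tally .fdb and .pidx files per directory under `root`.

    Assumption: all files of one ROS container live in the same directory.
    Returns a dict mapping directory path -> Counter keyed by extension.
    """
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        tally = Counter(
            os.path.splitext(name)[1]
            for name in filenames
            if name.endswith((".fdb", ".pidx"))
        )
        if tally:  # skip directories with no storage files
            counts[dirpath] = tally
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the storage_path value returned by the query in step 1.
    for directory, tally in sorted(count_storage_files(sys.argv[1]).items()):
        print(directory, dict(tally))
```

Under these assumptions, a directory reporting one .fdb and one .pidx file for a 13-column projection would point to grouped storage, while 13 of each would point to one file pair per column.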

Given that this default behaviour conflicts with a columnar storage layout, can someone shed light on the following:

1. What is the ideal scenario for the creation of two files per column? (Per the documentation, columns are grouped for small data loads but not for bulk data loads.)

2. Given that storage defaults to a single pair of files per ROS container, can Vertica optimize and change the storage layout later based on the queries that are run?

3. Is there a way to force Vertica to create files per column?

References:

1. The Vertica Analytic Database: C-Store 7 Years Later (document)

2. http://vertica-forums.com/viewtopic.php?t=106

   

Comments

  • Hi Richa,

    So, first, you should upgrade!  The 5.1 CE is very old (it was the very first CE years ago); it has significant limitations and substantially different behavior at this point as compared to current Vertica releases.  You can download the latest CE from http://my.vertica.com/ .  Our current CE license is, I believe, significantly more lenient than the one that shipped with the initial 5.1 CE -- larger disk quota, more access to our ecosystem of tools, etc.

    To answer your questions, though:  Column stores are great; but simple disk-backed column stores have a significant fundamental limitation when used with small data sets, which is that they tend to produce zillions of tiny files.  Tiny files have very poor I/O performance on conventional hard drives.  So we collocate those files on disk when your data size is small.  Among other things, this increases the odds that your disk's readahead buffer will be useful.

    The preferred solution is to not create ROS containers at all in this situation.  I assume you are loading data with the DIRECT option to force creation of ROS containers?  Instead, try loading into WOS and letting your loads be batched and written out as larger containers.  (That's what the WOS is for, after all.)  Note that the WOS implementation in Vertica 5.1 differs from that in more-recent Vertica versions; newer versions should, among other things, have substantially better query performance.

    As your database grows, Vertica's Tuple Mover can and will combine these small grouped containers into individual large containers that have a separate file per column.

    Also, just a heads-up:  You refer to the C-Store 7 Years Later paper.  That paper was written at the time of Vertica 6.x; some of its functionality was not yet available in the version that you are testing against.  Also, that paper isn't really documentation; it's a conference paper :-)  You can refer to our documentation at http://vertica.com/documentation/ .  That link refers to our latest documentation by default; older versions can be reached via the navigation bar on the left of the page.

    Adam
