Default Storage Layout: When does Vertica create two files per column?
We are using Vertica Analytic Database v5.1 Community Edition. Per the Vertica documentation [Ref# 1], we expect Vertica to create two files per column for a given projection/ROS container: one file containing the actual data (.fdb file) and another containing a position index (.pidx file). Additionally, Vertica supports column grouping, which results in the grouped columns being stored in a single disk file.
The Vertica documentation on this topic says:
HP Vertica performs dynamic column grouping. For example, to provide better read and write efficiency for small loads, HP Vertica ignores any projection-defined column grouping (or lack thereof) and groups all columns together by default.
While verifying this, we checked the files for tables ranging from small to very large. In each case, Vertica created two files per projection and/or ROS container, irrespective of the number of columns in the projection. The largest table tested contains 13 columns, 21485672 rows and approximately 24 GB of raw data. Below are the steps we followed to identify the files:
1. Found the path where the files are stored, using the SQL below:
select storage_path from v_monitor.disk_storage where storage_usage like 'DATA%';
2. Found the unique ID of the projection's ROS container, using the SQL below:
select storage_type, storage_oid from v_monitor.storage_containers where projection_name = 'Test_Files_super';
3. Using the ROS container's unique ID, we located the directory for that ROS container.
This showed only two files per ROS container, whereas we expected two files per column within each ROS container.
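For anyone who wants to repeat the check in step 3, here is a minimal sketch that counts the files under a ROS container directory by extension. The function name `count_storage_files` and the directory argument are my own illustrative assumptions, not anything provided by Vertica; the only things taken from above are the .fdb and .pidx extensions.

```python
from collections import Counter
from pathlib import Path


def count_storage_files(container_dir):
    """Count regular files under container_dir, keyed by file extension.

    container_dir is assumed to be the directory located in step 3
    (hypothetical argument; pass whatever path you found on your node).
    """
    counts = Counter()
    # Walk the directory recursively in case files sit in subdirectories.
    for path in Path(container_dir).rglob("*"):
        if path.is_file():
            counts[path.suffix] += 1
    return counts
```

If columns were stored separately, a 13-column projection would be expected to show 13 files for each of `.fdb` and `.pidx` per container; with all columns grouped, the count for each extension would be 1, which matches what we observed.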
Given that this default behaviour conflicts with a columnar storage layout, can someone shed light on the following:
1. What is the ideal scenario for the creation of two files per column? (As per the documentation, columns are grouped for small data loads and not for bulk data loads.)
2. Given that storage defaults to a single file per ROS container, can Vertica optimize and change the storage later based on the queries that are run?
3. Is there a way to force Vertica to create files per column?
1. The Vertica Analytic Database: C-Store 7 Years Later (document)