How to save space in Vertica Tables?
slaborda
Vertica Customer
Hi all,
I am new in Vertica DB and i haven't all the concepts in mind.
We have some tables with important fields and non-important field, the importants fields must be conserved 20 years the non-important fields must be conserved only 5 years.
We don't want modify our table structure and we want to stock all the data in the same tables.
My question is, if we make and UPDATE of the historic (+5 years old data) non-important fields to VOID, NULL, '' or O we could save some space in disk. In theory to have few volatility in the possible value of the fields allow a better compression.
This is right?
In ours tables the 40% of the fiels are important and the 60% non important.
Thanks in advance,
Sergi
Tagged:
0
Answers
For best performance use Hierarchical Partitioning
Hierarchical Partitioning makes data lifecycle management easier and improves query performance.
Vertica provides capabilities to:
You can move partitions from one table to another with the Vertica function MOVE_PARTITIONS_TO_TABLE.
This function is useful for archiving old partitions, like in the following procedure:
Identify the partitions to archive, and move them to a temporary staging table with MOVE_PARTITIONS_TO_TABLE.
Backup the staging table and drop irrelevant data, or move old partitions to an archived table.
See: https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/AdministratorsGuide/Partitions/HierarchicalPartitioning.htm
For best compression on disk try to define the best projection encoding.
If you are not 100% sure what's the best encoding or you're short in time: use AUTO (let Vertica decide for you)
Or better use Vertica Database Designer Programmatic Interface:
For a few (important) tables the better thing to do is to ..test & practice using subset of the data.
Here is an example how to see the best encoding for an existing projection with deploy=false, ignore existing encodings=true:
vsql -c "select designer_design_projection_encodings('schema.projection_name', '', false,true);"
See: https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/SQLReferenceManual/Functions/VerticaFunctions/DatabaseDesigner/DESIGNER_DESIGN_PROJECTION_ENCODINGS.htm
And with mutable SET USING columns (supported as of 9.3) by expanding the scope of flattened table options one can nullify a column.
See: https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/AnalyzingData/FlattenedTables/FlattenedTables.htm
.
If your non-important columns are truly irrelevant after a while, dropping them will save you the most space.
An alternative to still be able to access those column values and keep your database size small is to move them to (pure) flex tables. After that you can drop the original columns. Flex tables are heavily discounted (they count as 1/10th the size of regular tables towards your license capacity). The trade-off is performance. They add performance overhead to queries as flex table storage is not as optimized as the columnar format.
https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/AdministratorsGuide/Licensing/MonitoringDatabaseSizeForLicenseCompliance.htm
zstd_fast is the best/most efficient compression algorithm. Database Designer will do a good job (over the default option), but if you really want to squeeze out performance and compression, go for zstd_fast. It's going to be around 2x better than what DBD will recommend.