Question of how a size based license is calculated
Hello,
I just have a query around Vertica licencing that has been answered before else ware, but the problem I have is that the accepted answer does not seem to correspond with what we are seeing here on our Vertica 8.11 cluster. We have a 4TB data licence, currently running at around 80% to 90% usage. When I run the attached code (credit to the original creator), and sum up the raw_estimate_gb column, it gives us a number that is pretty close to the actual licence usage.
Yet, when we look at what we are summing from projection_storage (pj.used_compressed_gb * la.ratio AS raw_estimate_gb ) we are using both the _b0 and _b1 projections. I thought from how the raw data licence was calculated, it was based on how the data size looked if it was exported as a text file, so it should only be using one projection per table in the calculation?
Here is an example of one table (our biggest)
Table_One Row_Count = 9,573,636,814 rows
If I query projection_storage and projections and filter on where super_projection = 'true' then I get the following
node_name projection_name row_count used_bytes
prd_node0001 Table_One_b1 1595509531 17340416962
prd_node0001 Table_One_b0 1595587369 17340419305
prd_node0002 Table_One_b1 1595691854 17496755059
prd_node0002 Table_One_b0 1595632895 17498868949
prd_node0003 Table_One_b0 1595600590 17479176184
prd_node0003 Table_One_b1 1595614575 17446102162
prd_node0004 Table_One_b1 1595587369 17909834555
prd_node0004 Table_One_b0 1595691854 18368868632
prd_node0005 Table_One_b0 1595614575 17697818513
prd_node0005 Table_One_b1 1595632895 18006727025
prd_node0006 Table_One_b1 1595600590 18052218242
prd_node0006 Table_One_b0 1595509531 17714081217
If I add up the row_count for all nodes and projections I get double the actual count, the same for used_bytes. If I only add up one projection (_b0) then I get a row_count and a used_bytes around what is expected.
So if I use the raw_estimate_gb from the first query, the only way I can get a figure close to what the licence usage is reporting is to add up all the super projections (both _b0 and _b1), which seems to be double the amount of actual raw data usage?
Hope this makes sense?
Comments
@darrenjf,
I can't see the "attached Code". Can you please share the code? Also I am interested in how is the la.ratio being computed in the query? Vertica actually uses data types of the columns, row counts, encoding, compression, distribution of the column values etc, to estimate the raw size of the data as described in the [admin guide]
Specifically the section Evaluating Data Type Footprint might be helpful.
Can you calculate a rough estimate of the table size using data types and row count? That should be close to audit() estimation on the table.
The footprint on disk has by no means a direct relation to the license size.
To get the license size of a table, you must imagine you exported the whole table to a CSV file and found out how many bytes / GB it took on disk. It's actually even a bit smaller than that, as Vertica does not count the column delimiter itself.
There is a function AUDIT() that you can call whenever you want and calculates that size for you.
Thanks for the responses, seems I was taking a bit of a simplistic view of how the licence was calculated. Running AUDIT() over the database brings back a result in line with the licence figure from the management console. Looks like we need a bigger licence :-) . Thanks again