Question of how a size based license is calculated
I just have a query around Vertica licencing that has been answered before else ware, but the problem I have is that the accepted answer does not seem to correspond with what we are seeing here on our Vertica 8.11 cluster. We have a 4TB data licence, currently running at around 80% to 90% usage. When I run the attached code (credit to the original creator), and sum up the raw_estimate_gb column, it gives us a number that is pretty close to the actual licence usage.
Yet, when we look at what we are summing from projection_storage (pj.used_compressed_gb * la.ratio AS raw_estimate_gb ) we are using both the _b0 and _b1 projections. I thought from how the raw data licence was calculated, it was based on how the data size looked if it was exported as a text file, so it should only be using one projection per table in the calculation?
Here is an example of one table (our biggest)
Table_One Row_Count = 9,573,636,814 rows
If I query projection_storage and projections and filter on where super_projection = 'true' then I get the following
node_name projection_name row_count used_bytes
prd_node0001 Table_One_b1 1595509531 17340416962
prd_node0001 Table_One_b0 1595587369 17340419305
prd_node0002 Table_One_b1 1595691854 17496755059
prd_node0002 Table_One_b0 1595632895 17498868949
prd_node0003 Table_One_b0 1595600590 17479176184
prd_node0003 Table_One_b1 1595614575 17446102162
prd_node0004 Table_One_b1 1595587369 17909834555
prd_node0004 Table_One_b0 1595691854 18368868632
prd_node0005 Table_One_b0 1595614575 17697818513
prd_node0005 Table_One_b1 1595632895 18006727025
prd_node0006 Table_One_b1 1595600590 18052218242
prd_node0006 Table_One_b0 1595509531 17714081217
If I add up the row_count for all nodes and projections I get double the actual count, the same for used_bytes. If I only add up one projection (_b0) then I get a row_count and a used_bytes around what is expected.
So if I use the raw_estimate_gb from the first query, the only way I can get a figure close to what the licence usage is reporting is to add up all the super projections (both _b0 and _b1), which seems to be double the amount of actual raw data usage?
Hope this makes sense?