Total Table Row Count Report by Schema
I had a colleague ask today how he could generate a report showing the total table row counts by schema. This should be easy, right? After all the PROJECTION_STORAGE system table has a ROW_COUNT column. But it turns out it’s a little complex considering there are segmented and un-segmented projections in Vertica.
We’ll need to join the TABLES, PROJECTIONS and PROJECTION_STORAGE system tables together, computing the row counts a little differently based on the projection type.
Example:
dbadmin=> SELECT table_schema, dbadmin-> SUM(row_count) AS row_count dbadmin-> FROM (SELECT table_schema, dbadmin(> NVL(CASE dbadmin(> WHEN NOT is_segmented THEN (row_count / node_cnt)::INT dbadmin(> ELSE row_count dbadmin(> END, 0) row_count dbadmin(> FROM (SELECT t.table_schema, dbadmin(> t.table_name, dbadmin(> ps.projection_name, dbadmin(> (SELECT MIN(is_segmented) dbadmin(> FROM projections p dbadmin(> WHERE p.projection_name = ps.projection_name dbadmin(> AND p.projection_schema = ps.anchor_table_schema) is_segmented, dbadmin(> node_cnt, dbadmin(> SUM(ps.row_count - NVL(dv.deleted_row_count, 0)) row_count dbadmin(> FROM tables t dbadmin(> LEFT JOIN projection_storage ps dbadmin(> ON t.table_id = ps.anchor_table_id dbadmin(> LEFT JOIN delete_vectors dv dbadmin(> ON dv.node_name = ps.node_name dbadmin(> AND dv.schema_name = ps.anchor_table_schema dbadmin(> AND dv.projection_name = ps.projection_name dbadmin(> CROSS JOIN (SELECT COUNT(1) node_cnt FROM nodes WHERE node_type = 'PERMANENT') n dbadmin(> GROUP BY 1, 2, 3, 4, 5 dbadmin(> LIMIT 1 OVER(PARTITION BY t.table_schema, t.table_name ORDER BY 1)) foo) foo2 dbadmin-> GROUP BY table_schema dbadmin-> ORDER BY table_schema; table_schema | row_count ---------------+----------- ctg_analytics | 0 dds | 39759 idea_dwh | 12288 jim1 | 16 public | 71007066 stlouis | 77396606 test | 8 (7 rows)
But I’m not comfortable with this solution. What happens when Vertica inevitably introduces new/improved projection types? We may have to modify the query to handle those.
Instead of calculating a table's row counts from its projetions, I prefer to get actual row counts from the tables. To do that, we can generate a SQL statement that literally runs a SELECT COUNT(*) from each table in the database, summing the counts by schema:
dbadmin=> \! vsql -Atc "SELECT 'SELECT table_schema, SUM(table_row_count) total_row_count FROM ('UNION ALL SELECT 'SELECT ''' || table_schema || ''' AS table_schema, (SELECT COUNT(1) FROM ' || table_schema || '.' || table_name || ') AS table_row_count' || DECODE(LEAD(table_name) OVER (ORDER BY 1), NULL, ') foo GROUP BY table_schema ORDER BY table_schema;', ' UNION ALL') FROM tables WHERE table_definition = '';" | vsql table_schema | total_row_count ---------------+----------------- ctg_analytics | 0 dds | 39759 idea_dwh | 12288 jim1 | 16 public | 71007066 stlouis | 77396606 test | 8 (7 rows)
This works as Vertica is pretty darn quick at performing table row counts!
Helpful Links:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/SystemTables/CATALOG/TABLES.htm
Have fun!