Total Table Row Count Report by Schema

Jim_KnicelyJim_Knicely - Select Field - Administrator
edited June 2019 in Tips from the Team

I had a colleague ask today how he could generate a report showing the total table row counts by schema. This should be easy, right? After all the PROJECTION_STORAGE system table has a ROW_COUNT column. But it turns out it’s a little complex considering there are segmented and un-segmented projections in Vertica.

We’ll need to join the TABLES, PROJECTIONS and PROJECTION_STORAGE system tables together, computing the row counts a little differently based on the projection type.


dbadmin=> SELECT table_schema,
dbadmin->        SUM(row_count) AS row_count
dbadmin->   FROM (SELECT table_schema,
dbadmin(>                NVL(CASE
dbadmin(>                      WHEN NOT is_segmented THEN (row_count / node_cnt)::INT
dbadmin(>                      ELSE row_count
dbadmin(>                    END, 0) row_count
dbadmin(>           FROM (SELECT t.table_schema,
dbadmin(>                        t.table_name,
dbadmin(>                        ps.projection_name,
dbadmin(>                        (SELECT MIN(is_segmented)
dbadmin(>                           FROM projections p
dbadmin(>                          WHERE p.projection_name = ps.projection_name
dbadmin(>                            AND p.projection_schema = ps.anchor_table_schema) is_segmented,
dbadmin(>                        node_cnt,
dbadmin(>                        SUM(ps.row_count - NVL(dv.deleted_row_count, 0)) row_count
dbadmin(>                   FROM tables t
dbadmin(>                   LEFT JOIN projection_storage ps
dbadmin(>                     ON t.table_id = ps.anchor_table_id
dbadmin(>                   LEFT JOIN delete_vectors dv
dbadmin(>                     ON dv.node_name = ps.node_name
dbadmin(>                    AND dv.schema_name = ps.anchor_table_schema
dbadmin(>                    AND dv.projection_name = ps.projection_name
dbadmin(>                  CROSS JOIN (SELECT COUNT(1) node_cnt FROM nodes WHERE node_type = 'PERMANENT') n
dbadmin(>                  GROUP BY 1, 2, 3, 4, 5
dbadmin(>                  LIMIT 1 OVER(PARTITION BY t.table_schema, t.table_name ORDER BY 1)) foo) foo2
dbadmin->  GROUP BY table_schema
dbadmin->  ORDER BY table_schema;
table_schema  | row_count
ctg_analytics |         0
dds           |     39759
idea_dwh      |     12288
jim1          |        16
public        |  71007066
stlouis       |  77396606
test          |         8
(7 rows)

But I’m not comfortable with this solution. What happens when Vertica inevitably introduces new/improved projection types? We may have to modify the query to handle those.

Instead of calculating a table's row counts from its projetions, I prefer to get actual row counts from the tables. To do that, we can generate a SQL statement that literally runs a SELECT COUNT(*) from each table in the database, summing the counts by schema:

dbadmin=> \! vsql -Atc "SELECT 'SELECT table_schema, SUM(table_row_count) total_row_count FROM ('UNION ALL SELECT 'SELECT ''' || table_schema || ''' AS table_schema, (SELECT COUNT(1) FROM ' || table_schema || '.' || table_name || ') AS table_row_count' || DECODE(LEAD(table_name) OVER (ORDER BY 1), NULL, ') foo GROUP BY table_schema ORDER BY table_schema;', ' UNION ALL') FROM tables WHERE table_definition = '';" | vsql
table_schema  | total_row_count
ctg_analytics |               0
dds           |           39759
idea_dwh      |           12288
jim1          |              16
public        |        71007066
stlouis       |        77396606
test          |               8
(7 rows)

This works as Vertica is pretty darn quick at performing table row counts!

Helpful Links:




Have fun!

Sign In or Register to comment.