Collect stats on Internal/System tables in Vertica
Hello,
When I query the 'query_requests' table, it runs pretty slowly.
When I ran an EXPLAIN on the query, I found that statistics have not been collected on the underlying system projections.
When I try to run analyze_statistics, I get this: "analyze_statistics: Can not analyze statistics of a virtual table/projection dc_requests_issued"
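The call I ran was essentially this (reconstructed from the error message above):

SELECT analyze_statistics('dc_requests_issued');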
Can somebody let me know how to collect stats on System/Internal tables?
Below is the explain plan of that query:
explain select * from query_requests where user_name='ETLADMIN' and request ilike '%copy dimension.dim_customer%'
Access Path:
+-SORT [Cost: 185K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| Order: (rc."time" IS NULL) DESC, rc."time" DESC
| Execute on: All Nodes
| +---> JOIN HASH [LeftOuter] [Cost: 25K, Rows: 10K (NO STATISTICS)] (PATH ID: 3) Inner (BROADCAST)
| | Join Cond: (ri.node_name = rc.node_name) AND (ri.session_id = rc.session_id) AND (ri.request_id = rc.request_id)
| | Materialize at Input: ri.node_name, ri.session_id, ri.request_id
| | Materialize at Output: ri."time", ri.user_name, ri.transaction_id, ri.statement_id, ri.request_type, ri.label, ri.search_path, ri.request
| | Execute on: All Nodes
| | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 19K, Rows: 10K (NO STATISTICS)] (PATH ID: 4) Inner (BROADCAST)
| | | Join Cond: (ri.node_name = ra.node_name) AND (ri.transaction_id = ra.transaction_id) AND (ri.statement_id = ra.statement_id)
| | | Materialize at Input: ri.node_name, ri.transaction_id, ri.statement_id
| | | Execute on: All Nodes
| | | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 16K, Rows: 10K (NO STATISTICS)] (PATH ID: 5) Inner (BROADCAST)
| | | | Join Cond: (ri.node_name = de.node_name) AND (ri.session_id = de.session_id) AND (ri.request_id = de.request_id)
| | | | Execute on: All Nodes
| | | | +-- Outer -> STORAGE ACCESS for ri [Cost: 12K, Rows: 10K (NO STATISTICS)] (PATH ID: 6)
| | | | | Projection: v_internal.dc_requests_issued_p
| | | | | Materialize: ri.node_name, ri.session_id, ri.request_id
| | | | | Filter: (ri.user_name = 'ETLADMIN')
| | | | | Filter: (replace(replace(ri.request, '\n', ' '), '  ', ' ') ~~* '%copy dimension.dim_customer%')
| | | | | Execute on: All Nodes
| | | | +-- Inner -> SELECT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 7)
| | | | | Execute on: All Nodes
| | | | | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 8)
| | | | | | Aggregates: count(*)
| | | | | | Group By: dc_errors.node_name, dc_errors.session_id, dc_errors.request_id
| | | | | | Execute on: All Nodes
| | | | | | +---> STORAGE ACCESS for dc_errors [Cost: 865, Rows: 10K (NO STATISTICS)] (PATH ID: 9)
| | | | | | | Projection: v_internal.dc_errors_p
| | | | | | | Materialize: dc_errors.node_name, dc_errors.session_id, dc_errors.request_id
| | | | | | | Filter: (dc_errors.error_level >= 20)
| | | | | | | Execute on: All Nodes
| | | +-- Inner -> SELECT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 10)
| | | | Execute on: All Nodes
| | | | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 11)
| | | | | Aggregates: max(dc_resource_acquisitions.memory_kb)
| | | | | Group By: dc_resource_acquisitions.node_name, dc_resource_acquisitions.transaction_id, dc_resource_acquisitions.statement_id
| | | | | Execute on: All Nodes
| | | | | +---> STORAGE ACCESS for dc_resource_acquisitions [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 12)
| | | | | | Projection: v_internal.dc_resource_acquisitions_p
| | | | | | Materialize: dc_resource_acquisitions.node_name, dc_resource_acquisitions.transaction_id, dc_resource_acquisitions.statement_id, dc_resource_acquisitions.memory_kb
| | | | | | Filter: (dc_resource_acquisitions.result = 'Granted')
| | | | | | Execute on: All Nodes
| | +-- Inner -> STORAGE ACCESS for rc [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 13)
| | | Projection: v_internal.dc_requests_completed_p
| | | Materialize: rc."time", rc.node_name, rc.session_id, rc.request_id, rc.success
| | | Execute on: All Nodes
Thanks a lot!
Comments
Hi,
You cannot actually run statistics on this data because these are virtual tables backed by Data Collector files, not regular projections.
You can find their content in the DataCollector directory under your catalog path.
If you want queries against them to be faster, just flush them into a regular table, and then you can run stats on that.
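A minimal sketch of that approach (the dc_hist schema and table names are just examples):

CREATE SCHEMA dc_hist;

-- Snapshot the Data Collector table into a regular table
CREATE TABLE dc_hist.requests_issued AS
SELECT * FROM v_internal.dc_requests_issued;

-- Regular tables support statistics
SELECT analyze_statistics('dc_hist.requests_issued');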
Hi,
Thanks for the response.
As an alternative, I was already doing that.
But the problem also happens with Management Console, which takes way too long to refresh.
Is there anything I can do to improve that?
Thanks!
I don't rely a lot on the historical capabilities of MC, mainly because of this, so I keep my DC policies quite low in terms of history and push historical aggregates into regular tables.
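For example, retention per Data Collector component can be checked and capped like this (the sizes below are just illustrative):

-- Show the current policy for the RequestsIssued component (backs dc_requests_issued)
SELECT get_data_collector_policy('RequestsIssued');

-- Cap it at ~1 MB of memory and ~10 MB of disk per node
SELECT set_data_collector_policy('RequestsIssued', '1000', '10000');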
hope this helps
I do the same as Adrian - storing DC table data in real tables for faster access and longer retention periods without affecting performance.
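An incremental load into the real table might look like this (reusing the example dc_hist.requests_issued table from above):

-- Append only rows newer than the latest one already stored
INSERT INTO dc_hist.requests_issued
SELECT * FROM v_internal.dc_requests_issued
WHERE "time" > (SELECT COALESCE(MAX("time"), '1970-01-01'::timestamptz)
                FROM dc_hist.requests_issued);
COMMIT;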
Starting in 8.0 there's a mechanism for storing monitoring data in a separate database via Kafka so that MC runs faster - Extended Monitoring. Those DC tables can also be queried directly.
https://my.vertica.com/docs/8.0.x/HTML/index.htm#Authoring/ManagementConsole/ExtendedMonitoring/ExtendedMonitoringConceptual.htm
--Sharon
Thanks Adrian and Sharon for your response!