Collect stats on Internal/System tables in Vertica

abhi112 · January 2017

Hello,

When i try to query 'query_requests' table, it is going pretty slow.

When i ran an EXPLAIN on the query, i found that Stats are not refreshed on the system projections.

When i try to run analyze_statistics , i get this - "analyze_statistics: Can not analyze statistics of a virtual table/projection dc_requests_issued"

Can somebody let me know how to collect stats on System/Internal tables?

Below is the explain plan of that query:

explain select * from query_requests where user_name='ETLADMIN' and request ilike '%copy dimension.dim_customer%'

Access Path:
+-SORT [Cost: 185K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| Order: (rc."time" IS NULL) DESC, rc."time" DESC
| Execute on: All Nodes
| +---> JOIN HASH [LeftOuter] [Cost: 25K, Rows: 10K (NO STATISTICS)] (PATH ID: 3) Inner (BROADCAST)
| | Join Cond: (ri.node_name = rc.node_name) AND (ri.session_id = rc.session_id) AND (ri.request_id = rc.request_id)
| | Materialize at Input: ri.node_name, ri.session_id, ri.request_id
| | Materialize at Output: ri."time", ri.user_name, ri.transaction_id, ri.statement_id, ri.request_type, ri.label, ri.search_path, ri.request
| | Execute on: All Nodes
| | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 19K, Rows: 10K (NO STATISTICS)] (PATH ID: 4) Inner (BROADCAST)
| | | Join Cond: (ri.node_name = ra.node_name) AND (ri.transaction_id = ra.transaction_id) AND (ri.statement_id = ra.statement_id)
| | | Materialize at Input: ri.node_name, ri.transaction_id, ri.statement_id
| | | Execute on: All Nodes
| | | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 16K, Rows: 10K (NO STATISTICS)] (PATH ID: 5) Inner (BROADCAST)
| | | | Join Cond: (ri.node_name = de.node_name) AND (ri.session_id = de.session_id) AND (ri.request_id = de.request_id)
| | | | Execute on: All Nodes
| | | | +-- Outer -> STORAGE ACCESS for ri [Cost: 12K, Rows: 10K (NO STATISTICS)] (PATH ID: 6)
| | | | | Projection: v_internal.dc_requests_issued_p
| | | | | Materialize: ri.node_name, ri.session_id, ri.request_id
| | | | | Filter: (ri.user_name = 'ETLADMIN')
| | | | | Filter: (replace(replace(ri.request, '
', ' '), ' ', ' ') ~~* '%copy dimension.dim_customer%')
| | | | | Execute on: All Nodes
| | | | +-- Inner -> SELECT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 7)
| | | | | Execute on: All Nodes
| | | | | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 8)
| | | | | | Aggregates: count(*)
| | | | | | Group By: dc_errors.node_name, dc_errors.session_id, dc_errors.request_id
| | | | | | Execute on: All Nodes
| | | | | | +---> STORAGE ACCESS for dc_errors [Cost: 865, Rows: 10K (NO STATISTICS)] (PATH ID: 9)
| | | | | | | Projection: v_internal.dc_errors_p
| | | | | | | Materialize: dc_errors.node_name, dc_errors.session_id, dc_errors.request_id
| | | | | | | Filter: (dc_errors.error_level >= 20)
| | | | | | | Execute on: All Nodes
| | | +-- Inner -> SELECT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 10)
| | | | Execute on: All Nodes
| | | | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 11)
| | | | | Aggregates: max(dc_resource_acquisitions.memory_kb)
| | | | | Group By: dc_resource_acquisitions.node_name, dc_resource_acquisitions.transaction_id, dc_resource_acquisitions.statement_id
| | | | | Execute on: All Nodes
| | | | | +---> STORAGE ACCESS for dc_resource_acquisitions [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 12)
| | | | | | Projection: v_internal.dc_resource_acquisitions_p
| | | | | | Materialize: dc_resource_acquisitions.node_name, dc_resource_acquisitions.transaction_id, dc_resource_acquisitions.statement_id, dc_resource_acquisitions.memory_kb
| | | | | | Filter: (dc_resource_acquisitions.result = 'Granted')
| | | | | | Execute on: All Nodes
| | +-- Inner -> STORAGE ACCESS for rc [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 13)
| | | Projection: v_internal.dc_requests_completed_p
| | | Materialize: rc."time", rc.node_name, rc.session_id, rc.request_id, rc.success
| | | Execute on: All Nodes

Thanks a lot!

Adrian_Oprea_1 · January 2017

Hi,

You cannot actually run statistics on this data becouse their are external tables.

You can find their content in your DataCollector directory under your catalog path.

If you want them to be faster just flush them into a table and then you can run stats on them.

abhi112 · January 2017

Hi,

Thanks for the response.

As an alternative i was doing that.

But the problem happens wiht Management Console also which takes way too much to get refreshed .

Anything that i can do to improve that?

Thanks!

Adrian_Oprea_1 · January 2017

I don't relay alot on the historical capabilities of VMC becouse of this mainly, so i keep my DC policies quite low in terms of history and push aggregates historical into regualar tables.

hope this helps

Sharon_Cutter · January 2017

I do the same as Adrian - storing DC table data in real tables for faster access and longer retention periods without affecting performance.

Starting in 8.0 there's a mechanism for storing monitoring data in a separate database via kafka so that MC runs faster - Extended Monitoring. Those dc tables could also be queried directly.

https://my.vertica.com/docs/8.0.x/HTML/index.htm#Authoring/ManagementConsole/ExtendedMonitoring/ExtendedMonitoringConceptual.htm

--Sharon

abhi112 · January 2017

Thanks Adrian and Sharon for your response!

We're Moving!

Create My New Community Account Now

Collect stats on Internal/System tables in Vertica

Comments

Leave a Comment