Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Collect stats on Internal/System tables in Vertica

Hello,

 

When i try to query 'query_requests' table, it is going pretty slow.

When i ran an EXPLAIN on the query, i found that Stats are not refreshed on the system projections.

 

When i try to run analyze_statistics , i get this - "analyze_statistics: Can not analyze statistics of a virtual table/projection dc_requests_issued"

 

Can somebody let me know how to collect stats on System/Internal tables?

 

Below is the explain plan of that query:

 

explain select * from query_requests where user_name='ETLADMIN' and request ilike '%copy dimension.dim_customer%'

Access Path:
+-SORT [Cost: 185K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| Order: (rc."time" IS NULL) DESC, rc."time" DESC
| Execute on: All Nodes
| +---> JOIN HASH [LeftOuter] [Cost: 25K, Rows: 10K (NO STATISTICS)] (PATH ID: 3) Inner (BROADCAST)
| | Join Cond: (ri.node_name = rc.node_name) AND (ri.session_id = rc.session_id) AND (ri.request_id = rc.request_id)
| | Materialize at Input: ri.node_name, ri.session_id, ri.request_id
| | Materialize at Output: ri."time", ri.user_name, ri.transaction_id, ri.statement_id, ri.request_type, ri.label, ri.search_path, ri.request
| | Execute on: All Nodes
| | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 19K, Rows: 10K (NO STATISTICS)] (PATH ID: 4) Inner (BROADCAST)
| | | Join Cond: (ri.node_name = ra.node_name) AND (ri.transaction_id = ra.transaction_id) AND (ri.statement_id = ra.statement_id)
| | | Materialize at Input: ri.node_name, ri.transaction_id, ri.statement_id
| | | Execute on: All Nodes
| | | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 16K, Rows: 10K (NO STATISTICS)] (PATH ID: 5) Inner (BROADCAST)
| | | | Join Cond: (ri.node_name = de.node_name) AND (ri.session_id = de.session_id) AND (ri.request_id = de.request_id)
| | | | Execute on: All Nodes
| | | | +-- Outer -> STORAGE ACCESS for ri [Cost: 12K, Rows: 10K (NO STATISTICS)] (PATH ID: 6)
| | | | | Projection: v_internal.dc_requests_issued_p
| | | | | Materialize: ri.node_name, ri.session_id, ri.request_id
| | | | | Filter: (ri.user_name = 'ETLADMIN')
| | | | | Filter: (replace(replace(ri.request, '
', ' '), ' ', ' ') ~~* '%copy dimension.dim_customer%')
| | | | | Execute on: All Nodes
| | | | +-- Inner -> SELECT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 7)
| | | | | Execute on: All Nodes
| | | | | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 8)
| | | | | | Aggregates: count(*)
| | | | | | Group By: dc_errors.node_name, dc_errors.session_id, dc_errors.request_id
| | | | | | Execute on: All Nodes
| | | | | | +---> STORAGE ACCESS for dc_errors [Cost: 865, Rows: 10K (NO STATISTICS)] (PATH ID: 9)
| | | | | | | Projection: v_internal.dc_errors_p
| | | | | | | Materialize: dc_errors.node_name, dc_errors.session_id, dc_errors.request_id
| | | | | | | Filter: (dc_errors.error_level >= 20)
| | | | | | | Execute on: All Nodes
| | | +-- Inner -> SELECT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 10)
| | | | Execute on: All Nodes
| | | | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 11)
| | | | | Aggregates: max(dc_resource_acquisitions.memory_kb)
| | | | | Group By: dc_resource_acquisitions.node_name, dc_resource_acquisitions.transaction_id, dc_resource_acquisitions.statement_id
| | | | | Execute on: All Nodes
| | | | | +---> STORAGE ACCESS for dc_resource_acquisitions [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 12)
| | | | | | Projection: v_internal.dc_resource_acquisitions_p
| | | | | | Materialize: dc_resource_acquisitions.node_name, dc_resource_acquisitions.transaction_id, dc_resource_acquisitions.statement_id, dc_resource_acquisitions.memory_kb
| | | | | | Filter: (dc_resource_acquisitions.result = 'Granted')
| | | | | | Execute on: All Nodes
| | +-- Inner -> STORAGE ACCESS for rc [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 13)
| | | Projection: v_internal.dc_requests_completed_p
| | | Materialize: rc."time", rc.node_name, rc.session_id, rc.request_id, rc.success
| | | Execute on: All Nodes

 

 

Thanks a lot!

 

Comments

  •  Hi,

     

    You cannot actually run statistics on this data becouse their are external tables. 

     

    You can find their content  in your DataCollector directory under your catalog path.

     

    If you want them to be faster just flush them into a table and then you can run stats on them.

     

     

  • Hi,

     

    Thanks for the response.

    As an alternative i was doing that.

    But the problem happens wiht Management Console also which takes way too much to get refreshed .

     

    Anything that i can do to improve that?

     

    Thanks!

  •  

    I don't relay alot on the historical capabilities of VMC becouse of this mainly, so i keep my DC policies quite low in terms of history and push aggregates historical into regualar tables.

     

    hope this helps 

  • I do the same as Adrian - storing DC table data in real tables for faster access and longer retention periods without affecting performance.

     

    Starting in 8.0 there's a mechanism for storing monitoring data in a separate database via kafka so that MC runs faster - Extended Monitoring.  Those dc tables could also be queried directly.

      https://my.vertica.com/docs/8.0.x/HTML/index.htm#Authoring/ManagementConsole/ExtendedMonitoring/ExtendedMonitoringConceptual.htm

     

       --Sharon

     

  • Thanks Adrian and Sharon for your response!

     

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.