- If yes, use the admintools DBD (Database Designer) with an incremental SQL query feed.
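To build that feed, a minimal sketch (assuming your query history is still retained in v_monitor.query_requests; the output path and the dbadmin exclusion are just examples) that dumps the last week of successful user queries into a file you can hand to the Database Designer:
\o /tmp/dbd_query_feed.sql
SELECT DISTINCT request || ';'           -- the raw SQL text of each request
FROM v_monitor.query_requests
WHERE request_type = 'QUERY'             -- keep only query-type requests
  AND success
  AND start_timestamp > GETDATE() - 7    -- last 7 days
  AND user_name NOT IN ('dbadmin');      -- skip admin/housekeeping queries (example filter)
\o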
Install Vertica Management Console and keep an eye on the recommendations/bottlenecks it reports.
Make sure you have statistics on all your columns and that they are not older than 7 days:
WITH stale AS (
    SELECT table_schema, table_name, projection_column_name
    FROM projection_columns
    WHERE statistics_type = 'NONE'                        -- no statistics at all
       OR statistics_updated_timestamp < GETDATE() - 7    -- statistics older than 7 days
)
SELECT DISTINCT 'SELECT ANALYZE_STATISTICS(''' || table_schema || '.' || table_name || '.' || projection_column_name || ''');'
FROM stale
WHERE table_name NOT IN (SELECT table_name FROM tables WHERE is_temp_table);
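If you prefer to refresh a whole table at once rather than per column, a simple sketch (the table name is an example; the second argument is an optional sampling percentage):
-- Collect full statistics for every column of one table:
SELECT ANALYZE_STATISTICS('my_schema.my_fact_table');
-- Or sample roughly 10 percent of the rows to reduce the cost on very large tables:
SELECT ANALYZE_STATISTICS('my_schema.my_fact_table', 10);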
Check for encoding: unless you run the DBD on the existing tables, the base projections will be built with ENCODING AUTO.
SELECT DISTINCT anchor_table_schema, anchor_table_name, column_name, compressions, used_bytes, row_count
FROM column_storage
WHERE column_name IN (SELECT projection_column_name FROM projection_columns WHERE encoding_type = 'AUTO')
  AND compressions = 'none'
ORDER BY used_bytes DESC;
- Once you identify the tables with missing encoding, run the DBD on them to get proper encodings (a sketch follows below).
- Be careful running this on big tables! It might take time.
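One way to get encoding recommendations for a single table without a full DBD run, sketched under the assumption that DESIGNER_DESIGN_PROJECTION_ENCODINGS is available in your Vertica version (table name and output path are examples):
-- Analyze the table's projections and write the recommended encodings to a script,
-- without deploying anything, so you can review the changes first:
SELECT DESIGNER_DESIGN_PROJECTION_ENCODINGS('my_schema.my_big_table', '/tmp/encoding_design.sql', false);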
Run the WLA (Workload Analyzer); I run it based on the last 7 days:
SELECT ANALYZE_WORKLOAD('', getdate()-7);
- Review the output and work through it based on the tuning_cost, from HIGH to LOW.
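A sketch of reading the recommendations back from highest to lowest cost, assuming you also run the analyzer with the save flag (e.g. SELECT ANALYZE_WORKLOAD('', true);) so the results are recorded in v_monitor.tuning_recommendations:
SELECT tuning_description, tuning_command, tuning_cost
FROM v_monitor.tuning_recommendations
ORDER BY CASE tuning_cost WHEN 'HIGH' THEN 1 WHEN 'MEDIUM' THEN 2 ELSE 3 END;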
Make sure your dimensions are dimensions and your facts are facts! Many times I see dimensions with 200 million rows that are unsegmented (replicated on every node).
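A sketch for spotting those: unsegmented (replicated) projections that carry a large row count; the 10-million-row threshold is just an example cutoff:
SELECT p.projection_schema, p.anchor_table_name, p.projection_name,
       MAX(ps.row_count) AS rows_per_node,
       SUM(ps.used_bytes) AS total_used_bytes
FROM v_catalog.projections p
JOIN v_monitor.projection_storage ps ON ps.projection_id = p.projection_id
WHERE NOT p.is_segmented
GROUP BY 1, 2, 3
HAVING MAX(ps.row_count) > 10000000
ORDER BY rows_per_node DESC;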
Look for partitioning opportunities, both to make use of partition pruning and for ease of loading/deleting/moving data from one table to another.
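A minimal sketch (table, columns, and partition values are examples): partition a fact table by month so queries prune old data and whole months can be moved or dropped cheaply:
CREATE TABLE my_schema.sales_fact (
    sale_date DATE NOT NULL,
    store_id  INT,
    amount    NUMERIC(12,2)
)
PARTITION BY EXTRACT(YEAR FROM sale_date) * 100 + EXTRACT(MONTH FROM sale_date);
-- Move one month into an archive table with the same definition:
SELECT MOVE_PARTITIONS_TO_TABLE('my_schema.sales_fact', '201601', '201601', 'my_schema.sales_fact_archive');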
Load balance; don't let all queries hit only one node!
- Vertica comes with a native connection load balancer (round robin), and you also need to set up your clients (ODBC/JDBC) for it.
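A sketch of the server-side setup; clients have to opt in as well (the client property names below are assumptions to verify against your driver version):
-- Enable native round-robin connection load balancing on the cluster:
SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN');
-- Client side: set ConnectionLoadBalance=1 in the ODBC DSN / connection string,
-- or ConnectionLoadBalance=true as a JDBC connection property.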
Label your queries / force your developers to label them. This way you can track them, find patterns in your DB usage, and also spot bad developers who use * in all their queries.
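A sketch of both halves: developers tag statements with the LABEL hint, and you report on usage per label afterwards (label and table names are examples):
-- Developer side: label the statement.
SELECT /*+label(daily_sales_report)*/ COUNT(*) FROM my_schema.sales_fact;
-- DBA side: track executions and average duration per label.
SELECT request_label, COUNT(*) AS executions, AVG(request_duration_ms) AS avg_ms
FROM v_monitor.query_requests
WHERE request_label <> ''
GROUP BY request_label
ORDER BY executions DESC;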
Enable the Data Collector and manage the data that comes with it; it is of great help.
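A sketch of checking and adjusting retention for one Data Collector component (the component name and sizes are examples; ProjectionsUsed is the component behind dc_projections_used, which the unused-projections query below relies on):
SELECT GET_DATA_COLLECTOR_POLICY('ProjectionsUsed');
-- Keep more history: memory retention in KB, then disk retention in KB.
SELECT SET_DATA_COLLECTOR_POLICY('ProjectionsUsed', '5000', '50000');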
Unused projections: get rid of them! It will save you space, the Tuple Mover will have much less work (fewer projections to refresh/populate), and your backups will be smaller.
Use this query:
SELECT p.table_schema,
       p.basename AS "Table Name",
       MAX(pu.last_used) AS last_used,
       EXTRACT(DAY FROM (GETDATE() - MAX(pu.last_used))) AS days_last_used,
       ROUND(SUM(sc.used_bytes / (1024^2)), 3.0) AS used_mb,
       ROUND(SUM(sc.used_bytes) / (1024^3), 3.0) AS used_gb
FROM (SELECT projection_oid, MAX(time) AS last_used
      FROM v_internal.dc_projections_used
      WHERE table_oid IN (SELECT table_id FROM v_catalog.tables WHERE NOT is_system_table)
      GROUP BY projection_oid) pu
JOIN (SELECT oid, name, basename, "schemaname" AS table_schema
      FROM v_internal.vs_projections) p
  ON p.oid = pu.projection_oid
JOIN (SELECT projection_id, used_bytes
      FROM v_monitor.storage_containers) sc
  ON sc.projection_id = pu.projection_oid
GROUP BY p.table_schema, p.basename
ORDER BY last_used ASC;
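Once a projection is confirmed dead, a sketch of the cleanup (the projection name is an example; Vertica will refuse the drop if the projection is the table's only superprojection or if removing it would break K-safety):
DROP PROJECTION my_schema.old_report_projection_b0;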
Note:
- There is a catch here: running ANALYZE_STATISTICS on a table marks it as used even if it was never queried, which invalidates the search for unused projections.
- There are many more ideas of where to start, but I guess this covers the basics.
Hope this was helpful!
EXCELLENT, thanks for your reply :-) :-)
About the catch you mentioned: if we identify the unused projections before analyzing statistics, that will help us find the actual unused projections. What do you think?
Hi,
Below are some highlights on how to find bottlenecks in queries.
The idea is to correlate the output you get from the explain plan with the output of query profiling.
As you can see in the attached explain plan output, we have 4 different steps, and each step does some execution.
Now, in order to know the execution time of each step, you need to profile the same query and correlate its output with the step numbers you have in the explain plan.
The data generated by the PROFILE command includes useful execution statistics, one of them being execution time ('clock time (us)').
Using this approach you can know for sure which step in the plan takes most of the time to execute.
See a simple example below.
Eg: Explain plan
explain select * from tables;
Access Path:
+-STORAGE ACCESS for t [Cost: 18K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Projection: v_internal.vs_tables_view_p
| Materialize: t.table_schema_id, t.table_schema, t.table_id, t.table_name, t.owner_id, t.owner_name, t.is_temp_table, t.has_aggregate_projection, t.is_system_table, t.force_outer, t.flextable_format, t.system_table_creator, t.partition_expression, t.create_time, t.table_definition, t.recover_priority
| Execute on: Query Initiator
------------------------------
-----------------------------------------------
PLAN: BASE QUERY PLAN (GraphViz Format)
-----------------------------------------------
digraph G {
graph [rankdir=BT, label = "BASE QUERY PLAN\nQuery: explain select * from tables;\n\nAll Nodes Vector: \n\n node[0]=v_opsadb_node0002 (initiator) Up\n node[1]=v_opsadb_node0001 (executor) Up\n node[2]=v_opsadb_node0003 (executor) Up\n", labelloc=t, labeljust=l ordering=out]
0[label = "Root \nOutBlk=[UncTuple(16)]", color = "green", shape = "house"];
1[label = "NewEENode \nOutBlk=[UncTuple(16)]", color = "green", shape = "box"];
2[label = "StorageUnionStep: vs_tables_view_p\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "purple", shape = "box"];
3[label = "ExprEval: \n t.table_schema_id\n t.table_schema\n t.table_id\n t.table_name\n t.owner_id\n t.owner_name\n t.is_temp_table\n t.is_system_table\n t.force_outer\n (t.flextable_format \<\> \'\')\n t.has_aggregate_projection\n t.system_table_creator\n t.partition_expression\n t.create_time\n t.table_definition\n t.recover_priority\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "brown", shape = "box"];
4[label = "ScanStep: vs_tables_view_p\ntable_schema_id\ntable_schema\ntable_id\ntable_name\nowner_id\nowner_name\nis_temp_table\nhas_aggregate_projection\nis_system_table\nforce_outer\nflextable_format\nsystem_table_creator\npartition_expression\ncreate_time\ntable_definition\nrecover_priority\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "brown", shape = "box"];
1->0 [label = "V[0] C=16",color = "black",style="bold", arrowtail="inv"];
2->1 [label = "0",color = "blue"];
3->2 [label = "0",color = "blue"];
4->3 [label = "0",color = "blue"];
}
(29 rows)
Eg: query profiling
dbadmin=> \o /tmp/a.a
dbadmin=> profile select * from tables;
NOTICE 4788: Statement is being profiled
HINT: Select * from v_monitor.execution_engine_profiles where transaction_id=49539596043655460 and statement_id=6;
NOTICE 3557: Initiator memory for query: [on pool sysquery: 5396 KB, minimum: 5396 KB]
NOTICE 5077: Total memory required by query: [5396 KB]
The column you need to use for correlation is operator_id:
SELECT node_name, operator_name, operator_id, counter_value FROM v_monitor.execution_engine_profiles WHERE transaction_id=49539596043655460 AND statement_id=6 AND counter_name='clock time (us)';
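If your Vertica version also exposes a path_id column in execution_engine_profiles (an assumption; check your release), you can aggregate the clock time per plan path so it maps directly onto the PATH ID labels in the explain output:
SELECT node_name, path_id, SUM(counter_value) AS clock_time_us
FROM v_monitor.execution_engine_profiles
WHERE transaction_id=49539596043655460
  AND statement_id=6
  AND counter_name='clock time (us)'
GROUP BY node_name, path_id
ORDER BY clock_time_us DESC;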
I hope you will find it useful
Thanks