
Tuning plan

Hi Experts,

What tuning plan do you follow in Vertica for fine-tuning queries or the database?

Comments

    • Do you have the queries?

    - If yes, use the admintools Database Designer (DBD) with an incremental design fed with your SQL queries (a SQL-driven sketch of the same flow follows).
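
    If you would rather drive the DBD from SQL than from admintools, here is a minimal sketch using the DESIGNER_* meta-functions; the design name, table, query, and file paths are placeholders:

    -- sketch of an incremental DBD run driven from SQL; names/paths are placeholders
    SELECT DESIGNER_CREATE_DESIGN('tune1');
    SELECT DESIGNER_SET_DESIGN_TYPE('tune1', 'INCREMENTAL');
    SELECT DESIGNER_ADD_DESIGN_TABLES('tune1', 'public.sales_fact');
    -- feed it the actual queries you want tuned
    SELECT DESIGNER_ADD_DESIGN_QUERY('tune1', 'SELECT customer_id, SUM(amount) FROM public.sales_fact GROUP BY customer_id;');
    -- write the design and deployment scripts so you can review them before deploying
    SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY('tune1', '/tmp/tune1_design.sql', '/tmp/tune1_deploy.sql');
    SELECT DESIGNER_DROP_DESIGN('tune1');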

     

    • Install Vertica Management Console and keep an eye on the recommendations/bottlenecks it reports.

     

    • Make sure you have statistics on all your columns and that they are not older than 7 days:

     

    WITH a AS (
        SELECT
            table_schema,
            table_name,
            projection_column_name,
            encoding_type,
            statistics_type,
            statistics_updated_timestamp
        FROM projection_columns
        -- columns with no statistics at all, or with statistics older than 7 days
        WHERE statistics_type = 'NONE'
           OR statistics_updated_timestamp < getdate() - 7
    )
    -- generate one ANALYZE_STATISTICS call per column, skipping temp tables
    SELECT DISTINCT
        'SELECT analyze_statistics(''' || table_schema || '.' || table_name || '.' || projection_column_name || ''');'
    FROM a
    WHERE table_name NOT IN (SELECT table_name FROM tables WHERE is_temp_table);

     

    • Check for encoding: unless you run DBD on your existing tables, the base projections will be built with encoding AUTO.
    -- find columns created with encoding AUTO that ended up stored uncompressed
    SELECT DISTINCT
        anchor_table_schema,
        anchor_table_name,
        column_name,
        compressions,
        used_bytes,
        row_count
    FROM column_storage
    WHERE column_name IN (
            SELECT projection_column_name
            FROM projection_columns
            WHERE encoding_type = 'AUTO')
      AND compressions = 'none'
    ORDER BY used_bytes DESC;

    - Once you identify the tables with missing encoding, run:

    • SELECT DESIGNER_DESIGN_PROJECTION_ENCODINGS('schema.projection', '/tmp/encodings.sql', false);  -- third argument is the deploy flag: false only writes the script, true deploys it

    - Be careful running this on big tables! It might take time.

     

    • Run the Workload Analyzer (WLA). I run it over the last 7 days:

     

     SELECT ANALYZE_WORKLOAD('', getdate()-7);

    - Review the output and work through it by tuning_cost, from HIGH to LOW (see the sketch below).
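
    If you instead call it with the save flag, the recommendations also land in a system table you can query later; a minimal sketch, assuming v_monitor.tuning_recommendations is available on your version:

    -- save recommendations, then review them ordered by tuning cost
    SELECT ANALYZE_WORKLOAD('', true);

    SELECT tuning_description, tuning_command, tuning_cost
    FROM v_monitor.tuning_recommendations
    ORDER BY CASE tuning_cost WHEN 'HIGH' THEN 1 WHEN 'MEDIUM' THEN 2 ELSE 3 END;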

     

     

    •   Make sure your dimensions are dimensions and your facts are facts! Many times I see dimensions with 200 million rows that are unsegmented. A sketch of the usual layout follows.
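
    For illustration, small dimensions are usually replicated on every node while large facts are segmented across the cluster; a minimal sketch with hypothetical tables:

    -- hypothetical small dimension: replicate it on all nodes
    CREATE TABLE dim_region (region_id INT PRIMARY KEY, region_name VARCHAR(64))
        UNSEGMENTED ALL NODES;

    -- hypothetical large fact: segment it evenly across the cluster
    CREATE TABLE fact_sales (sale_id INT, region_id INT, amount NUMERIC(12,2))
        SEGMENTED BY HASH(sale_id) ALL NODES;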

     

    •   Look for partitioning opportunities, both to make use of partition pruning and for ease of loading/deleting/moving data from one table to another (sketch below).
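
    A minimal sketch of what that can look like; the table names and partition range are hypothetical:

    -- fact table partitioned by month: predicates on sale_date let Vertica
    -- prune whole partitions at query time
    CREATE TABLE fact_orders (
        order_id  INT,
        sale_date DATE NOT NULL,
        amount    NUMERIC(12,2)
    )
    PARTITION BY EXTRACT(YEAR FROM sale_date) * 100 + EXTRACT(MONTH FROM sale_date);

    -- retiring or archiving data then becomes a cheap partition operation
    SELECT MOVE_PARTITIONS_TO_TABLE('public.fact_orders', '201701', '201712', 'public.fact_orders_archive');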

     

    • Load balance; don't let all queries hit only one node!

    - Vertica comes with a native load balancer (round robin), and you also need to set up your clients (ODBC/JDBC) for it, as sketched below.
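
    A minimal sketch of both sides, the cluster policy and the client opt-in:

    -- enable native round-robin connection load balancing on the cluster
    SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN');

    -- clients must opt in as well, e.g.:
    --   ODBC DSN:  ConnectionLoadBalance=1
    --   JDBC URL:  jdbc:vertica://host:5433/mydb?ConnectionLoadBalance=1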

     

    • Label your queries / force your developers to label them. This way you can track them, find patterns in your db usage, and also find bad developers :) that use * in all their execs :) (see the sketch below).
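
    The label is just a hint in the statement, and it then shows up in the monitoring tables; a minimal sketch where the label and table names are hypothetical:

    -- tag the statement with a label
    SELECT /*+LABEL(daily_sales_report)*/ region_id, SUM(amount)
    FROM fact_sales
    GROUP BY region_id;

    -- later, find every execution of that workload by its label
    SELECT request_label, COUNT(*) AS executions
    FROM v_monitor.query_requests
    WHERE request_label = 'daily_sales_report'
    GROUP BY request_label;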

     

    • Enable the Data Collector and manage the data that comes with it; it is of great help (a sketch follows).
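
    Retention per component is tunable; a minimal sketch, where the component name and sizes are just examples:

    -- see what the Data Collector tracks and how much it retains
    SELECT component, memory_buffer_size_kb, disk_size_kb
    FROM v_monitor.data_collector
    ORDER BY disk_size_kb DESC
    LIMIT 10;

    -- grow the retention for one component (sizes in KB)
    SELECT SET_DATA_COLLECTOR_POLICY('RequestsIssued', '2000', '50000');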

     

    • Unused projections: get rid of them! You will save space, the Tuple Mover will do much less work (fewer projections to refresh/populate), and your backups will be smaller.

    Use this query:

     

    SELECT
        p.table_schema,
        p.basename AS "Table Name",
        MAX(pu.last_used) AS last_used,
        EXTRACT(DAY FROM (getdate() - MAX(pu.last_used))) AS days_last_used,
        ROUND(SUM(sc.used_bytes) / (1024^2), 3) AS used_mb,
        ROUND(SUM(sc.used_bytes) / (1024^3), 3) AS used_gb
    FROM (
        -- last time each non-system projection was read, per the Data Collector
        SELECT projection_oid, MAX(time) AS last_used
        FROM v_internal.dc_projections_used
        WHERE table_oid IN (
            SELECT table_id
            FROM v_catalog.tables
            WHERE NOT is_system_table)
        GROUP BY projection_oid) pu
    JOIN (
        SELECT oid, name, basename, "schemaname" AS table_schema
        FROM v_internal.vs_projections) p
        ON p.oid = pu.projection_oid
    JOIN (
        SELECT projection_id, used_bytes
        FROM v_monitor.storage_containers) sc
        ON sc.projection_id = pu.projection_oid
    GROUP BY
        p.table_schema,
        p.basename
    ORDER BY last_used ASC;
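
    Once you have confirmed a projection really is dead weight, dropping it is a one-liner; a sketch with a hypothetical name (Vertica will refuse the drop if the projection is needed for K-safety or is the table's only superprojection):

    DROP PROJECTION my_schema.old_unused_projection_b1;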

    Note:

     

    - There is a catch here: running analyze on a table marks it as used even if it was never queried, which invalidates the actual search for unused projections :(

     

    - There are many more ideas on where to start, but I guess these are the basics :)

     

    Hope this was helpful!


    Excellent, thanks for your reply! :-) :-)

    About the catch you mentioned: if the unused projections are identified prior to analyzing statistics, that should let us find the actual unused projections. What do you think?


    Hi,

    Below are some highlights on how to find bottlenecks in queries:

     

    The idea is to correlate the output you get from the explain plan with the output of query profiling.

     

    As you can see in the attached explain plan output, we have 4 different steps; each step does part of the execution.

     

    Now, in order to know the execution time of each step, you need to run PROFILE on the same query and correlate its output with the step numbers you have in the explain plan.

     

    The data generated by the PROFILE command includes useful execution statistics; one of them is execution time ('clock time (us)').

     

    Using this approach you can know for sure which step in the plan takes most of the time to execute.

     

     

    See a simple example:

     

    E.g., explain plan:

     

    explain select * from tables;

     

     Access Path:

     +-STORAGE ACCESS for t [Cost: 18K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)

     |  Projection: v_internal.vs_tables_view_p

     |  Materialize: t.table_schema_id, t.table_schema, t.table_id, t.table_name, t.owner_id, t.owner_name, t.is_temp_table, t.has_aggregate_projection, t.is_system_table, t.force_outer, t.flextable_format, t.system_table_creator, t.partition_expression, t.create_time, t.table_definition, t.recover_priority

     |  Execute on: Query Initiator

     

     

     ------------------------------

     -----------------------------------------------

     PLAN: BASE QUERY PLAN (GraphViz Format)

     -----------------------------------------------

     digraph G {

     graph [rankdir=BT, label = "BASE QUERY PLAN\nQuery: explain select * from tables;\n\nAll Nodes Vector: \n\n  node[0]=v_opsadb_node0002 (initiator) Up\n  node[1]=v_opsadb_node0001 (executor) Up\n  node[2]=v_opsadb_node0003 (executor) Up\n", labelloc=t, labeljust=l ordering=out]

     0[label = "Root \nOutBlk=[UncTuple(16)]", color = "green", shape = "house"];

     1[label = "NewEENode \nOutBlk=[UncTuple(16)]", color = "green", shape = "box"];

     2[label = "StorageUnionStep: vs_tables_view_p\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "purple", shape = "box"];

     3[label = "ExprEval: \n  t.table_schema_id\n  t.table_schema\n  t.table_id\n  t.table_name\n  t.owner_id\n  t.owner_name\n  t.is_temp_table\n  t.is_system_table\n  t.force_outer\n  (t.flextable_format \<\> \'\')\n  t.has_aggregate_projection\n  t.system_table_creator\n  t.partition_expression\n  t.create_time\n  t.table_definition\n  t.recover_priority\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "brown", shape = "box"];

     4[label = "ScanStep: vs_tables_view_p\ntable_schema_id\ntable_schema\ntable_id\ntable_name\nowner_id\nowner_name\nis_temp_table\nhas_aggregate_projection\nis_system_table\nforce_outer\nflextable_format\nsystem_table_creator\npartition_expression\ncreate_time\ntable_definition\nrecover_priority\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "brown", shape = "box"];

     1->0 [label = "V[0] C=16",color = "black",style="bold", arrowtail="inv"];

     2->1 [label = "0",color = "blue"];

     3->2 [label = "0",color = "blue"];

     4->3 [label = "0",color = "blue"];

     }

    (29 rows)

     

     

    E.g., query profiling:

     

    dbadmin=> \o /tmp/a.a

    dbadmin=> profile  select * from tables;

    NOTICE 4788:  Statement is being profiled

    HINT:  Select * from v_monitor.execution_engine_profiles where transaction_id=49539596043655460 and statement_id=6;

    NOTICE 3557:  Initiator memory for query: [on pool sysquery: 5396 KB, minimum: 5396 KB]

    NOTICE 5077:  Total memory required by query: [5396 KB]

     

    The column you need to use for correlation is operator_id:

    SELECT node_name, operator_name, operator_id, counter_value
    FROM v_monitor.execution_engine_profiles
    WHERE transaction_id = 49539596043655460
      AND statement_id = 6
      AND counter_name = 'clock time (us)';

     

     

    I hope you will find it useful.

    Thanks
