
Tuning plan

Hi Experts,

What tuning plan do you follow in Vertica for fine-tuning queries or the database?

Comments

    • Do you have the queries?

    - If yes, use the admintools Database Designer (DBD) with an incremental design, feeding it your SQL queries; a scripted alternative is sketched below.
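
    - If you would rather script it than click through the admintools menus, the Database Designer can also be driven from SQL. A minimal sketch, assuming a hypothetical design name 'incr_design' and a query file /tmp/queries.sql; check the optional arguments of the DESIGNER_* functions against your version's documentation:

    -- create an incremental design fed by captured workload queries
    SELECT DESIGNER_CREATE_DESIGN('incr_design');
    SELECT DESIGNER_SET_DESIGN_TYPE('incr_design', 'INCREMENTAL');
    SELECT DESIGNER_ADD_DESIGN_TABLES('incr_design', 'public.*');
    SELECT DESIGNER_ADD_DESIGN_QUERIES('incr_design', '/tmp/queries.sql');
    -- generates the design (and by default deploys it); the proposed
    -- projections and deployment script are also written to these files
    SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY('incr_design', '/tmp/incr_design.sql', '/tmp/incr_deploy.sql');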

     

    • Install Vertica Management Console and keep an eye on its Recommendations/Bottlenecks pages.

     

    • Make sure you have statistics on all your columns and that they are not older than 7 days:

     

    with a as (
        SELECT
            table_schema,
            table_name,
            projection_column_name,
            encoding_type,
            statistics_type,
            statistics_updated_timestamp
        FROM
            projection_columns
        WHERE
            -- missing statistics, or statistics older than 7 days
            statistics_type = 'NONE'
            OR statistics_updated_timestamp < getdate() - 7
    )
    select distinct
        'select analyze_statistics(''' || table_schema || '.' || table_name || '.' || projection_column_name || ''');'
    from a
    where table_name not in (select table_name from tables where is_temp_table = 'true');
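
    - One way to run the generated statements from vsql: spool them to a file, then execute the file (the path is just an example):

    \o /tmp/analyze_stats.sql
    -- run the query above, then stop spooling and execute the file
    \o
    \i /tmp/analyze_stats.sql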

     

    • Check for encoding; unless you run the DBD on the existing tables, the base projections will be built with AUTO encoding.
    • -- columns stored uncompressed whose encoding was left at AUTO
      SELECT DISTINCT
          anchor_table_schema,
          anchor_table_name,
          column_name,
          compressions,
          used_bytes,
          row_count
      FROM
          column_storage
      WHERE
          column_name IN
          (
              SELECT projection_column_name
              FROM projection_columns
              WHERE encoding_type = 'AUTO')
          AND compressions = 'none'
      ORDER BY
          used_bytes DESC;

    - Once you identify the tables with missing encoding, run:

    • SELECT DESIGNER_DESIGN_PROJECTION_ENCODINGS('schema.projection', '/tmp/encodings.sql', false);

    - The third argument is the deploy flag: it controls whether the recommended encodings are deployed or only written to the destination file. Be careful running this on big tables! It might take time.

     

    • Run the Workload Analyzer (WLA); I run it based on the last 7 days:

     

     SELECT ANALYZE_WORKLOAD('', getdate()-7);

    - Review the output and work through it based on tuning_cost, from HIGH to LOW.
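
    - The same recommendations also surface in a system view, so you can query them directly (a minimal sketch):

    -- tuning_cost is LOW/MEDIUM/HIGH; start with the HIGH ones
    SELECT tuning_description, tuning_command, tuning_cost
    FROM v_monitor.tuning_recommendations;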

     

     

    •   Make sure your dimensions are dimensions and your facts are facts! Many times I see dimensions with 200 million rows that are unsegmented.
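
    - For example, a genuinely small dimension can be replicated (unsegmented) on every node; a minimal sketch with a hypothetical dim_store table:

    CREATE PROJECTION public.dim_store_unseg AS
    SELECT * FROM public.dim_store
    ORDER BY store_key
    UNSEGMENTED ALL NODES;
    -- populate the new projection
    SELECT START_REFRESH();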

     

    •   Look for partitioning opportunities to make use of partition pruning, and also for ease of loading/deleting/moving data from one table to another.
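
    - A minimal sketch, assuming a hypothetical fact_sales table partitioned by month (DROP_PARTITION is the older single-partition function; newer versions offer DROP_PARTITIONS):

    -- partition by month so predicates on sale_date can prune whole partitions
    ALTER TABLE public.fact_sales
    PARTITION BY EXTRACT(YEAR FROM sale_date) * 100 + EXTRACT(MONTH FROM sale_date);
    -- retiring a month becomes a cheap metadata operation instead of a big DELETE
    SELECT DROP_PARTITION('public.fact_sales', '201701');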

     

    • Load balance; don't let all queries hit only one node!

    - Vertica comes with a native load balancer (round robin), and you also need to set up your clients (ODBC/JDBC) for it.
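
    - A minimal sketch of both sides (the JDBC URL is only an example):

    -- server side: native round-robin connection load balancing
    SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN');

    On the client side the connection has to opt in, for example with the JDBC property ConnectionLoadBalance=1 (jdbc:vertica://node1:5433/mydb?ConnectionLoadBalance=1) or the equivalent ODBC setting.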

     

    • Label your queries / force your developers to label them. This way you can track them, find patterns in your DB usage, and also find the bad developers :) that use * in all their execs :)
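
    - Labels are attached with a query hint and can be looked up later in the query history ('my_report' is a hypothetical label):

    SELECT /*+ LABEL(my_report) */ COUNT(*) FROM public.fact_sales;

    -- later: find every execution of the labeled query
    SELECT request_label, request, start_timestamp
    FROM v_monitor.query_requests
    WHERE request_label = 'my_report';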

     

    • Enable the Data Collector and manage the data that comes with it; it is of great help.
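
    - Retention per Data Collector component is tunable. A minimal sketch; I believe the ProjectionsUsed component feeds the dc_projections_used table used below, but verify the component name against v_monitor.data_collector:

    -- keep roughly 1 MB in memory and 100 MB on disk of projection-usage history
    SELECT SET_DATA_COLLECTOR_POLICY('ProjectionsUsed', '1000', '100000');
    SELECT GET_DATA_COLLECTOR_POLICY('ProjectionsUsed');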

     

    • Unused projections: get rid of them!!! You will save storage space, the Tuple Mover will have much less work (fewer projections to refresh/populate), and your backups will be smaller.

    Use this query:

     

    SELECT
        p.table_schema,
        p.basename AS "Table Name",
        MAX(pu.last_used) AS last_used,
        extract(DAY FROM (getdate() - MAX(pu.last_used))) AS days_last_used,
        ROUND(SUM(sc.used_bytes / (1024^2)), 3.0) AS used_mb,
        ROUND(SUM(sc.used_bytes) / (1024^3), 3.0) AS used_gb
    FROM
        (
        -- last time each projection was scanned, per the Data Collector
        SELECT
            projection_oid,
            MAX(TIME) AS last_used
        FROM
            v_internal.dc_projections_used
        WHERE
            table_oid IN
            (
            SELECT table_id
            FROM v_catalog.tables
            WHERE NOT is_system_table)
        GROUP BY
            projection_oid) pu
    JOIN
        (
        SELECT
            oid,
            name,
            basename,
            "schemaname" AS table_schema
        FROM
            v_internal.vs_projections) p
    ON
        p.oid = pu.projection_oid
    JOIN
        (
        SELECT
            projection_id,
            used_bytes
        FROM
            v_monitor.storage_containers) sc
    ON
        sc.projection_id = pu.projection_oid
    GROUP BY
        p.table_schema,
        p.basename
    ORDER BY
        last_used ASC;
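
    - Once you are sure a projection is really unused, dropping it is a one-liner (the schema/projection names here are placeholders); note that Vertica will refuse to drop a projection if the table would be left without a superprojection:

    DROP PROJECTION my_schema.my_unused_projection;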

    Note:

    - There is a catch here: running analyze statistics on a table marks it as used even if it was never queried, which invalidates the actual search for unused projections :(

     

    - There are many more ideas of where to start, but I guess this is the basics :)

     

    Hope this was helpful!

  • EXCELLENT, thanks for your reply :-) :-)

    About the catch you mentioned: identifying the unused projections before analyzing statistics should let us find the actual unused ones. What do you think?

  • Hi ,

    Below are some highlights on how to find bottlenecks in queries:

     

    The idea is to correlate the output you get from the explain plan with the output of query profiling.

     

    As you can see in the attached explain plan output, we have 4 different steps; each step does part of the execution.

     

    Now, in order to know the execution time of each step, you need to run PROFILE on the same query and correlate its output with the step numbers you have in the explain plan.

     

    The data generated by the PROFILE command includes useful execution statistics; one of them is execution time ('clock time (us)').

     

    Using this approach, you can know for sure which step in the plan takes most of the time to execute.

     

     

    See a simple example:

     

    Example: explain plan

     

    explain select * from tables;

     

     Access Path:

     +-STORAGE ACCESS for t [Cost: 18K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)

     |  Projection: v_internal.vs_tables_view_p

     |  Materialize: t.table_schema_id, t.table_schema, t.table_id, t.table_name, t.owner_id, t.owner_name, t.is_temp_table, t.has_aggregate_projection, t.is_system_table, t.force_outer, t.flextable_format, t.system_table_creator, t.partition_expression, t.create_time, t.table_definition, t.recover_priority

     |  Execute on: Query Initiator

     

     

     ------------------------------

     -----------------------------------------------

     PLAN: BASE QUERY PLAN (GraphViz Format)

     -----------------------------------------------

     digraph G {

     graph [rankdir=BT, label = "BASE QUERY PLAN\nQuery: explain select * from tables;\n\nAll Nodes Vector: \n\n  node[0]=v_opsadb_node0002 (initiator) Up\n  node[1]=v_opsadb_node0001 (executor) Up\n  node[2]=v_opsadb_node0003 (executor) Up\n", labelloc=t, labeljust=l ordering=out]

     0[label = "Root \nOutBlk=[UncTuple(16)]", color = "green", shape = "house"];

     1[label = "NewEENode \nOutBlk=[UncTuple(16)]", color = "green", shape = "box"];

     2[label = "StorageUnionStep: vs_tables_view_p\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "purple", shape = "box"];

     3[label = "ExprEval: \n  t.table_schema_id\n  t.table_schema\n  t.table_id\n  t.table_name\n  t.owner_id\n  t.owner_name\n  t.is_temp_table\n  t.is_system_table\n  t.force_outer\n  (t.flextable_format \<\> \'\')\n  t.has_aggregate_projection\n  t.system_table_creator\n  t.partition_expression\n  t.create_time\n  t.table_definition\n  t.recover_priority\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "brown", shape = "box"];

     4[label = "ScanStep: vs_tables_view_p\ntable_schema_id\ntable_schema\ntable_id\ntable_name\nowner_id\nowner_name\nis_temp_table\nhas_aggregate_projection\nis_system_table\nforce_outer\nflextable_format\nsystem_table_creator\npartition_expression\ncreate_time\ntable_definition\nrecover_priority\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "brown", shape = "box"];

     1->0 [label = "V[0] C=16",color = "black",style="bold", arrowtail="inv"];

     2->1 [label = "0",color = "blue"];

     3->2 [label = "0",color = "blue"];

     4->3 [label = "0",color = "blue"];

     }

    (29 rows)

     

     

    Example: query profiling

     

    dbadmin=> \o /tmp/a.a

    dbadmin=> profile  select * from tables;

    NOTICE 4788:  Statement is being profiled

    HINT:  Select * from v_monitor.execution_engine_profiles where transaction_id=49539596043655460 and statement_id=6;

    NOTICE 3557:  Initiator memory for query: [on pool sysquery: 5396 KB, minimum: 5396 KB]

    NOTICE 5077:  Total memory required by query: [5396 KB]

     

    The column you need to use for the correlation is operator_id:

    SELECT node_name, operator_name, operator_id, counter_value FROM v_monitor.execution_engine_profiles WHERE transaction_id=49539596043655460 AND statement_id=6 AND counter_name='clock time (us)';
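
    execution_engine_profiles also has a path_id column, so you can sum the clock time per explain-plan PATH ID directly (same hypothetical transaction/statement IDs as above):

    SELECT path_id, operator_name, SUM(counter_value) AS clock_time_us
    FROM v_monitor.execution_engine_profiles
    WHERE transaction_id=49539596043655460
    AND statement_id=6
    AND counter_name='clock time (us)'
    GROUP BY path_id, operator_name
    ORDER BY clock_time_us DESC;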

     

     

    I hope you will find it useful.

    Thanks
