
Tuning plan

Hi Experts,

What tuning plan do you follow in Vertica for fine-tuning queries or the database?

Comments

    • Do you have the queries?

    - If yes, use the admintools Database Designer (DBD) with an incremental design fed with your SQL queries (a SQL-driven sketch of the same flow follows).
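
    If you would rather drive the DBD from SQL than from admintools, here is a minimal sketch using the DESIGNER_* meta-functions; the design name, table, query, and file paths are placeholders:

    -- sketch of an incremental DBD run driven from SQL; names/paths are placeholders
    SELECT DESIGNER_CREATE_DESIGN('tune1');
    SELECT DESIGNER_SET_DESIGN_TYPE('tune1', 'INCREMENTAL');
    SELECT DESIGNER_ADD_DESIGN_TABLES('tune1', 'public.sales_fact');
    -- feed it the actual queries you want tuned
    SELECT DESIGNER_ADD_DESIGN_QUERY('tune1', 'SELECT customer_id, SUM(amount) FROM public.sales_fact GROUP BY customer_id;');
    -- write the design and deployment scripts so you can review them before deploying
    SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY('tune1', '/tmp/tune1_design.sql', '/tmp/tune1_deploy.sql');
    SELECT DESIGNER_DROP_DESIGN('tune1');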

     

    • Install Vertica Management Console and keep an eye on the recommendations/bottlenecks it reports.

     

    • Make sure you have statistics on all your columns and that they are not older than 7 days:

     

    WITH a AS (
        SELECT
            table_schema,
            table_name,
            projection_column_name,
            encoding_type,
            statistics_type,
            statistics_updated_timestamp
        FROM projection_columns
        -- columns with no statistics at all, or with statistics older than 7 days
        WHERE statistics_type = 'NONE'
           OR statistics_updated_timestamp < getdate() - 7
    )
    -- generate one ANALYZE_STATISTICS call per column, skipping temp tables
    SELECT DISTINCT
        'SELECT analyze_statistics(''' || table_schema || '.' || table_name || '.' || projection_column_name || ''');'
    FROM a
    WHERE table_name NOT IN (SELECT table_name FROM tables WHERE is_temp_table);

     

    • Check for encoding: unless you run DBD on your existing tables, the base projections will be built with encoding AUTO.
    -- find columns created with encoding AUTO that ended up stored uncompressed
    SELECT DISTINCT
        anchor_table_schema,
        anchor_table_name,
        column_name,
        compressions,
        used_bytes,
        row_count
    FROM column_storage
    WHERE column_name IN (
            SELECT projection_column_name
            FROM projection_columns
            WHERE encoding_type = 'AUTO')
      AND compressions = 'none'
    ORDER BY used_bytes DESC;

    - Once you identify the tables with missing encoding, run:

    • SELECT DESIGNER_DESIGN_PROJECTION_ENCODINGS('schema.projection', '/tmp/encodings.sql', false);  -- third argument is the deploy flag: false only writes the script, true deploys it

    - Be careful running this on big tables! It might take time.

     

    • Run the Workload Analyzer (WLA). I run it over the last 7 days:

     

     SELECT ANALYZE_WORKLOAD('', getdate()-7);

    - Review the output and work through it by tuning_cost, from HIGH to LOW (see the sketch below).
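
    If you instead call it with the save flag, the recommendations also land in a system table you can query later; a minimal sketch, assuming v_monitor.tuning_recommendations is available on your version:

    -- save recommendations, then review them ordered by tuning cost
    SELECT ANALYZE_WORKLOAD('', true);

    SELECT tuning_description, tuning_command, tuning_cost
    FROM v_monitor.tuning_recommendations
    ORDER BY CASE tuning_cost WHEN 'HIGH' THEN 1 WHEN 'MEDIUM' THEN 2 ELSE 3 END;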

     

     

    •   Make sure your dimensions are dimensions and your facts are facts! Many times I see dimensions with 200 million rows that are unsegmented. A sketch of the usual layout follows.
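
    For illustration, small dimensions are usually replicated on every node while large facts are segmented across the cluster; a minimal sketch with hypothetical tables:

    -- hypothetical small dimension: replicate it on all nodes
    CREATE TABLE dim_region (region_id INT PRIMARY KEY, region_name VARCHAR(64))
        UNSEGMENTED ALL NODES;

    -- hypothetical large fact: segment it evenly across the cluster
    CREATE TABLE fact_sales (sale_id INT, region_id INT, amount NUMERIC(12,2))
        SEGMENTED BY HASH(sale_id) ALL NODES;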

     

    •   Look for partitioning opportunities, both to make use of partition pruning and for ease of loading/deleting/moving data from one table to another (sketch below).
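
    A minimal sketch of what that can look like; the table names and partition range are hypothetical:

    -- fact table partitioned by month: predicates on sale_date let Vertica
    -- prune whole partitions at query time
    CREATE TABLE fact_orders (
        order_id  INT,
        sale_date DATE NOT NULL,
        amount    NUMERIC(12,2)
    )
    PARTITION BY EXTRACT(YEAR FROM sale_date) * 100 + EXTRACT(MONTH FROM sale_date);

    -- retiring or archiving data then becomes a cheap partition operation
    SELECT MOVE_PARTITIONS_TO_TABLE('public.fact_orders', '201701', '201712', 'public.fact_orders_archive');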

     

    • Load balance; don't let all queries hit only one node!

    - Vertica comes with a native load balancer (round robin), and you also need to set up your clients (ODBC/JDBC) for it, as sketched below.
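
    A minimal sketch of both sides, the cluster policy and the client opt-in:

    -- enable native round-robin connection load balancing on the cluster
    SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN');

    -- clients must opt in as well, e.g.:
    --   ODBC DSN:  ConnectionLoadBalance=1
    --   JDBC URL:  jdbc:vertica://host:5433/mydb?ConnectionLoadBalance=1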

     

    • Label your queries / force your developers to label them. This way you can track them, find patterns in your db usage, and also find bad developers :) that use * in all their execs :) (see the sketch below).
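
    The label is just a hint in the statement, and it then shows up in the monitoring tables; a minimal sketch where the label and table names are hypothetical:

    -- tag the statement with a label
    SELECT /*+LABEL(daily_sales_report)*/ region_id, SUM(amount)
    FROM fact_sales
    GROUP BY region_id;

    -- later, find every execution of that workload by its label
    SELECT request_label, COUNT(*) AS executions
    FROM v_monitor.query_requests
    WHERE request_label = 'daily_sales_report'
    GROUP BY request_label;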

     

    • Enable the Data Collector and manage the data that comes with it; it is of great help (a sketch follows).
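
    Retention per component is tunable; a minimal sketch, where the component name and sizes are just examples:

    -- see what the Data Collector tracks and how much it retains
    SELECT component, memory_buffer_size_kb, disk_size_kb
    FROM v_monitor.data_collector
    ORDER BY disk_size_kb DESC
    LIMIT 10;

    -- grow the retention for one component (sizes in KB)
    SELECT SET_DATA_COLLECTOR_POLICY('RequestsIssued', '2000', '50000');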

     

    • Unused projections: get rid of them! You will save space, the Tuple Mover will do much less work (fewer projections to refresh/populate), and your backups will be smaller.

    Use this query:

     

    SELECT
        p.table_schema,
        p.basename AS "Table Name",
        MAX(pu.last_used) AS last_used,
        EXTRACT(DAY FROM (getdate() - MAX(pu.last_used))) AS days_last_used,
        ROUND(SUM(sc.used_bytes) / (1024^2), 3) AS used_mb,
        ROUND(SUM(sc.used_bytes) / (1024^3), 3) AS used_gb
    FROM (
        -- last time each non-system projection was read, per the Data Collector
        SELECT projection_oid, MAX(time) AS last_used
        FROM v_internal.dc_projections_used
        WHERE table_oid IN (
            SELECT table_id
            FROM v_catalog.tables
            WHERE NOT is_system_table)
        GROUP BY projection_oid) pu
    JOIN (
        SELECT oid, name, basename, "schemaname" AS table_schema
        FROM v_internal.vs_projections) p
        ON p.oid = pu.projection_oid
    JOIN (
        SELECT projection_id, used_bytes
        FROM v_monitor.storage_containers) sc
        ON sc.projection_id = pu.projection_oid
    GROUP BY
        p.table_schema,
        p.basename
    ORDER BY last_used ASC;
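
    Once you have confirmed a projection really is dead weight, dropping it is a one-liner; a sketch with a hypothetical name (Vertica will refuse the drop if the projection is needed for K-safety or is the table's only superprojection):

    DROP PROJECTION my_schema.old_unused_projection_b1;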

    Note:

     

    - There is a catch here: running analyze on a table marks it as used even if it was never queried, which invalidates the actual search for unused projections :(

     

    - There are many more ideas on where to start, but I guess these are the basics :)

     

    Hope this was helpful!


    Excellent, thanks for your reply! :-) :-)

    About the catch you mentioned: if the unused projections are identified prior to analyzing statistics, that should let us find the actual unused projections. What do you think?


    Hi,

    Below are some highlights on how to find bottlenecks in queries:

     

    The idea is to correlate the output you get from the explain plan with the output of query profiling.

     

    As you can see in the attached explain plan output, we have 4 different steps; each step does part of the execution.

     

    Now, in order to know the execution time of each step, you need to run PROFILE on the same query and correlate its output with the step numbers you have in the explain plan.

     

    The data generated by the PROFILE command includes useful execution statistics; one of them is execution time ('clock time (us)').

     

    Using this approach you can know for sure which step in the plan takes most of the time to execute.

     

     

    See a simple example:

     

    E.g., explain plan:

     

    explain select * from tables;

     

     Access Path:

     +-STORAGE ACCESS for t [Cost: 18K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)

     |  Projection: v_internal.vs_tables_view_p

     |  Materialize: t.table_schema_id, t.table_schema, t.table_id, t.table_name, t.owner_id, t.owner_name, t.is_temp_table, t.has_aggregate_projection, t.is_system_table, t.force_outer, t.flextable_format, t.system_table_creator, t.partition_expression, t.create_time, t.table_definition, t.recover_priority

     |  Execute on: Query Initiator

     

     

     ------------------------------

     -----------------------------------------------

     PLAN: BASE QUERY PLAN (GraphViz Format)

     -----------------------------------------------

     digraph G {

     graph [rankdir=BT, label = "BASE QUERY PLAN\nQuery: explain select * from tables;\n\nAll Nodes Vector: \n\n  node[0]=v_opsadb_node0002 (initiator) Up\n  node[1]=v_opsadb_node0001 (executor) Up\n  node[2]=v_opsadb_node0003 (executor) Up\n", labelloc=t, labeljust=l ordering=out]

     0[label = "Root \nOutBlk=[UncTuple(16)]", color = "green", shape = "house"];

     1[label = "NewEENode \nOutBlk=[UncTuple(16)]", color = "green", shape = "box"];

     2[label = "StorageUnionStep: vs_tables_view_p\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "purple", shape = "box"];

     3[label = "ExprEval: \n  t.table_schema_id\n  t.table_schema\n  t.table_id\n  t.table_name\n  t.owner_id\n  t.owner_name\n  t.is_temp_table\n  t.is_system_table\n  t.force_outer\n  (t.flextable_format \<\> \'\')\n  t.has_aggregate_projection\n  t.system_table_creator\n  t.partition_expression\n  t.create_time\n  t.table_definition\n  t.recover_priority\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "brown", shape = "box"];

     4[label = "ScanStep: vs_tables_view_p\ntable_schema_id\ntable_schema\ntable_id\ntable_name\nowner_id\nowner_name\nis_temp_table\nhas_aggregate_projection\nis_system_table\nforce_outer\nflextable_format\nsystem_table_creator\npartition_expression\ncreate_time\ntable_definition\nrecover_priority\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Boolean(1)\nUnc: Integer(8)\nUnc: Varchar(128)\nUnc: Varchar(128)\nUnc: Varchar(8192)\nUnc: TimestampTz(8)\nUnc: Varchar(8192)\nUnc: Integer(8)", color = "brown", shape = "box"];

     1->0 [label = "V[0] C=16",color = "black",style="bold", arrowtail="inv"];

     2->1 [label = "0",color = "blue"];

     3->2 [label = "0",color = "blue"];

     4->3 [label = "0",color = "blue"];

     }

    (29 rows)

     

     

    E.g., query profiling:

     

    dbadmin=> \o /tmp/a.a

    dbadmin=> profile  select * from tables;

    NOTICE 4788:  Statement is being profiled

    HINT:  Select * from v_monitor.execution_engine_profiles where transaction_id=49539596043655460 and statement_id=6;

    NOTICE 3557:  Initiator memory for query: [on pool sysquery: 5396 KB, minimum: 5396 KB]

    NOTICE 5077:  Total memory required by query: [5396 KB]

     

    The column you need to use for correlation is operator_id:

    SELECT node_name, operator_name, operator_id, counter_value
    FROM v_monitor.execution_engine_profiles
    WHERE transaction_id = 49539596043655460
      AND statement_id = 6
      AND counter_name = 'clock time (us)';

     

     

    I hope you will find it useful.

    Thanks
