Performance issue - collecting ideas

dimitri_p
edited June 23 in General Discussion

So I have two 3-node Vertica clusters (v25), where one cluster is about 1.5 times more powerful than the other: about 1.5 times more CPU cores, about 1.5 times more memory, about 1.5 times faster network, and about 1.5 times faster storage.

Both clusters have the same database image deployed: same resource pools, same tables, same projections, same data. Same OS version too.

I have a query that I run on both clusters, and the weaker cluster executes it 3 times faster.

I have checked EXPLAIN and it matches to a T (projections, costs, row counts, paths). All statistics are up to date. Resource pools are configured identically, although EXECUTIONPARALLELISM is set to AUTO and MAXMEMORYSIZE to 40%, so the absolute numbers differ.

I have checked the execution_engine_profiles counters, and the more powerful system looks uniformly slower in every major counter; nothing is particularly slower, everything is just slower across the board.

Resource consumption shows that the more powerful system spends many more CPU cycles on the query, but the disk spillage is about the same.
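
(For reference, the comparison can be done with something along these lines, using the standard query_consumption view; pick out the statement in question from the most recent rows:)

select transaction_id, statement_id, cpu_cycles_us, peak_memory_kb,
       data_read_bytes, network_bytes_sent, network_bytes_received
from v_monitor.query_consumption
order by end_time desc
limit 20;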

Query events are the same (GROUP BY does not fit in memory).

There is no data skew (the projections used are segmented).
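
(For reference, skew can be verified with something like this; the projection name pattern is just a placeholder:)

select node_name, projection_name, sum(total_row_count) as total_rows
from v_monitor.storage_containers
where projection_name ilike 'my_fact_table_b%'  -- placeholder pattern
group by 1, 2
order by 2, 1;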

What else do I check? Give me anything that comes to mind!

Answers

  • mosheg Vertica Employee Administrator

    The short answer is to open a support case, run scrutinize on both clusters, and compare the score each cluster gets in the VAdvisor report.
    VAdvisor scores include deeper diagnostics that are not always visible through manual inspection.
    Also, keep in mind that there are a lot of things that can differ across systems, even if on paper they look identical. Hidden OS-level configurations, firmware differences, disk scheduling policies, CPU throttling, and even BIOS power-saving modes can significantly impact performance. So don't rule out lower-level or seemingly irrelevant discrepancies.

    Before changing any parameter, always validate the impact in a QA environment first.
    Some changes may degrade performance or affect stability depending on workload patterns.

    The longer answer breaks the issue down into multiple layers:
    1. Reconfirm actual CPU models, clock speeds, and NUMA configuration, since hardware with more cores isn't always faster for every workload.
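    For example, something along these lines on each cluster (standard v_monitor.host_resources; NUMA layout and CPU throttling still need OS-level tools):
    select host_name, processor_count, processor_core_count, processor_description,
           (total_memory_bytes / (1024*1024*1024))::int as total_memory_gb
    from v_monitor.host_resources
    order by host_name;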
    2. Software: OS version and OS settings as recommended by Vertica.
    3. Compare vcpuperf, vioperf, and vnetperf results.
    4. Are the explain plans really identical?
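    If you PROFILE the query on each cluster, the per-path timings can also be put side by side; a sketch (the transaction_id and statement_id below are placeholders, use the values PROFILE reports):
    select path_id, path_line_index, running_time, memory_allocated_bytes,
           read_from_disk_bytes, path_line
    from v_monitor.query_plan_profiles
    where transaction_id = 45035996273705915  -- placeholder
      and statement_id = 1                    -- placeholder
    order by path_id, path_line_index;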
    5. Extract the data and compare operators side by side in Excel:

    select node_name, path_id, operator_name, counter_name, counter_tag, max(counter_value)
    from execution_engine_profiles
    where counter_name ilike '%(us)%'
    group by 1,2,3,4,5
    order by 6 desc;

    select node_name, path_id, operator_name, counter_name, counter_tag, counter_value
    from execution_engine_profiles
    where counter_name ilike '%(us)%'
    order by node_name, path_id, operator_name, counter_name, counter_tag, counter_value asc;

    6. Check your LOCALE on both systems, because sorting can be the reason for performance degradation:
      Vertica sorts data, and sorting depends on LOCALE.
      LOCALE is a parameter that defines the user's language, country, and any special variant preferences, such as collation.
      Vertica uses the locale to determine the behavior of various string functions as well as for ordering and comparison,
      for example in GROUP BY, ORDER BY, and joins.
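      A quick way to compare (SHOW LOCALE reports the current session, so run it with the same user and connection settings on both clusters; the second query just looks for any locale-related parameter changed from default):
      SHOW LOCALE;
      select parameter_name, current_value, default_value
      from configuration_parameters
      where parameter_name ilike '%locale%';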

    7. To rule out environmental reasons for slowness, compare the results of the following check on both clusters.
      Run as root: /opt/vertica/oss/python3/bin/python3 -m vertica.local_verify

    8. Check and compare your Vertica catalog size on both clusters:
      select node_name, max(catalog_size_in_MB)::int as end_catalog_size_mem_mb
      from (select node_name,
                   trunc(("time")::timestamp, 'SS'::varchar(2)) as ts,
                   sum(total_memory_max_value - free_memory_min_value) / (1024 * 1024) as catalog_size_in_MB
            from dc_allocation_pool_statistics_by_second
            group by 1, 2) foo
      group by 1
      order by 1;

    9. Check and compare the number of ROS containers on both systems.
      For example, use the following query to find the top 50 projections by ROS container count per node:
      SELECT s.node_name, p.table_schema, s.projection_name,
      COUNT(DISTINCT s.storage_oid) storage_container_count,
      COUNT(DISTINCT partition_key) partition_count,
      COUNT(r.rosid) ros_file_count
      FROM storage_containers s LEFT OUTER JOIN PARTITIONS p
      ON s.storage_oid = p.ros_id JOIN vs_ros r
      ON r.delid = s.storage_oid
      GROUP BY 1,2,3 ORDER BY 4 DESC LIMIT 50;

    10. Check and compare parameters changed from their defaults on both:
      select node_name,parameter_name,current_value,default_value,description
      from vs_configuration_parameters
      where current_value != default_value;

    Also check for parameters that differ between nodes:
    select aa.node_name, aa.parameter_name, aa.current_value,
    bb.node_name, bb.current_value
    from vs_configuration_parameters aa, vs_configuration_parameters bb
    where aa.current_value != bb.current_value
    and aa.node_name != bb.node_name
    and aa.parameter_name = bb.parameter_name;

    11. Compare, on both clusters, requests for resources that were rejected by the Resource Manager.
      SELECT * FROM resource_rejections;
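      Since the pools are defined with a percentage MAXMEMORYSIZE and EXECUTIONPARALLELISM AUTO, the resolved per-node values will differ between the clusters; a sketch to put the actual budgets side by side (standard v_monitor.resource_pool_status):
      select node_name, pool_name, max_memory_size_kb, query_budget_kb,
             planned_concurrency, max_concurrency, execution_parallelism
      from v_monitor.resource_pool_status
      order by pool_name, node_name;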

    12. Modern kernels usually have enough entropy; however, with TLS-heavy workloads it can become a problem.
      Check the entropy status of both clusters by running the following in bash (on each node):
      cat /proc/sys/kernel/random/entropy_avail
      If it returns something less than 300, you are on the edge of a problem.

    13. Identify spin lock problem symptoms on both with: perf top

    14. Check and compare Spread retransmits with this query:
      SELECT a.node_name, max(a.retrans) as max_retrans, max(a.time_interval) as max_interval, max(a.packet_count) as max_packet_count,
      max((a.retrans / (a.time_interval / '00:00:01'::interval))::numeric(18,2)) AS max_retrans_per_second
      , count(1)
      FROM (
      SELECT (dc_spread_monitor."time")::timestamp AS "time", dc_spread_monitor.node_name,
      (dc_spread_monitor.retrans - lag(dc_spread_monitor.retrans, 1, NULL::int) OVER
      (PARTITION BY dc_spread_monitor.node_name ORDER BY (dc_spread_monitor."time")::timestamp)) AS retrans,
      (((dc_spread_monitor."time")::timestamp - lag((dc_spread_monitor."time")::timestamp, 1, NULL::timestamp) OVER
      (PARTITION BY dc_spread_monitor.node_name ORDER BY (dc_spread_monitor."time")::timestamp))) AS time_interval,
      (dc_spread_monitor.packet_sent - lag(dc_spread_monitor.packet_sent, 1, NULL::int) OVER
      (PARTITION BY dc_spread_monitor.node_name ORDER BY (dc_spread_monitor."time")::timestamp)) AS packet_count
      FROM v_internal.dc_spread_monitor
      ) a
      GROUP BY 1
      ORDER BY 5 DESC
      limit 10;

    15. If you are using NVMe, check whether noatime is enabled on each of the NVMe mount points on both clusters.

    16. Any difference in Jumbo Frames enablement on ALL nodes of both clusters?
      ifconfig | grep mtu

    17. Relevant ONLY for Eon mode: query performance can improve by ~10-20% after running update_storage_catalog.
      Check whether update_storage_catalog is required on one of the clusters.
      Run the following query to determine which projections have invalid bundle metadata in the database catalog:
      SELECT node_name, projection_name, total_ros_count, ros_without_bundle_info_count
      FROM v_monitor.storage_bundle_info_statistics where ros_without_bundle_info_count > 0
      ORDER BY projection_name, node_name;

    Only for the projections returned by the query above, run DO_TM_TASK.
    The following will write the bundle metadata for all projections of table my_schema.my_table:
    SELECT DO_TM_TASK ('update_storage_catalog', 'my_schema.my_table');

    18. More RAM may defer spill decisions, ironically increasing latency if memory thresholds are hit unpredictably.
      So if one of the clusters has more RAM, a query on that cluster can take longer before it fails to fit in memory and automatically retries.
      For example, if a hash table cannot fit in memory, the join operation must spill to disk, causing both the fact and dimension tables to spill. If you know in advance that a specific query will spill, you can add a hint to skip the in-memory hashing attempt:
      SELECT /*+add_vertica_options(EE,ENABLE_JOIN_SPILL)*/ …

    To find out which queries retried and why, check the DC_REQUESTS_ISSUED and DC_REQUESTS_RETRIED Data Collector tables.
    SELECT transaction_id, statement_id, request FROM dc_requests_issued WHERE is_retry ORDER BY time DESC LIMIT 1;
    And:
    SELECT transaction_id, statement_id, request, reason_for_retry FROM dc_requests_retried ORDER BY time DESC LIMIT 1;

  • Moshe, thank you very much for such a comprehensive reply, I truly appreciate the effort!!!

    While I will certainly be going over the entire list, I wanted to clarify a few things:

    1. These are lab machines built from the same image (both OS and Vertica) so they are as identical as is humanly possible.
    2. I've already run the vnetperf/vcpuperf/vioperf tools and the better cluster came out on top in every single case, roughly in proportion to its more powerful hardware. I've also used sysbench, and those results are very much aligned with the v...perf results.
    3. The difference in performance in my query is not minor or intermittent. It is very obvious and can be reproduced every single time I run the query. That is to say that I am not looking for a needle in a haystack, rather an elephant in the room.

    thanks again, I will be updating this thread if I find the solution!
