How to determine the memory need of a query?

HI all,

 

I am trying to configure a MAXMEMORYSIZE for a resource pool used for application queries. For this, I profiled a set of representative queries to figure out maximum memory need for these queries. While profiling, I noticed that the memory acquired by the query changes based on which pool I am using.

 

If I used general pool for profiling the query, it uses more than 3 GB and completes in less than 3 seconds. However, if I used another pool with '3G' MEMORYSIZE and planned concurrency of 9,  the query completes in 20 seconds and uses 231 MB. GROUP BY HASH is the most expensive operation in this query and no other queries were runing when the query was profiled.

 

Because I see different values of "Total memory required by the query" when profling the query using different pools, I am wondering whay would be the best way to estimate MAXMEMORYSIZE parameter for the pool.

 

Any suggestions?

 

 

Thanks,

Rupendra

Comments

  • Different resource pool settings, including whether or not MEMORYSIZE is set, affect the "query budget" for each query.   When planning the query, vertica will try to accomodate the query budget.  Each query plan operator has a minimum amount of memory that it requires to run, so a small query budget will limit the number of threads that each query operator runs on.  

     

    When you give the query a larger "query budget", queries will execute faster simply because the operators can fan out across more threads/cores.  If you were to review the profiling data in EXECUTION_ENGINE_PROFILES for your query with the small 231 MB budget, you'll find that the GroupByHash operator is running across only one or a few threads (each distinct occurrance of GroupByHash represents a thread) and is likely bottlenecked on that step. More memory, more threads, better parallelization, better performance.

     

    So find your sweet spot for typical queries in the resource pool to determine how much memory you want each resource pool to provide for the budget.  Complex queries with subqueries definitely want more memory. The more straightforward the query, the less memory required for optimal performance.

     

    More details on the query budget:

     

    http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/AdministratorsGuide/ManageWorkloads/TargetMemoryDeterminationForQueriesInConcurrentEnvironments.htm?Highlight=memorysize

     

      --Sharon

     

  • Thanks a lot Sharon for detailed explanation.

     

    When I checked EXECUTION_ENGINE_PROFILES, I found that GROUPBYHASH operator has same number of occurences for both runs, but other operarators (GroupByPipe, ExprEval, Scan) had more occurences when query budget was high. I think that explains why the query with more query budget completed sooner.

     

    A follow up question on this - can a query ever get more memory than query budget if it needs? If the query budget is 200 MB for a pool with MAXMEMORYSIZE of 800 MB, will a query wait until memory is available (or until queue timeout) when there are already 4 queries running?

     

    Thanks,

    Rupendra

  • A query can use more than its query budget.   If the query budget is low - such as 200 MB - complicated queries can require more than that just to start executing.  

     

    Once executing, some operations require additional memory at run-time, such as hash joins.  Queries will queue when they need more memory during planning, but won't queue when they need more memory at run-time - they either get it or they don't.  In the case of a hash join, if the additional memory isn't acquired, the query will abort and automatically retry.

     

      --Sharon

     

     

     

  • ==> in the case of a hash join, if the additional memory isn't acquired, the query will abort and automatically retry.

    If it aborts and automatically retry, will the SQL get a different transaction_id/statement_id combination?

    And way to find all such queries in the system?

  • Bryan_HBryan_H Vertica Employee Administrator

    The query will be re-planned using the same transaction ID and statement ID. You can find join spill events in QUERY_EVENTS system table: https://docs.vertica.com/12.0.x/en/sql-reference/system-tables/v-monitor-schema/query-events/#CriticalEventTypes
    On completion, the QUERY_CONSUMPTION table shows number of bytes spilled to disk for the transaction ID and statement ID.

  • Re-planned query will have different REQUEST_ID.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file