Insufficient resources to execute plan on pool general [Timedout waiting for resource request: Request exceeds limits ...]

We recently upgraded our Vertica database server from 6.4 to Vertica Analytic Database v7.1.0-1. We use Talend, and since the upgrade we have had Talend jobs failing on our Vertica server due to memory issues. The most reliable repro is a daily job which previously always ran successfully in less than 4 minutes, and now times out and dies.

The error in the db logs:

    2014-09-19 10:56:15.533 Init Session:0x2b3e70013220-a000000742bfd5 @v_pdw_node0001: 53000/3587: Insufficient resources to execute plan on pool general [Timedout waiting for resource request: Request exceeds limits: Memory(KB) Exceeded: Requested = 4288595, Free = 118927 (Limit = 19981816, Used = 19862889) (queueing threshold)]
    LOCATION: Exec_compilePlan, /scratch_a/release/vbuild/vertica/Dist/Dist.cpp:1217
    2014-09-19 10:56:16.263 Init Session:0x2b3e701a4390-a000000742bfdf [Txn] Starting Commit: Txn: a000000742bfdf
    'select atl.id, atl.APP_DOMAIN, atl.CREATE_DATE, atl.TRANS_ID, atl.SOURCE_PARTNER, atl.LAST_NAME, atl.FIRST_NAME,
            bcf.LP_ID LOYALTY_PROFILE_ID, atl.MEMBER_ID, bcf.TX_ID ORDER_ID, atl.AMOUNT, atl.BATCH_ID, atl.BATCH_KEY
     from "STAGING".JPMC_ATL_VW atl
     left outer join "STAGING".JPMC_BCF_VW bcf on atl.APP_DOMAIN = bcf.APP_DOMAIN and atl.TRANS_ID = bcf.MINOR_KEY
     where atl.BATCH_KEY > 12303 or bcf.BATCH_KEY > 12303 ;'

Some interesting notes:

  • We have a single node with 28GB RAM.
  • The identical job runs on our Staging Vertica server (also Vertica v7.1.0-1, the same as production in terms of the volume of data, single node with less memory) with no problems.
  • Execution of the identical job through Talend Studio succeeds – Vertica is happy enough to run the query and return the data when run in that mode.
  • Talend version is 5.4.1.

Things that we have tried that have not made any difference:

  • Increased the queuetimeout of the general resource pool from 600 to 900 to 1200; performance only degraded.
  • Increased the maximum memory maps configuration of the db server.
  • Modified plannedconcurrency to AUTO.
  • Created new projections for the tables the query is using.

Our questions:

  • Has anyone seen this sort of memory issue come up after upgrading to Vertica v7.1.0?
  • How can we resolve this? What else should we try?

Thanks in advance for your help.
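For reference, the resource pool changes above were applied with ALTER RESOURCE POOL statements along these lines (only a sketch; the values are the ones noted above):

    -- raise the queue timeout on the general pool (seconds)
    alter resource pool general queuetimeout 1200;

    -- let Vertica choose the planned concurrency
    alter resource pool general plannedconcurrency AUTO;

    -- confirm the current general pool settings
    select name, maxmemorysize, plannedconcurrency, queuetimeout
    from resource_pools
    where name = 'general';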

Comments

  • You can check how many queries are running and how much memory is in use in each pool:

    select node_name, pool_name, memory_inuse_kb, running_query_count from resource_pool_status;

    is it the same pool where the query used to run?

  • How will I know which pool the query uses?

        node_name       pool_name   memory_inuse_kb   running_query_count
        --------------  ----------  ----------------  --------------------
        v_pdw_node0001  general               209152                     0
        v_pdw_node0001  sysquery                   0                     0
        v_pdw_node0001  sysdata               102400                     0
        v_pdw_node0001  wosdata                    0                     0
        v_pdw_node0001  tm                         0                     0
        v_pdw_node0001  refresh                    0                     0
        v_pdw_node0001  recovery                   0                     0
        v_pdw_node0001  dbd                        0                     0
        v_pdw_node0001  jvm                        0                     0
  • select * from query_requests where request ilike '%<part of your query>%';
    Get the transaction_id and statement_id from the result.

    dbadmin=> select node_name, request_type, pool_name, memory_kb, queries, succeeded, failing_resource from dc_resource_acquisitions where transaction_id = <> and statement_id = <>;

    You also need to check query_budget_kb in all the pools (especially the one where you are running the query).
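    For example, something like this should show the per-query budget for each pool (query_budget_kb is a column of resource_pool_status):

    select node_name, pool_name, query_budget_kb
    from resource_pool_status
    order by pool_name;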

    select * from sessions; should also give some idea about how many sessions are running.

    Check the other columns in resource_pool_status as well.

    Try changing the planned_concurrency (lower the value) in the resource_pools table for the general pool where you are running the query. This should allow more query-budget memory per query.
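    For example, assuming you are working on the general pool (the value 2 here is only illustrative):

    alter resource pool general plannedconcurrency 2;

    After that, re-running the query_budget_kb check above should show a larger budget for the general pool.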





  • When the ETL with the query is running:

        node_name       pool_name   memory_inuse_kb   running_query_count
        --------------  ----------  ----------------  --------------------
        v_pdw_node0001  general             19862889                     1
        v_pdw_node0001  sysquery                   0                     0
        v_pdw_node0001  sysdata               102400                     0
        v_pdw_node0001  wosdata                    0                     0
        v_pdw_node0001  tm                         0                     0
        v_pdw_node0001  refresh                    0                     0
        v_pdw_node0001  recovery                   0                     0
        v_pdw_node0001  dbd                        0                     0
        v_pdw_node0001  jvm                        0                     0
  • 19GB is in use during that time
  • I have lowered the planned_concurrency to 2 and the ETL still failed.

        node             request_type       pool      memory_kb  queries  succeeded  failing_resource
        ---------------  -----------------  -------  ----------  -------  ---------  ----------------
        v_pdw_node0001   Acquire            general      102400        0  t
        v_pdw_node0001   Reserve            general     9986812        1  t
        v_pdw_node0001   AcquireAdditional  general    11377853        1  t
        v_pdw_node0001   AcquireAdditional  general    14159934        1  t
        v_pdw_node0001   AcquireAdditional  general    19724095        1  t
        v_pdw_node0001   AcquireAdditional  general    19857216        1  t
        v_pdw_node0001   AcquireAdditional  general    20123457        1  t
        v_pdw_node0001   AcquireAdditional  general    20655938        1  t
        v_pdw_node0001   AcquireAdditional  general    21720899        1  f          Memory(KB)
        v_pdw_node0001   AcquireAdditional  general    21720899        1  f          Memory(KB)
  • Hi,
    It should be simpler than that.

    Check and compare the execution plans you get on Staging and on production.
    Do you gather statistics?
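    For example (the EXPLAIN uses the failing query from the log above; <schema>.<base_table> is a placeholder for whichever tables sit behind the STAGING views):

    explain
    select atl.id, atl.APP_DOMAIN, atl.CREATE_DATE, atl.TRANS_ID, atl.SOURCE_PARTNER, atl.LAST_NAME, atl.FIRST_NAME,
           bcf.LP_ID LOYALTY_PROFILE_ID, atl.MEMBER_ID, bcf.TX_ID ORDER_ID, atl.AMOUNT, atl.BATCH_ID, atl.BATCH_KEY
    from "STAGING".JPMC_ATL_VW atl
    left outer join "STAGING".JPMC_BCF_VW bcf on atl.APP_DOMAIN = bcf.APP_DOMAIN and atl.TRANS_ID = bcf.MINOR_KEY
    where atl.BATCH_KEY > 12303 or bcf.BATCH_KEY > 12303;

    -- refresh statistics on the base tables, then compare the plans again
    select analyze_statistics('<schema>.<base_table>');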
