
Query failing with Insufficient resource to execute

Hi All

We need your help / thoughts on one scenario.

We have a dimension table with 50M records of customer data, and we forecast this table to grow to 100 million records soon.
This table has frequent updates as and when we get data from multiple sources. The table has unique records.
Daily updates happen on about 50K records, but each update scans the full table. We understand that in Vertica, updates are deletes followed by inserts.
We have an 18-node cluster with K-safety = 1. The table is segmented across all 18 nodes and has 2 super projections.
We are noticing that UPDATEs on this table are often time-consuming, and a few updates end with the error messages below:
 Insufficient resources to execute plan on pool
 Terminated abnormally
Please help us understand how we can solve this issue:
 a. Is segmenting the data across all 18 nodes causing these issues, and should we consider segmenting on only 4 to 5 nodes to get better performance and avoid the errors?
 b. Will increasing the K-safety value help?
 c. We are also using the /*+DIRECT*/ hint for all DML statements. How does it impact performance?

Thanks in Advance

Veena


Comments

  • This is a very general question, and the reasons could be many.

    Can you provide:

    1. The table schema (CREATE TABLE statement) (schema.sql)
    2. The projection (CREATE PROJECTION statement) (projection.sql)
    3. The explain plan (EXPLAIN VERBOSE UPDATE table ...) (plan.txt)
    4. A profile of the query and its output (SELECT * FROM v_monitor.execution_engine_profiles WHERE transaction_id = XXXX AND statement_id = Y;)
    5. The version of Vertica you are using
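
    For item 4, a minimal sketch of how the profiling could be done. The table name and SET clause here are placeholders, not from your workload; the transaction and statement IDs come from the hint that PROFILE prints:

    ```sql
    -- Profile the statement instead of just running it; Vertica prints a
    -- hint containing the transaction_id and statement_id of this run.
    PROFILE UPDATE sales.customer_dim SET type = 'UNKNOWN' WHERE type IS NULL;

    -- Then plug those IDs in to see per-operator memory and time counters:
    SELECT node_name, operator_name, counter_name, counter_value
    FROM v_monitor.execution_engine_profiles
    WHERE transaction_id = XXXX   -- from the PROFILE hint
      AND statement_id = Y
    ORDER BY counter_value DESC;
    ```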
  • Hi Veena,
    I would do the following:
    • Remove DIRECT. For the type of workload you have described, where records are updated on an ongoing basis, going through the WOS can give you better performance.
    • Check the resource pool associated with the user executing these operations. Do you have a user-defined pool, or are the queries running off the GENERAL pool? The insufficient-resources error suggests some resource pool adjustment may be needed.
    • Analyze statistics on the table that is being updated frequently. If that helps performance, you may want to schedule it so statistics stay up to date as new inserts and updates happen.
    • Run EVALUATE_DELETE_PERFORMANCE on the table. This should give you an idea of whether a better projection design can help optimize updates.
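
    The last two suggestions can be run roughly as below; sales.customer_dim is a placeholder for your dimension table:

    ```sql
    -- Refresh optimizer statistics for the table (returns 0 on success):
    SELECT ANALYZE_STATISTICS('sales.customer_dim');

    -- Ask Vertica whether the current projections make DELETE/UPDATE slow;
    -- projections flagged here are candidates for a redesign:
    SELECT EVALUATE_DELETE_PERFORMANCE('sales.customer_dim');
    ```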
    Hope that helps.

    /Sajan
  • Hi Veena,

    You can also check for delete vectors, since updates are being run frequently here. Try purging the table you are updating manually and see if that improves performance.
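
    A quick way to inspect delete vectors and then purge them; the schema and table names are placeholders:

    ```sql
    -- How many deleted-but-not-purged rows does each node hold?
    SELECT node_name, schema_name, projection_name, deleted_row_count
    FROM v_monitor.delete_vectors
    WHERE schema_name = 'sales';

    -- Permanently remove those deleted rows so scans stop carrying them:
    SELECT PURGE_TABLE('sales.customer_dim');
    ```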

    Regards,
    Siddarth
  • Hi All

    We already have a purge in the script. The allocated resource pool is 80%. We are working on the other suggestions. Can anyone help with how segmentation impacts the load? Currently the table is segmented across 18 nodes with K-safety = 1, hence 2 super projections.
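
    If it helps while gathering the details, how the table is segmented and how much memory the pool actually has can be read from the system tables. The table name is a placeholder, and 'general' is just the default pool; substitute the pool your updates run in:

    ```sql
    -- How each projection on the table is segmented:
    SELECT projection_name, is_segmented, segment_expression
    FROM v_catalog.projections
    WHERE anchor_table_name = 'customer_dim';

    -- Current memory budget of the pool the updates run in:
    SELECT pool_name, memory_size_kb, max_memory_size_kb
    FROM v_monitor.resource_pool_status
    WHERE pool_name = 'general';
    ```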

    Thanks
    Veena
  • Hi Veena

    Please share the details asked for in comment #1; without them, the answers will be guesswork.
    In most cases, having the right projection will solve this type of problem. But we don't see your update statement or the projections that were defined on top of the table.

  • Hi All

    This is the subquery running on the sample table. Based on the data, we are doing the update.

    Can you please help me understand the explain plan?

    Memory usage on this subquery is 53 GB.

    INFO table count: 5,000
    LOG table count: 11,000
    NAME: VARCHAR(100)
    FAMILY: VARCHAR(50)
    TYPE: VARCHAR(50)

    EXPLAIN SELECT DISTINCT STL.FAMILY FAMILY, NVL(UPPER(INFO.TYPE),'UNKNOWN') TYPE FROM sales.INFO INFO, sales.LOG STL WHERE INFO.NAME = STL.NAME

    Access Path:
    +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 217, Rows: 5K (NO STATISTICS)] (PATH ID: 1)
    |  Group By: STL.FAMILY, coalesce(upper(INFO.TYPE), 'UNKNOWN')
    |  Execute on: All Nodes
    | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 126, Rows: 5K (NO STATISTICS)] (PATH ID: 2) Outer (RESEGMENT)
    | |      Join Cond: (INFO.NAME = STL.NAME)
    | |      Execute on: All Nodes
    | | +-- Outer -> STORAGE ACCESS for STL [Cost: 25, Rows: 11K (NO STATISTICS)] (PUSHED GROUPING) (PATH ID: 3)
    | | |      Projection: sales.LOG_b0
    | | |      Materialize: STL.NAME, STL.FAMILY
    | | |      Execute on: All Nodes
    | | +-- Inner -> STORAGE ACCESS for INFO [Cost: 23, Rows: 5K (NO STATISTICS)] (PUSHED GROUPING) (PATH ID: 4)
    | | |      Projection: sales.INFO_b0
    | | |      Materialize: INFO.NAME, INFO.TYPE
    | | |      Execute on: All Nodes

    Thanks in Advance

    Veena

  • Hi Veena,

    Make sure your EXPLAIN plan does not show:
    1. (NO STATISTICS).
    2. Also, try removing DISTINCT from the query and using GROUP BY instead.

    It is always advisable to gather statistics once table data changes by 20-30%, whether through INSERT, UPDATE, or DELETE.
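
    The rewrite suggested in point 2 would look roughly like this, using the table and column names from the query above; it returns the same rows as the DISTINCT form:

    ```sql
    -- Gather statistics first so the (NO STATISTICS) warnings disappear:
    SELECT ANALYZE_STATISTICS('sales.INFO');
    SELECT ANALYZE_STATISTICS('sales.LOG');

    -- DISTINCT expressed as GROUP BY over the same expressions:
    SELECT STL.FAMILY AS FAMILY,
           NVL(UPPER(INFO.TYPE), 'UNKNOWN') AS TYPE
    FROM sales.INFO INFO
    JOIN sales.LOG STL ON INFO.NAME = STL.NAME
    GROUP BY STL.FAMILY, NVL(UPPER(INFO.TYPE), 'UNKNOWN');
    ```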

    Regards,
    Kushal Sen
  • Hi Sid,

    As you mentioned in your comment, "check for delete vectors":
    Can you please explain that in detail? How do I check for delete vectors? I have never heard of them before.

    Thanks

    Regards,
    Kushal Sen

