Query failing with "Insufficient resources to execute plan"
Hi All
We need your help / thoughts on a scenario.
We have a dimension table with 50M records of customer data, and we forecast this table to grow to 100 million records soon.
This table is updated frequently as data arrives from multiple sources. The table has unique records.
Daily updates touch about 50K records, but each update scans the full table. We understand that in Vertica, updates are implemented as a delete plus an insert.
We have an 18-node cluster with K-safety = 1. The table is segmented across all 18 nodes and has 2 super projections.
We are noticing that UPDATEs on this table are often time-consuming, and a few updates end with the error messages below:
Insufficient resources to execute plan on pool
Terminated abnormally
Please help us understand how we can solve this issue:
a. Is segmenting the data across all 18 nodes causing these issues? Should we consider segmenting across only 4 to 5 nodes to get better performance and avoid the errors?
b. Would increasing the K-safety value help?
c. Also, we use the /*+DIRECT*/ hint for all DML statements. How does it impact performance?
Thanks in Advance
Veena
Comments
This is a very general question and the reasons could be many.
Can you provide:
1. The table schema (CREATE TABLE statement) (schema.sql)
2. The projection (CREATE PROJECTION statement) (projection.sql)
3. The explain plan (EXPLAIN VERBOSE UPDATE table ......) (plan.txt)
4. The query profile output (SELECT * FROM v_monitor.execution_engine_profiles WHERE transaction_id=XXXX AND statement_id=Y;)
5. The version of Vertica you are using
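For item 4, one way to get the transaction_id and statement_id is to PROFILE the statement, or to look them up in v_monitor.query_requests afterwards. A sketch (the table name sales.customer_dim and the filter text are placeholders; substitute your own UPDATE):

```sql
-- PROFILE runs the statement and prints a hint containing the IDs.
PROFILE UPDATE sales.customer_dim SET ... ;

-- Or find the IDs for a recently run statement:
SELECT transaction_id, statement_id, request
FROM v_monitor.query_requests
WHERE request ILIKE '%UPDATE sales.customer_dim%'
ORDER BY start_timestamp DESC
LIMIT 5;

-- Then pull the per-operator profile counters:
SELECT operator_name, counter_name, counter_value
FROM v_monitor.execution_engine_profiles
WHERE transaction_id = <txn_id> AND statement_id = <stmt_id>;
```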
I would do the following :
- Remove DIRECT. For the type of workload you have described, where records are updated on an ongoing basis, going through the WOS can give you better performance.
- Check the resource pool associated with the user executing these operations. Do you have a user-defined pool, or are the queries running off the GENERAL pool? The insufficient-resources error suggests some resource pool adjustment may be needed.
- Analyze statistics on the table that is updated frequently. If that helps performance, you may want to schedule it so that statistics stay up to date as new inserts and updates arrive.
- Run EVALUATE_DELETE_PERFORMANCE on the table. This should tell you whether a better projection design could help optimize the updates.
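The checks above can be sketched in SQL as follows (assuming the dimension table is sales.customer_dim and the loading user is etl_user; substitute your own names):

```sql
-- Which resource pool does the user run in?
SELECT user_name, resource_pool
FROM v_catalog.users
WHERE user_name = 'etl_user';

-- Current state of the pools (sizes, running queries):
SELECT pool_name, memory_size_kb, max_memory_size_kb, running_query_count
FROM v_monitor.resource_pool_status;

-- Refresh statistics on the frequently updated table:
SELECT ANALYZE_STATISTICS('sales.customer_dim');

-- Check whether the projection design makes deletes/updates slow:
SELECT EVALUATE_DELETE_PERFORMANCE('sales.customer_dim');
```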
Hope that helps./Sajan
You can also check for delete vectors, since updates run frequently here. Try manually purging the table you are updating and see whether performance improves.
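To see how many deleted rows are still sitting in delete vectors, and to purge them, a sketch (sales.customer_dim is a placeholder table name):

```sql
-- Deleted-but-not-yet-purged rows per projection:
SELECT schema_name, projection_name, SUM(deleted_row_count) AS deleted_rows
FROM v_monitor.delete_vectors
GROUP BY schema_name, projection_name
ORDER BY deleted_rows DESC;

-- Purge the table so scans no longer have to filter out deleted rows:
SELECT PURGE_TABLE('sales.customer_dim');
```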
Regards,
Siddarth
We already have a purge in the script, and the allocated resource pool is at 80%. We are working on the other suggestions. Can anyone explain how segmentation impacts the load? The table is currently segmented across all 18 nodes with K-safety = 1, hence the 2 super projections.
Thanks
Veena
Please share the details requested in comment #1; without them, the answers will be guesswork.
In most cases, having the right projection solves this type of problem, but we haven't seen your UPDATE statement or the projections defined on the table.
Hi All
This is the subquery running on the sample tables; based on its result, we do the update.
Can you please help me understand the explain plan?
Memory usage for this subquery reaches 53 GB.
INFO table count: 5,000
LOG table count: 11,000
NAME: VARCHAR(100)
FAMILY: VARCHAR(50)
TYPE: VARCHAR(50)
explain SELECT DISTINCT STL.FAMILY FAMILY, NVL(UPPER(INFO.TYPE),'UNKNOWN') TYPE FROM sales.INFO INFO, sales.LOG STL where INFO.NAME =STL.NAME
Access Path:
+-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 217, Rows: 5K (NO STATISTICS)] (PATH ID: 1)
| Group By: STL.FAMILY, coalesce(upper(INFO.TYPE), 'UNKNOWN')
| Execute on: All Nodes
| +---> JOIN MERGEJOIN(inputs presorted) [Cost: 126, Rows: 5K (NO STATISTICS)] (PATH ID: 2) Outer (RESEGMENT)
| | Join Cond: (INFO.NAME = STL.NAME)
| | Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for STL [Cost: 25, Rows: 11K (NO STATISTICS)] (PUSHED GROUPING) (PATH ID: 3)
| | | Projection: sales.LOG_b0
| | | Materialize: STL.NAME, STL.FAMILY
| | | Execute on: All Nodes
| | +-- Inner -> STORAGE ACCESS for INFO [Cost: 23, Rows: 5K (NO STATISTICS)] (PUSHED GROUPING) (PATH ID: 4)
| | | Projection: sales.INFO_b0
| | | Materialize: INFO.NAME, INFO.TYPE
| | | Execute on: All Nodes
Thanks in Advance
Veena
Make sure of the following:
1. Your EXPLAIN plan should not show (NO STATISTICS).
2. Try removing DISTINCT from the query and using GROUP BY instead.
It is always advisable to gather statistics once table data changes by roughly 20-30%, whether from INSERT, UPDATE, or DELETE.
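Applied to the query in this thread, the two suggestions would look roughly like this (a sketch using the table names shown above):

```sql
-- 1. Gather statistics so the plan no longer shows (NO STATISTICS):
SELECT ANALYZE_STATISTICS('sales.INFO');
SELECT ANALYZE_STATISTICS('sales.LOG');

-- 2. The same result as SELECT DISTINCT, written with GROUP BY:
SELECT STL.FAMILY AS FAMILY,
       NVL(UPPER(INFO.TYPE), 'UNKNOWN') AS TYPE
FROM sales.INFO INFO
JOIN sales.LOG STL ON INFO.NAME = STL.NAME
GROUP BY STL.FAMILY, NVL(UPPER(INFO.TYPE), 'UNKNOWN');
```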
Regards,
Kushal Sen
You mention in your comment to "check for delete vectors".
Can you please explain in detail? How do you check for delete vectors? I have never heard of them before.
Thanks
Regards,
Kushal Sen