Resource limit not being enforced for analytic functions with ORDER BY
Hi
We are developing an analytic platform on top of Vertica. One of the challenges we face is running our solution in development/QA environments with very small data sets; these environments mostly run on VMs with limited resources, e.g. 4 cores and 4-8 GB of RAM.
We recently faced many issues related to "java.sql.SQLException: [Vertica][VJDBC](3587) ERROR: Insufficient resources to execute plan on pool general [Request Too Large:Memory(KB) Exceeded: Requested = 3149852, Free = 2983279 (Limit = 2983279, Used = 0)]".
The first thing I did was try to reproduce it in my test environment, a machine with 32 GB RAM and 24 cores. I copied the table from the source (a very small table with 1 record; in some cases the table is empty) and executed the query (no joins - flat execution with aggregation). The query executed successfully, but it allocated the same amount of memory as in the exception above.
I gathered statistics on the table. I have not defined any user-defined resource pools, so my query uses the GENERAL pool with its default settings. Setting PLANNEDCONCURRENCY to a very small value (1-10) causes extra memory allocation (more than 3 GB); values larger than 10 do not affect my query at all. I did find that a high PLANNEDCONCURRENCY value helps most of our queries (less memory allocation); the common denominator of those queries is that they do not have an ORDER BY clause inside the analytic function.
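For reference, this is roughly how the GENERAL pool was inspected and adjusted during these tests (a sketch; the values are the ones described above):
-- Check the current settings of the GENERAL pool
SELECT name, memorysize, maxmemorysize, plannedconcurrency
FROM v_catalog.resource_pools
WHERE name = 'general';
-- One of the PLANNEDCONCURRENCY values tried above
ALTER RESOURCE POOL general PLANNEDCONCURRENCY 100;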
We also have cases where queries have 60 or more columns in the select list, each column the result of an analytic function with ORDER BY. Even though the underlying table contains no data, Vertica tries to allocate 3 GB of memory (the PROFILE command shows it), despite PLANNEDCONCURRENCY being set very high (100).
This leads me to the understanding that the resource limit cannot be used as a tool for reducing memory allocation for this type of query, since Vertica ignores it. I would expect Vertica to allocate memory based on my resource limit and my data set, but that does not seem to be the case, which leaves us stuck: we cannot use VMs for our dev environments.
I can send you the SQL and the table for internal testing if you want, but I need to understand how memory is allocated by this type of query and what we can do.
Part of my query's select list:
SELECT
    MAX(moving_agg_query.cpu_idle_time) OVER (PARTITION BY moving_agg_query.host_name ORDER BY cpu_idle_time) AS cpu_idle_time,
    MAX(moving_agg_query.cpu_user_mode_util) OVER (PARTITION BY moving_agg_query.host_name ORDER BY cpu_user_mode_util) AS cpu_user_mode_util,
    MAX(moving_agg_query.mem_pageout_rate) OVER (PARTITION BY moving_agg_query.host_name ORDER BY mem_pageout_rate) AS mem_pageout_rate,
    MAX(moving_agg_query.process_started) OVER (PARTITION BY moving_agg_query.host_name ORDER BY process_started) AS process_started,
    MAX(moving_agg_query.mem_util) OVER (PARTITION BY moving_agg_query.host_name ORDER BY mem_util) AS mem_util,
    MAX(moving_agg_query.disk_write_byte_rate) OVER (PARTITION BY moving_agg_query.host_name ORDER BY disk_write_byte_rate) AS disk_write_byte_rate,
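For completeness, this is roughly how the ~3 GB allocation mentioned above can be observed with PROFILE (a sketch against the table shown later in the thread; counter names may vary by version):
PROFILE SELECT
    MAX(cpu_idle_time) OVER (PARTITION BY host_name ORDER BY cpu_idle_time) AS cpu_idle_time
FROM mytable;
-- PROFILE prints a NOTICE with the memory reserved for the query on the pool;
-- per-operator memory counters can also be read from the profiling views:
SELECT operator_name, counter_name, counter_value
FROM v_monitor.execution_engine_profiles
WHERE counter_name ILIKE '%memory%';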
Comments
Could you provide the design (schema) of the tables in your query? I suspect this may be related to the design of the table and the large number of columns being referenced, but we would have to look closely at the plan to know for sure. Providing the table design would be sufficient for further investigation.
Thank you for reporting your problem.
Thanks for the reply,
The design is very simple: it's a single table without any joins (flat access), and yes, the number of columns referenced is up to 60, each of them an analytic function with ORDER BY. Since my underlying table contains very little data (up to 100 records - it's a dev environment), I would expect Vertica to take that into account when it calculates the needed resources. However, after setting SET_DEBUG_LOG to EE->SORT, I see that the execution engine estimates a different, much higher number of records (~2500).
According to Sharon's answer in the link below, it looks like the EE does take the amount of memory from the resource manager into consideration, but when it thinks it needs more, it allocates more:
http://vertica-forums.com/viewtopic.php?f=44&t=1441&p=4797&hilit=ERROR%3A+Insufficient+resources+to+sharon+ORDER+BY+#p4797
“- simple queries without an ORDER BY or where the ORDER BY matches a projection's ORDER BY will use very little memory - they don't need to sort for example
- queries with "externalizing operators" - that is, operators that can spill to disk - sort, group by, hash join - will start with a default buffer size for all operators and will then split the remaining memory among the externalizing operators up to the "query budget"
- if the minimum buffer size for all operators exceeds the "query budget", then the query will use more than the "query budget".
The statement above leads me to the understanding that at some point the resource limit is simply ignored.
Bottom line: I need to find a way to limit this allocation for dev and demo environments where I cannot allocate storage servers with large amounts of memory.
Let me know if you need more info.
Thanks
When we talk about the design of a table, we mean its schema. For example, if you had 60 columns of varchar(12000), you might see that accessing 60 columns is very expensive! Without seeing your schema, I cannot say more about why your query might be requiring more memory than the resource pool can accommodate.
https://my.vertica.com/docs/6.1.x/HTML/index.htm#12722.htm
If you would like to share, please try EXPORT_TABLES('', 'schema.table_name');
CREATE TABLE mytable
(
id IDENTITY ,
cpu_idle_time float,
cpu_user_mode_util float,
mem_pageout_rate float,
process_started float,
mem_util float,
disk_write_byte_rate float,
"timestamp" timestamptz NOT NULL,
timestamp_utc timestamp NOT NULL,
disk_read_byte_rate float,
net_packet_tx_rate float,
cpu_context_switch_rate float,
disk_io float,
alive_processes float,
disk_request_queue float,
cpu_util_time float,
cpu_run_queue float,
net_packet_rx_rate float,
mem_swap_util float,
num_net_packet_tx float,
cpu_clock_speed float,
num_net_packet_rx float,
cpu_sys_mode_pct float,
active_processes float,
host_name varchar(25),
disk_io_rate float,
mem_free float,
net_packet_rate float,
cpu_util float,
io_interrupt_rate float,
sys_uptime float,
source varchar(25),
net_error_rate float,
sourceid varchar(25),
disk_byte_rate float
)
PARTITION BY (((date_part('year', mytable.timestamp_utc) * 100) + date_part('month', mytable.timestamp_utc)));
ALTER TABLE mytable ADD CONSTRAINT PK_OA_SYSPERF_GLOBAL PRIMARY KEY (id);
Your query seems to be unnecessarily complex. Is this a simplified example of a problem you are seeing with a more interesting query? Unless I am misunderstanding, you can write your query using simple aggregate (non-analytic) 'max' functions and a GROUP BY hostname. Removing the unnecessary ORDER BY will also give you a large amount of savings.
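For illustration, a sketch of that rewrite with plain aggregates, using a few of the columns from your select list (the remaining columns follow the same pattern; in your real query the FROM source would be the moving_agg_query derived table):
SELECT
    host_name,
    MAX(cpu_idle_time)        AS cpu_idle_time,
    MAX(cpu_user_mode_util)   AS cpu_user_mode_util,
    MAX(mem_pageout_rate)     AS mem_pageout_rate,
    MAX(mem_util)             AS mem_util,
    MAX(disk_write_byte_rate) AS disk_write_byte_rate
FROM mytable
GROUP BY host_name;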
I was able to reproduce your results. Each MAX() OVER (... ORDER BY...) requires a large amount of memory - planned concurrency cannot help you reduce this memory requirement. I'll follow up with the appropriate groups in engineering to investigate if there is an improvement that we can make. Unfortunately, we will not be communicating our progress on this matter to the community (these forums).
It appears that each of the MAX() clauses has a high memory requirement, partly because of the ORDER BY portion of the OVER clause. If you remove the ORDER BY, I suspect you will find much better memory usage.
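For example, one of the columns from your select list with the ORDER BY dropped from the OVER clause (a sketch against the table shown above):
SELECT
    MAX(cpu_idle_time) OVER (PARTITION BY host_name) AS cpu_idle_time,
    MAX(mem_util)      OVER (PARTITION BY host_name) AS mem_util
FROM mytable;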
Thank you very much for your report. I look forward to helping you resolve the problem, now that we know what is causing it.
Cheers.
I am using a 4-core, 8 GB non-clustered machine, Red Hat 6.3.
Also,
I have created a pool named ETL_BATCH and assigned a user to work with this pool, with the following settings (a sketch of the corresponding CREATE statement follows the list):
MEMORYSIZE = 1G
MAXMEMORYSIZE = 6.5G
PLANNEDCONCURRENCY = 2
MAXCONCURRENCY = 150
EXECUTIONPARALLELISM = AUTO
PRIORITY = 40
RUNTIMEPRIORITY = 2
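A sketch of how a pool with these settings might be created and assigned (the user name etl_user is hypothetical; RUNTIMEPRIORITY normally takes HIGH/MEDIUM/LOW, so the numeric value listed above may refer to RUNTIMEPRIORITYTHRESHOLD):
CREATE RESOURCE POOL etl_batch
    MEMORYSIZE '1G'
    MAXMEMORYSIZE '6656M'   -- 6.5 GB
    PLANNEDCONCURRENCY 2
    MAXCONCURRENCY 150
    EXECUTIONPARALLELISM AUTO
    PRIORITY 40
    RUNTIMEPRIORITY MEDIUM;
-- Assign the pool to the ETL user (user name is hypothetical)
ALTER USER etl_user RESOURCE POOL etl_batch;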
The error that I got:
ERROR 3587: Insufficient resources to execute plan on pool batch_etl [Request Too Large:Memory(KB) Exceeded: Requested = 7019241, Free = 6520059 (Limit = 6520059, Used = 0)]
Can you advise, please?
The query you are running appears to have a memory requirement larger than the resource pool on which it is being run. A 660-column merge is large. It includes reading from two projections (660 columns each) and, in your scenario, writing to three (660 columns each), including updates, which carry additional cost compared to inserts.