correlation between resource pool memory usage and o/s memory usage

Hi all,
i'm having hard time to correlate the consumption of memory in resource_pool_status 
and system_resource_usage/host_resources or even memory usage from the os itself.

my cluster / os info :

10 node cluster reside on amazon aws (m3.2xlarg)
vertica 6.1.3
30 GB RAM 8 CPU

When we are running about 6-8 queries I'm seeing in resource_pool_status that it consume
about 22 GB of ram which is almost 80% usage while looking at the o/s metrics (all of them give the same results) I see  about 30%-40% usage.

If the o/s is right then we are not utilizing all of our memory and getting into the queue at about 40% memory usage.
If resource pool is right then why do I see this diff.
Does someone has this issue ? or can explian me how to solve this issue ?


I ran a simple test case that insert into a table the result of the os memory and resource pool memory.
I run it in loop for several minutes.


insert into rpvsos
select *  from
(select sysdate::timestamptz as os_date,substr(node_name,instr(node_name,'node')) as os_node_name
,((total_memory_bytes - (total_memory_free_bytes+total_buffer_memory_bytes+total_memory_cache_bytes))/1024/1024)::number(30,2)
||' ('||((total_memory_bytes - (total_memory_free_bytes+total_buffer_memory_bytes+total_memory_cache_bytes)) / total_memory_bytes*100)::number(4,2)||'%)'  "usage"
from host_resources hr join nodes n on (hr.host_name = n.node_address)
where (node_name = 'v_insight_node0003' )
) a
join
(select sysdate::timestamptz as rp_date,node_name as rp_node_name,sum(running_query_count) as running_query_count,
sum((memory_inuse_kb/1024))::int as mem_inuse
FROM RESOURCE_POOL_STATUS
where node_name = 'v_insight_node0003'
group by node_name ) b
on (1=1)
;
commit;



Thanks 

Comments

  • Hi zvika,
    The memory usage reported by the resource pool is the reserved memory, and not the actual memory. In your case, assuming your plannedconcurrency is 8 (for the 8 cores) and you have , say, 16 G memory allocated for this pool, Vertica is going to allocate memory to each session by dividing the available memory by the planned concurrency. This means 2G of memory per session allocated, but its very unlikely all of the memory will be used per query. So the usage reported by the OS is the actual memory used, and the one reported by the resource pool is the allocted memory based on your pool settings. You can alter the resource pool to limit the memory and or plannedconcurreny and see how this allocation changes.

    Hope that helps.

    /Sajan
  • Thanks Sajan,
    in resource_pool_status column MEMORY_INUSE_KB  is "Amount of memory in kilobytes acquired by requests running against this pool."
    The same is in RESOURCE_ACQUISITIONS 

    So from this I understand that resource pool acquired  the memory and the os cannot use this memory until released by the pool otherwise when needed this memory it may not be able to get it .
    Also, queries will get into the queue once reached the threshold by memory_inuse_kb so I will get into the queue while I have in theory a lot of free memory .

    Correct me if I wrong.


  • Hi Zvika,

    I've observed the same thing.  Resource pools are fully utilized in resource_pool_status and the memory that the running queries "might use", but the memory at the os level is only at ~40% usage.  

    Your statement:
    resource pool acquired  the memory and the os cannot use this memory until released by the pool 
    The os can actually use the memory that hasn't been acquired.  Queries are reserving memory within the resource pools, but not actually acquiring that memory because they don't need it.  But if your resource pools are fully-allocated then queries within vertica won't be able to reserve additional memory.

    I actually asked Engineering about this behavior recently.  They confirmed it as true and didn't suggest that there were any improvements coming in the area of memory utilization.

      --Sharon






  • 10x Sharon,
    I also open case for it but wanted to get quick answer from the community. 

    So queries will acquire memory while the os will see it as free (the reserve memory that is not being really used by the query).
    the os see the real consumption of memory while the pool see the acquire consumption.

    So what if I have 28 GB in general pool and queries acquired/reserve most of this memory (till queue threshold),
    The os will see that for example only 10 Gb usage (real query memory usage).
    Next query will go into the queue (resource pool doesn't have any reserve and all been acquired)

    If now an external process will need 10 GB of memory, what will happened ?
    Are we wasting more than 10 GB of free Ram ?
  • Hi Zvika,

    That's my take on it - an external process would be able to allocate that 10G of memory since Vertica isn't really using it.  But you should never allow an external process to use 10G if your general pool is configured with the default of 95% of memory.  Allowing other apps to use significant memory could lead to swapping because you don't know up front which queries will really allocate how much memory.  But I think this specific question was just theoretical.

    It's my impression that the memory really is being under-utilized.  If you get confirmation from Support or hear differently, do let us know.

      --Sharon




  • Thanks Sharon for the detailed reply!

    For what it's worth:  Vertica doesn't necessarily use a constant amount of memory over the whole duration of the query.  In fact, with certain operations (not the typical case), Vertica's memory utilization can be extremely bursty; a basic sampling approach might miss a very brief, very large allocation of system memory.  This may change in future releases; in fact, our memory utilization can in general change from release to release (as long as it stays within the bounds set by resource pools), so beware of making any assumptions based on apparent resource usage.

    That said -- resource pools are necessarily an upper bound, since our queries can't exceed them.  If the data that the query ends up accessing is smaller than expected, or if you're running the data through some operator and all of the values happen to be really cheap for the operator to compute its results, then we don't use as much memory.  We can improve our estimates in many cases; but in the general case, how do we know that the last row in the whole data set isn't the one that requires tons of memory to compute?  So, because they're required to be a conservative upper bound, resource reservations will always tend to be underestimates.

    In any case, as always, you're welcome to get in touch with Vertica Support if you have questions about this sort of stuff.

    Adam
  • Thanks Sharon & Adam

    Sharon you are right. This is only theoretical question as I want to understand better how Vertica works (Actually I have some more Internal architecture questions :-) )


    So from what Adam wrote (correct me if i'm wrong)


    Vertica real memory usage is dynamic and it depends on the operator and it's data 
    A query can use 10% of the reserved memory it allocated but no other query can use this reserved memory until the query finished (This because Vertica doesn't know if the last row will use all the memory **). 


    That being said
    We can get into Vertica queue even if we are using only small amount of memory.




    ** Let's say we reserved 2 gb ram for on query.
    the operator (never mind which one ) only needs 100 kb but the last row needs 5 gb .
    what will happened then (When we have free memory in our resource pool and when we don't )?




    Zvika
  •  Great topic, 
    I think here is were the dynamic resource pool should come in ! 
    As it states in the Dragline paper dynamic resource pool are available - but i haven't got any doc online that show how they will work. 

  • Good question -- so, in theory, this scenario shouldn't be possible.  For each query, when we plan the query, the amount of memory that we reserve is intended to be the maximum amount of memory that the query could possibly use in the worst case.  So if there exists a set of data that would cause the query to use 5gb RAM, the query should reserve 5gb RAM up front.  Currently-released versions of Vertica will never queue a query at runtime.

    (Adrian mentions "dynamic resource pools".  I can't comment on features that haven't been publicly released yet, but such a question would, hypothetically, be a very good question to ask in the context of this conversation.  The following applies to already-released Vertica versions.)

    Many SQL operations simply know up front exactly how much memory they are going to need.  So often this is a non-issue.  But there are a few common scenarios where this is not the case:

    - COPY:  Vertica stores its data sorted.  Sorting algorithms are fastest if they have lots of RAM; disk-backed sorts are slow.  And we don't always know up front exactly how much data you're going to give us.  (Remember, we want a hard upper bound, not a best guess.)  So COPY will tend to reserve as much memory as it is allowed to consume.  But what if you run a COPY, then feed it just one small row and close the statement?  We won't need most of those resources.  In this case you probably want an INSERT .. VALUES instead.

    - Say you have a free-form text field.  Maybe you make it a max-size LONG VARCHAR, since it could in theory be quite long.  Say most of your users only type 100 characters or so into the field.  But the data could in theory end up being 32,000,000 bytes...  There is, of course, much more to this story in practice.  But conceptually, that's why Vertica performs better if you can put a tighter bound on your field sizes.

    - User-Defined Functions.  Vertica can run your C++/Java/R code for you.  Arbitrary custom code can, well, do anything.  Our SDK has an API that lets a UDx specify up front the maximum amount of memory that it could use; this is used by the resource manager to reserve memory for the query.  Vertica reserves the right (though current versions don't always do this) to abort queries containing user code that exceeds what it committed to as its maximum-possible memory usage.
  • We see few  cases  where   plannedconcurrency do not have special  impact on the allocation (when analytic functions are being use ) I recommend to read this   https://community.vertica.com/vertica/topics/resource_limit_not_being_enforce_for_analytic_functions_with_order_by

     

      

  • So In theory
    If I will profile all my queries 
    I can find the real memory usage of all queries. 
    Then I will adjust the planned concurrency high enough so the query budget
    will be enough for 95% (hopefully 100%) of the queries.
    Now, what will happen if a query get less memory than it really needs (and we have free memory to give)
    1. at parse time 
    2. at run time 

    @adam you said 
    " COPY will tend to reserve as much memory as it is allowed to consume. "

    Did you mean that copy will get the memory budget all the time, even if it's really small copy ?
    What if the copy needs more memory in real time (above the budget and we have free memory) ?



    p.s

    @Adam, Can you elaborate on how Vertica calculate the query memory footprint ?


    Thanks,

    Zvika
  • Hi
    I personally don’t see until now cases where query failed with memory allocation at the runtime , Vertica will probably use its  temp area  for that cases that extra memory is needed and not available .
  • Hi Zvika,

    Vertica memory calculations are explained below:

    1 MB * Number of cores/node * Number of columns of all tables involved in the query) + ( 2 MB * Number of DVROSs).

    For example: 
    Say each node in the cluster has 24 cores.
    There are 2 tables involved in the query, each table is having 50 columns and there are 20 DVROSs 
    (i.e. there are 20 records in the delete_vectors table for the projections related to the query)

    (1 MB * 24 * 2*50 + 2MB*10 = 2400 MB) + (2 MB * 20 = 40MB) = 2440 MB = 2.38GB.

    We need 2.38 GB/node to process that query.

    I hope this helps in planning your resource pools better and also considering other factors outside resource pool parameters.

    Thanks,

    Sashi

    ~Vertica Support

  • Thanks a lot,

    Do you know also the answer to my other questions above ?
  • Hi Zvika,

    Let me summarize, so you get your answers this way.

    Planned concurrency defaults to the number of cores/node(Will take higher values when hyper threaded - Recommended only in few cases). This will help you budget the query and allow concurrent queries to use memory based on what pool you are running in. 95% means several gigs(majority) availabe and the queries should be self sufficient. (You are right about profiling before hand to learn the estimates and tuning the pools as per your needs). Of course, EXPLAIN and PROFILE tells you a lot. Keep in mind - memory estimates can go higher in PROFILE when projections require optimization on joins, group bys, stale statistics on columns/tables, broadcasting/re-segmentation due to poor design - Tons of things to do before budgeting queries/tuning pools. Rewriting sqls for better plans in some cases could be a solution as well.

    After all that optimization and tuning steps, we have some numbers in the pool.
    Now, when you run the query in the pool, It will only reserve the memory it needs based on the plan and planning steps. It will reserve certain x KBs of memory and executes the query. These estimates tend be closer to accurate figures ideally when you have done all of the performance tuning steps and with no options left. This is all good till here. There is no excessive memory reserved and not being utilized - all of the problems described in your question. 

    However, these estimates can go wrong, memory can be over-estimated(which cannot be used as we really don't know when we might need it during the query execution - due to stale stats mainly), under-estimated(gives you wrong numbers and more memory is borrowed during the query execution if required) when skipped/missed any of the tuning/optimization steps.
    Bad sqls, Poor projection design, Stale stats, DVROSs and other things will always lead to inefficient resource usage. 

    That is why we insist to run the DBD to have the best possible design for optimal performance, follow optimization steps, tune resource pools and follow best practices to avoid any issues. 

    I hope I have given all you expected.


    Thanks,

    Sashi

    ~ Vertica Support

    PS: Remember, it is always estimates and this is how it is any database. 
  • 10x for the info and sorry for the delay 

    I want to read it carefully and understand the whole picture .
    I will update once finished .


Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file