Setting memory limits for ENABLE_JOIN_SPILL

Shyam_KattiShyam_Katti Registered User
We have a SQL query that takes upto 30GB of memory and runs for 20 minutes without the ENABLE_JOIN_SPILL and approx. 2 GB and 2 hours when the ENABLE_JOIN_SPILL is turned on. Is it possible to change the memory requirements when a query runs with the ENABLE_JOIN_SPILL turned on? We would like to bump it up to balance out speed and memory consumption.

Comments

  • You could run the query against a different resource pool. For details, see the "Managing Workloads" section of the administrator's guide: https://my.vertica.com/docs/CE/6.0.1/PDF/Administrator's%20Guide.pdf Note that allowing more memory consumption has implications for other queries. It's a bit tricky; again, see that document for further details. In short, if you guarantee more memory to a query (rather than just letting it take memory from the system as it's available, as happens with the base no-spill case), then it can use a different, smarter algorithm; but even if it doesn't use all of that memory in practice, Vertica has still reserved it, so other queries can't use it.
  • Shyam_KattiShyam_Katti Registered User
    Thanks Adam for your suggestion. We did create a pool as follows: create resource pool sampleonly memorysize '20G' maxmemorysize '20G' queuetimeout none plannedconcurrency 1 singleinitiator false maxconcurrency 1;<CODE> We plan to have only 1 SQL query being executed in this pool at any point in time. When we run the SQL, it only takes 10.2G. SHOW MEMORYCAP query indicates the session memory is UNLIMITED. Is there any way to make the query use the entire MEMORYSIZE of the pool, and not borrow anything from the GENERAL pool?
  • Well, the spilling join algorithm is a different algorithm from the in-memory join. (Specifically, you're going from a hash join to a sort-merge join.) It's not like a continuous slider where, by giving the JOIN more memory, you asymtotically approach the same logic (and performance) as the fully in-memory case. Beyond a point, you can't just give the merge algorithm more memory; it wouldn't know what to do with it. I don't know your particular query; but typically, the best way to really speed up this type of query is to create a set of projections that are sorted such that you can do a merge without sorting first. This is the fastest kind of join, and it uses very little memory. The Database Designer will do this for you if it is possible. (With more-complicated queries it's not always an option, if you would need multiple mutually-exclusive sort orders or if you're joining on a complex expression and don't want to pre-compute that expression and save it to a column.) The merge-join algorithm is multi-threaded; each thread can use up to a certain amount of memory. So make sure to adjust the system so as to maximize the computed AUTO value for EXECUTIONPARALLELISM. (You can manually increase the value bu that's not likely to work well.)

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file