Vertica 6.1.2 memory consumption

We are migrating from Vertica 6.1 to Vertica 6.1.2, and I have observed one strange behavior. The same query on the same data mart requires around 3GB of memory in Vertica 6.1.2 and only 100MB of memory in Vertica 6.1. The only difference in hardware configuration is that the Vertica 6.1 instance has 8GB of RAM and the Vertica 6.1.2 instance has 16GB. Does anybody know whether this is caused by some configuration feature of version 6.1.2, or have another explanation for this behavior? Vertica 6.1 reports an average of 30% memory usage, while Vertica 6.1.2 reports 80%. I can't find an explanation for that, as the data mart is exactly the same on both instances. Best regards, Bartek
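
One way to compare the two instances is to watch how much memory each statement holds from its resource pool while the query runs. A minimal sketch, assuming the query runs in the built-in general pool; v_monitor.resource_acquisitions is a standard system table, though column names may vary slightly between versions:

    -- While the problem query is running, show the memory each statement
    -- currently holds from its pool (values are in KB).
    SELECT node_name,
           pool_name,
           transaction_id,
           statement_id,
           memory_inuse_kb
    FROM   v_monitor.resource_acquisitions
    WHERE  pool_name = 'general'
    ORDER  BY memory_inuse_kb DESC;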

Comments

  • Hi Bartlomiej, Hm... When you say these two data marts are exactly the same, what, precisely, have you verified to be identical? Is the data identical? Are the projections identical? Are primary keys declared identically on both databases? Do they both have up-to-date histograms and other statistics, computed with a reasonable sample size? The most common cause of this sort of issue is that Vertica chooses a different algorithm for a join or group-by operation between the two installations. The choice of algorithm is based on the information and options that the optimizer has available to it at query-plan time. You can view the query plan with the EXPLAIN directive; see the documentation for details (and the EXPLAIN sketch after these comments). There may be another reason for this sort of change on particular queries, but I'm not aware of one offhand. Adam
  • Hi Adam, Both databases are loaded from exactly the same files and have exactly the same structure (tables, column formats, etc.). On both instances all projections were created only by the Database Designer (DBD), with the same set of queries to optimize for. Statistics are up to date and were computed over the whole population (it's a small data mart, below 1GB of compressed data). I've checked one table used in a query with a group by and no join, and found that both versions have only one projection, with the same columns and sort positions. The only difference is the encoding type of 2 columns: Vertica 6.1.2 uses DELTARANGE_COMP, while Vertica 6.1 uses AUTO for the integer column and BLOCK_DICT for the decimal column (numeric(17,4)). The other 18 columns have exactly the same configuration in both projections. The only difference in the query plan is the "LOCAL RESEGMENT GROUPS" statement and a higher cost (33K vs 27K): Vertica 6.1.2 shows "GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 33K, Rows: 17K] (PATH ID: 3)", while Vertica 6.1 shows "GROUPBY HASH [Cost: 27K, Rows: 29K] (PATH ID: 3)". I can't find any reason for such different memory requirements; in particular, the requirement of around 3GB of memory seems strange when "used_bytes" for the tested table is around 100MB (as reported by the column_storage table; see the catalog-query sketch after these comments). Bartek
  • Hi Bartek, Hm... Two more questions: Do the two databases have the same resource-pool and local-segmentation configurations? It sounds like something, either a configuration difference or a change in 6.1.2, has enabled the newer database to take advantage of local segmentation. If you're not familiar with it, local segmentation is where we break the data up into multiple streams, typically one per CPU core, and process each stream in parallel. (You can find more about local segmentation, and how to enable/disable it, in our documentation.) We don't always know the exact data distribution in advance; there's some probability that most or all of the records that you care about will end up in one thread's aggregator, so we allocate conservatively for each worker to avoid the risk of swapping in the worst case. This sort of change can happen very easily with small tweaks to the queries that you're running. Vertica's Resource Pools mechanism governs how much memory a query is allowed to use; you can decrease query memory usage with resource pools (see the resource-pool sketch after these comments). (Vertica is designed to run big queries fast, so queries will use as much RAM as they think is useful, up to their resource pool's limit.) Less than 1GB compressed is a very small data size for Vertica. (Note that our free tier is up to 1TB raw. You can certainly use us with much smaller data, but you're not hitting Vertica's sweet spot :-) ) If that's your normal working set, you might actually see some performance increases in general (as well as decreased RAM usage) by tweaking your parallelism/concurrency settings to decrease the number of threads that are used per query. If you're only looking at a few thousand or million rows, sometimes it's faster for Vertica to just churn through them than to set up a heavyweight parallelism infrastructure tuned for billions or trillions of rows. Adam
  • Adam, is there any mechanism to copy statistics between databases? Oracle has such an ability, and it becomes very useful when you want to test how your system behaves in a non-production environment that only holds a partial data set. I think such an ability would be very useful for debugging a case like this one.
  • Hey Eli, I can't say I've used this personally, but the documentation does discuss how to export and import statistics: https://my.vertica.com/docs/6.1.x/HTML/index.htm#11890.htm https://my.vertica.com/docs/6.1.x/HTML/index.htm#14013.htm (there's a short usage sketch after these comments). Adam
  • Hi Adam, Both instances use the general pool. I think that version 6.1.2 has local segments enabled; I don't know how to check this parameter. Following your suggestions I've set EXECUTIONPARALLELISM to 1 and increased PLANNEDCONCURRENCY (to more than cores and memory/2GB), which decreased the memory consumption per query (roughly along the lines of the resource-pool sketch after these comments). Thanks for the advice. Bartek
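
A minimal sketch of the plan and statistics checks Adam suggests above. The table my_schema.my_fact and the group-by query are placeholders, not the actual objects from this thread; EXPLAIN and ANALYZE_STATISTICS are standard Vertica statements:

    -- Show the optimizer's plan on each instance and compare the chosen
    -- join/group-by algorithms and any RESEGMENT steps.
    EXPLAIN SELECT some_key, SUM(some_measure)
    FROM my_schema.my_fact
    GROUP BY some_key;

    -- Recompute histograms over the full table so both optimizers plan
    -- from equivalent statistics (the function returns 0 on success).
    SELECT ANALYZE_STATISTICS('my_schema.my_fact');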
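
A catalog-query sketch for the projection and storage comparison described above, using the same placeholder table name; the column names reflect the v_catalog.projection_columns and v_monitor.column_storage system tables and may differ slightly by version:

    -- Compare encoding and sort order of the projection columns on each instance.
    SELECT projection_name,
           projection_column_name,
           encoding_type,
           sort_position
    FROM   v_catalog.projection_columns
    WHERE  table_name = 'my_fact'
    ORDER  BY projection_name, sort_position;

    -- Compressed bytes actually stored for the table (the ~100MB figure above).
    SELECT anchor_table_name,
           SUM(used_bytes) AS used_bytes
    FROM   v_monitor.column_storage
    WHERE  anchor_table_name = 'my_fact'
    GROUP  BY anchor_table_name;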
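
A short usage sketch for the statistics export/import Adam links above; the file path is just an example:

    -- On the source database: write all optimizer statistics to an XML file.
    SELECT EXPORT_STATISTICS('/tmp/stats.xml');

    -- On the target database: load the exported statistics.
    SELECT IMPORT_STATISTICS('/tmp/stats.xml');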
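
A resource-pool sketch along the lines of the tuning described in the last comment, applied to the general pool. The PLANNEDCONCURRENCY value of 16 is only an illustrative number; pick it per host as discussed above:

    -- Lower per-query parallelism and raise planned concurrency so the pool
    -- budgets less memory for each individual query.
    ALTER RESOURCE POOL general
          EXECUTIONPARALLELISM 1
          PLANNEDCONCURRENCY 16;

    -- Verify the settings now stored in the catalog.
    SELECT name,
           plannedconcurrency,
           executionparallelism
    FROM   v_catalog.resource_pools
    WHERE  name = 'general';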
