Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!
Is there any possibility of creating hierarchical partitioning using a non-date type column
We have some complex requirements to partition the tables based on some ID column and grouped every few IDs (say 10) into one partition. But this limits to Max ROS containers count per projection.
Is there any way of creating hierarchical partitioning using a non-date type column?
Leave a Comment
Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.
I would like to better understand your requirement as partitioning by ID's over date. Can you please open Support case mark it as for Shrirang Kamat . I am worried that you may be running into other issues with mergeout etc when loading into ID's
I don't think Hierarchical Partitioning cares. You certainly don't have to use the calendar_days table to assist. By default, the syntax just uses basic CASE statements. The calendar_days usage is just an obfuscation to simplify the syntax. I've done hierarchical partitioning on LINUX epochs, for example. It just requires the necessary math to convert that into some sort of date, in order to establish the groupings. But the underlying partition syntax in that example was on an integer, not a date (it was just date-like).
Having said that, it's generally not value-added to partition on things that aren't date (or date-like) because you run the risk of taxing the mergeout process behind the scenes. Vertica keeps tabs on something called the "active partition". In a normal workflow, if I have a weeks' worth of data, and I partition by day, and I loaded data in real-time, then I'm only ever really loading data into the last partition. The partition that's created last is known as the active partition. When the mergeout runs, in order to consolidate data into fewer ROS containers, it tends to ignore the active partition, because it knows it's getting loaded into. Think of it like sleeping in your bed, waking up at 2 o'clock to use the bathroom, deciding to make your bed. There's no point in that, because you're just going to sleep in it again.
In this example, only the last partition (the active one) is really getting data loaded into it, and the inactive partitions can be merged down into a single ROS (generally).
With an int-based scheme for partitioning, it creates this counter-intuitive situation where ALL partitions could be recently loaded. and even though the mergeout will believe that inactive partitions (all the ones that aren't the active one) should be merged down to a single ROS container. That might be difficult if I'm constantly loading data into them.
Some clients have done this kind of thing. But then to compensate for it, they will either A) turn mergeout off (dangerous!), or alter the ActivePartitionCount (the number of partitions Vertica considers to be active at any given moment) to either the # of partitions, or to 0. If you set it to zero, then Mergeout tries to merge down every partition in the system. If you set it to the # of partitions, then mergeout tends to ignore everything (until it has to deal with it). Neither is ideal, and both are "hacks" designed to force the system into a weird non-standard behavior.
So, ultimately, I'd recommend against it, unless you've thought through all the ramifications. When you say you have a "complex requirement" for this, is it because you wish to use Vertica's COPY_ or MOVE_Partition functions for some reason? What requirement is driving this desire? Perhaps there's a more agreeable way to achieving that.
If you use the CALENDAR_HIERARCHY_DAY function for the GROUP BY clause it turns into
(CASE WHEN ("datediff"('year', foo.RequestDate, ((now())::timestamptz(6))::date) >= 2) THEN (date_trunc('year', foo.RequestDate))::date WHEN ("datediff"('month', foo.RequestDate, ((now())::timestamptz(6))::date) >= 6) THEN (date_trunc('month', foo.RequestDate))::date ELSE foo.RequestDate END)
if you check your table definitions.
So if you can come up with a CASE statement for your partition clause I think it is possible.
Thanks all for your suggestion. I will raise a support case to better discuss about the "Complex" requirements.