Tuple mover aborts background repartitioning in a hard-to-detect infinite loop
Hi,
Yesterday I partitioned a ~10 TB table:
alter table xxx.yyy partition by ccc group by calendar_hierarchy_day(ccc,2,2);
alter table xxx.yyy reorganize;
Background repartitioning operation started.
This morning I checked and found a bunch of null values in partition_key in the partitions system table.
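For anyone who wants to run the same check, here is a rough sketch of the query. It assumes the projections of the table start with the table name (yyy here, the placeholder from this post), which is the default naming but may not hold on your cluster.

```sql
-- Sketch only: count ROS containers whose partition key is still NULL.
-- Assumption: projection names start with the table name 'yyy'.
SELECT table_schema,
       projection_name,
       COUNT(*)           AS ros_containers,
       SUM(ros_row_count) AS row_count
FROM   v_monitor.partitions
WHERE  table_schema = 'xxx'
  AND  projection_name ILIKE 'yyy%'
  AND  partition_key IS NULL
GROUP  BY table_schema, projection_name
ORDER  BY row_count DESC;
```

If the counts do not shrink over time, the background repartitioning is probably not making progress.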
A little digging showed the tuple mover aborting on partitioning with the error:
Execution time exceeded run time cap of 00:30
Yes, that is the runtime cap of my personal admin account, which has the pseudosuperuser role.
Problem: Vertica seems to be cycling in an infinite loop, trying to partition and aborting again and again.
That is definitely not optimal behaviour and should be addressed. Repartitioning is a resource-intensive operation and affects cluster throughput, and the loop is not obvious to detect.
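One way to spot the loop is to look for repeated aborts on the same table in tuple_mover_operations. This is a sketch, not a verified recipe: the exact operation_status text ('Abort') is my assumption from the documented status values, so the filter uses a loose match.

```sql
-- Sketch only: list recent aborted tuple mover operations for the table.
-- Assumption: aborted operations are reported with a status starting with 'Abort'.
SELECT operation_start_timestamp,
       node_name,
       operation_name,
       operation_status,
       session_id
FROM   v_monitor.tuple_mover_operations
WHERE  table_schema = 'xxx'
  AND  table_name   = 'yyy'
  AND  operation_status ILIKE 'abort%'
ORDER  BY operation_start_timestamp DESC
LIMIT  50;
```

Aborts on the same table, spaced roughly one runtime cap apart and each with a different session_id, would match the behaviour described here.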
It looks like the background repartitioning task runs with superuser privileges but inherits the resource pool of the user who started the operation. At the very least, this is worth mentioning in the docs.
The tuple mover session is not listed in sessions. Each iteration of background repartitioning runs under a different session. Fortunately, close_session kills the background repartitioning task for good.
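A minimal sketch of that step follows. The session id is a placeholder you have to fill in yourself. Taking it from the session_id column of tuple_mover_operations is my assumption about where to find it, since the session does not show up in sessions.

```sql
-- Sketch only: close the session of the current repartitioning iteration.
-- '<session_id>' is a placeholder, not a real value.
SELECT CLOSE_SESSION('<session_id>');
```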
It looks like the solution is to set the resource pool in your session and do foreground partitioning with partition_table.
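Roughly like this. The pool name repartition_pool is hypothetical: use any pool your user has USAGE on and whose limits allow a long-running job.

```sql
-- Sketch only: switch the session to a pool suited to a long job,
-- then repartition in the foreground.
-- 'repartition_pool' is a hypothetical pool name.
SET SESSION RESOURCE_POOL = repartition_pool;

-- Runs in this session and returns when partitioning is done.
SELECT PARTITION_TABLE('xxx.yyy');
```

The call blocks until it finishes, so run it from a session that will stay connected.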
It is possible that setting the resource pool in your session and then starting background repartitioning would work as well. That would be a time-consuming exercise, and I did not check.
Worth mentioning: I have seen the tuple mover loop on mergeout aborts a few times in the past. This was the so-called January 1st problem, when many tables start creating yearly partitions. Vertica insisted on running several huge yearly mergeouts in parallel and ran out of disk space. That was easy to detect, because Vertica was screaming about running out of disk space. The problem appears to have been fixed somewhere around v11 or v12; this year January 1st was quiet.
Also, null values in partition_key in partitions are not good and can cause some strange effects.
Thank you
Sergey
Best Answer
mosheg Vertica Employee Administrator
Thank you Sergey,
Have you opened a support case regarding this issue?
Answers
Hi @mosheg
This issue is not hard for me to work around, so I am not looking for help from support.
I hope I explained it well enough that you can open an internal JIRA with engineering.
Working with support would be very time-consuming. You would save me a lot of time if you could open the JIRA yourself.