Hung task timeout error
Why am I seeing this error?
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
vertica D 0000000000000002 0 60068 1 0x00000080
ffff88024d9e7c98 0000000000000082 ffffffff81ed09f0 ffff882012336ae0
ffff88024d9e7c68 ffffffff810aa570 ffff88024d9e7ca0 ffff882012336ae0
ffff882012337098 ffff88024d9e7fd8 000000000000fb88 ffff882012337098
Call Trace:
[<ffffffff810aa570>] ? exit_robust_list+0x90/0x160
[<ffffffff81072f95>] exit_mm+0x95/0x180
[<ffffffff810733df>] do_exit+0x15f/0x870
[<ffffffff81063340>] ? wake_up_state+0x10/0x20
[<ffffffff81073b48>] do_group_exit+0x58/0xd0
[<ffffffff81088e16>] get_signal_to_deliver+0x1f6/0x460
[<ffffffff8100a265>] do_signal+0x75/0x800
[<ffffffff81435b25>] ? sys_sendto+0x185/0x190
[<ffffffff8100bbee>] ? invalidate_interrupt1+0xe/0x20
[<ffffffff8100bc2e>] ? invalidate_interrupt3+0xe/0x20
[<ffffffff8100bbce>] ? invalidate_interrupt0+0xe/0x20
[<ffffffff810ace0b>] ? sys_futex+0x7b/0x170
[<ffffffff8100aa80>] do_notify_resume+0x90/0xc0
[<ffffffff8100b341>] int_signal+0x12/0x17
Comments
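This problem occurs when a node's resources are out of balance. The node may have a lot of memory and CPU relative to its disk throughput, so the disks cannot keep up with the data being written. Processes can then stay blocked in uninterruptible sleep (the D state shown in the trace) for longer than the kernel's hung-task timeout, which triggers this message. All Vertica nodes must have an appropriate balance of CPU, RAM, disk throughput, and network bandwidth.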
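To see why a lot of RAM paired with slow disks is a problem, here is a rough back-of-the-envelope sketch in Python. It estimates how long it would take to write out the maximum amount of dirty (not yet written) page-cache data allowed by the Linux vm.dirty_ratio setting, at a given sustained disk throughput. It assumes the ratio applies to total RAM (the kernel actually applies it to dirtyable memory, so this is only an approximation), and the node sizes are hypothetical. The function names are illustrative, not part of any Vertica or kernel tool.

```python
# Back-of-the-envelope estimate only; not a model of the kernel's writeback logic.
# Assumption: vm.dirty_ratio is treated as a percentage of total RAM. The kernel
# really applies it to dirtyable (free + reclaimable) memory.

DEFAULT_HUNG_TASK_TIMEOUT_S = 120  # usual kernel default for hung_task_timeout_secs


def dirty_limit_bytes(ram_bytes: int, dirty_ratio_pct: int) -> int:
    """Approximate amount of dirty data allowed before writers are throttled."""
    return ram_bytes * dirty_ratio_pct // 100


def flush_seconds(ram_bytes: int, dirty_ratio_pct: int, disk_bytes_per_s: int) -> float:
    """Seconds needed to write the whole dirty backlog at a sustained disk rate."""
    return dirty_limit_bytes(ram_bytes, dirty_ratio_pct) / disk_bytes_per_s


if __name__ == "__main__":
    GiB = 1024 ** 3
    MiB = 1024 ** 2
    # Hypothetical node: 256 GiB RAM, dirty_ratio of 20, 200 MiB/s sustained writes.
    secs = flush_seconds(256 * GiB, 20, 200 * MiB)
    print(f"worst-case flush: {secs:.0f} s")  # prints "worst-case flush: 262 s"
    if secs > DEFAULT_HUNG_TASK_TIMEOUT_S:
        print("backlog could outlast the default 120 s hung-task timeout")
```

With these hypothetical numbers, roughly 51 GiB of dirty data could accumulate, and draining it at 200 MiB/s takes longer than the default 120-second timeout. Either faster disks or lower dirty ratios shrink that window.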
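To avoid this problem, tune the Linux vm.dirty_ratio and vm.dirty_background_ratio parameters. These control how much dirty (not yet written) page-cache data can accumulate before background writeback starts and before writing processes are throttled. In addition, set kernel.hung_task_panic to 0 so that a hung task is only reported in the log instead of causing a kernel panic.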
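Before tuning, it helps to check the current values on each node. The following is a minimal Python sketch that assumes a Linux host exposing these settings under /proc/sys. The function names are hypothetical. It only reports values and flags a nonzero kernel.hung_task_panic. It does not suggest numbers for the dirty ratios, which should come from the Knowledge Base article. Changes themselves are usually made with `sysctl -w` and persisted in /etc/sysctl.conf.

```python
import os
from typing import Dict, List, Optional

# Settings mentioned in the answer, plus the timeout that triggers the message.
PARAMS = (
    "vm.dirty_ratio",
    "vm.dirty_background_ratio",
    "kernel.hung_task_panic",
    "kernel.hung_task_timeout_secs",
)


def read_sysctls(names=PARAMS, root: str = "/proc/sys") -> Dict[str, Optional[int]]:
    """Map each sysctl name to its integer value, or None if it can't be read."""
    values: Dict[str, Optional[int]] = {}
    for name in names:
        # "vm.dirty_ratio" -> /proc/sys/vm/dirty_ratio
        path = os.path.join(root, *name.split("."))
        try:
            with open(path) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def problems(values: Dict[str, Optional[int]]) -> List[str]:
    """List unreadable settings and a hung_task_panic that is not 0."""
    issues = []
    for name, value in values.items():
        if value is None:
            issues.append(f"{name}: could not read")
    panic = values.get("kernel.hung_task_panic")
    if panic is not None and panic != 0:
        issues.append("kernel.hung_task_panic is not 0, so a hung task panics the kernel")
    return issues


if __name__ == "__main__":
    current = read_sysctls()
    for name, value in current.items():
        print(f"{name} = {value}")
    for issue in problems(current):
        print("WARNING:", issue)
```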
For details, see "Tuning Linux Dirty Data Parameters for Vertica" in the Vertica Knowledge Base on the Developer Community.