Memory issue occurring repeatedly on a particular node with same spec as other nodes
Below is what the log typically looks like before a crash. Any thoughts?
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
DETAIL: Size 536870920: 0 on free list; 0 still in use (0 bytes)
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2016-02-15 11:49:21.387 unknown:0x7fea8c0ae700 [Init] <INFO> Global pool memory usage: NewPool(0x4735320) 'GlobalPool': totalDtors 0 totalSize 132120576 (54010720 unused) totalChunks 6
2016-02-15 11:49:21.387 unknown:0x7fea8c0ae700 [Init] <INFO> SAL global pool memory usage: NewPool(0x4725380) 'SALGlobalPool': totalDtors 0 totalSize 2097152 (1523720 unused) totalChunks 1
2016-02-15 11:49:21.387 unknown:0x7fea8c0ae700 [Init] <INFO> SS::stopPoller()
2016-02-15 11:49:21.387 unknown:0x7fea8c0ae700 [Init] <INFO> DC::shutDown()
2016-02-15 11:49:21.387 unknown:0x7fea8c0ae700 [Init] <INFO> Shutdown complete. Exiting.
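The excerpt repeats the same few counters many times. A short script can tally them across a whole vertica.log so you don't have to read them line by line. This is a minimal sketch. It assumes the FileCache DETAIL lines and the [SAL] LRU lines are formatted exactly as in the excerpt above, which may not hold for other Vertica versions. The function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Patterns assume the exact line layout shown in the excerpt above;
# other Vertica versions may format these messages differently.
FILECACHE_DETAIL = re.compile(
    r"DETAIL: Size (\d+): (\d+) on free list; (\d+) still in use \((\d+) bytes\)"
)
LRU_USAGE = re.compile(
    r"\[SAL\] <INFO> (Typical|Large) LRU usage: (\d+) free (\d+) in use"
)


def summarize(lines):
    """Tally FileCache DETAIL lines and SAL LRU lines from an iterable of log lines."""
    in_use_bytes = 0
    detail_lines = Counter()  # allocator size -> number of DETAIL lines seen
    lru_in_use = Counter()    # "Typical"/"Large" -> summed "in use" count
    for line in lines:
        m = FILECACHE_DETAIL.search(line)
        if m:
            size, _free, _used, nbytes = map(int, m.groups())
            detail_lines[size] += 1
            in_use_bytes += nbytes
            continue
        m = LRU_USAGE.search(line)
        if m:
            lru_in_use[m.group(1)] += int(m.group(3))
    return in_use_bytes, dict(detail_lines), dict(lru_in_use)
```

For example, `with open("vertica.log") as f: print(summarize(f))`, with the path adjusted to the node's catalog directory, prints the total bytes still in use, how many DETAIL lines appeared per allocator size, and the summed LRU in-use counts.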
Comments
Hi,
Is the cluster up? Can you access the database? Can you share the output of
select ahm_epoch, current_epoch, last_good_epoch from system;
Sruthi
ahm_epoch | current_epoch | last_good_epoch
-----------+---------------+-----------------
684168 | 684169 | 684168
Hi,
Were any changes made at the OS level, such as limits, or anything else related to this?
The machines should be exactly the same. The ulimit settings on all three machines are identical, and the machines were all built on AWS in the same way. The only significant difference between this machine and the other two is that it handles substantially more of the reading and writing of rows in the database.
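To confirm that the OS limits really match, you can dump them on each node and compare the results. This is a minimal sketch using Python's standard `resource` module. The set of limits checked is an illustrative choice, not a Vertica checklist, and `current_limits`/`diff_limits` are hypothetical helper names. Limits are per process. Run the script as the database user in the same login context that starts Vertica. On Linux, `/proc/<pid>/limits` shows the limits of the running Vertica process itself.

```python
import resource

# Illustrative selection of limits to compare between nodes.
LIMITS = {
    "RLIMIT_NOFILE": "open files",
    "RLIMIT_NPROC": "max user processes",
    "RLIMIT_AS": "virtual memory",
    "RLIMIT_DATA": "data segment size",
    "RLIMIT_STACK": "stack size",
}


def fmt(value):
    """Render a limit value the way `ulimit` does for infinity."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for the limits available on this platform."""
    out = {}
    for name in LIMITS:
        const = getattr(resource, name, None)  # not every limit exists on every OS
        if const is not None:
            soft, hard = resource.getrlimit(const)
            out[name] = (fmt(soft), fmt(hard))
    return out


def diff_limits(a, b):
    """Names whose (soft, hard) pair differs between two nodes' reports."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


for name, (soft, hard) in current_limits().items():
    print(f"{name:14} ({LIMITS[name]}): soft={soft} hard={hard}")
```

Collect `current_limits()` from each node, for example by printing it as JSON. `diff_limits(node1, node3)` then lists only the entries that differ.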
Hi,
Do all of the nodes have the same server hardware spec? Are any processes other than Vertica running on this node? It looks like some process is eating your file system cache.
Vertica makes heavy use of the file system cache.
You should monitor your file system cache using the method below:
watch -n 2 free
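If you want samples that can be logged and later lined up with the crash timestamps in vertica.log, the same check can be scripted. This is a minimal sketch that assumes a Linux host. It reads the Buffers and Cached counters (in kB) directly from /proc/meminfo, which is where `free` gets its figures. The function names and the sampling parameters are hypothetical.

```python
import time

# Fields of interest in /proc/meminfo (Linux only); values are reported in kB.
FIELDS = ("MemTotal", "MemFree", "Buffers", "Cached")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def snapshot(path="/proc/meminfo"):
    """Return the FIELDS of interest from one read of /proc/meminfo."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {k: info.get(k) for k in FIELDS}


def watch(interval=2.0, count=None):
    """Print one timestamped sample every `interval` seconds, `count` times (forever if None)."""
    n = 0
    while count is None or n < count:
        sample = snapshot()
        print(time.strftime("%Y-%m-%d %H:%M:%S"),
              "  ".join(f"{k}={v} kB" for k, v in sample.items()), flush=True)
        n += 1
        if count is None or n < count:
            time.sleep(interval)
```

For example, calling `watch(2)` and redirecting the output to a file leaves a timestamped record. If Cached shrinks sharply on this node before a crash, that supports the cache-pressure theory, and a stable value argues against it.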
I hope you will find it useful
Thanks