Memory issue occurring repeatedly on one particular node with the same spec as the other nodes

Below is what the log typically looks like before a crash.  Any thoughts?

 

 

 

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:

        DETAIL:  Size 536870920: 0 on free list; 0 still in use (0 bytes)

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:

        DETAIL:  Size 1073741832: 0 on free list; 0 still in use (0 bytes)

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:

        DETAIL:  Size 1073741832: 0 on free list; 0 still in use (0 bytes)

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:

        DETAIL:  Size 1073741832: 0 on free list; 0 still in use (0 bytes)

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:

        DETAIL:  Size 1073741832: 0 on free list; 0 still in use (0 bytes)

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:

        DETAIL:  Size 1073741832: 0 on free list; 0 still in use (0 bytes)

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:

        DETAIL:  Size 1073741832: 0 on free list; 0 still in use (0 bytes)

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:

        DETAIL:  Size 1073741832: 0 on free list; 0 still in use (0 bytes)

2016-02-15 11:49:21.371 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:

        DETAIL:  Size 1073741832: 0 on free list; 0 still in use (0 bytes)

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use

2016-02-15 11:49:21.372 unknown:0x7fea8c0ae700 [SAL] <INFO> Large LRU usage: 0 free 0 in use

2016-02-15 11:49:21.387 unknown:0x7fea8c0ae700 [Init] <INFO> Global pool memory usage: NewPool(0x4735320) 'GlobalPool': totalDtors 0 totalSize 132120576 (54010720 unused) totalChunks 6

2016-02-15 11:49:21.387 unknown:0x7fea8c0ae700 [Init] <INFO> SAL global pool memory usage: NewPool(0x4725380) 'SALGlobalPool': totalDtors 0 totalSize 2097152 (1523720 unused) totalChunks 1

2016-02-15 11:49:21.387 unknown:0x7fea8c0ae700 [Init] <INFO> SS::stopPoller()

2016-02-15 11:49:21.387 unknown:0x7fea8c0ae700 [Init] <INFO> DC::shutDown()

2016-02-15 11:49:21.387 unknown:0x7fea8c0ae700 [Init] <INFO> Shutdown complete. Exiting.

Comments

  • SruthiA Administrator

    Hi,

     

    Is the cluster up? Can you access the database? Can you share the output of

     

    select ahm_epoch, current_epoch, last_good_epoch from system;

     

     

    Sruthi

  •  ahm_epoch | current_epoch | last_good_epoch 

    -----------+---------------+-----------------

        684168 |        684169 |          684168

  • Hi,

    Were there any changes made at the OS level, such as limits or anything related?

     

     

  • The machines should be exactly the same.  The ulimit settings on all three machines are the same, and the machines were all created in the same manner when they were built on AWS.  The only major difference between this machine and the other two is that it is used substantially more for reading and writing rows in the db.
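
One way to double-check that the limits really match is to capture `ulimit -a` on each node and diff the results. Below is a minimal sketch in Python, assuming the usual bash `ulimit -a` line layout (name, then the unit and flag in parentheses, then the value). The node names and values in the sample are made up for illustration and are not taken from this cluster.

```python
import re

# Matches bash `ulimit -a` lines such as:
#   open files                      (-n) 1024
#   core file size          (blocks, -c) 0
LIMIT_LINE = re.compile(r"^(.+?)\s+\((.+?)\)\s+(\S+)$")


def parse_ulimit(text):
    """Turn `ulimit -a` output into {limit name: value string}."""
    limits = {}
    for line in text.splitlines():
        match = LIMIT_LINE.match(line.strip())
        if match:
            limits[match.group(1)] = match.group(3)
    return limits


def differing_limits(per_node):
    """Return {limit name: {node: value}} for limits that are not identical on every node."""
    parsed = {node: parse_ulimit(text) for node, text in per_node.items()}
    names = set().union(*(p.keys() for p in parsed.values()))
    diffs = {}
    for name in sorted(names):
        values = {node: p.get(name) for node, p in parsed.items()}
        if len(set(values.values())) > 1:
            diffs[name] = values
    return diffs


# Hypothetical captured output; replace with the real text from each node.
SAMPLE_OUTPUTS = {
    "node_a": "open files                      (-n) 1024\nmax user processes              (-u) 4096",
    "node_b": "open files                      (-n) 65536\nmax user processes              (-u) 4096",
}

if __name__ == "__main__":
    for name, values in differing_limits(SAMPLE_OUTPUTS).items():
        print(name, values)
```

An empty result means the captured limits are identical. Note that `ulimit -a` reports the limits of the shell it runs in, so it should be run as the same user that starts the database.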

  •  

    Hi,

    Do you have the same server hardware spec on each of the nodes? Are any processes other than Vertica running on this node? It looks like some process is eating your file system cache.

     

    Vertica makes heavy use of the file system cache.
    You should monitor your file system cache with the command below:
    watch -n 2 free

     

     

    I hope you will find it useful

     

    Thanks 
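
The file system cache check suggested above can also be scripted, which makes it easier to log the numbers over time and compare nodes. Below is a minimal sketch in Python, assuming a Linux host where `/proc/meminfo` is available. The function names and the sample figures are illustrative only and do not come from this cluster.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def cache_fraction(info):
    """Fraction of total RAM held as file system cache (Cached + Buffers)."""
    return (info.get("Cached", 0) + info.get("Buffers", 0)) / info["MemTotal"]


# Illustrative numbers, used only when /proc/meminfo cannot be read.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:         1000000 kB
Buffers:          500000 kB
Cached:          7500000 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE
    info = parse_meminfo(text)
    print(f"file system cache: {cache_fraction(info):.1%} of RAM")
```

Running this periodically on all three nodes and comparing the output should show whether the cache on the problem node behaves differently from the others.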

     
