Unable to restart Vertica cluster

Hi,

 

I have a 3-node AWS cluster that suddenly shut down. I am unable to find the root cause of the Vertica cluster going down. Please help me.

 

When I tried to restart, I saw the log below:

 

2015-07-01 04:26:16.172 Timer Service:0x5f185c0 <LOG> @v_est_node0001: 00000/5021: Timer service done; closing session
2015-07-01 04:26:16.173 Spread Mailbox Dequeue:0x5edac60 [Comms] <INFO> Spread dequeue thread exiting
2015-07-01 04:26:16.173 Main:0x4fb3900 [Comms] <INFO> stop: disconnecting #node_a#N010050013222 from spread daemon
2015-07-01 04:26:16.173 Main:0x4fb3900 [Comms] <INFO> connected: false
2015-07-01 04:26:16.173 Main:0x4fb3900 [Comms] <INFO> DB Group changed
2015-07-01 04:26:16.173 Main:0x4fb3900 [VMPI] <INFO> DistCall: Set current group members called with 0 members
2015-07-01 04:26:16.173 Spread Client:0x5d3ad00 [Comms] <WARNING> error SP_receive: Illegal spread was provided
2015-07-01 04:26:16.173 Spread Client:0x5d3ad00 [Comms] <INFO> spread thread exiting
2015-07-01 04:26:16.173 Main:0x4fb3900 [VMPI] <INFO> Ending session prdae-vtc22e-10593:0x78 due to loss of 45035996273718950
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> nodeSetNotifier: node v_est_node0002 left the cluster
2015-07-01 04:26:16.174 Main:0x4fb3900 [Recover] <INFO> Node left cluster, reassessing k-safety...
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> nodeSetNotifier: node v_est_node0001 left the cluster
2015-07-01 04:26:16.174 Main:0x4fb3900 [Recover] <INFO> Node left cluster, reassessing k-safety...
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> Lost membership of the DB group
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> Removing #node_a#N010050013222->v_est_node0001 from processToNode and other maps due to departure from Vertica:all
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> nodeToState map:
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> Removing #node_b#N010050013043->v_est_node0002 from processToNode and other maps due to departure from Vertica:all
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> nodeToState map:
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> Lost membership of V:All
2015-07-01 04:26:16.174 DistCall Dispatch:0x7f14b4002c30-b0000002467ab6 [Txn] <INFO> Rollback Txn: b0000002467ab6 'rebalance_cluster(background)'
2015-07-01 04:26:16.674 Main:0x4fb3900 [Recover] <INFO> Moving-out all projections for node
2015-07-01 04:26:16.675 Main:0x4fb3900 [Txn] <INFO> Begin Txn: a00000004780ef 'Recovery: Analyze move-out'
2015-07-01 04:26:16.676 Main:0x4fb3900 [Txn] <INFO> Starting Commit: Txn: a00000004780ef 'Recovery: Analyze move-out'
2015-07-01 04:26:16.676 Main:0x4fb3900 [Txn] <INFO> Commit Complete: Txn: a00000004780ef at epoch 0xf9156a
2015-07-01 04:26:16.676 Main:0x4fb3900 [Txn] <INFO> Begin Txn: a00000004780f0 'Recovery: Update CPEs'
2015-07-01 04:26:16.676 Main:0x4fb3900 [Txn] <INFO> Rollback Txn: a00000004780f0 'Recovery: Update CPEs'
2015-07-01 04:26:16.676 Main:0x4fb3900 [Recover] <INFO> Node move-out complete. Last good epoch=0xf91569
2015-07-01 04:26:16.677 Main:0x4fb3900 [Main] <INFO> Writing epoch=0xf91569, ending at '2015-06-30 20:10:50.486351-04', catalog version=0x10cda75, K-safety=1, AHM=0xf8caa9, ending at '2015-06-30 15:44:05.732736-04', to epoch log file [/vertica/data/est/v_est_node0001_catalog/Epoch.log]
2015-07-01 04:26:16.677 Main:0x4fb3900 [Shutdown] <INFO> Shutting down node
2015-07-01 04:26:16.677 Main:0x4fb3900 [Init] <INFO> Stopping Executor service
2015-07-01 04:26:16.677 Main:0x4fb3900 [Comms] <INFO> Stopping spread monitoring
2015-07-01 04:26:16.677 Main:0x4fb3900 [Init] <INFO> Stopping thread manager
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [Init] <INFO> Uninitializing storage
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool general -  Queries: 10000 Threads: 10630 File Handles: 53939 Memory(KB): 27018158
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool sysquery -  Queries: 10000 Threads: 10655 File Handles: 54069 Memory(KB): 27083694
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool sysdata -  Memory(KB): 1048576
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool wosdata -  Memory(KB): 2097152
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool tm -  Queries: 3 Threads: 10710 File Handles: 54346 Memory(KB): 27222958
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool refresh -  Queries: 10000 Threads: 10630 File Handles: 53939 Memory(KB): 27018158
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool recovery -  Queries: 3 Threads: 10630 File Handles: 53939 Memory(KB): 27018158
2015-07-01 04:26:16.679 unknown:0x7f1587601700 [ResourceManager] <INFO> pool dbd -  Queries: 10000 Threads: 10630 File Handles: 53939 Memory(KB): 27018158
2015-07-01 04:26:16.679 unknown:0x7f1587601700 [ResourceManager] <INFO> pool jvm -  Queries: 10000 Threads: 1077 File Handles: 5468 Memory(KB): 2739089
2015-07-01 04:26:16.679 unknown:0x7f1587601700 [Init] <INFO> Dumping out open file descriptors
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 0[[STDIN]] -> /dev/null
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 1[[STDOUT]] -> /vertica/data/est/dbLog
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 2[[STDERR]] -> /vertica/data/est/dbLog
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 3[Unknown] -> /vertica/data/est/v_est_node0001_catalog/startup.log
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 4[Unknown] -> /vertica/data/est/v_est_node0001_catalog/ErrorReport.txt
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 5[Unknown] -> /proc/88585/maps
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 6[Unknown] -> /vertica/data/est/v_est_node0001_catalog/vertica.log
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 7[Unknown] -> /proc/stat
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 8[Unknown] -> socket:[417699222]
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 9[Unknown] -> /proc/88590/fd
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 10[Unknown] -> /vertica/data/est/v_est_node0001_catalog/vertica.pid
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 12[Unknown] -> pipe:[417699274]
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 13[Unknown] -> pipe:[417699274]
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 14[Unknown] -> socket:[417699341]
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 16[Unknown] -> pipe:[417699934]
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 17[Unknown] -> pipe:[417699934]
2015-07-01 04:26:16.680 unknown:0x7f1587601700 [Init] <INFO> Dumping out memory usage data
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/3917: Memory usage in Tiered Free List(global):
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^3: 2239 on free list; 64156 still in use (531160 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^4: 203 on free list; 32879 still in use (529312 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^5: 7 on free list; 27299 still in use (873792 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^6: 4 on free list; 6361 still in use (407360 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^7: 8135 on free list; 13757 still in use (2802176 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^8: 286 on free list; 2479 still in use (707840 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^9: 2 on free list; 1126 still in use (577536 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^10: 118 on free list; 2817 still in use (3005440 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^11: 0 on free list; 312 still in use (638976 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^12: 0 on free list; 554 still in use (2269184 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^13: 1 on free list; 465 still in use (3817472 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^14: 0 on free list; 164 still in use (2686976 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^15: 23 on free list; 82 still in use (3440640 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^16: 1 on free list; 31 still in use (2097152 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^17: 1 on free list; 6 still in use (917504 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^18: 0 on free list; 8 still in use (2097152 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^19: 2 on free list; 0 still in use (1048576 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5072: Total memory accounted for by Tiered Pool Allocator: 28448248
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/3918: Memory usage in Typed Pool Allocator
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKS0_IyxExEE: 0 used / 1 free @ 56 (56 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKN3CAT4Tier11CatalogTierEN6Basics6gpvsetIyEEEE: 0 used / 2 free @ 88 (176 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKPN3CAT13CatalogObjectENS1_13TieredCatalog14NewbornDetailsEEE: 0 used / 3 free @ 56 (168 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKyPN3CAT13CatalogObjectEEE: 0 used / 84046 free @ 48 (4034208 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT13TieredCatalogE: 0 used / 6 free @ 13944 (83664 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKN3CAT13TieredCatalog13SchemaAndNameEN6Basics6gpvmapIxPNS1_13CatalogObjectEEEEE: 1644 used / 687 free @ 256 (596736 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKjN3CAT16VersionedOidListEEE: 94206 used / 7794 free @ 136 (13872000 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeIyE: 11473 used / 29668 free @ 40 (1645640 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKN3CAT10CatNameStrEN6Basics6gpvmapIxPNS1_13CatalogObjectEEEEE: 61 used / 0 free @ 248 (15128 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St10_List_nodeISt4pairIyxEE: 871655 used / 49201 free @ 32 (29467392 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKxPN3CAT13CatalogObjectEEE: 163087 used / 15517 free @ 48 (8572992 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKyN3CAT13TieredCatalog16CatObjIndexEntryEEE: 172855 used / 14834 free @ 336 (63063504 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKxN3CAT13TieredCatalog12SnapshotInfoEEE: 0 used / 8 free @ 72 (576 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKyxEE: 0 used / 17 free @ 48 (816 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT12CommitRecordE: 0 used / 1 free @ 112 (112 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT18TruncateTableEventE: 22 used / 0 free @ 32 (704 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT15SnapshotMementoE: 4 used / 0 free @ 80 (320 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9DVMiniRosE: 2994 used / 198 free @ 112 (357504 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT13SegmentBoundsE: 1768 used / 102 free @ 40 (74800 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9MinMaxObjE: 57215 used / 4194 free @ 88 (5403992 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT7MiniRosE: 59033 used / 4296 free @ 128 (8106112 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9LocalNodeE: 1 used / 1 free @ 32 (64 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9SALColumnE: 18397 used / 0 free @ 32 (588704 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT7SegmentE: 342 used / 0 free @ 136 (46512 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT10ProjColumnE: 18397 used / 1385 free @ 264 (5222448 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKiS0_IysEEE: 18397 used / 0 free @ 56 (1030232 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKS0_IysEiEE: 18397 used / 0 free @ 56 (1030232 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT10ProjectionE: 747 used / 0 free @ 632 (472104 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT18LicenseAuditRecordE: 644 used / 0 free @ 160 (103040 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT10ConstraintE: 119 used / 0 free @ 280 (33320 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9AttributeE: 0 used / 1 free @ 240 (240 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT5TableE: 517 used / 0 free @ 560 (289520 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT8SequenceE: 33 used / 0 free @ 256 (8448 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9ProcedureE: 32 used / 0 free @ 448 (14336 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT7LibraryE: 2 used / 0 free @ 328 (656 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT14ElasticClusterE: 1 used / 0 free @ 80 (80 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT15TuningRuleParamE: 0 used / 1 free @ 40 (40 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT10TuningRuleE: 19 used / 0 free @ 264 (5016 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT16ViewRelationInfoE: 0 used / 1 free @ 24 (24 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT4ViewE: 244 used / 0 free @ 328 (80032 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT7ProfileE: 1 used / 0 free @ 296 (296 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT12ResourcePoolE: 9 used / 0 free @ 336 (3024 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT7LicenseE: 2 used / 0 free @ 288 (576 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT14GlobalSettingsE: 1 used / 0 free @ 304 (304 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT15StorageLocationE: 3 used / 0 free @ 272 (816 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT4NodeE: 3 used / 2 free @ 328 (1640 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT5GrantE: 805 used / 0 free @ 48 (38640 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKN5boost14dynamic_bitsetIySaIyEEEjEE: 4 used / 8 free @ 72 (864 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKyjEE: 3 used / 6 free @ 48 (432 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT8DatabaseE: 1 used / 1 free @ 416 (832 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKxxEE: 83 used / 82 free @ 48 (7920 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT8EpochMapE: 1 used / 1 free @ 136 (272 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT4RoleE: 4 used / 0 free @ 296 (1184 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT8PasswordE: 0 used / 1 free @ 32 (32 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT4UserE: 4 used / 0 free @ 536 (2144 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT6SchemaE: 17 used / 0 free @ 232 (3944 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5078: Total memory usage in Typed Pool Allocator: 144284568
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/3917: Memory usage in Tiered Free List(global):
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5072: Total memory accounted for by Tiered Pool Allocator: 0
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/3918: Memory usage in Typed Pool Allocator
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3SAL8WOSAllocE: 0 used / 2 free @ 128 (256 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5078: Total memory usage in Typed Pool Allocator: 256
2015-07-01 04:26:16.681 unknown:0x7f1587601700 [Init] <INFO> Global pool memory usage: NewPool(0x4dcca80) 'GlobalPool': totalDtors 0 totalSize 266338304 (93605264 unused) totalChunks 7
2015-07-01 04:26:16.681 unknown:0x7f1587601700 [Init] <INFO> SAL global pool memory usage: NewPool(0x4e26760) 'SALGlobalPool': totalDtors 0 totalSize 2097152 (2096864 unused) totalChunks 1
2015-07-01 04:26:16.681 unknown:0x7f1587601700 [Init] <INFO> SS::stopPoller()
2015-07-01 04:26:16.683 unknown:0x7f1587601700 [Init] <INFO> DC::shutDown()
2015-07-01 04:26:16.683 unknown:0x7f1587601700 [Init] <INFO> Shutdown complete. Exiting.
2015-07-01 04:26:16.683 unknown:0x7f1587601700 [SAL] <INFO> Unmounting file system 0(Default Linux File System).
2015-07-01 04:26:16.683 unknown:0x7f1587601700 [SAL] <INFO> Unmounting file system 1(Hadoop File System).
2015-07-01 04:26:16.956 unknown:0x7f1587601700 [Command] <INFO> Library file has been unloaded successfully
2015-07-01 04:26:16.956 unknown:0x7f1587601700 [Command] <INFO> Library file has been unloaded successfully
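
A log this long is easier to read once it is filtered. The following is a minimal Python sketch, not an official Vertica tool, that splits vertica.log lines into fields and lists the "left the cluster" events. The line layout in the regular expression is inferred only from the lines pasted above, and the function names (parse_line, departures, by_level) are made up for illustration.

```python
import re

# Layout inferred from the log lines in this post (an assumption):
#   <timestamp> <thread name>:<hex id> [<component>] <LEVEL> message
# The [component] part is absent on <LOG> lines, so it is optional here.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<thread>.+?:0x[0-9a-f]+(?:-[0-9a-f]+)?) "
    r"(?:\[(?P<component>[^\]]+)\] )?"
    r"<(?P<level>[A-Z]+)> "
    r"(?P<message>.*)$"
)
LEFT_RE = re.compile(r"nodeSetNotifier: node (\S+) left the cluster")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def departures(lines):
    """Return (timestamp, node_name) for every 'left the cluster' message."""
    found = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        hit = LEFT_RE.search(rec["message"])
        if hit:
            found.append((rec["ts"], hit.group(1)))
    return found


def by_level(lines, level):
    """Return the parsed records whose level equals e.g. 'WARNING'."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and r["level"] == level]


# Four lines copied from the log above, used as sample input.
SAMPLE = [
    "2015-07-01 04:26:16.172 Timer Service:0x5f185c0 <LOG> @v_est_node0001: 00000/5021: Timer service done; closing session",
    "2015-07-01 04:26:16.173 Spread Client:0x5d3ad00 [Comms] <WARNING> error SP_receive: Illegal spread was provided",
    "2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> nodeSetNotifier: node v_est_node0002 left the cluster",
    "2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> nodeSetNotifier: node v_est_node0001 left the cluster",
]

if __name__ == "__main__":
    for ts, node in departures(SAMPLE):
        print(ts, node)
    for rec in by_level(SAMPLE, "WARNING"):
        print(rec["ts"], rec["message"])
```

To run it against the real file, pass the open file object instead of SAMPLE, for example departures(open(path)) with path pointing at the vertica.log of the node in question.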
 
 

Comments

  • SruthiA Vertica Employee Administrator

    Hi Dilip,

     

    During installation, did you configure spread to use point-to-point communication between all Vertica nodes?

     

     

    -Regards,

     Sruthi

  •

    Can you post the contents of your /vertica/data/est/dbLog?

  •

    Hi,

    Thanks for the quick reply.

    I tried to restart again, and it still shows node 3 in recovery mode:

    dbadmin=> select * from nodes;
       node_name    |      node_id      | node_state | node_address | export_address |                   catalog_path                   | is_ephemeral
    ----------------+-------------------+------------+--------------+----------------+--------------------------------------------------+--------------
     v_est_node0001 | 45035996273704980 | UP         | 10.50.13.222 | 10.50.13.222   | /vertica/data/est/v_est_node0001_catalog/Catalog | f
     v_est_node0002 | 45035996273718950 | UP         | 10.50.13.43  | 10.50.13.43    | /vertica/data/est/v_est_node0002_catalog/Catalog | f
     v_est_node0003 | 45035996273718954 | RECOVERING | 10.50.13.200 | 10.50.13.200   | /vertica/data/est/v_est_node0003_catalog/Catalog | f
    (3 rows)

    dbLog Output:

    Conf_load_conf_file: using file: /vertica/data/est/v_est_node0003_catalog/spread.conf
    Successfully configured Segment 0 [10.50.13.43:4803] with 1 procs:
    N010050013043: 10.50.13.43
    Successfully configured Segment 1 [10.50.13.200:4803] with 1 procs:
    N010050013200: 10.50.13.200
    Successfully configured Segment 2 [10.50.13.222:4803] with 1 procs:
    N010050013222: 10.50.13.222
    Connected to spread on local domain socket 4803
    Starting UDxSideProcess for language C++
    with command line: /opt/vertica/bin/vertica-udx-C++ 3 prdae-vtc23e-2944:0x2 debug-log-off /vertica/data/est/v_est_node0003_catalog/UDxLogs
    Starting UDxSideProcess for language C++
    with command line: /opt/vertica/bin/vertica-udx-C++ 3 prdae-vtc23e-2944:0x15 debug-log-off /vertica/data/est/v_est_node0003_catalog/UDxLogs
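
For anyone who wants to check node state repeatedly while recovery runs, here is a small Python sketch that reads the pipe-delimited text printed by "select * from nodes;" and reports every node that is not UP. It assumes the output has been saved or captured as a string, and that the header contains node_name and node_state as in the output above. The function name nodes_not_up is hypothetical.

```python
def nodes_not_up(vsql_output):
    """Return (node_name, node_state) for each row whose state is not UP.

    Works on the aligned, pipe-delimited text that vsql prints; the
    dashed separator line and the "(3 rows)" footer contain no "|"
    and are skipped.
    """
    rows = [line for line in vsql_output.splitlines() if "|" in line]
    if not rows:
        return []
    header = [cell.strip() for cell in rows[0].split("|")]
    name_i = header.index("node_name")
    state_i = header.index("node_state")
    result = []
    for row in rows[1:]:
        cells = [cell.strip() for cell in row.split("|")]
        if len(cells) != len(header):
            continue  # not a data row
        if cells[state_i] != "UP":
            result.append((cells[name_i], cells[state_i]))
    return result


# Sample input shortened from the output above (fewer columns).
SAMPLE_NODES = """
node_name | node_id | node_state | node_address
----------------+-------------------+------------+--------------
v_est_node0001 | 45035996273704980 | UP | 10.50.13.222
v_est_node0002 | 45035996273718950 | UP | 10.50.13.43
v_est_node0003 | 45035996273718954 | RECOVERING | 10.50.13.200
(3 rows)
"""

if __name__ == "__main__":
    for name, state in nodes_not_up(SAMPLE_NODES):
        print(name, state)
```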

  •

    Hi,

     

    Thanks for the quick reply.

    Someone else installed this for me. How can I check this?

     

    Thanks

  • SruthiA Vertica Employee Administrator

    Hi,

     

    Check the entry for the controlmode parameter in admintools.conf. Is it broadcast or pt2pt?

     

     

    -Regards,

     Sruthi
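
One way to do that check without opening the file by hand is the Python sketch below. It looks for a "controlmode = ..." line in the text of admintools.conf and returns the value. The path /opt/vertica/config/admintools.conf is only the usual default and is an assumption here, as are the function name find_controlmode and the sample file content.

```python
import re

# Assumed default location; adjust if Vertica was installed elsewhere.
ADMINTOOLS_CONF = "/opt/vertica/config/admintools.conf"


def find_controlmode(conf_text):
    """Return the controlmode value (lowercased) or None if it is not set."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith(("#", ";")):
            continue  # skip comment lines
        m = re.match(r"controlmode\s*=\s*(\S+)", line, re.IGNORECASE)
        if m:
            return m.group(1).lower()
    return None


# Hypothetical file content, for illustration only.
SAMPLE_CONF = """
[Configuration]
controlmode = pt2pt
"""

if __name__ == "__main__":
    try:
        with open(ADMINTOOLS_CONF) as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE_CONF  # fall back to the sample when the file is absent
    print("controlmode:", find_controlmode(text))
```

A value of pt2pt means point-to-point; broadcast means spread relies on UDP broadcast.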

  •

    Hi,

     

    I forgot to mention one more thing.

    Node 3 crashed, and then CPU usage on node 2 was very high before Vertica went down.

     

    Now that the node is back, we restarted it. The database came back up, but node 3 shows as being in recovery mode.

     

    Thanks in advance.

  • SruthiA Vertica Employee Administrator

    Hi,

     

    Is the database K-safe? Has the node been in the RECOVERING state for a long time? Can you share the output of:

     

    -> select * from projection_recoveries

     

     

    -Regards,

     Sruthi

  •

     

    Hi, 

    Run the following:

    select * from projection_storage where wos_used_bytes < 0;

    If nothing is returned, stop the Vertica process on the recovering node and restart it. Recovery should start again.

     

    If you get any output from that query, check whether your AHM is stuck (left behind):

    select current_epoch,ahm_epoch,last_good_epoch,refresh_epoch from system;

    If so, see if you can execute this:

    select make_ahm_now();
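
If the database is not reachable, the shutdown log in the original post still records the epoch and AHM as hex values in its "Writing epoch=..." line. The Python sketch below pulls them out and computes how many epochs the AHM trails by. It is only a rough aid: the pattern is taken from that single log line, the function name epoch_summary is made up, and the sketch does not decide whether a given lag means the AHM is stuck.

```python
import re

# Pattern based on the one "Writing epoch=..." line in the posted log.
EPOCH_RE = re.compile(
    r"Writing epoch=(0x[0-9a-f]+).*?K-safety=(\d+), AHM=(0x[0-9a-f]+)"
)


def epoch_summary(line):
    """Return epoch, K-safety, AHM and their difference, or None if no match."""
    m = EPOCH_RE.search(line)
    if m is None:
        return None
    epoch = int(m.group(1), 16)  # hex string such as 0xf91569
    ahm = int(m.group(3), 16)
    return {
        "epoch": epoch,
        "k_safety": int(m.group(2)),
        "ahm": ahm,
        "ahm_lag": epoch - ahm,  # epochs between AHM and the written epoch
    }


# Message text copied from the log in the original post.
SAMPLE_LINE = (
    "Writing epoch=0xf91569, ending at '2015-06-30 20:10:50.486351-04', "
    "catalog version=0x10cda75, K-safety=1, AHM=0xf8caa9, "
    "ending at '2015-06-30 15:44:05.732736-04', to epoch log file "
    "[/vertica/data/est/v_est_node0001_catalog/Epoch.log]"
)

if __name__ == "__main__":
    print(epoch_summary(SAMPLE_LINE))
```

In that log the written epoch equals the "Last good epoch" reported on the line just before it, so the lag here is the distance between the last good epoch and the AHM on node 1 at shutdown.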

     

  •

    Hi 

     

    Is node 03 still recovering? Could you provide the latest output of the commands below:

     

    vsql => select * from system;

    vsql => select * from nodes;

    vsql => select * from recovery_status where is_running='t';

    vsql => select * from projection_recoveries where status = 'running';

     

    Regards

    Rahul Choudhary
