Unable to restart Vertica cluster
Hi,
I have a 3-node AWS cluster that suddenly shut down. I am unable to find the root cause of the Vertica cluster going down. Please help me.
When I tried to restart it, I saw the log below:
2015-07-01 04:26:16.172 Timer Service:0x5f185c0 <LOG> @v_est_node0001: 00000/5021: Timer service done; closing session
2015-07-01 04:26:16.173 Spread Mailbox Dequeue:0x5edac60 [Comms] <INFO> Spread dequeue thread exiting
2015-07-01 04:26:16.173 Main:0x4fb3900 [Comms] <INFO> stop: disconnecting #node_a#N010050013222 from spread daemon
2015-07-01 04:26:16.173 Main:0x4fb3900 [Comms] <INFO> connected: false
2015-07-01 04:26:16.173 Main:0x4fb3900 [Comms] <INFO> DB Group changed
2015-07-01 04:26:16.173 Main:0x4fb3900 [VMPI] <INFO> DistCall: Set current group members called with 0 members
2015-07-01 04:26:16.173 Spread Client:0x5d3ad00 [Comms] <WARNING> error SP_receive: Illegal spread was provided
2015-07-01 04:26:16.173 Spread Client:0x5d3ad00 [Comms] <INFO> spread thread exiting
2015-07-01 04:26:16.173 Main:0x4fb3900 [VMPI] <INFO> Ending session prdae-vtc22e-10593:0x78 due to loss of 45035996273718950
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> nodeSetNotifier: node v_est_node0002 left the cluster
2015-07-01 04:26:16.174 Main:0x4fb3900 [Recover] <INFO> Node left cluster, reassessing k-safety...
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> nodeSetNotifier: node v_est_node0001 left the cluster
2015-07-01 04:26:16.174 Main:0x4fb3900 [Recover] <INFO> Node left cluster, reassessing k-safety...
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> Lost membership of the DB group
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> Removing #node_a#N010050013222->v_est_node0001 from processToNode and other maps due to departure from Vertica:all
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> nodeToState map:
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> Removing #node_b#N010050013043->v_est_node0002 from processToNode and other maps due to departure from Vertica:all
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> nodeToState map:
2015-07-01 04:26:16.174 Main:0x4fb3900 [Comms] <INFO> Lost membership of V:All
2015-07-01 04:26:16.174 DistCall Dispatch:0x7f14b4002c30-b0000002467ab6 [Txn] <INFO> Rollback Txn: b0000002467ab6 'rebalance_cluster(background)'
2015-07-01 04:26:16.674 Main:0x4fb3900 [Recover] <INFO> Moving-out all projections for node
2015-07-01 04:26:16.675 Main:0x4fb3900 [Txn] <INFO> Begin Txn: a00000004780ef 'Recovery: Analyze move-out'
2015-07-01 04:26:16.676 Main:0x4fb3900 [Txn] <INFO> Starting Commit: Txn: a00000004780ef 'Recovery: Analyze move-out'
2015-07-01 04:26:16.676 Main:0x4fb3900 [Txn] <INFO> Commit Complete: Txn: a00000004780ef at epoch 0xf9156a
2015-07-01 04:26:16.676 Main:0x4fb3900 [Txn] <INFO> Begin Txn: a00000004780f0 'Recovery: Update CPEs'
2015-07-01 04:26:16.676 Main:0x4fb3900 [Txn] <INFO> Rollback Txn: a00000004780f0 'Recovery: Update CPEs'
2015-07-01 04:26:16.676 Main:0x4fb3900 [Recover] <INFO> Node move-out complete. Last good epoch=0xf91569
2015-07-01 04:26:16.677 Main:0x4fb3900 [Main] <INFO> Writing epoch=0xf91569, ending at '2015-06-30 20:10:50.486351-04', catalog version=0x10cda75, K-safety=1, AHM=0xf8caa9, ending at '2015-06-30 15:44:05.732736-04', to epoch log file [/vertica/data/est/v_est_node0001_catalog/Epoch.log]
2015-07-01 04:26:16.677 Main:0x4fb3900 [Shutdown] <INFO> Shutting down node
2015-07-01 04:26:16.677 Main:0x4fb3900 [Init] <INFO> Stopping Executor service
2015-07-01 04:26:16.677 Main:0x4fb3900 [Comms] <INFO> Stopping spread monitoring
2015-07-01 04:26:16.677 Main:0x4fb3900 [Init] <INFO> Stopping thread manager
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [Init] <INFO> Uninitializing storage
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool general - Queries: 10000 Threads: 10630 File Handles: 53939 Memory(KB): 27018158
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool sysquery - Queries: 10000 Threads: 10655 File Handles: 54069 Memory(KB): 27083694
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool sysdata - Memory(KB): 1048576
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool wosdata - Memory(KB): 2097152
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool tm - Queries: 3 Threads: 10710 File Handles: 54346 Memory(KB): 27222958
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool refresh - Queries: 10000 Threads: 10630 File Handles: 53939 Memory(KB): 27018158
2015-07-01 04:26:16.678 unknown:0x7f1587601700 [ResourceManager] <INFO> pool recovery - Queries: 3 Threads: 10630 File Handles: 53939 Memory(KB): 27018158
2015-07-01 04:26:16.679 unknown:0x7f1587601700 [ResourceManager] <INFO> pool dbd - Queries: 10000 Threads: 10630 File Handles: 53939 Memory(KB): 27018158
2015-07-01 04:26:16.679 unknown:0x7f1587601700 [ResourceManager] <INFO> pool jvm - Queries: 10000 Threads: 1077 File Handles: 5468 Memory(KB): 2739089
2015-07-01 04:26:16.679 unknown:0x7f1587601700 [Init] <INFO> Dumping out open file descriptors
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 0[[STDIN]] -> /dev/null
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 1[[STDOUT]] -> /vertica/data/est/dbLog
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 2[[STDERR]] -> /vertica/data/est/dbLog
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 3[Unknown] -> /vertica/data/est/v_est_node0001_catalog/startup.log
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 4[Unknown] -> /vertica/data/est/v_est_node0001_catalog/ErrorReport.txt
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 5[Unknown] -> /proc/88585/maps
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 6[Unknown] -> /vertica/data/est/v_est_node0001_catalog/vertica.log
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 7[Unknown] -> /proc/stat
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 8[Unknown] -> socket:[417699222]
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 9[Unknown] -> /proc/88590/fd
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 10[Unknown] -> /vertica/data/est/v_est_node0001_catalog/vertica.pid
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 12[Unknown] -> pipe:[417699274]
2015-07-01 04:26:16.679 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 13[Unknown] -> pipe:[417699274]
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 14[Unknown] -> socket:[417699341]
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 16[Unknown] -> pipe:[417699934]
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4273: Open FD 17[Unknown] -> pipe:[417699934]
2015-07-01 04:26:16.680 unknown:0x7f1587601700 [Init] <INFO> Dumping out memory usage data
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/3917: Memory usage in Tiered Free List(global):
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^3: 2239 on free list; 64156 still in use (531160 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^4: 203 on free list; 32879 still in use (529312 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^5: 7 on free list; 27299 still in use (873792 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^6: 4 on free list; 6361 still in use (407360 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^7: 8135 on free list; 13757 still in use (2802176 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^8: 286 on free list; 2479 still in use (707840 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^9: 2 on free list; 1126 still in use (577536 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^10: 118 on free list; 2817 still in use (3005440 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^11: 0 on free list; 312 still in use (638976 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^12: 0 on free list; 554 still in use (2269184 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^13: 1 on free list; 465 still in use (3817472 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^14: 0 on free list; 164 still in use (2686976 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^15: 23 on free list; 82 still in use (3440640 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^16: 1 on free list; 31 still in use (2097152 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^17: 1 on free list; 6 still in use (917504 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^18: 0 on free list; 8 still in use (2097152 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/4752: Size 2^19: 2 on free list; 0 still in use (1048576 bytes)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5072: Total memory accounted for by Tiered Pool Allocator: 28448248
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/3918: Memory usage in Typed Pool Allocator
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKS0_IyxExEE: 0 used / 1 free @ 56 (56 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKN3CAT4Tier11CatalogTierEN6Basics6gpvsetIyEEEE: 0 used / 2 free @ 88 (176 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKPN3CAT13CatalogObjectENS1_13TieredCatalog14NewbornDetailsEEE: 0 used / 3 free @ 56 (168 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKyPN3CAT13CatalogObjectEEE: 0 used / 84046 free @ 48 (4034208 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT13TieredCatalogE: 0 used / 6 free @ 13944 (83664 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKN3CAT13TieredCatalog13SchemaAndNameEN6Basics6gpvmapIxPNS1_13CatalogObjectEEEEE: 1644 used / 687 free @ 256 (596736 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKjN3CAT16VersionedOidListEEE: 94206 used / 7794 free @ 136 (13872000 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeIyE: 11473 used / 29668 free @ 40 (1645640 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKN3CAT10CatNameStrEN6Basics6gpvmapIxPNS1_13CatalogObjectEEEEE: 61 used / 0 free @ 248 (15128 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St10_List_nodeISt4pairIyxEE: 871655 used / 49201 free @ 32 (29467392 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKxPN3CAT13CatalogObjectEEE: 163087 used / 15517 free @ 48 (8572992 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKyN3CAT13TieredCatalog16CatObjIndexEntryEEE: 172855 used / 14834 free @ 336 (63063504 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKxN3CAT13TieredCatalog12SnapshotInfoEEE: 0 used / 8 free @ 72 (576 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKyxEE: 0 used / 17 free @ 48 (816 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT12CommitRecordE: 0 used / 1 free @ 112 (112 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT18TruncateTableEventE: 22 used / 0 free @ 32 (704 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT15SnapshotMementoE: 4 used / 0 free @ 80 (320 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9DVMiniRosE: 2994 used / 198 free @ 112 (357504 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT13SegmentBoundsE: 1768 used / 102 free @ 40 (74800 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9MinMaxObjE: 57215 used / 4194 free @ 88 (5403992 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT7MiniRosE: 59033 used / 4296 free @ 128 (8106112 bytes total)
2015-07-01 04:26:16.680 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9LocalNodeE: 1 used / 1 free @ 32 (64 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9SALColumnE: 18397 used / 0 free @ 32 (588704 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT7SegmentE: 342 used / 0 free @ 136 (46512 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT10ProjColumnE: 18397 used / 1385 free @ 264 (5222448 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKiS0_IysEEE: 18397 used / 0 free @ 56 (1030232 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKS0_IysEiEE: 18397 used / 0 free @ 56 (1030232 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT10ProjectionE: 747 used / 0 free @ 632 (472104 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT18LicenseAuditRecordE: 644 used / 0 free @ 160 (103040 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT10ConstraintE: 119 used / 0 free @ 280 (33320 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9AttributeE: 0 used / 1 free @ 240 (240 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT5TableE: 517 used / 0 free @ 560 (289520 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT8SequenceE: 33 used / 0 free @ 256 (8448 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT9ProcedureE: 32 used / 0 free @ 448 (14336 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT7LibraryE: 2 used / 0 free @ 328 (656 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT14ElasticClusterE: 1 used / 0 free @ 80 (80 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT15TuningRuleParamE: 0 used / 1 free @ 40 (40 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT10TuningRuleE: 19 used / 0 free @ 264 (5016 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT16ViewRelationInfoE: 0 used / 1 free @ 24 (24 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT4ViewE: 244 used / 0 free @ 328 (80032 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT7ProfileE: 1 used / 0 free @ 296 (296 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT12ResourcePoolE: 9 used / 0 free @ 336 (3024 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT7LicenseE: 2 used / 0 free @ 288 (576 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT14GlobalSettingsE: 1 used / 0 free @ 304 (304 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT15StorageLocationE: 3 used / 0 free @ 272 (816 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT4NodeE: 3 used / 2 free @ 328 (1640 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT5GrantE: 805 used / 0 free @ 48 (38640 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKN5boost14dynamic_bitsetIySaIyEEEjEE: 4 used / 8 free @ 72 (864 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKyjEE: 3 used / 6 free @ 48 (432 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT8DatabaseE: 1 used / 1 free @ 416 (832 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type St13_Rb_tree_nodeISt4pairIKxxEE: 83 used / 82 free @ 48 (7920 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT8EpochMapE: 1 used / 1 free @ 136 (272 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT4RoleE: 4 used / 0 free @ 296 (1184 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT8PasswordE: 0 used / 1 free @ 32 (32 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT4UserE: 4 used / 0 free @ 536 (2144 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3CAT6SchemaE: 17 used / 0 free @ 232 (3944 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5078: Total memory usage in Typed Pool Allocator: 144284568
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/3917: Memory usage in Tiered Free List(global):
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5072: Total memory accounted for by Tiered Pool Allocator: 0
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/3918: Memory usage in Typed Pool Allocator
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5113: Type N3SAL8WOSAllocE: 0 used / 2 free @ 128 (256 bytes total)
2015-07-01 04:26:16.681 unknown:0x7f1587601700 <LOG> @v_est_node0001: 00000/5078: Total memory usage in Typed Pool Allocator: 256
2015-07-01 04:26:16.681 unknown:0x7f1587601700 [Init] <INFO> Global pool memory usage: NewPool(0x4dcca80) 'GlobalPool': totalDtors 0 totalSize 266338304 (93605264 unused) totalChunks 7
2015-07-01 04:26:16.681 unknown:0x7f1587601700 [Init] <INFO> SAL global pool memory usage: NewPool(0x4e26760) 'SALGlobalPool': totalDtors 0 totalSize 2097152 (2096864 unused) totalChunks 1
2015-07-01 04:26:16.681 unknown:0x7f1587601700 [Init] <INFO> SS::stopPoller()
2015-07-01 04:26:16.683 unknown:0x7f1587601700 [Init] <INFO> DC::shutDown()
2015-07-01 04:26:16.683 unknown:0x7f1587601700 [Init] <INFO> Shutdown complete. Exiting.
2015-07-01 04:26:16.683 unknown:0x7f1587601700 [SAL] <INFO> Unmounting file system 0(Default Linux File System).
2015-07-01 04:26:16.683 unknown:0x7f1587601700 [SAL] <INFO> Unmounting file system 1(Hadoop File System).
2015-07-01 04:26:16.956 unknown:0x7f1587601700 [Command] <INFO> Library file has been unloaded successfully
2015-07-01 04:26:16.956 unknown:0x7f1587601700 [Command] <INFO> Library file has been unloaded successfully
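When scanning a vertica.log excerpt like the one above for a root cause, it can help to drop the routine shutdown dump and keep only warnings and cluster-membership events. The following is a minimal sketch, not an official Vertica tool; the helper names and the keyword list are assumptions chosen for illustration, and it relies only on each entry starting with a millisecond timestamp as seen in this log.

```python
import re

# Assumption: every entry starts with a timestamp like "2015-07-01 04:26:16.173 ".
# The lookahead splits *before* each timestamp without consuming it.
ENTRY_START = re.compile(r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} )")

# Hypothetical keyword list; adjust to whatever you are hunting for.
DEFAULT_KEYWORDS = ("<WARNING>", "<ERROR>", "left the cluster", "Shutting down")


def split_entries(text):
    """Split raw log text into one string per entry, keyed on the leading timestamp."""
    return [entry.strip() for entry in ENTRY_START.split(text) if entry.strip()]


def interesting(entries, keywords=DEFAULT_KEYWORDS):
    """Keep only entries that contain at least one of the keywords."""
    return [entry for entry in entries if any(k in entry for k in keywords)]
```

For example, `interesting(split_entries(open("vertica.log").read()))` would list the spread warning and the "left the cluster" lines from this excerpt while skipping the memory and file-descriptor dump.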
Comments
Hi Dilip,
During installation, did you configure spread to use point-to-point communication between all Vertica nodes?
-Regards,
Sruthi
Can you post the contents of your /vertica/data/est/dbLog?
Hi,
Thanks for the quick reply.
I tried to restart again, and node 3 still shows as being in recovery mode:
dbadmin=> select * from nodes;
node_name | node_id | node_state | node_address | export_address | catalog_path | is_ephemeral
----------------+-------------------+------------+--------------+----------------+--------------------------------------------------+--------------
v_est_node0001 | 45035996273704980 | UP | 10.50.13.222 | 10.50.13.222 | /vertica/data/est/v_est_node0001_catalog/Catalog | f
v_est_node0002 | 45035996273718950 | UP | 10.50.13.43 | 10.50.13.43 | /vertica/data/est/v_est_node0002_catalog/Catalog | f
v_est_node0003 | 45035996273718954 | RECOVERING | 10.50.13.200 | 10.50.13.200 | /vertica/data/est/v_est_node0003_catalog/Catalog | f
(3 rows)
dbLog Output:
Conf_load_conf_file: using file: /vertica/data/est/v_est_node0003_catalog/spread.conf
Successfully configured Segment 0 [10.50.13.43:4803] with 1 procs:
N010050013043: 10.50.13.43
Successfully configured Segment 1 [10.50.13.200:4803] with 1 procs:
N010050013200: 10.50.13.200
Successfully configured Segment 2 [10.50.13.222:4803] with 1 procs:
N010050013222: 10.50.13.222
Connected to spread on local domain socket 4803
Starting UDxSideProcess for language C++
with command line: /opt/vertica/bin/vertica-udx-C++ 3 prdae-vtc23e-2944:0x2 debug-log-off /vertica/data/est/v_est_node0003_catalog/UDxLogs
Starting UDxSideProcess for language C++
with command line: /opt/vertica/bin/vertica-udx-C++ 3 prdae-vtc23e-2944:0x15 debug-log-off /vertica/data/est/v_est_node0003_catalog/UDxLogs
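The `select * from nodes` output in the post above can also be checked programmatically, which is handy when polling a recovering cluster. This is a minimal sketch that assumes the text starts at the header row of vsql's default aligned output (header, dashed rule, rows, row-count footer); the function names are hypothetical.

```python
def parse_vsql_table(text):
    """Parse aligned vsql output into a list of dicts keyed by column name.

    Assumes the first non-empty line is the header row.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        stripped = ln.strip()
        # Skip the dashed rule and the "(N rows)" footer.
        if set(stripped) <= set("-+") or stripped.startswith("("):
            continue
        cells = [cell.strip() for cell in ln.split("|")]
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def nodes_not_up(rows):
    """Return (node_name, node_state) for every node whose state is not UP."""
    return [(r["node_name"], r["node_state"]) for r in rows if r["node_state"] != "UP"]
```

Applied to the output in this post, `nodes_not_up` would report only v_est_node0003 in the RECOVERING state.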
Hi,
Thanks for the quick reply.
Someone else installed this for me. How can I check this?
Thanks
Hi,
Check the entry for the controlmode parameter in admintools.conf. Is it broadcast or pt2pt?
-Regards,
Sruthi
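One way to read that setting without opening the file by hand is sketched below. It assumes admintools.conf is INI-style and contains a `controlmode` key in some section; the helper name is hypothetical, and the file location (commonly /opt/vertica/config/admintools.conf) is an assumption that may differ on your install.

```python
import configparser


def read_controlmode(conf_text):
    """Return the controlmode value found in INI-style text, or None if absent."""
    # strict=False tolerates duplicate keys; interpolation=None leaves '%' alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    # The section holding the key is not assumed; scan all of them.
    for section in parser.sections():
        if parser.has_option(section, "controlmode"):
            return parser.get(section, "controlmode").strip()
    return None
```

Usage would look like `read_controlmode(open("/opt/vertica/config/admintools.conf").read())`, returning `broadcast` or `pt2pt` if the key is present.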
Hi,
I forgot to mention one more thing.
Node 3 crashed, and then CPU usage on node 2 was very high before Vertica went down.
Once the node was back, we restarted it; the database came back up, but node 3 now shows as being in recovery mode.
Thanks in advance.
Hi,
Is the database k-safe? Has the node been in the RECOVERING state for a long time? Can you share the output of
-> select * from projection_recoveries
-Regards,
Sruthi
Hi,
Run the following:
If nothing is returned, stop the Vertica process on the node that is recovering and restart it. Recovery should start again.
If you do get output, check whether your AHM is stuck (left behind).
If so, see if you can execute this:
Hi
Is node 03 still recovering? Could you please provide the latest output of the commands below:
vsql => select * from system;
vsql => select * from nodes;
vsql => select * from recovery_status where is_running='t';
vsql => select * from projection_recoveries where status = 'running';
Regards
Rahul Choudhary
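To gather the output of the four queries requested above in one go, something like the following could be used. It is only a sketch: it assumes `vsql` is on the PATH and can connect with default settings (no host, user, or password options are passed), and the function names are hypothetical.

```python
import subprocess

# The four queries requested in the post above, copied as written.
QUERIES = [
    "select * from system;",
    "select * from nodes;",
    "select * from recovery_status where is_running='t';",
    "select * from projection_recoveries where status = 'running';",
]


def build_vsql_command(query):
    """Build an argument list that runs one query through vsql's -c option."""
    return ["vsql", "-c", query]


def collect_outputs(run=subprocess.run):
    """Run every query and return {query: stdout}. `run` is injectable for testing."""
    outputs = {}
    for query in QUERIES:
        result = run(build_vsql_command(query), capture_output=True, text=True)
        outputs[query] = result.stdout
    return outputs
```

Calling `collect_outputs()` on a node where dbadmin can run vsql would return the text of each result, ready to paste into a reply.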