Problem with restart after crash

Hi. I have a single-node Vertica cluster. It crashed yesterday, and today when I tried to restart it manually from admintools, an error appeared and the database did not start. The vertica.log is below.

 

2016-06-29 14:25:35.469 INFO New log
2016-06-29 14:25:35.469 unknown:0x7f66885f8700 [Init] <INFO> Log /data/dbadmin/Vivat/v_vivat_node0001_catalog/vertica.log opened; #1
2016-06-29 14:25:35.469 unknown:0x7f66885f8700 [Init] <INFO> Processing command line: /opt/vertica/bin/vertica -D /data/dbadmin/Vivat/v_vivat_node0001_catalog -C Vivat -n v_vivat_node0001 -h 10.3.129.25 -p 5433 -P 4803 -Y ipv4 -c
2016-06-29 14:25:35.469 unknown:0x7f66885f8700 [Init] <INFO> Starting up Vertica Analytic Database v7.1.2-4
2016-06-29 14:25:35.469 unknown:0x7f66885f8700 [Init] <INFO> Project Codename: Dragline
2016-06-29 14:25:35.469 unknown:0x7f66885f8700 [Init] <INFO> vertica(v7.1.2-4) built by release@build2.verticacorp.com from releases/VER_7_1_RELEASE_BUILD_2_4_20150806@170157 on 'Thu Aug 6 22:04:34 America/New_York 2015' $BuildId$
2016-06-29 14:25:35.469 unknown:0x7f66885f8700 [Init] <INFO> 64-bit Optimized Build
2016-06-29 14:25:35.469 unknown:0x7f66885f8700 [Init] <INFO> Compiler Version: 4.1.2 20080704 (Red Hat 4.1.2-55)
2016-06-29 14:25:35.470 unknown:0x7f66885f8700 <LOG> @[initializing]: 00000/5081: Total swap memory used: 0
2016-06-29 14:25:35.470 unknown:0x7f66885f8700 <LOG> @[initializing]: 00000/4435: Process size resident set: 31096832
2016-06-29 14:25:35.470 unknown:0x7f66885f8700 <LOG> @[initializing]: 00000/5075: Total Memory free + cache: 13713727488
2016-06-29 14:25:35.470 unknown:0x7f66885f8700 [Txn] <INFO> Looking for catalog at: /data/dbadmin/Vivat/v_vivat_node0001_catalog/Catalog
2016-06-29 14:25:35.470 unknown:0x7f66885f8700 [Catalog] <INFO> Loading Checkpoint 262750
2016-06-29 14:25:35.470 unknown:0x7f66885f8700 [Init] <INFO> Startup [Reading Catalog] Reading Checkpoint (bytes) - 0 / 36539320
2016-06-29 14:25:37.318 unknown:0x7f66885f8700 [Init] <INFO> Startup [Reading Catalog] Reading Checkpoint (bytes) - 36539320 / 36539320
2016-06-29 14:25:37.319 unknown:0x7f66885f8700 [Catalog] <INFO> Replaying 1 Txnlogs
2016-06-29 14:25:37.319 unknown:0x7f66885f8700 [Init] <INFO> Startup [Reading Catalog] Applying transaction log (bytes) - 0 / 3273037
2016-06-29 14:25:37.594 unknown:0x7f66885f8700 [Init] <INFO> Startup [Reading Catalog] Applying transaction log (bytes) - 3273037 / 3273037
2016-06-29 14:25:37.594 unknown:0x7f66885f8700 [Txn] <INFO> Installing objects...
2016-06-29 14:25:37.594 unknown:0x7f66885f8700 [Init] <INFO> Startup [Reading Catalog] Indexing Objects - 0 / 18464
2016-06-29 14:25:37.608 unknown:0x7f66885f8700 [Init] <INFO> Startup [Reading Catalog] Indexing Objects - 4927 / 18464
2016-06-29 14:25:37.646 unknown:0x7f66885f8700 [Init] <INFO> Startup [Reading Catalog] Indexing Objects - 18464 / 18464
2016-06-29 14:25:37.646 unknown:0x7f66885f8700 [Txn] <INFO> Catalog loaded from path: /data/dbadmin/Vivat/v_vivat_node0001_catalog/Catalog [18464 objects, GLOBAL version 253357, LOCAL version 205196] (no checkpoint needed)
2016-06-29 14:25:37.648 unknown:0x7f66885f8700 [Txn] <INFO> switchToLocalNode: v_vivat_node0001 with path /data/dbadmin/Vivat/v_vivat_node0001_catalog/Catalog
2016-06-29 14:25:37.648 unknown:0x7f66885f8700 [Txn] <INFO> Transaction sequence set, seq num=1cd486, nodeID=a
2016-06-29 14:25:37.648 unknown:0x7f66885f8700 [Txn] <INFO> Catalog sequence set, seq num=a5f7f8, nodeID=a
2016-06-29 14:25:37.648 unknown:0x7f66885f8700 [Txn] <INFO> Found my node (v_vivat_node0001) in the catalog
2016-06-29 14:25:37.648 unknown:0x7f66885f8700 [Txn] <INFO> Catalog info: version=0x3ddad, number of nodes=1, permanent #=1, K=0
2016-06-29 14:25:37.648 unknown:0x7f66885f8700 [Txn] <INFO> Catalog info: current epoch=0x2f13b
2016-06-29 14:25:37.653 unknown:0x7f66885f8700 [Catalog] <INFO> Catalog OID generator updated based on GLOBAL tier catalog
2016-06-29 14:25:37.654 unknown:0x7f66885f8700 [Init] <INFO> Catalog loaded
2016-06-29 14:25:37.656 unknown:0x7f66885f8700 [Comms] <INFO> About to launch spread with '/opt/vertica/spread/sbin/spread -c /data/dbadmin/Vivat/v_vivat_node0001_catalog/spread.conf'
2016-06-29 14:25:37.660 unknown:0x7f66885f8700 [Comms] <INFO> forked spread pid=6860, wrote pidfile /data/dbadmin/Vivat/v_vivat_node0001_catalog/spread.pid
2016-06-29 14:25:37.660 unknown:0x7f66885f8700 [Init] <INFO> Listening on port: 5433
2016-06-29 14:25:37.660 unknown:0x7f66885f8700 [Init] <INFO> About to fork
2016-06-29 14:25:37.661 unknown:0x7f66885f8700 [Init] <INFO> About to fork again
2016-06-29 14:25:37.662 unknown:0x7f66885f8700 [Init] <INFO> Completed forking
2016-06-29 14:25:37.662 unknown:0x7f66885f8700 [Init] <INFO> Startup [Connecting to Spread] Connecting to spread 4803
2016-06-29 14:26:07.675 unknown:0x7f66885f8700 [Init] <INFO> Spread daemon does not appear to be running on 10.3.129.25 -- exiting!

 

Moreover, there is no epoch.log file in the data directory, and when I try to restart from the last epoch, only epoch 0 is offered.

 

I have tried killing the Vertica process on the host and starting again, but without any progress; the error is still the same.

 

Vertica Analytic Database v7.1.2-4 $BrandId$
vertica(v7.1.2-4) built by release@build2.verticacorp.com from releases/VER_7_1_RELEASE_BUILD_2_4_20150806@170157 on 'Thu Aug 6 22:04:34 America/New_York 2015' $BuildId$

Comments

  • Hi Tomek

     

    Looks like one or more projections have a CPE (checkpoint epoch) value of 0, which causes the LGE (last good epoch) to be calculated as 0.

     

    Can you kindly attach the entire vertica.log file to confirm this? Unfortunately, since you are running a single-node cluster without k-safety (no buddy nodes), I think the only option left is to force-restart this node using the command below with the -F switch:

     

    $ admintools -t restart_node -F -s <this_Hostname_or_IP> -d <dbname>

     

    If you are lucky it will come up; otherwise you will need to rebuild the cluster from scratch, since I assume no backup is available either.

     

    Try your luck and let us know how it goes.

     

    All the best!!!

     

    Thanks

    Rahul

  • Rahul,

     

    Thank you for the answer, but it does not help. I have even tried to start from epoch 0, but the database fails to start in the same way.

    The problem is strange, because even when I try to create a fresh new database, the same error appears. So it is not a problem with the database; it looks like a problem with Vertica itself.

  • Did you check whether spread is staying up?

     

    2016-06-29 14:25:37.662 unknown:0x7f66885f8700 [Init] <INFO> Startup [Connecting to Spread] Connecting to spread 4803
    2016-06-29 14:26:07.675 unknown:0x7f66885f8700 [Init] <INFO> Spread daemon does not appear to be running on 10.3.129.25 -- exiting!

     

    It would be worth enabling spread logging (the "EventLogFile" parameter in spread.conf) to see if there's anything suggestive.

     

    Thanks
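
    A rough sketch of how that check might look from the shell, assuming the catalog path shown in the vertica.log above and that it is run as the user that owns the spread process. The function names are made up for illustration, and the spread.conf lines in the comments follow the sample configuration shipped with the Spread toolkit, so treat them as an assumption for your Vertica version:

    ```shell
    #!/bin/sh
    # Catalog path taken from the vertica.log in the original post; adjust as needed.
    CATALOG_DIR="${CATALOG_DIR:-/data/dbadmin/Vivat/v_vivat_node0001_catalog}"

    # Return 0 if the PID stored in the given pidfile belongs to a live process.
    # (kill -0 also fails for processes owned by another user, so run this
    # as the same user that started the database.)
    spread_alive() {
        pidfile="$1"
        [ -r "$pidfile" ] || return 1
        pid=$(cat "$pidfile")
        [ -n "$pid" ] || return 1
        kill -0 "$pid" 2>/dev/null
    }

    # Return 0 if the given vertica.log contains the "spread not running" exit message.
    spread_exit_logged() {
        grep -q "Spread daemon does not appear to be running" "$1" 2>/dev/null
    }

    if spread_alive "$CATALOG_DIR/spread.pid"; then
        echo "spread process is running"
    else
        echo "spread process is not running (or spread.pid is missing)"
    fi

    if spread_exit_logged "$CATALOG_DIR/vertica.log"; then
        echo "vertica.log reports that spread was not reachable"
    fi

    # Assumed spread.conf additions to get a spread log (hypothetical log path):
    #   DebugFlags = { PRINT EXIT }
    #   EventLogFile = /tmp/spread_debug.log
    ```

    If the pidfile points at a dead process right after a start attempt, spread is probably exiting on its own, and its event log is the next place to look.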

     

  • Hi Tomas

     

    My bad, I missed the spread exiting part that Pravesh mentioned. It looks like an issue with your spread process.

     

    Can you check that spread isn't blocked on port 4803?

     

    Check the iptables rules and make sure the firewall is not blocking port 4803.

     

    Thanks

    Rahul
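
    One possible way to run the port check, as a sketch only: it assumes the `ss` utility from iproute2 is installed (on older hosts `netstat -an` output can be piped through the same filter), and `lines_for_port` is a helper name invented for this example:

    ```shell
    #!/bin/sh
    # Spread port from "-P 4803" on the Vertica command line in the log.
    SPREAD_PORT="${SPREAD_PORT:-4803}"

    # Read a socket or firewall listing on stdin and print only the lines that
    # mention the given port as a whole number (so 14803 does not match 4803).
    # Exits non-zero when no line matches.
    lines_for_port() {
        grep -E "(^|[^0-9])$1([^0-9]|\$)"
    }

    # Is anything listening on the spread port (TCP or UDP)?
    if command -v ss >/dev/null 2>&1; then
        ss -lntu | lines_for_port "$SPREAD_PORT" \
            || echo "nothing is listening on port $SPREAD_PORT"
    fi

    # As root, the same filter can be applied to the firewall rules:
    #   iptables -L -n | lines_for_port 4803
    ```

    No listener on 4803 right after a start attempt points back at spread itself, while a matching DROP or REJECT rule in the iptables listing points at the firewall.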
