Database startup on AWS stuck in INITIALIZING
Hi,
I created a 10-node Vertica cluster on AWS (7.1.1-10) and ran CopyCluster from the production environment to AWS.
42TB … finished after 10 days.
Now that CopyCluster has finished, I'm trying to start the database from admintools:
*** Starting database: vertica_polonium_mtc ***
Starting nodes:
v_vertica_polonium_mtc_node0001
v_vertica_polonium_mtc_node0002
v_vertica_polonium_mtc_node0003
v_vertica_polonium_mtc_node0004
v_vertica_polonium_mtc_node0005
v_vertica_polonium_mtc_node0006
v_vertica_polonium_mtc_node0007
v_vertica_polonium_mtc_node0008
v_vertica_polonium_mtc_node0009
v_vertica_polonium_mtc_node0010
Starting Vertica on all nodes. Please wait, databases with large catalogs may take a while to initialize.
Node Status: v_vertica_polonium_mtc_node0001: (INITIALIZING) v_vertica_polonium_mtc_node0002: (INITIALIZING)
v_vertica_polonium_mtc_node0003: (INITIALIZING) v_vertica_polonium_mtc_node0004: (INITIALIZING)
v_vertica_polonium_mtc_node0005: (INITIALIZING) v_vertica_polonium_mtc_node0006: (INITIALIZING)
v_vertica_polonium_mtc_node0007: (INITIALIZING) v_vertica_polonium_mtc_node0008: (INITIALIZING)
v_vertica_polonium_mtc_node0009: (INITIALIZING) v_vertica_polonium_mtc_node0010: (INITIALIZING)
...
It is suggested that you continue waiting.
Do you want to continue waiting? (yes/no) [yes] no
Database startup successful, but it may be incomplete. Some nodes
remain in a transitional state. Waiting can sometimes resolve this
problem. See Database Cluster State in Main Menu for updated
cluster state. If this state persists, try using the Advanced menu
to Stop Vertica on Host, then Restart Database.
Press RETURN to continue
The nodes stayed in INITIALIZING for several hours and never came up.
I didn't find anything in vertica.log, in adminTools-dbadmin.log (attached), or in any other log. Everything seems OK.
I checked admintools.conf (it is correct and identical on all nodes), confirmed that spread is running on all nodes while the start database command is running, and restarted the cluster. Again, everything seems OK. So what is the problem? Why don't the nodes come UP?
Can anyone help?
Comments
Hi,
Check your dbLog file. It often contains useful information about database startup.
Thanks
Checked...nothing
dbLog:
Conf_load_conf_file: using file: /db/catalog/vertica_polonium_mtc/v_vertica_polonium_mtc_node0001_catalog/spread.conf
Setting active IP version to 0
Successfully configured Segment 0 [10.99.87.255]:4803 with 10 procs:
N010099084071: 10.99.85.71
N010099084182: 10.99.85.182
N010099084184: 10.99.85.184
N010099084212: 10.99.85.212
N010099085111: 10.99.86.111
N010099085231: 10.99.86.231
N010099086170: 10.99.85.170
N010099086234: 10.99.84.234
N010099086244: 10.99.85.244
N010099087103: 10.99.84.103
Connecting to spread at 4803
Connected to spread on local domain socket 4803
Starting UDxSideProcess for language C++
with command line: /opt/vertica/bin/vertica-udx-C++ 3 ip-10-99-85-111-22634:0x2 debug-log-off /db/catalog/vertica_polonium_mtc/v_vertica_polonium_mtc_node0001_catalog/UDxLogs
03/07/16 17:54:07 SP_connect: DEBUG: Auth list is: NULL
03/07/16 17:54:07 SP_connect: connected with private group(21 bytes): #node_a#N010099085111; mbox=9, pid=22634
03/07/16 17:56:09 SP_disconnect: mbox=9, pid=22634, send_group=#node_a#N010099085111
Fixed
The source cluster's catalog was set to Broadcast, and the target cluster's (AWS) catalog was set to pt2pt (point to point).
With help from HP support, we edited the catalog of the target cluster back to pt2pt.
(The AWS cluster was created with the pt2pt parameter, but the copycluster command overwrote that configuration and changed it to Broadcast.)
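For anyone hitting the same symptom, here is a rough Python sketch of a check that could have pointed at this earlier. It reads the "Successfully configured Segment" line from the dbLog above and tests whether the segment address is the broadcast address of the smallest subnet covering the listed node IPs. The function names are made up for illustration, the regexes only assume the dbLog layout shown in this thread, and the broadcast test is a heuristic, not anything Vertica or spread provides.

```python
import ipaddress
import re

# Patterns assume the dbLog layout pasted earlier in this thread.
SEGMENT_RE = re.compile(r"Segment \d+ \[(?P<addr>[\d.]+)\]:(?P<port>\d+) with \d+ procs")
MEMBER_RE = re.compile(r"^\s*N\d+:\s*(?P<ip>[\d.]+)\s*$", re.MULTILINE)

# Excerpt copied from the dbLog in the original post.
SAMPLE_DBLOG = """\
Successfully configured Segment 0 [10.99.87.255]:4803 with 10 procs:
N010099084071: 10.99.85.71
N010099084182: 10.99.85.182
N010099084184: 10.99.85.184
N010099084212: 10.99.85.212
N010099085111: 10.99.86.111
N010099085231: 10.99.86.231
N010099086170: 10.99.85.170
N010099086234: 10.99.84.234
N010099086244: 10.99.85.244
N010099087103: 10.99.84.103
"""


def parse_segment(dblog_text):
    """Return (segment_address, port, member_addresses), or None if no segment line is found."""
    seg = SEGMENT_RE.search(dblog_text)
    if seg is None:
        return None
    members = [ipaddress.ip_address(m.group("ip")) for m in MEMBER_RE.finditer(dblog_text)]
    return ipaddress.ip_address(seg.group("addr")), int(seg.group("port")), members


def smallest_common_network(addresses):
    """Widen a /32 one bit at a time until it contains every address."""
    if not addresses:
        raise ValueError("no member addresses found")
    net = ipaddress.ip_network(f"{addresses[0]}/32")
    while not all(addr in net for addr in addresses):
        net = net.supernet()
    return net


def segment_is_subnet_broadcast(segment_addr, members):
    """Heuristic: True if the segment address is the broadcast address of the members' subnet."""
    return segment_addr == smallest_common_network(members).broadcast_address


if __name__ == "__main__":
    segment_addr, port, members = parse_segment(SAMPLE_DBLOG)
    network = smallest_common_network(members)
    print(f"segment {segment_addr}:{port}, {len(members)} members, covering network {network}")
    if segment_is_subnet_broadcast(segment_addr, members):
        print("Segment address looks like a subnet broadcast address; check the control mode.")
```

On the excerpt from this thread, the node IPs fall in 10.99.84.0/22 and the segment address 10.99.87.255 is that network's broadcast address, which is consistent with the catalog having been switched to Broadcast. Treat a positive result only as a hint to compare the control mode on the source and target clusters.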