Adding 2nd node to Vertica 7.0.1 not working, first node also down

Did a fresh install of Vertica 7.0.1 Community Edition. No issues installing.
Created a standalone database. No issues.
Migrated data from another Vertica 7 database using cluster copy. No issues.
Ran update_vertica --add-hosts. No issues.
In admintools I used Add Host under cluster management. It installs the data and catalog directories, but the database is not able to come up. When I stop and start the database on Node 1, it also fails to come up.

Tried with 7.0.0 as well, same issue...

Any ideas? I need help.

Comments

  • Hi AbdurRahman,

    When you first did the copy cluster (using vbr.py?), were you able to bring the destination DB UP before adding the new node? Can you check whether the spread.conf on the destination DB is correct for that cluster?

    - Mitch


  • Yes, I was able to bring it up and down multiple times. I even connected from external client tools to verify that all the tables and data had come through, and they had.

    I was not able to locate spread.conf under /opt/vertica/config

    But my admintools.conf on Node 1 is...

    admintools.conf
    [Configuration]
    last_port = 5433
    default_base = /home/dbadmin
    format = 3
    install_opts = --update --add-hosts nex-db-40 --rpm '/tmp/vertica-7.0.1-0.x86_64.RHEL5.rpm' --failure-threshold NONE
    spreadlog = False
    controlsubnet = default
    controlmode = broadcast

    [Cluster]
    hosts = 10.104.3.63,10.104.3.64

    [Nodes]
    node0001 = 10.104.3.63,/home/dbadmin,/home/dbadmin
    v_csm_node0001 = 10.104.3.63,/data,/data
    node0002 = 10.104.3.64,/home/dbadmin,/home/dbadmin
    v_csm_node0002 = 10.104.3.64,/data,/data

    [Database:csm]
    restartpolicy = ksafe
    port = 5433
    path = /data/csm/v_csm_node0001_catalog
    nodes = v_csm_node0001

    Node 2...

    admintools.conf 
    [Configuration]
    last_port = 5433
    default_base = /home/dbadmin
    format = 3
    install_opts = --update --add-hosts nex-db-40 --rpm '/tmp/vertica-7.0.1-0.x86_64.RHEL5.rpm' --failure-threshold NONE
    spreadlog = False
    controlsubnet = default
    controlmode = broadcast

    [Cluster]
    hosts = 10.104.3.63,10.104.3.64

    [Nodes]
    node0001 = 10.104.3.63,/home/dbadmin,/home/dbadmin
    v_csm_node0001 = 10.104.3.63,/data,/data
    node0002 = 10.104.3.64,/home/dbadmin,/home/dbadmin

    [Database:csm]
    restartpolicy = ksafe
    port = 5433
    path = /data/csm/v_csm_node0001_catalog
    nodes = v_csm_node0001
  • Hi AbdurRahman,

    It looks like the second node wasn't added to the cluster according to the admintools.conf file on node0002; however, it is present in the file on node0001 that you provided here.
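
    The difference between the two files can also be found mechanically. Below is a minimal sketch, assuming the [Cluster] and [Nodes] sections quoted above are available as strings. The helper names node_entries and diff_nodes are made up for illustration and are not part of admintools.

```python
import configparser

# [Cluster] and [Nodes] sections copied from the two admintools.conf files above.
NODE1_CONF = """
[Cluster]
hosts = 10.104.3.63,10.104.3.64

[Nodes]
node0001 = 10.104.3.63,/home/dbadmin,/home/dbadmin
v_csm_node0001 = 10.104.3.63,/data,/data
node0002 = 10.104.3.64,/home/dbadmin,/home/dbadmin
v_csm_node0002 = 10.104.3.64,/data,/data
"""

NODE2_CONF = """
[Cluster]
hosts = 10.104.3.63,10.104.3.64

[Nodes]
node0001 = 10.104.3.63,/home/dbadmin,/home/dbadmin
v_csm_node0001 = 10.104.3.63,/data,/data
node0002 = 10.104.3.64,/home/dbadmin,/home/dbadmin
"""


def node_entries(text):
    """Return the [Nodes] section of an admintools.conf-style string as a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["Nodes"])


def diff_nodes(first, second):
    """Report node entries that are missing from, or differ between, two files."""
    left, right = node_entries(first), node_entries(second)
    shared = set(left) & set(right)
    return {
        "only_in_first": sorted(set(left) - set(right)),
        "only_in_second": sorted(set(right) - set(left)),
        "different": sorted(k for k in shared if left[k] != right[k]),
    }


if __name__ == "__main__":
    print(diff_nodes(NODE1_CONF, NODE2_CONF))
```

    With the two files quoted in this thread, the only entry reported is v_csm_node0002, which exists on node0001 but not on node0002.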

    You can find spread.conf in your /v_csm_node0001_catalog folder on all nodes.

    Does the file dbLog (located in the csm directory) contain any errors? Are you seeing any errors in the vertica.log file (in the catalog directory) on both nodes?

    - Mitch
  • My Node 1 spread.conf...
    # 273428
    # Auto-generated by vertica - do not edit
    Spread_Segment 10.104.3.64:4803 {
      N010104003064    10.104.3.64 {
        10.104.3.64
      }
    }
    Spread_Segment 127.0.0.1:4803 {
      N127000000001    127.0.0.1 {
        127.0.0.1
      }
    }
    # begin end matter
    EventLogFile = /data/csm/spread.log
    EventTimeStamp = "[%a %d %b %Y %H:%M:%S]"
    DebugFlags = { PRINT EXIT }
    ExitOnIdle = yes

    Node 2 spread.conf..

    # 273428
    # Auto-generated by vertica - do not edit
    Spread_Segment 10.104.3.64:4803 {
      N010104003064    10.104.3.64 {
        10.104.3.64
      }
    }
    Spread_Segment 127.0.0.1:4803 {
      N127000000001    127.0.0.1 {
        127.0.0.1
      }
    }
    # begin end matter
    EventLogFile = /data/csm/spread.log
    EventTimeStamp = "[%a %d %b %Y %H:%M:%S]"
    DebugFlags = { PRINT EXIT }
    ExitOnIdle = yes

    My Node 1 spread.log

    Membership id is ( 2130706433, 1394479028)
    [Mon 10 Mar 2014 14:17:07] --------------------
    [Mon 10 Mar 2014 14:17:07] Configuration at N127000000001 is:
    [Mon 10 Mar 2014 14:17:07] Num Segments 1
    [Mon 10 Mar 2014 14:17:07]      1       127.0.0.1         4803
    [Mon 10 Mar 2014 14:17:07]              N127000000001           127.0.0.1       
    [Mon 10 Mar 2014 14:17:07] ====================

    [Mon 10 Mar 2014 14:23:50] Daemon idle, exiting
    Exit caused by Alarm(EXIT)
    [Mon 10 Mar 2014 14:24:01] Conf_load_conf_file: Invalid configuration:
    [Mon 10 Mar 2014 14:24:01] --------------------
    [Mon 10 Mar 2014 14:24:01] Configuration at  is:
    [Mon 10 Mar 2014 14:24:01] Num Segments 2
    [Mon 10 Mar 2014 14:24:01]      1       10.104.3.64       4803
    [Mon 10 Mar 2014 14:24:01]              N010104003064           10.104.3.64     
    [Mon 10 Mar 2014 14:24:01]      1       127.0.0.1         4803
    [Mon 10 Mar 2014 14:24:01]              N127000000001           127.0.0.1       
    [Mon 10 Mar 2014 14:24:01] ====================
    [Mon 10 Mar 2014 14:24:01] 
    [Mon 10 Mar 2014 14:24:01] Conf_load_conf_file: Localhost segments can not be used along with regular network address segments.
    Most likely you need to remove or comment out the 
    Spread_Segment 127.0.0.255 {...}
     section of your configuration file.
    Exit caused by Alarm(EXIT)
    [Mon 10 Mar 2014 14:26:41] Conf_load_conf_file: Invalid configuration:
    [Mon 10 Mar 2014 14:26:41] --------------------
    [Mon 10 Mar 2014 14:26:41] Configuration at  is:
    [Mon 10 Mar 2014 14:26:41] Num Segments 2
    [Mon 10 Mar 2014 14:26:41]      1       10.104.3.64       4803
    [Mon 10 Mar 2014 14:26:41]              N010104003064           10.104.3.64     
    [Mon 10 Mar 2014 14:26:41]      1       127.0.0.1         4803
    [Mon 10 Mar 2014 14:26:41]              N127000000001           127.0.0.1       
    [Mon 10 Mar 2014 14:26:41] ====================
    [Mon 10 Mar 2014 14:26:41] 
    [Mon 10 Mar 2014 14:26:41] Conf_load_conf_file: Localhost segments can not be used along with regular network address segments.
    Most likely you need to remove or comment out the 
    Spread_Segment 127.0.0.255 {...}
     section of your configuration file.
    Exit caused by Alarm(EXIT)

    Node 2 spread.log
    [Mon 10 Mar 2014 14:21:01] Conf_load_conf_file: Invalid configuration:
    [Mon 10 Mar 2014 14:21:01] --------------------
    [Mon 10 Mar 2014 14:21:01] Configuration at  is:
    [Mon 10 Mar 2014 14:21:01] Num Segments 2
    [Mon 10 Mar 2014 14:21:01]      1       10.104.3.64       4803
    [Mon 10 Mar 2014 14:21:01]              N010104003064           10.104.3.64     
    [Mon 10 Mar 2014 14:21:01]      1       127.0.0.1         4803
    [Mon 10 Mar 2014 14:21:01]              N127000000001           127.0.0.1       
    [Mon 10 Mar 2014 14:21:01] ====================
    [Mon 10 Mar 2014 14:21:01] 
    [Mon 10 Mar 2014 14:21:01] Conf_load_conf_file: Localhost segments can not be used along with regular network address segments.
    Most likely you need to remove or comment out the 
    Spread_Segment 127.0.0.255 {...}
     section of your configuration file.
    Exit caused by Alarm(EXIT)

    I see 127.0.0.1...not sure where it is picking that up from...
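
    The condition that spread.log complains about (a localhost segment next to a regular network segment) can be checked before starting the database. The following is a rough sketch, assuming spread.conf has the shape pasted above. The regular expression and the function names are illustrative assumptions, not anything shipped with Vertica or Spread.

```python
import ipaddress
import re

# spread.conf as pasted above (end matter omitted).
SPREAD_CONF = """\
# Auto-generated by vertica - do not edit
Spread_Segment 10.104.3.64:4803 {
  N010104003064    10.104.3.64 {
    10.104.3.64
  }
}
Spread_Segment 127.0.0.1:4803 {
  N127000000001    127.0.0.1 {
    127.0.0.1
  }
}
"""

# Matches "Spread_Segment <ipv4>[:port] {" at the start of a line.
SEGMENT_RE = re.compile(
    r"^[ \t]*Spread_Segment[ \t]+(\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?[ \t]*\{",
    re.MULTILINE,
)


def segment_addresses(text):
    """Return the address of every Spread_Segment block, in file order."""
    return [ipaddress.ip_address(m.group(1)) for m in SEGMENT_RE.finditer(text)]


def mixes_loopback(text):
    """True when loopback and non-loopback segments appear in the same file."""
    addresses = segment_addresses(text)
    loopback = [a for a in addresses if a.is_loopback]
    return 0 < len(loopback) < len(addresses)


if __name__ == "__main__":
    print([str(a) for a in segment_addresses(SPREAD_CONF)])
    print("mixed loopback and network segments:", mixes_loopback(SPREAD_CONF))
```

    For the file above, the check reports a mix of 10.104.3.64 and 127.0.0.1, which matches the "Localhost segments can not be used along with regular network address segments" message in spread.log.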

    dbLog Node 1...

    /===========================================================================\
    | The Spread Toolkit.                                                       |
    | Copyright (c) 1993-2012 Spread Concepts LLC                               |
    | All rights reserved.                                                      |
    |                                                                           |
    | The Spread toolkit is licensed under the Spread Open-Source License.      |
    | You may only use this software in compliance with the License.            |
    | A copy of the license can be found at http://www.spread.org/license       |
    |                                                                           |
    | This product uses software developed by Spread Concepts LLC for use       |
    | in the Spread toolkit. For more information about Spread,                 |
    | see http://www.spread.org                                                 |
    |                                                                           |
    | This software is distributed on an "AS IS" basis, WITHOUT WARRANTY OF     |
    | ANY KIND, either express or implied.                                      |
    |                                                                           |
    | Creators:                                                                 |
    |    Yair Amir             yairamir@cs.jhu.edu                              |
    |    Michal Miskin-Amir    michal@spreadconcepts.com                        |
    |    Jonathan Stanton      jstanton@gwu.edu                                 |
    |    John Schultz          jschultz@spreadconcepts.com                      |
    |                                                                           |
    | Major Contributors:                                                       |
    |    Ryan Caudy           rcaudy@gmail.com - contribution to process groups.|
    |    Claudiu Danilov      claudiu@acm.org - scalable, wide-area support.    |
    |    Cristina Nita-Rotaru crisn@cs.purdue.edu - GC security.                |
    |    Theo Schlossnagle    jesus@omniti.com - Perl, autoconf, old skiplist.  |
    |    Dan Schoenblum       dansch@cnds.jhu.edu - Java interface.             |
    |                                                                           |
    | Special thanks to the following for discussions and ideas:                |
    |    Ken Birman, Danny Dolev, Jacob Green, Mike Goodrich, Ben Laurie,       |
    |    David Shaw, Gene Tsudik, Robbert VanRenesse.                           |
    |                                                                           |
    | Partial funding provided by the Defense Advanced Research Project Agency  |
    | (DARPA) and the National Security Agency (NSA) 2000-2004. The Spread      |
    | toolkit is not necessarily endorsed by DARPA or the NSA.                  |
    |                                                                           |
    | For a full list of contributors, see Readme.txt in the distribution.      |
    |                                                                           |
    | WWW:     www.spread.org     www.spreadconcepts.com                        |
    | Contact: info@spreadconcepts.com                                          |
    |                                                                           |
    | Version 4.02.00 Built 18/June/2012      (Vertica)                         |
    \===========================================================================/
    Conf_load_conf_file: using file: /data/csm/v_csm_node0001_catalog/spread.conf
    Successfully configured Segment 0 [10.104.3.64:4803] with 1 procs:
                   N010104003064: 10.104.3.64
    Successfully configured Segment 1 [127.0.0.1:4803] with 1 procs:
                   N127000000001: 127.0.0.1
    03/10/14 14:26:41 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:42 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:42 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:43 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:43 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:44 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:44 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:45 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:45 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:46 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:46 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:47 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:47 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:48 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:48 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:49 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:49 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:50 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:50 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:51 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:51 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:52 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:52 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:53 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:53 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:54 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:54 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:55 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:55 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:56 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:56 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:57 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:57 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:58 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:58 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:59 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:26:59 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:00 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:00 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:01 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:01 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:02 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:02 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:03 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:03 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:04 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:04 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:05 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:05 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:06 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:06 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:07 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:27:07 SP_connect: unable to VSpread could not connect on local domain socket 4803: -2
    Unable to open indirect spread information: /opt/vertica/config/local-spread.conf
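
    Since dbLog repeats the same SP_connect line dozens of times, a short summary is easier to read than the raw log. This is a minimal sketch, assuming the "MM/DD/YY HH:MM:SS message" line format seen above. The summarize function is a hypothetical helper, and the sample holds only a few lines from the log.

```python
import re
from collections import Counter

# A few lines in the same format as the dbLog output above.
SAMPLE_LOG = """\
03/10/14 14:26:41 SP_connect: unable to connect mailbox 9: Connection refused
03/10/14 14:26:42 SP_connect: unable to connect mailbox 9: Connection refused
03/10/14 14:26:42 SP_connect: unable to connect mailbox 9: Connection refused
03/10/14 14:27:07 SP_connect: unable to VSpread could not connect on local domain socket 4803: -2
Unable to open indirect spread information: /opt/vertica/config/local-spread.conf
"""

# Timestamp followed by the message text; lines without a timestamp are skipped.
STAMPED_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")


def summarize(text):
    """Return (message, count, first_seen, last_seen) tuples, most frequent first."""
    counts = Counter()
    first_seen = {}
    last_seen = {}
    for line in text.splitlines():
        match = STAMPED_RE.match(line.strip())
        if not match:
            continue
        stamp, message = match.groups()
        counts[message] += 1
        first_seen.setdefault(message, stamp)
        last_seen[message] = stamp
    return [(m, n, first_seen[m], last_seen[m]) for m, n in counts.most_common()]


if __name__ == "__main__":
    for message, count, first, last in summarize(SAMPLE_LOG):
        print(f"{count:3d}x  {first} .. {last}  {message}")
```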

    dbLog on node 2...

    /===========================================================================\
    | The Spread Toolkit.                                                       |
    | Copyright (c) 1993-2012 Spread Concepts LLC                               |
    | All rights reserved.                                                      |
    |                                                                           |
    | The Spread toolkit is licensed under the Spread Open-Source License.      |
    | You may only use this software in compliance with the License.            |
    | A copy of the license can be found at http://www.spread.org/license       |
    |                                                                           |
    | This product uses software developed by Spread Concepts LLC for use       |
    | in the Spread toolkit. For more information about Spread,                 |
    | see http://www.spread.org                                                 |
    |                                                                           |
    | This software is distributed on an "AS IS" basis, WITHOUT WARRANTY OF     |
    | ANY KIND, either express or implied.                                      |
    |                                                                           |
    | Creators:                                                                 |
    |    Yair Amir             yairamir@cs.jhu.edu                              |
    |    Michal Miskin-Amir    michal@spreadconcepts.com                        |
    |    Jonathan Stanton      jstanton@gwu.edu                                 |
    |    John Schultz          jschultz@spreadconcepts.com                      |
    |                                                                           |
    | Major Contributors:                                                       |
    |    Ryan Caudy           rcaudy@gmail.com - contribution to process groups.|
    |    Claudiu Danilov      claudiu@acm.org - scalable, wide-area support.    |
    |    Cristina Nita-Rotaru crisn@cs.purdue.edu - GC security.                |
    |    Theo Schlossnagle    jesus@omniti.com - Perl, autoconf, old skiplist.  |
    |    Dan Schoenblum       dansch@cnds.jhu.edu - Java interface.             |
    |                                                                           |
    | Special thanks to the following for discussions and ideas:                |
    |    Ken Birman, Danny Dolev, Jacob Green, Mike Goodrich, Ben Laurie,       |
    |    David Shaw, Gene Tsudik, Robbert VanRenesse.                           |
    |                                                                           |
    | Partial funding provided by the Defense Advanced Research Project Agency  |
    | (DARPA) and the National Security Agency (NSA) 2000-2004. The Spread      |
    | toolkit is not necessarily endorsed by DARPA or the NSA.                  |
    |                                                                           |
    | For a full list of contributors, see Readme.txt in the distribution.      |
    |                                                                           |
    | WWW:     www.spread.org     www.spreadconcepts.com                        |
    | Contact: info@spreadconcepts.com                                          |
    |                                                                           |
    | Version 4.02.00 Built 18/June/2012      (Vertica)                         |
    \===========================================================================/
    Conf_load_conf_file: using file: /data/csm/v_csm_node0002_catalog/spread.conf
    Successfully configured Segment 0 [10.104.3.64:4803] with 1 procs:
                   N010104003064: 10.104.3.64
    Successfully configured Segment 1 [127.0.0.1:4803] with 1 procs:
                   N127000000001: 127.0.0.1
    03/10/14 14:21:01 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:02 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:02 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:03 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:03 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:04 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:04 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:05 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:05 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:06 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:06 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:07 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:07 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:08 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:08 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:09 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:09 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:10 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:10 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:11 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:11 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:12 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:12 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:13 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:13 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:14 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:14 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:15 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:15 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:16 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:16 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:17 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:17 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:18 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:18 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:19 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:19 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:20 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:20 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:21 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:21 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:22 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:22 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:23 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:23 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:24 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:24 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:25 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:25 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:26 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:26 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:27 SP_connect: unable to connect mailbox 9: Connection refused
    03/10/14 14:21:27 SP_connect: unable to VSpread could not connect on local domain socket 4803: -2
    Unable to open indirect spread information: /opt/vertica/config/local-spread.conf

    Vertica Log Node 1...

    2014-03-10 14:26:41.714 INFO New log
    2014-03-10 14:26:41.714 unknown:0x7f2080f86700 [Init] <INFO> Log /data/csm/v_csm_node0001_catalog/vertica.log opened; #1
    2014-03-10 14:26:41.714 unknown:0x7f2080f86700 [Init] <INFO> Processing command line: /opt/vertica/bin/vertica -D /data/csm/v_csm_node0001_catalog -C csm -n v_csm_node0001 -h 10.104.3.63 -p 5433 -P 4803
    2014-03-10 14:26:41.714 unknown:0x7f2080f86700 [Init] <INFO> Starting up Vertica Analytic Database v7.0.1-0
    2014-03-10 14:26:41.714 unknown:0x7f2080f86700 [Init] <INFO> Project Codename: Crane
    2014-03-10 14:26:41.714 unknown:0x7f2080f86700 [Init] <INFO> vertica(v7.0.1-0) built by release@build2.verticacorp.com from releases/VER_7_0_RELEASE_BUILD_1_0_20140212@130255 on 'Wed Feb 12 19:00:56 America/New_York 2014' $BuildId$
    2014-03-10 14:26:41.714 unknown:0x7f2080f86700 [Init] <INFO> 64-bit Optimized Build
    2014-03-10 14:26:41.714 unknown:0x7f2080f86700 [Init] <INFO> Compiler Version: 4.1.2 20080704 (Red Hat 4.1.2-52)
    2014-03-10 14:26:41.715 unknown:0x7f2080f86700 <LOG> @[initializing]: 00000/5081: Total swap memory used: 0
    2014-03-10 14:26:41.715 unknown:0x7f2080f86700 <LOG> @[initializing]: 00000/4435: Process size resident set: 22499328
    2014-03-10 14:26:41.715 unknown:0x7f2080f86700 <LOG> @[initializing]: 00000/5075: Total Memory free + cache: 3329134592
    2014-03-10 14:26:41.715 unknown:0x7f2080f86700 [Txn] <INFO> Looking for catalog at: /data/csm/v_csm_node0001_catalog/Catalog
    2014-03-10 14:26:41.715 unknown:0x7f2080f86700 [Catalog] <INFO> Loading Checkpoint 2
    2014-03-10 14:26:41.715 unknown:0x7f2080f86700 [Init] <INFO> Startup [Reading Catalog] Reading Checkpoint (bytes) - 0 / 637894
    2014-03-10 14:26:41.811 unknown:0x7f2080f86700 [Init] <INFO> Startup [Reading Catalog] Reading Checkpoint (bytes) - 637894 / 637894
    2014-03-10 14:26:41.811 unknown:0x7f2080f86700 [Catalog] <INFO> Replaying 1 Txnlogs
    2014-03-10 14:26:41.812 unknown:0x7f2080f86700 [Init] <INFO> Startup [Reading Catalog] Applying transaction log (bytes) - 0 / 93546
    2014-03-10 14:26:41.830 unknown:0x7f2080f86700 [Init] <INFO> Startup [Reading Catalog] Applying transaction log (bytes) - 93546 / 93546
    2014-03-10 14:26:41.830 unknown:0x7f2080
  • Was the initial install of the single-node DB a localhost installation? 127.0.0.1 is the localhost loopback address, which is only used in single-node localhost installs.

    You can find this out by looking at the install parameters at the top of an old admintools.conf file, renamed to admintools.conf.bak.date, in the /opt/vertica/config directory. If you didn't include a --hosts option with the node IP, Vertica is installed as localhost only. If you installed the initial single node as localhost, you won't be able to add additional nodes without re-installing and specifying the single node's IP in the hosts list.
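
    As a rough illustration of that check, the sketch below looks for a --hosts option in an install_opts string such as the one in the admintools.conf posted earlier in this thread. The hosts_option helper is hypothetical, and it only recognizes the long --hosts spelling mentioned here.

```python
import shlex

# install_opts value copied from the admintools.conf posted in this thread.
INSTALL_OPTS = (
    "--update --add-hosts nex-db-40 "
    "--rpm '/tmp/vertica-7.0.1-0.x86_64.RHEL5.rpm' --failure-threshold NONE"
)


def hosts_option(install_opts):
    """Return the host list passed via --hosts, or None if the option is absent."""
    tokens = shlex.split(install_opts)
    for index, token in enumerate(tokens):
        # Form 1: "--hosts a,b"
        if token == "--hosts" and index + 1 < len(tokens):
            return tokens[index + 1].split(",")
        # Form 2: "--hosts=a,b"
        if token.startswith("--hosts="):
            return token.split("=", 1)[1].split(",")
    return None


if __name__ == "__main__":
    print(hosts_option(INSTALL_OPTS))
```

    The string above comes from the later --add-hosts run, so it contains no --hosts option. The install parameters of the original install would be in the admintools.conf.bak.date file.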
  • Hi Mitch,
       The new node, NEX-DB-39, was installed using the 10.x subnet. After looking around more, I found that when I do the cluster copy, the 127.0.0.1 information comes from the other Vertica 7.0 installation. Even so, after the cluster copy the DB is fine on Node 1.

       The next thing I tried was to redo the cluster copy. Then I followed the change-IP document and made these changes:

    1. Verify and change admintools.conf
    2. Verify and change spread.conf

        Brought the database up and down multiple times to test. Also ran:

         select host_name from host_resources;
         to verify the correct IP information.
         Also ran:
         select node_name, node_address from v_catalog.nodes
         Verified the information; the 10.x IP subnet is showing.

         I even checked the catalog by running these commands:

         /opt/vertica/bin/vertica -D . -E
         listdetails Site;

    :Siteoid:45035996273704980
    name:v_csm_node0001
    schema:0
    address:10.104.3.63
    ei_address:0
    catalogPath:/data/csm/v_csm_node0001_catalog/Catalog
    hasCatalog:false
    bdbPath:/data/csm/v_csm_node0001_data/SAL
    siteUniqueID:10
    isEphemeral:false
    isRecoveryClerk:true
    parentFaultGroupId:45035996273704974
    clientPort:5433
    controlAddress:127.0.0.1
    controlBroadcast:127.0.0.255
    controlPort:4803
    controlNode:45035996273704980
    .

    The only things that were different were controlAddress and controlBroadcast. I did not change those.
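
    That mismatch can be spotted with a small script. This is only a sketch, assuming the key:value layout of the listdetails output above. The parse_site and control_address_is_loopback helpers are hypothetical names, and only a few fields from the dump are included.

```python
import ipaddress

# A subset of the listdetails Site output shown above.
SITE_DUMP = """\
name:v_csm_node0001
address:10.104.3.63
clientPort:5433
controlAddress:127.0.0.1
controlBroadcast:127.0.0.255
controlPort:4803
"""


def parse_site(text):
    """Parse key:value lines into a dict; lines without a key are ignored."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.strip().partition(":")
        if separator and key:
            fields[key] = value
    return fields


def control_address_is_loopback(fields):
    """True when the node address is routable but controlAddress is loopback."""
    address = ipaddress.ip_address(fields["address"])
    control = ipaddress.ip_address(fields["controlAddress"])
    return control.is_loopback and not address.is_loopback


if __name__ == "__main__":
    site = parse_site(SITE_DUMP)
    print(site["address"], site["controlAddress"], control_address_is_loopback(site))
```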

    Then ran update_vertica.
    Performed a sanity check of the DB by bringing it up and down multiple times.
    Then did Add Host from admintools, and as soon as I did, admintools crashed and the database went down.

    Is there any step I am missing after the cluster copy?

    The source database I am copying from was installed using 127.0.0.1.

    Any ideas? Any recommendations, please?

    Mujeeb
  • Hey Mujeeb,

    It appears that copycluster has copied the source cluster's spread.conf over to the destination cluster, which is a bug. I have been able to reproduce this issue in-house, and a bug has been filed.

    I am not aware of a workaround currently, but I am working on one and will let you know if I come up with something. Sorry for the inconvenience.

    You may consider setting up both clusters independently and using import/export functions to move your data. Here is more info: https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/AdministratorsGuide/CopyExportData/UsingE...

    - Mitch

  • Thank you, Mitch. I will be looking forward to the bug fix.

    Regards,

    Mujeeb
  • Hi Mitch! 

    A customer of mine has a similar problem. See: https://community.vertica.com/vertica/topics/2nd_node_wont_come_up_after_initial_database_creation

    Any help would be much appreciated!!

    Thanks,
    Kelley
