Failed to create database
Maybe somebody has seen this before and can point me in the right direction.
First off, I can create a database on this server using Vertica 9.2.1, but it fails with 9.3.1 and 10.0.1. What I really need is to import a 9.2.1 database and upgrade it to 9.3.1 and then to 10.0.1. The import works, but the upgrade from 9.2.1 to 9.3.1 failed because the catalog could not bootstrap. That is when I found I can't even create a database on a single node. The log files don't really tell me what is wrong.
$ admintools -t create_db --hosts 30.201.184.175 -d 'ksdsspv1' -c /catalog -D /data -p xxx
Database with 1 or 2 nodes cannot be k-safe and it may lose data if it crashes
Distributing changes to cluster.
Creating database ksdsspv1
Bootstrap on host 30.201.184.175 return code -4 stdout '' stderr ''
Error: Bootstrap on host 30.201.184.175 return code -4 stdout '' stderr ''
Vertica is 9.3.1-20
OS is RedHat 7.9
In admintools.log, I see these entries:
2021-05-04 12:28:36.005 at_exec/30684:0x7f8d17ba4740 [root.setup_custom_logging] New log for 'at_exec'
2021-05-04 12:28:36.007 at_exec/30684:0x7f8d17ba4740 [root.setup_custom_logging] sys.argv: '/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/engine/api/at_runner.py' --no-prompt --module 'vertica.engine.api.bootstrap_catalog'
2021-05-04 12:28:36.129 at_exec/30684:0x7f8d17ba4740 [ATRunner.exec_module] running: module=vertica.engine.api.bootstrap_catalog version=1.0 args={"node": {"name": "v_ksdsspv1_node0001", "oid": null, "catalogpath": "/catalog/ksdsspv1/v_ksdsspv1_node0001_catalog", "storagelocs": ["/data/ksdsspv1/v_ksdsspv1_node0001_data"], "host": "30.201.184.175", "port": 5433, "controlnode": null, "startcmd": null, "isprimary": true}, "db_name": "ksdsspv1", "control_addr": "30.201.184.175", "broadcast_addr": "30.201.184.255", "largecluster": null, "mode": "broadcast", "logging": "False", "ipv6": false, "client_port": 5433, "bootstrap_params": "*****", "__dbpasswd": "*****", "communal_storage_url": null, "num_shards": null, "depot_path": null, "depot_size": null, "branch_name": "", "aws_access_key_id": "*****", "aws_secret_access_key": "*****"}
2021-05-04 12:28:36.551 at_exec/30684:0x7f8d17ba4740 [ATRunner.exec_module] result: status=Failure host=None content={"returncode": -4, "stdout": "", "stderr": "", "runner_ack": true} error_message=None
2021-05-04 12:28:36.691 admintools/30612:0x7fbf96e63740 [at_command.to_python_invocation] Command: [/opt/vertica/oss/python3/bin/python3 -m vertica.engine.api.at_runner --module=vertica.engine.api.bootstrap_catalog]; input: [#{"node": {"name": "v_ksdsspv1_node0001", "oid": null, "catalogpath": "/catalog/ksdsspv1/v_ksdsspv1_node0001_catalog", "storagelocs": ["/data/ksdsspv1/v_ksdsspv1_node0001_data"], "host": "30.201.184.175", "port": 5433, "controlnode": null, "startcmd": null, "isprimary": true}, "db_name": "ksdsspv1", "control_addr": "30.201.184.175", "broadcast_addr": "30.201.184.255", "largecluster": null, "mode": "broadcast", "logging": "False", "ipv6": false, "client_port": 5433, "communal_storage_url": null, "num_shards": null, "depot_path": null, "depot_size": null, "branch_name": "", "rollback": false, "protocol": "1.0"}]
2021-05-04 12:28:36.692 admintools/30612:0x7fbf96e63740 [NewSSH.createDBMultiNodes] Bootstrap on host 30.201.184.175 return code -4 stdout '' stderr ''
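The `-4` return code is itself a clue. The runner in the log is Python, and Python's `subprocess` reports a child that was terminated by signal N as return code -N (POSIX only). Assuming admintools passes that value through unchanged, the bootstrap process was killed by signal 4. A minimal sketch that decodes such a value with the standard `signal` module (`describe_returncode` is a hypothetical helper name, not part of admintools):

```python
import signal


def describe_returncode(rc: int) -> str:
    """Explain a subprocess-style return code: a negative value -N means 'killed by signal N'."""
    if rc < 0:
        try:
            return f"killed by {signal.Signals(-rc).name}"
        except ValueError:
            # Signal number not known to this platform's signal module.
            return f"killed by unknown signal {-rc}"
    return f"exited with status {rc}"


print(describe_returncode(-4))  # on Linux: killed by SIGILL
```

On Linux, signal 4 is SIGILL (illegal instruction). The empty stdout and stderr fit this reading: the process died before it could print anything.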
Answers
As it's failing on bootstrap, have you checked the bootstrap log on node 30.201.184.175?
The bootstrap log is empty. The issue appears to be specific to the AMD Opteron CPU, but I don't know exactly what the problem is; things work fine on Intel. In the messages log I see:
kernel: traps: bootstrap-catal[447046] trap invalid opcode ip:4baf047 sp:7ffe87d2b0f0 error:0 in vertica[400000+6f96000]
The kernel sends a SIGILL (illegal instruction) to the bootstrap process.
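"Invalid opcode" means the `vertica` binary asked the CPU to execute an instruction it does not implement. Because the same build works on Intel, one way to narrow this down is to compare the instruction-set flags the kernel reports in `/proc/cpuinfo` on the two machines. Below is a minimal sketch, assuming you copy `/proc/cpuinfo` from the failing Opteron host and from a working Intel host into two text files (the file names in the usage comment are hypothetical). The post does not say which instruction Vertica 9.3.1 uses, so the output is only a list of candidates, not a diagnosis:

```python
from __future__ import annotations

import sys
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text (x86 layout)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def flags_missing_on(failing_text: str, working_text: str) -> list[str]:
    """Flags the working host reports that the failing host does not, sorted."""
    return sorted(cpu_flags(working_text) - cpu_flags(failing_text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python cpuflags.py amd_cpuinfo.txt intel_cpuinfo.txt
    failing, working = (Path(p).read_text() for p in sys.argv[1:3])
    print("\n".join(flags_missing_on(failing, working)) or "no missing flags")
```

Flags that appear only on the Intel host, such as newer SIMD extensions, would be the first things to check against Vertica's hardware requirements for 9.3.1 and 10.0.1.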