Reviving DB Error
I'm not sure why revive_db fails.
The JSON file it references appears to be present and accessible. Any ideas?
[dbadmin@01 ~]$ aws s3 cp s3://vertica-test-dev-data/test/metadata/TEST/cluster_config.json /tmp/
download: s3://vertica-test-dev-data/test/metadata/TEST/cluster_config.json to ../../tmp/cluster_config.json
[dbadmin@01 ~]$ /opt/vertica/bin/admintools -t revive_db -s 10.168.36.58,10.168.38.174,10.168.39.38 --communal-storage-location=s3://vertica-test-dev-data/test/ -d TEST --force
Attempting to retrieve file: [s3://vertica-test-dev-data/test/metadata/TEST/cluster_config.json]
Attempting to retrieve file: [s3://vertica-test-dev-data/test/metadata/TEST/cluster_config_prev.json]
Database could not be revived.
Error:
Failed to run vertica-download-file on host(s). See /opt/vertica/log/adminTools.log for details.
10.168.36.58: Initialization thread logged exception:
Could not copy file [s3://vertica-test-dev-data/test/metadata/TEST/cluster_config.json] to [/tmp/desc.json]: Access Denied
Exiting process ( exit(1) ).
Note: I didn't set AWS auth keys; I'm assuming the AWS IAM role will provide the necessary permissions when reviving the DB.
Best Answer
I see this error in the adminTools.log file:
2022-07-04 17:22:48.829 admintools/117044:0x7fba1bb9b740 [DBReviveHelper.downloadFile] Failing or not connected host 10.168.36.58: status=Failure host=10.168.36.58 content={"returncode": 1, "stdout": "", "stderr": "Initialization thread logged exception:\nCould not copy file [s3://vertica-test-dev-data/test/metadata/TEST/cluster_config_prev.json] to [/tmp/desc.json]: Access Denied\nExiting process ( exit(1) ).", "runner_ack": true} error_message=None
All nodes have the same access and can access the S3 file.
Key pairs are not listed in the config file. I see the following parameters in admintools.conf:
[BootstrapParameters]
awsendpoint = null
awsregion = null
Answers
Were there any other errors listed in the log at /opt/vertica/log/adminTools.log?
How is S3 access configured? If it uses a key pair, is the key pair listed in /opt/vertica/config/admintools.conf? If it uses an IAM role, please ensure all nodes have the same role access, since download access is tested on all nodes.
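One way to follow the suggestion above and confirm that every node resolves to the same IAM role and can read the metadata file is sketched below. It assumes passwordless ssh between the nodes and the AWS CLI installed on each node; neither is confirmed in this thread, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that each Vertica node assumes the same IAM role and can read
# the revive metadata object. Assumes passwordless ssh between nodes and the
# AWS CLI on every node (assumptions, not Vertica requirements).

# Strip the session name (usually the instance ID) from an assumed-role ARN:
# arn:aws:sts::123456789012:assumed-role/Role/i-0abc -> .../assumed-role/Role
role_of_arn() {
  echo "${1%/*}"
}

# Return 0 if every argument is the same string.
all_identical() {
  local first="$1" item
  for item in "$@"; do
    [[ "$item" == "$first" ]] || return 1
  done
}

# Print the caller ARN seen on each host; return 1 if the roles differ.
compare_roles() {
  local host arn
  local -a roles=()
  for host in "$@"; do
    arn="$(ssh "$host" aws sts get-caller-identity --query Arn --output text)"
    echo "$host -> $arn"
    roles+=("$(role_of_arn "$arn")")
  done
  all_identical "${roles[@]}"
}

# Stream the object to /dev/null on each host to prove read access.
check_reads() {
  local object="$1" host; shift
  for host in "$@"; do
    if ssh "$host" aws s3 cp "$object" - >/dev/null; then
      echo "$host: read OK"
    else
      echo "$host: read FAILED"
    fi
  done
}
```

For this cluster that would be `compare_roles 10.168.36.58 10.168.38.174 10.168.39.38` followed by `check_reads s3://vertica-test-dev-data/test/metadata/TEST/cluster_config.json 10.168.36.58 10.168.38.174 10.168.39.38`. In this thread, CLI-based checks passed on every node while revive_db still failed, so a clean result here does not rule out a credential-retrieval problem inside Vertica itself.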
I see this error in the adminTools.log file:
2022-07-04 17:22:48.829 admintools/117044:0x7fba1bb9b740 [DBReviveHelper.downloadFile] Failing or not connected host 10.168.36.58: status=Failure host=10.168.36.58 content={"returncode": 1, "stdout": "", "stderr": "Initialization thread logged exception:\nCould not copy file [s3://vertica-test-dev-data/test/metadata/TEST/cluster_config.json] to [/tmp/desc.json]: Access Denied\nExiting process ( exit(1) ).", "runner_ack": true} error_message=None
All nodes have the same role attached and can access the S3 file.
Key pairs are not listed in the config file. I see the following parameters in admintools.conf:
[BootstrapParameters]
awsendpoint = null
awsregion = null
Other files to check are in the catalog directory on the same node as admintools, such as vertica.log, startup.log, and bootstrap-catalog.log.
One of these should show the root cause, such as a credential failure or a local filesystem access problem.
I can't locate the log files mentioned above. I found vertica-download-file.log, which contains the following:
2022-07-04 17:45:02.682 INFO New log
2022-07-04 17:45:02.682 Main Thread:0x7fcd3af38280 [Init] Log /opt/vertica/log/vertica-download-file.log opened; #1
2022-07-04 17:45:02.682 Main Thread:0x7fcd3af38280 [Init] Processing command line: /opt/vertica/bin/vertica-download-file --source-file s3://vertica-test-dev-data/test/metadata/TEST/cluster_config_prev.json --destination-file /tmp/desc.json --logdir /opt/vertica/log
2022-07-04 17:45:02.682 Main Thread:0x7fcd3af38280 [Init] Starting up Vertica Analytic Database v11.0.2-3
2022-07-04 17:45:02.682 Main Thread:0x7fcd3af38280 [Init] Project Codename: Jackhammer
2022-07-04 17:45:02.682 Main Thread:0x7fcd3af38280 [Init] vertica(v11.0.2-3) built by @re-docker4 from releases/VER_11_0_RELEASE_BUILD_2_3_20220120@6b72cf4302c1929c7bc342021a845dcca1ed7005 on 'Thu Jan 20 17:55:36 2022' $BuildId$
2022-07-04 17:45:02.682 Main Thread:0x7fcd3af38280 [Init] CPU architecture: x86_64
2022-07-04 17:45:02.682 Main Thread:0x7fcd3af38280 [Init] 64-bit Optimized Build
2022-07-04 17:45:02.682 Main Thread:0x7fcd3af38280 [Init] Compiler Version: 7.3.1 20180303 (Red Hat 7.3.1-5)
2022-07-04 17:45:02.682 Main Thread:0x7fcd3af38280 [Init] LD_LIBRARY_PATH=/opt/vertica/lib:/opt/vertica/lib
2022-07-04 17:45:02.682 Main Thread:0x7fcd3af38280 [Init] LD_PRELOAD=
2022-07-04 17:45:02.683 Main Thread:0x7fcd3af38280 @[initializing]: 00000/5081: Total swap memory used: 0
2022-07-04 17:45:02.683 Main Thread:0x7fcd3af38280 @[initializing]: 00000/4435: Process size resident set: 47468544
2022-07-04 17:45:02.683 Main Thread:0x7fcd3af38280 @[initializing]: 00000/5075: Total Memory free + cache: 132048199680
2022-07-04 17:45:02.683 Main Thread:0x7fcd3af38280 [Main] Trying to reach http://169.254.169.254 to infer AWS region.
2022-07-04 17:45:02.684 Main Thread:0x7fcd3af38280 [Basics] Attempting to copy file [s3://vertica-test-dev-data/test/metadata/TEST/cluster_config_prev.json] -> [/tmp/desc.json]
2022-07-04 17:45:02.773 Main Thread:0x7fcd3af38280 [SAL] Shutting down HadoopFS watchdog
2022-07-04 17:45:02.773 Main Thread:0x7fcd3af38280 [SAL] Attempted to add an shutdown without an initialized Watchdog
2022-07-04 17:45:02.773 Main Thread:0x7fcd3af38280 [SAL] Unmounting file system 0(Default Linux File System).
2022-07-04 17:45:02.773 Main Thread:0x7fcd3af38280 [SAL] Unmounting file system 1(Hadoop File System).
2022-07-04 17:45:02.773 Main Thread:0x7fcd3af38280 [SAL] Unmounting file system 2(Libhdfs++ File System).
2022-07-04 17:45:02.773 Main Thread:0x7fcd3af38280 [SAL] Unmounting file system 3(S3 File System).
2022-07-04 17:45:02.774 Main Thread:0x7fcd3af38280 [SAL] Unmounting file system 4(Google Cloud Storage).
2022-07-04 17:45:02.774 Main Thread:0x7fcd3af38280 [SAL] Unmounting file system 5(AzureBlob File System).
2022-07-04 17:45:02.774 Main Thread:0x7fcd3af38280 [SAL] Unmounting file system 6(Vertica File System).
Here are all the files under the Vertica log directory (/opt/vertica/log/):
[dbadmin@ 01 ~]$ ls -l /opt/vertica/log
total 712
-rw-r--r-- 1 dbadmin dbadmin 0 Jul 4 15:44 adminTools.errors
-rw-r--r-- 1 dbadmin dbadmin 310473 Jul 5 17:16 adminTools.log
-rw-r--r-- 1 dbadmin dbadmin 0 Jul 4 15:44 agent.log
-rw-r--r-- 1 dbadmin dbadmin 6 Jul 4 15:44 agent.pid
-rw-r--r-- 1 dbadmin dbadmin 1716 Jul 4 15:44 agentStdMsg.log
-rw-r--r-- 1 dbadmin dbadmin 0 Jul 4 15:44 agent_dbadmin.err
-rw-r--r-- 1 dbadmin dbadmin 0 Jul 4 15:44 agent_dbadmin.log
drwxr-xr-x 2 root root 100 Jul 4 15:43 all-local-verify-20220704_154347
-rw-r--r-- 1 dbadmin dbadmin 921 Jul 5 03:05 do_logrotate.log
-rw-r--r-- 1 root root 359046 Jul 4 15:44 install.log
drwxr-xr-x 5 root root 73 Jul 4 15:43 local-coerce-20220704_154344
-rw-r--r-- 1 dbadmin dbadmin 224 Jul 5 03:05 logrotate.state
lrwxrwxrwx 1 root root 46 Jul 4 15:43 verify-latest.xml -> local-coerce-20220704_154344/iter03/verify.xml
-rw-rw-r-- 1 dbadmin dbadmin 36888 Jul 4 17:45 vertica-download-file.log
[dbadmin@ 01 ~]$
Could you check your AWS credentials with "aws configure list"? Don't post any of the output, but please check: is the region correct? Is an access_key and/or secret_key listed?
If an access_key and/or secret_key is listed, the same credentials must also be passed to admintools when reviving the Vertica DB, using the -x switch with a parameter file as shown in the "Creating a Parameter File" section at https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/Eon/RevivingAnEonDatabase.htm
No, we didn't set up AWS configuration on any of the nodes, and we don't want to use access and secret keys for security reasons. We set up an AWS IAM role with the appropriate permissions to access communal storage (we tested these permissions on each node and were able to access objects in communal storage).
Depending on your environment, run one of the following admintools command lines:
AWS:
$ admintools -t revive_db \
--communal-storage-location=s3://communal_store_path \
-s host1,... -d database_name
Based on the statement above, which I found on the same link you provided, I should be able to revive the database on AWS without the -x switch.
Do you have access to the IAM policy? I checked our CloudFormation template (CFT), and it appears that Vertica expects the following S3 permissions:
"s3:ListBucket",
"s3:GetObject",
"s3:GetBucketLocation",
"s3:GetObjectAcl",
"s3:PutObject",
"s3:PutObjectAcl"
It might be possible to copy from the command line without all of these, but Vertica probably also verifies one or more of these permissions through the AWS API before or during the copy.
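If you can read the IAM policy, one way to check the permissions listed above without touching Vertica is the AWS CLI's policy simulator. The sketch below assumes the caller has iam:SimulatePrincipalPolicy; the role ARN is a placeholder you supply, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: ask IAM whether a role allows the S3 actions listed above on the
# communal storage location. The role ARN is a placeholder; the caller needs
# iam:SimulatePrincipalPolicy.

# Turn s3://bucket/prefix/ into the bucket ARN and an object ARN pattern.
s3_uri_to_arns() {
  local rest="${1#s3://}"
  local bucket="${rest%%/*}" prefix=""
  if [[ "$rest" == */* ]]; then
    prefix="${rest#*/}"
  fi
  echo "arn:aws:s3:::${bucket}"
  echo "arn:aws:s3:::${bucket}/${prefix}*"
}

# Print action, resource and decision (allowed / implicitDeny / explicitDeny).
simulate_vertica_s3() {
  local role_arn="$1" uri="$2"
  local -a resources
  mapfile -t resources < <(s3_uri_to_arns "$uri")
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names s3:ListBucket s3:GetObject s3:GetBucketLocation \
      s3:GetObjectAcl s3:PutObject s3:PutObjectAcl \
    --resource-arns "${resources[@]}" \
    --query 'EvaluationResults[].[EvalActionName,EvalResourceName,EvalDecision]' \
    --output text
}
```

Example: `simulate_vertica_s3 arn:aws:iam::123456789012:role/YOUR_ROLE s3://vertica-test-dev-data/test/`. This evaluates the role's identity-based policies; a bucket policy can still deny access even when every line here reports allowed.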
Yes, we grant full S3 permissions ("s3:*").
The last thing I can think of is a local issue. Verify the permissions on /tmp (e.g. with ls -altr), and also check the permissions on /opt/vertica and on the catalog folder shown in /opt/vertica/config/admintools.conf. It's possible that the real issue is that the JSON file is downloaded to /tmp but then can't be copied to /opt/vertica/config or to the catalog folder.
If there is no local permissions issue, I recommend opening a support case if possible.
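The local checks suggested above can be scripted roughly as follows. This is a sketch to run as dbadmin on each node; the catalog path must be taken from your own admintools.conf, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: show ownership/mode of the directories mentioned above and test
# whether the current user (run it as dbadmin) can create a file in each.

# Return 0 if a temporary file can be created and removed in directory $1.
can_write_dir() {
  local probe
  probe="$(mktemp -p "$1" .vertica_probe.XXXXXX 2>/dev/null)" || return 1
  rm -f -- "$probe"
}

# List each directory and report writability for the current user.
check_local_paths() {
  local dir
  for dir in "$@"; do
    ls -ld -- "$dir"
    if can_write_dir "$dir"; then
      echo "  writable by $(id -un)"
    else
      echo "  NOT writable by $(id -un)"
    fi
  done
  # A leftover /tmp/desc.json owned by another user could also block the download.
  if [[ -e /tmp/desc.json ]]; then
    ls -l /tmp/desc.json
  fi
}
```

Example: `check_local_paths /tmp /opt/vertica /opt/vertica/config /path/to/catalog`, replacing the last argument with the catalog folder listed in admintools.conf.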
Thanks, Bryan, for all your help. We opened a support case and were able to resolve the issue. The problem was with the EC2 instance metadata service: we were using IMDSv2, which was not yet supported. After switching back to the previous version, IMDSv1, we were able to connect to S3.
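To check which metadata-service versions a node answers (the log above shows vertica-download-file contacting http://169.254.169.254), a sketch like the following can be run on each node. The helper names and instance ID are hypothetical. Setting HttpTokens to optional accepts both IMDSv1 and IMDSv2 requests but gives up the session-token protection of IMDSv2, so it may be worth checking whether a later Vertica release supports IMDSv2 before settling on this.

```bash
#!/usr/bin/env bash
# Sketch: detect whether this EC2 instance answers IMDSv1, IMDSv2 only, or
# neither. Run on each Vertica node.

IMDS="http://169.254.169.254"

# IMDSv1: a plain GET. With HttpTokens=required IMDS answers 401, so curl -f fails.
imds_v1_ok() {
  curl -s -f -m 2 -o /dev/null "$IMDS/latest/meta-data/"
}

# IMDSv2: PUT for a session token, then GET with the token header.
imds_v2_ok() {
  local token
  token="$(curl -s -f -m 2 -X PUT "$IMDS/latest/api/token" \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 60')" || return 1
  curl -s -f -m 2 -o /dev/null \
    -H "X-aws-ec2-metadata-token: $token" "$IMDS/latest/meta-data/"
}

# Map the two exit statuses ($1 = v1, $2 = v2) to a one-line verdict.
imds_verdict() {
  if [[ "$1" -eq 0 ]]; then
    echo "IMDSv1 available"
  elif [[ "$2" -eq 0 ]]; then
    echo "IMDSv2 only (HttpTokens=required)"
  else
    echo "metadata service unreachable"
  fi
}

# Probe both versions and print the verdict.
report_imds() {
  local v1=0 v2=0
  imds_v1_ok || v1=$?
  imds_v2_ok || v2=$?
  imds_verdict "$v1" "$v2"
}

# Re-allow IMDSv1 (HttpTokens=optional accepts v1 and v2 requests).
# Needs ec2:ModifyInstanceMetadataOptions; pass your own instance ID.
allow_imdsv1() {
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" --http-tokens optional --http-endpoint enabled
}
```

Run `report_imds` on every node; if any node prints "IMDSv2 only", `allow_imdsv1 i-0123456789abcdef0` (placeholder ID) from a machine with the required EC2 permission switches that instance back to accepting IMDSv1.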