COPY FROM S3 - Proxy Parameters
I am copying data from an S3-compatible service (MinIO) successfully when both Vertica and MinIO are hosted within the corporate network.
I am using information from this documentation page - https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/Eon/LoadingDataFromS3.htm
I now need to host MinIO outside the corporate environment while Vertica remains inside it.
In this setup, Vertica tries to connect for some time and then gives up. This looks like a proxy issue. I did not find any proxy-related parameters in the list of parameters described at https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/AdministratorsGuide/ConfiguringTheDB/S3Parameters.htm
Can someone give me a clue about this?
Many Thanks,
Sandeep.
Answers
Does Vertica return a specific error message after apparently timing out? Could you also check vertica.log on the initiator node, check for the transaction of the COPY command and send any messages relating to that transaction, even debug/info level?
Hello Bryan,
I am still waiting to see a final error. In the meantime, there are many messages like the ones below:
2020-07-21 20:52:17.329 Init Session:0x7f0c4f5f5700-a00000010fcf5d [SAL] [S3UDFS] [RETRY] [Attempt 18/50] [Canceled? N] [Should retry? Y] [Operation virtual void SAL::S3FileSystem::glob(const string&, std::vector<std::pair<std::basic_string, stat> >&, SAL::GlobPruner&, const SAL::GlobContext&) const] (99) : 'Unable to connect to endpoint'
2020-07-21 20:52:17.329 Init Session:0x7f0c4f5f5700-a00000010fcf5d [SAL] [S3UDFS] [RETRY] [Attempt 18/50] [Delay 3927 msec]
2020-07-21 20:52:31.258 Init Session:0x7f0c4f5f5700-a00000010fcf5d [SAL] [S3UDFS] [RETRY] [Attempt 19/50] [Canceled? N] [Should retry? Y] [Operation virtual void SAL::S3FileSystem::glob(const string&, std::vector<std::pair<std::basic_string, stat> >&, SAL::GlobPruner&, const SAL::GlobContext&) const] (99) : 'Unable to connect to endpoint'
2020-07-21 20:52:31.258 Init Session:0x7f0c4f5f5700-a00000010fcf5d [SAL] [S3UDFS] [RETRY] [Attempt 19/50] [Delay 58349 msec]
2020-07-21 20:53:39.609 Init Session:0x7f0c4f5f5700-a00000010fcf5d [SAL] [S3UDFS] [RETRY] [Attempt 20/50] [Canceled? N] [Should retry? Y] [Operation virtual void SAL::S3FileSystem::glob(const string&, std::vector<std::pair<std::basic_string, stat> >&, SAL::GlobPruner&, const SAL::GlobContext&) const] (99) : 'Unable to connect to endpoint'
2020-07-21 20:53:39.609 Init Session:0x7f0c4f5f5700-a00000010fcf5d [SAL] [S3UDFS] [RETRY] [Attempt 20/50] [Delay 30754 msec]
I also turned on debugging through 'set_debug_log' for UDX and also through 'AWSLogLevel' config parameter. However, I am not able to see any more debug messages. I will post once again once I see iteration 50. :-)
Final error message is as follows.
2020-07-21 21:18:02.111 Init Session:0x7f0c4f5f5700-a00000010fcf5d [SAL] [S3UDFS] [RETRY] [Attempt 50/50] [Canceled? N] [Should retry? N] [Operation virtual void SAL::S3FileSystem::glob(const string&, std::vector<std::pair<std::basic_string, stat> >&, SAL::GlobPruner&, const SAL::GlobContext&) const] (99) : 'Unable to connect to endpoint'
2020-07-21 21:18:02.112 Init Session:0x7f0c4f5f5700-a00000010fcf5d @v_vertica_node0001: 22023/7160: Cannot expand glob pattern due to error: Unable to connect to endpoint
LOCATION: expandGlobLocal, /data/qb_workspaces/jenkins2/ReleaseBuilds/Grader/REL-9_2_1-x_grader/build/vertica/Optimizer/Path/BulkLoad.cpp:2431
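For anyone who wants to summarize a long run of these retry messages instead of scrolling through them, here is a minimal Python sketch. It assumes the log lines look exactly like the ones pasted above; the names `parse_retries` and `summarize` are made up for this example and are not part of Vertica.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Retry(NamedTuple):
    attempt: int
    total: int
    delay_ms: Optional[int]  # set only on the "[Delay N msec]" lines
    error: Optional[str]     # set only on the lines that carry the error text


# Patterns are based solely on the log lines shown in this thread.
ATTEMPT_RE = re.compile(r"\[S3UDFS\] \[RETRY\] \[Attempt (\d+)/(\d+)\]")
DELAY_RE = re.compile(r"\[Delay (\d+) msec\]")
ERROR_RE = re.compile(r"\(\d+\) : '([^']*)'\s*$")


def parse_retries(lines: Iterable[str]) -> List[Retry]:
    """Extract attempt number, retry limit, delay and error text from log lines."""
    found = []
    for line in lines:
        attempt = ATTEMPT_RE.search(line)
        if not attempt:
            continue  # not an S3UDFS retry line
        delay = DELAY_RE.search(line)
        error = ERROR_RE.search(line)
        found.append(
            Retry(
                attempt=int(attempt[1]),
                total=int(attempt[2]),
                delay_ms=int(delay[1]) if delay else None,
                error=error[1] if error else None,
            )
        )
    return found


def summarize(retries: List[Retry]) -> dict:
    """Collapse parsed retry lines into a few headline numbers."""
    return {
        "last_attempt": max((r.attempt for r in retries), default=0),
        "max_attempts": max((r.total for r in retries), default=0),
        "total_delay_ms": sum(r.delay_ms or 0 for r in retries),
        "errors": sorted({r.error for r in retries if r.error}),
    }
```

To use it, open vertica.log on the initiator node (wherever it lives on your system) and pass the file object to `parse_retries`, then hand the result to `summarize`.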
Are you able to connect to your out-of-corporate MinIO from within your Vertica nodes but using the MinIO client instead?
Yes, I am able to access it using the MinIO client. I have set up the HTTP_PROXY and HTTPS_PROXY environment variables system-wide, and this lets the MinIO client reach MinIO outside of the corporate network.
Please try setting HTTP_PROXY on all Vertica nodes. I am not 100% sure whether it applies to S3, but the binary does check the variable in some places.
An alternative would be to set up an SSH tunnel to MinIO, but try the above first and see if it works.
Wait, sorry, I missed the previous comment. You may need to set "http_proxy" in lower case, since that's what is compiled in. You may need to set this in the user profile as well. Also, it is necessary to restart Vertica after changing environment variables, since the environment is set when the server starts.
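If it is unclear which proxy variables the running server actually started with, one way to check on Linux is to read the process's initial environment from /proc. The Python sketch below is only an illustration under that assumption: the function names are made up for this example, the proxy URL in the usage note is hypothetical, and reading /proc/<pid>/environ requires running as the same user as the process (or root).

```python
from typing import Dict

# Both spellings are listed because tools differ in which case they read.
PROXY_NAMES = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Decode a /proc/<pid>/environ blob: NUL-separated NAME=value entries."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip empty trailing entry or malformed data
        name, _, value = entry.partition(b"=")
        env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def proxy_settings(env: Dict[str, str]) -> Dict[str, str]:
    """Return only the proxy-related variables, keeping their exact case."""
    return {name: env[name] for name in PROXY_NAMES if name in env}


def read_process_environ(pid: int) -> Dict[str, str]:
    """Linux only: the environment the process was started with."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read())
```

For example, `proxy_settings(read_process_environ(pid))` with the PID of the Vertica server process would return something like `{"HTTP_PROXY": "http://proxy.example:3128"}` if only the upper-case variable was present at startup, which would suggest the lower-case one still needs to be set before the restart.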
Hi Bryan, I do have http_proxy set in my environment. However, it was not set in the Vertica user profile, so I will try that. I am using a single-node Vertica. I will keep you posted.
One other thing to try if it doesn't seem to work: S3FS is handled by the Storage Abstraction Layer, or "SAL", so you can turn on debug for that component with
Hi Bryan, setting http_proxy in the /etc/profile file and restarting Vertica did the trick. I have a different issue now, but there is communication between my Vertica node and the MinIO instance I set up outside of my corporate network.
Thank you very kindly for the help.