Upgrade to 9.0.1 - difficulties transitioning from "SOURCE HDFS(...)" to direct HDFS

KKirkpatrick Registered User

With support for the HDFS Connector dropped in 9.0.1, I need to convert many HDFS COPY scripts to use the direct HDFS syntax. I have done this successfully (and tested it), but it only works if I configure a single Hadoop cluster.

That is, I have an hdfs-site.xml and a core-site.xml file in the directory /etc/hadoop/conf/ENV1 (copied to all nodes), and set:
ALTER DATABASE srvvertica SET HadoopConfDir = '/etc/hadoop/conf/ENV1';

There is no problem: SELECT VERIFY_HADOOP_CONF_DIR (); returns no errors, and COPY table FROM 'hdfs:///file.dat' works perfectly.

Our database pulls from multiple Hadoop clusters. This seems to be supported, and is documented here: https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/HadoopIntegrationGuide/libhdfs/ConfiguringAccessToHDFS.htm?TocPath=Integrating%20with%20Apache%20Hadoop|Reading%20Directly%20from%20HDFS|_____1
under the sub-section "Using More Than One Hadoop Cluster".

However, when I try to configure both clusters by setting:
ALTER DATABASE srvvertica SET HadoopConfDir = '/etc/hadoop/conf/ENV1:/etc/hadoop/conf/ENV2';

I get the following validation error for each node:
SELECT CLEAR_HDFS_CACHES();
Cleared
SELECT VERIFY_HADOOP_CONF_DIR ();
v_node0001: Configuration at [/etc/hadoop/conf/ENV2] declares defaultFS but it was already declared in the configuration at [/etc/hadoop/conf/ENV1]

If I remove the defaultFS from ENV2 (which didn't seem right, but was worth a shot), I get the opposite error:
SELECT CLEAR_HDFS_CACHES();
Cleared
v_node0001: No fs.defaultFS parameter found in config files in [/etc/hadoop/conf/ENV2]

I should note that the problem is not with ENV2: if I switch back to a single-cluster configuration that points to ENV2, that also works.

Comments

  • Jim_Knicely Employee, Registered User, VerticaExpert

    When Vertica loads HDFS configurations, it warns when it detects any of the following:

    1. Multiple declarations of fs.defaultFS in different configurations.
    2. Multiple declarations of the same nameservice in different configurations.
    3. A directory path that contains an empty or invalid configuration.

    I think you hit issue 1 first, and then issue 3.
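To look for these conditions before running VERIFY_HADOOP_CONF_DIR(), here is a minimal Python sketch that approximates the three checks listed above for a colon-separated HadoopConfDir value. It is a hypothetical helper: check_conf_dirs and read_properties are not part of Vertica, it assumes each directory holds core-site.xml and/or hdfs-site.xml, and Vertica's own validation may differ. It relies only on the standard Hadoop property names fs.defaultFS and dfs.nameservices.

```python
import os
import xml.etree.ElementTree as ET


def read_properties(path):
    """Return {name: value} for every <property> in a Hadoop *-site.xml file."""
    props = {}
    root = ET.parse(path).getroot()
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def check_conf_dirs(hadoop_conf_dir):
    """Approximate the three checks for a colon-separated HadoopConfDir value.

    Hypothetical helper, not Vertica's implementation. Returns a list of
    human-readable problems; an empty list means none were detected.
    """
    problems = []
    default_fs_owner = None  # first directory that declared fs.defaultFS
    nameservice_owner = {}   # nameservice -> directory that declared it
    for conf_dir in hadoop_conf_dir.split(":"):
        if not conf_dir:
            continue  # ignore empty segments such as a trailing colon
        props = {}
        for fname in ("core-site.xml", "hdfs-site.xml"):
            path = os.path.join(conf_dir, fname)
            if os.path.isfile(path):
                try:
                    props.update(read_properties(path))
                except ET.ParseError:
                    problems.append(f"[{conf_dir}] {fname} is not valid XML")
        # Check 3: empty or invalid configuration directory.
        if not props:
            problems.append(f"[{conf_dir}] contains no readable configuration (empty or invalid)")
            continue
        # Check 1: fs.defaultFS declared in more than one directory.
        if "fs.defaultFS" in props:
            if default_fs_owner is None:
                default_fs_owner = conf_dir
            else:
                problems.append(
                    f"[{conf_dir}] declares fs.defaultFS but it was already "
                    f"declared in [{default_fs_owner}]"
                )
        # Check 2: the same nameservice declared in more than one directory.
        for ns in props.get("dfs.nameservices", "").split(","):
            ns = ns.strip()
            if not ns:
                continue
            owner = nameservice_owner.setdefault(ns, conf_dir)
            if owner != conf_dir:
                problems.append(
                    f"[{conf_dir}] declares nameservice {ns} but it was already "
                    f"declared in [{owner}]"
                )
    return problems
```

For example, calling check_conf_dirs('/etc/hadoop/conf/ENV1:/etc/hadoop/conf/ENV2') against the setup in the question should report the duplicate fs.defaultFS declaration, which lines up with the first error shown above.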
