Upgrade to 9.0.1 - difficulties transitioning from "SOURCE HDFS(...)" to direct HDFS

KKirkpatrick Registered User

With support for the HDFS Connector dropped in 9.0.1, I need to convert many HDFS COPY scripts to the direct HDFS syntax. I have done this successfully (and tested it), but it only works if I configure a single Hadoop cluster.

That is, I have hdfs-site.xml and core-site.xml files in the directory /etc/hadoop/conf/ENV1 (copied to all nodes), and set:
ALTER DATABASE srvvertica SET HadoopConfDir = '/etc/hadoop/conf/ENV1';

There is no problem: SELECT VERIFY_HADOOP_CONF_DIR(); returns no errors, and COPY table FROM 'hdfs:///file.dat' works perfectly.
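For reference, the working single-cluster sequence from this post can be written as one script. `my_table` and the file path are placeholders; `CLEAR_HDFS_CACHES()` clears cached HDFS configuration so the new HadoopConfDir setting is re-read before validation:

```sql
-- Point Vertica at one cluster's configuration directory
ALTER DATABASE srvvertica SET HadoopConfDir = '/etc/hadoop/conf/ENV1';

-- Clear cached HDFS configuration, then validate the directory on every node
SELECT CLEAR_HDFS_CACHES();
SELECT VERIFY_HADOOP_CONF_DIR();

-- hdfs:/// resolves against fs.defaultFS from ENV1's configuration
COPY my_table FROM 'hdfs:///file.dat';
```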

Our database pulls from multiple Hadoop clusters. This appears to be supported and is documented here: https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/HadoopIntegrationGuide/libhdfs/ConfiguringAccessToHDFS.htm?TocPath=Integrating%20with%20Apache%20Hadoop|Reading%20Directly%20from%20HDFS|_____1
under the sub-section "Using More Than One Hadoop Cluster".

However, when I try to configure both clusters by setting:
ALTER DATABASE srvvertica SET HadoopConfDir = '/etc/hadoop/conf/ENV1:/etc/hadoop/conf/ENV2';

I get the following validation error for each node:
SELECT CLEAR_HDFS_CACHES();
Cleared
SELECT VERIFY_HADOOP_CONF_DIR ();
v_node0001: Configuration at [/etc/hadoop/conf/ENV2] declares defaultFS but it was already declared in the configuration at [/etc/hadoop/conf/ENV1]

If I remove the defaultFS setting from ENV2 (which doesn't make sense, but it was worth a shot), I get the opposite error:
SELECT CLEAR_HDFS_CACHES();
Cleared
v_node0001: No fs.defaultFS parameter found in config files in [/etc/hadoop/conf/ENV2]

I should note that the problem is not with ENV2: if I switch back to a single-cluster configuration that points to ENV2, that also works.

Comments

  • Jim_Knicely Administrator, Moderator, Employee, Registered User, VerticaExpert

    When Vertica loads HDFS configurations, it warns if it detects any of the following:

    1. Multiple declarations of fs.defaultFS in different configurations.
    2. Multiple declarations of the same nameservice in different configurations.
    3. A directory path that contains an empty or invalid configuration.

    I think you hit issue 1 first, and then issue 3.

  • MSQ Registered User
    edited October 18

    Hi,
    @Jim_Knicely, when you said Vertica warns about multiple declarations of defaultFS in different configurations, does that mean Vertica does not really support using more than one Hadoop cluster (contrary to what is documented in the guide)?

    I am having the same issue as KKirkpatrick. I am trying to configure Vertica 9.1.03 to access two different Hadoop clusters. I followed the steps in the "Using More Than One Hadoop Cluster" section of this guide: https://www.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/HadoopIntegrationGuide/libhdfs/ConfiguringAccessToHDFS.htm?TocPath=Integrating%20with%20Apache%20Hadoop|Using%20HDFS%C2%A0URLs|_____2

    My HadoopConfDir parameter is set as follows:
    ALTER DATABASE PDW SET HadoopConfDir = '/opt/app/vertica/config/hadoop/conf/cpp_ft:/opt/app/vertica/config/hadoop/conf/cpp_dr2';

    I get the following validation failure:
    SELECT VERIFY_HADOOP_CONF_DIR();
    Validation Failure
    v_pdw_node0001: Configuration at [/opt/app/vertica/config/hadoop/conf/cpp_dr2] declares defaultFS but it was already declared in the configuration at [/opt/app/vertica/config/hadoop/conf/cpp_ft]
    v_pdw_node0002: Configuration at [/opt/app/vertica/config/hadoop/conf/cpp_dr2] declares defaultFS but it was already declared in the configuration at [/opt/app/vertica/config/hadoop/conf/cpp_ft]
    v_pdw_node0003: Configuration at [/opt/app/vertica/config/hadoop/conf/cpp_dr2] declares defaultFS but it was already declared in the configuration at [/opt/app/vertica/config/hadoop/conf/cpp_ft]

    Note that the cpp_ft and cpp_dr2 directories contain correct core-site.xml and hdfs-site.xml files.

    Appreciate your help.

    Thank you!

  • SruthiA Employee, Registered User, VerticaExpert

    @MSQ: What you are seeing is expected, and you can ignore the warning. It appears only when Vertica loads the Hadoop configuration path, not on every query. There is one thing to take care of, though: don't use hdfs:/// in your COPY statements, because with more than one cluster configured it is ambiguous. Always specify the NameNode in the HDFS URL of your COPY statement.
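    A minimal illustration of the difference, assuming a hypothetical NameNode host `namenode1.example.com` listening on the common RPC port 8020 (substitute the host and port from your own cluster's configuration; `my_table` is a placeholder):

```sql
-- Ambiguous when HadoopConfDir lists several clusters: hdfs:/// depends on
-- fs.defaultFS, which more than one configuration directory declares.
-- COPY my_table FROM 'hdfs:///data/file.dat';

-- Explicit NameNode (hypothetical host and port): does not rely on fs.defaultFS
COPY my_table FROM 'hdfs://namenode1.example.com:8020/data/file.dat';
```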

  • Sharon_Cutter Registered User

    So you should always specify namenode in your hdfs URL of the copy statement.

    Or, better, the nameservice.
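    A sketch of loading from both clusters by nameservice, reusing MSQ's HadoopConfDir setting. `ftcluster` and `drcluster` are hypothetical stand-ins for the nameservice IDs declared under `dfs.nameservices` in each cluster's hdfs-site.xml; per Jim's point 2, each ID must be distinct across the configuration directories. In an HA setup, addressing the nameservice rather than a single NameNode host is also what typically lets the URL keep working after a NameNode failover. `my_table` and `my_other_table` are placeholders:

```sql
-- Both configuration directories listed, as in the post above
ALTER DATABASE PDW SET HadoopConfDir = '/opt/app/vertica/config/hadoop/conf/cpp_ft:/opt/app/vertica/config/hadoop/conf/cpp_dr2';

-- Address each cluster by its (hypothetical) nameservice ID instead of hdfs:///
COPY my_table FROM 'hdfs://ftcluster/data/file.dat';
COPY my_other_table FROM 'hdfs://drcluster/data/file.dat';
```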
