Not Able to access HDFS table from Vertica

Hi,
I have created text tables in HDFS, but I am not able to access them from a Vertica external table.

admin=> select count(*),APPLICATION_ID from HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT group by 2 order by 2;
ERROR 2886: Could not open file [hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text_test/IPBILL_RAW_TRANSFORMED.txt] for reading; No such file or directory
HINT: Make sure the file is available on the specified node. If using ON ANY NODE, it must be available at this path on all nodes in the cluster since any node could load the file

However, I am able to access Parquet/ORC tables from HDFS with the same IP address and port.

Is there anything I missed? Why am I able to access the Parquet/ORC tables but not the text table through a Vertica external table?

Thanks & Regards,
Akshath Hegde

Comments

  • So I shall ask the obvious question first. Please post the definition of your external table and the output of hdfs dfs -ls. Have you followed the necessary steps for Vertica to be able to talk to HDFS?

  • Yes, I have followed the steps. I have copied all the necessary configuration files to the Vertica nodes and executed the steps mentioned in the documentation.
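
    For reference, the vsql side of that setup is roughly the sketch below (the configuration directory shown is a placeholder for the path used in our environment):

    SELECT SET_CONFIG_PARAMETER('HadoopConfDir', '/etc/hadoop/conf');  -- point Vertica at the copied Hadoop client configs
    SELECT * FROM CONFIGURATION_PARAMETERS WHERE parameter_name = 'HadoopConfDir';  -- confirm the value is set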

    I am able to access the Parquet/ORC table from Vertica, but not the text file:
    ORC file path:

    -bash-4.1$ hadoop fs -ls /user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_orc
    Found 1 items
    -rwxrwxrwt 3 hdfs hive 16704106 2017-10-16 20:33 /user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_orc/000000_0

    Text file Path:

    -bash-4.1$ hadoop fs -ls /user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text
    Found 1 items
    -rwxrwxrwt   3 hdfs hive  180182335 2017-10-16 20:30 /user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/000000_0
    

    Table creation as follows:

    Text table
    CREATE EXTERNAL TABLE HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT(SEQUENCE_NO int,....,TERMINATING_TIME timestamp,FIELD_ID int,INFO_PARAMETER varchar(150)) AS COPY FROM 'hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/*' delimiter ',';
    
    ORC Table
    CREATE EXTERNAL TABLE HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_ORC(SEQUENCE_NO int,....,TERMINATING_TIME timestamp,FIELD_ID int,INFO_PARAMETER varchar(150)) AS COPY FROM 'hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_orc/*' orc;
    

    Please suggest where I am going wrong.

  • From the hdfs dfs -ls command output:

    /user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text
    /user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/000000_0

    The error from Vertica posted above references:
    /user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text_test/IPBILL_RAW_TRANSFORMED.txt

    And the path from your table DDL:

    hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/*

    Which one should I go with?

  • Hi,
    Please ignore the earlier post; I am pasting all the information here.
    Following are the commands I used:

    Hadoop File listing:

    -bash-4.1$ hdfs dfs -ls /user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/*
    -rwxrwxrwt   3 hdfs hive  180182335 2017-10-16 20:30 /user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/000000_0
    

    Created Table:
    drop table if exists HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT;

    CREATE EXTERNAL TABLE HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT(SEQUENCE_NO int,....,TERMINATING_TIME timestamp,FIELD_ID int,INFO_PARAMETER varchar(150)) AS COPY FROM 'hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/*' delimiter ',';
    

    Querying table:

    tdaadmin=> select count(*),APPLICATION_ID from HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT group by 2 order by 2;
    ERROR 2886:  Could not open file [hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/000000_0] for reading; No such file or directory
    HINT:  Make sure the file is available on the specified node.  If using ON ANY NODE, it must be available at this path on all nodes in the cluster since any node could load the file
    

    We are not able to query anything from the Vertica external table, even though the data is available in the table.

    The data available in HDFS is fine, and I am able to run the same query from HiveQL:

    select count(*),APPLICATION_ID from WTDADEV.WTDA_IP_BILL_TRANSFORMED_TEXT group by APPLICATION_ID order by APPLICATION_ID;
    1275 NULL
    946822 1
    16351 2
    28 102
    Time taken: 70.786 seconds, Fetched: 9 row(s)

    Please suggest what's wrong here.

  • Jim_Knicely - Administrator

    Hi,

    I think the file format is ORC.

    Can you try:

    CREATE EXTERNAL TABLE HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT(SEQUENCE_NO int,....,TERMINATING_TIME timestamp,FIELD_ID int,INFO_PARAMETER varchar(150)) AS COPY FROM 'hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/*' ON ANY NODE ORC;

  • @Jim, I believe you are right. The file was saved as ORC, but he is trying to read it as a text file.

  • Hi,
    It's a TEXT table from Hive; it's not ORC format.

    Here is table creation from hive:

    create table WTDADEV.WTDA_IP_BILL_TRANSFORMED_TEXT (
    SEQUENCE_NO int,
    .
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
    STORED AS textfile;

    The same table was loaded directly from another text table, which is why hdfs dfs -ls shows the file name 000000_0.
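
    For context, the load was along the lines of this sketch (the source table name is only a placeholder):

    -- Hive writes its output as files named 000000_0, 000001_0, ... in the table directory
    INSERT INTO TABLE WTDADEV.WTDA_IP_BILL_TRANSFORMED_TEXT
    SELECT * FROM WTDADEV.SOME_SOURCE_TEXT_TABLE;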

    I am able to query the text file from Hive, but not able to query the same from Vertica.

    I am 100% sure that the file stored in HDFS is a text file. The content of the file is as follows:

    -bash-4.1$ll 000000_0
    ls: cannot access 000000_0: No such file or directory
    -bash-4.1$
    -bash-4.1$hdfs dfs -copyToLocal /user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/* .
    -bash-4.1$
    -bash-4.1$ll 000000_0
    -rw-r--r-- 1 hdfs hdfs 180182335 Nov 2 16:48 000000_0
    -bash-4.1$
    -bash-4.1$
    -bash-4.1$ cat 000000_0 | head -3
    2700001,1,2015-11-09 00:00:02,\N,1,23054738408,2482632189,2482632189,36,2,2000000000000285,2000000000000315,37,1111,0,0,0,0,0,16,0,0,0,0,0,0,0,0,0,0,1,940,2,4,2015-11-09 00:00:38,3080,
    2700002,1,2015-11-09 00:00:03,\N,1,2482628963,2482613793,2482613793,0,2,2000000000000285,2000000000000315,\N,1111,0,0,0,0,0,19,0,0,0,0,0,0,0,0,0,0,1,940,2,4,2015-11-09 00:00:03,2918,
    2700003,1,2015-11-09 00:00:03,\N,2,2482519009,2543456,2482519009,47,2,2000000000000285,2000000000000315,37,954,1,0,0,0,0,16,0,0,0,0,0,0,0,0,0,0,1,1344,2,6,2015-11-09 00:00:50,3629,

    Please suggest where I am going wrong.

  • Let's try this. Push that file to HDFS yourself: copy it from local to HDFS as a .txt file, so that when you do hdfs dfs -ls you get IPBILL_RAW_TRANSFORMED.txt. Create an external Hive table on top of it to verify you can read it. Then have Vertica point to the HDFS path where you placed that file, so your Vertica external table will be what you did originally: "COPY FROM 'hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/*.txt'". Try that and let us know if it works.

  • Hi,
    I have done the same as suggested.
    The file is copied to the HDFS location:

    -bash-4.1$ hdfs dfs -put IPBILL_RAW_TRANSFORMED.txt /user/hdfs/externalTableDir/
    -bash-4.1$
    -bash-4.1$ hdfs dfs -ls /user/hdfs/externalTableDir/
    Found 1 items
    -rw-r--r-- 3 hdfs supergroup 221877145 2017-11-03 16:52 /user/hdfs/externalTableDir/IPBILL_RAW_TRANSFORMED.txt

    The Hive table created on that location is as follows:

    hive>create EXTERNAL table WTDADEV.WTDA_IP_BILL_TRANSFORMED_HIVE_EXT (
    SEQUENCE_NO int,
    .
    .
    INFO_PARAMETER varchar(150)
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/hdfs/externalTableDir/';

    hive>select count(*),APPLICATION_ID from WTDADEV.WTDA_IP_BILL_TRANSFORMED_HIVE_EXT group by APPLICATION_ID order by APPLICATION_ID;
    OK
    1275 NULL
    946822 1
    16351 2
    28 102
    42 103
    1 202
    1 203
    1 244
    35479 284

    Created a Vertica external table on the same location:

    tdaadmin=> drop table HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT;
    DROP TABLE
    tdaadmin=> CREATE EXTERNAL TABLE HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT(SEQUENCE_NO int,...,INFO_PARAMETER varchar(150)) AS COPY FROM 'hdfs:///user/hdfs/externalTableDir/*' delimiter ',';
    CREATE TABLE

    Trying to query the Vertica external table:

    tdaadmin=>
    tdaadmin=> select count(*),APPLICATION_ID from HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT group by APPLICATION_ID order by APPLICATION_ID;
    ERROR 2886: Could not open file [hdfs:///user/hdfs/externalTableDir/IPBILL_RAW_TRANSFORMED.txt] for reading; No such file or directory
    HINT: Make sure the file is available on the specified node. If using ON ANY NODE, it must be available at this path on all nodes in the cluster since any node could load the file

    This gives the same error as before. Can anyone please suggest where I am going wrong?

  • By any chance, is your cluster Kerberized?
    Also, can you please post the output from the Vertica log when you run the query?

  • Jim_Knicely - Administrator
    edited November 2017

    Is "tdaadmin" the Vertica database admin account (i.e. dbadmin)?

    From the doc:
    You must be a database superuser to create external tables, unless you have created a user-accessible storage location to which the COPY refers, see CREATE LOCATION. If external tables exist, you must also be a database superuser to access them through a select statement.

    https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/CREATEEXTERNALTABLEASCOPY.htm
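
    If the query is being run as a non-superuser, a user-accessible storage location along these lines would be needed first (a sketch only; the path and grantee are placeholders):

    CREATE LOCATION 'hdfs:///user/hdfs/externalTableDir' ALL NODES USAGE 'USER';
    GRANT READ ON LOCATION 'hdfs:///user/hdfs/externalTableDir' TO some_user;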

  • By any chance, is your cluster Kerberized?

    No, it's not Kerberized.

    also, can you please post the output from the vertica log when you run the query?

    [tdaadmin@tda3 ~]$ tail -f /opt/vertica/tdadb/tdadb/v_tdadb_node0003_catalog/vertica.log
    2017-11-06 15:26:21.001 TM Moveout:7f9cfc7f2700-c00000005d1fb5 [Txn] <INFO> Begin Txn: c00000005d1fb5 'Moveout: Tuple Mover'
    2017-11-06 15:26:21.001 TM Moveout:7f9cfc7f2700-c00000005d1fb5 [Txn] <INFO> Rollback Txn: c00000005d1fb5 'Moveout: Tuple Mover'
    2017-11-06 15:26:21.001 TM Moveout:7f9cfc7f2700 [Util] <INFO> Task 'TM Moveout' enabled
    2017-11-06 15:26:22.001 DiskSpaceRefresher:7f9cfb3f0700 [Util] <INFO> Task 'DiskSpaceRefresher' enabled
    2017-11-06 15:26:22.445 Init Session:7f9cfbdf1700-c00000005cbfa6 [Session] <INFO> [Query] TX:c00000005cbfa6(v_tdadb_node0003-1450:0x186367) select count(*),APPLICATION_ID from HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT group by APPLICATION_ID order by APPLICATION_ID;
    2017-11-06 15:26:22.486 Init Session:7f9cfbdf1700-c00000005cbfa6 [LocalPlanner] <INFO> finalizeThreadsToUseForLoad(): max_threads_allowed = 4, targetThreads = 1
    2017-11-06 15:26:32.003 DiskSpaceRefresher:7f9cdcf91700 [Util] <INFO> Task 'DiskSpaceRefresher' enabled
    2017-11-06 15:26:39.005 AnalyzeRowCount:7f9bc2eff700 [Util] <INFO> Task 'AnalyzeRowCount' enabled
    2017-11-06 15:26:42.000 DiskSpaceRefresher:7f9bc0e27700 [Util] <INFO> Task 'DiskSpaceRefresher' enabled
    2017-11-06 15:26:52.004 DiskSpaceRefresher:7f9bc0e27700 [Util] <INFO> Task 'DiskSpaceRefresher' enabled
    2017-11-06 15:26:52.517 Spread Client:7f9d964f8700 [Dist] <INFO> Dist::cancelPlan: canceling state EXECUTION COMPILED(2): c00000005d1fb6
    2017-11-06 15:26:52.517 Spread Client:7f9d964f8700 [Dist] <INFO> Dist::cancelPlan: canceled state EXECUTION COMPILED(2): c00000005d1fb6
    2017-11-06 15:26:52.524 Init Session:7f9cfbdf1700-c00000005cbfa6 <ERROR> @v_tdadb_node0001: 58V01/2886: Could not open file [hdfs:///user/hdfs/externalTableDir/IPBILL_RAW_TRANSFORMED.txt] for reading; No such file or directory
            HINT:  Make sure the file is available on the specified node.  If using ON ANY NODE, it must be available at this path on all nodes in the cluster since any node could load the file
            LOCATION:  stat, /scratch_a/release/svrtar11561/vbuild/vertica/Plan/Steps.cpp:5828
    

    Is "tdaadmin" the Vertica database admin account (i.e. dbadmin)?

    Yes, it is the admin user.

    I am able to access HDFS files if I create them in ORC or Parquet format; only text format gives this error when we create the table.

  • This might be a silly question, but have you copied the HDFS config files to all the nodes?
    By default Vertica looks for these files under /etc/hadoop/conf.
    Clear the HDFS caches as well (see the sketch below). Something isn't set up right on your system.
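
    A sketch of how that check and the cache clear might look from vsql (assuming the standard built-in functions):

    SELECT VERIFY_HADOOP_CONF_DIR();  -- reports whether each node can read a valid Hadoop configuration
    SELECT CLEAR_HDFS_CACHES();       -- drops any cached HDFS configuration and connection state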
