Not Able to access HDFS table from Vertica
Hi,
I have created text tables in HDFS, which I am not able to access from a Vertica external table.
admin=> select count(*),APPLICATION_ID from HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT group by 2 order by 2;
ERROR 2886: Could not open file [hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text_test/IPBILL_RAW_TRANSFORMED.txt] for reading; No such file or directory
HINT: Make sure the file is available on the specified node. If using ON ANY NODE, it must be available at this path on all nodes in the cluster since any node could load the file
But I am able to access Parquet/ORC tables from HDFS using the same IP address and port.
Is there anything I missed? Why am I able to access the Parquet/ORC tables but not the text table through a Vertica external table?
Thanks & Regards,
Akshath Hegde
Comments
So I shall post the obvious question first. Please post the definition of your external table and the hdfs dfs -ls output. Have you followed the necessary steps for Vertica to be able to talk to HDFS?
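For reference, a quick sanity check from inside Vertica (assuming the standard HadoopConfDir setup) looks something like this:
-- Show where Vertica expects the copied core-site.xml / hdfs-site.xml files
SELECT parameter_name, current_value FROM configuration_parameters WHERE parameter_name = 'HadoopConfDir';
-- Verify that the configuration files are valid and readable on all nodes
SELECT VERIFY_HADOOP_CONF_DIR();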
Yes, I have followed the steps. I have copied all the necessary configuration files to the Vertica nodes and executed the steps mentioned in the documentation.
And I am able to access the Parquet/ORC tables from Vertica, but I am not able to access the text file:
ORC file path:
Text file path:
Table creation as follows:
Please suggest where I am going wrong.
From hdfs dfs -ls command output:
/user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text
/user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/000000_0
Error from Vertica posted above.
/user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text_test/IPBILL_RAW_TRANSFORMED.txt
Path from your table DDL:
hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/*'
Which one should I go with?
Hi,
Ignore the earlier one; I am pasting all the information here.
The following are the commands I used:
Hadoop File listing:
Created Table:
drop table if exists HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT;
Querying table:
I am not able to query anything from the Vertica external table. The data is available in the table, but we still cannot query it.
The data available in HDFS is correct, and I am able to run the same query from Hive QL:
Please suggest what's wrong here.
Hi,
I think the file format is ORC.
Can you try:
CREATE EXTERNAL TABLE HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT(SEQUENCE_NO int,....,TERMINATING_TIME timestamp,FIELD_ID int,INFO_PARAMETER varchar(150)) AS COPY FROM 'hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/*' ON ANY NODE ORC;
@Jim, I believe you are right. The file was saved as ORC, but he is trying to read it as a text file.
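One way to settle that is to check the table metadata in Hive, since DESCRIBE FORMATTED reports the storage format (the Hive table name below is assumed from the HDFS path in the error):
-- TextInputFormat / LazySimpleSerDe means delimited text; OrcInputFormat / OrcSerde means ORC
DESCRIBE FORMATTED wtdadev.wtda_ip_bill_transformed_text;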
Hi,
It's a TEXT table from Hive; it's not in ORC format.
Here is the table creation from Hive:
The same table was loaded directly from another text table; that's why, when I do hdfs dfs -ls, I get the filename 000000_0.
I am able to query the text file from Hive, but not able to query the same from Vertica.
I am 100% sure that the file stored in HDFS is a TEXT file. The content of the file is as follows:
Please suggest where I am going wrong.
Let's try this. Push that file to HDFS yourself: copy it from local to HDFS as a .txt file, so that when you do hdfs dfs -ls you get IPBILL_RAW_TRANSFORMED.txt. Create an external Hive table on top of it to verify you can read it. Then point Vertica to the HDFS path where you placed that file, so your Vertica external table will be what you did originally: "COPY FROM 'hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/*.txt'". Try that and let us know if it works.
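Roughly like this, with the column list and delimiter as placeholders to adjust to the real file:
-- Hive external table over the copied .txt file
CREATE EXTERNAL TABLE wtdadev.wtda_ip_bill_transformed_text_check (
  sequence_no INT,
  info_parameter STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text';
-- Vertica external table pointing at the same files
CREATE EXTERNAL TABLE HADOOPTEST.WTDA_IP_BILL_TRANSFORMED_TEXT (
  SEQUENCE_NO INT,
  INFO_PARAMETER VARCHAR(150)
)
AS COPY FROM 'hdfs:///user/hive/warehouse/wtdadev.db/wtda_ip_bill_transformed_text/*.txt'
ON ANY NODE DELIMITER ',';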
Hi,
I have done the same as suggested:
The file is copied to the HDFS location:
A Hive table is created on it as follows:
Created a Vertica external table on the same:
Trying to query the table through the Vertica external table:
This gives the same error as before. Can anyone please suggest where I am going wrong?
By any chance, is your cluster Kerberized?
Also, can you please post the output from the Vertica log when you run the query?
Is "tdaadmin" the Vertica database admin account (i.e. dbadmin)?
From the doc:
You must be a database superuser to create external tables, unless you have created a user-accessible storage location to which the COPY refers, see CREATE LOCATION. If external tables exist, you must also be a database superuser to access them through a select statement.
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/CREATEEXTERNALTABLEASCOPY.htm
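If "tdaadmin" is not the superuser, the user-accessible storage location route described there looks roughly like this (path taken from this thread; run as the superuser):
CREATE LOCATION 'hdfs:///user/hive/warehouse/wtdadev.db' ALL NODES SHARED USAGE 'USER';
GRANT READ ON LOCATION 'hdfs:///user/hive/warehouse/wtdadev.db' TO tdaadmin;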
I am able to access HDFS files if I create them in ORC or Parquet format, but only the text format gives an error when we create the table!
This might be a silly question, but you have copied the HDFS config files to all the nodes, right?
By default, Vertica looks for these files under /etc/hadoop/conf.
Run CLEAR_HDFS_CACHES() as well. Something isn't set up right on your system.
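Roughly, the setup and reset look like this (the database name is a placeholder):
-- Point Vertica at the copied Hadoop config files, verify them, and flush any stale HDFS state
ALTER DATABASE mydb SET HadoopConfDir = '/etc/hadoop/conf';
SELECT VERIFY_HADOOP_CONF_DIR();
SELECT CLEAR_HDFS_CACHES();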