Accessing a Partitioned ORC
Hi,
this is my first time here, so sorry for any mistake.
I am running Vertica 7.2 and need to map some external tables based on ORC files which reside on a Hadoop (Hive) database. I'm using the ORC reader instead of the HCatalog because is much more fast.
I defined the external tables as described in the documentation (create external table ... as copy ... from '/<path>/*' ORC) and this works for all the "tables" that have multiple ORC files in the same folder.
I also have a "table" that is partitioned by a field and the result is that I see a bunch of (I guess, I'm not keen on Hive) unpredictable sub-folders (each containing a set of ORC files) based on that partitioning criteria. The issue here is that I cannot point Vertica to the "main" folder because it will give me an error saying that it finds a folder instead of a file, but at the same time I don't know how many "sub-folders" we have (supposing I want to create a different external table for each partition).
I cannot find so much documentation on these ORC files, did anybody faced this problem before?
Any help is appreciated
Stefano
Comments
You can read nested folders by specifying a "/*" for each sub-directory.
Assuming you have a single partition column in Hive, the above syntax will read into all the sub-directories in the partition folder.
Does this solve your problem ?
And if you have more than one layer of partitioning, add wildcard directories to the path. For example, if you have two partition columns, use '/path/*/*/*'. The number of wildcards should always be one more than the number of levels of partitioning in the directory tree.
Hi Deepak and Monica,
thank you very much for your solutions! You both explained really well how it works.
I'm going to try it immediatly.
Best regards
Stefano