How to load data from a file where the column positions keep changing?
How can we load data from a file whose column order keeps changing, and how do we map each column in the file to the correct column in the corresponding table?
For example:
Table: foo(col1,col2,col3,col4)
Now we need to load data from files where the column positions keep changing:
File1: col1,col2,col3,col4
File2: col2,col3,col4,col1
File3: col3,col4,col1,col2
and so on...
How do we map each file column to the correct table column?
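When the column order of a given file is known, the mapping itself is just a matter of naming the target columns in COPY's column list (Vertica's COPY accepts an explicit column list). A minimal sketch of generating such a statement per file — the helper name `build_copy` and the paths are my own, not from the thread:

```python
# Sketch: build a COPY statement whose column list matches a file's
# known column order. Only the column list changes per file; the table
# definition stays the same.
def build_copy(table, file_columns, path, delimiter="|"):
    cols = ", ".join(file_columns)
    return (f"COPY {table} ({cols}) FROM LOCAL '{path}' "
            f"DELIMITER '{delimiter}';")

# File2 from the example stores its columns as col2,col3,col4,col1:
print(build_copy("foo", ["col2", "col3", "col4", "col1"], "/tmp/file2.csv"))
```

This only helps, of course, if something tells you the order up front — which is exactly the sticking point discussed below.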
Comments
Can we use one single COPY command for the two cases you mentioned above?
The number of columns in the data files we have to load is fixed, but the column order is not. We are looking for a way to write one generic COPY statement that we can use for all the combinations.
For instance, in your example, can we somehow write one COPY command that handles both scenarios you explained?
We may have 100 data files to load, so otherwise we would have to write 100 COPY commands.
How do you know what order the columns are in each file? Do they have a header row or something?
If the columns can just be in any random order and there's no programmatic way to tell what order they are in, the problem becomes somewhat harder...
Adam
1. It's possible, under some conditions, with the COPY statement only, but it's ugly
2. You can write a UDF parser: https://my.vertica.com/docs/6.1.x/HTML/index.htm#17840.htm
3. You can use a decorator (script) that will process a file and create the right COPY statement
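Option 3 can be sketched in a few lines of Python, assuming each file carries a header row naming its columns (the function name `copy_for_file` is hypothetical; `SKIP 1` is the COPY option that skips the header when loading):

```python
import tempfile, os

# Sketch of option 3: a wrapper that reads a file's header row and
# generates the matching COPY statement for it.
def copy_for_file(table, path, delimiter="|"):
    with open(path) as f:
        header = f.readline().strip()
    cols = ", ".join(header.split(delimiter))
    return (f"COPY {table} ({cols}) FROM LOCAL '{path}' "
            f"DELIMITER '{delimiter}' SKIP 1;")

# Example: a file whose columns arrive as col3,col4,col1,col2
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("col3|col4|col1|col2\na3|a4|1|a2\n")
print(copy_for_file("foo", f.name))
os.unlink(f.name)
```

The script would run once per file, so 100 files no longer means hand-writing 100 COPY commands.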
Example (1)
There is no way for us to check the ordering of the columns. From my discussion with them, the columns may or may not have headers; even that is not fixed. The order of the columns will be random. The only thing that is fixed is the number of columns. I don't think the Vertica COPY command supports any of this.
Thanks Daniel, I will give it a try and get back to you on this.
file1
col1|col2|col3|col4|col5|col6
1|a2|2000-01-01|a4|15|2000-01-02
2|b2|2000-01-02|b4|25|2000-01-02
file2
col2|col3|col4|col5|col6|col1
a2|2000-01-01|a4|15|2000-01-02|1
b2|2000-01-02|b4|15|2000-01-02|2
So, if there are no headers, and the columns could be in any order, and they could be all the same data types and similar values, how can you tell the columns apart at all?
Could you give an example of another existing program that does what you're trying to do? Or describe how you would identify the columns by hand, if you don't know how you would do it?
Or are you just looking for a place to dump the data? More like file storage than a database. In that case you could always just create a table that just has VARCHAR columns (even one big VARCHAR column for the whole row), and not worry about the real data type up front. Your users could then write a SELECT statement with type casts, regular expressions, etc., to get the data into whatever form they really want.
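The "dump raw, cast on read" idea can be sketched outside the database: keep each row as one big string, and apply the casts only when a reader actually queries it. The field positions and types below are assumptions for illustration (they match the `file_01.csv` sample later in the thread):

```python
import datetime

# Rows stored as raw strings, as they would be in a single big VARCHAR
# column; no typing is done at load time.
raw_rows = [
    "1|a|2000-01-01",
    "2|b|2000-01-02",
]

def parse(row, delimiter="|"):
    # Cast each field to the type the reader wants, only on read --
    # the rough equivalent of a SELECT with casts over VARCHAR columns.
    f = row.split(delimiter)
    return int(f[0]), f[1], datetime.date.fromisoformat(f[2])

for r in raw_rows:
    print(parse(r))
```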
Incidentally, Vertica 7's new (and recently announced) FlexTables feature bundles some new COPY parsers, one of which, I believe, will handle even this case with as little information as you have about your data. Given how little information you have, though, what it will do will be basically equivalent to a bunch of VARCHAR columns. And as I'm sure Daniel will be quick to point out, Vertica 7 is brand-new; I assume you're not using it yet...
Adam
dbadmin@twingo:/tmp$ cat file_01.csv
1|a|2000-01-01
2|b|2000-01-02
Yes Adam, this problem is somewhat confusing. We are creating everything at run time from the client application based on user inputs. We will get multiple files with a specific set of columns, for which we have already created our facts and dimensions. But the problem is that the data files we are getting have a random column order. A file may have all INT columns or a combination of all data types.
We will have to restrict the structure of the file.
I also worked on Crane as part of an Optimizer testing team in my previous organization.
Regards,
Siddarth Kotnala