Options

Mainframe file to Vertica Table Load

Hi All, I have a requirement for loading mainframe files into vertica tables. Currently vertica doesnt support connectivity to mainframes and we are pulling the files from Mainframe server to Linux and loadind to Vertica tables using COPY command. What i am looking at is , is there any UDLs present in Vertica to pull data directly from Mainframe files and to load into vertica tables by avoiding data transfer between Mainframe and Linux. Challenge what we are facing here is the files in Mainframes are huge and it take enormous amount of time and space to moving data from Mainframe to Linux and then load from Linux to Vertica using COPY. Any suggestions/idea would be helpful to me.

Comments

  • Options
    Hi Lakshman, I'm not personally familiar with this scenario; hopefully others will speak up. It sounds, though, like Vertica is able to load the file format; the problem is just getting the data over the network without making a copy of it? The ideal solution would be if you could run vsql on the mainframe computer with the data. Then you can use our COPY LOCAL functionality to load it directly. We may not support vsql on that architecture, though. If that doesn't work, you could stream the data over ssh. For example, if the data is located in /data/file.dat, you could do (from a Vertica host) ssh mainframe 'cat /data/file.dat' | vsql -c 'COPY mytable FROM STDIN;' If you don't have access to ssh or if ssh is too slow, you could look into netcat (assuming that your network is secure), or any other tool that piping bytes over the network. Also, note that you can run any command inside the ssh session that will dump the data to stdout. Vertica can load from named pipes as well, so there are many clever things that you can do to try to stream the data from one machine to the other without writing it to disk in between. Also, if there's a program that you can run on the Vertica host that will fetch the data from the mainframe and print it to stdout, you may want to look into the open-source ExternalSource UDL, part of the shell_load_package: https://github.com/vertica/Vertica-Extension-Packages Note that this example is open-source; you are welcome to modify it to get your own UDL that does what you need it to do. If you do, you are of course more than welcome to share it on the github repository so that the next person with this problem can find it :-)
  • Options
    Hi Adam, Thanks for your quick reply. We have already implemented Named pipes method for pulling data from the Teradata tables and loading into Vertica tables using COPY command and vice versa. This works fine with good performance throughput. Since TD client compatibility is there in Mainframes they were able to load the data from Mainframe files whereas in Vertica since vsql cant be installed on Mainframes we are getting the data to Linux and from there we were loading. File size would be approximately 20-50GB (atleast 20 files) on day-day basis and moving this around servers will impact time and space. Anyways i will have a look at the external Source UDL as you have mentioned in your reply. Many Thanks Lakshman R

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file