How to import data from SQL Server into Vertica

Hi Guys,

Off the top of my head, transferring data from SQL Server seems like a stretch to me. Please guide me on how to achieve this. I have nearly 1 TB of data sitting in SQL Server and I want to transfer the schemas, views and tables to Vertica.

It would help if there is some documentation for doing this.

Thanks,
Anmol

Comments

  • Navin_C Vertica Customer
    Hi Anmol,

    First of all, is it a one-time load or a daily process?

    I assume it is a one-time load, considering the amount of data you are planning to move.

    This can be achieved in two ways:
    1. Extract the data from SQL Server on Windows and load it into Vertica on Linux -- the easy way
    2. Extract the data from SQL Server and load it directly into Vertica on Linux -- the hard way

    Either way, the best approach is to use the official ODBC / JDBC driver for SQL Server.

    Following are the steps to load one table from SQL Server to Vertica (using the first option):
    1. Dump the complete catalog (DDL) from SQL Server for a table into a .sql file and change the data types that are not compatible with Vertica.
    2. Create the table in Vertica using this .sql file.
    3. Dump/export the data from SQL Server, table by table, into '|'-delimited text files on Windows.
    4. Pick these files up and load them into Vertica using the COPY ... FROM LOCAL option (a sketch follows this list).
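    For step 4, a minimal sketch of the load, assuming a pipe-delimited export (the table name, file paths and options are placeholders to adjust for your environment):

    COPY sales.orders                          -- target table must already exist in Vertica (placeholder name)
    FROM LOCAL 'C:\exports\orders.txt'         -- '|'-delimited file exported from SQL Server
    DELIMITER '|'
    NULL ''                                    -- treat empty fields as NULL
    REJECTED DATA 'C:\exports\orders.rejected'
    EXCEPTIONS 'C:\exports\orders.exceptions';

    Run it from vsql (or any Vertica client) on the Windows machine where the files live; FROM LOCAL streams the file from the client to the cluster.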

    Following are the steps, in brief, to load one table from SQL Server to Vertica (this is for the second option):
    1. Dump the complete catalog from SQL Server for a table into a .sql file and change the data types that are not compatible with Vertica.
    2. Create the table in Vertica using this .sql file.
    3. Install the official Microsoft ODBC driver for SQL Server on Linux and verify the installation with its command-line utilities.
    4. With the ODBC driver set up on Linux, install the ODBC loader and parser for Vertica. This is an external package for Vertica and supports different databases.
    5. Once installed, you can use the COPY command with the ODBC source and loader components to load data directly into Vertica (see the sketch after this list).
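    For step 5, a rough sketch, assuming the external ODBC Loader package is installed in Vertica and a DSN named SQLServerDSN is defined on the Vertica nodes (the DSN name, credentials, table and query are placeholders, and the exact parameter names should be checked against the version of the package you install):

    -- placeholders throughout; verify ODBCSource/ODBCLoader parameter names against the package docs
    COPY sales.orders
    WITH SOURCE ODBCSource()
    PARSER ODBCLoader(
        connect = 'DSN=SQLServerDSN;UID=loader;PWD=secret',   -- ODBC connection string for SQL Server
        query   = 'SELECT * FROM dbo.orders'                  -- query executed on the SQL Server side
    );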

    Brief us more on your requirement and we can help you more precisely.

    Hope this helps

  • Hey Navin,

    I have some 100-odd tables in SQL Server.
    I have to migrate them to Vertica. Right now it is a one-time load for data analysis purposes.
    And I'm having a hard time doing it.
    1. Am I correct to say that for the COPY FROM LOCAL command to work, I need to have a blank table created in Vertica which has the same column names and data type properties as the table I want to import?
    2. I have exported some of the tables to CSV but it will be very tedious to manually create a table with 50-odd columns in Vertica. How do I load these CSV files into Vertica?
    3. I scripted the tables from SQL Server and I got the table structure in a .sql file, but it's not working well and is throwing errors. How do I run the .sql file?
    4. I tried uploading the CSV into a flex table but I am unable to query the keys.
    Please guide!

    Regards,
    Anmol
  • Navin_C Vertica Customer
    Hi Anmol,
    1. Am I correct to say that for the COPY FROM LOCAL command to work, I need to have a blank table created in Vertica which has the same column names and data type properties as the table I want to import?
    Yes, the table has to exist already in the Vertica database, but it is not mandatory to have the same column names, since the data is coming from files; as long as each column's properties (data type, length) are compatible, it can take the data in through the COPY FROM LOCAL option.
    Regarding columns, the Vertica COPY command gives you the flexibility to specify which columns you want to load; this way you can load just the columns you want from SQL Server into Vertica (a small sketch follows).
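    For example, a small sketch of loading only a subset of columns (the table, column and file names are placeholders, and it assumes the exported file contains exactly these three fields, in this order):

    -- placeholder names; only the listed columns are loaded from the file
    COPY sales.orders (order_id, customer_id, order_date)
    FROM LOCAL 'C:\exports\orders.txt'
    DELIMITER '|';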
    2. I have exported some of the tables to CSV but it will be very tedious to manually create a table with 50-odd columns in Vertica. How do I load these CSV files into Vertica?
    If you don't want to load data into columnar tables, you can use flex tables to load CSV without a proper/strict schema; the columns (keys) are then created on the fly in the flex table. Loading CSV files into flex tables is done through a parser called fdelimitedparser (see the sketch after the link).
    Refer to this:
    Loading Flex tables
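    A minimal sketch of that flow (the schema, table and file names are placeholders):

    -- a flex table needs no column list up front
    CREATE FLEX TABLE staging.orders_flex();

    -- fdelimitedparser turns the header row of the file into flex keys
    COPY staging.orders_flex
    FROM LOCAL 'C:\exports\orders.csv'
    PARSER fdelimitedparser(delimiter = ',');

    -- materialize the discovered keys and build the companion view
    SELECT COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW('staging.orders_flex');
    SELECT * FROM staging.orders_flex_view LIMIT 10;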
    3. I scripted the tables from SQL Server and I got the table structure in a .sql file, but it's not working well and is throwing errors. How do I run the .sql file?
    Check your .sql file; there will be compatibility issues between SQL Server and Vertica when it comes to data types (a sketch of common adjustments follows below).
    For example :
    SQL server - nvarchar
    Vertica - varchar

    If you are facing any issue other than this while executing the SQL, let us know here.
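    As a sketch, this is how a scripted SQL Server table is typically hand-adjusted for Vertica (the column names are made up; the type mappings shown are the common ones):

    -- SQL Server (as scripted):
    --   CREATE TABLE dbo.customer (
    --       id      INT IDENTITY(1,1),
    --       name    NVARCHAR(100),
    --       created DATETIME,
    --       notes   TEXT );

    -- Vertica (hand-adjusted):
    CREATE TABLE public.customer (
        id      INT,            -- drop IDENTITY(1,1), or recreate it with a Vertica IDENTITY column / sequence
        name    VARCHAR(100),   -- NVARCHAR -> VARCHAR; Vertica lengths are in bytes, so widen for multi-byte data
        created TIMESTAMP,      -- DATETIME -> TIMESTAMP
        notes   LONG VARCHAR    -- TEXT     -> LONG VARCHAR
    );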
    4. I tried uploading the CSV into a flex table but I am unable to query the keys.
    Please paste your example here, so that it is easier to understand.

    Hope this helps


  • Thanks a ton for the info, Navin!

    All the points check out. I was able to load data into the flex tables, though I haven't tried running the .sql files.

    One point I am stuck on:

    Flex tables compute both the keys and the views. Essentially that means they can divide the data into columns as determined by the parser. Correct?

    So say I have loaded data into a flex table and queried the flex view for the columns I need. How do I get them into a non-flex data table?

    Thanks a ton again!

    Anmol
  • Navin,

    Waiting for your feedback :)

    Regards
  • Navin_C Vertica Customer
    Hi Anmol,

    This can help you understand the flow better.

    How to load CSV files into Flex table
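    In addition to the post, a minimal sketch of moving the keys you need from a flex table into a regular columnar table (the names and types are placeholders; the key names must match the ones in your data):

    -- materialize only the columns you care about, casting each flex key to a real type
    CREATE TABLE public.orders AS
    SELECT
        order_id::INT         AS order_id,
        customer_id::INT      AS customer_id,
        order_date::TIMESTAMP AS order_date
    FROM staging.orders_flex;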

    Hope this helps.
  • Hey Navin,

    The blog post is supremely helpful. Great job putting it together. :)

    To reiterate my problem: you have mentioned that we can load data into Vertica from SQL Server.

    The extraction and loading of CSV files worked like a charm using the flex tables.

    My second problem: there is another environment where I have a SQL Server machine and a Vertica machine, two different machines. How do I use the drivers to fetch data into Vertica from SQL Server? I hope there should not be any extract or transform step involved. Is this possible?

    Best Regards,
    Anmol
  • One more query on that,

    How does this whole SQL Server -> Vertica scenario play out?

    #1: Install the ODBC drivers on Linux, then use the parser and loader components to fetch data from SQL Server into Vertica.

    OR

    #2: Install the Microsoft connectivity pack on Windows, then create a batch file to push data into Vertica.

    Are both assumptions true, considering that I will have to create the table manually in Vertica in both scenarios?

    Regards,
    Anmol
  • Navin_C Vertica Customer
    Hi Anmol,

    Yes, both assumptions are true, but the first method has an advantage over the second.

    The advantage is that after setting up the complete environment, there is no need to extract into a file and then load it into Vertica; the COPY command takes care of the whole process using its SOURCE and PARSER components (see the short contrast below).

    Regarding CREATE TABLE: yes, you have to do that manually in Vertica.

    Hope this helps.

  • parvesh Community Edition User

    Hi Anmol, have you been able to import the tables from SQL Server to Vertica?

  • parvesh Community Edition User

    Hi Navin, I need your help on the same problem.

  • mark_d_drake Community Edition User
    edited June 2022

    Check out YADAMU. https://github.com/markddrake/YADAMU---Yet-Another-DAta-Migration-Utility

    It supports bi-directional movement of data between Vertica and a number of other platforms, including SQL Server.

    Here's an example of migrating the AdventureWorksDW schema from SQL Server 2019 to Vertica 11.

    (Both databases are running in Docker containers.)

    C:\Development\YADAMU>bin\copy config=qa\YDB2\sampleJob#1.json encryption=false
    2022-06-15T05:12:57.829Z [INFO][READER][MSSQLSERVER][15][Manager]: Ready.
    2022-06-15T05:12:57.830Z [INFO][WRITER][Vertica][Vertica Analytic Database v11.1.0-0][DATA_ONLY][Manager]: Ready.
    2022-06-15T05:12:58.136Z [DDL][MSSQLSERVER]: Loaded metadata for 31 tables. Elapsed time: 00:00:00.260s.
    2022-06-15T05:12:58.232Z [DDL][Vertica][DatabaseLog]: LONG VARCHAR and LONG VARBINARY columns restricted to 16379832 bytes
    2022-06-15T05:12:58.245Z [INFO][PIPELINE][PARALLEL][4][MSSQLSERVER][Vertica]: Processing 31 Tables
    2022-06-15T05:12:58.246Z [DDL][Vertica]: Generated 31 "Create Table" statements and 31 DML statements. Elapsed time: 00:00:00.016s.
    2022-06-15T05:12:58.363Z [DDL][Vertica]: Executed 31 DDL operations. Elapsed time: 00:00:00.117s.
    2022-06-15T05:12:58.407Z [INFO][AdventureWorksDWBuildVersion][Copy]: Rows 1. Reader Elapsed Time: 00:00:00.001s. Throughput 898 rows/s. Writer Elapsed Time: 00:00:00.114s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 9 rows/s.
    2022-06-15T05:12:58.412Z [INFO][DimCurrency][Copy]: Rows 105. Reader Elapsed Time: 00:00:00.002s. Throughput 41327 rows/s. Writer Elapsed Time: 00:00:00.111s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 943 rows/s.
    2022-06-15T05:12:58.414Z [INFO][DimAccount][Copy]: Rows 99. Reader Elapsed Time: 00:00:00.004s. Throughput 22633 rows/s. Writer Elapsed Time: 00:00:00.114s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 865 rows/s.
    2022-06-15T05:13:00.120Z [INFO][DatabaseLog][Copy]: Rows 96. Reader Elapsed Time: 00:00:00.007s. Throughput 13013 rows/s. Writer Elapsed Time: 00:00:01.823s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 53 rows/s.
    2022-06-15T05:13:00.177Z [INFO][DimDepartmentGroup][Copy]: Rows 7. Reader Elapsed Time: 00:00:00.000s. Throughput 10710 rows/s. Writer Elapsed Time: 00:00:01.761s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 4 rows/s.
    2022-06-15T05:13:00.208Z [INFO][DimDate][Copy]: Rows 3652. Reader Elapsed Time: 00:00:00.097s. Throughput 37337 rows/s. Writer Elapsed Time: 00:00:01.794s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 2035 rows/s.
    2022-06-15T05:13:00.970Z [INFO][DimGeography][Copy]: Rows 655. Reader Elapsed Time: 00:00:00.020s. Throughput 32168 rows/s. Writer Elapsed Time: 00:00:00.787s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 831 rows/s.
    2022-06-15T05:13:00.997Z [INFO][DimOrganization][Copy]: Rows 14. Reader Elapsed Time: 00:00:00.000s. Throughput 34474 rows/s. Writer Elapsed Time: 00:00:00.788s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 18 rows/s.
    2022-06-15T05:13:01.185Z [INFO][DimProductCategory][Copy]: Rows 4. Reader Elapsed Time: 00:00:00.000s. Throughput 9816 rows/s. Writer Elapsed Time: 00:00:00.187s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 21 rows/s.
    2022-06-15T05:13:01.208Z [INFO][DimCustomer][Copy]: Rows 18484. Reader Elapsed Time: 00:00:01.695s. Throughput 10902 rows/s. Writer Elapsed Time: 00:00:02.797s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 6608 rows/s.
    2022-06-15T05:13:01.279Z [INFO][DimProductSubcategory][Copy]: Rows 37. Reader Elapsed Time: 00:00:00.000s. Throughput 42794 rows/s. Writer Elapsed Time: 00:00:00.091s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 406 rows/s.
    2022-06-15T05:13:01.314Z [INFO][DimPromotion][Copy]: Rows 16. Reader Elapsed Time: 00:00:00.000s. Throughput 25621 rows/s. Writer Elapsed Time: 00:00:00.105s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 151 rows/s.
    2022-06-15T05:13:01.388Z [INFO][DimSalesReason][Copy]: Rows 10. Reader Elapsed Time: 00:00:00.000s. Throughput 16964 rows/s. Writer Elapsed Time: 00:00:00.072s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 139 rows/s.
    2022-06-15T05:13:01.403Z [INFO][DimReseller][Copy]: Rows 701. Reader Elapsed Time: 00:00:00.025s. Throughput 27149 rows/s. Writer Elapsed Time: 00:00:00.123s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 5670 rows/s.
    2022-06-15T05:13:01.448Z [INFO][DimScenario][Copy]: Rows 3. Reader Elapsed Time: 00:00:00.000s. Throughput 5731 rows/s. Writer Elapsed Time: 00:00:00.043s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 69 rows/s.
    2022-06-15T05:13:08.382Z [INFO][DimEmployee][Copy]: Rows 296. Reader Elapsed Time: 00:00:00.093s. Throughput 3180 rows/s. Writer Elapsed Time: 00:00:08.260s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 36 rows/s.
    2022-06-15T05:13:12.781Z [INFO][FactCallCenter][Copy]: Rows 120. Reader Elapsed Time: 00:00:00.003s. Throughput 37818 rows/s. Writer Elapsed Time: 00:00:04.397s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 27 rows/s.
    2022-06-15T05:13:13.434Z [INFO][FactCurrencyRate][Copy]: Rows 14264. Reader Elapsed Time: 00:00:00.162s. Throughput 87820 rows/s. Writer Elapsed Time: 00:00:00.652s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 21867 rows/s.
    2022-06-15T05:13:14.025Z [INFO][FactAdditionalInternationalProductDescription][Copy]: Rows 15168. Reader Elapsed Time: 00:00:00.147s. Throughput 102954 rows/s. Writer Elapsed Time: 00:00:12.575s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 1206 rows/s.
    
  • mark_d_drake Community Edition User

    Continued

    2022-06-15T05:13:14.026Z [INFO][DimSalesTerritory][Copy]: Rows 11. Reader Elapsed Time: 00:00:00.004s. Throughput 2476 rows/s. Writer Elapsed Time: 00:00:12.637s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 1 rows/s.
    2022-06-15T05:13:16.935Z [INFO][DimProduct][Copy]: Rows 606. Reader Elapsed Time: 00:00:00.132s. Throughput 4567 rows/s. Writer Elapsed Time: 00:00:15.962s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 38 rows/s.
    2022-06-15T05:13:19.616Z [INFO][FactFinance][Copy]: Rows 39409. Reader Elapsed Time: 00:00:00.410s. Throughput 96014 rows/s. Writer Elapsed Time: 00:00:06.180s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 6376 rows/s.
    2022-06-15T05:13:25.603Z [INFO][FactInternetSalesReason][Copy]: Rows 64515. Reader Elapsed Time: 00:00:05.020s. Throughput 12851 rows/s. Writer Elapsed Time: 00:00:11.573s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 5574 rows/s.
    2022-06-15T05:13:25.644Z [INFO][FactSalesQuota][Copy]: Rows 163. Reader Elapsed Time: 00:00:00.001s. Throughput 86362 rows/s. Writer Elapsed Time: 00:00:00.040s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 4069 rows/s.
    2022-06-15T05:13:27.404Z [INFO][FactSurveyResponse][Copy]: Rows 2727. Reader Elapsed Time: 00:00:00.034s. Throughput 78061 rows/s. Writer Elapsed Time: 00:00:01.758s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 1551 rows/s.
    2022-06-15T05:13:27.592Z [INFO][NewFactCurrencyRate][Copy]: Rows 50. Reader Elapsed Time: 00:00:00.001s. Throughput 31572 rows/s. Writer Elapsed Time: 00:00:00.185s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 269 rows/s.
    2022-06-15T05:13:29.447Z [INFO][ProspectiveBuyer][Copy]: Rows 2059. Reader Elapsed Time: 00:00:00.069s. Throughput 29793 rows/s. Writer Elapsed Time: 00:00:01.854s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 1110 rows/s.
    2022-06-15T05:13:30.450Z [INFO][FactInternetSales][Copy]: Rows 60398. Reader Elapsed Time: 00:00:05.738s. Throughput 10526 rows/s. Writer Elapsed Time: 00:00:16.424s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 3677 rows/s.
    2022-06-15T05:13:30.713Z [INFO][sysdiagrams][Copy]: Rows 9. Reader Elapsed Time: 00:00:00.002s. Throughput 3558 rows/s. Writer Elapsed Time: 00:00:01.263s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 7 rows/s.
    2022-06-15T05:13:32.060Z [INFO][FactResellerSales][Copy]: Rows 60855. Reader Elapsed Time: 00:00:06.184s. Throughput 9839 rows/s. Writer Elapsed Time: 00:00:12.442s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 4891 rows/s.
    2022-06-15T05:13:50.607Z [INFO][FactProductInventory][Copy]: Rows 776286. Reader Elapsed Time: 00:00:32.756s. Throughput 23699 rows/s. Writer Elapsed Time: 00:00:33.669s.  Idle Time: 00:00:00.000s. SQL Exection Time: 00:00:00.000s. Throughput: 23056 rows/s.
    2022-06-15T05:13:50.618Z [INFO][YADAMU][COPY]: Operation completed successfully. Elapsed time: 00:00:53.353.
    2022-06-15T05:13:50.619Z [INFO][YADAMU][COPY]: Operation complete. Source:["conn#1"://"undefined"."dbo"]. Target:["conn#2"://"ADVWKSDW"].
    2022-06-15T05:13:50.620Z [INFO][YADAMU][COPY]: Operation complete: Configuration:"qa\YDB2\sampleJob#1.json". Elapsed Time: 00:00:53.327s.
    

    The config file is shown below

    {
        "connections": {
            "conn#1": {
                "mssql": {
                    "user": "sa",
                    "server": "yadamu-db2",
                    "database": "AdventureWorksDW",
                    "password": "oracle#1",
                    "port": 1434,
                    "requestTimeout": 360000000,
                    "options": {
                        "encrypt": false,
                        "abortTransactionOnError": false
                    }
                }
            },
            "conn#2": {
                "vertica": {
                    "user": "dbadmin",
                    "host": "yadamu-db2",
                    "database": "VMart",
                    "password": "oracle",
                    "port": 5433
                }
            }
        },
        "schemas": {
            "from": {
                "owner": "dbo"
            },
            "to": {
                "schema": "ADVWKSDW"
            }
        },
        "jobs": [
            {
                "source": {
                    "connection": "conn#1",
                    "schema": "from"
                },
                "target": {
                    "connection": "conn#2",
                    "schema": "to"
                },
                "parameters": {
                    "REMOTE_STAGING_AREA": "/mnt/shared/stagingArea/db/vertica",
                    "LOCAL_STAGING_AREA": "Y:\\stagingArea\\db\\vertica",
                    "PARALLEL" : 4
                }
            }
        ]
    }
    
  • mark_d_drake Community Edition User

    Data is read from SQL Server, and staged to CSV files that are then automatically loaded using Vertica COPY operations.
