Why Vertica not showing all data in the table after bulk load from SSIS?

I was loading 200 millions of rows from SQL Server to Vertica table using SSIS. I started running the SSIS package and left it run in the night. Morning I saw SSIS package showing all the 200 million rows exported successfully (no errors shown in SSIS package execution results). But when I query the table in Vertica, it shows only 38 million rows.

Can anybody tell me what is the issue here?

Comments

  •  Take a look at the rejected data !
    Do you have any "exceptions filepath" and "rejected filepath" declared in your copy command string ? 
    • If you don't ! by default the rejected and exceptions are located in the \path\..\node_catalog\CopyErrorLogs
    • You can also check the rejected and accepted data using get_num_accepted_rows() and get_num_rejected_rows() functions. 
    • Also check the load_streams table:
    SELECT stream_name,         table_name,         load_start,         accepted_row_count,         rejected_row_count,         read_bytes,         unsorted_row_count,         sorted_row_count,         sort_complete_percent    FROM load_streams;
  • Thank you for your quick response...

    I did not use COPY command, as mentioned in my post I used SSIS to move data from SQL Server to Vertica database.

    As I have not used the COPY command, I did not expect any results in the first two points you mentioned in your response, but still I verified CopyErrorLogs folder which is empty and the two functions get_num_accepted_rows() and get_num_rejected_rows() returned "No single-source bulk loads have been executed in this session"

    But when I checked the load_streams table it has rows with table name I imported to Vertica database.

    Please let me know if you require any more information.
  • yehudityehudit Community Edition User

    Hello, I ran into the same issue
    Did you find a solution?

  • moshegmosheg Vertica Employee Administrator
    1. First, SELECT max(your_unique_row_id_key) FROM the data which was successfully loaded to Vertica
      Then, choose few rows with a higher unique row id value from your data input and try to load only those specific rows.
      To debug the reject reason, load it with Vertica COPY command which includes 'ABORT ON ERROR'

    2. In the following example, change the WHERE clause for your specific use case to monitor the load.

    with current_load as
    (select
     load_start,
     case when is_executing then ((DATEDIFF(second, load_start::TIMESTAMP, sysdate())) / 60)
          else (load_duration_ms / 1000 / 60)
     end as Load_so_far_Minutes,
     load_duration_ms,
     input_file_size_bytes,
     (input_file_size_bytes/1024^3)::numeric(9,3) as input_file_size_bytes_GB,
     unsorted_row_count,
     accepted_row_count,
     rejected_row_count,
     read_bytes,
     parse_complete_percent || '%' AS parse_complete_percent,
     sorted_row_count,
     sort_complete_percent || '%' AS sort_complete_percent
     from load_streams
     where load_start::timestamp > sysdate -2 and
           stream_name like 'MY_STREAM_NAME%'
    )
    SELECT
     TO_CHAR(read_bytes, '999,999,999,999,999') AS read_bytes,
     TO_CHAR(read_bytes / Load_so_far_Minutes, '999,999,999,999,999') AS read_bytes_per_minute,
     TO_CHAR(input_file_size_bytes, '999,999,999,999,999') AS input_file_size_bytes,
     input_file_size_bytes_GB,
     TO_CHAR(unsorted_row_count, '999,999,999,999,999') AS unsorted_row_count,
     TO_CHAR(accepted_row_count, '999,999,999,999,999') AS accepted_row_count,
     TO_CHAR(rejected_row_count, '999,999,999,999,999') AS rejected_row_count,
     parse_complete_percent,
     TO_CHAR(sorted_row_count, '999,999,999,999,999') AS sorted_row_count,
     TO_CHAR(sorted_row_count / Load_so_far_Minutes, '999,999,999,999,999') AS sorted_row_count_per_Minute,
     sort_complete_percent,
     Load_so_far_Minutes::numeric(9,1)
     FROM current_load
     order by load_duration_ms desc
     limit 100;
    
  • VivJainVivJain Vertica Employee Employee

    Hello,

    I tested the load of 200 million rows of data from source (SQL Server) to target (Vertica) and I could not find any issues in the load process. When I queried Vertica database, it showed me the exact no of rows (i.e. 203,241,461) being loaded from SQL Server to Vertica. Herewith, I have attached a document highlighting the configurations, settings and versions that I used wherein I did not face any issue.

    PFA : DataLoad_SQLServerToVertica.docx

    Hope this helps!

    Regards/Vivek

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file