Is there a way to capture the raw size of the payload carried by a copy command issued ?

karthikkarthik Vertica Customer

Hello,
We are sending huge data-set as part of a copy command to vertica. We want to know for every copy command that succeeded or failed, what is the raw size (in bytes) of the copy command ?

Answers

  • SergeBSergeB - Select Field - Employee
  • moshegmosheg Vertica Employee Administrator
    edited January 2020

    For example:
    with current_load as
    (select
    load_start,
    case when is_executing then ((DATEDIFF(second, load_start::TIMESTAMP, sysdate())) / 60)
    else (load_duration_ms / 1000 / 60)
    end as Load_so_far_Minutes,
    load_duration_ms,
    input_file_size_bytes,
    (input_file_size_bytes/1024^3)::numeric(9,3) as input_file_size_bytes_GB,
    unsorted_row_count,
    accepted_row_count,
    rejected_row_count,
    read_bytes,
    parse_complete_percent || '%' AS parse_complete_percent,
    sorted_row_count,
    sort_complete_percent || '%' AS sort_complete_percent
    from load_streams
    where load_start::timestamp > sysdate -2 and
    stream_name like 'MY_STREAM_NAME%'
    )
    SELECT
    TO_CHAR(read_bytes, '999,999,999,999,999') AS read_bytes,
    TO_CHAR(read_bytes / Load_so_far_Minutes, '999,999,999,999,999') AS read_bytes_per_minute,
    TO_CHAR(input_file_size_bytes, '999,999,999,999,999') AS input_file_size_bytes,
    input_file_size_bytes_GB,
    TO_CHAR(unsorted_row_count, '999,999,999,999,999') AS unsorted_row_count,
    TO_CHAR(accepted_row_count, '999,999,999,999,999') AS accepted_row_count,
    TO_CHAR(rejected_row_count, '999,999,999,999,999') AS rejected_row_count,
    parse_complete_percent,
    TO_CHAR(sorted_row_count, '999,999,999,999,999') AS sorted_row_count,
    TO_CHAR(sorted_row_count / Load_so_far_Minutes, '999,999,999,999,999') AS sorted_row_count_per_Minute,
    sort_complete_percent,
    Load_so_far_Minutes::numeric(9,1)
    FROM current_load
    order by load_duration_ms desc
    limit 100;

        -[ RECORD 1 ]---------------+---------------------
    read_bytes                                |      785,433,843,954
    read_bytes_per_minute       |       10,069,664,666
    input_file_size_bytes             |      785,433,843,954
    input_file_size_bytes_GB    | 731.492
    unsorted_row_count             |       10,000,000,000
    accepted_row_count            |       10,000,000,000
    rejected_row_count              |                    0
    parse_complete_percent    | 100%
    sorted_row_count                  |       10,000,000,000
    sorted_row_count_per_Minute |      128,205,128
    sort_complete_percent       | 100%
    Load_so_far_Minutes            | 78.0
    
  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    I believe that the size in INPUT_FILE_SIZE_BYTES includes all of field delimeters and special characters (i.e. line feeds). Plus it shows the compressed file size of the file is compressed.

    If you want the raw size as Vertica defines it, I think you'd have to do something crazy like this?

    dbadmin=> CREATE TABLE some_table(c1 VARCHAR, c2 VARCHAR, c3 VARCHAR, c4 INT);
    CREATE TABLE
    
    dbadmin=> \! ls -l /home/dbadmin/test_data.txt.gz | awk '{print $5, $9}'
    68762071 /home/dbadmin/test_data.txt.gz
    
    dbadmin=> COPY some_table FROM '/home/dbadmin/test_data.txt.gz' GZIP DIRECT;
     Rows Loaded
    -------------
         4194304
    (1 row)
    
    dbadmin=> SELECT source_name, input_size FROM load_sources;
              source_name           | input_size
    --------------------------------+------------
     /home/dbadmin/test_data.txt.gz |   68762071
    (1 row)
    
    dbadmin=> CREATE LOCAL TEMP TABLE some_table_temp ON COMMIT PRESERVE ROWS AS SELECT * FROM some_table WHERE epoch = (SELECT MAX(epoch) FROM some_table);
    CREATE TABLE
    
    dbadmin=> SELECT AUDIT('some_table_temp', 0, 100);
       AUDIT
    -----------
     100197130
    (1 row)
    
    dbadmin=> DROP TABLE some_table_temp;
    DROP TABLE
    

    Or, in my example, this is much easier:

    dbadmin=> SELECT SUM(LENGTH(c1) + LENGTH(c2) + LENGTH(c3) + LENGTH(c4::VARCHAR)) raw_size FROM some_table WHERE epoch = (SELECT MAX(epoch) FROM some_table);
     raw_size
    -----------
     100197130
    (1 row)
    
  • karthikkarthik Vertica Customer

    @mosheg said:
    For example:
    with current_load as
    (select
    load_start,
    case when is_executing then ((DATEDIFF(second, load_start::TIMESTAMP, sysdate())) / 60)
    else (load_duration_ms / 1000 / 60)
    end as Load_so_far_Minutes,
    load_duration_ms,
    input_file_size_bytes,
    (input_file_size_bytes/1024^3)::numeric(9,3) as input_file_size_bytes_GB,
    unsorted_row_count,
    accepted_row_count,
    rejected_row_count,
    read_bytes,
    parse_complete_percent || '%' AS parse_complete_percent,
    sorted_row_count,
    sort_complete_percent || '%' AS sort_complete_percent
    from load_streams
    where load_start::timestamp > sysdate -2 and
    stream_name like 'MY_STREAM_NAME%'
    )
    SELECT
    TO_CHAR(read_bytes, '999,999,999,999,999') AS read_bytes,
    TO_CHAR(read_bytes / Load_so_far_Minutes, '999,999,999,999,999') AS read_bytes_per_minute,
    TO_CHAR(input_file_size_bytes, '999,999,999,999,999') AS input_file_size_bytes,
    input_file_size_bytes_GB,
    TO_CHAR(unsorted_row_count, '999,999,999,999,999') AS unsorted_row_count,
    TO_CHAR(accepted_row_count, '999,999,999,999,999') AS accepted_row_count,
    TO_CHAR(rejected_row_count, '999,999,999,999,999') AS rejected_row_count,
    parse_complete_percent,
    TO_CHAR(sorted_row_count, '999,999,999,999,999') AS sorted_row_count,
    TO_CHAR(sorted_row_count / Load_so_far_Minutes, '999,999,999,999,999') AS sorted_row_count_per_Minute,
    sort_complete_percent,
    Load_so_far_Minutes::numeric(9,1)
    FROM current_load
    order by load_duration_ms desc
    limit 100;

    -[ RECORD 1 ]---------------+---------------------
    read_bytes | 785,433,843,954
    read_bytes_per_minute | 10,069,664,666
    input_file_size_bytes | 785,433,843,954
    input_file_size_bytes_GB | 731.492
    unsorted_row_count | 10,000,000,000
    accepted_row_count | 10,000,000,000
    rejected_row_count | 0
    parse_complete_percent | 100%
    sorted_row_count | 10,000,000,000
    sorted_row_count_per_Minute | 128,205,128
    sort_complete_percent | 100%
    Load_so_far_Minutes | 78.0

    Thanks. I was looking for something equivalent in 8.x vertica.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file