How to load data type fixed width support thai language

edited December 2022 in General Discussion

sample data utf8 have thai language

9999Aเจ้าหน้าที่.........................................................................................AX999....XXXX20221214


table structure name ABC
A1 char(5)
A2 varchar(100)
A3 char(1)
A4 char(4)
A5 char(4)
A6 char(4)

A7 char(8)

when I use command copy load

COPY public.ABC ( A1,A2,A3,A4,A5,A6,A7) from local '/data/sampledata.TXT' FIXEDWIDTH colsizes(5,100,1,4,4,4,8) skip 0 rejected data as table public.ABC_RJCT ENFORCELENGTH;

Rows Loaded = 0

and rejected_reason from table reject public.ABC_RJCT
"Row is malformed. Including record terminator, expected size is 127 bytes. Row size is 149 bytes."

I try expand data type A2 is varchar(400) but failed.
reason is the same,
"Row is malformed. Including record terminator, expected size is 127 bytes. Row size is 149 bytes."

Where do I need to make additional settings or edit? to be able to load the data.

Best Answer

  • moshegmosheg Vertica Employee Administrator
    edited December 2022 Answer ✓

    It looks like the issue is that the data in the file is not the same size as the columns you have defined in the COPY command.
    To fix this, you will need to adjust the colsizes parameter to match the actual size of the data in the file.

    Try this:

    SELECT 'A1 char(4)' as DataType, OCTET_LENGTH('9999'), '9999' as Data
    UNION ALL
    SELECT 'A2 varchar(36)', OCTET_LENGTH('Aเจ้าหน้าที่ A'), 'Aเจ้าหน้าที่ A'
    UNION ALL
    SELECT 'A3 char(1)', OCTET_LENGTH('X'), 'X'
    UNION ALL
    SELECT 'A4 char(4)', OCTET_LENGTH('999 '), '999 '
    UNION ALL
    SELECT 'A5 char(4)', OCTET_LENGTH('XXXX'), 'XXXX'
    UNION ALL
    SELECT 'A6 char(4)', OCTET_LENGTH('2022'), '2022'
    UNION ALL
    SELECT 'A7 char(4)', OCTET_LENGTH('1214'), '1214';
    --     DataType    | OCTET_LENGTH |    Data
    -- ----------------+--------------+------------
    --  A1 char(4)     |            4 | 9999
    --  A2 varchar(36) |           36 | Aเจ้าหน้าที่ A
    --  A3 char(1)     |            1 | X
    --  A4 char(4)     |            4 | 999
    --  A5 char(4)     |            4 | XXXX
    --  A6 char(4)     |            4 | 2022
    --  A7 char(4)     |            4 | 1214
    -- (7 rows)
    
    drop table if exists ABC cascade;
    create table ABC (
    A1 char(4),
    A2 varchar(36),
    A3 char(1),
    A4 char(4),
    A5 char(4),
    A6 char(4),
    A7 char(4));
    
    COPY ABC( A1,A2,A3,A4,A5,A6,A7)
    from STDIN
    FIXEDWIDTH colsizes(4,36,1,4,4,4,4)
    ENFORCELENGTH
    ABORT ON ERROR;
    9999Aเจ้าหน้าที่ AX999 XXXX20221214
    1234Aเจ้าหน้าที่ A11234123412341234
    \.
    
    SELECT * FROM ABC;
    --   A1  |     A2     | A3 |  A4  |  A5  |  A6  |  A7
    -- ------+------------+----+------+------+------+------
    --  1234 | Aเจ้าหน้าที่ A | 1  | 1234 | 1234 | 1234 | 1234
    --  9999 | Aเจ้าหน้าที่ A | X  | 999  | XXXX | 2022 | 1214
    -- (2 rows)
    

    Another option would be to try using a delimited format for the data instead of fixed width. This would allow you to specify the data type for each column without needing to worry about the exact size of the data in the file. For example, you could use the following COPY command to load the data using a delimited format:

    COPY public.ABC ( A1,A2,A3,A4,A5,A6,A7)
    FROM local '/data/sampledata.TXT'
    DELIMITER ','
    REJECTED DATA AS TABLE public.ABC_RJCT
    ENFORCELENGTH;

    In this case, you would need to make sure that the data in the file is properly delimited using commas (or whatever delimiter you choose). You would also need to adjust the data types for each column to match the data in the file.

    I hope this helps!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file