Issue with ExternalSource

Scenario 1 - remote file:

A remote server has a file with 15K rows.

A shell script connects to this remote server over ssh and cats the file to STDOUT.

Vertica has an external table defined like this (briefly, no details):

create external table ...
( columns matching the columns in the remote file )
as copy with source public.ExternalSource(cmd='./shell_to_cat_file')

 

Now, if I run select * from this external table, I get a random number of rows from the remote file: 267 rows on one run, 464 on the next, and so on. I never get the full set of 15K rows.

 

If I create an external table that dumps the output from the remote server to a file, like this:

create external table ...
( columns matching the columns in the remote file )
as copy with source public.ExternalSource(cmd='./shell_to_cat_file > some_file.txt')

and then check how many rows were unloaded from the remote file to the local one by the above external table, the file "some_file.txt" has all 15K rows from the remote file.
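
The same kind of check can be done without Vertica in the loop. Below is a minimal sketch, assuming Python 3 is available on the node and that the command (for example './shell_to_cat_file') can be run from the current directory; the function names are made up for illustration. It runs the command several times and counts the lines on its STDOUT, which shows whether the command by itself delivers all 15K rows on every run:

```python
#!/usr/bin/env python3
"""Count the lines a command writes to STDOUT, over several runs."""
import subprocess


def count_stdout_lines(cmd):
    """Run cmd through the shell and return the number of lines on its STDOUT."""
    # shell=True so that a command string with pipes or redirection
    # behaves as it would when typed at a shell prompt.
    result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, check=True)
    return len(result.stdout.splitlines())


def count_over_runs(cmd, runs=5):
    """Repeat the count to see whether it is stable from run to run."""
    return [count_stdout_lines(cmd) for _ in range(runs)]


# Example call (the command name is taken from the post above):
#   print(count_over_runs('./shell_to_cat_file'))
```

If every run reports the same count here while select * still returns a varying number of rows, the difference is on the ExternalSource side rather than in the script.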

 

Scenario 2 - local file:

If the file is local, not remote, with 15K rows, and I create an external table like this:

create external table ...
( columns matching the columns in the file )
as copy with source public.ExternalSource(cmd='cat some_file.txt')

the result is also random: it never outputs all 15K rows.

 

If I create an external table that reads the data from the same local file without ExternalSource:

create external table ...
( columns matching the columns in the file )
as copy FROM 'some_file.txt';

the result of select * is accurate: all 15K rows are printed.

 

The question is why ExternalSource works differently in the above scenarios. This seems like a bug to me.

Otherwise, what am I missing?

 

BTW: Vertica 7.1, 1 node, licensed

 

Thanks,

Oleg

 

Comments

  • If I create a Python script to output the file content, it works just fine:

    Python snippet:

    #!/usr/bin/python

    fname = '/var/tmp/new.txt'

    # Read the whole file and write it to STDOUT (Python 2 print statement)
    with open(fname, 'r') as fin:
        print fin.read()

    Table definition:

    create external table ..
    ….
    AS copy
    with source ExternalSource(cmd='/var/tmp/print_file.py')
    ;

     

    So I guess it can be solved with a workaround like this.

     

    ~ Oleg
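
A possible variant of the script above, as a sketch only: the snippet reads the whole file into memory, and the print statement appends one extra newline after the file content. The version below assumes Python 3 and copies the file to STDOUT in chunks, byte for byte. The function name stream_file and the chunk size are illustrative choices, and whether this behaves any differently under ExternalSource has not been tested here.

```python
#!/usr/bin/env python3
"""Copy a file to STDOUT in chunks, without adding or changing any bytes."""
import shutil
import sys


def stream_file(path, out=None, chunk_size=64 * 1024):
    """Copy the file at path to out (default: binary STDOUT) chunk by chunk."""
    if out is None:
        out = sys.stdout.buffer
    with open(path, "rb") as fin:
        # copyfileobj reads and writes chunk_size bytes at a time
        shutil.copyfileobj(fin, out, chunk_size)
    # Make sure everything is pushed out before the process exits
    out.flush()
```

To use it the same way as print_file.py, the script would end with a call such as stream_file('/var/tmp/new.txt').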

     
