Options

remove pruned columns dynamically using sdk

Hello!

There is an ODBCLoader example (https://github.com/vertica/ODBC-Loader) that shows how to avoid fetching unused columns from the source system. But you still have to pass them filled with nulls through the Vertica SDK. Turns out it is very time-consuming too. If you query a single column out of external table with 20 columns it can be 10 times slower than fetching a single column out of external table with only one column.

Can you advise me if there is a way to avoid passing nulls for pruned columns out of UDParser?

Comments

  • Options

    I saw there is a RLETuple type oid. Is it possible to use it to reduce the size of data stream returning from UDParser? A little help will be appritiated. Or maybe you know someone I can reach who knows the answer.

  • Options
    Bryan_HBryan_H Vertica Employee Administrator

    I'm not sure it's avoidable - the number of columns parsed has to match the table DDL, so if 20 columns are defined, 20 are expected, even if 19 are set to NULL. What appears to be the time-consuming step: the query to the remote database, or processing the data after it's retrieved? This might help us narrow down and optimize.

  • Options
    Bryan_HBryan_H Vertica Employee Administrator

    Also, if possible, open a support case, or file an issue with GitHub.

  • Options
    Bryan_HBryan_H Vertica Employee Administrator

    You could also try dblink, which implements ODBC connectivity as a UDTF and allows you to configure SQL specific to a query: https://github.com/vertica/dblink

  • Options
    phil2phil2
    edited November 2023

    Passing data to vertica takes more than a half time. I dont even fill the nulls for unused columns, as they are not being used in a query. So I do exactly the same work whenever there is a one or twenty columns (read one column from the source and fill it in the writer buffer for that column). Moreover, I see that the number of rows available in writer decreases while adding new columns in external table interface. As if there is a limited RAM buffer with rows * columns size. Maybe that is what makes it slow down.

    For now I came up with implementing RLE by myself and deduplicating rows instead of removing columns. 
    I'm looking for a general solution, and dblink with specific query would be too complicated for users. 

    Another thought is to use arrays. But for an array I cannot get the used columns. It is a complete array or nothing choice.

    I also asked a question here https://github.com/vertica/ODBC-Loader/issues/16

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file