remove pruned columns dynamically using sdk

phil2 · November 2023

Hello!

There is an ODBCLoader example (https://github.com/vertica/ODBC-Loader) that shows how to avoid fetching unused columns from the source system. But you still have to pass them filled with nulls through the Vertica SDK. Turns out it is very time-consuming too. If you query a single column out of external table with 20 columns it can be 10 times slower than fetching a single column out of external table with only one column.

Can you advise me if there is a way to avoid passing nulls for pruned columns out of UDParser?

phil2 · November 2023

I saw there is a RLETuple type oid. Is it possible to use it to reduce the size of data stream returning from UDParser? A little help will be appritiated. Or maybe you know someone I can reach who knows the answer.

Bryan_H · November 2023

I'm not sure it's avoidable - the number of columns parsed has to match the table DDL, so if 20 columns are defined, 20 are expected, even if 19 are set to NULL. What appears to be the time-consuming step: the query to the remote database, or processing the data after it's retrieved? This might help us narrow down and optimize.

Bryan_H · November 2023

Also, if possible, open a support case, or file an issue with GitHub.

Bryan_H · November 2023

You could also try dblink, which implements ODBC connectivity as a UDTF and allows you to configure SQL specific to a query: https://github.com/vertica/dblink

phil2 · November 2023

Passing data to vertica takes more than a half time. I dont even fill the nulls for unused columns, as they are not being used in a query. So I do exactly the same work whenever there is a one or twenty columns (read one column from the source and fill it in the writer buffer for that column). Moreover, I see that the number of rows available in writer decreases while adding new columns in external table interface. As if there is a limited RAM buffer with rows * columns size. Maybe that is what makes it slow down.

For now I came up with implementing RLE by myself and deduplicating rows instead of removing columns.
I'm looking for a general solution, and dblink with specific query would be too complicated for users.

Another thought is to use arrays. But for an array I cannot get the used columns. It is a complete array or nothing choice.

I also asked a question here https://github.com/vertica/ODBC-Loader/issues/16

We're Moving!

Create My New Community Account Now

remove pruned columns dynamically using sdk

Comments

Leave a Comment