We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


How to add a new column to a table efficiently? — Vertica Forum

How to add a new column to a table efficiently?

yossibmyossibm Vertica Customer

I have a large table (~1B rows, ~50 columns) on a single node (no projections, no partition or segmentation), and data for a new column with the same exact keys and same order.
My use case is to repeatedly add such new columns.
After adding an empty column to the table, I need to write the column data.
I have tried to use Update directly and it took many hours.
I have tried to create a new temp table and then do a Merge. Although writing was efficient, the merge took hours and it also required to run a TupleMover mergeout before doing the next merge (without doing so, the next merge would be extremely slow).
I have tried to Join the temp table and the original table into a new table, but it was also very slow, especially after the original table grew wider.
Is there another more efficient solution - regarding that I have the data sorted and with the same primary keys? maybe a low level solution?
Thanks,
Yossi

Best Answer

Answers

  • marcothesanemarcothesane - Select Field - Administrator

    Hi @yossibm -
    Why don't you try something like:

    ALTER TABLE mytable ADD monthgap INTEGER DEFAULT MONTHS_BETWEEN(end_date,start_date);
    

    , instead of adding the new column first and updating it after?
    I have always regarded this possibility as one of the real perks of a columnar database ...
    Good luck
    Marco

  • yossibmyossibm Vertica Customer

    Hi,
    Thanks for the suggestion.
    Unfortunately I can't use something like that because the data is being generated by another process and cannot be calculated directly from existing columns.
    Also, I haven't found that syntax in the documentation.
    Kind regards,
    Yossi

  • yossibmyossibm Vertica Customer

    Hi,
    Thank you for that comprehensive solution!
    The example works for me, but I noticed that the original "helper" table is referenced by the "mickey" table.
    I have tried in the past to use Select with Join, to avoid the slow merge/update operation,
    but the Join caused the Select runtime to be very unpredictable: some queries even took hours, without any obvious reason.
    Can I trust that method to be more efficient / predictable than using Join ?
    Is it possible to copy the column content as to avoid the reference?
    Thanks and best regards,
    Yossi

  • marcothesanemarcothesane - Select Field - Administrator

    The Join is efficient when both tables are segmented and ordered by the join column.
    If the super-projection is not already designed in the right way (and you can check that by running: SELECT EXPORT_OBJECTS('','mickey',FALSE); ) , then create projections on both sides that support the join optimally:

    CREATE PROJECTION mickey_join_helper
    AS SELECT * FROM mickey
    ORDER BY id
    SEGMENTED BY HASH(id) ALL NODES;
    CREATE PROJECTION helper_join_mickey
    AS SELECT * FROM helper
    ORDER BY id
    SEGMENTED BY HASH(id) ALL NODES;
    

    You might also want to run SELECT ANALYZE_STATISTICS() on both tables before you add that column.

  • yossibmyossibm Vertica Customer

    After preliminary testing it looks like it is working without overhead.
    Thank you @marcothesane !

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file