PURGE_TABLE showed 1TB decrease in table size, but no impact on Disk Storage

Hi all,

Recently we had to perform UPDATE on one of the biggest columns of the biggest table in Vertica.

At the same day we saw a huge increase in disk space usage. Even though we saw that Tuple Mover performed Mergout on the above mentioned table, we manually ran 

SELECT MAKE_AHM_NOW();

And then

SELECT PURGE_TABLE('schema.table');

We saw that table size decreased by almost 1TB (using v_monitor.projection_storage table), though Disk Space remained absolutely the same and keeps showing the same numbers for several hours already (after PURGE_TABLE finished) (using disk_storage, v_monitor.host_resources and v_monitor.storage_usage).

 

Is there any time required to wait for the storage to really free the space?

Should we run/do something else to speed up the process?

 

Thanks

 

(We are currently using Vertica Analytic Database v7.2.3-0)

Comments

  • In Vertica an Update is a delete + Insert so if there was a whole table Update it is like writing the whole table. However, to investigate what is happening, 1- Check if the DV exist in the table's projections ( table delete_vectors ) 2- If there still delete vectors it may be because a. the deletes are after the AHM b. the number DV in the ros containers may not be more than 20% of the rows in the file
  • Sorry, I press enter by mistake and the post was saved before I completed. So if you have too many delete vectors, I recommend you to read the best practices doc in https://community.dev.hpe.com/t5/Vertica-Knowledge-Base/Best-Practices-for-Deleting-Data/ta-p/233531#_Toc434417129 Section Manage delete vectors. if you don't have deletes vectors but you still see the storage used when doing df, you need to wait. When vertica remove files, it marks them to be deleted and there is a process in the background that deletes the physical file, so it may take some time. Hope this helps, Eugenia
  • Thank you, Eugenia for your reply.

     

    I checked delete_vectors table - nothing is there for any projection of the table I was talking about.

    We moved AHM mark to be the latest possible (now) and then performed PURGE_TABLE manually on the table. So this is why I guess there is no delete_vectors. I can see that table decreased in size after PURGE_TABLE finished to run (by about 1TB), though Disk Storage kept being the same.

     

    Any estimations on how long it might take for Vertica to really delete files?

    Thanks

  • Please see my previous comment, seems that we reply together.
  • Yes, we replied together.

    Do you know how long it might take for Vertica to really delete files?

     

    Thanks a lot,

    Elena

  • It should depend on your IO, do you see changes when you do df?
  • Not really. And it's been several hours after PURGE_TABLE finished.

    Anything else we can check to be sure that Vertica really deletes these files?

     

    Elena

  • Quick update on our current state.

    We saw some delete_vectors in some other tables. Though it wasn't a huge amount of space they took, we tried to run SELECT PURGE(); and SELECT PURGE_TABLE (schema.table); for specific tables, but both failed several times with error of server closed session unexpectedly. 

    Disk Space continued to grow (like crazy). Left us with 50GB (0%) by now.

     

    Any thoughts on helpful steps will be highly appreciated.

    Thank you,

    Elena

  • We added another 1.5TB on each node (we have 4 nodes in total).

    Tried to run SELECT PURGE(); It failed again with the same error.

  • Do you create database backups?  If you do, check to make sure that there aren't any stranded snapshots in the Snapshots directory under both your _catalog and _data directories.  This would prevent the data files from actually being removed.

     

      --Sharon

     

     

  • Hi Sharon,

    My name is Yarden, a coleague of Elena.

     

    We don't see any snapshot in the snapshots table. and there's a single(and empty) Snapshots directory under the 'v_DB_NAME_node0001_catalog' directory.

     

    Just to make sure, our data location is '/data/vertica/DB_NAME' and we use hard-link backup which are stored in '/data/DB_NAME_backup'. can this be an issue?

     

    Thank you in advance,

    Yarden

  • Yes - the hard-link backups are your issue.  Hard-link backups create a link to all files, which prevents the files from being removed until the backup is also removed.  So also operations like mergeout will create new data files but the original data files won't be removed from the disk - like what you are seeing where purge doesn't have any effect on disk usage.

     

    Hard-link backups are intended for short-term backups, or as a first step before copying the backup to an external location.

     

      --Sharon

     

  • Sharon,

    This seems to be of a good direction.

     

    I've removed the hard-link backup dir from one node and the space has been released (probably by the OS).

     

    Thank you so much :)

     

    Elena and Yarden

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file