there is a folder called pgsql_tmp in the catalog/tmp folder. I would like purge old files from this folder and I would like to know if its safe to do so.
Interesting -- usually that directory contains small files and is automatically purged.
It is not safe to purge this directory while the database is running. I suspect it's ok to purge it once the node in question has been shut down, but I'm not certain because I would expect it to be self-cleaning in most scenarios so am not immediately sure why the files are still there.
If you're seeing lots of data accumulate here, that's certainly interesting. What version of Vertica are you running? What sorts of files (filenames, etc) are you seeing in that directory?
I am seeing files with names like pgsql_tmp5148.101191. Seems like this folder has never been purged since I am seeing data since November of last year. It has about 13GB worth of data in there. If shutting down the DB is all that is required to purge this folder then I can get that done. We are running version 6.0.1. We only have issues on the first node.
<code> EE_SPILLGROUP_BY_SPILLED▒▒▒DBD_ON_QUERYINSERT INTO STAGE_FACT_INVENTORY_ITEMS SELECT TO_TIMESTAMP( UPDATED_TIMESTAMP,'YYYY-MM-DD HH24:MI:SS') SYSTEM_TIMESTAMP, TO_NUMBER( SUBSTR( UPDATED_TIMESTAMP,1,4)) * 10000 + TO_NUMBER( SUBSTR( UPDATED_TIMESTAMP,6,2)) * 100 + TO_NUMBER( SUBSTR( UPDATED_TIMESTAMP,9,2)) DATE_ID, TO_NUMBER( SUBSTR( UPDATED_TIMESTAMP, 12,2) ) HOUR_ID, USER_ID, A.GAME_ID, B.INVENTORY_ITEM_ID, A.GAMEDB_ITEM_ID, A.ITEM_TYPE_ID, A.QTY FROM ( SELECT * FROM TMP_INVENTORY_ITEMS WHERE ITEM_TYPE_ID = 7 OR (USER_ID, GAME_ID, GAMEDB_ITEM_ID, ITEM_TYPE_ID, QTY) NOT IN ( SELECT USER_ID, GAME_ID, GAMEDB_ITEM_ID, ITEM_TYPE_ID, QTY FROM FACT_INVENTORY_ITEMS WHERE (UPDATED_TIMESTAMP, USER_ID, GAME_ID, GAMEDB_ITEM_ID, ITEM_TYPE_ID) IN (SELECT MAX(UPDATED_TIMESTAMP), USER_ID, GAME_ID, GAMEDB_ITEM_ID, ITEM_TYPE_ID FROM FACT_INVENTORY_ITEMS GROUP BY USER_ID, GAME_ID, GAMEDB_ITEM_ID, ITEM_TYPE_ID ) ) ) A LEFT JOIN DIM_INVENTORY_ITEMS B ON A.GAMEDB_ITEM_ID = B.GAMEDB_ITEM_ID AND A.ITEM_TYPE_ID = B.ITEM_TYPE_ID AND A.GAME_ID = B.GAME_ID+consider query-specific design on querHIGH▒▒▒ </code>
Actually, I should correct myself. I am seeing data since November 2012 until December 2013. Then there is no data until April 9th and then I am seeing files accumulate again. Is it safe for me to delete files from last year without stopping the DB?
Hm... Looking around some more: Apparently there's a known issue about this in Vertica 6.0; certain types of statements will leave temp files behind. This was fixed in Vertica 6.1SP1.
If the statement corresponding to a particular temp file has completed, you can delete the corresponding file. I assume you have no long-running queries that started in December 2013 and are still going? In that case, it should be safe to delete those files.
It's not a common / well-tested operation. If you want to be conservative, take the DB down anyway; or do one node at a time, in which case the worst case (as long as your database has K-safety of at least 1) would be that the one node goes down and has to recover from its buddy.
If you're particularly concerned about this, and you have an Enterprise license, you're welcome to file a Support case at http://my.vertica.com/ . They can give you a more-specific statement about exactly what is safe/supported/etc.
Comments
Interesting -- usually that directory contains small files and is automatically purged.
It is not safe to purge this directory while the database is running. I suspect it's ok to purge it once the node in question has been shut down, but I'm not certain because I would expect it to be self-cleaning in most scenarios so am not immediately sure why the files are still there.
If you're seeing lots of data accumulate here, that's certainly interesting. What version of Vertica are you running? What sorts of files (filenames, etc) are you seeing in that directory?
Thanks,
Adam
<code> EE_SPILLGROUP_BY_SPILLED▒▒▒DBD_ON_QUERYINSERT INTO STAGE_FACT_INVENTORY_ITEMS SELECT TO_TIMESTAMP( UPDATED_TIMESTAMP,'YYYY-MM-DD HH24:MI:SS') SYSTEM_TIMESTAMP, TO_NUMBER( SUBSTR( UPDATED_TIMESTAMP,1,4)) * 10000 + TO_NUMBER( SUBSTR( UPDATED_TIMESTAMP,6,2)) * 100 + TO_NUMBER( SUBSTR( UPDATED_TIMESTAMP,9,2)) DATE_ID, TO_NUMBER( SUBSTR( UPDATED_TIMESTAMP, 12,2) ) HOUR_ID, USER_ID, A.GAME_ID, B.INVENTORY_ITEM_ID, A.GAMEDB_ITEM_ID, A.ITEM_TYPE_ID, A.QTY FROM ( SELECT * FROM TMP_INVENTORY_ITEMS WHERE ITEM_TYPE_ID = 7 OR (USER_ID, GAME_ID, GAMEDB_ITEM_ID, ITEM_TYPE_ID, QTY) NOT IN ( SELECT USER_ID, GAME_ID, GAMEDB_ITEM_ID, ITEM_TYPE_ID, QTY FROM FACT_INVENTORY_ITEMS WHERE (UPDATED_TIMESTAMP, USER_ID, GAME_ID, GAMEDB_ITEM_ID, ITEM_TYPE_ID) IN (SELECT MAX(UPDATED_TIMESTAMP), USER_ID, GAME_ID, GAMEDB_ITEM_ID, ITEM_TYPE_ID FROM FACT_INVENTORY_ITEMS GROUP BY USER_ID, GAME_ID, GAMEDB_ITEM_ID, ITEM_TYPE_ID ) ) ) A LEFT JOIN DIM_INVENTORY_ITEMS B ON A.GAMEDB_ITEM_ID = B.GAMEDB_ITEM_ID AND A.ITEM_TYPE_ID = B.ITEM_TYPE_ID AND A.GAME_ID = B.GAME_ID+consider query-specific design on querHIGH▒▒▒
</code>
Hm... Looking around some more: Apparently there's a known issue about this in Vertica 6.0; certain types of statements will leave temp files behind. This was fixed in Vertica 6.1SP1.
If the statement corresponding to a particular temp file has completed, you can delete the corresponding file. I assume you have no long-running queries that started in December 2013 and are still going? In that case, it should be safe to delete those files.
Thanks,
Adam
It's not a common / well-tested operation. If you want to be conservative, take the DB down anyway; or do one node at a time, in which case the worst case (as long as your database has K-safety of at least 1) would be that the one node goes down and has to recover from its buddy.
If you're particularly concerned about this, and you have an Enterprise license, you're welcome to file a Support case at http://my.vertica.com/ . They can give you a more-specific statement about exactly what is safe/supported/etc.