Querying from a snapshot

Hi, I am trying to create a snapshot and query only on that snapshot. Even if there are further reloads of a table, truncates, updates or deletes to the table, I should get the same results as it was when I created the snapshot. I am not convinced that the below approach is what I should be taking as I was getting confusing results when I do operations between a) and b). a) SELECT DATABASE_SNAPSHOT('mysnapshot',true); b) at time 'timestamp1' select count(*) from schema.table1; Questions 1) Is there a better option than b) to query from 'mysnapshot'. 2) If b) is the right approach to take how will I find timestamp1 which is the timestamp at which the snapshot was created ? Thanks Zac

Comments

  • I don't think database_snapshot is what you want. Could you be more specific as to what you want to do? If you need to do several queries protected from changes made by other sessions, then a transaction sounds like a solution, possibly with serializable isolation (see: https://my.vertica.com/docs/CE/6.0.1/HTML/index.htm#2887.htm ). If this doesn't help, please explain your needs more.
  • George, Thanks for the response. Let me specify my need. I dont think that I am looking for a transaction. I need to query a table as of 6/27. I know that in the days following, there will be updates, deletes, inserts, reloads happening to this table. However I will need to get the query result as if no operations happened on this table after 6/27. Do you have a suggestion ? Zac.
  • Hi Zac, In Vertica, the sanpshot is referred differently. However, you can have the queries run at Historical epochs and expect the same result. Then only way to have the data as is to do CREATE TABLE NEW_TABLE AS (SELECT * from TARGET_TABLE) to have a snapshot of data. That's what I would do. Hope that helps. Thanks, Sashi
  • Hi Zac, If you want to use the "AT TIME" syntax, as it sounds like you had guessed, what you'll have to do is to tell Vertica to keep history for a longer time. DATABASE_SNAPSHOT() is actually a separate mechanism; it's used for backups, not for historical queries. The parameters that you'll want to set are documented here: https://my.vertica.com/docs/6.1.x/HTML/index.htm#13912.htm (You'll probably want to read some of the links off of that page to better understand what's going on. Each epoch, in essence, represents a snapshot of the state of the database at that point in time.) Please note, depending on your workload, it can get expensive in a hurry to keep history around for a long time, particularly if you have lots of UPDATEs and DELETEs as it means we can't get rid of the old data values. Adam
  • Adam, Thank you. What you provided is very close to what I need. The current parameters of AdvanceAHMInterval and HistoryRetentionEpochs are the default values. See below dbadmin=> SELECT GET_CONFIG_PARAMETER('AdvanceAHMInterval'); GET_CONFIG_PARAMETER ---------------------- 180 (1 row) dbadmin=> SELECT GET_CONFIG_PARAMETER('HistoryRetentionEpochs');; GET_CONFIG_PARAMETER ---------------------- -1 (1 row) So once I have my table(s) loaded and ready to take a snapshot (or epoch), would it be sufficient if I set the retaining epochs to 2 with a retention period of 10 days (864000 seconds) and then create a new epoch ? SELECT SET_CONFIG_PARAMETER('HistoryRetentionEpochs','2'); SELECT SET_CONFIG_PARAMETER('HistoryRetentionEpochs','864000'); SELECT MAKE_AHM_NOW(); /* Create a new epoch */ I can get the current epoch using SELECT GET_AHM_TIME(); How will I see the available epochs ? Zac.
  • Additionally if I truncate a table and reload it, I dont think at time 'timestamp1' select count(*) from schema.table1; will work. Can you confirm this ? Zac.
  • Hi Zac, Your first post is close but not quite right: MAKE_AHM_NOW() does not create a new epoch. Epochs aren't like snapshots; you don't explicitly create one. This might help: Pick a table that you load data into regularly, and do "SELECT epoch, * FROM [table];". Then look over the results. You should see that there are already epochs stored of this table. Lots of them, in fact. Vertica uses epochs sort of (but not exactly) like transaction ID's. Whenever you load or modify a bunch of data and commit the change, that's one epoch. (This is true whether or not you have any history-tracking enabled.) That's why there's no concept of explicitly "creating a new epoch" -- you're creating new epochs all the time, whether history tracking is enabled or not. Vertica needs this internally for various reasons. But usually it only keeps data values from old epochs (ie., the old values of data that was DELETE'd or UPDATE'd) for a short time; it tries to get rid of them relatively quickly to re-claim disk space. What you're doing with these settings is asking Vertica to keep the history around for longer, so that you can query it. So all you do is to set one of the HISTORY_RETENTION_* parameters, and start running historical queries. Vertica will keep track of all of your changes from that point forward; you can always query as of any moment within the history window. If all you want is one permanent snapshot, then Sashi is correct -- do a CREATE TABLE AS SELECT (...) to explicitly make/keep a copy of the data. For your second question, yes, you are correct, if you truncate a table, all history on the table is lost. (This is of course not true with CREATE TABLE AS SELECT -- the second table is unaffected by changes in the first.) Though we will start tracking history again as soon as you start loading data again. Adam
  • Adam, Thank you. Summarizing the discussion, 1) I can create a second table (or set of tables) and save the contents of the table(s) on which a snapshot should be taken. This is the only option that serves the purpose however it is not scalable after a certain point. 2) I can run historic queries by manipulating the HISTORY_RETENTION_* parameters. However TRUNCATE gets rid of all the historic data. 3) Database snapshots can be created but it is meant for backups and cannot be queried upon. I request a feature in future version, where a snapshot (or epoch) can be taken and historic queries be able to run off of it. Thank you. Zac.
  • Hi Zac, I'd say that that is mostly a correct summary. Though, what is your use case? I'm trying to understand when a snapshot would be more efficient than either of the two above mechanisms. Usually it is not. (Snapshots aren't magic; they're just a different user interface to the same set of problems.) Is your real concern that you want TRUNCATE (or some equivalent) to keep historic data? If you have few or no UPDATEs or DELETEs, then there is no additional cost to using epochs to tracking all past versions between your desired snapshot and the current database. (Vertica tracks all that information anyway.) If you have even a moderate number of UPDATEs or DELETEs, then any snapshot will rapidly become different from the original data, and will become just as expensive/unmaintainable to store as a full copy. (With more CPU overhead to maintain the differences, too.) It's true that you might only be modifying a specific subset of the data. But you can copy and work with tables at the partition level too, etc., so you can make that work as well without an explicit snapshotting mechanism. Regardless, I do see the advantage of your proposed feature from a usability perspective -- everything above is more complex to work with. I'd encourage you to post it to the "Ideas" section of the site. Adam
  • Adam, Thank you. I will post it to the 'Ideas' section. Zac

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file