Exclusive locks on deletes/updates

Hi, can you elaborate on why Vertica has to take exclusive locks when a DELETE or UPDATE takes place, given that Vertica creates delete containers to keep pointers to the old data, and the data files themselves are never modified? Thanks

Comments

  • Hi Eli,

    Well, what if one query DELETEs a row, and then another comes along and UPDATEs the same row before the first has committed? For that matter, what if query 1 deletes a record in table A and updates a record in table B, and then query 2 updates that record in table A and deletes that record in table B, each within its own transaction? One way or another, the queries end up in a state that's impossible in a consistent system. (A sketch of the first interleaving appears below this thread.)

    PostgreSQL actually has a nice generic discussion of the issues that come into play here in their documentation: http://www.postgresql.org/docs/9.1/static/transaction-iso.html (best read alongside our own locking docs).

    The locking system in many OLTP databases is somewhat different from ours; they get much better throughput on transactions that modify data, but in exchange they spend many more system resources just maintaining their locking data structures. As always, lots of design trade-offs and lots of room for cleverness and new ideas.

    Adam
  • Hi Adam,

    I understand that Vertica is not an OLTP database, and I am very familiar with how locks are taken in other vendors' databases (as they publish their high-level algorithms). However, given that the data files are never changed, locking the entire table seems strange and unreasonable (some databases take block-level locks in some cases). That is why knowing the concept behind it is very important; the documentation just mentions the fact that an exclusive lock is taken. If this is covered in one of the C-Store papers, please point me to a reference.

    Thanks
  • Hi Eli,

    Hm... so, unfortunately, a lot of the reasoning here has not been published (neither by us nor by other vendors), so there's a limit to how much internal implementation detail I can go into in a public forum post. I can talk about one big reason, though: doing lots of DELETEs and UPDATEs in Vertica is terrible for performance. And I don't mean because of locking issues.

    As you correctly noted, each DELETE and UPDATE statement adds a new delete vector. Every time we query a table, we have to check every delete vector to figure out which records no longer exist from the perspective of that transaction. When a node goes down, we have to process DVs to figure out what records that node missed, etc.; some of the computations there can become, perhaps counterintuitively, expensive. And we have to integrate DVs into ROS containers over time; this imposes a steady-state load on your system, since removing a bunch of records from a sorted, compressed file means rewriting the whole file.

    So, yes, you will see lock contention; but for most users that's more of an early-warning sign. In our experience, the follow-on effects as DVs accumulate tend to be much more problematic for users. Rather than using DELETE and UPDATE, we strongly recommend using our partition-management tools, temporary tables, creative INSERT ... SELECT statements, etc., as discussed in other posts (one such pattern is sketched below this thread).

    Also, regarding the locking models of other databases: careful -- they tell you what you need to know to maintain the system, not necessarily to correctly understand the "why" behind it. For example, why does it matter that we use multiple files (taking advantage of the filesystem) and other databases use single big files? What is a filesystem, anyway, other than a big fixed-size file with a clever index structure? I assure you, at least some of the big database systems are able to do more than they discuss in their documentation :-)

    On that note: it's not about physical conflicts between blocks or files. We and they can both get around that easily, multiple files or not. (Some systems provide that level of robustness entirely without transactional locks, in fact.) It's about correctly implementing the desired semantics regarding concurrent interleaved transactions. Broadly speaking (and definitely overgeneralizing), there's a tradeoff: the more CPU cycles you're willing to spend untangling transactions and determining when they're logically independent, the more concurrency you can get. This is what block-level locking is really for -- a block is a more-granular logical unit than a table. Because (among other reasons) having enough of an UPDATE or DELETE workload in Vertica that concurrency matters is sure to get you into trouble in other ways sooner rather than later, we've come down on the "save those CPU cycles, run queries faster" side of the equation.

    If you want some slightly-more-specific info, we do discuss locking briefly here, though I don't think we answer your particular questions: http://vldb.org/pvldb/vol5/p1790_andrewlamb_vldb2012.pdf

    If this is all old news to you, and you think you could do better, well then, maybe you're right. Vertica is a solid production-ready platform, but it's not finished; there's lots more to do, lots more to research, many more future versions to come. (This is true for every DB platform out there; the database world is an exciting place to be right now.) If you think you can do it, we are hiring :-)

    Adam
  • Hi Adam,

    The fact that delete vectors impact query performance is known, but can you point me to where the exclusive locks are documented? You mention that DV integration into the ROS is expensive, but mergeout probably does the same kind of work for regular records (newly inserted records) that need to be merged into an existing compressed file, so again, please point me to documentation of the exclusive locks on deletes. I read the attached PDF several times; it's a good article, but its coverage of locking is very minimal.

    Vertica is one of the greatest databases I have worked with, and I have worked with databases for many years. I work for a software company, and I know and understand why companies do not publish their internals. However, I do think that publishing the locking model (especially the exclusive locks) is something the community needs to know more deeply.

    Anyway, thanks for doing your best under the circumstances. As usual, I think your answers are very useful, even though I don't really get answers.

    Thanks,
    Eli
  • Hi Eli,

    I'm glad you appreciate the answers! Regarding the exclusive lock on DELETE and DVs: locking is really an implementation detail (though maybe an important implementation detail). What really matters to most people is performance.

    In order to see a significant performance impact due to lock contention, even on an exclusive lock, your system must be spending a lot of time doing UPDATEs or DELETEs. If your system is spending a lot of time doing UPDATEs or DELETEs, it must be creating a large amount of delete-vector data as well. In short, if you're doing enough UPDATEs/DELETEs that the resulting lock contention matters to you, you must also have a lot of delete vectors. In our experience, the net slowdown due to all these delete vectors is typically greater than the slowdown due to lock contention. (Of course, this depends on the exact workload; the monitoring queries below this thread show one way to measure both.)

    So, say we were to use a perfect locking solution: one with zero overhead that totally eliminated all exclusive locks on UPDATE/DELETE. Even if we had such a solution, it still wouldn't improve overall UPDATE or DELETE performance that much for a lot of our users, because the real slowdown is all the delete vectors being created by this workload.

    As evidence: look at our operations to drop/move/etc. partitions, to create temporary tables, and so on. Many of these also take exclusive locks. Yet we still encourage customers to use them, and customers are typically much happier with them than with UPDATE/DELETE. Why? Various reasons, including no delete vectors.

    Adam
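
To make the interleaving from the first reply concrete, here is a minimal sketch using a hypothetical `orders` table (the table and column names are illustrative only); time runs top to bottom across two sessions:

```sql
-- Session 1:
BEGIN;
DELETE FROM orders WHERE order_id = 42;

-- Session 2, before session 1 commits:
BEGIN;
UPDATE orders SET status = 'shipped' WHERE order_id = 42;
-- Under a table-level exclusive lock, this statement simply waits for
-- session 1 to finish. Without such a lock, the system would have to
-- define what updating a concurrently deleted row even means.

-- Session 1:
COMMIT;
-- Session 2 now proceeds against the committed state, rather than
-- acting on a row that no longer exists.
```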
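
And one version of the DELETE-avoiding patterns recommended above, sketched against a hypothetical `sales` table partitioned by year. The table and column names are made up for illustration; `DROP_PARTITION` is the Vertica meta-function of this era (check your version's documentation), and the swap itself uses only ordinary CREATE TABLE ... AS and RENAME:

```sql
-- Bulk deletes that align with partitions: drop the partition instead.
-- This takes its own (brief) exclusive lock, but creates no delete vectors.
SELECT DROP_PARTITION('public.sales', '2011');

-- Deletes that don't align with partitions: rebuild and swap.
CREATE TABLE public.sales_new AS
    SELECT * FROM public.sales
    WHERE  customer_id <> 1001;   -- keep everything except the target rows

ALTER TABLE public.sales     RENAME TO sales_old;
ALTER TABLE public.sales_new RENAME TO sales;
DROP TABLE public.sales_old;
```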
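
Finally, to put numbers behind the "delete vectors hurt more than locks" argument in the last reply, the queries below read Vertica's V_MONITOR system tables (column names are as we recall them for releases of this era; verify against your version's documentation):

```sql
-- Delete-vector volume per projection: a rough measure of how much
-- UPDATE/DELETE debt has accumulated.
SELECT projection_name,
       SUM(deleted_row_count) AS deleted_rows,
       SUM(used_bytes)        AS dv_bytes
FROM   v_monitor.delete_vectors
GROUP  BY projection_name
ORDER  BY dv_bytes DESC;

-- Current lock holders and waiters, to see actual contention.
SELECT object_name, lock_mode, lock_scope, transaction_description
FROM   v_monitor.locks;
```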
