Lock Nest Monster

A year ago, Paul Randal of SQLSkills conducted a survey asking which wait type people would find most troubling to see on a SQL Server instance.  In short, Paul concluded at the time that LCK_M_IX would concern him the most; you can read his discussion of the wait types on his blog.  In reference to LCK_M_IX, he stated that “…the overall user experience is [his] primary concern, hence [his] biggest worry is the wait that could have the biggest effect on someone’s use of the system.”

I’d add my two cents: in my experience, performance issues related to locking can be especially troubling because it is often time-consuming just to determine the source of the locking, let alone resolve it.  Contrast this with, for example, seeing PAGELATCH_XX waits on allocation bitmaps in tempdb (i.e. 2:1:1, 2:1:2, 2:1:3, etc.), where one can create additional tempdb data files and/or enable trace flag 1118 to reduce or eliminate waits on those bitmaps.  In short, the solution is readily known.
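
For the curious, that known fix looks something like the sketch below.  The file name, path, and sizes are hypothetical placeholders, and note that on newer versions of SQL Server the uniform-extent behavior of trace flag 1118 is already the default for tempdb:

--add another tempdb data file; name, path, and sizes are hypothetical
alter database [tempdb]
add file (name = 'tempdev2',
          filename = 'T:\tempdb\tempdev2.ndf',
          size = 1024MB,
          filegrowth = 256MB)

--force uniform extent allocations instance-wide
dbcc traceon (1118, -1)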

With locking, however, the players involved – the queries and the objects they access – must first be discovered.  If the scenario involves readers blocking writers, or vice versa, the problem can most often be alleviated by the DBA pushing a magic button (i.e. enabling read committed snapshot isolation – RCSI), and all is well.  But what if the scenario is writers blocking writers?  The solution will most likely involve changes to application-related code, whether stored procedures or the application itself.  What’s that, you say?  Change…the code?!?  It just got personal!
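
As an aside, the magic button mentioned above is a single statement.  The database name here is a placeholder, and the WITH ROLLBACK IMMEDIATE clause kicks out any other connections so the change can take effect:

--enable read committed snapshot isolation; [YourDatabase] is a placeholder
alter database [YourDatabase]
set read_committed_snapshot on
with rollback immediate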

I’m kidding, of course.  Mostly 🙂  This is one of the age-old DBA/Developer battles I often encounter, and neither side is willing to move.  I say (cue the Beatles), let’s come together and take some time to learn about locking and transaction nesting, so that we can prevent this situation from happening again.  With that in mind, let’s explore a simple example to learn how locks are acquired and held in SQL Server within the context of an explicit transaction that modifies data.

First, let’s create some tables and load some data.  Also, to you the reader, I beg your forgiveness.  I’m new to this blogging thing, so I apologize for the T-SQL code not being color-coded, and for the lack of screenshots.  Hopefully, I can remedy both in the future.  Here we go…

use [tempdb]
go

if exists (select name
           from sys.objects
           where [type] = 'U' and name = 'test')
begin
    drop table dbo.test
end

create table dbo.test
(col1 int,
 col2 char(1)
)

if exists (select name
           from sys.objects
           where [type] = 'U' and name = 'test2')
begin
    drop table dbo.test2
end

create table dbo.test2
(col1 int,
 col2 char(1)
)

insert into dbo.test values (1,'a')
insert into dbo.test values (2,'b')
insert into dbo.test values (3,'c')
insert into dbo.test values (4,'d')
insert into dbo.test values (5,'e')
insert into dbo.test values (6,'f')
insert into dbo.test values (7,'g')

insert into dbo.test2 values (8,'h')
insert into dbo.test2 values (9,'i')
insert into dbo.test2 values (10,'j')
insert into dbo.test2 values (11,'k')
insert into dbo.test2 values (12,'l')
insert into dbo.test2 values (13,'m')

Now, let’s begin some small transactions and examine their locking footprint.  I’m using the following query (in a separate query window) to examine the locks being held.  The credit for the query goes to Jonathan Kehayias:

SELECT CASE
           WHEN tr.resource_type = 'OBJECT'
           THEN OBJECT_NAME(tr.resource_associated_entity_id)
           ELSE OBJECT_NAME(sp.OBJECT_ID)
       END AS [object_name],
       tr.resource_type,
       tr.resource_description,
       tr.request_mode
FROM sys.dm_tran_locks tr
LEFT JOIN sys.partitions sp ON (sp.hobt_id = tr.resource_associated_entity_id)
INNER JOIN sys.indexes si ON (si.OBJECT_ID = sp.OBJECT_ID AND si.index_id = sp.index_id)
WHERE tr.resource_associated_entity_id > 0
  AND tr.resource_database_id = DB_ID()
ORDER BY tr.request_session_id, tr.resource_associated_entity_id

And here are the three test transactions; I ran each one separately, checking the locks held before rolling back:

begin transaction

insert into dbo.test2
select col1, col2
from dbo.test

--in a separate query window, check the locks being held
--rollback the transaction for the later test
rollback transaction

begin transaction

update dbo.test
set col2 = 'z'
where col1 > 5

--in a separate query window, check the locks being held
--rollback the transaction for the later test
rollback transaction

begin transaction

delete
from dbo.test2
where col1 <= 9

--in a separate query window, check the locks being held
--rollback the transaction for the later test
rollback transaction

Here are the locks held by each transaction.  Nothing unusual – an intent exclusive (IX) lock on the data page containing the rows, and then the finer-grained exclusive (X) locks on each row (RID is the resource type because the table is a heap):

Transaction 1:

object_name   resource_type   resource_description   request_mode
-----------   -------------   --------------------   ------------
test2         RID             1:332:8                X
test2         PAGE            1:332                  IX
test2         RID             1:332:9                X
test2         RID             1:332:10               X
test2         RID             1:332:11               X
test2         RID             1:332:12               X
test2         RID             1:332:6                X
test2         RID             1:332:7                X

Transaction 2:

object_name   resource_type   resource_description   request_mode
-----------   -------------   --------------------   ------------
test          RID             1:786:5                X
test          RID             1:786:6                X
test          PAGE            1:786                  IX

Transaction 3:

object_name   resource_type   resource_description   request_mode
-----------   -------------   --------------------   ------------
test2         RID             1:332:0                X
test2         PAGE            1:332                  IX
test2         RID             1:332:1                X

Now, let’s suppose we execute the code below.  We place each DML operation within its own explicit transaction, and all of them are then nested inside one large outer transaction.  Our objective is to incur only the locks required for each DML operation individually, while logically grouping the operations so that if one fails, they all fail.

begin transaction  --outer transaction

begin transaction
begin try
    insert into dbo.test2
    select col1, col2
    from dbo.test

    commit transaction
end try
begin catch
    select ERROR_MESSAGE()
    if @@TRANCOUNT > 0
    begin
        rollback transaction
    end
end catch

begin transaction
begin try
    update dbo.test
    set col2 = 'z'
    where col1 > 5

    commit transaction
end try
begin catch
    select ERROR_MESSAGE()
    if @@TRANCOUNT > 0
    begin
        rollback transaction
    end
end catch

begin transaction
begin try
    delete
    from dbo.test2
    where col1 <= 9

    commit transaction
end try
begin catch
    select ERROR_MESSAGE()
    if @@TRANCOUNT > 0
    begin
        rollback transaction
    end
end catch

When it’s time to commit the outer transaction, what locks are being held?  Let’s check…

object_name   resource_type   resource_description   request_mode
-----------   -------------   --------------------   ------------
test          RID             1:786:5                X
test          RID             1:786:6                X
test          PAGE            1:786                  IX
test2         RID             1:332:9                X
test2         RID             1:332:1                X
test2         RID             1:332:10               X
test2         RID             1:332:11               X
test2         RID             1:332:8                X
test2         RID             1:332:0                X
test2         PAGE            1:332                  IX
test2         RID             1:332:7                X
test2         RID             1:332:12               X
test2         RID             1:332:6                X

Wait a minute (insert various expletives)!  Why are all the locks required for all the DML operations still being held?  Furthermore, what happens if the outer transaction is not committed or rolled back (and it does happen)?  People get angry.  Phones start blowing up.  Torches and pitchforks come out.  And hiding under your desk won’t save you.  Trust me, I’ve tried.

Enough with the dramatics; let’s get to the explanation.  In short, until the outer transaction is committed or rolled back, SQL Server still sees the transaction as open (@@TRANCOUNT will still equal 1).  Locks acquired for DML operations are held for the entirety of the transaction, so SQL Server continues to hold all of them in order to maintain the ACID (atomic, consistent, isolated, and durable) properties of the transaction.  This is important to keep in mind, because you may think that the work at the beginning of the transaction is done and, therefore, the locks for that portion should be gone.  As such, you may be operating under the assumption that the locking footprint of your workload is very small, when in reality it is much larger.
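
Here’s a minimal sketch of that behavior; run it in one window with the lock-monitoring query from earlier in another, and note how the inner commit merely decrements @@TRANCOUNT without releasing a single lock:

begin transaction               --@@TRANCOUNT = 1

begin transaction               --@@TRANCOUNT = 2

update dbo.test
set col2 = 'y'
where col1 = 1

commit transaction              --@@TRANCOUNT back to 1; the X lock on the row is still held

select @@TRANCOUNT              --returns 1; SQL Server still sees one open transaction

commit transaction              --@@TRANCOUNT = 0; the locks are released now

Here are the key takeaways I’d like to point out: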

1) Locks acquired by transactions that modify data are held until the transaction terminates via a commit or rollback.  This is in contrast to reading data (under the default READ COMMITTED isolation level, without RCSI), where the shared (S) locks taken to read the data are released as soon as each row is read (see the short demo after this list).

2) Nesting DML operations within individual transactions, when contained in a larger transaction, will not reduce the number of locks held and, perhaps most importantly, the duration of the locks.
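
To see the first point for yourself, here’s a quick sketch, assuming the dbo.test table from earlier, the default READ COMMITTED isolation level, and no RCSI:

set transaction isolation level read committed

begin transaction

select col1, col2
from dbo.test

--in a separate query window, the lock-monitoring query from earlier
--shows no row-level S locks still held for this session, even though
--the transaction remains open
rollback transaction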

What can be done to remedy or avoid this problem?  Here are a few recommendations:

1) If the workload against the database is OLTP-heavy, keep DML transactions short and sweet.  The fewer locks you hold, and the shorter the time you hold them, the less contention there will be.

2) Make every effort to access tables and modify the requisite data in the same order each time.  This can be accomplished through the use of stored procedures, which ensure that the same process is repeated.  If tables are accessed in an ad-hoc fashion, they’ll be touched and modified in a more random order, leading to more locking and contention (and a greater chance of deadlocks).

3) Be cautious with the practice of nesting explicit transactions.  Leverage TRY/CATCH and SET XACT_ABORT ON to handle run-time errors within transactions, always keep track of @@TRANCOUNT, and commit or roll back accordingly (a template follows below).
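
As a starting point, here’s one hedged template for that pattern; the DML is just an example, and the error handling is reduced to a simple SELECT of the error message:

--with XACT_ABORT on, any run-time error dooms the transaction,
--so the CATCH block can roll it back cleanly
set xact_abort on

begin try
    begin transaction

    update dbo.test
    set col2 = 'z'
    where col1 > 5

    commit transaction
end try
begin catch
    select ERROR_MESSAGE()

    if @@TRANCOUNT > 0
    begin
        rollback transaction
    end
end catch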

If business rules or application requirements dictate that all operations are dependent on one another, you have no choice but to contain them in one larger transaction.  Therefore, as is so often the case (cliché alert!), it depends.  I’m sorry to leave you with that.  However, I merely wanted to make you aware of how SQL Server handles locking within the context of explicit transactions that modify data.  Whatever your use case, these are important things to keep in mind so that you, too, don’t encounter the lock nest monster.

Let’s start the insanity…

I’ve long considered blogging, but have been hesitant up to this point.  There are many within the SQL Server community (Paul Randal, Kimberly Tripp, Joe Sack, and Jonathan Kehayias, to name a few) who are far more knowledgeable than I am.  Given this, I wondered whether there was anything I could contribute.  The answer: probably not much in the way of new technical revelations, but hopefully something in the way of experience.

I’ve been a SQL Server DBA for roughly 10 years and, along the way, have many times been tasked with exonerating SQL Server when something goes wrong with an application.  Is it running slowly?  Database problem.  Logical drive disappeared from the server?  Database problem.  Server not pinging?  Database problem.  Supplying incorrect login credentials (i.e. a bad password)?  Database problem.  While this can be frustrating at times, it can also produce positive results.  It can breed a humility that, ideally, encourages a DBA to validate his or her assertions and assumptions, and instills a desire to help a customer who simply wants his or her problem fixed.

What follows will be posts based on my experiences as the Default Blame Assignment (DBA) for application problems encountered within two very large companies.  I can’t take credit for the acronym; I saw Jonathan Kehayias use it in one of his many informative blog posts.  I hope these posts will help DBAs and developers alike who may be seeking answers to a particular problem.  Along the way, I will also strive to give credit to the individuals who have provided, and continue to provide, the great help that makes me a better DBA.  Here we go…