Transactions 101

Wednesday, May 30, 2001, by Michael Trachtenberg

A transaction is the smallest unit of work SQL Server will do. Learn about these basic DB building blocks and how to manage them in this quick course in transaction fundamentals.

Introduction

SQL Server uses many mechanisms to ensure the integrity of its databases. One of these mechanisms is the concept of a transaction.

A transaction is the smallest unit of work SQL Server will do. One transaction can involve a single operation on one row of one table, or it can involve hundreds of operations on multiple tables in multiple databases on multiple servers. The developer usually controls when the transaction begins and ends. The server ensures that either the entire transaction runs to completion or it is rolled back to the point where it began. There's no middle ground.

All T-SQL statements must run within a transaction; there are no provisions for doing otherwise.

A unit of work is a transaction if it meets the ACID test. ACID is an acronym for atomic, consistent, isolated, and durable.

Atomic

The entire transaction has to run or the server will restore the data to the point before the transaction started. For instance, if an update statement modifies 100 rows and one fails because of a check constraint, every row will be restored to the point where it was before the update started.
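
For instance, here's a minimal sketch (the table, constraint, and values are hypothetical):

create table atomic_demo (qty int not null check (qty >= 0))
go

insert atomic_demo values (5)
insert atomic_demo values (10)

-- One statement, one transaction. The first row would become -2 and
-- violate the check constraint, so the server rolls the whole
-- statement back and neither row changes.
update atomic_demo set qty = qty - 7

select qty from atomic_demo    -- still returns 5 and 10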

It's important that the developer both understands how the server works and does his or her part. Potential problem areas are:

  1. The developer explicitly begins a transaction but, due to a typing or logic error, never ends it. The server will wait for the transaction to complete until it's restarted. Most likely, the user will tire and break the connection, or the DBA will break it for them. Either way, the server will roll back the transaction and no work will be done (see the sketch after this list).
  2. The connection is set for implicit transactions, the developer issues a statement that begins a transaction, and neither commits nor rolls the transaction back. The outcome will be the same as for explicit transactions, though perhaps harder to debug.
  3. The developer mixes T-SQL and database API calls. If the developer uses ADO's BeginTrans to begin a transaction, they shouldn't use a T-SQL COMMIT TRANSACTION to commit it. BOL says this may produce undefined results.
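
Here's a sketch of the first pitfall, assuming the pubs sample database:

begin tran
update authors set au_lname = au_lname   -- some work
-- The COMMIT was forgotten. The transaction stays open and holds locks:
select @@trancount        -- returns 1

-- From another connection, DBCC OPENTRAN reports the oldest
-- active transaction in the database:
dbcc opentran('pubs')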

Consistent

Consistency means the data never appears to other transactions to be in a transitory state. They either see the data as it was before the transaction began or as it is after it was committed. Consistency involves a tradeoff with concurrency--the ability of multiple users to share data without impacting one another. In practice, maintaining perfect consistency imposes an unacceptable hardship on other users, and SQL Server won't enforce it without instructions from the developer. Read-only databases combine performance and consistency, and as such are a potential workaround.

Isolated

SQL Server's design permits multiple users to work with the same data simultaneously. A lack of controls to isolate transactions from one another introduces three potential problems: dirty reads, nonrepeatable reads, and phantoms.

Dirty reads are reads of another transaction's uncommitted data modifications. If the other transaction is rolled back, the first transaction has effectively read data that never existed. SQL Server won't permit this without instructions from the developer.
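
Here's a sketch with two connections (swynk_table and its col1 column are hypothetical):

-- Connection 1: change a row but don't commit.
begin tran
update swynk_table set col1 = 'changed'

-- Connection 2: permit dirty reads for this connection.
set transaction isolation level read uncommitted
select * from swynk_table    -- sees the uncommitted 'changed' value

-- Connection 1: change your mind. Connection 2 has now read
-- data that never existed.
rollback tran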

Nonrepeatable reads are instances where a transaction reads rows, another transaction modifies or deletes those rows and commits its changes, and the first transaction re-reads the rows. SQL Server will permit this if not told otherwise. Affected applications should be designed to detect and handle nonrepeatable reads. Timestamp, datetime, and smalldatetime columns can be used to detect changed rows, the latter two in conjunction with triggers.
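
Here's a sketch of the timestamp approach (the invoice table is hypothetical): read the row's timestamp along with its data, then make the update conditional on the timestamp being unchanged.

create table invoice
(
    invoice_id int not null primary key,
    amount     money not null,
    ts         timestamp not null   -- changes automatically on every update
)
insert invoice (invoice_id, amount) values (1, 50)

declare @old_ts binary(8)
select @old_ts = ts from invoice where invoice_id = 1

-- ...other transactions may modify the row in the meantime...

update invoice
set amount = amount + 100
where invoice_id = 1 and ts = @old_ts   -- succeeds only if the row is unchanged

if @@rowcount = 0
    print 'Row was changed by another transaction -- re-read and retry.'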

Phantoms are instances where a transaction reads rows satisfying a search condition, another transaction inserts and commits rows that satisfy the search condition, and the first transaction rereads using the same search condition and gets a different set of rows. SQL Server permits this if not instructed otherwise.

SQL Server follows the ANSI specifications for isolation and allows four distinct levels. They are implemented using locks of increasing scope and duration.

Transaction Isolation Level   Allow Dirty Reads   Allow Nonrepeatable Reads   Allow Phantoms
READ UNCOMMITTED              Yes                 Yes                         Yes
READ COMMITTED                No                  Yes                         Yes
REPEATABLE READ               No                  No                          Yes (7.0) / No (6.x)
SERIALIZABLE                  No                  No                          No

The levels of interest are READ COMMITTED, which is SQL Server's default level, and SERIALIZABLE, which can turn a multi-user system into a single-user system for the duration of the transaction.

There are three ways to control the isolation level: T-SQL, database API calls, and locking hints.

/* Either of the following is set prior to beginning
** the transaction and remains in effect until reset or the
** connection is closed.
*/

-- T-SQL
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

-- ADO
conC.IsolationLevel = adXactSerializable

/*
** Setting a locking hint gives you more
** control than a connection-level setting.
** It also overrides a connection-level
** setting. See Locking Hints in BOL for
** details.
*/

select *
from swynk_table with (HOLDLOCK)

Durable

Durability guarantees that a committed transaction will be permanently written to the database, even if the system fails during the process. It also guarantees that an uncommitted transaction will be completely rolled back. This is the theory. In practice, there are a few things to note, and some gotchas.

SQL Server allows you to restore to a specific point in time, assuming you have backups. You can restore the database to the state it was in before you inadvertently committed the deletion of a million rows. There are several products for SQL 6.5, and at least one for 7.0 and 2000, that go beyond SQL Server's recovery capability. Nothing is permanent.

Your hardware--particularly the disk subsystem--and file system settings can break durability. Concerning hardware, keep in mind the old motorcycle helmet advertisement--"If you have a $20 head, buy a $20 helmet." The system, especially the disk controller(s), should be outfitted for a database server. Write-caching controllers are problematic because they lie to SQL Server that data has been written to the disks before it physically has. If the system fails, you're relying on the controller to pick up exactly where it left off. See Q234656.

Use NTFS for the file system but don't enable NTFS compression. Compression isn't supported, kills performance, and may corrupt your data. See Q231347.

Transaction Modes

SQL Server supports three transaction modes: autocommit, explicit, and implicit. It also supports distributed transactions, which can be transactions that span multiple servers or multiple databases on one server.

The transaction mode, like isolation, is controlled on a per-connection basis.

Autocommit

Autocommit is the default mode for SQL Server and its APIs. Each transaction is automatically committed if successful, or rolled back if not. The server remains in autocommit until an explicit or implicit transaction is requested.

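Here's a minimal sketch (the table name is hypothetical):

create table autocommit_demo (i int not null)
go

insert autocommit_demo values (5)
insert autocommit_demo values (10)
insert autocommit_demo values (NULL)   -- fails: the column disallows nulls

select i from autocommit_demo          -- returns 5 and 10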

This creates a table and inserts the values 5 and 10. The third insert fails, but it doesn't affect the preceding two.

If NULL had been mistyped, perhaps as NUL, the batch wouldn't compile and no rows would have been inserted. In 7.0, omitting the GO would move the CREATE TABLE into the bad batch and the table would never have been created (6.x requires a GO or other batch terminator).
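
Here's a sketch of the 7.0 no-GO scenario, using the same hypothetical table:

create table autocommit_demo (i int not null)
-- no GO here, so everything below is in the same batch
insert autocommit_demo values (5)
insert autocommit_demo values (10)
insert autocommit_demo values (NUL)    -- typo: the batch won't compile,
                                       -- so no statement in it runs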

Explicit

With explicit transactions, the developer defines the beginning and end of the transaction, after which the server returns to whatever transaction mode it was in before the explicit transaction.

There are four statements available for explicit transactions, but ultimately, you only need to tell the server where the transaction starts and ends.

BEGIN TRAN[SACTION] [Name]

Starts a local transaction, which can be given a name if desired. Additional transactions can be nested within the initial transaction but the server will ignore any names you give them.

The server keeps track of transactions, nested and otherwise, through the @@TRANCOUNT variable. Each BEGIN TRAN increments @@TRANCOUNT by 1.

The server will convert a local transaction into a distributed transaction under some circumstances--for example, when a distributed query against a linked server is executed inside the transaction.

COMMIT TRAN[SACTION] [Name] / COMMIT [WORK]

Commit marks the end of a transaction; however, it doesn't write the transaction to the database unless it's the end of the outermost transaction. Names can be used for readability but they are ignored by the server.

Each Commit decrements @@TRANCOUNT by 1. The server won't permanently commit changes or free locks until @@TRANCOUNT reaches 0. Once it does, changes can't be undone.
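
For example, watching @@TRANCOUNT through a nested pair:

begin tran outer_tran
select @@trancount      -- 1
begin tran inner_tran   -- the name is ignored
select @@trancount      -- 2
commit tran             -- decrements the count; nothing is written yet
select @@trancount      -- 1
commit tran             -- count reaches 0; the work is now permanent
select @@trancount      -- 0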

SAVE TRAN[SACTION] Name

Save is a marker within a transaction. Its purpose is to allow the server to roll back part of a transaction if necessary. The same name can be used more than once in a transaction but the server will only roll back to the most recent use of the name. Save does not preclude the eventual need to commit or roll back the entire transaction.
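
Here's a sketch, reusing the apinpchg update from the Examples section below (the expected row count is hypothetical):

begin tran
update apinpchg
set batch_code = 'BATCH2432'
where batch_code = 'BATCH2422'

save tran recode_done          -- marker inside the transaction

delete apinpchg
where batch_code = 'BATCH2432'

if @@rowcount <> 20
    rollback tran recode_done  -- undoes only the delete

-- The update is still pending; the transaction as a whole
-- must still be committed or rolled back.
commit tran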

ROLLBACK TRAN[SACTION] [Name] / ROLLBACK [WORK]

Rollback can do one of two things: roll back to a savepoint, or roll back the entire transaction. If the latter, all changes are discarded, @@TRANCOUNT is set to 0, and locks are freed. If rollback is issued within a nested transaction, everything up to and including the outermost transaction is rolled back.

If a transaction name is specified, it must match the name of the outermost transaction or the rollback will be ignored. It's best not to use names if you're not rolling back to a savepoint.
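
For example:

create table ##demo (i int not null)

begin tran
insert ##demo values (1)
begin tran                  -- @@trancount is now 2
insert ##demo values (2)
rollback tran               -- no savepoint name: undoes BOTH inserts
select @@trancount          -- 0
select * from ##demo        -- no rows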

Examples

I frequently wrap my ad hoc DML in explicit transactions to avoid "haste makes waste" and other issues. Specifically, I check the number of rows changed. It's not foolproof, but it's good insurance. You can test anything you can express in T-SQL, and roll it back if it didn't work.

set rowcount 0              -- avoid arbitrary limit
begin tran
    update apinpchg
    set batch_code = 'BATCH2432'
    where batch_code = 'BATCH2422'
    if @@rowcount = 20      -- # of rows affected
        commit tran
    else
        begin
            print 'Failed.' -- can also use raiserror
            rollback tran
        end

Rollback figures prominently in triggers (perhaps another article). A trigger fires each time a user modifies data covered by the trigger. For instance, a glitch in our Accounts Payable application occasionally marks voucher batches as void.

After getting tired of clerks asking me where their work went, I put a trigger on the table involved. The trigger looks for a change in the void_flag column and rolls it back if the change was inappropriate.

create trigger CannotVoidBatchesWithVouchers
on batchctl
for update
as

declare @vcount int,
    @batch varchar(15)

-- Nothing to check if the update affected no rows.
if @@rowcount = 0
    return

-- Act only when the void_flag column was assigned by the update.
if update(void_flag)
begin
    -- Count unposted vouchers belonging to the batches being voided.
    select @vcount = count(a.batch_code)
    from apinpchg a, inserted i
    where a.batch_code = i.batch_ctrl_num and
        i.void_flag = 1

    if (@vcount) > 0
    begin
        -- Report one offending batch and undo the entire update.
        select @batch = batch_ctrl_num from inserted
        raiserror('System attempted to void %s, which has at least one unposted voucher/DM.', 16, 1, @batch)
        rollback transaction
    end
end

The ROLLBACK rolls back all modifications done by the transaction up to that point, including any done by the trigger. BOL provides additional information and explicit transaction examples.

Implicit

Implicit transactions start automatically, like autocommit transactions, but must be committed or rolled back, like explicit ones. Specifically, if there isn't an existing transaction and the server executes any of the usual DML/DDL statements, the server begins a new transaction and doesn't end it until it encounters either a COMMIT or ROLLBACK.

Implicit transaction mode can be initiated two ways: with SET IMPLICIT_TRANSACTIONS ON or SET ANSI_DEFAULTS ON. The latter statement includes implicit transactions among the options it sets.

Start two connections. Execute the code in Connection 1 first, then Connection 2. Connection 2 will time out after twenty seconds, assuming you don't touch Connection 1 in the meantime.

-- Connection 1
set implicit_transactions on
create table ##b(i int not null)
go

insert ##b values(100)
insert ##b values(200)

select @@trancount

-- Connection 2
set lock_timeout 20000
go

select * from ##b

Connection 2 times out because the transaction in Connection 1 is still running--it has never been committed or rolled back--and Table ##b is locked. Notice that @@TRANCOUNT is 1.

SET LOCK_TIMEOUT, which is measured in milliseconds, is a useful statement because it prevents a connection from waiting forever for locks to be freed. The alternative is to have the operator break the connection. You can reset the timeout to infinite by using -1.

Keep the same two connections, but this time run the code in Connection 2 first. Then, run Connection 1. Switch back to Connection 2.

-- Connection 2 (run first)
set lock_timeout -1
go

-- Connection 1
rollback tran

-- Connection 2 (run last)
select * from ##b

You should get the message, "Invalid object name '##b'." The transaction in Connection 1 ends, its locks are freed, and Connection 2 is looking for a table that, as far as it's concerned, never existed.

Transaction Performance Considerations

  1. Use as low a level of isolation as possible.
  2. Never allow user input during a transaction.
  3. Commit changes as quickly as possible.
  4. See if large transactions can be broken up into smaller pieces that can be committed more often. Also, determine if it's possible to use SET ROWCOUNT, in conjunction with loops, to break a transaction that modifies many rows into several transactions that modify fewer rows (see the sketch after this list).
  5. Don't use implicit transactions unless you regularly work with them, perhaps in another DBMS.
  6. Put @@TRANCOUNT in your code to clarify nesting issues and ensure that transactions are being committed or rolled back.
  7. Use the server's tools to identify issues caused by open transactions. These tools include the Enterprise Manager, Profiler, sp_lock, DBCC OPENTRAN, and NT Performance Monitor. There are numerous scripts and stored procedures, some from Microsoft, that augment the data these tools provide.
  8. BOL has extensive information on optimizing transactions.
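
Here's a sketch of point 4 (the table and criteria are hypothetical):

set rowcount 1000              -- each delete affects at most 1,000 rows

while 1 = 1
begin
    -- In autocommit mode, each delete is its own transaction,
    -- so locks are held briefly and the log gets a chance to
    -- clear between passes.
    delete history_table where posted_date < '19990101'
    if @@rowcount = 0
        break
end

set rowcount 0                 -- remove the limit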

Summary

Transactions are what SQL Server is all about. The server does an immense amount of work behind the scenes to support multiple users, but ultimate responsibility for keeping everyone running smoothly lies with the developer.
