SQL: Incorrect by Construction

原文

The design of SQL and relational database systems makes it easy to accidentally introduce serious concurrency bugs. Below is a textbook money-transfer procedure in TSQL; Alice wants to send ten dollars to Bob, and to keep Alice from overdrafting her account, we first check that she has enough money. The code looks completely reasonable, but it has several critical bugs. Can you spot them?

DECLARE @balance INT;

SET @balance = (  
    SELECT balance  
    FROM accounts
    WHERE owner = 'alice'
);

IF @balance >= 10  
BEGIN  
    UPDATE accounts  
    SET balance = balance - 10  
    WHERE owner = 'alice';  
  
    UPDATE accounts  
    SET balance = balance + 10  
    WHERE owner = 'bob';  
END

Atomicity

First, if this procedure aborts midway through, we might transfer money from Alice’s account without transferring any to Bob. Alice won’t be happy about that, and we’ve destroyed money in the process. We want all of the transfers to succeed, or none of them; the fix is to wrap the procedure in a transaction:

BEGIN TRANSACTION;

DECLARE @balance INT;

SET @balance = (  
    SELECT balance  
    FROM accounts
    WHERE owner = 'alice'
);

IF @balance >= 10  
BEGIN  
    UPDATE accounts  
    SET balance = balance - 10  
    WHERE owner = 'alice';  
  
    UPDATE accounts  
    SET balance = balance + 10  
    WHERE owner = 'bob';  
END

COMMIT TRANSACTION;

TOCTOU

Are we done yet? Not quite. Suppose Alice fires off two transfers to Bob in parallel, T1 and T2. Let’s map out what happens:

T1: Check Alice’s account balance
T2: Check Alice’s account balance
T1: Withdraw 10 from Alice’s account
T2: Withdraw 10 from Alice’s account
T1: Deposit 10 in Bob’s account
T2: Deposit 10 in Bob’s account

Note how T2 checks the balance before T1 has withdrawn any money from Alice’s account—so when T2 finally withdraws, the account might become overdrafted. This is a Time-of-check to time-of-use (TOCTOU) bug: The precondition changes between when we check it and when we act on it.

The fix is to lock Alice’s account until the transaction completes. We can change the isolation level so locks are acquired automatically, or lock the account row by hand:

BEGIN TRANSACTION;

DECLARE @balance INT;

SET @balance = (  
    SELECT balance
    -- This is roughly equivalent
    -- to SELECT FOR UPDATE
    FROM accounts WITH (UPDLOCK)
    WHERE owner = 'alice'
);

IF @balance >= 10  
BEGIN  
    UPDATE accounts  
    SET balance = balance - 10  
    WHERE owner = 'alice';  
  
    UPDATE accounts  
    SET balance = balance + 10  
    WHERE owner = 'bob';  
END

COMMIT TRANSACTION;

The UPDLOCK hint takes a row-level lock on Alice’s account when the SELECT runs; other transactions that want to modify Alice’s account will block until the lock is released.

Deadlocks

What if Alice and Bob both try to transfer money to each other at the same time? Let’s map out the transactions again:

T1: Acquire a lock on Alice’s account
T2: Acquire a lock on Bob’s account
T1: Check Alice’s account balance
T2: Check Bob’s account balance
T1: Withdraw 10 from Alice’s account
T2: Withdraw 10 from Bob’s account
T1: Can’t update Bob’s account because it’s locked by T2
T2: Can’t update Alice’s account because it’s locked by T1

T1 waits for T2’s lock on Bob; T2 waits for T1’s lock on Alice—we’re stuck in a deadlock. The fix is to acquire all locks upfront^:

BEGIN TRANSACTION;

DECLARE @balance INT;

SELECT owner
FROM accounts WITH (UPDLOCK)  
WHERE owner IN ('alice', 'bob');

SET @balance = (  
    SELECT balance
    FROM accounts
    WHERE owner = 'alice'
);

IF @balance >= 10  
BEGIN  
    UPDATE accounts  
    SET balance = balance - 10  
    WHERE owner = 'alice';  
  
    UPDATE accounts  
    SET balance = balance + 10  
    WHERE owner = 'bob';  
END

COMMIT TRANSACTION;

Conclusion

We’ve fixed the concurrency bugs in the original code, but in the process it grew about 50%, and became harder to read. Sure, you could argue that there are other, more idiomatic ways to fix this code^{, but the point still
stands: A SQL program that looks completely reasonable can be riddled with
serious bugs.}

If you’re building a social media site, it might not be the end of the world if a user likes a post twice, but if a system fails to record that a patient received a dose of medicine, it might have fatal consequences. For systems where correctness matters, we need better tools.

Proposed solution

I want an alternative to SQL that adopts Rust’s approach of fearless concurrency—that is, make the correct behavior the default, and provide “unsafe” escape hatches if necessary. Some concrete suggestions:

Make transactions atomic by default; if the user wants to save an intermediate “checkpoint” state they would have to say so explicitly.
Let the user manage locks themselves, and make sure the correct locks are acquired before mutating a database object.
Use static analysis to detect potential deadlocks; this is a tricky problem and a subject of ongoing research. Deterministic database systems could be one possible solution.

This system will come with other trade-offs; for example, it might end up with lower throughput than modern SQL systems. But that’s fine—we still have SQL for use cases where correctness is less important.