Atomic ops are issued directly from the core, rather than
potentially sitting in the write buffer, so can improve the
performance of other waiters. In addition, if we didn't end
up pulling a copy of the cache line where the lock is into cache,
by using an atomic op we don't have to acquire the cache line
before we can unlock.