With the recent tuning the C version of rwlocks is basically the same
performance as the x86 assembler version for uncontended locks (with a
a few cycles near the run-to-run variability). For others it should not
matter anyways.
So remove the assembler code and use the C version like other
architectures.