Chris Metcalf d9cd52e64d tile: optimize memcmp
Customize memcmp.c for tile, using tricks similar to those in memcpy:

- replace MERGE macro with dblalign.
- replace memcmp_bytes function with revbytes.
- use __glibc_likely.
- use post-increment addressing.
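
A minimal portable sketch of the ideas in the list above, not the tile code itself. It assumes GCC or Clang (for `__builtin_bswap64`, `__builtin_expect`, and `__BYTE_ORDER__`). The names `order_bytes_sketch`, `merge_sketch`, and `memcmp_sketch` are hypothetical, and the shift-based merge is only a stand-in for what a `dblalign`-style operation would do on a little-endian machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reorder a loaded word so that the byte at the
   lowest address is the most significant one. After this, one unsigned
   compare yields memcmp ordering (the role a byte-reverse plays on a
   little-endian target). */
static inline uint64_t order_bytes_sketch(uint64_t w)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return __builtin_bswap64(w);
#else
    return w;
#endif
}

/* Hypothetical helper: little-endian stand-in for a double-align merge.
   lo and hi are two adjacent aligned words (lo at the lower address);
   the result is the 8 bytes starting off bytes into lo. */
static inline uint64_t merge_sketch(uint64_t lo, uint64_t hi, unsigned off)
{
    assert(off < 8);
    if (off == 0)
        return lo;              /* avoid an undefined 64-bit shift */
    return (lo >> (8 * off)) | (hi << (64 - 8 * off));
}

/* Word-at-a-time compare. memcpy is used for the loads so the sketch
   stays valid for any alignment. */
int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1;
    const unsigned char *p2 = s2;

    while (n >= 8) {
        uint64_t a, b;
        memcpy(&a, p1, 8);
        memcpy(&b, p2, 8);
        p1 += 8;                /* advance right after the load */
        p2 += 8;
        n -= 8;
        /* Mismatches are expected to be rare inside the loop. */
        if (__builtin_expect(a != b, 0)) {
            uint64_t ra = order_bytes_sketch(a);
            uint64_t rb = order_bytes_sketch(b);
            return ra > rb ? 1 : -1;
        }
    }
    /* Byte-by-byte tail for the remaining 0..7 bytes. */
    while (n-- > 0) {
        if (*p1 != *p2)
            return *p1 < *p2 ? -1 : 1;
        p1++;
        p2++;
    }
    return 0;
}
```

The byte reordering is what lets a single integer comparison replace a per-byte search for the first differing byte: without it, a little-endian word compare would weight the last byte in memory most heavily.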

The schedule is still not perfect: the compiler is not hoisting
code above the comparison branch, which could save a bundle or two.
memcmp speeds up by 30-40% on shorter aligned tests in benchtest,
while some tests with unaligned lengths take a small performance hit.
2014-10-06 11:20:59 -04:00