It turns out that SSSE3 isn't slow on Atom. The problem is bsf. This patch removes ENABLE_SSSE3_ON_ATOM.
This patch adds multiarch support when configured for i686. I modified some x86-64 functions to support 32-bit. I will contribute 32-bit SSE string and memory functions later.
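
For context on the bsf remark: SSE string routines commonly use a bit scan to turn a comparison mask into the index of the first matching byte. The sketch below is illustrative only and is not code from this patch. The helper names first_match_ctz and first_match_loop are hypothetical, and it assumes GCC or Clang, where __builtin_ctz typically compiles to bsf (or tzcnt) on x86.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch.
   Returns the index of the lowest set bit in a 16-bit comparison mask.
   The mask must be nonzero: __builtin_ctz(0) is undefined.
   On x86 this builtin usually becomes a bsf or tzcnt instruction. */
static inline unsigned first_match_ctz(uint16_t mask)
{
    return (unsigned)__builtin_ctz((unsigned)mask);
}

/* Hypothetical alternative that avoids the bit-scan instruction by
   testing bits from low to high. Returns 16 when no bit is set. */
static inline unsigned first_match_loop(uint16_t mask)
{
    unsigned i = 0;
    while (i < 16 && (mask & 1u) == 0) {
        mask >>= 1;
        i++;
    }
    return i;
}
```

Both helpers agree on any nonzero mask, so a routine could pick between them per CPU. Whether the loop form actually wins on Atom would need measuring.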