64d76ca064
We were multiplying a byte by 0x0101010101010101ULL to build a constant for SIMD ops, but the compiler optimizes this case poorly: by the time the optimization would be possible, the fact that one operand is a byte has been lost. Instead, add a helper routine that explicitly uses SIMD ops to create the constant.
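
For illustration only, here is a minimal sketch of the two approaches. The names `broadcast_byte_mul` and `broadcast_byte_simd` are hypothetical, and the use of SSE2 (`_mm_set1_epi8`) is an assumption, since the message does not say which instruction set the real helper targets. Builds without SSE2 on x86-64 fall back to the multiply.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__) && defined(__x86_64__)
#include <emmintrin.h>
#define HAVE_SSE2_BROADCAST 1
#endif

/* Old approach: replicate the byte into all 8 lanes with a 64-bit multiply. */
static inline uint64_t broadcast_byte_mul(uint8_t b)
{
    return (uint64_t)b * 0x0101010101010101ULL;
}

/* Hypothetical helper: ask for the byte broadcast explicitly, so the
 * compiler never has to rediscover that the operand is only 8 bits wide. */
static inline uint64_t broadcast_byte_simd(uint8_t b)
{
#ifdef HAVE_SSE2_BROADCAST
    /* Assumption: SSE2. Fill all 16 byte lanes, then take the low 8. */
    __m128i v = _mm_set1_epi8((char)b);
    uint64_t r = (uint64_t)_mm_cvtsi128_si64(v);
#else
    /* No SSE2 on this target: fall back to the multiply. */
    uint64_t r = broadcast_byte_mul(b);
#endif
    /* Debug-build sanity check: both forms must produce the same constant. */
    assert(r == broadcast_byte_mul(b));
    return r;
}
```

The sketch extracts the low 64 bits only so the two forms can be compared; a real helper would presumably return the vector type directly and keep the constant in a SIMD register.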