It turns out that even if you stub out the rounding and exception
support and use the ieee754 version, it's still much more accurate than
the generic version, which just uses a normal multiply and add.
According to the tests, the resulting functions have only 1 ULP of error.
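To make the accuracy gap concrete, here is a minimal C sketch (not glibc's code) of Dekker's error-free product. This is the kind of exact-multiplication step a software fma can use instead of a plain multiply. The function names are hypothetical. The sketch assumes round-to-nearest, no excess precision (FLT_EVAL_METHOD == 0), no overflow or underflow, and that the compiler does not contract a * b + c into a hardware fma (for example, build with -ffp-contract=off).

```c
/* Sketch only: Dekker's error-free product, not glibc's s_fma.c.
   Assumes round-to-nearest, FLT_EVAL_METHOD == 0, no overflow or
   underflow, and no FP contraction (-ffp-contract=off). */

/* Veltkamp split: x == *hi + *lo exactly, each half fits in 26 bits. */
static void split(double x, double *hi, double *lo)
{
    const double C = 134217729.0; /* 2^27 + 1 */
    double t = C * x;
    *hi = t - (t - x);
    *lo = x - *hi;
}

/* Dekker's two-product: *p + *e == a * b exactly. */
static void two_product(double a, double b, double *p, double *e)
{
    double ah, al, bh, bl;
    *p = a * b;
    split(a, &ah, &al);
    split(b, &bh, &bl);
    *e = ((ah * bh - *p) + ah * bl + al * bh) + al * bl;
}

/* Generic approach: the product is rounded before the add. */
static double naive_madd(double a, double b, double c)
{
    return a * b + c;
}

/* Keeps the product's rounding error and adds it back in. This is NOT
   a correctly rounded fma, because (p + c) may itself round. */
static double compensated_madd(double a, double b, double c)
{
    double p, e;
    two_product(a, b, &p, &e);
    return (p + c) + e;
}
```

With a = 1 + 2^-27 and c = -(1 + 2^-26), the exact value of a*a + c is 2^-54. naive_madd() first rounds a*a to 1 + 2^-26 and so returns 0. compensated_madd() recovers 2^-54 from the product's error term.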
Previously we weren't re-aligning the stack pointer for the call to
_dl_init(). On tilegx32, with an odd value in _dl_skip_args and kernel
unaligned-access fixups disabled, we would therefore die with SIGBUS.
We now handle this case properly by aligning the stack pointer before
calling _dl_init().
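The arithmetic behind the fix can be sketched in C. This rests on several assumptions: skipping arguments moves the stack pointer up by _dl_skip_args pointer-sized slots, tilegx32 pointers are 4 bytes, and the stack must be 8-byte aligned. Under those assumptions, an odd skip count leaves sp at 4 mod 8, and rounding it down restores alignment. The helpers are hypothetical, and the real fix lives in the startup assembly.

```c
#include <stdint.h>

/* Round sp down to a multiple of align (must be a power of two).
   Rounding down is safe on a downward-growing stack: it only
   reserves a few extra bytes. */
static uintptr_t align_down(uintptr_t sp, uintptr_t align)
{
    return sp & ~(align - 1);
}

/* Hypothetical model of dropping n skipped argv slots of ptr_size
   bytes each by moving the stack pointer up. */
static uintptr_t skip_args(uintptr_t sp, unsigned n, uintptr_t ptr_size)
{
    return sp + (uintptr_t)n * ptr_size;
}
```

For example, starting from an 8-byte-aligned sp of 0x7ff0 and skipping three 4-byte slots gives 0x7ffc. Aligning that down yields 0x7ff8.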
We were multiplying a byte by 0x0101010101010101ULL to create a
constant for SIMD ops, but the compiler isn't good at optimizing
this case (the fact that one operand is a byte is lost by the time
it would be possible to do the optimization). So instead we add
a helper routine that explicitly uses SIMD ops to create the constant.
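For reference, here is a portable C sketch of the two ways to build the replicated-byte constant. It is not the tile helper, which uses SIMD instructions. The multiply form is what the code did before. The shift-and-or form makes the "only one byte is set" structure explicit, replicating the byte in three doubling steps. Both function names are hypothetical.

```c
#include <stdint.h>

/* Original approach: a single 64-bit multiply. */
static inline uint64_t splat_byte_mul(uint8_t b)
{
    return (uint64_t)b * 0x0101010101010101ULL;
}

/* Explicit replication: each step doubles the number of copies
   (1 -> 2 -> 4 -> 8 bytes). */
static inline uint64_t splat_byte_shift(uint8_t b)
{
    uint64_t v = b;
    v |= v << 8;
    v |= v << 16;
    v |= v << 32;
    return v;
}
```

Both produce the same value for every byte, for example 0xab becomes 0xababababababababULL.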
Although the definition of memcpy() does not require it to handle a
source and destination that are the same buffer, in practice this does
happen, and it's easy to make the code robust by doing nothing in this
case. (Since structure copy causes the compiler to emit a memcpy, in
the case where the source structure is the same as the destination, we
were seeing corruption.)
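A minimal C sketch of the early-out follows. copy_bytes() is a hypothetical stand-in for an optimized memcpy. A plain byte loop is harmless when src == dst, but an optimized word-at-a-time copy may not be. That is why the check comes first, before any copying work.

```c
#include <stddef.h>

/* Hypothetical memcpy stand-in with the src == dst early-out. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d == s)          /* Copying a buffer onto itself: nothing to do. */
        return dst;
    while (n--)
        *d++ = *s++;
    return dst;
}
```

A structure self-assignment such as `s = s;` is the kind of code that can lead the compiler to emit such a call with identical source and destination.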
This patch fixes up the tile startup files, moving elf/start.S up a
directory level and implementing the required crti.S and crtn.S files
based on the old initfini.c compiler output (hand-optimized to bum a
couple of cycles).
Common code moved _itoa.h, necessitating a change in the #include path.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>