Previously we weren't re-aligning the stack pointer before the
call to _dl_init(), so on tilegx32, with an odd value in _dl_skip_args
and kernel unaligned-access fixups disabled, we would die with SIGBUS.
We now handle this case properly by aligning before calling _dl_init().
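The alignment arithmetic can be sketched in C. This sketch assumes 4-byte argument slots on tilegx32 and a hypothetical 8-byte stack alignment requirement; neither value is stated above. `sp_after_skip()` and `align_sp()` are illustrative helpers, not the patch's actual code.

```c
#include <stdint.h>

/* Assumed alignment for illustration; the real tilegx ABI value may differ. */
#define STACK_ALIGN ((uintptr_t)8)

/* Assumed size of one argv slot on tilegx32 (32-bit pointers). */
#define ARG_SLOT ((uintptr_t)4)

/* Stack pointer after skipping `skip_args` argument slots.  With 4-byte
   slots, an odd count leaves sp only 4-byte aligned. */
static uintptr_t sp_after_skip(uintptr_t sp, unsigned long skip_args)
{
  return sp + skip_args * ARG_SLOT;
}

/* Round sp down to a STACK_ALIGN boundary before making the call.
   Rounding down never moves sp above its current value, so on a
   downward-growing stack it only adds padding below existing data. */
static uintptr_t align_sp(uintptr_t sp)
{
  return sp & ~(STACK_ALIGN - 1);
}
```

For example, skipping one slot from an aligned sp of 0x10000 gives 0x10004, which `align_sp()` rounds back down to 0x10000. An even skip count leaves sp already aligned, so the rounding is a no-op.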
We were multiplying a byte by 0x0101010101010101ULL to create a
constant for SIMD ops, but the compiler isn't good at optimizing
this case (the fact that one operand is a byte is lost by the time
it would be possible to do the optimization). So instead we add
a helper routine that explicitly uses SIMD ops to create the constant.
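A portable C sketch shows two ways to broadcast a byte into all eight lanes of a 64-bit word. `copy_byte_mul()` is the multiply described above. `copy_byte_shift()` is a swapped-in shift-and-OR stand-in, because the tile helper uses SIMD instructions that plain C cannot express portably. Both function names are hypothetical.

```c
#include <stdint.h>

/* The original approach: replicate a byte by multiplication.  Correct,
   but the compiler may not recover that the operand fits in a byte. */
static inline uint64_t copy_byte_mul(uint8_t byte)
{
  return byte * 0x0101010101010101ULL;
}

/* Portable stand-in for an explicit "broadcast byte" helper: double
   the replicated pattern width at each step instead of multiplying. */
static inline uint64_t copy_byte_shift(uint8_t byte)
{
  uint64_t v = byte;
  v |= v << 8;   /* 2 copies of the byte */
  v |= v << 16;  /* 4 copies */
  v |= v << 32;  /* 8 copies */
  return v;
}
```

Both functions return 0xABABABABABABABAB for an input of 0xAB. The benefit of a dedicated helper is that the byte-sized operand is visible where the constant is built, rather than depending on the compiler to rediscover it inside a 64-bit multiply.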
Although the definition of memcpy() does not require it to handle a
source and destination that are the same buffer, in practice this does
happen, and it's easy to make the code robust by doing nothing in this
case. (Since a structure copy causes the compiler to emit a memcpy, we
were seeing corruption in the case where the source structure is the
same as the destination.)
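The early return can be sketched using a hypothetical `memcpy_same_ok()`, in which a simple byte loop stands in for the optimized tile memcpy (not shown). A plain byte loop would tolerate dst == src anyway; the check matters for an optimized copy that was not written with this case in mind, and it also skips the work entirely.

```c
#include <stddef.h>

/* Hypothetical memcpy variant: if source and destination are the same
   buffer, the copy is a no-op, so return before touching memory. */
static void *memcpy_same_ok(void *dst, const void *src, size_t n)
{
  if (dst == src)
    return dst;

  /* Stand-in for the real optimized copy loop. */
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}
```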
This patch fixes up the tile startup files, moving elf/start.S up a
directory level and implementing the required crti.S and crtn.S files
based on the old initfini.c compiler output (hand-optimized to bum a
couple of cycles).