c044cf14b0
Wrong copy algorithm for last bytes, not thread safety. In some particular cases it uses the destination memory beyond the string end for 16-byte load, puts changes into that part that is relevant to destination string and writes whole 16-byte chunk into memory. I have a test case where the memory beyond the string end contains malloc/free data, that appear corrupted in case free() updates it in between the 16-byte read and 16-byte write.