f3dcae82d5
This patch adds SSE, AVX and AVX512 versions of _dl_runtime_resolve and _dl_runtime_profile, which save and restore the first 8 vector registers used for parameter passing. elf_machine_runtime_setup selects the proper _dl_runtime_resolve or _dl_runtime_profile based on _dl_x86_cpu_features. This avoids the race condition caused by the FOREIGN_CALL macros, which are only used for x86-64. The performance impact of saving and restoring 8 vector registers is negligible on Nehalem, Sandy Bridge, Ivy Bridge and Haswell when ld.so is optimized with SSE2.

[BZ #15128]
* sysdeps/x86_64/Makefile [$(subdir) == elf] (tests): Add ifuncmain8.
(modules-names): Add ifuncmod8.
($(objpfx)ifuncmain8): New rule.
* sysdeps/x86_64/dl-machine.h: Include <dl-procinfo.h> and <cpuid.h>.
(elf_machine_runtime_setup): Use _dl_runtime_resolve_sse, _dl_runtime_resolve_avx, or _dl_runtime_resolve_avx512, _dl_runtime_profile_sse, _dl_runtime_profile_avx, or _dl_runtime_profile_avx512, based on HAS_ARCH_FEATURE.
* sysdeps/x86_64/dl-trampoline.S: Rewrite.
* sysdeps/x86_64/dl-trampoline.h: Likewise.
* sysdeps/x86_64/ifuncmain8.c: New file.
* sysdeps/x86_64/ifuncmod8.c: Likewise.
* sysdeps/x86_64/nptl/tcb-offsets.sym (RTLD_SAVESPACE_SSE): Removed.
* sysdeps/x86_64/nptl/tls.h (__128bits): Removed.
(tcbhead_t): Change rtld_must_xmm_save to __glibc_unused1. Change rtld_savespace_sse to __glibc_unused2.
(RTLD_CHECK_FOREIGN_CALL): Removed.
(RTLD_ENABLE_FOREIGN_CALL): Likewise.
(RTLD_PREPARE_FOREIGN_CALL): Likewise.
(RTLD_FINALIZE_FOREIGN_CALL): Likewise.
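
The selection step described in the commit message can be pictured roughly as below. This is only an illustrative sketch in Python, not the C code in dl-machine.h: the trampoline names come from the ChangeLog, while the function name select_trampolines, the feature strings "avx512" and "avx", and the assumption that the widest usable register set is preferred are all hypothetical.

```python
def select_trampolines(cpu_features):
    """Return (resolve, profile) trampoline names for a set of CPU features.

    cpu_features is a hypothetical set of strings standing in for the
    checks the real code performs through HAS_ARCH_FEATURE.
    """
    # Assumption: prefer the variant that saves the widest vector registers.
    if "avx512" in cpu_features:
        suffix = "avx512"
    elif "avx" in cpu_features:
        suffix = "avx"
    else:
        # SSE is the fallback, since every x86-64 CPU has SSE2.
        suffix = "sse"
    return ("_dl_runtime_resolve_" + suffix, "_dl_runtime_profile_" + suffix)


if __name__ == "__main__":
    print(select_trampolines({"avx"}))
```

Because the choice is made once per loaded object at setup time, the trampoline itself needs no per-call state, which is consistent with the commit's removal of the FOREIGN_CALL bookkeeping.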
29 lines
1.1 KiB
Plaintext
#include <sysdep.h>
#include <tls.h>
#include <kernel-features.h>

RESULT                  offsetof (struct pthread, result)
TID                     offsetof (struct pthread, tid)
PID                     offsetof (struct pthread, pid)
CANCELHANDLING          offsetof (struct pthread, cancelhandling)
CLEANUP_JMP_BUF         offsetof (struct pthread, cleanup_jmp_buf)
CLEANUP                 offsetof (struct pthread, cleanup)
CLEANUP_PREV            offsetof (struct _pthread_cleanup_buffer, __prev)
MUTEX_FUTEX             offsetof (pthread_mutex_t, __data.__lock)
MULTIPLE_THREADS_OFFSET offsetof (tcbhead_t, multiple_threads)
POINTER_GUARD           offsetof (tcbhead_t, pointer_guard)
VGETCPU_CACHE_OFFSET    offsetof (tcbhead_t, vgetcpu_cache)
#ifndef __ASSUME_PRIVATE_FUTEX
PRIVATE_FUTEX           offsetof (tcbhead_t, private_futex)
#endif

-- Not strictly offsets, but these values are also used in the TCB.
TCB_CANCELSTATE_BITMASK CANCELSTATE_BITMASK
TCB_CANCELTYPE_BITMASK  CANCELTYPE_BITMASK
TCB_CANCELING_BITMASK   CANCELING_BITMASK
TCB_CANCELED_BITMASK    CANCELED_BITMASK
TCB_EXITING_BITMASK     EXITING_BITMASK
TCB_CANCEL_RESTMASK     CANCEL_RESTMASK
TCB_TERMINATED_BITMASK  TERMINATED_BITMASK
TCB_PTHREAD_CANCELED    PTHREAD_CANCELED
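
The line format used in this file (preprocessor directives, `--` comment lines, and `NAME expression` pairs) can be read with a few lines of code. The following is a simplified Python sketch for illustration only; it is not the script the glibc build uses, and the function name parse_sym and the tuple layout are hypothetical. It assumes that a line with a single word defines a constant whose expression is the name itself.

```python
def parse_sym(text):
    """Parse .sym-style text into an ordered list of entries.

    Each entry is either ("cpp", line) for a preprocessor directive or
    ("const", name, expression) for a constant definition.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information
        if line.startswith("#"):
            # Directives are kept in order so conditionals still wrap
            # the definitions they guard.
            entries.append(("cpp", line))
            continue
        parts = line.split(None, 1)
        if parts[0] == "--":
            continue  # separator or comment line
        name = parts[0]
        # Assumption: a bare name stands for an expression of the same name.
        expr = parts[1].strip() if len(parts) > 1 else name
        entries.append(("const", name, expr))
    return entries


SAMPLE = """\
#include <tls.h>

RESULT offsetof (struct pthread, result)
#ifndef __ASSUME_PRIVATE_FUTEX
PRIVATE_FUTEX offsetof (tcbhead_t, private_futex)
#endif

-- Not strictly offsets, but these values are also used in the TCB.
TCB_CANCELED_BITMASK CANCELED_BITMASK
"""

if __name__ == "__main__":
    for entry in parse_sym(SAMPLE):
        print(entry)
```

Keeping directives and constants in one ordered list preserves the `#ifndef __ASSUME_PRIVATE_FUTEX` guard around PRIVATE_FUTEX, so a later stage could hand the result to the C preprocessor unchanged.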