crypto: x86/aes-ni-xts - use direct calls to and 4-way stride
Ard Biesheuvel authored
commit 86ad60a6 upstream.

The XTS asm helper arrangement is a bit odd: the 8-way stride helper
consists of back-to-back calls to the 4-way core transforms, which
are called indirectly, based on a boolean that indicates whether we
are performing encryption or decryption.

Given how costly indirect calls are on x86, let's switch to direct
calls, and given how the 8-way stride doesn't really add anything
substantial, use a 4-way stride instead, and make the asm core
routine deal with any multiple of 4 blocks. Since 512 byte sectors
or 4 KB blocks are the typical quantities XTS operates on, increase
the stride exported to the glue helper to 512 bytes as well.
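
The difference in call shape can be sketched in plain C. This is only an
illustration under stated assumptions: the real routines are x86 assembly,
and every name below (enc4, dec4, xts_8way_indirect, xts_encrypt_direct,
xts_decrypt_direct) is hypothetical. The "transform" is a toy XOR, not AES.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16 /* AES block size in bytes */

/* Toy stand-ins for the 4-way core transforms (NOT real AES-XTS). */
static void enc4(uint8_t *dst, const uint8_t *src)
{
	for (size_t i = 0; i < 4 * BLOCK_SIZE; i++)
		dst[i] = src[i] ^ 0x5a;
}

static void dec4(uint8_t *dst, const uint8_t *src)
{
	for (size_t i = 0; i < 4 * BLOCK_SIZE; i++)
		dst[i] = src[i] ^ 0x5a;
}

/* Old shape: an 8-way helper that picks the core routine from a boolean
 * and reaches it through a function pointer, i.e. two indirect calls. */
static void xts_8way_indirect(uint8_t *dst, const uint8_t *src, bool enc)
{
	void (*fn)(uint8_t *, const uint8_t *) = enc ? enc4 : dec4;

	fn(dst, src);
	fn(dst + 4 * BLOCK_SIZE, src + 4 * BLOCK_SIZE);
}

/* New shape: one entry point per direction, each looping over any
 * multiple of 4 blocks and calling its core routine directly. */
static void xts_encrypt_direct(uint8_t *dst, const uint8_t *src, size_t nblocks)
{
	for (size_t i = 0; i + 4 <= nblocks; i += 4)
		enc4(dst + i * BLOCK_SIZE, src + i * BLOCK_SIZE);
}

static void xts_decrypt_direct(uint8_t *dst, const uint8_t *src, size_t nblocks)
{
	for (size_t i = 0; i + 4 <= nblocks; i += 4)
		dec4(dst + i * BLOCK_SIZE, src + i * BLOCK_SIZE);
}
```

With the direct variants, a caller that hands over a whole 512 byte sector
(32 blocks) pays for a single call into the helper, and the loop inside it
contains no function-pointer dispatch.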

As a result, the number of indirect calls is reduced from 3 per 64 bytes
of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup
when operating on 1 KB blocks (measured on an Intel(R) Core(TM) i7-8650U CPU).
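
As a rough check of the call counts quoted above, the two rates can be
written out as arithmetic. The helper names are hypothetical, and the
formulas only restate the rates given in this message (3 indirect calls
per 64 bytes before, 1 per 512 bytes after); they do not model the
measured speedup.

```c
#include <stddef.h>

/* Before: 3 indirect calls for every 64 bytes of in/output.
 * Assumes nbytes is a multiple of 64. */
static size_t indirect_calls_old(size_t nbytes)
{
	return 3 * (nbytes / 64);
}

/* After: 1 indirect call per 512 byte stride, rounding a partial
 * stride up to one call. */
static size_t indirect_calls_new(size_t nbytes)
{
	return (nbytes + 511) / 512;
}
```

For a 1 KB request this gives 48 indirect calls before the change and 2
after it; for a 4 KB block, 192 and 8.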

Fixes: 9697fa39 ("x86/retpoli...
8e970771
Name Last commit Last update
alpha local64.h: make <asm/local64.h> mandatory
arc ARC: [hsdk]: Enable FPU_SAVE_RESTORE
arm Xen/gnttab: handle p2m update errors on a per-slot basis
arm64 arm64: Unconditionally set virtual cpu id registers
c6x Merge tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block
csky csky: Fix a size determination in gpr_get()
h8300 h8300: fix PREEMPTION build, TI_PRE_COUNT undefined
hexagon local64.h: make <asm/local64.h> mandatory
ia64 ia64: don't call handle_signal() unless there's actually a signal queued
m68k m68k: make __pfn_to_phys() and __phys_to_pfn() available for !MMU
microblaze local64.h: make <asm/local64.h> mandatory
mips MIPS: kernel: Reserve exception base early to prevent corruption
nds32 local64.h: make <asm/local64.h> mandatory
nios2 nios2: fixed broken sys_clone syscall
openrisc openrisc: io: Add missing __iomem annotation to iounmap()
parisc parisc: Enable -mlong-calls gcc option with CONFIG_COMPILE_TEST
powerpc powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
riscv riscv: Get rid of MAX_EARLY_MAPPING_SIZE
s390 s390/smp: __smp_rescan_cpus() - move cpumask away from stack
sh Merge tag 'sh-for-5.11' of git://git.libc.org/linux-sh
sparc sparc64: Use arch_validate_flags() to validate ADI flag
um um: defer killing userspace on page table update failures
x86 crypto: x86/aes-ni-xts - use direct calls to and 4-way stride
xtensa local64.h: make <asm/local64.h> mandatory
.gitignore .gitignore: add SPDX License Identifier
Kconfig fanotify: Fix sys_fanotify_mark() on native x86-32