mm, oom: do not trigger out_of_memory from the #PF
Michal Hocko authored
commit 60e2793d upstream.

Any allocation failure during the #PF path will return with VM_FAULT_OOM
which in turn results in pagefault_out_of_memory.  This can happen for 2
different reasons.  a) Memcg is out of memory and we rely on
mem_cgroup_oom_synchronize to perform the memcg OOM handling or b)
normal allocation fails.

The latter is quite problematic because allocation paths already trigger
out_of_memory and the page allocator tries really hard to not fail
allocations.  Anyway, if the OOM killer has already been invoked there
is no reason to invoke it again from the #PF path, especially when the
OOM condition might be gone by that time and we have no way to find out
other than to allocate.

Moreover if the allocation failed and the OOM killer hasn't been invoked
then we are unlikely to do the right thing from the #PF context because
we have already lost the allocation context and restrictions and
therefore might oom kill a task from a different NUMA domain.

This all suggests that there is no legitimate reason to trigger
out_of_memory from pagefault_out_of_memory, so drop it.  Just to be sure
that no #PF path returns with VM_FAULT_OOM without allocation, print a
warning that this is happening before we restart the #PF.
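
The resulting control flow can be illustrated with a small userspace
sketch.  It is not the kernel code: struct sim_state and every sim_*
function are hypothetical stand-ins invented for this example, and the
warning text is only illustrative.  It models a handler that lets the
memcg path deal with a pending memcg OOM, otherwise only warns, and in
both cases leaves the fault to be retried without a global OOM kill.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these are kernel definitions. */
struct sim_state {
	bool memcg_oom_pending;	/* a memcg OOM is waiting to be handled */
	int warnings;		/* times the "leaked VM_FAULT_OOM" warning fired */
};

/* Stand-in for mem_cgroup_oom_synchronize(): handle a pending memcg OOM. */
static bool sim_memcg_oom_synchronize(struct sim_state *s)
{
	if (!s->memcg_oom_pending)
		return false;
	s->memcg_oom_pending = false;
	return true;
}

/* Modeled behaviour after the patch: no global OOM kill from this path. */
static void sim_pagefault_out_of_memory(struct sim_state *s)
{
	if (sim_memcg_oom_synchronize(s))
		return;

	/* VM_FAULT_OOM arrived without a memcg OOM: warn, then retry. */
	s->warnings++;
	fprintf(stderr, "VM_FAULT_OOM leaked out to the #PF handler, retrying\n");
}

/*
 * Retry a fault whose allocation fails failures_before_success times.
 * Returns the attempt number that succeeded, or -1 if the attempt
 * budget ran out first.
 */
static int sim_handle_fault(struct sim_state *s, int failures_before_success,
			    int max_attempts)
{
	assert(max_attempts > 0);

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (attempt > failures_before_success)
			return attempt;
		sim_pagefault_out_of_memory(s);
	}
	return -1;
}
```

Calling sim_handle_fault() with a pending memcg OOM and two failing
attempts resolves the first failure through the memcg path and the
second with a single warning; the third attempt then succeeds.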

[VvS: #PF allocation can hit the limit of the cgroup v1 kmem controller.
This is a local problem related to memcg; however, it causes unnecessary
global OOM kills that are repeated over and over again and escalate into a
real disaster.  This has been broken since kmem accounting was
introduced for cgroup v1 (3.8).  There was no kmem-specific reclaim for
the separate limit, so the only way to handle the kmem hard limit was to
return with ENOMEM.  In upstream the problem will be fixed by removing the
outdated kmem limit; however, stable and LTS kernels cannot do it and are
still affected.  This patch fixes the problem and should be backported
into stable/LTS.]

Link: https://lkml.kernel.org/r/f5fd8dd8-0ad4-c524-5f65-920b01972a42@virtuozzo.com

Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
c15aeead
Name Last commit Last update
damon mm/damon/core-test: fix wrong expectations for 'damon_split_regions_of()'
kasan Merge branch 'akpm' (patches from Andrew)
kfence kfence: always use static branches to guard kfence_alloc()
Kconfig mm/idle_page_tracking: make PG_idle reusable
Kconfig.debug mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO
Makefile mm: introduce Data Access MONitor (DAMON)
backing-dev.c Merge branch 'akpm' (patches from Andrew)
balloon_compaction.c mm: fix typos in comments
bootmem_info.c mm/bootmem_info.c: mark __init on register_page_bootmem_info_section
cleancache.c Merge tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
cma.c mm: use proper type for cma_[alloc|release]
cma.h mm: cma: support sysfs
cma_debug.c mm/cma: change cma mutex to irq safe spinlock
cma_sysfs.c mm: cma: support sysfs
compaction.c Merge branch 'akpm' (patches from Andrew)
debug.c mm/debug: sync up latest migrate_reason to migrate_reason_names
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license
debug_vm_pgtable.c mm/debug_vm_pgtable: fix corrupted page flag
dmapool.c mm/dmapool: use DEVICE_ATTR_RO macro
early_ioremap.c mm/early_ioremap.c: remove redundant early_ioremap_shutdown()
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only
filemap.c mm/filemap.c: remove bogus VM_BUG_ON
frontswap.c mm/mempool: minor coding style tweaks
gup.c Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly"
gup_test.c
gup_test.h
highmem.c
hmm.c
huge_memory.c
hugetlb.c
hugetlb_cgroup.c
hugetlb_vmemmap.c
hugetlb_vmemmap.h
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
io-mapping.c
ioremap.c
khugepaged.c
kmemleak.c
ksm.c
list_lru.c
maccess.c
madvise.c
mapping_dirty_helpers.c
memblock.c
memcontrol.c
memfd.c
memory-failure.c
memory.c
memory_hotplug.c
mempolicy.c
mempool.c
memremap.c
memtest.c
migrate.c
mincore.c
mlock.c
mm_init.c
mmap.c
mmap_lock.c
mmu_gather.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c
oom_kill.c
page-writeback.c
page_alloc.c
page_counter.c
page_ext.c
page_idle.c
page_io.c
page_isolation.c
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_vma_mapped.c
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgalloc-track.h
pgtable-generic.c
process_vm_access.c
ptdump.c
readahead.c
rmap.c
rodata_test.c
secretmem.c
shmem.c
shuffle.c
shuffle.h