mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
Michal Hocko authored
[ Upstream commit efad4e47 ]

Patch series "mm, memory_hotplug: fix uninitialized pages fallouts", v2.

Mikhail Zaslonko posted fixes for the two bugs quite some time ago
[1].  I pushed back on those fixes because I believed that it was
much better to plug the problem at initialization time rather than
play whack-a-mole all over the hotplug code, hunting down all the places
that expect the full memory section to be initialized.

We ended up merging commit 2830bf6f ("mm, memory_hotplug:
initialize struct pages for the full memory section"), which caused a
regression [2][3].  The reason is that there can be memory layouts
where two NUMA nodes share the same memory section, so the merged fix is
simply incorrect.
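A small arithmetic sketch of why "initialize the full section" breaks when two nodes share a section. It assumes 2^15 PFNs per section (4 KiB pages with 128 MiB sections, as on x86_64) and made-up node ranges; `struct node_range`, `section_nr()`, `section_align_up()` and `spill_into()` are hypothetical helpers, not kernel APIs. It only shows that rounding one node's end PFN up to its section boundary reaches into PFNs owned by the next node.

```c
#include <assert.h>

/* Assumption: 4 KiB pages and 128 MiB sections, i.e. 2^15 PFNs per section. */
#define MODEL_PAGES_PER_SECTION (1UL << 15)

/* Hypothetical PFN range [start_pfn, end_pfn) owned by one NUMA node. */
struct node_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Index of the memory section that contains pfn. */
static unsigned long section_nr(unsigned long pfn)
{
	return pfn / MODEL_PAGES_PER_SECTION;
}

/* Round pfn up to the next section boundary (section size is a power of 2). */
static unsigned long section_align_up(unsigned long pfn)
{
	return (pfn + MODEL_PAGES_PER_SECTION - 1) &
	       ~(MODEL_PAGES_PER_SECTION - 1);
}

/*
 * Number of node b's PFNs inside [a->end_pfn, section_align_up(a->end_pfn)),
 * i.e. the PFNs that extending node a to the end of its last section
 * would touch even though they belong to node b.
 */
static unsigned long spill_into(const struct node_range *a,
				const struct node_range *b)
{
	unsigned long rounded_end, lo, hi;

	assert(a->start_pfn <= a->end_pfn);
	rounded_end = section_align_up(a->end_pfn);
	lo = a->end_pfn > b->start_pfn ? a->end_pfn : b->start_pfn;
	hi = rounded_end < b->end_pfn ? rounded_end : b->end_pfn;
	return hi > lo ? hi - lo : 0;
}
```

With node 0 ending at PFN 0x14000 and node 1 starting there, both PFN 0x13fff and PFN 0x14000 lie in section 2, and rounding node 0 up to the section end would cover 0x4000 of node 1's PFNs. When the node boundary is section aligned, nothing spills over.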

To plug this hole, those handlers really have to be aware of zone
ranges.  I have split the original patch into two: one is unchanged
(patch 2), and I took a different approach for the `removable'
crash.

[1] http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1666948
[3] http://lkml.kernel.org/r/20190125163938.GA20411@dhcp22.suse.cz

This patch (of 2):

Mikhail reported the following VM_BUG_ON, triggered when reading the
sysfs `removable' state of a memory block:

 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   is_mem_section_removable+0xb4/0x190
   show_mem_removable+0x9a/0xd8
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

The reason is that the memory block spans a zone boundary, and we
stumble over an uninitialized struct page.  Fix this by enforcing the
zone range in is_mem_section_removable so that we never walk past the
end of a zone.
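A minimal userspace sketch of the idea behind the fix, not the patch itself: the walk over a memory block's PFNs is clamped to the zone's end PFN, so struct pages beyond the zone (which may be uninitialized) are never inspected. `struct zone_model`, `zone_model_end_pfn()`, `clamp_to_zone()` and `walk_block()` are hypothetical names; the `initialized[]` array stands in for struct page state, and the `assert` in the loop plays the role of the VM_BUG_ON_PAGE check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a zone; not the kernel's struct zone. */
struct zone_model {
	unsigned long start_pfn;     /* first PFN spanned by the zone */
	unsigned long spanned_pages; /* number of PFNs spanned by the zone */
};

/* Exclusive end PFN of the zone. */
static unsigned long zone_model_end_pfn(const struct zone_model *z)
{
	return z->start_pfn + z->spanned_pages;
}

/*
 * End PFN (exclusive) of a walk over [start_pfn, start_pfn + nr_pages),
 * clamped so that it never goes past the end of zone z.
 */
static unsigned long clamp_to_zone(const struct zone_model *z,
				   unsigned long start_pfn,
				   unsigned long nr_pages)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long zone_end = zone_model_end_pfn(z);

	/* The walk itself must start inside the zone. */
	assert(start_pfn >= z->start_pfn && start_pfn < zone_end);
	return end_pfn < zone_end ? end_pfn : zone_end;
}

/*
 * Visit every PFN of the block that lies inside zone z.
 * initialized[i] models whether the struct page for PFN z->start_pfn + i
 * was initialized; only PFNs inside the zone have an entry, so an
 * unclamped walk would read past the end of the array.
 * Returns the number of PFNs visited.
 */
static unsigned long walk_block(const struct zone_model *z,
				const bool *initialized,
				unsigned long start_pfn,
				unsigned long nr_pages)
{
	unsigned long end_pfn = clamp_to_zone(z, start_pfn, nr_pages);
	unsigned long visited = 0;

	for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
		/* Analogue of VM_BUG_ON_PAGE(PagePoisoned(p)): must never fire. */
		assert(initialized[pfn - z->start_pfn]);
		visited++;
	}
	return visited;
}
```

For a zone spanning 0x80 PFNs from PFN 0x1000, a 0x100-page block starting at 0x1000 visits only the 0x80 PFNs inside the zone; a block that ends before the zone does is walked in full.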

Link: http://lkml.kernel.org/r/20190128144506.15603-2-mhocko@kernel.org

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Debugged-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>