mm: fix wrong kunmap_atomic() pointer
Steven Rostedt authored
Running a ktest.pl test, I hit the following bug on x86_32:

  ------------[ cut here ]------------
  WARNING: at arch/x86/mm/highmem_32.c:81 __kunmap_atomic+0x64/0xc1()
   Hardware name:
  Modules linked in:
  Pid: 93, comm: sh Not tainted 2.6.39-test+ #1
  Call Trace:
   [<c04450da>] warn_slowpath_common+0x7c/0x91
   [<c042f5df>] ? __kunmap_atomic+0x64/0xc1
   [<c042f5df>] ? __kunmap_atomic+0x64/0xc1
   [<c0445111>] warn_slowpath_null+0x22/0x24
   [<c042f5df>] __kunmap_atomic+0x64/0xc1
   [<c04d4a22>] unmap_vmas+0x43a/0x4e0
   [<c04d9065>] exit_mmap+0x91/0xd2
   [<c0443057>] mmput+0x43/0xad
   [<c0448358>] exit_mm+0x111/0x119
   [<c044855f>] do_exit+0x1ff/0x5fa
   [<c0454ea2>] ? set_current_blocked+0x3c/0x40
   [<c0454f24>] ? sigprocmask+0x7e/0x8e
   [<c0448b55>] do_group_exit+0x65/0x88
   [<c0448b90>] sys_exit_group+0x18/0x1c
   [<c0c3915f>] sysenter_do_call+0x12/0x38
  ---[ end trace 8055f74ea3c0eb62 ]---

Running a ktest.pl git bisect, I found the culprit: commit e303297e
("mm: extended batches for generic mmu_gather")

But although this was the commit triggering the bug, it was not the one
originally responsible for the bug.  That was commit d16dfc55 ("mm:
mmu_gather rework").

The code in zap_pte_range() has something that looks like the following:

	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	do {
		[...]
	} while (pte++, addr += PAGE_SIZE, addr != end);
	pte_unmap_unlock(pte - 1, ptl);

The pte starts off pointing at the first entry to be processed, in the page
table page returned by pte_offset_map_lock().  When the loop finishes, pte
points anywhere from the entry after that one up to, and including, the
first entry of the next page.  Passing pte - 1 puts the pointer back onto
the original page, which is all that pte_unmap_unlock() needs.
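
A small user-space model can make the pte - 1 arithmetic concrete.  It is
only a sketch under stated assumptions: pte_t is modelled as a 32-bit slot
with 1024 slots per page-table page (as on x86_32 without PAE), three
static pages stand in for memory, and the names tables, page_of() and
unlock_ptr_after_walk() are hypothetical.  The loop keeps the do/while
shape of the zap_pte_range() excerpt quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (an assumption, not kernel code): a pte_t is one 32-bit slot
 * and a page-table page holds PTRS_PER_PTE of them, as on x86_32 without PAE. */
typedef unsigned int pte_t;
#define PTRS_PER_PTE 1024
#define TOY_PAGE_SIZE 4096UL

/* Three consecutive page-table pages; the middle one stands in for the
 * page that pte_offset_map_lock() mapped. */
static pte_t tables[3 * PTRS_PER_PTE];

/* Index (0, 1 or 2) of the page-table page that p points into. */
static ptrdiff_t page_of(const pte_t *p)
{
	return (p - tables) / PTRS_PER_PTE;
}

/* Walk nr entries starting at slot 'first' of the middle page, using the
 * same loop shape as zap_pte_range(), and return the pointer that would
 * be passed to pte_unmap_unlock(). */
static const pte_t *unlock_ptr_after_walk(unsigned long first, unsigned long nr)
{
	pte_t *pte;
	unsigned long addr = 0, end = nr * TOY_PAGE_SIZE;

	assert(nr >= 1 && first + nr <= PTRS_PER_PTE);
	pte = &tables[PTRS_PER_PTE + first];
	do {
		(void)*pte;	/* [...] zap the entry */
	} while (pte++, addr += TOY_PAGE_SIZE, addr != end);
	/* pte may now be on the next page; pte - 1 is the last entry walked */
	return pte - 1;
}
```

For example, a walk over all 1024 entries of the middle page leaves pte on
the first slot of page 2, yet unlock_ptr_after_walk(0, PTRS_PER_PTE) still
returns a pointer on page 1.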

On most archs (64 bit), this is not an issue, as the pte is ignored by
pte_unmap_unlock().  But on 32 bit archs, where the page table may be
kmapped, it is essential that the pte passed to pte_unmap_unlock() resides
on the same page that was returned by pte_offset_map_lock().
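
The same-page requirement can be illustrated with a rough model of the 32
bit debug check.  This is a sketch, not the kernel's code.  It assumes
4 KiB pages and reduces the check in __kunmap_atomic() to comparing page
bases, roughly as the kernel rounds addresses with addr & PAGE_MASK.  The
names toy_page_base() and toy_kunmap_would_warn() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096UL
/* Rounding an address down with a mask only works for a power-of-two size. */
static_assert((TOY_PAGE_SIZE & (TOY_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");
#define TOY_PAGE_MASK (~(uintptr_t)(TOY_PAGE_SIZE - 1))

/* Start of the page containing addr, like "addr & PAGE_MASK". */
static uintptr_t toy_page_base(uintptr_t addr)
{
	return addr & TOY_PAGE_MASK;
}

/* Hypothetical stand-in for the debug check in __kunmap_atomic(): returns
 * nonzero ("would warn") when the address being unmapped is not on the
 * page that was atomically kmapped. */
static int toy_kunmap_would_warn(uintptr_t mapped, uintptr_t unmapped)
{
	return toy_page_base(unmapped) != toy_page_base(mapped);
}
```

With these definitions, unmapping 0x5ffc after mapping the page at 0x5000
passes, while 0x4ffc, one 4-byte entry before that page, would warn.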

The problem came in with d16dfc55 ("mm: mmu_gather rework"), which
introduced a "break;" out of the while loop.  That alone did not seem to
trigger the bug easily.  But the modifications made by e303297e caused that
"break;" to be hit on the first iteration, before the pte++.

Because pte is never incremented, pte_unmap_unlock(pte - 1) now points
into the previous page.  This unmaps the wrong page and also triggers the
warning above.
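
The failure mode can be reproduced in a toy user-space model.  The sketch
rests on several assumptions: 32-bit pte_t slots, 1024 per page-table page,
and the walk starting at the first slot of the mapped page, which is the
case where pte - 1 crosses a page boundary.  The batch_size parameter is
hypothetical.  It stands in for the mmu_gather batch filling up and forcing
the "break;".

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (an assumption, not kernel code): 32-bit slots, PTRS_PER_PTE
 * per page-table page, the middle of three pages playing the mapped page. */
typedef unsigned int pte_t;
#define PTRS_PER_PTE 1024
#define TOY_PAGE_SIZE 4096UL

static pte_t tables[3 * PTRS_PER_PTE];

/* Index (0, 1 or 2) of the page-table page that p points into. */
static ptrdiff_t page_of(const pte_t *p)
{
	return (p - tables) / PTRS_PER_PTE;
}

/* Buggy shape: walk nr entries from the first slot of the middle page, but
 * leave the loop once batch_size entries are queued, standing in for the
 * "break;" taken when the gather batch is full. */
static const pte_t *buggy_unlock_ptr(unsigned long nr, unsigned long batch_size)
{
	pte_t *pte;
	unsigned long addr = 0, end = nr * TOY_PAGE_SIZE, queued = 0;

	assert(nr >= 1 && nr <= PTRS_PER_PTE && batch_size >= 1);
	pte = &tables[PTRS_PER_PTE];
	do {
		(void)*pte;		/* [...] zap the entry, queue its page */
		if (++queued == batch_size)
			break;		/* skips the pte++ in the loop condition */
	} while (pte++, addr += TOY_PAGE_SIZE, addr != end);
	return pte - 1;			/* what pte_unmap_unlock() receives */
}
```

In this model, buggy_unlock_ptr(16, 1) breaks on the first iteration and
returns a pointer on page 0, the page before the mapped one.  With a batch
large enough to finish the walk, the result stays on page 1.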

The simple solution is to just save the pointer given by
pte_offset_map_lock() and use it in the unlock.
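
Here is a sketch of that fix in a toy user-space model.  It makes the same
assumptions as above: 32-bit slots, 1024 per page, and a hypothetical
batch_size knob standing in for a full mmu_gather batch.  The pointer from
the map-and-lock step is kept in start_pte, and that pointer, not pte - 1,
is what the unlock receives.  An early break therefore no longer matters.
start_pte is this sketch's name for the saved pointer.  The actual patch
may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (an assumption, not kernel code): 32-bit slots, PTRS_PER_PTE
 * per page-table page, the middle of three pages playing the mapped page. */
typedef unsigned int pte_t;
#define PTRS_PER_PTE 1024
#define TOY_PAGE_SIZE 4096UL

static pte_t tables[3 * PTRS_PER_PTE];

/* Index (0, 1 or 2) of the page-table page that p points into. */
static ptrdiff_t page_of(const pte_t *p)
{
	return (p - tables) / PTRS_PER_PTE;
}

/* Fixed shape: remember the pointer returned by the map-and-lock step and
 * hand that same pointer to the unlock, wherever pte has ended up. */
static const pte_t *fixed_unlock_ptr(unsigned long nr, unsigned long batch_size)
{
	pte_t *start_pte, *pte;
	unsigned long addr = 0, end = nr * TOY_PAGE_SIZE, queued = 0;

	assert(nr >= 1 && nr <= PTRS_PER_PTE && batch_size >= 1);
	start_pte = &tables[PTRS_PER_PTE];	/* "pte_offset_map_lock()" */
	pte = start_pte;
	do {
		(void)*pte;		/* [...] zap the entry, queue its page */
		if (++queued == batch_size)
			break;		/* early exit is now harmless */
	} while (pte++, addr += TOY_PAGE_SIZE, addr != end);
	return start_pte;		/* "pte_unmap_unlock(start_pte, ptl)" */
}
```

Whatever batch_size is, fixed_unlock_ptr() returns &tables[PTRS_PER_PTE],
the first slot of the mapped page.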

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5f1a1907