mlock: fix mlock count can not decrease in race condition
Yisheng Xie authored
commit 70feee0e upstream.

Kefeng reported that when running the following test, the Mlocked count in
/proc/meminfo increases permanently:

 [1] testcase
 linux:~ # cat test_mlockal
 grep Mlocked /proc/meminfo
 for j in `seq 0 10`
 do
 	for i in `seq 4 15`
 	do
 		./p_mlockall >> log &
 	done
 	sleep 0.2
 done
 # wait some time to let the mlock counter decrease; 5s may not be enough
 sleep 5
 grep Mlocked /proc/meminfo

 linux:~ # cat p_mlockall.c
 #include <sys/mman.h>
 #include <stdlib.h>
 #include <stdio.h>

 #define SPACE_LEN	4096

 int main(int argc, char **argv)
 {
 	int ret;
 	void *adr = malloc(SPACE_LEN);

 	if (!adr)
 		return -1;

 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
 	printf("mlockall ret = %d\n", ret);

 	ret = munlockall();
 	printf("munlockall ret = %d\n", ret);

 	free(adr);
 	return 0;
 }

In __munlock_pagevec() we should decrement NR_MLOCK for each page where
we clear the PageMlocked flag.  Commit 1ebb7cc6 ("mm: munlock: batch
NR_MLOCK zone state updates") has introduced a bug where we don't
decrement NR_MLOCK for pages where we clear the flag, but fail to
isolate them from the lru list (e.g.  when the pages are on some other
cpu's percpu pagevec).  Since PageMlocked stays cleared, the NR_MLOCK
accounting gets permanently disrupted by this.

Fix it by counting the number of pages whose PageMlocked flag is cleared.
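
The difference between the two accounting schemes can be illustrated with a
small user-space model.  This is only a sketch, not the kernel code: struct
model_page, delta_buggy() and delta_fixed() are hypothetical names, and the
on_lru field merely stands in for whether LRU isolation would succeed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the bits the model needs. */
struct model_page {
	bool mlocked;	/* models the PageMlocked flag */
	bool on_lru;	/* false models a page held in another CPU's pagevec */
};

/*
 * Models the buggy accounting: the flag is cleared on every mlocked page,
 * but only pages that were also isolated from the LRU are subtracted.
 */
static int delta_buggy(struct model_page *pages, int nr)
{
	int isolated = 0;

	assert(nr >= 0);
	for (int i = 0; i < nr; i++) {
		if (pages[i].mlocked) {
			pages[i].mlocked = false;
			if (pages[i].on_lru)
				isolated++;
		}
	}
	return -isolated;
}

/*
 * Models the fixed accounting: every page whose flag was cleared is
 * subtracted, whether or not isolation succeeded.
 */
static int delta_fixed(struct model_page *pages, int nr)
{
	int delta = -nr;

	assert(nr >= 0);
	for (int i = 0; i < nr; i++) {
		if (pages[i].mlocked)
			pages[i].mlocked = false;
		else
			delta++;	/* flag was already clear: nothing to subtract */
	}
	return delta;
}
```

For a batch of three pages (mlocked and on the LRU, mlocked but not on the
LRU, not mlocked), delta_buggy() returns -1 while delta_fixed() returns -2.
In this model the missing decrement is the leak that shows up as a growing
Mlocked value.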

Fixes: 1ebb7cc6 ("mm: munlock: batch NR_MLOCK zone state updates")
Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.com

Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joern Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhongjiang <zhongjiang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
aef16f4c