  1. 30 Sep, 2013 1 commit
  2. 11 Sep, 2013 5 commits
  3. 29 Apr, 2013 1 commit
  4. 24 Feb, 2013 1 commit
  5. 09 Oct, 2012 1 commit
    • mm: prepare VM_DONTDUMP for using in drivers · 0103bd16
      Konstantin Khlebnikov authored
      
      Rename VM_NODUMP into VM_DONTDUMP: this name matches other negative flags:
      VM_DONTEXPAND, VM_DONTCOPY.  Currently this flag is used only for
      sys_madvise.  The next patch will use it for replacing the outdated flag
      VM_RESERVED.
      
      Also forbid madvise(MADV_DODUMP) for special kernel mappings, VM_SPECIAL
      (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP).
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 06 Jul, 2012 1 commit
  7. 29 May, 2012 1 commit
    • mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE · 3f31d075
      Hugh Dickins authored
      
      Now that tmpfs supports hole-punching via fallocate(), switch madvise_remove()
      to use do_fallocate() instead of vmtruncate_range(), which extends
      madvise(,,MADV_REMOVE) support from tmpfs to ext4, ocfs2 and xfs.
      
      There is one more user of vmtruncate_range() in our tree,
      staging/android's ashmem_shrink(): convert it to use do_fallocate() too
      (but if its unpinned areas are already unmapped - I don't know - then it
      would do better to use shmem_truncate_range() directly).
      Based-on-patch-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Colin Cross <ccross@android.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger@dilger.ca>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 23 Mar, 2012 1 commit
    • coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP · accb61fe
      Jason Baron authored
      
      Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit
      for 'VM_NODUMP' flag.  The idea is to add a new madvise() flag:
      MADV_DONTDUMP, which can be set by applications to specifically request
      memory regions which should not dump core.
      
      The specific application I have in mind is qemu: we can add a flag there
      that wouldn't dump all of guest memory when qemu dumps core.  This flag
      might also be useful for security sensitive apps that want to absolutely
      make sure that parts of memory are not dumped.  To clear the flag use:
      MADV_DODUMP.
      
      [akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland]
      [akpm@linux-foundation.org: fix up the architectures which broke]
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Acked-by: Roland McGrath <roland@hack.frob.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 03 Jan, 2012 1 commit
    • HWPOISON: Clean up memory_failure() vs. __memory_failure() · cd42f4a3
      Tony Luck authored
      
      There is only one caller of memory_failure(), all other users call
      __memory_failure() and pass in the flags argument explicitly. The
      lone user of memory_failure() will soon need to pass flags too.
      
      Add flags argument to the callsite in mce.c. Delete the old memory_failure()
      function, and then rename __memory_failure() without the leading "__".
      
      Provide clearer message when action optional memory errors are ignored.
      Acked-by: Borislav Petkov <bp@amd64.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  10. 21 Jul, 2011 1 commit
    • fs: kill i_alloc_sem · bd5fe6c5
      Christoph Hellwig authored
      
      i_alloc_sem is a rather special rw_semaphore.  It's the last one that may
      be released by a non-owner, and its write side is always mirrored by
      real exclusion.  Its intended use is to wait for all pending direct I/O
      requests to finish before starting a truncate.
      
      Replace it with a hand-grown construct:
      
       - exclusion for truncates is already guaranteed by i_mutex, so it can
         simply fall away
       - the reader side is replaced by an i_dio_count member in struct inode
         that counts the number of pending direct I/O requests.  Truncate can't
         proceed as long as it's non-zero
       - when i_dio_count reaches zero we wake up a pending truncate using
         wake_up_bit on a new bit in i_flags
       - new references to i_dio_count can't appear while we are waiting for
         it to read zero because the direct I/O count always needs i_mutex
         (or an equivalent like XFS's i_iolock) for starting a new operation.
      
      This scheme is much simpler, and saves the space of a spinlock_t and a
      struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
      system).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
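The counting scheme described in this commit can be modeled in user space. The sketch below is only an analogy, not the kernel implementation: `struct inode_model`, `dio_begin`, `dio_end`, `truncate_begin` and `truncate_end` are hypothetical names, and a pthread mutex plus condition variable stand in for the kernel's wake_up_bit machinery. New direct I/O takes `i_mutex` to bump the counter, completion drops the counter without `i_mutex` and wakes a waiter when it hits zero, and truncate holds `i_mutex` while it waits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* User-space model only; field names mirror the commit text. */
struct inode_model {
    pthread_mutex_t i_mutex;     /* truncate vs. new DIO submission */
    atomic_int      i_dio_count; /* pending direct I/O requests */
    pthread_mutex_t wait_lock;   /* stand-in for the bit waitqueue */
    pthread_cond_t  wait_zero;   /* signalled when the count hits 0 */
};

#define INODE_MODEL_INIT { PTHREAD_MUTEX_INITIALIZER, 0, \
                           PTHREAD_MUTEX_INITIALIZER,    \
                           PTHREAD_COND_INITIALIZER }

/* Starting direct I/O needs i_mutex, so no new request can appear
 * while a truncate holds it. */
static void dio_begin(struct inode_model *ino)
{
    pthread_mutex_lock(&ino->i_mutex);
    atomic_fetch_add(&ino->i_dio_count, 1);
    pthread_mutex_unlock(&ino->i_mutex);
}

/* Completion does not take i_mutex; the last one wakes the waiter. */
static void dio_end(struct inode_model *ino)
{
    int before = atomic_fetch_sub(&ino->i_dio_count, 1);
    assert(before > 0);
    if (before == 1) {
        pthread_mutex_lock(&ino->wait_lock);
        pthread_cond_broadcast(&ino->wait_zero);
        pthread_mutex_unlock(&ino->wait_lock);
    }
}

/* Returns with i_mutex held and no direct I/O in flight. */
static void truncate_begin(struct inode_model *ino)
{
    pthread_mutex_lock(&ino->i_mutex);
    pthread_mutex_lock(&ino->wait_lock);
    while (atomic_load(&ino->i_dio_count) != 0)
        pthread_cond_wait(&ino->wait_zero, &ino->wait_lock);
    pthread_mutex_unlock(&ino->wait_lock);
}

static void truncate_end(struct inode_model *ino)
{
    pthread_mutex_unlock(&ino->i_mutex);
}
```

Because the waiter checks the counter while holding `wait_lock` and the final `dio_end()` takes the same lock before broadcasting, a wakeup cannot slip in between the check and the sleep. On older toolchains the program may need to be built with `-pthread`.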
  11. 14 Jan, 2011 3 commits
  12. 16 Dec, 2009 4 commits
  13. 22 Sep, 2009 2 commits
    • ksm: the mm interface to ksm · f8af4da3
      Hugh Dickins authored
      
      This patch presents the mm interface to a dummy version of ksm.c, for
      better scrutiny of that interface: the real ksm.c follows later.
      
      When CONFIG_KSM is not set, madvise(2) rejects MADV_MERGEABLE and
      MADV_UNMERGEABLE with EINVAL, since that seems more helpful than
      pretending that they can be serviced.  But when CONFIG_KSM=y, accept them
      even if KSM is not currently running, and even on areas which KSM will not
      touch (e.g.  hugetlb or shared file or special driver mappings).
      
      Like other madvices, report ENOMEM despite success if any area in the
      range is unmapped, and use EAGAIN to report out of memory.
      
      Define vma flag VM_MERGEABLE to identify an area on which KSM may try
      merging pages: leave it to ksm_madvise() to decide whether to set it.
      Define mm flag MMF_VM_MERGEABLE to identify an mm which might contain
      VM_MERGEABLE areas, to minimize callouts when forking or exiting.
      
      Based upon earlier patches by Chris Wright and Izik Eidus.
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Chris Wright <chrisw@redhat.com>
      Signed-off-by: Izik Eidus <ieidus@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ksm: first tidy up madvise_vma() · 3866ea90
      Hugh Dickins authored
      
      madvise.c has several levels of switch statements, what to do in which?
      Move MADV_DOFORK code down from madvise_vma() to madvise_behavior(), so
      madvise_vma() can be a simple router, to madvise_behavior() by default.
      
      vma->vm_flags is an unsigned long so use the same type for new_flags.  Add
      missing comment lines to describe MADV_DONTFORK and MADV_DOFORK.
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Chris Wright <chrisw@redhat.com>
      Signed-off-by: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 16 Sep, 2009 1 commit
    • HWPOISON: Add madvise() based injector for hardware poisoned pages v4 · 9893e49d
      Andi Kleen authored
      
      Impact: optional, useful for debugging
      
      Add a new madvise subcommand to inject poison for some
      pages in a process' address space.  This is useful for
      testing the poison page handling.
      
      This patch can allow root to tie up large amounts of memory.
      I got feedback from container developers and they didn't see any
      problem.
      
      v2: Use write flag for get_user_pages to make sure to always get
      a fresh page
      v3: Don't request write mapping (Fengguang Wu)
      v4: Move MADV_* number to avoid conflict with KSM (Hugh Dickins)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  15. 17 Jun, 2009 2 commits
  16. 13 May, 2009 1 commit
  17. 05 May, 2009 1 commit
    • Ignore madvise(MADV_WILLNEED) for hugetlbfs-backed regions · a425a638
      Mel Gorman authored
      
      madvise(MADV_WILLNEED) forces page cache readahead on a range of memory
      backed by a file.  The assumption is made that the page required is
      order-0 and "normal" page cache.
      
      On hugetlbfs, this assumption is not true and order-0 pages are
      allocated and inserted into the hugetlbfs page cache.  This leaks
      hugetlbfs page reservations and can cause BUGs to trigger related to
      corrupted page tables.
      
      This patch causes MADV_WILLNEED to be ignored for hugetlbfs-backed
      regions.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 14 Jan, 2009 1 commit
  19. 30 Jul, 2008 1 commit
  20. 28 Apr, 2008 1 commit
  21. 16 Jul, 2007 1 commit
  22. 21 May, 2007 1 commit
    • Detach sched.h from mm.h · e8edc6e0
      Alexey Dobriyan authored
      
      The first thing mm.h does is include sched.h, solely for the can_do_mlock() inline
      function, which has a "current" dereference inside. By dealing with can_do_mlock(),
      mm.h can be detached from sched.h, which is good. See below for why.
      
      This patch
      a) removes unconditional inclusion of sched.h from mm.h
      b) makes can_do_mlock() a normal function in mm/mlock.c
      c) exports can_do_mlock() to not break compilation
      d) adds sched.h inclusions back to files that were getting it indirectly.
      e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
         getting them indirectly
      
      Net result is:
      a) mm.h users would get less code to open, read, preprocess, parse, ... if
         they don't need sched.h
      b) sched.h stops being a dependency for a significant number of files:
         on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
         after patch it's only 3744 (-8.3%).
      
      Cross-compile tested on
      
      	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
      	alpha alpha-up
      	arm
      	i386 i386-up i386-defconfig i386-allnoconfig
      	ia64 ia64-up
      	m68k
      	mips
      	parisc parisc-up
      	powerpc powerpc-up
      	s390 s390-up
      	sparc sparc-up
      	sparc64 sparc64-up
      	um-x86_64
      	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
      
      as well as my two usual configs.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 07 May, 2007 1 commit
  24. 29 Mar, 2007 1 commit
    • [PATCH] holepunch: fix mmap_sem i_mutex deadlock · 90ed52eb
      Hugh Dickins authored
      
      sys_madvise has down_write of mmap_sem, then madvise_remove calls
      vmtruncate_range which takes i_mutex and i_alloc_sem: no, we can easily devise
      deadlocks from that ordering.
      
      madvise_remove drops mmap_sem while calling vmtruncate_range: luckily, since
      madvise_remove doesn't split or merge vmas, it's easy to handle this case with
      a NULL prev, without restructuring sys_madvise.  (Though sad to retake
      mmap_sem when it's unlikely to be needed, and certainly down_read is
      sufficient for MADV_REMOVE, unlike the other madvices.)
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  25. 17 Mar, 2007 1 commit
  26. 18 Apr, 2006 1 commit
  27. 15 Feb, 2006 1 commit
    • [PATCH] madvise MADV_DONTFORK/MADV_DOFORK · f8225661
      Michael S. Tsirkin authored
      
      Currently, copy-on-write may change the physical address of a page even if the
      user requested that the page is pinned in memory (either by mlock or by
      get_user_pages).  This happens if the process forks meanwhile, and the parent
      writes to that page.  As a result, the page is orphaned: in case of
      get_user_pages, the application will never see any data hardware DMA's into
      this page after the COW.  In case of mlock'd memory, the parent is not getting
      the realtime/security benefits of mlock.
      
      In particular, this affects the Infiniband modules which do DMA from and into
      user pages all the time.
      
      This patch adds madvise options to control whether a memory range is inherited
      across fork.  Useful e.g.  for when hardware is doing DMA from/into these
      pages.  Could also be useful to an application wanting to speed up its forks
      by cutting large areas out of consideration.
      Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  28. 06 Jan, 2006 1 commit
    • [PATCH] madvise(MADV_REMOVE): remove pages from tmpfs shm backing store · f6b3ec23
      Badari Pulavarty authored
      
      Here is the patch to implement madvise(MADV_REMOVE) - which frees up a
      given range of pages & its associated backing store.  Current
      implementation supports only shmfs/tmpfs and other filesystems return
      -ENOSYS.
      
      "Some app allocates large tmpfs files, then when some task quits and some
      client disconnect, some memory can be released.  However the only way to
      release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli
      
      Databases want to use this feature to drop a section of their bufferpool
      (shared memory segments) - without writing back to disk/swap space.
      
      This feature is also useful for supporting hot-plug memory on UML.
      
      Concerns raised by Andrew Morton:
      
      - "We have no plan for holepunching!  If we _do_ have such a plan (or
        might in the future) then what would the API look like?  I think
        sys_holepunch(fd, start, len), so we should start out with that."
      
      - Using madvise is very weird, because people will ask "why do I need to
        mmap my file before I can stick a hole in it?"
      
      - None of the other madvise operations call into the filesystem in this
        manner.  A broad question is: is this capability an MM operation or a
        filesystem operation?  truncate, for example, is a filesystem operation
        which sometimes has MM side-effects.  madvise is an mm operation and with
        this patch, it gains FS side-effects, only they're really, really
        significant ones."
      
      Comments:
      
      - Andrea suggested the fs operation too but then it's more efficient to
        have it as a mm operation with fs side effects, because they don't
        immediately know fd and physical offset of the range.  It's possible to
        fixup in userland and to use the fs operation but it's more expensive,
        the vmas are already in the kernel and we can use them.
      
      Short term plan & Future Direction:
      
      - We seem to need this interface only for shmfs/tmpfs files in the short
        term.  We have to add hooks into the filesystem for correctness and
        completeness.  This is what this patch does.
      
      - In the future, plan is to support both fs and mmap apis also.  This
        also involves (other) filesystem specific functions to be implemented.
      
      - Current patch doesn't support VM_NONLINEAR - which can be addressed in
        the future.
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrea Arcangeli <andrea@suse.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  29. 28 Nov, 2005 1 commit
    • mm: re-architect the VM_UNPAGED logic · 6aab341e
      Linus Torvalds authored
      
      This replaces the (in my opinion horrible) VM_UNPAGED logic with very
      explicit support for a "remapped page range" aka VM_PFNMAP.  It allows a
      VM area to contain an arbitrary range of page table entries that the VM
      never touches, and never considers to be normal pages.
      
      Any user of "remap_pfn_range()" automatically gets this new
      functionality, and doesn't even have to mark the pages reserved or
      indeed mark them any other way.  It just works.  As a side effect, doing
      mmap() on /dev/mem works for arbitrary ranges.
      
      Sparc update from David in the next commit.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>