1. 23 Mar, 2011 3 commits
  2. 24 Feb, 2011 1 commit
  3. 14 Jan, 2011 1 commit
    • thp: split_huge_page paging · 3f04f62f
      Andrea Arcangeli authored
      
      Paging logic that splits the page before it is unmapped and added to swap
      to ensure backwards compatibility with the legacy swap code.  Eventually
      swap should natively pageout the hugepages to increase performance and
      decrease seeking and fragmentation of swap space.  swapoff can just skip
      over huge pmds, as they cannot be part of swap yet.  In add_to_swap, be
      careful to split the page only once a valid swap entry has been obtained,
      so that hugepages are not split when swap is already full (a sketch of
      this ordering follows this entry).
      
      In theory we could split pages before isolating them during the lru scan,
      but for khugepaged to be safe, I'm relying on either mmap_sem write mode,
      or PG_lock taken, so split_huge_page has to run either with mmap_sem
      read/write mode or PG_lock taken.  Calling it from isolate_lru_page would
      make locking more complicated, in addition to that split_huge_page would
      deadlock if called by __isolate_lru_page because it has to take the lru
      lock to add the tail pages.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3f04f62f
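      A minimal sketch of the add_to_swap() ordering described in the entry
      above.  The function and flag names approximate the mm code of that era
      (get_swap_page(), split_huge_page(), add_to_swap_cache(),
      swapcache_free()); treat the body as illustrative rather than the exact
      kernel source.

      #include <linux/swap.h>
      #include <linux/huge_mm.h>
      #include <linux/page-flags.h>

      /*
       * Sketch: allocate the swap entry first and only then split the
       * hugepage, so a full swap device never causes a needless split.
       */
      static int add_to_swap_sketch(struct page *page)
      {
              swp_entry_t entry;

              entry = get_swap_page();
              if (!entry.val)
                      return 0;       /* swap full: leave the hugepage intact */

              if (PageTransHuge(page)) {
                      if (split_huge_page(page)) {
                              /* split failed: give the entry back */
                              swapcache_free(entry, NULL);
                              return 0;
                      }
              }

              /* from here on, the now-small page is added to the swap cache */
              if (add_to_swap_cache(page, entry,
                                    __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN)) {
                      swapcache_free(entry, NULL);
                      return 0;
              }

              SetPageDirty(page);
              return 1;
      }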
  4. 13 Nov, 2010 1 commit
    • block: make blkdev_get/put() handle exclusive access · e525fd89
      Tejun Heo authored
      
      Over time, the block layer has accumulated a set of APIs dealing with bdev
      open, close, claim and release.
      
      * blkdev_get/put() are the primary open and close functions.
      
      * bd_claim/release() deal with exclusive open.
      
      * open/close_bdev_exclusive() are combinations of open and claim, and
        the other way around, respectively.
      
      * bd_link/unlink_disk_holder() to create and remove holder/slave
        symlinks.
      
      * open_by_devnum() wraps bdget() + blkdev_get().
      
      The interface is a bit confusing and the decoupling of open and claim
      makes it impossible to properly guarantee exclusive access as
      an in-kernel open + claim sequence can disturb an existing exclusive
      open even before the block layer knows the current open is for another
      exclusive access.  Reorganize the interface such that:
      
      * blkdev_get() is extended to include exclusive access management.
        @holder argument is added and, if @FMODE_EXCL is specified, it will
        gain exclusive access atomically w.r.t. other exclusive accesses.
      
      * blkdev_put() is similarly extended.  It now takes @mode argument and
        if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
        the last exclusive claim is released, the holder/slave symlinks are
        removed automatically.
      
      * bd_claim/release() and close_bdev_exclusive() are no longer
        necessary and either made static or removed.
      
      * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
        is no longer necessary and removed.
      
      * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
        and blkdev_get().  It also has an unexpected extra bdev_read_only()
        test which probably should be moved into blkdev_get().
      
      * open_by_devnum() is modified to take @holder argument and pass it to
        blkdev_get().
      
      Most bdev open/close operations are unified into blkdev_get/put() and
      most exclusive accesses are tested atomically at open time (as they
      should be).  This cleans up the code and removes some corner cases
      which, valid or invalid, were unnecessary all the same (the resulting
      usage is sketched after this entry).
      
      open_bdev_exclusive() and open_by_devnum() can use further cleanup -
      rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
      special features.  Well, let's leave them for another day.
      
      Most conversions are straightforward.  The drbd conversion is a bit more
      involved as there was some reordering, but the logic should stay the
      same.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Brown <neilb@suse.de>
      Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: dm-devel@redhat.com
      Cc: drbd-dev@lists.linbit.com
      Cc: Leo Chen <leochen@broadcom.com>
      Cc: Scott Branden <sbranden@broadcom.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Joern Engel <joern@logfs.org>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      e525fd89
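      A hedged sketch of how an in-kernel user might take and drop an
      exclusive claim with the reorganized interface.  The path and holder
      values are placeholders and error handling is simplified; only the
      blkdev_get()/blkdev_put() usage reflects the commit above.

      #include <linux/fs.h>
      #include <linux/blkdev.h>
      #include <linux/err.h>

      /* Sketch only: claim a block device exclusively, then release it. */
      static int claim_bdev_sketch(const char *path, void *holder)
      {
              const fmode_t mode = FMODE_READ | FMODE_WRITE | FMODE_EXCL;
              struct block_device *bdev;
              int ret;

              bdev = lookup_bdev(path);
              if (IS_ERR(bdev))
                      return PTR_ERR(bdev);

              /*
               * With FMODE_EXCL and a holder cookie, blkdev_get() takes the
               * exclusive claim atomically with the open.
               */
              ret = blkdev_get(bdev, mode, holder);
              if (ret)
                      return ret;

              /* ... use the device ... */

              /*
               * blkdev_put() must see the same FMODE_EXCL bit so the claim is
               * dropped; the holder/slave symlinks go away with the last
               * exclusive claim.
               */
              blkdev_put(bdev, mode);
              return 0;
      }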
  5. 26 Oct, 2010 1 commit
  6. 16 Sep, 2010 1 commit
    • block: remove BLKDEV_IFL_WAIT · dd3932ed
      Christoph Hellwig authored
      
      All the blkdev_issue_* helpers can only sanely be used for synchronous
      callers.  To issue cache flushes or barriers asynchronously, the caller needs
      to set up a bio by itself with a completion callback to move the asynchronous
      state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
      specified when calling blkdev_issue_* and also remove the now unused flags
      argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
      blkdev_issue_discard we need to keep it for the secure discard flag, which
      gains a more descriptive name and loses the bitops vs flag confusion
      (the resulting call signatures are sketched after this entry).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      dd3932ed
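      A hedged sketch of the call signatures after this change, as I read the
      commit: blkdev_issue_flush() loses its flags argument and secure discard
      is requested with the dedicated BLKDEV_DISCARD_SECURE flag.  Treat the
      exact prototypes as approximate.

      #include <linux/blkdev.h>
      #include <linux/gfp.h>
      #include <linux/types.h>

      /* Sketch: synchronous flush and discard after the flag cleanup. */
      static int flush_and_discard_sketch(struct block_device *bdev,
                                          sector_t sector, sector_t nr_sects,
                                          bool secure)
      {
              int ret;

              /* No flags argument any more; the call always waits. */
              ret = blkdev_issue_flush(bdev, GFP_KERNEL, NULL);
              if (ret)
                      return ret;

              /* The only remaining flag selects a secure discard. */
              return blkdev_issue_discard(bdev, sector, nr_sects, GFP_KERNEL,
                                          secure ? BLKDEV_DISCARD_SECURE : 0);
      }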
  7. 10 Sep, 2010 5 commits
    • swap: do not send discards as barriers · 349f429e
      Christoph Hellwig authored
      
      The swap code already uses synchronous discards, so there is no need to
      add I/O barriers (the call is sketched after this entry).
      
      tj: superfluous newlines removed.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Hugh Dickins <hughd@google.com>
      Tested-by: Nigel Cunningham <nigel@tuxonice.net>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      349f429e
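      A hedged sketch of the swap discard call this pair of commits touches,
      using the BLKDEV_IFL_* flags that existed at the time (they are removed
      again by the "remove BLKDEV_IFL_WAIT" entry above); the helper name is
      illustrative.

      #include <linux/blkdev.h>
      #include <linux/gfp.h>

      /* Sketch: discard a range of the swap device without a barrier. */
      static int discard_swap_range_sketch(struct block_device *bdev,
                                           sector_t start, sector_t nr)
      {
              /*
               * Before this change the swap code passed
               * BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER; the barrier bit is
               * dropped because the discard is already issued and awaited
               * synchronously.
               */
              return blkdev_issue_discard(bdev, start, nr, GFP_KERNEL,
                                          BLKDEV_IFL_WAIT);
      }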
    • swap: discard while swapping only if SWAP_FLAG_DISCARD · 33994466
      Hugh Dickins authored
      
      Tests with recent firmware on Intel X25-M 80GB and OCZ Vertex 60GB SSDs
      show a shift since I last tested in December: in part because of firmware
      updates, in part because of the necessary move from barriers to awaiting
      completion at the block layer.  While discard at swapon still shows as
      slightly beneficial on both, discarding the 1MB swap cluster when allocating
      is now disadvantageous: it adds 25% overhead on Intel and 230% on OCZ (YMMV).
      
      Surrender: discard as presently implemented is more hindrance than help
      for swap; but might prove useful on other devices, or with improvements.
      So continue to do the discard at swapon, but make discard while swapping
      conditional on a SWAP_FLAG_DISCARD to sys_swapon() (which has been using
      only the lower 16 bits of int flags).
      
      We can add a --discard or -d to swapon(8), and a "discard" to swap in
      /etc/fstab, matching the mount option for btrfs, ext4, fat, gfs2 and
      nilfs2 (a userspace sketch follows this entry).
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Nigel Cunningham <nigel@tuxonice.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      33994466
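      A hedged userspace sketch of opting in to discard-while-swapping after
      this change.  The SWAP_FLAG_DISCARD value is assumed to match the
      kernel header (the guard is there because older libc headers may not
      carry it), and the device path is a placeholder.

      #include <stdio.h>
      #include <sys/swap.h>

      #ifndef SWAP_FLAG_DISCARD
      #define SWAP_FLAG_DISCARD 0x10000   /* assumed kernel value */
      #endif

      int main(void)
      {
              /* placeholder path; requires root and a prepared swap area */
              if (swapon("/dev/sdb2", SWAP_FLAG_DISCARD) != 0) {
                      perror("swapon");
                      return 1;
              }
              return 0;
      }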
    • swap: do not send discards as barriers · 8f2ae0fa
      Christoph Hellwig authored
      
      The swap code already uses synchronous discards, no need to add I/O
      barriers.
      
      This fixes the worst of the terrible slowdown in swap allocation for
      hibernation, reported on 2.6.35 by Nigel Cunningham; but does not entirely
      eliminate that regression.
      
      [tj@kernel.org: superfluous newlines removed]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Nigel Cunningham <nigel@tuxonice.net>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f2ae0fa
    • swap: prevent reuse during hibernation · b73d7fce
      Hugh Dickins authored
      
      Move the hibernation check from scan_swap_map() into try_to_free_swap():
      to catch not only the common case when hibernation's allocation itself
      triggers swap reuse, but also the less likely case when concurrent page
      reclaim (shrink_page_list) happens to call try_to_free_swap() on a page.
      
      Hibernation already clears __GFP_IO from the gfp_allowed_mask to stop
      reclaim from going to swap: check that to prevent swap reuse too (the
      check is sketched after this entry).
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Ondrej Zary <linux@rainbow-software.org>
      Cc: Andrea Gelmini <andrea.gelmini@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Nigel Cunningham <nigel@tuxonice.net>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b73d7fce
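      A minimal sketch of the check described above, placed at the top of a
      try_to_free_swap()-style function.  gfp_allowed_mask and __GFP_IO are
      real kernel symbols; the rest of the function body is elided and the
      name carries a _sketch suffix to mark it as illustrative.

      #include <linux/gfp.h>
      #include <linux/page-flags.h>
      #include <linux/swap.h>

      /* Sketch: refuse to free swap while hibernation has masked __GFP_IO. */
      static int try_to_free_swap_sketch(struct page *page)
      {
              if (!PageSwapCache(page))
                      return 0;
              if (PageWriteback(page))
                      return 0;

              /*
               * Hibernation clears __GFP_IO from gfp_allowed_mask to keep
               * reclaim away from swap; honour the same signal here so swap
               * freed now cannot be reallocated under the image being written.
               */
              if (!(gfp_allowed_mask & __GFP_IO))
                      return 0;

              /* ... normal path: check the swap count, delete from swap cache ... */
              return 1;
      }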
    • swap: revert special hibernation allocation · 910321ea
      Hugh Dickins authored
      
      Please revert 2.6.36-rc commit d2997b10 "hibernation: freeze swap at
      hibernation".  It complicated matters by
      adding a second swap allocation path, just for hibernation; without in any
      way fixing the issue that it was intended to address - page reclaim after
      fixing the hibernation image might free swap from a page already imaged as
      swapcache, letting its swap be reallocated to store a different page of
      the image: resulting in data corruption if the imaged page were freed as
      clean then swapped back in.  Pages freed to si->swap_map were still in
      danger of being reallocated by the alternative allocation path.
      
      I guess it inadvertently fixed slow SSD swap allocation for hibernation,
      as reported by Nigel Cunningham: by missing out the discards that occur on
      the usual swap allocation path; but that was unintentional, and needs a
      separate fix.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Ondrej Zary <linux@rainbow-software.org>
      Cc: Andrea Gelmini <andrea.gelmini@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Nigel Cunningham <nigel@tuxonice.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      910321ea
  8. 10 Aug, 2010 2 commits
  9. 18 May, 2010 2 commits
  10. 28 Apr, 2010 1 commit
  11. 12 Mar, 2010 1 commit
    • memcg: move charges of anonymous swap · 02491447
      Daisuke Nishimura authored
      
      This patch is another core part of this move-charge-at-task-migration
      feature.  It enables moving charges of anonymous swaps.
      
      To move the charge of swap, we need to exchange swap_cgroup's record.
      
      In current implementation, swap_cgroup's record is protected by:
      
        - page lock: if the entry is on swap cache.
        - swap_lock: if the entry is not on swap cache.
      
      This works well in usual swap-in/out activity.
      
      But this behavior makes the move-swap-charge feature check many
      conditions to exchange swap_cgroup's record safely.
      
      So I changed the modification of swap_cgroup's record (swap_cgroup_record())
      to use xchg, and defined a new function to cmpxchg swap_cgroup's record
      (an illustrative sketch follows this entry).
      
      This patch also enables moving the charge of swap caches that are not
      pte_present but not yet uncharged, which can exist on the swap-out path,
      by getting the target pages via find_get_page() as do_mincore() does.
      
      [kosaki.motohiro@jp.fujitsu.com: fix ia64 build]
      [akpm@linux-foundation.org: fix typos]
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      02491447
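      A hedged sketch of the cmpxchg-style record exchange the message
      describes.  The record layout and helper names below are illustrative,
      not the actual swap_cgroup implementation.

      #include <linux/atomic.h>
      #include <linux/types.h>

      /* Illustrative record: one 16-bit memcg id per swap entry slot. */
      struct swap_record_sketch {
              atomic_t id;    /* holds an unsigned short id in this sketch */
      };

      /*
       * Atomically hand a swap slot's charge to @new, but only if it is still
       * owned by @old -- the cmpxchg-style primitive the commit adds so
       * move-charge does not have to re-check many conditions under locks.
       * Returns the previous id, so the caller can tell whether it won.
       */
      static unsigned short
      swap_record_cmpxchg_sketch(struct swap_record_sketch *rec,
                                 unsigned short old, unsigned short new)
      {
              return (unsigned short)atomic_cmpxchg(&rec->id, old, new);
      }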
  12. 06 Mar, 2010 4 commits
    • mm: add comment on swap_duplicate's error code · 08259d58
      Hugh Dickins authored
      
      swap_duplicate()'s loop appears to miss out on returning the error code
      from __swap_duplicate(), except when that's -ENOMEM.  In fact this is
      intentional: prior to -ENOMEM for swap_count_continuation,
      swap_duplicate() was void (and the case only occurs when copy_one_pte()
      hits a corrupt pte).  But that's surprising behaviour, which certainly
      deserves a comment (the loop is sketched after this entry).
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reported-by: Huang Shijie <shijie8@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      08259d58
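      The behaviour being documented is easiest to see in a short sketch of
      the loop: only -ENOMEM from __swap_duplicate() is retried (and can be
      returned); any other error falls out of the loop with 0.  Helper names
      follow the message; treat the body as approximate.

      #include <linux/swap.h>
      #include <linux/errno.h>
      #include <linux/gfp.h>

      /* Repeated here only to make the sketch self-contained; in the real
       * code __swap_duplicate() is file-local to mm/swapfile.c. */
      extern int __swap_duplicate(swp_entry_t entry, unsigned char usage);

      /*
       * Sketch of swap_duplicate()'s loop.  A non -ENOMEM failure from
       * __swap_duplicate() -- e.g. copy_one_pte() hitting a corrupt pte --
       * ends the loop with err still 0: the surprising but intentional
       * behaviour the commit adds a comment for.
       */
      static int swap_duplicate_sketch(swp_entry_t entry)
      {
              int err = 0;

              while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
                      err = add_swap_count_continuation(entry, GFP_ATOMIC);
              return err;
      }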
    • mm/swapfile.c: fix swapon size off-by-one · ad2bd7e0
      Hugh Dickins authored
      
      There's an off-by-one disagreement between mkswap and swapon about the
      meaning of swap_header last_page: mkswap (in all versions I've looked at:
      util-linux-ng and BusyBox and old util-linux; probably as far back as
      1999) consistently means the offset (in page units) of the last page of
      the swap area, whereas kernel sys_swapon (as far back as 2.2 and 2.3)
      strangely takes it to mean the size (in page units) of the swap area.
      
      This disagreement is the safe way round; but it's worrying people, and
      loses us one page of swap.
      
      The fix is not just to add one to nr_good_pages: we need to get maxpages
      (the size of the swap_map array) right before that; and though that is an
      unsigned long, be careful not to overflow the unsigned int p->max which
      later holds it (probably why the header uses __u32 last_page instead of
      a size).  The arithmetic is sketched after this entry.
      
      Why did we subtract one from the maximum swp_offset to calculate
      maxpages?  Though it was probably me who made that change in 2.4.10, I
      don't get it: and now we should be adding one (without risk of overflow
      in this case).
      
      Fix the handling of swap_header badpages: it could have overrun the
      swap_map when a very large swap area is used on a more limited architecture.
      
      Remove pre-initializations of swap_header, nr_good_pages and maxpages:
      those date from when sys_swapon was supporting other versions of header.
      Reported-by: Nitin Gupta <ngupta@vflare.org>
      Reported-by: Jarkko Lavinen <jarkko.lavinen@nokia.com>
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ad2bd7e0
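      A small hedged sketch of the arithmetic the fix settles on: last_page
      is the index of the last usable page, so the usable size is
      last_page + 1, clamped to the largest offset a swap entry can encode
      and to the unsigned int that later stores it.  Variable and helper
      names are illustrative.

      #include <linux/swap.h>
      #include <linux/swapops.h>
      #include <linux/kernel.h>

      /* Sketch: derive the usable swap size from the header's last_page. */
      static unsigned int swap_size_from_header_sketch(unsigned long last_page)
      {
              unsigned long maxpages;

              /*
               * Round-trip an all-ones offset through a swap pte to find the
               * largest offset the pte can hold; the count of representable
               * pages is that index plus one.
               */
              maxpages = swp_offset(pte_to_swp_entry(
                              swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;

              /* last_page is an index; the area holds last_page + 1 pages */
              if (last_page + 1 < maxpages)
                      maxpages = last_page + 1;

              /* the result must still fit the unsigned int that stores it */
              if (maxpages > UINT_MAX)
                      maxpages = UINT_MAX;
              return maxpages;
      }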
    • mm: count swap usage · b084d435
      KAMEZAWA Hiroyuki authored
      
      A frequent question from users about memory management is how many swap
      entries are used by each process.  This information also gives some
      hints to the oom-killer.
      
      Although the number of swap entries per process can be counted by
      scanning /proc/<pid>/smaps, this is very slow and not suitable for the
      usual process-information handlers such as 'ps' or 'top' (which are
      already slow enough).
      
      This patch adds a swap-entry counter to mm_counter and updates it at
      each swap event (the accounting points are sketched after this entry).
      The information is exported via the /proc/<pid>/status file as
      
      [kamezawa@bluextal memory]$ cat /proc/self/status
      Name:   cat
      State:  R (running)
      Tgid:   2910
      Pid:    2910
      PPid:   2823
      TracerPid:      0
      Uid:    500     500     500     500
      Gid:    500     500     500     500
      FDSize: 256
      Groups: 500
      VmPeak:    82696 kB
      VmSize:    82696 kB
      VmLck:         0 kB
      VmHWM:       432 kB
      VmRSS:       432 kB
      VmData:      172 kB
      VmStk:        84 kB
      VmExe:        48 kB
      VmLib:      1568 kB
      VmPTE:        40 kB
      VmSwap:        0 kB <=============== this.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b084d435
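      A hedged sketch of where the new counter is bumped, using the mm
      counter helpers; MM_SWAPENTS is what this patch adds, and the
      surrounding fault/unmap logic is elided.

      #include <linux/mm.h>

      /*
       * Sketch: when an anonymous pte is replaced by a swap entry the page
       * leaves the anon RSS and becomes a swap entry owned by this mm; the
       * reverse happens at swap-in.  VmSwap in /proc/<pid>/status reports
       * the MM_SWAPENTS counter.
       */
      static void account_swap_out_sketch(struct mm_struct *mm)
      {
              dec_mm_counter(mm, MM_ANONPAGES);
              inc_mm_counter(mm, MM_SWAPENTS);
      }

      static void account_swap_in_sketch(struct mm_struct *mm)
      {
              dec_mm_counter(mm, MM_SWAPENTS);
              inc_mm_counter(mm, MM_ANONPAGES);
      }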
    • mm: clean up mm_counter · d559db08
      KAMEZAWA Hiroyuki authored
      
      Presently, the per-mm statistics counters are defined by macros in
      sched.h.
      
      This patch modifies them to be
        - defined in mm.h as inline functions
        - backed by an array instead of macro name generation.
      
      This reduces the size of a future patch that changes the implementation
      of the per-mm counters (the resulting shape is sketched after this
      entry).
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d559db08
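      A hedged sketch of the array-plus-inline-function shape the patch moves
      to; the struct and helpers below mirror the idea (with _SKETCH/_sketch
      names) rather than quote the kernel source.

      #include <linux/atomic.h>

      /* Sketch: per-mm counters kept in an array indexed by an enum. */
      enum mm_counter_item_sketch {
              MM_FILEPAGES_SKETCH,
              MM_ANONPAGES_SKETCH,
              NR_MM_COUNTERS_SKETCH,
      };

      struct mm_rss_stat_sketch {
              atomic_long_t count[NR_MM_COUNTERS_SKETCH];
      };

      /* one inline helper replaces the per-name macros from sched.h */
      static inline void add_mm_counter_sketch(struct mm_rss_stat_sketch *rss,
                                               int member, long value)
      {
              atomic_long_add(value, &rss->count[member]);
      }

      static inline unsigned long get_mm_counter_sketch(struct mm_rss_stat_sketch *rss,
                                                        int member)
      {
              long v = atomic_long_read(&rss->count[member]);

              return v > 0 ? (unsigned long)v : 0;
      }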
  13. 15 Dec, 2009 11 commits
  14. 02 Nov, 2009 1 commit
  15. 01 Oct, 2009 1 commit
  16. 22 Sep, 2009 1 commit
  17. 16 Sep, 2009 1 commit
    • HWPOISON: Add support for poison swap entries v2 · a7420aa5
      Andi Kleen authored
      
      Memory migration uses special swap entry types to trigger special actions on
      page faults. Extend this mechanism to also support poisoned swap entries, to
      trigger poison handling on page faults. This allows follow-on patches to
      prevent processes from faulting in poisoned pages again (the entry
      helpers are sketched after this entry).
      
      v2: Fix overflow in MAX_SWAPFILES (Fengguang Wu)
      v3: Better overflow fix (Hidehiro Kawai)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      a7420aa5
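      A hedged sketch of what a poison swap entry looks like, following the
      pattern migration entries already use.  SWP_HWPOISON, swp_entry(),
      swp_type() and page_to_pfn() are real kernel names (SWP_HWPOISON
      depends on CONFIG_MEMORY_FAILURE); the helpers carry a _sketch suffix
      to mark them as illustrative.

      #include <linux/swap.h>
      #include <linux/swapops.h>
      #include <linux/mm.h>

      /*
       * Like a migration entry, a poison entry is a non-present pte whose
       * "type" field selects special fault handling instead of a real swap
       * device.
       */
      static inline swp_entry_t make_hwpoison_entry_sketch(struct page *page)
      {
              return swp_entry(SWP_HWPOISON, page_to_pfn(page));
      }

      static inline int is_hwpoison_entry_sketch(swp_entry_t entry)
      {
              return swp_type(entry) == SWP_HWPOISON;
      }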
  18. 14 Sep, 2009 1 commit
    • block: use blkdev_issue_discard in blk_ioctl_discard · 746cd1e7
      Christoph Hellwig authored
      
      blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard;
      the only differences between the two are that blkdev_issue_discard needs to
      send a barrier discard request and blk_ioctl_discard a non-barrier one,
      and that blk_ioctl_discard needs to wait on the request.  To facilitate this,
      add a flags argument to blkdev_issue_discard to control both aspects of the
      behaviour (the two variants are sketched after this entry).  This will be
      very useful later on for using the waiting functionality for other callers.
      
      Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      746cd1e7
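      A hedged sketch of the two behaviours selected by the new flags
      argument.  As I recall the flags at this point were named
      DISCARD_FL_WAIT and DISCARD_FL_BARRIER, but treat those identifiers
      (and the helper name) as approximate.

      #include <linux/blkdev.h>
      #include <linux/gfp.h>

      /* Sketch: one helper, two behaviours selected by flags. */
      static int discard_variants_sketch(struct block_device *bdev,
                                         sector_t start, sector_t nr)
      {
              int ret;

              /* blk_ioctl_discard()-style: non-barrier, wait for completion */
              ret = blkdev_issue_discard(bdev, start, nr, GFP_KERNEL,
                                         DISCARD_FL_WAIT);
              if (ret)
                      return ret;

              /* in-kernel callers of the time: barrier discard, no waiting */
              return blkdev_issue_discard(bdev, start, nr, GFP_KERNEL,
                                          DISCARD_FL_BARRIER);
      }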
  19. 29 Jul, 2009 1 commit