1. 03 Feb, 2020 1 commit
    • fs: Enable bmap() function to properly return errors · 30460e1e
      Carlos Maiolino authored
      
      Currently, bmap() returns either the physical block number mapped to
      the requested file offset, or 0 when an error occurs or the requested
      offset maps into a hole.
      This patch makes the changes needed for bmap() to properly return
      errors: the return value now carries the error, and a pointer must be
      passed to bmap() to be filled with the mapped physical block.
      
      This changes what bmap() returns:
      
      - a negative value in case of error
      - zero on success, or when the mapping falls into a hole
      
      In the hole case, *block will be zero as well.
      
      Since this is a preparation patch, for now the only error returned is
      -EINVAL, when no ->bmap method exists.
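      As a minimal sketch (not part of the patch; the helper name is
      hypothetical), a caller adapted to the new convention looks like:
      
        static int map_file_block(struct inode *inode, sector_t *blockp)
        {
                int error;
      
                /* *blockp holds the file-relative block on entry */
                error = bmap(inode, blockp);
                if (error)
                        return error;    /* for now: -EINVAL, no ->bmap method */
                if (*blockp == 0)
                        return -ENODATA; /* hypothetical policy: a hole */
                return 0;
        }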
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 01 Dec, 2019 1 commit
  3. 16 Nov, 2019 1 commit
  4. 12 Jul, 2019 1 commit
    • mm, swap: use rbtree for swap_extent · 4efaceb1
      Aaron Lu authored
      swap_extent is used to map swap page offset to backing device's block
      offset.  For a continuous block range, one swap_extent is used and all
      these swap_extents are managed in a linked list.
      
      These swap_extents are used by map_swap_entry() during swap's read and
      write path.  To find out the backing device's block offset for a page
      offset, the swap_extent list will be traversed linearly, with
      curr_swap_extent being used as a cache to speed up the search.
      
      This works well as long as there are not too many swap_extents and few
      processes access the swap device, but when the swap device has many
      extents and a number of processes access it concurrently, it becomes a
      problem.  On one of our servers, the disk's free space was tight:
      
        $df -h
        Filesystem      Size  Used Avail Use% Mounted on
        ... ...
        /dev/nvme0n1p1  1.8T  1.3T  504G  72% /home/t4
      
      Creating an 80G swapfile there produces as many as 84656 swap extents.
      The end result is that the kernel spends about 30% of its time in
      map_swap_entry() and swap throughput is only 70MB/s.
      
      For comparison, with a smaller 4G swapfile, whose swap_extent count
      drops to about 2000, swap throughput is back to 400-500MB/s and
      map_swap_entry() accounts for about 3%.
      
      One downside of using an rbtree for swap_extent is that 'struct
      rb_node' takes 24 bytes while 'struct list_head' takes 16 bytes, i.e.
      8 more bytes per swap_extent.  For a swapfile with 80k swap_extents,
      that means 625KiB more memory consumed.
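      
      As an illustration, here is a sketch of the rbtree lookup that
      replaces the linear list walk (field names follow the existing
      swap_extent layout but should be treated as assumptions):
      
        static struct swap_extent *
        offset_to_swap_extent(struct rb_root *root, pgoff_t offset)
        {
                struct rb_node *rb = root->rb_node;
                struct swap_extent *se;
      
                while (rb) {
                        se = rb_entry(rb, struct swap_extent, rb_node);
                        if (offset < se->start_page)
                                rb = rb->rb_left;
                        else if (offset >= se->start_page + se->nr_pages)
                                rb = rb->rb_right;
                        else
                                return se;  /* offset is inside this extent */
                }
                return NULL;  /* every offset should be covered by an extent */
        }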
      
      Test:
      
      Since it's not possible to reboot that server, I could not test this
      patch directly there.  Instead, I tested it on another server with an
      NVMe disk.
      
      I created a 20G swapfile on an NVMe backed XFS fs.  By default, the
      filesystem is quite clean and the created swapfile has only 2 extents.
      Testing vanilla and this patch shows no obvious performance difference
      when swapfile is not fragmented.
      
      To see the patch's effect, I manually fragmented the swapfile by
      breaking its extents at 1M boundaries.  This gave the swapfile 20K
      extents.
      
        nr_task=4
        kernel   swapout(KB/s) map_swap_entry(perf)  swapin(KB/s) map_swap_entry(perf)
        vanilla  165191           90.77%             171798          90.21%
        patched  858993 +420%      2.16%             715827 +317%     0.77%
      
        nr_task=8
        kernel   swapout(KB/s) map_swap_entry(perf)  swapin(KB/s) map_swap_entry(perf)
        vanilla  306783           92.19%             318145          87.76%
        patched  954437 +211%      2.35%            1073741 +237%     1.57%
      
      swapout: swap-out throughput, in KB/s; higher is better
      1st map_swap_entry: CPU cycle percentage sampled by perf during swap out
      swapin: swap-in throughput, in KB/s; higher is better
      2nd map_swap_entry: CPU cycle percentage sampled by perf during swap in
      
      nr_task=1 shows no difference, because curr_swap_extent effectively
      caches the correct swap extent for a single-task workload.
      
      [akpm@linux-foundation.org: s/BUG_ON(1)/BUG()/]
      Link: http://lkml.kernel.org/r/20190523142404.GA181@aaronlu
      
      Signed-off-by: Aaron Lu <ziqian.lzq@antfin.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 05 Jul, 2019 1 commit
  6. 29 Jun, 2019 1 commit
    • mm, swap: fix THP swap out · 1a5f439c
      Huang Ying authored
      The 0-Day test system reported some OOM regressions for several THP
      (Transparent Huge Page) swap test cases.  These regressions were
      bisected to 68614289 ("block: always define BIO_MAX_PAGES as 256").
      In that commit, BIO_MAX_PAGES is set to 256 even when THP swap is
      enabled, so the bio_alloc(gfp_flags, 512) in get_swap_bio() may fail
      when swapping out a THP.  That causes the OOM.
      
      As the patch description of 68614289 ("block: always define
      BIO_MAX_PAGES as 256") says, THP swap should use a multi-page bvec to
      write a THP to swap space.  So fix the issue by doing exactly that in
      get_swap_bio().
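      
      A sketch of that change in get_swap_bio() (simplified; it assumes
      multi-page bvec support is in place, so one bvec can cover the whole
      THP):
      
        bio = bio_alloc(gfp_flags, 1);  /* room for a single bvec */
        if (bio) {
                bio->bi_iter.bi_sector = map_swap_page(page, &bdev);
                /* one multi-page bvec describes the entire THP */
                bio_add_page(bio, page, PAGE_SIZE * hpage_nr_pages(page), 0);
                bio->bi_end_io = end_io;
        }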
      
      BTW: I remember checking the THP swap code when 68614289 ("block:
      always define BIO_MAX_PAGES as 256") was merged, and thought the THP
      swap code needn't be changed.  But apparently, I was wrong.  I should
      have done this at that time.
      
      Link: http://lkml.kernel.org/r/20190624075515.31040-1-ying.huang@intel.com
      Fixes: 68614289 ("block: always define BIO_MAX_PAGES as 256")
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 04 Jan, 2019 1 commit
  8. 02 Jan, 2019 1 commit
    • block: don't use un-ordered __set_current_state(TASK_UNINTERRUPTIBLE) · 1ac5cd49
      Linus Torvalds authored
      This mostly reverts commit 849a3700 ("block: avoid ordered task state
      change for polled IO").  It was wrongly claiming that the ordering
      wasn't necessary.  The memory barrier _is_ necessary.
      
      If something is truly polling and not going to sleep, it's the whole
      state setting that is unnecessary, not the memory barrier.  Whenever you
      set your state to a sleeping state, you absolutely need the memory
      barrier.
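      
      For reference, this is the canonical sleep pattern in question:
      
        set_current_state(TASK_UNINTERRUPTIBLE); /* store + memory barrier */
        if (!condition)                          /* checked after the store */
                schedule();
        __set_current_state(TASK_RUNNING);       /* no ordering needed here */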
      
      Note that sometimes the memory barrier can be elsewhere.  For example,
      the ordering might be provided by an external lock, or by setting the
      process state to sleeping before adding yourself to the wait queue list
      that is used for waking up (where the wait queue lock itself will
      guarantee that any wakeup will correctly see the sleeping state).
      
      But none of those cases were true here.
      
      NOTE! Some of the polling paths may indeed be able to drop the state
      setting entirely, at which point the memory barrier also goes away.
      
      (Also note that this doesn't revert the TASK_RUNNING cases: there is no
      race between a wakeup and setting the process state to TASK_RUNNING,
      since the end result doesn't depend on ordering).
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 08 Dec, 2018 1 commit
  10. 26 Nov, 2018 1 commit
    • block: make blk_poll() take a parameter on whether to spin or not · 0a1b8b87
      Jens Axboe authored
      
      blk_poll() has always kept spinning until it found an IO. This is
      fine for SYNC polling, since we need to find one request we have
      pending, but in preparation for ASYNC polling it can be beneficial
      to just check if we have any entries available or not.
      
      Existing callers are converted to pass in 'spin == true', to retain
      the old behavior.
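      
      A sketch of the two uses after this change (queue and cookie are
      placeholders):
      
        found = blk_poll(q, cookie, true);   /* SYNC: spin until an IO is found */
        found = blk_poll(q, cookie, false);  /* ASYNC: check once and return */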
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 19 Nov, 2018 1 commit
  12. 16 Nov, 2018 1 commit
  13. 02 Nov, 2018 1 commit
  14. 26 Oct, 2018 1 commit
  15. 23 Oct, 2018 1 commit
    • iov_iter: Separate type from direction and use accessor functions · aa563d7b
      David Howells authored
      
      In the iov_iter struct, separate the iterator type from the iterator
      direction and use accessor functions to access them in most places.
      
      Convert a bunch of places to use switch-statements to access them
      rather than chains of bitwise-AND statements.  This makes it easier to
      add further iterator types.  It can also be more efficient: to
      implement a switch over small contiguous integers, the compiler needs
      ~50% fewer compare instructions than it does bitwise-AND instructions.
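      
      For example, a dispatch in the new style might look like this sketch
      (not a specific call site):
      
        switch (iov_iter_type(iter)) {
        case ITER_IOVEC:
        case ITER_KVEC:
                /* user/kernel vector backed iterator */
                break;
        case ITER_BVEC:
                /* bio_vec backed iterator */
                break;
        case ITER_PIPE:
                /* pipe backed iterator */
                break;
        }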
      
      Further, cease passing the iterator type into the iterator setup function.
      The iterator function can set that itself.  Only the direction is required.
      Signed-off-by: David Howells <dhowells@redhat.com>
  16. 22 Sep, 2018 1 commit
  17. 09 Jul, 2018 2 commits
  18. 06 Jan, 2018 1 commit
  19. 16 Nov, 2017 1 commit
  20. 03 Nov, 2017 1 commit
  21. 02 Nov, 2017 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the
      'GPL-2.0' SPDX license identifier.  The SPDX identifier is a legally
      binding shorthand, which can be used instead of the full boilerplate
      text.
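      
      For a C source file, the added identifier is simply the first line of
      the file:
      
        // SPDX-License-Identifier: GPL-2.0
      
      Headers and files that cannot use C++-style comments carry the same
      identifier inside a /* ... */ comment.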
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset
      of the use cases:
       - the file had no licensing information in it,
       - the file was a */uapi/* one with no licensing information in it,
       - the file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard...
  22. 07 Sep, 2017 1 commit
    • mm: test code to write THP to swap device as a whole · 225311a4
      Huang Ying authored
      To support delayed splitting of a THP (Transparent Huge Page) after it
      is swapped out, we need to enhance the swap writing code to support
      writing a THP as a whole.  This will improve swap write IO
      performance.
      
      As Ming Lei <ming.lei@redhat.com> pointed out, this should be based on
      multipage bvec support, which hasn't been merged yet.  So this patch
      is only for testing the functionality of the other patches in the
      series, and will be reimplemented after multipage bvec support is
      merged.
      
      Link: http://lkml.kernel.org/r/20170724051840.2309-7-ying.huang@intel.com
      
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Vishal L Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 23 Aug, 2017 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig authored
      
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue, and is usually only available when the block device
      node is open.  Other callers need to explicitly create one (e.g. the
      lightnvm passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
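      
      As a sketch, the association a bio carries after this change (the
      bio_set_dev() helper fills both fields from a block_device where one
      exists):
      
        bio->bi_disk = disk;      /* the gendisk: one per block device */
        bio->bi_partno = partno;  /* partition index, used for remapping
                                   * in generic_make_request */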
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  24. 03 Aug, 2017 1 commit
    • mm/page_io.c: fix oops during block io poll in swapin path · b0ba2d0f
      Tetsuo Handa authored
      When a thread is OOM-killed during a swap_readpage() operation, an
      oops occurs because end_swap_bio_read() calls wake_up_process() on the
      assumption that the thread which called swap_readpage() is still
      alive.
      
        Out of memory: Kill process 525 (polkitd) score 0 or sacrifice child
        Killed process 525 (polkitd) total-vm:528128kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
        oom_reaper: reaped process 525 (polkitd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
        general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
        Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp ppdev pcspkr vmw_balloon sg shpchp vmw_vmci parport_pc parport i2c_piix4 ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi vmwgfx ahci libahci drm_kms_helper ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi scsi_transport_spi ttm e1000 mptscsih drm mptbase i2c_core libata serio_raw
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0-rc2-next-20170725 #129
        Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
        task: ffffffffb7c16500 task.stack: ffffffffb7c00000
        RIP: 0010:__lock_acquire+0x151/0x12f0
        Call Trace:
         <IRQ>
         lock_acquire+0x59/0x80
         _raw_spin_lock_irqsave+0x3b/0x4f
         try_to_wake_up+0x3b/0x410
         wake_up_process+0x10/0x20
         end_swap_bio_read+0x6f/0xf0
         bio_endio+0x92/0xb0
         blk_update_request+0x88/0x270
         scsi_end_request+0x32/0x1c0
         scsi_io_completion+0x209/0x680
         scsi_finish_command+0xd4/0x120
         scsi_softirq_done+0x120/0x140
         __blk_mq_complete_request_remote+0xe/0x10
         flush_smp_call_function_queue+0x51/0x120
         generic_smp_call_function_single_interrupt+0xe/0x20
         smp_trace_call_function_single_interrupt+0x22/0x30
         smp_call_function_single_interrupt+0x9/0x10
         call_function_single_interrupt+0xa7/0xb0
         </IRQ>
        RIP: 0010:native_safe_halt+0x6/0x10
         default_idle+0xe/0x20
         arch_cpu_idle+0xa/0x10
         default_idle_call+0x1e/0x30
         do_idle+0x187/0x200
         cpu_startup_entry+0x6e/0x70
         rest_init+0xd0/0xe0
         start_kernel+0x456/0x477
         x86_64_start_reservations+0x24/0x26
         x86_64_start_kernel+0xf7/0x11a
         secondary_startup_64+0xa5/0xa5
        Code: c3 49 81 3f 20 9e 0b b8 41 bc 00 00 00 00 44 0f 45 e2 83 fe 01 0f 87 62 ff ff ff 89 f0 49 8b 44 c7 08 48 85 c0 0f 84 52 ff ff ff <f0> ff 80 98 01 00 00 8b 3d 5a 49 c4 01 45 8b b3 18 0c 00 00 85
        RIP: __lock_acquire+0x151/0x12f0 RSP: ffffa01f39e03c50
        ---[ end trace 6c441db499169b1e ]---
        Kernel panic - not syncing: Fatal exception in interrupt
        Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
        ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      
      Fix it by holding a reference to the thread.
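      
      A simplified sketch of the fix, pairing swap_readpage() with
      end_swap_bio_read():
      
        /* at submit time: pin the task so it cannot go away under us */
        bio->bi_private = get_task_struct(current);
        ...
        /* in end_swap_bio_read(): */
        struct task_struct *waiter = bio->bi_private;
      
        wake_up_process(waiter);
        put_task_struct(waiter);  /* drop the reference taken at submit */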
      
      [akpm@linux-foundation.org: add comment]
      Fixes: 23955622 ("swap: add block io poll in swapin path")
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: Shaohua Li <shli@fb.com>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  25. 10 Jul, 2017 1 commit
    • swap: add block io poll in swapin path · 23955622
      Shaohua Li authored
      For fast flash disks, async IO can introduce overhead because of
      context switches.  block-mq now supports IO poll, which improves
      performance and latency a lot.  swapin is a good place to use this
      technique, because the task is waiting for the swapped-in page to
      continue execution.
      
      In my virtual machine, directly reading 4k of data from an NVMe device
      with iopoll is about 60% faster than without poll.  With iopoll
      support in the swapin path, my microbenchmark (a task doing random
      memory writes) is about 10%~25% faster.  CPU utilization increases a
      lot though, 2x and even 3x.  This will depend on disk speed.
      
      While iopoll in swapin isn't intended for all use cases, it's a win
      for latency-sensitive workloads with a high-speed swap disk.  The
      block layer has a knob to control polling at runtime; if poll isn't
      enabled there, there should be no noticeable change in swapin.
      
      I got a chance to run the same test on an NVMe device with DRAM as the
      media.  In a simple fio IO test, blkpoll boosts performance by 50% in
      the single-thread test and by ~20% in the 8-thread test; that is the
      baseline.  In the swap test above, blkpoll boosts performance by ~27%
      in the single-thread test, though it uses 2x the CPU time.
      
      If we enable hybrid polling, the performance gain drops very slightly
      but CPU time is only 50% worse than without blkpoll, and adjusting the
      hybrid poll parameter reduces the CPU time penalty further.  In the
      8-thread test, blkpoll doesn't help though: performance is similar to
      that without blkpoll, and so is CPU utilization, because of lock
      contention in the swap path; the CPU time spent in blkpoll itself
      isn't high.  So overall, blkpoll swapin isn't worse than swapin
      without it.
      
      Swapin readahead might read several pages at the same time and form a
      big IO request.  Since that IO takes longer, it doesn't make sense to
      poll it, so the patch only does iopoll for single-page swapin.
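      
      A sketch of the resulting submit-then-poll path in swap_readpage()
      (simplified, and assuming the block layer poll entry point of this
      era, blk_mq_poll()):
      
        qc = submit_bio(bio);
        while (polling) {  /* only for synchronous single-page swapin */
                set_current_state(TASK_UNINTERRUPTIBLE);
                if (!READ_ONCE(bio->bi_private))
                        break;  /* the completion has already run */
                if (!blk_mq_poll(disk->queue, qc))
                        break;  /* queue doesn't support polling */
        }
        __set_current_state(TASK_RUNNING);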
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/070c3c3e40b711e7b1390002c991e86a-b5408f0@7511894063d3764ff01ea8111f5a004d7dd700ed078797c204a24e620ddb965c
      
      Signed-off-by: Shaohua Li <shli@fb.com>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 09 Jun, 2017 1 commit
  27. 02 Nov, 2016 1 commit
  28. 08 Oct, 2016 1 commit
  29. 19 Sep, 2016 1 commit
    • mm: fix the page_swap_info() BUG_ON check · c8de641b
      Santosh Shilimkar authored
      Commit 62c230bc ("mm: add support for a filesystem to activate
      swap files and use direct_IO for writing swap pages") replaced the
      swap_aops dirty hook from __set_page_dirty_no_writeback() with
      swap_set_page_dirty().
      
      For normal cases without these special SWP flags, the code path falls
      back to __set_page_dirty_no_writeback(), so the behaviour is expected
      to be the same as before.
      
      But swap_set_page_dirty() makes use of the page_swap_info() helper to
      get the swap_info_struct and check for flags like SWP_FILE,
      SWP_BLKDEV, etc., as those features require.  This helper has
      BUG_ON(!PageSwapCache(page)), which is racy and safe only for the
      set_page_dirty_lock() path.
      
      For the set_page_dirty() path, which often needs to be callable from
      irq context, kswapd() can toggle the flag behind our back while the
      call is executing, when the system is low on memory and heavy swapping
      is ongoing.  This ends up in an undesired kernel panic.
      
      This patch just moves the check outside the helper to its users, as
      appropriate, to fix the kernel panic for the described path.  A couple
      of users of the helper already take care of the SwapCache condition,
      so I skipped them.
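      
      The shape of the change, as a sketch: the assertion moves out of the
      helper and into a caller that holds the page stable:
      
        struct swap_info_struct *sis = page_swap_info(page); /* no BUG_ON inside */
      
        if (sis->flags & SWP_FILE) {
                VM_BUG_ON_PAGE(!PageSwapCache(page), page); /* safe here */
                /* ... SWP_FILE-specific handling ... */
        }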
      
      Link: http://lkml.kernel.org/r/1473460718-31013-1-git-send-email-santosh.shilimkar@oracle.com
      
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[4.7.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  30. 07 Aug, 2016 1 commit
  31. 28 Jul, 2016 1 commit
  32. 07 Jun, 2016 2 commits
  33. 01 May, 2016 1 commit
  34. 29 Apr, 2016 1 commit
    • mm: call swap_slot_free_notify() with page lock held · b06bad17
      Minchan Kim authored
      Kyeongdon reported the error below, which is the
      BUG_ON(!PageSwapCache(page)) in page_swap_info().  The reason is that
      page_endio() in the rw_page path unlocks the page once read I/O
      completes, so we need to take the page lock again to check
      PageSwapCache; otherwise, the page can be removed from the swapcache
      underneath us.
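      
      A sketch of the fix in the swap_readpage() rw_page path (simplified):
      
        ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
        if (!ret) {
                /*
                 * page_endio() already unlocked the page; re-take the
                 * lock so PageSwapCache stays stable across the notify.
                 */
                if (trylock_page(page)) {
                        swap_slot_free_notify(page);
                        unlock_page(page);
                }
        }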
      
        Kernel BUG at c00f9040 [verbose debug info unavailable]
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
        Modules linked in:
        CPU: 4 PID: 13446 Comm: RenderThread Tainted: G        W 3.10.84-g9f14aec-dirty #73
        task: c3b73200 ti: dd192000 task.ti: dd192000
        PC is at page_swap_info+0x10/0x2c
        LR is at swap_slot_free_notify+0x18/0x6c
        pc : [<c00f9040>]    lr : [<c00f5560>]    psr: 400f0113
        sp : dd193d78  ip : c2deb1e4  fp : da015180
        r10: 00000000  r9 : 000200da  r8 : c120fe08
        r7 : 00000000  r6 : 00000000  r5 : c249a6c0  r4 : c249a6c0
        r3 : 00000000  r2 : 40080009  r1 : 200f0113  r0 : c249a6c0
        ..<snip> ..
        Call Trace:
          page_swap_info+0x10/0x2c
          swap_slot_free_notify+0x18/0x6c
          swap_readpage+0x90/0x11c
          read_swap_cache_async+0x134/0x1ac
          swapin_readahead+0x70/0xb0
          handle_pte_fault+0x320/0x6fc
          handle_mm_fault+0xc0/0xf0
          do_page_fault+0x11c/0x36c
          do_DataAbort+0x34/0x118
      
      Fixes: 3f2b1a04 ("zram: revive swap_slot_free_notify")
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Tested-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  35. 04 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
      time ago with the promise that one day it would be possible to
      implement the page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it's a constant source of confusion whether the
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too
      much breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files, so I've called spatch for them manually.
      
      The only adjustment after coccinelle is the revert of the changes to
      the PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation
      will also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  36. 22 Mar, 2016 1 commit
    • zram: revive swap_slot_free_notify · 3f2b1a04
      Minchan Kim authored
      Commit b430e9d1 ("remove compressed copy from zram in-memory") added a
      swap_slot_free_notify() call in *end_swap_bio_read* to remove memory
      duplicated between zram and main memory.
      
      However, with the introduction of rw_page in zram, commit 8c7f0102
      ("zram: implement rw_page operation of zram"), the call became void
      because rw_page doesn't need a bio.
      
      Memory footprint is really important on embedded platforms with small
      memory (for example, 512M), because LMK or similar memory-management
      modules start killing processes once the memory footprint exceeds some
      threshold.
      
      This patch restores the function for rw_page, thereby eliminating this
      duplication.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: karam.lee <karam.lee@lge.com>
      Cc: <sangseok.lee@lge.com>
      Cc: Chan Jeong <chan.jeong@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  37. 17 Mar, 2016 1 commit
  38. 13 Aug, 2015 1 commit