  1. 16 Mar, 2022 1 commit
  2. 15 Mar, 2022 1 commit
  3. 09 Mar, 2022 1 commit
  4. 22 Jan, 2022 1 commit
    • aio: move aio sysctl to aio.c · 86b12b6c
      Xiaoming Ni authored
      The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
      dishes, this makes it very difficult to maintain.
      
      To help with this maintenance let's start by moving sysctls to places
      where they actually belong.  The proc sysctl maintainers do not want to
      know what sysctl knobs you wish to add for your own piece of code, we
      just care about the core logic.
      
      Move aio sysctl to aio.c and use the new register_sysctl_init() to
      register the sysctl interface for aio.
      
      [mcgrof@kernel.org: adjust commit log to justify the move]
      
      Link: https://lkml.kernel.org/r/20211123202347.818157-9-mcgrof@kernel.org
      
      Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Qing Wang <wangqing@vivo.com>
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Stephen Kitt <steve@sk2.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Antti Palosaari <crope@iki.fi>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Clemens Ladisch <clemens@ladisch.de>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Julia Lawall <julia.lawall@inria.fr>
      Cc: Lukas Middendorf <kernel@tuxforce.de>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Phillip Potter <phil@philpotter.co.uk>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: James E.J. Bottomley <jejb@linux.ibm.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 09 Dec, 2021 3 commits
    • aio: Fix incorrect usage of eventfd_signal_allowed() · 4b374986
      Xie Yongji authored
      We should defer eventfd_signal() to the workqueue when
      eventfd_signal_allowed() returns false, not when it returns
      true.
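
      A minimal sketch of the corrected decision, written as a userspace
      model rather than the kernel code. The names completion_path,
      choose_path_buggy() and choose_path_fixed() are hypothetical, and the
      signal_allowed parameter stands in for the return value of
      eventfd_signal_allowed().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two ways a woken poll request can finish. */
enum completion_path { COMPLETE_INLINE, DEFER_TO_WORKQUEUE };

/* Inverted check: defers exactly when signalling would have been safe. */
enum completion_path choose_path_buggy(bool has_eventfd, bool signal_allowed)
{
    if (has_eventfd && signal_allowed)
        return DEFER_TO_WORKQUEUE;
    return COMPLETE_INLINE;
}

/* Corrected check: defer only when signalling inline is NOT allowed. */
enum completion_path choose_path_fixed(bool has_eventfd, bool signal_allowed)
{
    if (has_eventfd && !signal_allowed)
        return DEFER_TO_WORKQUEUE;
    return COMPLETE_INLINE;
}
```

      In this model the buggy variant completes inline in the one case
      where recursion is already in progress, which is the case the check
      was meant to catch.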
      
      Fixes: b542e383 ("eventfd: Make signal recursion protection a task bit")
      Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
      Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com
      
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
    • aio: fix use-after-free due to missing POLLFREE handling · 50252e4b
      Eric Biggers authored
      signalfd_poll() and binder_poll() are special in that they use a
      waitqueue whose lifetime is the current task, rather than the struct
      file as is normally the case.  This is okay for blocking polls, since a
      blocking poll occurs within one task; however, non-blocking polls
      require another solution.  This solution is for the queue to be cleared
      before it is freed, by sending a POLLFREE notification to all waiters.
      
      Unfortunately, only eventpoll handles POLLFREE.  A second type of
      non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't
      handle POLLFREE.  This allows a use-after-free to occur if a signalfd or
      binder fd is polled with aio poll, and the waitqueue gets freed.
      
      Fix this by making aio poll handle POLLFREE.
      
      A patch by Ramji Jiyani <ramjiyani@google.com>
      (https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com)
      tried to do this by making aio_poll_wake() always complete the request
      inline if POLLFREE is seen.  However, that solution had two bugs.
      First, it introduced a deadlock, as it unconditionally locked the aio
      context while holding the waitqueue lock, which inverts the normal
      locking order.  Second, it didn't consider that POLLFREE notifications
      are missed while the request has been temporarily de-queued.
      
      The second problem was solved by my previous patch.  This patch then
      properly fixes the use-after-free by handling POLLFREE in a
      deadlock-free way.  It does this by taking advantage of the fact that
      freeing of the waitqueue is RCU-delayed, similar to what eventpoll does.
      
      Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL")
      Cc: <stable@vger.kernel.org> # v4.18+
      Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
      
      Signed-off-by: Eric Biggers <ebiggers@google.com>
    • aio: keep poll requests on waitqueue until completed · 363bee27
      Eric Biggers authored
      Currently, aio_poll_wake() will always remove the poll request from the
      waitqueue.  Then, if aio_poll_complete_work() sees that none of the
      polled events are ready and the request isn't cancelled, it re-adds the
      request to the waitqueue.  (This can easily happen when polling a file
      that doesn't pass an event mask when waking up its waitqueue.)
      
      This is fundamentally broken for two reasons:
      
        1. If a wakeup occurs between vfs_poll() and the request being
           re-added to the waitqueue, it will be missed because the request
           wasn't on the waitqueue at the time.  Therefore, IOCB_CMD_POLL
           might never complete even if the polled file is ready.
      
        2. When the request isn't on the waitqueue, there is no way to be
           notified that the waitqueue is being freed (which happens when its
           lifetime is shorter than the struct file's).  This is supposed to
           happen via the waitqueue entries being woken up with POLLFREE.
      
      Therefore, leave the requests on the waitqueue until they are actually
      completed (or cancelled).  To keep track of when aio_poll_complete_work
      needs to be scheduled, use new fields in struct poll_iocb.  Remove the
      'done' field which is now redundant.
      
      Note that this is consistent with how sys_poll() and eventpoll work;
      their wakeup functions do *not* remove the waitqueue entries.
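
      The missed-wakeup problem (reason 1 above) can be illustrated with a
      toy userspace model. This is only a sketch: struct poll_req, wake()
      and wakeups_seen() are hypothetical names, and the real code deals
      with locking and work scheduling that the model leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a poll request sitting on a waitqueue. */
struct poll_req {
    bool on_queue;      /* is the entry currently on the waitqueue? */
    int  wakeups_seen;  /* how many wakeups actually reached it */
};

/* Deliver one wakeup. An entry that is off the queue never sees it. */
static void wake(struct poll_req *req, bool remove_on_wake)
{
    if (!req->on_queue)
        return;                 /* wakeup is lost */
    req->wakeups_seen++;
    if (remove_on_wake)
        req->on_queue = false;  /* old behaviour: dequeue on every wakeup */
}

/*
 * Two wakeups arrive back to back, before any worker has had a chance
 * to re-add the request. Returns how many of them the request saw.
 */
int wakeups_seen(bool remove_on_wake)
{
    struct poll_req req = { .on_queue = true, .wakeups_seen = 0 };

    wake(&req, remove_on_wake);  /* e.g. a wakeup with no event mask */
    wake(&req, remove_on_wake);  /* the wakeup that really matters */
    return req.wakeups_seen;
}
```

      With remove_on_wake set, the second wakeup is lost; leaving the entry
      queued lets both wakeups reach the request.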
      
      Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL")
      Cc: <stable@vger.kernel.org> # v4.18+
      Link: https://lore.kernel.org/r/20211209010455.42744-5-ebiggers@kernel.org
      
      Signed-off-by: Eric Biggers <ebiggers@google.com>
  6. 25 Oct, 2021 1 commit
  7. 20 Oct, 2021 1 commit
  8. 27 Aug, 2021 1 commit
    • eventfd: Make signal recursion protection a task bit · b542e383
      Thomas Gleixner authored
      
      The recursion protection for eventfd_signal() is based on a per CPU
      variable and relies on the !RT semantics of spin_lock_irqsave() for
      protecting this per CPU variable. On RT kernels spin_lock_irqsave() neither
      disables preemption nor interrupts which allows the spin lock held section
      to be preempted. If the preempting task invokes eventfd_signal() as well,
      then the recursion warning triggers.
      
      Paolo suggested to protect the per CPU variable with a local lock, but
      that's heavyweight and actually not necessary. The goal of this protection
      is to prevent the task stack from overflowing, which can be achieved with a
      per task recursion protection as well.
      
      Replace the per CPU variable with a per task bit similar to other recursion
      protection bits like task_struct::in_page_owner. This works on both !RT and
      RT kernels and removes as a side effect the extra per CPU storage.
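
      A rough userspace analogue of a per-task recursion bit, assuming a
      C11 compiler. The thread-local flag in_signal plays the role of the
      task bit, and signal_allowed(), do_signal() and
      nested_signal_is_rejected() are hypothetical names used only for
      this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One flag per thread, standing in for a bit in the task structure. */
static _Thread_local bool in_signal;

/* True when the current thread is not already inside do_signal(). */
bool signal_allowed(void)
{
    return !in_signal;
}

/*
 * Run an optional callback under the recursion guard.
 * Returns false if the call was rejected as a nested invocation.
 */
bool do_signal(void (*callback)(void))
{
    if (!signal_allowed())
        return false;
    in_signal = true;
    if (callback != NULL)
        callback();
    in_signal = false;
    return true;
}

static bool nested_result;

/* Callback that tries to signal again from inside do_signal(). */
static void nested_cb(void)
{
    nested_result = do_signal(NULL);
}

/* True when the outer call succeeds and the nested one is refused. */
bool nested_signal_is_rejected(void)
{
    nested_result = true;
    bool outer = do_signal(nested_cb);
    return outer && !nested_result;
}
```

      Because the flag belongs to the thread rather than to a CPU, being
      preempted by another thread that also signals does not trip the
      guard in this model.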
      
      No functional change for !RT kernels.

      Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Link: https://lore.kernel.org/r/87wnp9idso.ffs@tglx
  9. 30 Apr, 2021 1 commit
  10. 15 Dec, 2020 1 commit
    • mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio · cd544fd1
      Dmitry Safonov authored
      As the kernel expects to see only one of such mappings, any further
      operations on the VMA-copy may be unexpected by the kernel.  Maybe it's
      being on the safe side, but there doesn't seem to be any expected
      use-case for this, so restrict it now.
      
      Link: https://lkml.kernel.org/r/20201013013416.390574-4-dima@arista.com
      Fixes: commit e346b381 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 11 Nov, 2020 1 commit
  12. 06 Nov, 2020 1 commit
  13. 03 Oct, 2020 1 commit
  14. 23 Aug, 2020 1 commit
  15. 07 Aug, 2020 1 commit
  16. 16 Jun, 2020 1 commit
  17. 11 Jun, 2020 1 commit
  18. 09 Jun, 2020 1 commit
  19. 14 May, 2020 1 commit
    • aio: fix async fsync creds · 530f32fc
      Miklos Szeredi authored
      
      Avi Kivity reports that on fuse filesystems running in a user namespace
      asynchronous fsync fails with EOVERFLOW.
      
      The reason is that f_ops->fsync() is called with the creds of the kthread
      performing aio work instead of the creds of the process originally
      submitting IOCB_CMD_FSYNC.
      
      Fuse sends the creds of the caller in the request header and it needs to
      translate the uid and gid into the server's user namespace.  Since the
      kthread is running in init_user_ns, the translation will fail and the
      operation returns an error.
      
      It can be argued that fsync doesn't actually need any creds, but just
      zeroing out those fields in the header (as with requests that currently
      don't take creds) is a backward compatibility risk.
      
      Instead of working around this issue in fuse, solve the core of the problem
      by calling the filesystem with the proper creds.
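
      The idea can be modelled in userspace as follows. This is only a
      sketch of the pattern (capture the submitter's creds at submit time,
      switch to them around the filesystem call in the worker); every name
      here, including override_creds_model() and revert_creds_model(), is
      hypothetical and not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced credentials structure. */
struct cred { unsigned int uid; };

/* The worker thread normally runs with its own credentials (uid 0 here). */
static const struct cred kthread_cred = { .uid = 0 };
static const struct cred *current_cred = &kthread_cred;

/* Switch the current credentials and hand back the previous ones. */
static const struct cred *override_creds_model(const struct cred *new_cred)
{
    const struct cred *old = current_cred;
    current_cred = new_cred;
    return old;
}

static void revert_creds_model(const struct cred *old)
{
    current_cred = old;
}

struct fsync_req {
    const struct cred *creds;     /* captured at submission time */
    unsigned int uid_seen_by_fs;  /* what the filesystem observed */
};

/* Stand-in for the filesystem's fsync handler reading the caller's uid. */
static void fs_fsync_model(struct fsync_req *req)
{
    req->uid_seen_by_fs = current_cred->uid;
}

/* Worker side: run the filesystem call under the submitter's creds. */
static void fsync_worker(struct fsync_req *req)
{
    const struct cred *old = override_creds_model(req->creds);
    fs_fsync_model(req);
    revert_creds_model(old);
}

/* Submit an fsync as `uid` and report which uid the filesystem saw. */
unsigned int uid_seen_for_submitter(unsigned int uid)
{
    struct cred submitter = { .uid = uid };
    struct fsync_req req = { .creds = &submitter, .uid_seen_by_fs = 0 };

    fsync_worker(&req);
    return req.uid_seen_by_fs;
}
```

      Without the override step the model's filesystem handler would see
      the worker's uid 0 instead of the submitter's uid.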

      Reported-by: Avi Kivity <avi@scylladb.com>
      Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
      Fixes: c9582eb0 ("fuse: Fail all requests with invalid uids or gids")
      Cc: stable@vger.kernel.org  # 4.18+
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  20. 04 Feb, 2020 1 commit
    • aio: prevent potential eventfd recursion on poll · 01d7a356
      Jens Axboe authored
      
      If we have nested or circular eventfd wakeups, then we can deadlock if
      we run them inline from our poll waitqueue wakeup handler. It's also
      possible to have very long chains of notifications, to the extent where
      we could risk blowing the stack.
      
      Check the eventfd recursion count before calling eventfd_signal(). If
      it's non-zero, then punt the signaling to async context. This is always
      safe, as it takes us out-of-line in terms of stack and locking context.
      
      Cc: stable@vger.kernel.org # 4.19+
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  21. 15 Nov, 2019 1 commit
  22. 21 Oct, 2019 1 commit
    • aio: Fix io_pgetevents() struct __compat_aio_sigset layout · 97eba80f
      Guillem Jover authored
      This type is used to pass the sigset_t from userland to the kernel,
      but it was using the kernel native pointer type for the member
      representing the compat userland pointer to the userland sigset_t.
      
      This messes up the layout, and makes the kernel eat up both the
      userland pointer and the size members into the kernel pointer, and
      then reads garbage into the kernel sigsetsize. Which makes the sigset_t
      size consistency check fail, and consequently the syscall always
      returns -EINVAL.
      
      This breaks both libaio and strace on 32-bit userland running on 64-bit
      kernels. And there are apparently no users in the wild of the current
      broken layout (at least according to codesearch.debian.org and a brief
      check over github.com search). So it looks safe to fix this directly
      in the kernel, instead of either letting userland deal with this
      permanently with the additional overhead or trying to make the syscall
      infer what layout userland used, even though this is also being worked
      around in libaio to temporarily cope with kernels that have not yet
      been fixed.
      
      We use a proper compat_uptr_t instead of a compat_sigset_t pointer.
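
      The layout mismatch can be reproduced with plain userspace structs.
      This is a sketch under assumptions: the struct names are
      hypothetical, compat_uptr_t and compat_size_t are modelled as 32-bit
      integers, and the 64-bit kernel pointer is modelled with uint64_t so
      the result does not depend on the machine compiling the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the compat types: 32-bit user pointer and size. */
typedef uint32_t compat_uptr_t;
typedef uint32_t compat_size_t;

/* What a 32-bit process lays out in memory: 4-byte pointer, 4-byte size. */
struct user32_aio_sigset {
    uint32_t sigmask;
    uint32_t sigsetsize;
};

/* Broken view: the pointer member is as wide as a 64-bit kernel pointer. */
struct broken_compat_aio_sigset {
    uint64_t      sigmask;     /* swallows both 32-bit user fields */
    compat_size_t sigsetsize;  /* read from past the user's struct */
};

/* Fixed view: the pointer member has the 32-bit compat width. */
struct fixed_compat_aio_sigset {
    compat_uptr_t sigmask;
    compat_size_t sigsetsize;
};

size_t user_sigsetsize_offset(void)
{
    return offsetof(struct user32_aio_sigset, sigsetsize);
}

size_t broken_sigsetsize_offset(void)
{
    return offsetof(struct broken_compat_aio_sigset, sigsetsize);
}

size_t fixed_sigsetsize_offset(void)
{
    return offsetof(struct fixed_compat_aio_sigset, sigsetsize);
}
```

      In the broken view the size member sits at offset 8, beyond the
      8 bytes the 32-bit process supplied, while the fixed view lines up
      with the userland struct at offset 4.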
      
      Fixes: 7a074e96 ("aio: implement io_pgetevents")
      Signed-off-by: Guillem Jover <guillem@hadrons.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  23. 19 Jul, 2019 1 commit
  24. 17 Jul, 2019 1 commit
    • signal: simplify set_user_sigmask/restore_user_sigmask · b772434b
      Oleg Nesterov authored
      task->saved_sigmask and ->restore_sigmask are only used in the ret-from-
      syscall paths.  This means that set_user_sigmask() can save ->blocked in
      ->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked
      was modified.
      
      This way the callers do not need 2 sigset_t's passed to set/restore and
      restore_user_sigmask() renamed to restore_saved_sigmask_unless() turns
      into the trivial helper which just calls restore_saved_sigmask().
      
      Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.com
      
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Eric Wong <e@80x24.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: David Laight <David.Laight@aculab.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  25. 29 Jun, 2019 1 commit
    • signal: remove the wrong signal_pending() check in restore_user_sigmask() · 97abc889
      Oleg Nesterov authored
      This is the minimal fix for stable, I'll send cleanups later.
      
      Commit 854a6ed5 ("signal: Add restore_user_sigmask()") introduced
      a visible change which breaks user-space: a signal temporarily
      unblocked by set_user_sigmask() can be delivered even if the caller
      returns success or timeout.
      
      Change restore_user_sigmask() to accept the additional "interrupted"
      argument which should be used instead of signal_pending() check, and
      update the callers.
      
      Eric said:
      
      : For clarity.  I don't think this is required by posix, or fundamentally to
      : remove the races in select.  It is what linux has always done and we have
      : applications who care so I agree this fix is needed.
      :
      : Further in any case where the semantic change that this patch rolls back
      : (aka where allowing a signal to be delivered and the select like call to
      : complete) would be advantage we can do as well if not better by using
      : signalfd.
      :
      : Michael is there any chance we can get this guarantee of the linux
      : implementation of pselect and friends clearly documented.  The guarantee
      : that if the system call completes successfully we are guaranteed that no
      : signal that is unblocked by using sigmask will be delivered?
      
      Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
      Fixes: 854a6ed5 ("signal: Add restore_user_sigmask()")
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Eric Wong <e@80x24.org>
      Tested-by: Eric Wong <e@80x24.org>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: <stable@vger.kernel.org>	[5.0+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 31 May, 2019 1 commit
  27. 25 May, 2019 2 commits
    • vfs: Convert aio to use the new mount API · 52db59df
      David Howells authored
      
      Convert the aio filesystem to the new internal mount API as the old
      one will be obsoleted and removed.  This allows greater flexibility in
      communication of mount parameters between userspace, the VFS and the
      filesystem.
      
      See Documentation/filesystems/mount_api.txt for more information.

      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Benjamin LaHaise <bcrl@kvack.org>
      cc: linux-aio@kvack.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • mount_pseudo(): drop 'name' argument, switch to d_make_root() · 1f58bb18
      Al Viro authored
      
      Once upon a time we used to set ->d_name of e.g. pipefs root
      so that d_path() on pipes would work.  These days it's
      completely pointless - dentries of pipes are not even connected
      to pipefs root.  However, mount_pseudo() had set the root
      dentry name (passed as the second argument) and callers
      kept inventing names to pass to it.  Including those that
      didn't *have* any non-root dentries to start with...
      
      All of that had been pointless for about 8 years now; it's
      time to get rid of that cargo-culting...

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  28. 05 Apr, 2019 1 commit
  29. 03 Apr, 2019 1 commit
  30. 18 Mar, 2019 8 commits
    • aio: move sanity checks and request allocation to io_submit_one() · 7316b49c
      Al Viro authored
      
      makes for somewhat cleaner control flow in __io_submit_one()

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • deal with get_reqs_available() in aio_get_req() itself · fa0ca2ae
      Al Viro authored
      
      simplifies the caller

      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • aio: move dropping ->ki_eventfd into iocb_destroy() · 74259703
      Al Viro authored
      
      no reason to duplicate that...

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • make aio_read()/aio_write() return int · 958c13ce
      Al Viro authored
      
      that ssize_t is a rudiment of earlier calling conventions; it's been
      used only to pass 0 and -E... since last autumn.

      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Fix aio_poll() races · af5c72b1
      Al Viro authored
      
      aio_poll() has to cope with several unpleasant problems:
        * requests that might stay around indefinitely need to be made
          visible for io_cancel(2); that must not be done to a request
          already completed, though.
        * in cases when ->poll() has placed us on a waitqueue, wakeup might
          have happened (and request completed) before ->poll() returns.
        * worse, in some early wakeup cases request might end up re-added
          into the queue later - we can't treat "woken up and currently not
          in the queue" as "it's not going to stick around indefinitely"
        * ... moreover, ->poll() might have decided not to put it on any
          queues to start with, and that needs to be distinguished from the
          previous case
        * ->poll() might have tried to put us on more than one queue.  Only
          the first will succeed for aio poll, so we might end up missing
          wakeups.  OTOH, we might very well notice that only after the
          wakeup hits and request gets completed (all before ->poll() gets
          around to the second poll_wait()).  In that case it's too late to
          decide that we have an error.
      
      req->woken was an attempt to deal with that.  Unfortunately, it was
      broken.  What we need to keep track of is not that wakeup has happened -
      the thing might come back after that.  It's that async reference is
      already gone and won't come back, so we can't (and needn't) put the
      request on the list of cancellables.
      
      The easiest case is "request hadn't been put on any waitqueues"; we
      can tell by seeing NULL apt.head, and in that case there won't be
      anything async.  We should either complete the request ourselves
      (if vfs_poll() reports anything of interest) or return an error.
      
      In all other cases we get exclusion with wakeups by grabbing the
      queue lock.
      
      If request is currently on queue and we have something interesting
      from vfs_poll(), we can steal it and complete the request ourselves.
      
      If it's on queue and vfs_poll() has not reported anything interesting,
      we either put it on the cancellable list, or, if we know that it
      hadn't been put on all queues ->poll() wanted it on, we steal it and
      return an error.
      
      If it's _not_ on queue, it's either been already dealt with (in which
      case we do nothing), or there's aio_poll_complete_work() about to be
      executed.  In that case we either put it on the cancellable list,
      or, if we know it hadn't been put on all queues ->poll() wanted it on,
      simulate what cancel would've done.
      
      It's a lot more convoluted than I'd like it to be.  Single-consumer APIs
      suck, and unfortunately aio is not an exception...

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • aio: store event at final iocb_put() · 2bb874c0
      Al Viro authored
      
      Instead of having aio_complete() set ->ki_res.{res,res2}, do that
      explicitly in its callers, drop the reference (as aio_complete()
      used to do) and delay the rest until the final iocb_put().

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • aio: keep io_event in aio_kiocb · a9339b78
      Al Viro authored
      
      We want to separate forming the resulting io_event from putting it
      into the ring buffer.

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • aio: fold lookup_kiocb() into its sole caller · 833f4154
      Al Viro authored

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>