1. 14 Dec, 2021 2 commits
    • Eric Biggers's avatar
      aio: fix use-after-free due to missing POLLFREE handling · 47ffefd8
      Eric Biggers authored
      commit 50252e4b upstream.
      
      signalfd_poll() and binder_poll() are special in that they use a
      waitqueue whose lifetime is the current task, rather than the struct
      file as is normally the case.  This is okay for blocking polls, since a
      blocking poll occurs within one task; however, non-blocking polls
      require another solution.  This solution is for the queue to be cleared
      before it is freed, by sending a POLLFREE notification to all waiters.
      
      Unfortunately, only eventpoll handles POLLFREE.  A second type of
      non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't
      handle POLLFREE.  This allows a use-after-free to occur if a signalfd or
      binder fd is polled with aio poll, and the waitqueue gets freed.
      
      Fix this by making aio poll handle POLLFREE.
      
      A patch by Ramji Jiyani <ramjiyani@google.com>
      (https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com)
      tried to do this by making aio_poll_wake() always complete the request
      inline if POLLFREE is seen.  However, that solution had two bugs.
      First, it introduced a deadlock, as it unconditionally locked the aio
      context while holding the waitqueue lock, which inverts the normal
      locking order.  Second, it didn't consider that POLLFREE notifications
      are missed while the request has been temporarily de-queued.
      
      The second problem was solved by my previous patch.  This patch then
      properly fixes the use-after-free by handling POLLFREE in a
      deadlock-free way.  It does this by taking advantage of the fact that
      freeing of the waitqueue is RCU-delayed, similar to what eventpoll does.
      
      Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL")
      Cc: <stable@vger.kernel.org> # v4.18+
      Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
      
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      47ffefd8
    • Eric Biggers's avatar
      aio: keep poll requests on waitqueue until completed · e4d19740
      Eric Biggers authored
      commit 363bee27 upstream.
      
      Currently, aio_poll_wake() will always remove the poll request from the
      waitqueue.  Then, if aio_poll_complete_work() sees that none of the
      polled events are ready and the request isn't cancelled, it re-adds the
      request to the waitqueue.  (This can easily happen when polling a file
      that doesn't pass an event mask when waking up its waitqueue.)
      
      This is fundamentally broken for two reasons:
      
        1. If a wakeup occurs between vfs_poll() and the request being
           re-added to the waitqueue, it will be missed because the request
           wasn't on the waitqueue at the time.  Therefore, IOCB_CMD_POLL
           might never complete even if the polled file is ready.
      
        2. When the request isn't on the waitqueue, there is no way to be
           notified that the waitqueue is being freed (which happens when its
           lifetime is shorter than the struct file's).  This is supposed to
           happen via the waitqueue entries being woken up with POLLFREE.
      
      Therefore, leave the requests on the waitqueue until they are actually
      completed (or cancelled).  To keep track of when aio_poll_complete_work
      needs to be scheduled, use new fields in struct poll_iocb.  Remove the
      'done' field which is now redundant.
      
      Note that this is consistent with how sys_poll() and eventpoll work;
      their wakeup functions do *not* remove the waitqueue entries.
      
      Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL")
      Cc: <stable@vger.kernel.org> # v4.18+
      Link: https://lore.kernel.org/r/20211209010455.42744-5-ebiggers@kernel.org
      
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4d19740
  2. 11 Nov, 2020 1 commit
  3. 03 Oct, 2020 1 commit
  4. 23 Aug, 2020 1 commit
  5. 07 Aug, 2020 1 commit
  6. 16 Jun, 2020 1 commit
  7. 11 Jun, 2020 1 commit
  8. 09 Jun, 2020 1 commit
  9. 14 May, 2020 1 commit
    • Miklos Szeredi's avatar
      aio: fix async fsync creds · 530f32fc
      Miklos Szeredi authored
      
      Avi Kivity reports that on fuse filesystems running in a user namespace
      asyncronous fsync fails with EOVERFLOW.
      
      The reason is that f_ops->fsync() is called with the creds of the kthread
      performing aio work instead of the creds of the process originally
      submitting IOCB_CMD_FSYNC.
      
      Fuse sends the creds of the caller in the request header and it needs to
      translate the uid and gid into the server's user namespace.  Since the
      kthread is running in init_user_ns, the translation will fail and the
      operation returns an error.
      
      It can be argued that fsync doesn't actually need any creds, but just
      zeroing out those fields in the header (as with requests that currently
      don't take creds) is a backward compatibility risk.
      
      Instead of working around this issue in fuse, solve the core of the problem
      by calling the filesystem with the proper creds.
      Reported-by: default avatarAvi Kivity <avi@scylladb.com>
      Tested-by: default avatarGiuseppe Scrivano <gscrivan@redhat.com>
      Fixes: c9582eb0
      
       ("fuse: Fail all requests with invalid uids or gids")
      Cc: stable@vger.kernel.org  # 4.18+
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      530f32fc
  10. 04 Feb, 2020 1 commit
    • Jens Axboe's avatar
      aio: prevent potential eventfd recursion on poll · 01d7a356
      Jens Axboe authored
      
      If we have nested or circular eventfd wakeups, then we can deadlock if
      we run them inline from our poll waitqueue wakeup handler. It's also
      possible to have very long chains of notifications, to the extent where
      we could risk blowing the stack.
      
      Check the eventfd recursion count before calling eventfd_signal(). If
      it's non-zero, then punt the signaling to async context. This is always
      safe, as it takes us out-of-line in terms of stack and locking context.
      
      Cc: stable@vger.kernel.org # 4.19+
      Reviewed-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      01d7a356
  11. 15 Nov, 2019 1 commit
  12. 21 Oct, 2019 1 commit
    • Guillem Jover's avatar
      aio: Fix io_pgetevents() struct __compat_aio_sigset layout · 97eba80f
      Guillem Jover authored
      This type is used to pass the sigset_t from userland to the kernel,
      but it was using the kernel native pointer type for the member
      representing the compat userland pointer to the userland sigset_t.
      
      This messes up the layout, and makes the kernel eat up both the
      userland pointer and the size members into the kernel pointer, and
      then reads garbage into the kernel sigsetsize. Which makes the sigset_t
      size consistency check fail, and consequently the syscall always
      returns -EINVAL.
      
      This breaks both libaio and strace on 32-bit userland running on 64-bit
      kernels. And there are apparently no users in the wild of the current
      broken layout (at least according to codesearch.debian.org and a brief
      check over github.com search). So it looks safe to fix this directly
      in the kernel, instead of either letting userland deal with this
      permanently with the additional overhead or trying to make the syscall
      infer what layout userland used, even though this is also being worked
      around in libaio to temporarily cope with kernels that have not yet
      been fixed.
      
      We use a proper compat_uptr_t instead of a compat_sigset_t pointer.
      
      Fixes: 7a074e96
      
       ("aio: implement io_pgetevents")
      Signed-off-by: default avatarGuillem Jover <guillem@hadrons.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      97eba80f
  13. 19 Jul, 2019 1 commit
  14. 17 Jul, 2019 1 commit
    • Oleg Nesterov's avatar
      signal: simplify set_user_sigmask/restore_user_sigmask · b772434b
      Oleg Nesterov authored
      task->saved_sigmask and ->restore_sigmask are only used in the ret-from-
      syscall paths.  This means that set_user_sigmask() can save ->blocked in
      ->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked
      was modified.
      
      This way the callers do not need 2 sigset_t's passed to set/restore and
      restore_user_sigmask() renamed to restore_saved_sigmask_unless() turns
      into the trivial helper which just calls restore_saved_sigmask().
      
      Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.com
      
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Eric Wong <e@80x24.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: David Laight <David.Laight@aculab.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b772434b
  15. 29 Jun, 2019 1 commit
    • Oleg Nesterov's avatar
      signal: remove the wrong signal_pending() check in restore_user_sigmask() · 97abc889
      Oleg Nesterov authored
      This is the minimal fix for stable, I'll send cleanups later.
      
      Commit 854a6ed5 ("signal: Add restore_user_sigmask()") introduced
      the visible change which breaks user-space: a signal temporary unblocked
      by set_user_sigmask() can be delivered even if the caller returns
      success or timeout.
      
      Change restore_user_sigmask() to accept the additional "interrupted"
      argument which should be used instead of signal_pending() check, and
      update the callers.
      
      Eric said:
      
      : For clarity.  I don't think this is required by posix, or fundamentally to
      : remove the races in select.  It is what linux has always done and we have
      : applications who care so I agree this fix is needed.
      :
      : Further in any case where the semantic change that this patch rolls back
      : (aka where allowing a signal to be delivered and the select like call to
      : complete) would be advantage we can do as well if not better by using
      : signalfd.
      :
      : Michael is there any chance we can get this guarantee of the linux
      : implementation of pselect and friends clearly documented.  The guarantee
      : that if the system call completes successfully we are guaranteed that no
      : signal that is unblocked by using sigmask will be delivered?
      
      Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
      Fixes: 854a6ed5
      
       ("signal: Add restore_user_sigmask()")
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarEric Wong <e@80x24.org>
      Tested-by: default avatarEric Wong <e@80x24.org>
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: <stable@vger.kernel.org>	[5.0+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      97abc889
  16. 31 May, 2019 1 commit
  17. 25 May, 2019 2 commits
    • David Howells's avatar
      vfs: Convert aio to use the new mount API · 52db59df
      David Howells authored
      
      Convert the aio filesystem to the new internal mount API as the old
      one will be obsoleted and removed.  This allows greater flexibility in
      communication of mount parameters between userspace, the VFS and the
      filesystem.
      
      See Documentation/filesystems/mount_api.txt for more information.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Benjamin LaHaise <bcrl@kvack.org>
      cc: linux-aio@kvack.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      52db59df
    • Al Viro's avatar
      mount_pseudo(): drop 'name' argument, switch to d_make_root() · 1f58bb18
      Al Viro authored
      
      Once upon a time we used to set ->d_name of e.g. pipefs root
      so that d_path() on pipes would work.  These days it's
      completely pointless - dentries of pipes are not even connected
      to pipefs root.  However, mount_pseudo() had set the root
      dentry name (passed as the second argument) and callers
      kept inventing names to pass to it.  Including those that
      didn't *have* any non-root dentries to start with...
      
      All of that had been pointless for about 8 years now; it's
      time to get rid of that cargo-culting...
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      1f58bb18
  18. 05 Apr, 2019 1 commit
  19. 03 Apr, 2019 1 commit
  20. 18 Mar, 2019 9 commits
    • Al Viro's avatar
      aio: move sanity checks and request allocation to io_submit_one() · 7316b49c
      Al Viro authored
      
      makes for somewhat cleaner control flow in __io_submit_one()
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      7316b49c
    • Al Viro's avatar
      deal with get_reqs_available() in aio_get_req() itself · fa0ca2ae
      Al Viro authored
      
      simplifies the caller
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      fa0ca2ae
    • Al Viro's avatar
      aio: move dropping ->ki_eventfd into iocb_destroy() · 74259703
      Al Viro authored
      
      no reason to duplicate that...
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      74259703
    • Al Viro's avatar
      make aio_read()/aio_write() return int · 958c13ce
      Al Viro authored
      
      that ssize_t is a rudiment of earlier calling conventions; it's been
      used only to pass 0 and -E... since last autumn.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      958c13ce
    • Al Viro's avatar
      Fix aio_poll() races · af5c72b1
      Al Viro authored
      
      aio_poll() has to cope with several unpleasant problems:
      	* requests that might stay around indefinitely need to
      be made visible for io_cancel(2); that must not be done to
      a request already completed, though.
      	* in cases when ->poll() has placed us on a waitqueue,
      wakeup might have happened (and request completed) before ->poll()
      returns.
      	* worse, in some early wakeup cases request might end
      up re-added into the queue later - we can't treat "woken up and
      currently not in the queue" as "it's not going to stick around
      indefinitely"
      	* ... moreover, ->poll() might have decided not to
      put it on any queues to start with, and that needs to be distinguished
      from the previous case
      	* ->poll() might have tried to put us on more than one queue.
      Only the first will succeed for aio poll, so we might end up missing
      wakeups.  OTOH, we might very well notice that only after the
      wakeup hits and request gets completed (all before ->poll() gets
      around to the second poll_wait()).  In that case it's too late to
      decide that we have an error.
      
      req->woken was an attempt to deal with that.  Unfortunately, it was
      broken.  What we need to keep track of is not that wakeup has happened -
      the thing might come back after that.  It's that async reference is
      already gone and won't come back, so we can't (and needn't) put the
      request on the list of cancellables.
      
      The easiest case is "request hadn't been put on any waitqueues"; we
      can tell by seeing NULL apt.head, and in that case there won't be
      anything async.  We should either complete the request ourselves
      (if vfs_poll() reports anything of interest) or return an error.
      
      In all other cases we get exclusion with wakeups by grabbing the
      queue lock.
      
      If request is currently on queue and we have something interesting
      from vfs_poll(), we can steal it and complete the request ourselves.
      
      If it's on queue and vfs_poll() has not reported anything interesting,
      we either put it on the cancellable list, or, if we know that it
      hadn't been put on all queues ->poll() wanted it on, we steal it and
      return an error.
      
      If it's _not_ on queue, it's either been already dealt with (in which
      case we do nothing), or there's aio_poll_complete_work() about to be
      executed.  In that case we either put it on the cancellable list,
      or, if we know it hadn't been put on all queues ->poll() wanted it on,
      simulate what cancel would've done.
      
      It's a lot more convoluted than I'd like it to be.  Single-consumer APIs
      suck, and unfortunately aio is not an exception...
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      af5c72b1
    • Al Viro's avatar
      aio: store event at final iocb_put() · 2bb874c0
      Al Viro authored
      
      Instead of having aio_complete() set ->ki_res.{res,res2}, do that
      explicitly in its callers, drop the reference (as aio_complete()
      used to do) and delay the rest until the final iocb_put().
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      2bb874c0
    • Al Viro's avatar
      aio: keep io_event in aio_kiocb · a9339b78
      Al Viro authored
      
      We want to separate forming the resulting io_event from putting it
      into the ring buffer.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      a9339b78
    • Al Viro's avatar
      aio: fold lookup_kiocb() into its sole caller · 833f4154
      Al Viro authored
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      833f4154
    • Linus Torvalds's avatar
      pin iocb through aio. · b53119f1
      Linus Torvalds authored
      
      aio_poll() is not the only case that needs file pinned; worse, while
      aio_read()/aio_write() can live without pinning iocb itself, the
      proof is rather brittle and can easily break on later changes.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      b53119f1
  21. 04 Mar, 2019 1 commit
    • Linus Torvalds's avatar
      aio: simplify - and fix - fget/fput for io_submit() · 84c4e1f8
      Linus Torvalds authored
      Al Viro root-caused a race where the IOCB_CMD_POLL handling of
      fget/fput() could cause us to access the file pointer after it had
      already been freed:
      
       "In more details - normally IOCB_CMD_POLL handling looks so:
      
         1) io_submit(2) allocates aio_kiocb instance and passes it to
            aio_poll()
      
         2) aio_poll() resolves the descriptor to struct file by req->file =
            fget(iocb->aio_fildes)
      
         3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
            aio_kiocb to 2 (bumps by 1, that is).
      
         4) aio_poll() calls vfs_poll(). After sanity checks (basically,
            "poll_wait() had been called and only once") it locks the queue.
            That's what the extra reference to iocb had been for - we know we
            can safely access it.
      
         5) With queue locked, we check if ->woken has already been set to
            true (by aio_poll_wake()) and, if it had been, we unlock the
            queue, drop a reference to aio_kiocb and bugger off - at that
            point it's a responsibility to aio_poll_wake() and the stuff
            called/scheduled by it. That code will drop the reference to file
            in req->file, along with the other reference to our aio_kiocb.
      
         6) otherwise, we see whether we need to wait. If we do, we unlock the
            queue, drop one reference to aio_kiocb and go away - eventual
            wakeup (or cancel) will deal with the reference to file and with
            the other reference to aio_kiocb
      
         7) otherwise we remove ourselves from waitqueue (still under the
            queue lock), so that wakeup won't get us. No async activity will
            be happening, so we can safely drop req->file and iocb ourselves.
      
        If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
        won't get freed under us, so we can do all the checks and locking
        safely. And we don't touch ->file if we detect that case.
      
        However, vfs_poll() most certainly *does* touch the file it had been
        given. So wakeup coming while we are still in ->poll() might end up
        doing fput() on that file. That case is not too rare, and usually we
        are saved by the still present reference from descriptor table - that
        fput() is not the final one.
      
        But if another thread closes that descriptor right after our fget()
        and wakeup does happen before ->poll() returns, we are in trouble -
        final fput() done while we are in the middle of a method:
      
      Al also wrote a patch to take an extra reference to the file descriptor
      to fix this, but I instead suggested we just streamline the whole file
      pointer handling by submit_io() so that the generic aio submission code
      simply keeps the file pointer around until the aio has completed.
      
      Fixes: bfe4037e
      
       ("aio: implement IOCB_CMD_POLL")
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      84c4e1f8
  22. 22 Feb, 2019 1 commit
    • Bart Van Assche's avatar
      aio: Fix locking in aio_poll() · d3d6a18d
      Bart Van Assche authored
      
      wake_up_locked() may but does not have to be called with interrupts
      disabled. Since the fuse filesystem calls wake_up_locked() without
      disabling interrupts aio_poll_wake() may be called with interrupts
      enabled. Since the kioctx.ctx_lock may be acquired from IRQ context,
      all code that acquires that lock from thread context must disable
      interrupts. Hence change the spin_trylock() call in aio_poll_wake()
      into a spin_trylock_irqsave() call. This patch fixes the following
      lockdep complaint:
      
      =====================================================
      WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      5.0.0-rc4-next-20190131 #23 Not tainted
      -----------------------------------------------------
      syz-executor2/13779 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      0000000098ac1230 (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:329 [inline]
      0000000098ac1230 (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1772 [inline]
      0000000098ac1230 (&fiq->waitq){+.+.}, at: __io_submit_one fs/aio.c:1875 [inline]
      0000000098ac1230 (&fiq->waitq){+.+.}, at: io_submit_one+0xedf/0x1cf0 fs/aio.c:1908
      
      and this task is already holding:
      000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:354 [inline]
      000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1771 [inline]
      000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one fs/aio.c:1875 [inline]
      000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: io_submit_one+0xeb6/0x1cf0 fs/aio.c:1908
      which would create a new lock dependency:
       (&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}
      
      but this new dependency connects a SOFTIRQ-irq-safe lock:
       (&(&ctx->ctx_lock)->rlock){..-.}
      
      ... which became SOFTIRQ-irq-safe at:
        lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
        __raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
        _raw_spin_lock_irq+0x60/0x80 kernel/locking/spinlock.c:160
        spin_lock_irq include/linux/spinlock.h:354 [inline]
        free_ioctx_users+0x2d/0x4a0 fs/aio.c:610
        percpu_ref_put_many include/linux/percpu-refcount.h:285 [inline]
        percpu_ref_put include/linux/percpu-refcount.h:301 [inline]
        percpu_ref_call_confirm_rcu lib/percpu-refcount.c:123 [inline]
        percpu_ref_switch_to_atomic_rcu+0x3e7/0x520 lib/percpu-refcount.c:158
        __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
        rcu_do_batch kernel/rcu/tree.c:2486 [inline]
        invoke_rcu_callbacks kernel/rcu/tree.c:2799 [inline]
        rcu_core+0x928/0x1390 kernel/rcu/tree.c:2780
        __do_softirq+0x266/0x95a kernel/softirq.c:292
        run_ksoftirqd kernel/softirq.c:654 [inline]
        run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
        smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
        kthread+0x357/0x430 kernel/kthread.c:247
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      to a SOFTIRQ-irq-unsafe lock:
       (&fiq->waitq){+.+.}
      
      ... which became SOFTIRQ-irq-unsafe at:
      ...
        lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
        _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
        spin_lock include/linux/spinlock.h:329 [inline]
        flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
        fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
        fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
        fuse_send_init fs/fuse/inode.c:989 [inline]
        fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
        mount_nodev+0x68/0x110 fs/super.c:1392
        fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
        legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
        vfs_get_tree+0x123/0x450 fs/super.c:1481
        do_new_mount fs/namespace.c:2610 [inline]
        do_mount+0x1436/0x2c40 fs/namespace.c:2932
        ksys_mount+0xdb/0x150 fs/namespace.c:3148
        __do_sys_mount fs/namespace.c:3162 [inline]
        __se_sys_mount fs/namespace.c:3159 [inline]
        __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
        do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      other info that might help us debug this:
      
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&fiq->waitq);
                                     local_irq_disable();
                                     lock(&(&ctx->ctx_lock)->rlock);
                                     lock(&fiq->waitq);
        <Interrupt>
          lock(&(&ctx->ctx_lock)->rlock);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor2/13779:
       #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:354 [inline]
       #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1771 [inline]
       #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one fs/aio.c:1875 [inline]
       #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: io_submit_one+0xeb6/0x1cf0 fs/aio.c:1908
      
      the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
      -> (&(&ctx->ctx_lock)->rlock){..-.} {
         IN-SOFTIRQ-W at:
                          lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                          __raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
                          _raw_spin_lock_irq+0x60/0x80 kernel/locking/spinlock.c:160
                          spin_lock_irq include/linux/spinlock.h:354 [inline]
                          free_ioctx_users+0x2d/0x4a0 fs/aio.c:610
                          percpu_ref_put_many include/linux/percpu-refcount.h:285 [inline]
                          percpu_ref_put include/linux/percpu-refcount.h:301 [inline]
                          percpu_ref_call_confirm_rcu lib/percpu-refcount.c:123 [inline]
                          percpu_ref_switch_to_atomic_rcu+0x3e7/0x520 lib/percpu-refcount.c:158
                          __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
                          rcu_do_batch kernel/rcu/tree.c:2486 [inline]
                          invoke_rcu_callbacks kernel/rcu/tree.c:2799 [inline]
                          rcu_core+0x928/0x1390 kernel/rcu/tree.c:2780
                          __do_softirq+0x266/0x95a kernel/softirq.c:292
                          run_ksoftirqd kernel/softirq.c:654 [inline]
                          run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
                          smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
                          kthread+0x357/0x430 kernel/kthread.c:247
                          ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
         INITIAL USE at:
                         lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                         __raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
                         _raw_spin_lock_irq+0x60/0x80 kernel/locking/spinlock.c:160
                         spin_lock_irq include/linux/spinlock.h:354 [inline]
                         __do_sys_io_cancel fs/aio.c:2052 [inline]
                         __se_sys_io_cancel fs/aio.c:2035 [inline]
                         __x64_sys_io_cancel+0xd5/0x5a0 fs/aio.c:2035
                         do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                         entry_SYSCALL_64_after_hwframe+0x49/0xbe
       }
       ... key      at: [<ffffffff8a574140>] __key.52370+0x0/0x40
       ... acquired at:
         lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
         __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
         _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
         spin_lock include/linux/spinlock.h:329 [inline]
         aio_poll fs/aio.c:1772 [inline]
         __io_submit_one fs/aio.c:1875 [inline]
         io_submit_one+0xedf/0x1cf0 fs/aio.c:1908
         __do_sys_io_submit fs/aio.c:1953 [inline]
         __se_sys_io_submit fs/aio.c:1923 [inline]
         __x64_sys_io_submit+0x1bd/0x580 fs/aio.c:1923
         do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      the dependencies between the lock to be acquired
       and SOFTIRQ-irq-unsafe lock:
      -> (&fiq->waitq){+.+.} {
         HARDIRQ-ON-W at:
                          lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                          __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                          _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
                          spin_lock include/linux/spinlock.h:329 [inline]
                          flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
                          fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
                          fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
                          fuse_send_init fs/fuse/inode.c:989 [inline]
                          fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
                          mount_nodev+0x68/0x110 fs/super.c:1392
                          fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
                          legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
                          vfs_get_tree+0x123/0x450 fs/super.c:1481
                          do_new_mount fs/namespace.c:2610 [inline]
                          do_mount+0x1436/0x2c40 fs/namespace.c:2932
                          ksys_mount+0xdb/0x150 fs/namespace.c:3148
                          __do_sys_mount fs/namespace.c:3162 [inline]
                          __se_sys_mount fs/namespace.c:3159 [inline]
                          __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
                          do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                          entry_SYSCALL_64_after_hwframe+0x49/0xbe
         SOFTIRQ-ON-W at:
                          lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                          __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                          _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
                          spin_lock include/linux/spinlock.h:329 [inline]
                          flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
                          fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
                          fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
                          fuse_send_init fs/fuse/inode.c:989 [inline]
                          fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
                          mount_nodev+0x68/0x110 fs/super.c:1392
                          fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
                          legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
                          vfs_get_tree+0x123/0x450 fs/super.c:1481
                          do_new_mount fs/namespace.c:2610 [inline]
                          do_mount+0x1436/0x2c40 fs/namespace.c:2932
                          ksys_mount+0xdb/0x150 fs/namespace.c:3148
                          __do_sys_mount fs/namespace.c:3162 [inline]
                          __se_sys_mount fs/namespace.c:3159 [inline]
                          __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
                          do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                          entry_SYSCALL_64_after_hwframe+0x49/0xbe
         INITIAL USE at:
                         lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                         __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                         _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
                         spin_lock include/linux/spinlock.h:329 [inline]
                         flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
                         fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
                         fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
                         fuse_send_init fs/fuse/inode.c:989 [inline]
                         fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
                         mount_nodev+0x68/0x110 fs/super.c:1392
                         fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
                         legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
                         vfs_get_tree+0x123/0x450 fs/super.c:1481
                         do_new_mount fs/namespace.c:2610 [inline]
                         do_mount+0x1436/0x2c40 fs/namespace.c:2932
                         ksys_mount+0xdb/0x150 fs/namespace.c:3148
                         __do_sys_mount fs/namespace.c:3162 [inline]
                         __se_sys_mount fs/namespace.c:3159 [inline]
                         __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
                         do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                         entry_SYSCALL_64_after_hwframe+0x49/0xbe
       }
       ... key      at: [<ffffffff8a60dec0>] __key.43450+0x0/0x40
       ... acquired at:
         lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
         __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
         _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
         spin_lock include/linux/spinlock.h:329 [inline]
         aio_poll fs/aio.c:1772 [inline]
         __io_submit_one fs/aio.c:1875 [inline]
         io_submit_one+0xedf/0x1cf0 fs/aio.c:1908
         __do_sys_io_submit fs/aio.c:1953 [inline]
         __se_sys_io_submit fs/aio.c:1923 [inline]
         __x64_sys_io_submit+0x1bd/0x580 fs/aio.c:1923
         do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      stack backtrace:
      CPU: 0 PID: 13779 Comm: syz-executor2 Not tainted 5.0.0-rc4-next-20190131 #23
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_bad_irq_dependency kernel/locking/lockdep.c:1573 [inline]
       check_usage.cold+0x60f/0x940 kernel/locking/lockdep.c:1605
       check_irq_usage kernel/locking/lockdep.c:1650 [inline]
       check_prev_add_irq kernel/locking/lockdep_states.h:8 [inline]
       check_prev_add kernel/locking/lockdep.c:1860 [inline]
       check_prevs_add kernel/locking/lockdep.c:1968 [inline]
       validate_chain kernel/locking/lockdep.c:2339 [inline]
       __lock_acquire+0x1f12/0x4790 kernel/locking/lockdep.c:3320
       lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
       spin_lock include/linux/spinlock.h:329 [inline]
       aio_poll fs/aio.c:1772 [inline]
       __io_submit_one fs/aio.c:1875 [inline]
       io_submit_one+0xedf/0x1cf0 fs/aio.c:1908
       __do_sys_io_submit fs/aio.c:1953 [inline]
       __se_sys_io_submit fs/aio.c:1923 [inline]
       __x64_sys_io_submit+0x1bd/0x580 fs/aio.c:1923
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: <stable@vger.kernel.org>
      Fixes: e8693bcf
      
       ("aio: allow direct aio poll comletions for keyed wakeups") # v4.19
      Signed-off-by: default avatarMiklos Szeredi <miklos@szeredi.hu>
      [ bvanassche: added a comment ]
      Reluctantly-Acked-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      d3d6a18d
  23. 06 Feb, 2019 2 commits
  24. 28 Dec, 2018 1 commit
  25. 18 Dec, 2018 5 commits