1. 22 Feb, 2021 1 commit
    • Jeffle Xu's avatar
      block: fix potential IO hang when turning off io_poll · 6b09b4d3
      Jeffle Xu authored
      
      QUEUE_FLAG_POLL flag will be cleared when turning off 'io_poll', while
      at that moment there may be IOs stuck in hw queue uncompleted. The
      following polling routine won't help reap these IOs, since blk_poll()
      will return immediately because of cleared QUEUE_FLAG_POLL flag. Thus
      these IOs will hang until they finnaly time out. The hang out can be
      observed by 'fio --engine=io_uring iodepth=1', while turning off
      'io_poll' at the same time.
      
      To fix this, freeze and flush the request queue first when turning off
      'io_poll'.
      Signed-off-by: default avatarJeffle Xu <jefflexu@linux.alibaba.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      6b09b4d3
  2. 10 Feb, 2021 1 commit
    • Damien Le Moal's avatar
      block: introduce zone_write_granularity limit · a805a4fa
      Damien Le Moal authored
      
      Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that
      all writes into sequential write required zones be aligned to the device
      physical block size. However, NVMe ZNS does not have this constraint and
      allows write operations into sequential zones to be aligned to the
      device logical block size. This inconsistency does not help with
      software portability across device types.
      
      To solve this, introduce the zone_write_granularity queue limit to
      indicate the alignment constraint, in bytes, of write operations into
      zones of a zoned block device. This new limit is exported as a
      read-only sysfs queue attribute and the helper
      blk_queue_zone_write_granularity() introduced for drivers to set this
      limit.
      
      The function blk_queue_set_zoned() is modified to set this new limit to
      the device logical block size by default. NVMe ZNS devices as well as
      zoned nullb devices use this default value as is. The scsi disk driver
      is modified to execute the blk_queue_zone_write_granularity() helper to
      set the zone write granularity of host-managed SMR disks to the disk
      physical block size.
      
      The accessor functions queue_zone_write_granularity() and
      bdev_zone_write_granularity() are also introduced.
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@edc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a805a4fa
  3. 09 Oct, 2020 3 commits
  4. 24 Sep, 2020 2 commits
  5. 08 Sep, 2020 2 commits
  6. 15 Jul, 2020 2 commits
  7. 24 Jun, 2020 2 commits
    • Luis Chamberlain's avatar
      block: create the request_queue debugfs_dir on registration · 85e0cbbb
      Luis Chamberlain authored
      
      We were only creating the request_queue debugfs_dir only
      for make_request block drivers (multiqueue), but never for
      request-based block drivers. We did this as we were only
      creating non-blktrace additional debugfs files on that directory
      for make_request drivers. However, since blktrace *always* creates
      that directory anyway, we special-case the use of that directory
      on blktrace. Other than this being an eye-sore, this exposes
      request-based block drivers to the same debugfs fragile
      race that used to exist with make_request block drivers
      where if we start adding files onto that directory we can later
      run a race with a double removal of dentries on the directory
      if we don't deal with this carefully on blktrace.
      
      Instead, just simplify things by always creating the request_queue
      debugfs_dir on request_queue registration. Rename the mutex also to
      reflect the fact that this is used outside of the blktrace context.
      Signed-off-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      85e0cbbb
    • Luis Chamberlain's avatar
      block: revert back to synchronous request_queue removal · e8c7d14a
      Luis Chamberlain authored
      Commit dc9edc44 ("block: Fix a blk_exit_rl() regression") merged on
      v4.12 moved the work behind blk_release_queue() into a workqueue after a
      splat floated around which indicated some work on blk_release_queue()
      could sleep in blk_exit_rl(). This splat would be possible when a driver
      called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue()
      as its final call) from an atomic context.
      
      blk_put_queue() decrements the refcount for the request_queue kobject, and
      upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() is
      now removed through commit db6d9952 ("block: remove request_list code")
      on v5.0, we reserve the right to be able to sleep within
      blk_release_queue() context.
      
      The last reference for the request_queue must not be called from atomic
      context. *When* the last reference to the request_queue reaches 0 varies,
      and so let's take the opportunity to document when that is expected to
      happen and also document the context of the related calls as best as
      possible so we can avoid future issues, and with the hopes that the
      synchronous request_queue removal sticks.
      
      We revert back to synchronous request_queue removal because asynchronous
      removal creates a regression with expected userspace interaction with
      several drivers. An example is when removing the loopback driver, one
      uses ioctls from userspace to do so, but upon return and if successful,
      one expects the device to be removed. Likewise if one races to add another
      device the new one may not be added as it is still being removed. This was
      expected behavior before and it now fails as the device is still present
      and busy still. Moving to asynchronous request_queue removal could have
      broken many scripts which relied on the removal to have been completed if
      there was no error. Document this expectation as well so that this
      doesn't regress userspace again.
      
      Using asynchronous request_queue removal however has helped us find
      other bugs. In the future we can test what could break with this
      arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE.
      
      While at it, update the docs with the context expectations for the
      request_queue / gendisk refcount decrement, and make these
      expectations explicit by using might_sleep().
      
      Fixes: dc9edc44
      
       ("block: Fix a blk_exit_rl() regression")
      Suggested-by: default avatarNicolai Stange <nstange@suse.de>
      Signed-off-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: yu kuai <yukuai3@huawei.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e8c7d14a
  8. 13 May, 2020 1 commit
    • Keith Busch's avatar
      block: Introduce REQ_OP_ZONE_APPEND · 0512a75b
      Keith Busch authored
      
      Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned
      block device. This is a no-merge write operation.
      
      A zone append write BIO must:
      * Target a zoned block device
      * Have a sector position indicating the start sector of the target zone
      * The target zone must be a sequential write zone
      * The BIO must not cross a zone boundary
      * The BIO size must not be split to ensure that a single range of LBAs
        is written with a single command.
      
      Implement these checks in generic_make_request_checks() using the
      helper function blk_check_zone_append(). To avoid write append BIO
      splitting, introduce the new max_zone_append_sectors queue limit
      attribute and ensure that a BIO size is always lower than this limit.
      Export this new limit through sysfs and check these limits in bio_full().
      
      Also when a LLDD can't dispatch a request to a specific zone, it
      will return BLK_STS_ZONE_RESOURCE indicating this request needs to
      be delayed, e.g.  because the zone it will be dispatched to is still
      write-locked. If this happens set the request aside in a local list
      to continue trying dispatching requests such as READ requests or a
      WRITE/ZONE_APPEND requests targetting other zones. This way we can
      still keep a high queue depth without starving other requests even if
      one request can't be served due to zone write-locking.
      
      Finally, make sure that the bio sector position indicates the actual
      write position as indicated by the device on completion.
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      [ jth: added zone-append specific add_page and merge_page helpers ]
      Signed-off-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0512a75b
  9. 07 Oct, 2019 1 commit
    • Bart Van Assche's avatar
      block: Remove "dying" checks from sysfs callbacks · bae85c15
      Bart Van Assche authored
      
      Block drivers must call del_gendisk() before blk_cleanup_queue().
      del_gendisk() calls kobject_del() and kobject_del() waits until any
      ongoing sysfs callback functions have finished. In other words, the
      sysfs callback functions won't be called for a queue in the dying
      state. Hence remove the "dying" checks from the sysfs callback
      functions.
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bae85c15
  10. 27 Sep, 2019 1 commit
  11. 26 Sep, 2019 1 commit
    • Ming Lei's avatar
      block: don't release queue's sysfs lock during switching elevator · b89f625e
      Ming Lei authored
      cecf5d87 ("block: split .sysfs_lock into two locks") starts to
      release & acquire sysfs_lock before registering/un-registering elevator
      queue during switching elevator for avoiding potential deadlock from
      showing & storing 'queue/iosched' attributes and removing elevator's
      kobject.
      
      Turns out there isn't such deadlock because 'q->sysfs_lock' isn't
      required in .show & .store of queue/iosched's attributes, and just
      elevator's sysfs lock is acquired in elv_iosched_store() and
      elv_iosched_show(). So it is safe to hold queue's sysfs lock when
      registering/un-registering elevator queue.
      
      The biggest issue is that commit cecf5d87 assumes that concurrent
      write on 'queue/scheduler' can't happen. However, this assumption isn't
      true, because kernfs_fop_write() only guarantees that concurrent write
      aren't called on the same open file, but the write could be from
      different open on the file. So we can't release & re-acquire queue's
      sysfs lock during switching elevator, otherwise use-after-free on
      elevator could be triggered.
      
      Fixes the issue by not releasing queue's sysfs lock during switching
      elevator.
      
      Fixes: cecf5d87
      
       ("block: split .sysfs_lock into two locks")
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b89f625e
  12. 12 Sep, 2019 1 commit
    • Ming Lei's avatar
      block: fix race between switching elevator and removing queues · 0a67b5a9
      Ming Lei authored
      cecf5d87 ("block: split .sysfs_lock into two locks") starts to
      release & actuire sysfs_lock again during switching elevator. So it
      isn't enough to prevent switching elevator from happening by simply
      clearing QUEUE_FLAG_REGISTERED with holding sysfs_lock, because
      in-progress switch still can move on after re-acquiring the lock,
      meantime the flag of QUEUE_FLAG_REGISTERED won't get checked.
      
      Fixes this issue by checking 'q->elevator' directly & locklessly after
      q->kobj is removed in blk_unregister_queue(), this way is safe because
      q->elevator can't be changed at that time.
      
      Fixes: cecf5d87
      
       ("block: split .sysfs_lock into two locks")
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0a67b5a9
  13. 27 Aug, 2019 2 commits
    • Ming Lei's avatar
      block: split .sysfs_lock into two locks · cecf5d87
      Ming Lei authored
      
      The kernfs built-in lock of 'kn->count' is held in sysfs .show/.store
      path. Meantime, inside block's .show/.store callback, q->sysfs_lock is
      required.
      
      However, when mq & iosched kobjects are removed via
      blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
      too. This way causes AB-BA lock because the kernfs built-in lock of
      'kn-count' is required inside kobject_del() too, see the lockdep warning[1].
      
      On the other hand, it isn't necessary to acquire q->sysfs_lock for
      both blk_mq_unregister_dev() & elv_unregister_queue() because
      clearing REGISTERED flag prevents storing to 'queue/scheduler'
      from being happened. Also sysfs write(store) is exclusive, so no
      necessary to hold the lock for elv_unregister_queue() when it is
      called in switching elevator path.
      
      So split .sysfs_lock into two: one is still named as .sysfs_lock for
      covering sync .store, the other one is named as .sysfs_dir_lock
      for covering kobjects and related status change.
      
      sysfs itself can handle the race between add/remove kobjects and
      showing/storing attributes under kobjects. For switching scheduler
      via storing to 'queue/scheduler', we use the queue flag of
      QUEUE_FLAG_REGISTERED with .sysfs_lock for avoiding the race, then
      we can avoid to hold .sysfs_lock during removing/adding kobjects.
      
      [1]  lockdep warning
          ======================================================
          WARNING: possible circular locking dependency detected
          5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
          ------------------------------------------------------
          rmmod/777 is trying to acquire lock:
          00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72
      
          but task is already holding lock:
          00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&q->sysfs_lock){+.+.}:
                 __lock_acquire+0x95f/0xa2f
                 lock_acquire+0x1b4/0x1e8
                 __mutex_lock+0x14a/0xa9b
                 blk_mq_hw_sysfs_show+0x63/0xb6
                 sysfs_kf_seq_show+0x11f/0x196
                 seq_read+0x2cd/0x5f2
                 vfs_read+0xc7/0x18c
                 ksys_read+0xc4/0x13e
                 do_syscall_64+0xa7/0x295
                 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
          -> #0 (kn->count#202){++++}:
                 check_prev_add+0x5d2/0xc45
                 validate_chain+0xed3/0xf94
                 __lock_acquire+0x95f/0xa2f
                 lock_acquire+0x1b4/0x1e8
                 __kernfs_remove+0x237/0x40b
                 kernfs_remove_by_name_ns+0x59/0x72
                 remove_files+0x61/0x96
                 sysfs_remove_group+0x81/0xa4
                 sysfs_remove_groups+0x3b/0x44
                 kobject_del+0x44/0x94
                 blk_mq_unregister_dev+0x83/0xdd
                 blk_unregister_queue+0xa0/0x10b
                 del_gendisk+0x259/0x3fa
                 null_del_dev+0x8b/0x1c3 [null_blk]
                 null_exit+0x5c/0x95 [null_blk]
                 __se_sys_delete_module+0x204/0x337
                 do_syscall_64+0xa7/0x295
                 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
          other info that might help us debug this:
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(&q->sysfs_lock);
                                         lock(kn->count#202);
                                         lock(&q->sysfs_lock);
            lock(kn->count#202);
      
           *** DEADLOCK ***
      
          2 locks held by rmmod/777:
           #0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk]
           #1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b
      
          stack backtrace:
          CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380
          Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
          Call Trace:
           dump_stack+0x9a/0xe6
           check_noncircular+0x207/0x251
           ? print_circular_bug+0x32a/0x32a
           ? find_usage_backwards+0x84/0xb0
           check_prev_add+0x5d2/0xc45
           validate_chain+0xed3/0xf94
           ? check_prev_add+0xc45/0xc45
           ? mark_lock+0x11b/0x804
           ? check_usage_forwards+0x1ca/0x1ca
           __lock_acquire+0x95f/0xa2f
           lock_acquire+0x1b4/0x1e8
           ? kernfs_remove_by_name_ns+0x59/0x72
           __kernfs_remove+0x237/0x40b
           ? kernfs_remove_by_name_ns+0x59/0x72
           ? kernfs_next_descendant_post+0x7d/0x7d
           ? strlen+0x10/0x23
           ? strcmp+0x22/0x44
           kernfs_remove_by_name_ns+0x59/0x72
           remove_files+0x61/0x96
           sysfs_remove_group+0x81/0xa4
           sysfs_remove_groups+0x3b/0x44
           kobject_del+0x44/0x94
           blk_mq_unregister_dev+0x83/0xdd
           blk_unregister_queue+0xa0/0x10b
           del_gendisk+0x259/0x3fa
           ? disk_events_poll_msecs_store+0x12b/0x12b
           ? check_flags+0x1ea/0x204
           ? mark_held_locks+0x1f/0x7a
           null_del_dev+0x8b/0x1c3 [null_blk]
           null_exit+0x5c/0x95 [null_blk]
           __se_sys_delete_module+0x204/0x337
           ? free_module+0x39f/0x39f
           ? blkcg_maybe_throttle_current+0x8a/0x718
           ? rwlock_bug+0x62/0x62
           ? __blkcg_punt_bio_submit+0xd0/0xd0
           ? trace_hardirqs_on_thunk+0x1a/0x20
           ? mark_held_locks+0x1f/0x7a
           ? do_syscall_64+0x4c/0x295
           do_syscall_64+0xa7/0x295
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
          RIP: 0033:0x7fb696cdbe6b
          Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008
          RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
          RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b
          RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828
          RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000
          R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0
          R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      cecf5d87
    • Ming Lei's avatar
      block: add helper for checking if queue is registered · 58c898ba
      Ming Lei authored
      
      There are 4 users which check if queue is registered, so add one helper
      to check it.
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      58c898ba
  14. 12 Aug, 2019 1 commit
  15. 07 Jun, 2019 1 commit
    • Ming Lei's avatar
      block: free sched's request pool in blk_cleanup_queue · c3e22192
      Ming Lei authored
      In theory, IO scheduler belongs to request queue, and the request pool
      of sched tags belongs to the request queue too.
      
      However, the current tags allocation interfaces are re-used for both
      driver tags and sched tags, and driver tags is definitely host wide,
      and doesn't belong to any request queue, same with its request pool.
      So we need tagset instance for freeing request of sched tags.
      
      Meantime, blk_mq_free_tag_set() often follows blk_cleanup_queue() in case
      of non-BLK_MQ_F_TAG_SHARED, this way requires that request pool of sched
      tags to be freed before calling blk_mq_free_tag_set().
      
      Commit 47cdee29 ("block: move blk_exit_queue into __blk_release_queue")
      moves blk_exit_queue into __blk_release_queue for simplying the fast
      path in generic_make_request(), then causes oops during freeing requests
      of sched tags in __blk_release_queue().
      
      Fix the above issue by move freeing request pool of sched tags into
      blk_cleanup_queue(), this way is safe becasue queue has been frozen and no any
      in-queue requests at that time. Freeing sched tags has to be kept in queue's
      release handler becasue there might be un-completed dispatch activity
      which might refer to sched tags.
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Fixes: 47cdee29
      
       ("block: move blk_exit_queue into __blk_release_queue")
      Tested-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Reported-by: default avatarkernel test robot <rong.a.chen@intel.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c3e22192
  16. 29 May, 2019 1 commit
    • Ming Lei's avatar
      block: move blk_exit_queue into __blk_release_queue · 47cdee29
      Ming Lei authored
      Commit 498f6650 ("block: Fix a race between the cgroup code and
      request queue initialization") moves what blk_exit_queue does into
      blk_cleanup_queue() for fixing issue caused by changing back
      queue lock.
      
      However, after legacy request IO path is killed, driver queue lock
      won't be used at all, and there isn't story for changing back
      queue lock. Then the issue addressed by Commit 498f6650 doesn't
      exist any more.
      
      So move move blk_exit_queue into __blk_release_queue.
      
      This patch basically reverts the following two commits:
      
      	498f6650 block: Fix a race between the cgroup code and request queue initialization
      	24ecc358
      
       block: Ensure that a request queue is dissociated from the cgroup controller
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      47cdee29
  17. 25 Apr, 2019 1 commit
  18. 22 Apr, 2019 1 commit
  19. 20 Mar, 2019 1 commit
  20. 11 Feb, 2019 1 commit
  21. 10 Feb, 2019 1 commit
  22. 19 Dec, 2018 1 commit
  23. 18 Dec, 2018 1 commit
  24. 04 Dec, 2018 1 commit
  25. 28 Nov, 2018 1 commit
  26. 16 Nov, 2018 1 commit
  27. 15 Nov, 2018 2 commits
  28. 07 Nov, 2018 2 commits
  29. 31 Oct, 2018 1 commit
  30. 25 Oct, 2018 2 commits
    • Damien Le Moal's avatar
      block: Introduce blk_revalidate_disk_zones() · bf505456
      Damien Le Moal authored
      
      Drivers exposing zoned block devices have to initialize and maintain
      correctness (i.e. revalidate) of the device zone bitmaps attached to
      the device request queue (seq_zones_bitmap and seq_zones_wlock).
      
      To simplify coding this, introduce a generic helper function
      blk_revalidate_disk_zones() suitable for most (and likely all) cases.
      This new function always update the seq_zones_bitmap and seq_zones_wlock
      bitmaps as well as the queue nr_zones field when called for a disk
      using a request based queue. For a disk using a BIO based queue, only
      the number of zones is updated since these queues do not have
      schedulers and so do not need the zone bitmaps.
      
      With this change, the zone bitmap initialization code in sd_zbc.c can be
      replaced with a call to this function in sd_zbc_read_zones(), which is
      called from the disk revalidate block operation method.
      
      A call to blk_revalidate_disk_zones() is also added to the null_blk
      driver for devices created with the zoned mode enabled.
      
      Finally, to ensure that zoned devices created with dm-linear or
      dm-flakey expose the correct number of zones through sysfs, a call to
      blk_revalidate_disk_zones() is added to dm_table_set_restrictions().
      
      The zone bitmaps allocated and initialized with
      blk_revalidate_disk_zones() are freed automatically from
      __blk_release_queue() using the block internal function
      blk_queue_free_zone_bitmaps().
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bf505456
    • Damien Le Moal's avatar
      block: Expose queue nr_zones in sysfs · 965b652e
      Damien Le Moal authored
      
      Expose through sysfs the nr_zones field of struct request_queue.
      Exposing this value helps in debugging disk issues as well as
      facilitating scripts based use of the disk (e.g. blktests).
      
      For zoned block devices, the nr_zones field indicates the total number
      of zones of the device calculated using the known disk capacity and
      zone size. This number of zones is always 0 for regular block devices.
      
      Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED,
      introduce the blk_queue_nr_zones() function to return the correct value
      for any device, regardless if CONFIG_BLK_DEV_ZONED is set.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      965b652e