1. 24 Feb, 2011 1 commit
  2. 17 Nov, 2010 1 commit
  3. 13 Nov, 2010 1 commit
    • Tejun Heo's avatar
      block: make blkdev_get/put() handle exclusive access · e525fd89
      Tejun Heo authored
      
      Over time, block layer has accumulated a set of APIs dealing with bdev
      open, close, claim and release.
      
      * blkdev_get/put() are the primary open and close functions.
      
      * bd_claim/release() deal with exclusive open.
      
      * open/close_bdev_exclusive() are combination of open and claim and
        the other way around, respectively.
      
      * bd_link/unlink_disk_holder() to create and remove holder/slave
        symlinks.
      
      * open_by_devnum() wraps bdget() + blkdev_get().
      
      The interface is a bit confusing and the decoupling of open and claim
      makes it impossible to properly guarantee exclusive access as
      in-kernel open + claim sequence can disturb the existing exclusive
      open even before the block layer knows the current open if for another
      exclusive access.  Reorganize the interface such that,
      
      * blkdev_get() is extended to include exclusive access management.
        @holder argument is added and, if is @FMODE_EXCL specified, it will
        gain exclusive access atomically w.r.t. other exclusive accesses.
      
      * blkdev_put() is similarly extended.  It now takes @mode argument and
        if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
        the last exclusive claim is released, the holder/slave symlinks are
        removed automatically.
      
      * bd_claim/release() and close_bdev_exclusive() are no longer
        necessary and either made static or removed.
      
      * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
        is no longer necessary and removed.
      
      * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
        and blkdev_get().  It also has an unexpected extra bdev_read_only()
        test which probably should be moved into blkdev_get().
      
      * open_by_devnum() is modified to take @holder argument and pass it to
        blkdev_get().
      
      Most of bdev open/close operations are unified into blkdev_get/put()
      and most exclusive accesses are tested atomically at the open time (as
      it should).  This cleans up code and removes some, both valid and
      invalid, but unnecessary all the same, corner cases.
      
      open_bdev_exclusive() and open_by_devnum() can use further cleanup -
      rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
      special features.  Well, let's leave them for another day.
      
      Most conversions are straight-forward.  drbd conversion is a bit more
      involved as there was some reordering, but the logic should stay the
      same.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarNeil Brown <neilb@suse.de>
      Acked-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: dm-devel@redhat.com
      Cc: drbd-dev@lists.linbit.com
      Cc: Leo Chen <leochen@broadcom.com>
      Cc: Scott Branden <sbranden@broadcom.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Joern Engel <joern@logfs.org>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      e525fd89
  4. 10 Nov, 2010 2 commits
  5. 16 Sep, 2010 1 commit
    • Christoph Hellwig's avatar
      block: remove BLKDEV_IFL_WAIT · dd3932ed
      Christoph Hellwig authored
      
      All the blkdev_issue_* helpers can only sanely be used for synchronous
      caller.  To issue cache flushes or barriers asynchronously the caller needs
      to set up a bio by itself with a completion callback to move the asynchronous
      state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
      specified when calling blkdev_issue_* and also remove the now unused flags
      argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
      blkdev_issue_discard we need to keep it for the secure discard flag, which
      gains a more descriptive name and loses the bitops vs flag confusion.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      dd3932ed
  6. 15 Sep, 2010 1 commit
    • Will Drewry's avatar
      block, partition: add partition_meta_info to hd_struct · 6d1d8050
      Will Drewry authored
      I'm reposting this patch series as v4 since there have been no additional
      comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches
      are against Linus's tree: 2bfc96a1
      
      
      (2.6.36-rc3).
      
      Would this patchset be suitable for inclusion in an mm branch?
      
      This changes adds a partition_meta_info struct which itself contains a
      union of structures that provide partition table specific metadata.
      
      This change leaves the union empty. The subsequent patch includes an
      implementation for CONFIG_EFI_PARTITION-based metadata.
      Signed-off-by: default avatarWill Drewry <wad@chromium.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      6d1d8050
  7. 12 Aug, 2010 1 commit
  8. 07 Aug, 2010 4 commits
  9. 28 Apr, 2010 1 commit
  10. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        bloc...
      5a0e3ad6
  11. 03 Dec, 2009 1 commit
    • Martin K. Petersen's avatar
      block: Allow devices to indicate whether discarded blocks are zeroed · 98262f27
      Martin K. Petersen authored
      
      The discard ioctl is used by mkfs utilities to clear a block device
      prior to putting metadata down.  However, not all devices return zeroed
      blocks after a discard.  Some drives return stale data, potentially
      containing old superblocks.  It is therefore important to know whether
      discarded blocks are properly zeroed.
      
      Both ATA and SCSI drives have configuration bits that indicate whether
      zeroes are returned after a discard operation.  Implement a block level
      interface that allows this information to be bubbled up the stack and
      queried via a new block device ioctl.
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      98262f27
  12. 03 Oct, 2009 1 commit
  13. 14 Sep, 2009 1 commit
    • Christoph Hellwig's avatar
      block: use blkdev_issue_discard in blk_ioctl_discard · 746cd1e7
      Christoph Hellwig authored
      
      blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard,
      the only difference between the two is that blkdev_issue_discard needs to
      send a barrier discard request and blk_ioctl_discard a non-barrier one,
      and blk_ioctl_discard needs to wait on the request.  To facilitates this
      add a flags argument to blkdev_issue_discard to control both aspects of the
      behaviour.  This will be very useful later on for using the waiting
      funcitonality for other callers.
      
      Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      746cd1e7
  14. 22 May, 2009 2 commits
  15. 15 Apr, 2009 1 commit
  16. 29 Dec, 2008 1 commit
  17. 18 Nov, 2008 1 commit
  18. 21 Oct, 2008 7 commits
  19. 09 Oct, 2008 9 commits
    • Tejun Heo's avatar
      block: make partition array dynamic · 540eed56
      Tejun Heo authored
      
      disk->__part used to be statically allocated to the maximum possible
      number of partitions.  This patch makes partition array allocation
      dynamic.  The added overhead is minimal as only real change is one
      memory dereference changed to RCU one.  This saves both a bit of
      memory and cpu cycles iterating through unoccupied slots and makes
      increasing partition limit easier.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      540eed56
    • Tejun Heo's avatar
      block: introduce partition 0 · b5d0b9df
      Tejun Heo authored
      
      genhd and partition code handled disk and partitions separately.  All
      information about the whole disk was in struct genhd and partitions in
      struct hd_struct.  However, the whole disk (part0) and other
      partitions have a lot in common and the data structures end up having
      good number of common fields and thus separate code paths doing the
      same thing.  Also, the partition array was indexed by partno - 1 which
      gets pretty confusing at times.
      
      This patch introduces partition 0 and makes the partition array
      indexed by partno.  Following patches will unify the handling of disk
      and parts piece-by-piece.
      
      This patch also implements disk_partitionable() which tests whether a
      disk is partitionable.  With coming dynamic partition array change,
      the most common usage of disk_max_parts() will be testing whether a
      disk is partitionable and the number of max partitions will become
      much less important.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      b5d0b9df
    • Tejun Heo's avatar
      block: fix disk->part[] dereferencing race · e71bf0d0
      Tejun Heo authored
      
      disk->part[] is protected by its matching bdev's lock.  However,
      non-critical accesses like collecting stats and printing out sysfs and
      proc information used to be performed without any locking.  As
      partitions can come and go dynamically, partitions can go away
      underneath those non-critical accesses.  As some of those accesses are
      writes, this theoretically can lead to silent corruption.
      
      This patch fixes the race by using RCU for the partition array and dev
      reference counter to hold partitions.
      
      * Rename disk->part[] to disk->__part[] to make sure no one outside
        genhd layer proper accesses it directly.
      
      * Use RCU for disk->__part[] dereferencing.
      
      * Implement disk_{get|put}_part() which can be used to get and put
        partitions from gendisk respectively.
      
      * Iterators are implemented to help iterate through all partitions
        safely.
      
      * Functions which require RCU readlock are marked with _rcu suffix.
      
      * Use disk_put_part() in __blkdev_put() instead of directly putting
        the contained kobject.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      e71bf0d0
    • Tejun Heo's avatar
      block: don't depend on consecutive minor space · f331c029
      Tejun Heo authored
      
      * Implement disk_devt() and part_devt() and use them to directly
        access devt instead of computing it from ->major and ->first_minor.
      
        Note that all references to ->major and ->first_minor outside of
        block layer is used to determine devt of the disk (the part0) and as
        ->major and ->first_minor will continue to represent devt for the
        disk, converting these users aren't strictly necessary.  However,
        convert them for consistency.
      
      * Implement disk_max_parts() to avoid directly deferencing
        genhd->minors.
      
      * Update bdget_disk() such that it doesn't assume consecutive minor
        space.
      
      * Move devt computation from register_disk() to add_disk() and make it
        the only one (all other usages use the initially determined value).
      
      These changes clean up the code and will help disk->part dereference
      fix and extended block device numbers.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      f331c029
    • Tejun Heo's avatar
      block: make variable and argument names more consistent · cf771cb5
      Tejun Heo authored
      
      In hd_struct, @partno is used to denote partition number and a number
      of other places use @part to denote hd_struct.  Functions use @part
      and @index instead.  This causes confusion and makes it difficult to
      use consistent variable names for hd_struct.  Always use @partno if a
      variable represents partition number.
      
      Also, print out functions use @f or @part for seq_file argument.  Use
      @seqf uniformly instead.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      cf771cb5
    • Tejun Heo's avatar
      block: update add_partition() error handling · 88e34126
      Tejun Heo authored
      d805dda4
      
       tried to fix error case handling in add_partition() but had a
      few problems.
      
      * disk->part[] entry is set early and left dangling if operation
        fails.
      
      * Once device initialized, the last put_device() is responsible for
        freeing all the resources.  The failure path freed part_stats and p
        regardless of put_device() causing double free.
      
      * holders subdir holds reference to the disk device, so failure path
        should remove it to release resources properly which was missing.
      
      This patch fixes the above problems and while at it move partition
      slot busy check into add_partition() for completeness and inlines
      holders subdirectory creation.  Using separate function for it just
      obfuscates the code.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Abdel Benamrouche <draconux@gmail.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      88e34126
    • Tejun Heo's avatar
      block: allow deleting zero length partition · ec2cdedf
      Tejun Heo authored
      
      delete_partition() was noop for zero length partition.  As the
      addition code allows creating zero lenght partition and deletion is
      assumed to always succeed, this causes memory leak for zero length
      partitions.  Allow zero length partitions to end their meaningless
      lives.
      
      While at it, allow deleting zero lenght partition via
      BLKPG_DEL_PARTITION ioctl too.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      ec2cdedf
    • David Woodhouse's avatar
      Allow elevators to sort/merge discard requests · e17fc0a1
      David Woodhouse authored
      
      But blkdev_issue_discard() still emits requests which are interpreted as
      soft barriers, because naïve callers might otherwise issue subsequent
      writes to those same sectors, which might cross on the queue (if they're
      reallocated quickly enough).
      
      Callers still _can_ issue non-barrier discard requests, but they have to
      take care of queue ordering for themselves.
      Signed-off-by: default avatarDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      e17fc0a1
    • David Woodhouse's avatar
      Add BLKDISCARD ioctl to allow userspace to discard sectors · d30a2605
      David Woodhouse authored
      
      We may well want mkfs tools to use this to mark the whole device as
      unwanted before they format it, for example.
      
      The ioctl takes a pair of uint64_ts, which are start offset and length
      in _bytes_. Although at the moment it might make sense for them both to
      be in 512-byte sectors, I don't want to limit the ABI to that.
      Signed-off-by: default avatarDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      d30a2605
  20. 25 Jul, 2008 1 commit
  21. 10 Oct, 2007 1 commit