- 06 May, 2011 1 commit
-
shaohua.li@intel.com authored
In some drives, flush requests are non-queueable: while a flush request is running, normal read/write requests can't run. If the block layer dispatches such a request anyway, the driver can't handle it and must requeue it. Tejun suggested we hold the queue while a flush is running, which avoids the unnecessary requeue and also improves performance. For example, take requests flush1, write1, flush2: flush1 is dispatched, then the queue is held, so write1 isn't inserted into the queue. After flush1 finishes, flush2 is dispatched. Since the disk cache is already clean, flush2 finishes very quickly, so in effect flush2 is folded into flush1. In my test, holding the queue completely solves a regression introduced by commit 53d63e6b ("block: make the flush insertion use the tail of the dispatch list. It's not a preempt type request, in fact we have to insert it behind requests that do specify INSERT_FRONT."), which causes about a 20% regression running a sysbench fileio workload. Stable: 2.6.39 only Cc: stable@kernel.org Signed-off-by:
Shaohua Li <shaohua.li@intel.com> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
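As an illustration of the "hold the queue" idea described above, here is a minimal sketch (not the mainline hunk): q->flush_not_queueable and q->flush_in_flight are illustrative field names standing in for whatever state the real patch tracks, and example_peek_request() is a hypothetical wrapper.

```c
#include <linux/blkdev.h>

/*
 * A minimal sketch of holding the queue while a non-queueable flush runs.
 * The two q-> fields below are illustrative, not necessarily the fields
 * the real patch uses.
 */
static struct request *example_peek_request(struct request_queue *q)
{
	/*
	 * While a non-queueable flush is executing, pretend the queue is
	 * empty so write1 isn't dispatched; it is pulled once flush1
	 * completes, letting flush2 fold into the just-finished flush.
	 */
	if (q->flush_not_queueable && q->flush_in_flight)
		return NULL;

	return blk_peek_request(q);
}
```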
-
- 18 Apr, 2011 1 commit
-
Christoph Hellwig authored
Instead of overloading __blk_run_queue to force an offload to kblockd, add a new blk_run_queue_async helper to do it explicitly. I've kept the blk_queue_stopped check for now, but I suspect it's not needed, as the check we do when the workqueue item runs should be enough. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
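A rough sketch of what such a helper can look like (not necessarily the exact mainline body): punt the queue run to kblockd instead of invoking q->request_fn() in the current context.

```c
#include <linux/blkdev.h>

/* Sketch only; the real helper is blk_run_queue_async() in blk-core.c. */
static void example_run_queue_async(struct request_queue *q)
{
	if (likely(!blk_queue_stopped(q)))
		kblockd_schedule_delayed_work(q, &q->delay_work, 0);
}
```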
-
- 05 Apr, 2011 2 commits
-
Jens Axboe authored
It's not a preempt-type request; in fact, we have to insert it behind requests that do specify INSERT_FRONT. Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
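A sketch of the behavioral change, with an illustrative function name: the flush goes to the tail of the dispatch list so requests inserted with INSERT_FRONT keep their position ahead of it.

```c
#include <linux/blkdev.h>
#include <linux/list.h>

/* Illustrative helper, not the real hunk. */
static void example_insert_flush(struct request_queue *q, struct request *rq)
{
	/* was effectively: list_add(&rq->queuelist, &q->queue_head); */
	list_add_tail(&rq->queuelist, &q->queue_head);
}
```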
-
Jens Axboe authored
Merge it with __elv_add_request(); it's pretty pointless to have a function with only two callers. The main interface is elv_add_request()/__elv_add_request(). Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
- 10 Mar, 2011 2 commits
-
Jens Axboe authored
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So let's kill off the old plugging along with aops->sync_page(). Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Jens Axboe authored
This patch adds support for creating a queuing context outside of the queue itself. This enables us to batch up pieces of IO before grabbing the block device queue lock and submitting them to the IO scheduler. The context is created on the stack of the process and assigned in the task structure, so that we can auto-unplug it if we hit a schedule event. The current queue plugging happens implicitly if IO is submitted to an empty device, yet callers have to remember to unplug that IO when they are going to wait for it. This is an ugly API and has caused bugs in the past. Additionally, it requires hacks in the vm (->sync_page() callback) to handle that logic. By switching to an explicit plugging scheme we make the API a lot nicer and can get rid of the ->sync_page() hack in the vm. Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
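Typical caller-side use of the explicit plugging API described above; submit_batch() is a stand-in for any code issuing several bios.

```c
#include <linux/blkdev.h>
#include <linux/bio.h>
#include <linux/fs.h>

/* Sketch: batch a few bios under one on-stack plug. */
static void submit_batch(struct bio **bios, int nr)
{
	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);		/* plug is assigned to the current task */
	for (i = 0; i < nr; i++)
		submit_bio(WRITE, bios[i]);
	blk_finish_plug(&plug);		/* unplug: push the batch to the queue(s) */
}
```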
-
- 02 Mar, 2011 2 commits
-
Tejun Heo authored
blk-flush decomposes a flush into a sequence of multiple requests. On completion of a request, the next one is queued; however, the block layer must not implicitly call into q->request_fn() directly from the completion path. Doing so makes the queue behave unexpectedly when seen from the drivers and violates the assumption that q->request_fn() is called with process context + queue_lock. This patch makes the following two changes to blk-flush to make sure q->request_fn() is not called directly from the request completion path. - blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always use kblockd instead of calling directly into q->request_fn(). - queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the request queue directly. Reported by Jan in the following threads. http://thread.gmane.org/gmane.linux.ide/48778 http://thread.gmane.org/gmane.linux.ide/48786 stable: applicable to v2.6.37. Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Jan Beulich <JBeulich@novell.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: stable@kernel.org Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
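A sketch of the completion-path rule (the function name is illustrative): from end_io, never call back into q->request_fn() directly; always ask __blk_run_queue() to go through kblockd. It assumes queue_lock is held, as is normal in this path.

```c
#include <linux/blkdev.h>

/* Illustrative end_io hook for a flush-sequence request. */
static void example_flush_seq_end_io(struct request *rq, int error)
{
	struct request_queue *q = rq->q;

	/* ... advance the flush sequence here ... */

	__blk_run_queue(q, true);	/* true: force kblockd, no recursion into request_fn */
}
```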
-
Tejun Heo authored
__blk_run_queue() automatically either calls q->request_fn() directly or schedules kblockd depending on whether the call recurses. The blk-flush implementation needs to be able to explicitly choose kblockd. Add @force_kblockd. All the current users are converted to specify %false for the parameter, so this patch doesn't introduce any behavior change. stable: This is a prerequisite for fixing an ide oops caused by the new blk-flush implementation. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: stable@kernel.org Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
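Caller-side sketch of the changed interface, __blk_run_queue(q, force_kblockd): existing callers pass false and keep their old behavior. example_kick_queue() is an illustrative caller, not code from the patch.

```c
#include <linux/blkdev.h>

/* Sketch of an unchanged-behavior caller after the signature change. */
static void example_kick_queue(struct request_queue *q)
{
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	__blk_run_queue(q, false);	/* run directly or via kblockd, as before */
	spin_unlock_irqrestore(q->queue_lock, flags);
}
```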
-
- 25 Jan, 2011 2 commits
-
Tejun Heo authored
The current FLUSH/FUA support has evolved from an implementation which had to perform queue draining. As such, sequencing is done queue-wide, one flush request after another. However, with the draining requirement gone, there's no reason to keep the queue-wide sequential approach. This patch reimplements FLUSH/FUA support such that each FLUSH/FUA request is sequenced individually. The actual FLUSH execution is double buffered and whenever a request wants to execute one for either PRE or POSTFLUSH, it queues on the pending queue. Once certain conditions are met, a flush request is issued and on its completion all pending requests proceed to the next sequence. This allows arbitrary merging of different types of flushes. How they are merged can be primarily controlled and tuned by adjusting the aforementioned 'conditions' used to determine when to issue the next flush. This is inspired by Darrick's patches to merge multiple zero-data flushes, which helps workloads with highly concurrent fsync requests. * As flush requests are never put on the IO scheduler, request fields used for flush share space with rq->rb_node. rq->completion_data is moved out of the union. This increases the request size by one pointer. As rq->elevator_private* are used only by the iosched too, it is possible to reduce the request size further. However, to do that, we need to modify the request allocation path such that iosched data is not allocated for flush requests. * FLUSH/FUA processing happens on insertion now instead of dispatch. - Comments updated as per Vivek and Mike. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: "Darrick J. Wong" <djwong@us.ibm.com> Cc: Shaohua Li <shli@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
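To make the per-request sequencing concrete, here is a sketch of how one FLUSH/FUA request can be decomposed into steps. The EX_FSEQ_* names and the helper are illustrative; the exact symbols in blk-flush.c may differ.

```c
#include <linux/blkdev.h>

#define EX_FSEQ_PREFLUSH	(1 << 0)	/* flush the cache before the data */
#define EX_FSEQ_DATA		(1 << 1)	/* the data write itself */
#define EX_FSEQ_POSTFLUSH	(1 << 2)	/* flush the cache after the data */

/* Decide which sequence steps a given request needs. */
static unsigned int example_flush_policy(unsigned int queue_flush_flags,
					 struct request *rq)
{
	unsigned int policy = 0;

	if (blk_rq_sectors(rq))
		policy |= EX_FSEQ_DATA;

	if (queue_flush_flags & REQ_FLUSH) {
		if (rq->cmd_flags & REQ_FLUSH)
			policy |= EX_FSEQ_PREFLUSH;
		/* no FUA in hardware: emulate it with a post-flush */
		if (!(queue_flush_flags & REQ_FUA) && (rq->cmd_flags & REQ_FUA))
			policy |= EX_FSEQ_POSTFLUSH;
	}
	return policy;
}
```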
-
Tejun Heo authored
rq == &q->flush_rq was used to determine whether a rq is part of a flush sequence, which worked because all requests in a flush sequence were sequenced using the single dedicated request. This is about to change, so introduce REQ_FLUSH_SEQ flag to distinguish flush sequence requests. This patch doesn't cause any behavior change. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
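A sketch of the new flag-based test (illustrative helper name), replacing the pointer comparison against the single dedicated flush request.

```c
#include <linux/blkdev.h>

/* Is this request part of a flush sequence? */
static inline bool example_rq_is_flush_seq(struct request *rq)
{
	/* was effectively: rq == &q->flush_rq */
	return (rq->cmd_flags & REQ_FLUSH_SEQ) != 0;
}
```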
-
- 16 Sep, 2010 1 commit
-
Christoph Hellwig authored
All the blkdev_issue_* helpers can only sanely be used by synchronous callers. To issue cache flushes or barriers asynchronously, the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_*, and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
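Caller-side sketch after this change: the helper is synchronous by definition, so there is no flags/BLKDEV_IFL_WAIT argument any more. example_flush_device() is a hypothetical wrapper.

```c
#include <linux/blkdev.h>
#include <linux/gfp.h>

/* Flush the device's write cache and wait for it. */
static int example_flush_device(struct block_device *bdev)
{
	return blkdev_issue_flush(bdev, GFP_KERNEL, NULL);
}
```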
-
- 10 Sep, 2010 12 commits
-
Tejun Heo authored
Update blkdev_issue_flush() to use the new REQ_FLUSH interface. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Tejun Heo authored
rq->rq_disk and bio->bi_bdev->bd_disk may differ if a request has passed through remapping drivers. The FSEQ_DATA request incorrectly followed bio->bi_bdev->bd_disk, ending up being issued w/ a mismatching rq_disk. Make it follow orig_rq->rq_disk. Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Tested-by:
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Tejun Heo authored
While completing a request from a REQ_FLUSH/FUA sequence, another request can be pushed to the request queue. If a driver tests elv_queue_empty() before completing a request and runs the queue again only if the queue wasn't empty, this may lead to a hang. Please note that most drivers either kick the queue unconditionally or test queue emptiness after completing the current request and don't have this problem. This patch removes this possibility by making the REQ_FLUSH/FUA sequence code kick the queue if the queue was empty before completing a request from a REQ_FLUSH/FUA sequence. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Tejun Heo authored
init_flush_request() only set REQ_FLUSH when initializing flush requests, making them READ requests. Use WRITE_FLUSH instead. Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Christoph Hellwig authored
We need to call blk_rq_init and elv_insert for all cases in queue_next_fseq, so move these calls into common code. Also move the end_io initialization from queue_flush into queue_next_fseq and rename queue_flush to init_flush_request, now that its old name doesn't apply anymore. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Tejun Heo authored
Now that the backend conversion is complete, export sequenced FLUSH/FUA capability through REQ_FLUSH/FUA flags. REQ_FLUSH means the device cache should be flushed before executing the request. REQ_FUA means that the data in the request should be on non-volatile media on completion. The block layer will choose the correct way of implementing the semantics and execute it. The request may be passed to the device directly if the device can handle it; otherwise, it will be sequenced using one or more proxy requests. Devices will never see REQ_FLUSH and/or FUA which they don't support. Also, unlike the original REQ_HARDBARRIER, REQ_FLUSH/FUA requests are never failed with -EOPNOTSUPP. If the underlying device doesn't support FLUSH/FUA, the block layer simply makes those bits a no-op. IOW, it no longer distinguishes between a writeback cache which doesn't support cache flush and writethrough/no cache. Devices which have a WB cache w/o flush are very difficult to come by these days and there's nothing much we can do anyway, so it doesn't make sense to require everyone to implement -EOPNOTSUPP handling. This will simplify filesystems and block drivers as they can drop -EOPNOTSUPP retry logic for barriers. * QUEUE_ORDERED_* are removed and QUEUE_FSEQ_* are moved into blk-flush.c. * REQ_FLUSH w/o data can also be directly passed to drivers without sequencing, but some drivers assume that zero length requests don't have rq->bio, which isn't true for these requests, requiring the use of proxy requests. * REQ_COMMON_MASK now includes REQ_FLUSH | REQ_FUA so that they are copied from bio to request. * WRITE_BARRIER is marked deprecated and WRITE_FLUSH, WRITE_FUA and WRITE_FLUSH_FUA are added. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
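Filesystem-side sketch of the contract described above: tag the commit write and let the block layer realize the semantics, directly or via proxy requests, with no -EOPNOTSUPP fallback needed. The helper name is illustrative.

```c
#include <linux/fs.h>
#include <linux/bio.h>

/* Submit a journal commit write: flush cache first, FUA the write itself. */
static void example_submit_commit_write(struct bio *bio)
{
	submit_bio(WRITE_FLUSH_FUA, bio);
}
```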
-
Tejun Heo authored
With the ordering requirements dropped, barrier and ordered are misnomers. Now all the block layer does is sequence FLUSH and FUA. Rename them to flush. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Tejun Heo authored
Without ordering requirements, barrier and ordering are misnomers. Rename block/blk-barrier.c to block/blk-flush.c. Renaming of symbols will follow. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Tejun Heo authored
Filesystems will take all the responsibilities for ordering requests around commit writes and will only indicate how the commit writes themselves should be handled by block layers. This patch drops barrier ordering by queue draining from the block layer. The ordering-by-draining implementation was somewhat invasive to request handling. A list of notable changes follows. * Each queue has a 1-bit color which is flipped on each barrier issue. This is used to track whether a given request is issued before the current barrier or not. The REQ_ORDERED_COLOR flag and coloring implementation in __elv_add_request() are removed. * Requests which shouldn't be processed yet for draining were stalled by returning -EAGAIN from blk_do_ordered() according to the test result between blk_ordered_req_seq() and blk_ordered_cur_seq(). This logic is removed. * Draining completion logic in elv_completed_request() is removed. * All barrier sequence requests were queued to the request queue and then trickled to the lower layer according to progress, and thus maintaining request order during requeue was necessary. This is replaced by queueing the next request in the barrier sequence only after the current one is complete, from blk_ordered_complete_seq(), which removes the need for multiple proxy requests in struct request_queue and the request sorting logic in the ELEVATOR_INSERT_REQUEUE path of elv_insert(). * As barriers no longer have ordering constraints, there's no need to dump the whole elevator onto the dispatch queue on each barrier. Insert barriers at the front instead. * If other barrier requests come to the front of the dispatch queue while one is already in progress, they are stored in q->pending_barriers and restored to the dispatch queue one-by-one after each barrier completion from blk_ordered_complete_seq(). Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Tejun Heo authored
Make the following cleanups in preparation for the barrier/flush update. * blk_do_ordered() declaration is moved from include/linux/blkdev.h to block/blk.h. * blk_do_ordered() now returns a pointer to struct request, with %NULL meaning "try the next request" and ERR_PTR(-EAGAIN) meaning "try again later". The third case will be dropped with further changes. * In the initialization of the proxy barrier request, the data direction is already set by init_request_from_bio(). Drop the unnecessary explicit REQ_WRITE setting and move init_request_from_bio() above the REQ_FUA flag setting. * add_request() is collapsed into __make_request(). These changes don't make any functional difference. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Tejun Heo authored
Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with -EOPNOTSUPP and blk_queue_ordered() is replaced with the simpler blk_queue_flush(). blk_queue_flush() takes combinations of REQ_FLUSH and REQ_FUA. If a device has a write cache and can flush it, it should set REQ_FLUSH. If the device can handle FUA writes, it should also set REQ_FUA. All blk_queue_ordered() users are converted. * ORDERED_DRAIN is mapped to 0, which is the default value. * ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH. * ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Boaz Harrosh <bharrosh@panasas.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Cc: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
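Driver-side sketch of the converted capability declaration for a device with a flushable write cache that also supports FUA writes; it stands in for the old blk_queue_ordered() call. The helper name is illustrative.

```c
#include <linux/blkdev.h>

/* Declare cache characteristics with the new, simpler API. */
static void example_declare_cache_features(struct request_queue *q)
{
	blk_queue_flush(q, REQ_FLUSH | REQ_FUA);
}
```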
-
Tejun Heo authored
Nobody is making meaningful use of ORDERED_BY_TAG now, and queue draining for barrier requests will be removed soon, which will render the advantage of tag ordering moot. Kill ORDERED_BY_TAG. The following users are affected. * brd: converted to ORDERED_DRAIN. * virtio_blk: ORDERED_TAG path was already marked deprecated. Removed. * xen-blkfront: ORDERED_TAG case dropped. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
- 07 Aug, 2010 8 commits
-
FUJITA Tomonori authored
q->bar_rq.rq_disk is NULL. Use the rq_disk of the original request instead. Signed-off-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
FUJITA Tomonori authored
The block layer doesn't set rq->cmd_type on flush requests. By definition, it should be REQ_TYPE_FS (the lower layers build a command and interpret the result of it; that is, the block layer doesn't know the details). Signed-off-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Dave Chinner authored
Issuing a blkdev_issue_flush() on an unconfigured loop device causes a panic as q->make_request_fn is not configured. This can occur when trying to mount the unconfigured loop device as an XFS filesystem. There are no guards that catch the bio before the request function is called because we don't add a payload to the bio. Instead, manually check this case as soon as we have a pointer to the queue to flush. Signed-off-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
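A sketch of the guard described above, inside a hypothetical helper: bail out before building the flush bio if the queue has no request function yet.

```c
#include <linux/blkdev.h>
#include <linux/errno.h>

/* Refuse to flush a queue that isn't fully configured. */
static int example_issue_flush(struct block_device *bdev)
{
	struct request_queue *q = bdev_get_queue(bdev);

	if (!q)
		return -ENXIO;
	/* e.g. a loop device without a backing file is not yet configured */
	if (!q->make_request_fn)
		return -ENXIO;

	/* ... allocate and submit the flush bio as usual ... */
	return 0;
}
```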
-
FUJITA Tomonori authored
This removes q->prepare_flush_fn completely (changes the blk_queue_ordered API). Signed-off-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
FUJITA Tomonori authored
This is preparation for removing q->prepare_flush_fn. Temporarily, blk_queue_ordered() permits QUEUE_ORDERED_DO_PREFLUSH and QUEUE_ORDERED_DO_POSTFLUSH without prepare_flush_fn. Signed-off-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
FUJITA Tomonori authored
SCSI-ml needs a way to mark a request as a flush request in q->prepare_flush_fn because it needs to identify them later (e.g. in q->request_fn or prep_rq_fn). queue_flush sets REQ_HARDBARRIER in rq->cmd_flags; however, the block layer also sends normal REQ_TYPE_FS requests with REQ_HARDBARRIER, so SCSI-ml can't use REQ_HARDBARRIER to identify flush requests. We could change the block layer to clear the REQ_HARDBARRIER bit before sending non-flush requests to the lower layers. However, introducing the new flag looks cleaner (and is surely easier). Signed-off-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: James Bottomley <James.Bottomley@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alasdair G Kergon <agk@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Christoph Hellwig authored
Remove the current bio flags and reuse the request flags for the bio, too. This makes it easier to trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing from the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superfluous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h, that is the only way to go for now. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
-
Christoph Hellwig authored
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in struct request. This allows much easier grepping for different request types instead of unwinding through macros. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <jaxboe@fusionio.com>
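A sketch of the conversion pattern (illustrative helper name): open-coded field tests instead of the removed wrappers.

```c
#include <linux/blkdev.h>

/* Is this a filesystem request that isn't a discard? */
static bool example_is_plain_fs_request(struct request *rq)
{
	/* was roughly: blk_fs_request(rq) && !blk_discard_rq(rq) */
	return rq->cmd_type == REQ_TYPE_FS && !(rq->cmd_flags & REQ_DISCARD);
}
```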
-
- 28 Apr, 2010 3 commits
-
Dmitry Monakhov authored
Move blkdev_issue_discard from blk-barrier.c because it is not barrier related. Later the file will be populated by other helpers. Signed-off-by:
Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Dmitry Monakhov authored
In some places the caller doesn't want to wait for a request to complete. Signed-off-by:
Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Dmitry Monakhov authored
The patch just converts all blkdev_issue_xxx functions to a common set of flags. Wait/allocation semantics are preserved. Signed-off-by:
Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 30 Mar, 2010 1 commit
-
Tejun Heo authored
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h. percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h, which in turn includes gfp.h, making everything defined by the two files universally available and complicating inclusion dependencies. The percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch a large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the following. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there, i.e. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include bloc...
-
- 29 Dec, 2009 1 commit
-
OGAWA Hirofumi authored
Signed-off-by:
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 01 Oct, 2009 4 commits
-
Christoph Hellwig authored
Currently we set the bio size to the byte equivalent of the blocks to be trimmed when submitting the initial DISCARD ioctl. That means it is subject to the max_hw_sectors limitation of the HBA, which is much lower than the size of a DISCARD request we can support. Add a separate max_discard_sectors tunable to limit the size of discard requests. We limit the max discard request size in bytes to 32 bits, as that is the limit for bio->bi_size. This could be much larger if we had a way to pass that information through the block layer. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
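Driver-side sketch: advertise a discard limit independent of max_hw_sectors. The 8 MiB value and the helper name are purely illustrative.

```c
#include <linux/blkdev.h>

/* Allow discard requests of up to 8 MiB (in 512-byte sectors). */
static void example_set_discard_limit(struct request_queue *q)
{
	blk_queue_max_discard_sectors(q, (8 * 1024 * 1024) >> 9);
}
```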
-
Christoph Hellwig authored
prepare_discard_fn() was being called in a place where memory allocation was effectively impossible. This makes it inappropriate for all but the most trivial translations of Linux's DISCARD operation to the block command set. Additionally adding a payload there makes the ownership of the bio backing unclear as it's now allocated by the device driver and not the submitter as usual. It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether the queue supports discard operations or not. blkdev_issue_discard now allocates a one-page, sector-length payload which is the right thing for the common ATA and SCSI implementations. The mtd implementation of prepare_discard_fn() is replaced with simply checking for the request being a discard. Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx> which did the prepare_discard_fn but not the different payload allocation yet. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
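Driver-side sketch: signal discard support with the new queue flag rather than providing a prepare_discard_fn hook. The helper name is illustrative.

```c
#include <linux/blkdev.h>

/* Mark the queue as supporting discard operations. */
static void example_enable_discard(struct request_queue *q)
{
	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
}
```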
-
Christoph Hellwig authored
Currently we set the bio size to the byte equivalent of the blocks to be trimmed when submitting the initial DISCARD ioctl. That means it is subject to the max_hw_sectors limitation of the HBA, which is much lower than the size of a DISCARD request we can support. Add a separate max_discard_sectors tunable to limit the size of discard requests. We limit the max discard request size in bytes to 32 bits, as that is the limit for bio->bi_size. This could be much larger if we had a way to pass that information through the block layer. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Christoph Hellwig authored
prepare_discard_fn() was being called in a place where memory allocation was effectively impossible. This makes it inappropriate for all but the most trivial translations of Linux's DISCARD operation to the block command set. Additionally adding a payload there makes the ownership of the bio backing unclear as it's now allocated by the device driver and not the submitter as usual. It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether the queue supports discard operations or not. blkdev_issue_discard now allocates a one-page, sector-length payload which is the right thing for the common ATA and SCSI implementations. The mtd implementation of prepare_discard_fn() is replaced with simply checking for the request being a discard. Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx> which did the prepare_discard_fn but not the different payload allocation yet. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-