- 17 Sep, 2015 1 commit
-
-
Ming Lei authored
When bio bounce is involved, one new bio and its biovecs are cloned from the comming bio, which can be one fast-cloned bio from upper layer(such as dm). So it is obviously wrong to assume the start index of the coming( original) bio's io vector is zero, which can be any value between 0 and (bi_max_vecs - 1), especially in case of bio split. This patch fixes Fedora's booting oops on i386, often with the following kernel log together: > [ 9.026738] systemd[1]: Switching root. > [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 > (systemd). > [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac > [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 > index:0x16a > [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk) > [ 9.087284] page dumped because: page still charged to cgroup > [ 9.088772] bad because of flags: > [ 9.089731] flags: 0x21(locked|lru) > [ 9.090818] page->mem_cgroup:f2c3e400 Reported-by:
Josh Boyer <jwboyer@fedoraproject.org> Tested-by:
Adam Williamson <awilliam@redhat.com> Cc: Ming Lin <mlin@kernel.org> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Ming Lei <ming.lei@canonical.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- 29 Jul, 2015 2 commits
-
-
Jens Axboe authored
Some places use helpers now, others don't. We only have the 'is set' helper, add helpers for setting and clearing flags too. It was a bit of a mess of atomic vs non-atomic access. With BIO_UPTODATE gone, we don't have any risk of concurrent access to the flags. So relax the restriction and don't make any of them atomic. The flags that do have serialization issues (reffed and chained), we already handle those separately. Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
NeilBrown <neilb@suse.com> Signed-off-b...
-
- 23 Jul, 2015 1 commit
-
-
Jan Kara authored
JBD layer wrote back data buffers without setting PageWriteback bit. Thus standard mechanism for guaranteeing stable pages under IO did not work. Since JBD is gone now and there is no other user of the functionality, just remove it. Acked-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Jan Kara <jack@suse.cz>
-
- 02 Jun, 2015 1 commit
-
-
Tejun Heo authored
With the planned cgroup writeback support, backing-dev related declarations will be more widely used across block and cgroup; unfortunately, including backing-dev.h from include/linux/blkdev.h makes cyclic include dependency quite likely. This patch separates out backing-dev-defs.h which only has the essential definitions and updates blkdev.h to include it. c files which need access to more backing-dev details now include backing-dev.h directly. This takes backing-dev.h off the common include dependency chain making it a lot easier to use it across block and cgroup. v2: fs/fat build failure fixed. Signed-off-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- 19 May, 2015 1 commit
-
-
Christoph Hellwig authored
Since the big barrier rewrite/removal in 2007 we never fail FLUSH or FUA requests, which means we can remove the magic BIO_EOPNOTSUPP flag to help propagating those to the buffer_head layer. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jeff Moyer <jmoyer@redhat.com> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- 27 Apr, 2015 1 commit
-
-
Wang YanQing authored
Commit d2c5e30c ("[PATCH] zoned vm counters: conversion of nr_bounce to per zone counter") convert statistic of nr_bounce to per zone and one global value in vm_stat, but it call inc_|dec_zone_page_state on different pages, then different zones, and cause us to get unexpected value of NR_BOUNCE. Below is the result on my machine: Mar 2 09:26:08 udknight kernel: [144766.778265] Mem-Info: Mar 2 09:26:08 udknight kernel: [144766.778266] DMA per-cpu: Mar 2 09:26:08 udknight kernel: [144766.778268] CPU 0: hi: 0, btch: 1 usd: 0 Mar 2 09:26:08 udknight kernel: [144766.778269] CPU 1: hi: 0, btch: 1 usd: 0 Mar 2 09:26:08 udknight kernel: [144766.778270] Normal per-cpu: Mar 2 09:26:08 udknight kernel: [144766.778271] CPU 0: hi: 186, btch: 31 usd: 0 Mar 2 09:26:08 udknight kernel: [144766.778273] CPU 1: hi: 186, btch: 31 usd: 0 Mar 2 09:26:08 udknight kernel: [144766.778274] HighMem per-cpu: Mar 2 09:26:08 udknight kernel: [144766.778275] CPU 0: hi: 186, btch: 31 usd: 0 Mar 2 09:26:08 udknight kernel: [144766.778276] CPU 1: hi: 186, btch: 31 usd: 0 Mar 2 09:26:08 udknight kernel: [144766.778279] active_anon:46926 inactive_anon:287406 isolated_anon:0 Mar 2 09:26:08 udknight kernel: [144766.778279] active_file:105085 inactive_file:139432 isolated_file:0 Mar 2 09:26:08 udknight kernel: [144766.778279] unevictable:653 dirty:0 writeback:0 unstable:0 Mar 2 09:26:08 udknight kernel: [144766.778279] free:178957 slab_reclaimable:6419 slab_unreclaimable:9966 Mar 2 09:26:08 udknight kernel: [144766.778279] mapped:4426 shmem:305277 pagetables:784 bounce:0 Mar 2 09:26:08 udknight kernel: [144766.778279] free_cma:0 Mar 2 09:26:08 udknight kernel: [144766.778286] DMA free:3324kB min:68kB low:84kB high:100kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes Mar 2 09:26:08 udknight kernel: [144766.778287] lowmem_reserve[]: 0 822 3754 3754 Mar 2 09:26:08 udknight kernel: [144766.778293] Normal free:26828kB min:3632kB low:4540kB high:5448kB active_anon:4872kB inactive_anon:68kB active_file:1796kB inactive_file:1796kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:892920kB managed:842560kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:4144kB slab_reclaimable:25676kB slab_unreclaimable:39864kB kernel_stack:1944kB pagetables:3136kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2412612 all_unreclaimable? yes Mar 2 09:26:08 udknight kernel: [144766.778294] lowmem_reserve[]: 0 0 23451 23451 Mar 2 09:26:08 udknight kernel: [144766.778299] HighMem free:685676kB min:512kB low:3748kB high:6984kB active_anon:182832kB inactive_anon:1149556kB active_file:418544kB inactive_file:555932kB unevictable:2612kB isolated(anon):0kB isolated(file):0kB present:3001732kB managed:3001732kB mlocked:0kB dirty:0kB writeback:0kB mapped:17704kB shmem:1216964kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:75771152kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Mar 2 09:26:08 udknight kernel: [144766.778300] lowmem_reserve[]: 0 0 0 0 You can see bounce:75771152kB for HighMem, but bounce:0 for lowmem and global. This patch fix it. Signed-off-by:
Wang YanQing <udknight@gmail.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- 06 Jun, 2014 1 commit
-
-
Mitchel Humpherys authored
printk is meant to be used with an associated log level. There are some instances of printk scattered around the mm code where the log level is missing. Add a log level and adhere to suggestions by scripts/checkpatch.pl by moving to the pr_* macros. Also add the typical pr_fmt definition so that print statements can be easily traced back to the modules where they occur, correlated one with another, etc. This will require the removal of some (now redundant) prefixes on a few print statements. Signed-off-by:
Mitchel Humpherys <mitchelh@codeaurora.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 20 May, 2014 1 commit
-
-
Jens Axboe authored
Continue moving some of the block files that are scattered around. bounce.c contains only code for bouncing the contents of a bio. It's block proper code, not mm code. Suggested-by:
Ming Lei <tom.leiming@gmail.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- 24 Nov, 2013 1 commit
-
-
Kent Overstreet authored
More prep work for immutable biovecs - with immutable bvecs drivers won't be able to use the biovec directly, they'll need to use helpers that take into account bio->bi_iter.bi_bvec_done. This updates callers for the new usage without changing the implementation yet. Signed-off-by:
Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Paul Clements <Paul.Clements@steeleye.com> Cc: Jim Paris <jim@jtan.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com> Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com> Cc: support@lsi.com Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Cc: linux-m68k@lists.linux-m68k.org Cc: linuxppc-dev@lists.ozlabs.org Cc: drbd-user@lists.linbit.com Cc: nbd-general@lists.sourceforge.net Cc: cbe-oss-dev@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Cc: virtualization@lists.linux-foundation.org Cc: linux-raid@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: DL-MPTFusionLinux@lsi.com Cc: linux-scsi@vger.kernel.org Cc: devel@driverdev.osuosl.org Cc: linux-fsdevel@vger.kernel.org Cc: cluster-devel@redhat.com Cc: linux-mm@kvack.org Acked-by:
Geoff Levand <geoff@infradead.org>
-
- 30 Sep, 2013 1 commit
-
-
Darrick J. Wong authored
The "force" parameter in __blk_queue_bounce was being ignored, which means that stable page snapshots are not always happening (on ext3). This of course leads to DIF disks reporting checksum errors, so fix this regression. The regression was introduced in commit 6bc454d1 ("bounce: Refactor __blk_queue_bounce to not use bi_io_vec") Reported-by:
Mel Gorman <mgorman@suse.de> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: <stable@vger.kernel.org> [3.10+] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 29 Apr, 2013 1 commit
-
-
Darrick J. Wong authored
Walking a bio's page mappings has proved problematic, so create a new bio flag to indicate that a bio's data needs to be snapshotted in order to guarantee stable pages during writeback. Next, for the one user (ext3/jbd) of snapshotting, hook all the places where writes can be initiated without PG_writeback set, and set BIO_SNAP_STABLE there. We must also flag journal "metadata" bios for stable writeout, since file data can be written through the journal. Finally, the MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get rid of it. [akpm@linux-foundation.org: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration] [akpm@linux-foundation.org: teeny cleanup] Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Reviewed-by:
Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 23 Mar, 2013 3 commits
-
-
Kent Overstreet authored
More prep work for immutable bvecs: A few places in the code were either open coding or using the wrong version - fix. After we introduce the bvec iter, it'll no longer be possible to modify the biovec through bio_for_each_segment_all() - it doesn't increment a pointer to the current bvec, you pass in a struct bio_vec (not a pointer) which is updated with what the current biovec would be (taking into account bi_bvec_done and bi_size). So because of that it's more worthwhile to be consistent about bio_for_each_segment()/bio_for_each_segment_all() usage. Signed-off-by:
Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Alexander Viro <viro@zeniv.linux.org.uk>
-
Kent Overstreet authored
__bio_for_each_segment() iterates bvecs from the specified index instead of bio->bv_idx. Currently, the only usage is to walk all the bvecs after the bio has been advanced by specifying 0 index. For immutable bvecs, we need to split these apart; bio_for_each_segment() is going to have a different implementation. This will also help document the intent of code that's using it - bio_for_each_segment_all() is only legal to use for code that owns the bio. Signed-off-by:
Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Neil Brown <neilb@suse.de> CC: Boaz Harrosh <bharrosh@panasas.com>
-
Kent Overstreet authored
A bunch of what __blk_queue_bounce() was doing was problematic for the immutable bvec work; this cleans that up and the code is quite a bit smaller, too. The __bio_for_each_segment() in copy_to_high_bio_irq() was changed because that one's looping over the original bio, not the bounce bio - a later patch renames __bio_for_each_segment() -> bio_for_each_segment_all(), and documents that bio_for_each_segment_all() is only for code that owns the bio. Signed-off-by:
Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk>
-
- 22 Feb, 2013 1 commit
-
-
Darrick J. Wong authored
This provides a band-aid to provide stable page writes on jbd without needing to backport the fixed locking and page writeback bit handling schemes of jbd2. The band-aid works by using bounce buffers to snapshot page contents instead of waiting. For those wondering about the ext3 bandage -- fixing the jbd locking (which was done as part of ext4dev years ago) is a lot of surgery, and setting PG_writeback on data pages when we actually hold the page lock dropped ext3 performance by nearly an order of magnitude. If we're going to migrate iscsi and raid to use stable page writes, the complaints about high latency will likely return. We might as well centralize their page snapshotting thing to one place. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Tested-by:
Andy Lutomirski <luto@amacapital.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Reviewed-by:
Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 18 Jul, 2012 1 commit
-
-
Chris Metcalf authored
The tilegx USB OHCI support needs the bounce pool since we're not using the IOMMU to handle 32-bit addresses. Signed-off-by:
Chris Metcalf <cmetcalf@tilera.com>
-
- 20 Mar, 2012 1 commit
-
-
Cong Wang authored
Signed-off-by:
Cong Wang <amwang@redhat.com>
-
- 31 Oct, 2011 1 commit
-
-
Paul Gortmaker authored
The files changed within are only using the EXPORT_SYMBOL macro variants. They are not using core modular infrastructure and hence don't need module.h but only the export.h header. Signed-off-by:
Paul Gortmaker <paul.gortmaker@windriver.com>
-
- 20 Oct, 2011 1 commit
-
-
David Vrabel authored
init_emergency_pool() does not create the page pool for bouncing block requests if the current count of high pages is zero. If high memory may be added later (either via memory hotplug or a balloon driver in a virtualized system) then a oops occurs if a request with a high page need bouncing because the pool does not exist. So, always create the pool if memory hotplug is enabled and change the test so it's valid even if all high pages are currently in the balloon (the balloon drivers adjust totalhigh_pages but not max_pfn). Signed-off-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 10 Sep, 2010 1 commit
-
-
Gary King authored
I have been seeing problems on Tegra 2 (ARMv7 SMP) systems with HIGHMEM enabled on 2.6.35 (plus some patches targetted at 2.6.36 to perform cache maintenance lazily), and the root cause appears to be that the mm bouncing code is calling flush_dcache_page before it copies the bounce buffer into the bio. The bounced page needs to be flushed after data is copied into it, to ensure that architecture implementations can synchronize instruction and data caches if necessary. Signed-off-by:
Gary King <gking@nvidia.com> Cc: Tejun Heo <tj@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Acked-by:
Jens Axboe <axboe@kernel.dk> Cc: <stable@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 30 Mar, 2010 1 commit
-
-
Tejun Heo authored
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include bloc...
-
- 16 Jun, 2009 1 commit
-
-
Li Zefan authored
When porting blktrace to tracepoints, we changed to trace/block.h for trace prober declarations. Signed-off-by:
Li Zefan <lizf@cn.fujitsu.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 09 Jun, 2009 1 commit
-
-
Li Zefan authored
TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions ... Cons: - no dev_t info for the output of plug, unplug_timer and unplug_io events. no dev_t info for getrq and sleeprq events if bio == NULL. no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL. This is mainly because we can't get the deivce from a request queue. But this may change in the future. - A packet command is converted to a string in TP_assign, not TP_print. While blktrace do the convertion just before output. Since pc requests should be rather rare, this is not a big issue. - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT has a unique format, which means we have some unused data in a trace entry. The overhead is minimized by using __dynamic_array() instead of __array(). I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing: dd dd + ioctl blktrace dd + TRACE_EVENT (splice) 1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s 2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s 3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s So the overhead of tracing is very small, and no regression when using those trace events vs blktrace. And the binary output of TRACE_EVENT is much smaller than blktrace: # ls -l -h -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out Following are some comparisons between TRACE_EVENT and blktrace: plug: kjournald-480 [000] 303.084981: block_plug: [kjournald] kjournald-480 [000] 303.084981: 8,0 P N [kjournald] unplug_io: kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1 kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1 remap: kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384 kjournald-480 [000] 303.085043: 8,0 A W 102736992 + 8 <- (8,8) 33384 bio_backmerge: kjournald-480 [000] 303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald] kjournald-480 [000] 303.085086: 8,0 M W 102737032 + 8 [kjournald] getrq: kjournald-480 [000] 303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald] kjournald-480 [000] 303.084975: 8,0 G W 102736984 + 8 [kjournald] bash-2066 [001] 1072.953770: 8,0 G N [bash] bash-2066 [001] 1072.953773: block_getrq: 0,0 N 0 + 0 [bash] rq_complete: konsole-2065 [001] 300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0] konsole-2065 [001] 300.053191: 8,0 C W 103669040 + 16 [0] ksoftirqd/1-7 [001] 1072.953811: 8,0 C N (5a 00 08 00 00 00 00 00 24 00) [0] ksoftirqd/1-7 [001] 1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0] rq_insert: kjournald-480 [000] 303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald] kjournald-480 [000] 303.084986: 8,0 I W 102736984 + 8 [kjournald] Changelog from v2 -> v3: - use the newly introduced __dynamic_array(). Changelog from v1 -> v2: - use __string() instead of __array() to minimize the memory required to store hex dump of rq->cmd(). - support large pc requests. - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT. - some cleanups. Signed-off-by:
Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
- 22 May, 2009 1 commit
-
-
Martin K. Petersen authored
Convert all external users of queue limits to using wrapper functions instead of poking the request queue variables directly. Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 29 Dec, 2008 1 commit
-
-
Jens Axboe authored
__blk_queue_bounce() relies on a zeroed bio_vec list, since it looks up arbitrary indexes in the allocated bio. The block layer only guarentees that added entries are valid, so clear memory after alloc. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 26 Nov, 2008 2 commits
-
-
Ingo Molnar authored
Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE() sites. Spread them out to the usage sites, as suggested by Mathieu Desnoyers. Signed-off-by:
Ingo Molnar <mingo@elte.hu> Acked-by:
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
-
Arnaldo Carvalho de Melo authored
This was a forward port of work done by Mathieu Desnoyers, I changed it to encode the 'what' parameter on the tracepoint name, so that one can register interest in specific events and not on classes of events to then check the 'what' parameter. Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu>
-
- 09 Oct, 2008 1 commit
-
-
Jens Axboe authored
Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 16 Oct, 2007 1 commit
-
-
Jens Axboe authored
This implements functionality to pass down or insert a barrier in a queue, without having data attached to it. The ->prepare_flush_fn() infrastructure from data barriers are reused to provide this functionality. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 10 Oct, 2007 1 commit
-
-
NeilBrown authored
As bi_end_io is only called once when the reqeust is complete, the 'size' argument is now redundant. Remove it. Now there is no need for bio_endio to subtract the size completed from bi_size. So don't do that either. While we are at it, change bi_end_io to return void. Signed-off-by:
Neil Brown <neilb@suse.de> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 24 Jul, 2007 1 commit
-
-
Jens Axboe authored
Some of the code has been gradually transitioned to using the proper struct request_queue, but there's lots left. So do a full sweet of the kernel and get rid of this typedef and replace its uses with the proper type. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 27 Mar, 2007 1 commit
-
-
Vasily Tarasov authored
There is a small problem in handling page bounce. At the moment blk_max_pfn equals max_pfn, which is in fact not maximum possible _number_ of a page frame, but the _amount_ of page frames. For example for the 32bit x86 node with 4Gb RAM, max_pfn = 0x100000, but not 0xFFFF. request_queue structure has a member q->bounce_pfn and queue needs bounce pages for the pages _above_ this limit. This routine is handled by blk_queue_bounce(), where the following check is produced: if (q->bounce_pfn >= blk_max_pfn) return; Assume, that a driver has set q->bounce_pfn to 0xFFFF, but blk_max_pfn equals 0x10000. In such situation the check above fails and for each bio we always fall down for iterating over pages tied to the bio. I want to notice, that for quite a big range of device drivers (ide, md, ...) such problem doesn't happen because they use BLK_BOUNCE_ANY for bounce_pfn. BLK_BOUNCE_ANY is defined as blk_max_pfn << PAGE_SHIFT, and then the check above doesn't fail. But for other drivers, which obtain reuired value from drivers, it fails. For example sata_nv uses ATA_DMA_MASK or dev->dma_mask. I propose to use (max_pfn - 1) for blk_max_pfn. And the same for blk_max_low_pfn. The patch also cleanses some checks related with bounce_pfn. Signed-off-by:
Vasily Tarasov <vtaras@openvz.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 12 Jan, 2007 1 commit
-
-
Jens Axboe authored
Currently we issue a bounce trace when __blk_queue_bounce() is called, but that merely means that the device has a lower dma mask than the higher pages in the system. The bio itself may still be lower pages. So move the bounce trace into __blk_queue_bounce(), when we know there will actually be page bouncing. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- 30 Sep, 2006 1 commit
-
-
David Howells authored
Move the bounce buffer code from mm/highmem.c to mm/bounce.c so that it can be more easily disabled when the block layer is disabled. !!!NOTE!!! There may be a bug in this code: Should init_emergency_pool() be contingent on CONFIG_HIGHMEM? Signed-Off-By:
David Howells <dhowells@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-