- 29 Jan, 2022 4 commits
-
-
Paolo Bonzini authored
commit 0780516a upstream. This fixes the new ept_access_test_read_only and ept_access_test_read_write testcases from vmx.flat. The problem is that gpte_access moves bits around to switch from EPT bit order (XWR) to ACC_*_MASK bit order (RWX). This results in an incorrect exit qualification. To fix this, make pt_access and pte_access operate on raw PTE values (only with NX flipped to mean "can execute") and call gpte_access at the end of the walk. This lets us use pte_access to compute the exit qualification with XWR bit order. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Xiao Guangrong <xiaoguangrong@tencent.com> Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com> [bwh: Backported to 4.9: - There's no support for EPT accessed/dirty bits, so do not use have_ad flag - Adjust context] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Trond Myklebust authored
commit dd99e9f9 upstream. Set up the connection to the NFSv4 server in nfs4_alloc_client(), before we've added the struct nfs_client to the net-namespace's nfs_client_list so that a downed server won't cause other mounts to hang in the trunking detection code. Reported-by:
Michael Wakabayashi <mwakabayashi@vmware.com> Fixes: 5c6e5b60 ("NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS") Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com> [bwh: Backported to 4.9: adjust context] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
commit 35d2969e upstream. The bounds checking in avc_ca_pmt() is not strict enough. It should be checking "read_pos + 4" because it's reading 5 bytes. If the "es_info_length" is non-zero then it reads a 6th byte so there needs to be an additional check for that. I also added checks for the "write_pos". I don't think these are required because "read_pos" and "write_pos" are tied together so checking one ought to be enough. But they make the code easier to understand for me. The check on write_pos is: if (write_pos + 4 >= sizeof(c->operand) - 4) { The first "+ 4" is because we're writing 5 bytes and the last " - 4" is to leave space for the CRC. The other problem is that "length" can be invalid. It comes from "data_length" in fdtv_ca_pmt(). Cc: stable@vger.kernel.org Reported-by:
Luo Likang <luolikang@nsfocus.com> Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> [bwh: Backported to 4.9: adjust context] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
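To make the described checks concrete, here is a hedged sketch of the bounds-checking pattern, not the actual avc_ca_pmt() code; the function, buffer and parameter names are illustrative:

    /*
     * Sketch of the bounds-checking pattern described above -- not the
     * firedtv driver code.  "operand" stands in for the fixed-size
     * c->operand buffer, "len" for the caller-supplied data_length.
     */
    static int copy_pmt_entries(u8 *operand, size_t operand_size,
                                const u8 *msg, int len)
    {
            int read_pos = 0, write_pos = 0;

            while (read_pos < len) {
                    /* we are about to read 5 bytes starting at read_pos */
                    if (read_pos + 4 >= len)
                            return -EINVAL;
                    /* we write 5 bytes; the final "- 4" leaves room for the CRC */
                    if (write_pos + 4 >= operand_size - 4)
                            return -EINVAL;

                    memcpy(&operand[write_pos], &msg[read_pos], 5);
                    read_pos += 5;
                    write_pos += 5;
            }
            return write_pos;
    }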
-
Tvrtko Ursulin authored
commit 7938d615 upstream. We need to flush TLBs before releasing backing store, otherwise userspace is able to encounter stale entries if a) it is not declaring access to certain buffers and b) it races with the backing store release from such an undeclared execution already executing on the GPU in parallel. The approach taken is to mark any buffer objects which were ever bound to the GPU and to trigger a serialized TLB flush when their backing store is released. Alternatively the flushing could be done on VMA unbind, at which point we would be able to ascertain whether there is potentially a parallel GPU execution (which could race), but essentially it boils down to paying the cost of TLB flushes potentially needlessly at VMA unbind time (when the backing store is not known to be going away so not needed for safety), versus potentially needlessly at backing store release time (since we at that point cannot tell whether there is anything executing on the GPU which uses that object). Therefore simplicity of implementation has been chosen for now, with scope to benchmark and refine later as required. Signed-off-by:
Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reported-by:
Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by:
Dave Airlie <airlied@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: stable@vger.kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 27 Jan, 2022 36 commits
-
-
Greg Kroah-Hartman authored
Link: https://lore.kernel.org/r/20220124183932.787526760@linuxfoundation.org Tested-by:
Shuah Khan <skhan@linuxfoundation.org> Tested-by:
Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220125155253.051565866@linuxfoundation.org Tested-by:
Florian Fainelli <f.fainelli@gmail.com> Tested-by:
Jon Hunter <jonathanh@nvidia.com> Tested-by:
Shuah Khan <skhan@linuxfoundation.org> Tested-by:
Linux Kernel Functional Testing <lkft@linaro.org> Tested-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicholas Piggin authored
commit f8be156b upstream. It's possible to create a region which maps valid but non-refcounted pages (e.g., tail pages of non-compound higher order allocations). These host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc., family of APIs, which take a reference to the page, raising its refcount from 0 to 1. When the reference is dropped, this will free the page incorrectly. Fix this by only taking a reference on valid pages if their refcount is already non-zero, which indicates the page is participating in normal refcounting (and can be released with put_page). This addresses CVE-2021-22543. Signed-off-by:
Nicholas Piggin <npiggin@gmail.com> Tested-by:
Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
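A minimal sketch of the idea, assuming only get_page_unless_zero() from linux/mm.h; the helper name is illustrative, not the exact upstream change:

    /*
     * Sketch only: refuse to pin pages that are not participating in
     * normal refcounting (refcount == 0), so a later put_page() can
     * never free a page that was never really ours to reference.
     */
    static bool example_try_get_page(struct page *page)
    {
            return get_page_unless_zero(page);
    }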
-
Sean Christopherson authored
commit a9545779 upstream. Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving a so-called "remapped" hva/pfn pair. In theory, the hva could resolve to a pfn in high memory on a 32-bit kernel. This bug was inadvertently exposed by commit bd2fae8d ("KVM: do not assume PTE is writable after follow_pfn"), which added an error PFN value to the mix, causing gcc to complain about overflowing the unsigned long:

  arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
  include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’ to ‘long unsigned int’ changes value from ‘9218868437227405314’ to ‘2’ [-Werror=overflow]
     89 | #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
        |                              ^
  virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’

Cc: stable@vger.kernel.org Fixes: add6a0cd ("KVM: MMU: try to fix up page faults before giving up") Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20210208201940.1258328-1-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Bonzini authored
commit bd2fae8d upstream. In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functions. This however is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn. In doing this however KVM loses the information on whether the PFN is writable. That is usually not a problem because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment, however it is a bug. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used to fail hva_to_pfn_remapped or passed back to the caller via *writable. Usage of follow_pfn was introduced in commit add6a0cd ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older versions have the same issue, all the way back to commit 2e2e3738 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable. Fixes: 2e2e3738 ("KVM: Handle vma regions with no backing page") Reported-by:
David Stevens <stevensd@google.com> Cc: 3pvd@google.com Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: stable@vger.kernel.org Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> [OP: backport to 4.19, adjust follow_pte() -> follow_pte_pmd()] Signed-off-by:
Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backport to 4.9: follow_pte_pmd() does not take start or end parameters] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
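A hedged sketch of the pattern described above; the helper and variable names are illustrative, and the 4.9 backport uses follow_pte_pmd() with a slightly different signature:

    /*
     * Sketch only: resolve an HVA in a VM_IO/VM_PFNMAP vma to a PFN and
     * record writability while the PTE lock is held.
     */
    static int resolve_remapped_pfn(struct mm_struct *mm, unsigned long addr,
                                    bool *writable, kvm_pfn_t *pfn)
    {
            pte_t *ptep;
            spinlock_t *ptl;
            int r;

            r = follow_pte(mm, addr, &ptep, &ptl);  /* follow_pte_pmd() on 4.9 */
            if (r)
                    return r;

            if (writable)
                    *writable = pte_write(*ptep);   /* checked under the PTE lock */
            *pfn = pte_pfn(*ptep);

            pte_unmap_unlock(ptep, ptl);
            return 0;
    }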
-
Ross Zwisler authored
commit 09796395 upstream. Patch series "Write protect DAX PMDs in *sync path". Currently dax_mapping_entry_mkclean() fails to clean and write protect the pmd_t of a DAX PMD entry during an *sync operation. This can result in data loss, as detailed in patch 2. This series is based on Dan's "libnvdimm-pending" branch, which is the current home for Jan's "dax: Page invalidation fixes" series. You can find a working tree here: https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_clean This patch (of 2): Similar to follow_pte(), follow_pte_pmd() allows either a PTE leaf or a huge page PMD leaf to be found and returned. Link: http://lkml.kernel.org/r/1482272586-21177-2-git-send-email-ross.zwisler@linux.intel.com Signed-off-by:
Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by:
Dave Hansen <dave.hansen@intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 4.9: adjust context] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Davidlohr Bueso authored
commit 511885d7 upstream. Simplify the timerqueue code by using cached rbtrees and rely on the tree leftmost node semantics to get the timer with the earliest expiration time. This is a drop-in conversion, and therefore semantics remain untouched. The runtime overhead of cached rbtrees is pretty much the same as the current head->next method, noting that when removing the leftmost node, a common operation for the timerqueue, the rb_next(leftmost) is O(1) as well, so the next timer will either be the right node or its parent. Therefore no extra pointer chasing. Finally, the size of the struct timerqueue_head remains the same. Passes several hours of rcutorture. Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190724152323.bojciei3muvfxalm@linux-r8p5 [bwh: While this was supposed to be just refactoring, it also fixed a security flaw (CVE-2021-20317). Backported to 4.9: - Deleted code in timerqueue_del() is different before commit d852d394 "timerqueue: Use rb_entry_safe() instead of open-coding it" - Adjust context] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Davidlohr Bueso authored
commit cd9e61ed upstream. Patch series "rbtree: Cache leftmost node internally", v4. A series extending rbtrees to internally cache the leftmost node such that we can have fast overlap check optimization for all interval tree users[1]. The benefits of this series are:

(i) Unify users that do internal leftmost node caching.
(ii) Optimize all interval tree users.
(iii) Convert at least two new users (epoll and procfs) to the new interface.

This patch (of 16): Red-black tree semantics imply that nodes with smaller or greater (or equal for duplicates) keys are always to the left and right, respectively. For the kernel this is extremely evident when considering our rb_first() semantics. Enabling lookups for the smallest node in the tree in O(1) can save a good chunk of cycles in not having to walk down the tree each time. To this end there are a few core users that explicitly do this, such as the scheduler and rtmutexes. There is also the desire for interval trees to have this optimization allowing faster overlap checking. This patch introduces a new 'struct rb_root_cached' which is just the root with a cached pointer to the leftmost node. The reason why the regular rb_root was not extended instead of adding a new structure was that this allows the user to have the choice between memory footprint and actual tree performance. The new wrappers on top of the regular rb_root calls are:

- rb_first_cached(cached_root) -- which is a fast replacement for rb_first.
- rb_insert_color_cached(node, cached_root, new)
- rb_erase_cached(node, cached_root)

In addition, augmented cached interfaces are also added for basic insertion and deletion operations, which becomes important for the interval tree changes. With the exception of the inserts, which add a bool for updating the new leftmost, the interfaces are kept the same. To this end, porting rb users to the cached version becomes really trivial, and keeping current rbtree semantics for users that don't care about the optimization requires zero overhead. Link: http://lkml.kernel.org/r/20170719014603.19029-2-dave@stgolabs.net Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Reviewed-by:
Jan Kara <jack@suse.cz> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
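A short usage sketch of the cached-rbtree interface introduced here (illustrative user code; the item structure and key type are assumptions):

    /* Illustrative user of the cached rbtree API described above. */
    struct item {
            struct rb_node node;
            u64 key;
    };

    static struct rb_root_cached my_tree = RB_ROOT_CACHED;

    static void item_insert(struct item *new)
    {
            struct rb_node **link = &my_tree.rb_root.rb_node, *parent = NULL;
            bool leftmost = true;

            while (*link) {
                    struct item *cur = rb_entry(*link, struct item, node);

                    parent = *link;
                    if (new->key < cur->key) {
                            link = &parent->rb_left;
                    } else {
                            link = &parent->rb_right;
                            leftmost = false;       /* not the new smallest key */
                    }
            }

            rb_link_node(&new->node, parent, link);
            rb_insert_color_cached(&new->node, &my_tree, leftmost);
    }

    static struct item *item_first(void)
    {
            struct rb_node *n = rb_first_cached(&my_tree);  /* O(1) leftmost */

            return n ? rb_entry(n, struct item, node) : NULL;
    }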
-
Paul Moore authored
commit ad5d07f4 upstream. The current CIPSO and CALIPSO refcounting scheme for the DOI definitions is a bit flawed in that we:

1. Don't correctly match gets/puts in netlbl_cipsov4_list().

2. Decrement the refcount on each attempt to remove the DOI from the DOI list, only removing it from the list once the refcount drops to zero.

This patch fixes these problems by adding the missing "puts" to netlbl_cipsov4_list() and introduces a more conventional, i.e. not-buggy, refcounting mechanism to the DOI definitions. Upon the addition of a DOI to the DOI list, it is initialized with a refcount of one, removing a DOI from the list removes it from the list and drops the refcount by one; "gets" and "puts" behave as expected with respect to refcounts, increasing and decreasing the DOI's refcount by one. Fixes: b1edeb10 ("netlabel: Replace protocol/NetLabel linking with refrerence counts") Fixes: d7cce015 ("netlabel: Add support for removing a CALIPSO DOI.") Reported-by: syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com Signed-off-by:
Paul Moore <paul@paul-moore.com> Signed-off-by:
David S. Miller <davem@davemloft.net> [bwh: Backported to 4.9: adjust context] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Braun authored
commit d8861bab upstream. When using jumbo packets and overrunning the rx queue with napi enabled, the following sequence is observed in gfar_add_rx_frag:

   | lstatus                              |       | skb                   |
 t | lstatus, size, flags                 | first | len, data_len, *ptr   |
---+--------------------------------------+-------+-----------------------+
13 | 18002348, 9032, INTERRUPT LAST       | 0     | 9600, 8000, f554c12e  |
12 | 10000640, 1600, INTERRUPT            | 0     | 8000, 6400, f554c12e  |
11 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800, f554c12e  |
10 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200, f554c12e  |
09 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600, f554c12e  |
08 | 14000640, 1600, INTERRUPT FIRST      | 0     | 1600, 0, f554c12e     |
07 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0, 0, f554c12e        |
06 | 1c000080, 128, INTERRUPT LAST FIRST  | 1     | 0, 0, abf3bd6e        |
05 | 18002348, 9032, INTERRUPT LAST       | 0     | 8000, 6400, c5a57780  |
04 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800, c5a57780  |
03 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200, c5a57780  |
02 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600, c5a57780  |
01 | 10000640, 1600, INTERRUPT            | 0     | 1600, 0, c5a57780     |
00 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0, 0, c5a57780        |

So at t=7 a new packet is started but not finished, probably due to rx overrun - but rx overrun is not indicated in the flags. Instead a new packet starts at t=8. This results in skb->len exceeding size for the LAST fragment at t=13 and thus a negative fragment size being added to the skb. This then crashes:

kernel BUG at include/linux/skbuff.h:2277!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP [c04689f4] skb_pull+0x2c/0x48
LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
Call Trace:
[ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
[ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
[ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
[ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
[ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
[ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
[ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
[ec4bff08] [c0062718] kthread+0x140/0x144
[ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c

This patch fixes this by checking the computed LAST fragment size, so a negative-sized fragment is never added. In order to prevent the newer rx frame from getting corrupted, the FIRST flag is checked to discard the incomplete older frame. Signed-off-by:
Michael Braun <michael-dev@fami-braun.de> Signed-off-by:
David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andy Spencer authored
commit d903ec77 upstream. Previously, buffer descriptors containing only the frame check sequence (FCS) were skipped and not added to the skb. However, the page reference count was still incremented, leading to a memory leak. Fixing this inside gfar_add_rx_frag() is difficult due to reserved memory handling and page reuse. Instead, move the FCS handling to gfar_process_frame() and trim off the FCS before passing the skb up the networking stack. Signed-off-by:
Andy Spencer <aspencer@spacex.com> Signed-off-by:
Jim Gruen <jgruen@spacex.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dave Airlie authored
commit 5de5b6ec upstream. This is confusing, and from my reading of all the drivers only nouveau got this right. Just make the API act under driver control of its own allocation failing, and don't call destroy; if the page table fails to create there is nothing to clean up here. (I'm willing to believe I've missed something here, so please review deeply). Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200728041736.20689-1-airlied@gmail.com [bwh: Backported to 4.14: - Drop change in ttm_sg_tt_init() - Adjust context] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit 9bbd42e7 upstream. Doing a "get_user_pages()" on a copy-on-write page for reading can be ambiguous: the page can be COW'ed at any time afterwards, and the direction of a COW event isn't defined. Yes, whoever writes to it will generally do the COW, but if the thread that did the get_user_pages() unmapped the page before the write (and that could happen due to memory pressure in addition to any outright action), the writer could also just take over the old page instead. End result: the get_user_pages() call might result in a page pointer that is no longer associated with the original VM, and is associated with - and controlled by - another VM having taken it over instead. So when doing a get_user_pages() on a COW mapping, the only really safe thing to do would be to break the COW when getting the page, even when only getting it for reading. At the same time, some users simply don't even care. For example, the perf code wants to look up the page not because it cares about the page, but because the code simply wants to look up the physical address of the access for informational purposes, and doesn't really care about races when a page might be unmapped and remapped elsewhere. This adds logic to force a COW event by setting FOLL_WRITE on any copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page pointer as a result. The current semantics end up being: - __get_user_pages_fast(): no change. If you don't ask for a write, you won't break COW. You'd better know what you're doing. - get_user_pages_fast(): the fast-case "look it up in the page tables without anything getting mmap_sem" now refuses to follow a read-only page, since it might need COW breaking. Which happens in the slow path - the fast path doesn't know if the memory might be COW or not. - get_user_pages() (including the slow-path fallback for gup_fast()): for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with very similar semantics to FOLL_FORCE. If it turns out that we want finer granularity (ie "only break COW when it might actually matter" - things like the zero page are special and don't need to be broken) we might need to push these semantics deeper into the lookup fault path. So if people care enough, it's possible that we might end up adding a new internal FOLL_BREAK_COW flag to go with the internal FOLL_COW flag we already have for tracking "I had a COW". Alternatively, if it turns out that different callers might want to explicitly control the forced COW break behavior, we might even want to make such a flag visible to the users of get_user_pages() instead of using the above default semantics. But for now, this is mostly commentary on the issue (this commit message being a lot bigger than the patch, and that patch in turn is almost all comments), with that minimal "enable COW breaking early" logic using the existing FOLL_WRITE behavior. [ It might be worth noting that we've always had this ambiguity, and it could arguably be seen as a user-space issue. You only get private COW mappings that could break either way in situations where user space is doing cooperative things (ie fork() before an execve() etc), but it _is_ surprising and very subtle, and fork() is supposed to give you independent address spaces. So let's treat this as a kernel issue and make the semantics of get_user_pages() easier to understand. 
Note that obviously a true shared mapping will still get a page that can change under us, so this does _not_ mean that get_user_pages() somehow returns any "stable" page ] [surenb: backport notes Replaced (gup_flags | FOLL_WRITE) with write=1 in gup_pgd_range. Removed FOLL_PIN usage in should_force_cow_break since it's missing in the earlier kernels.] Reported-by:
Jann Horn <jannh@google.com> Tested-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Oleg Nesterov <oleg@redhat.com> Acked-by:
Kirill Shutemov <kirill@shutemov.name> Acked-by:
Jan Kara <jack@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> [surenb: backport to 4.19 kernel] Cc: stable@vger.kernel.org # 4.19.x Signed-off-by:
Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 4.9: - Generic get_user_pages_fast() calls __get_user_pages_fast() here, so make it pass write=1 - Various architectures have their own implementations of get_user_pages_fast(), so apply the corresponding change there - Adjust context] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
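A hedged sketch of the core idea in the upstream change; the helper name and the explicit COW test shown here are illustrative, and the 4.9 backport spreads the equivalent logic across several per-architecture gup implementations:

    /*
     * Sketch only: treat a FOLL_GET lookup of a private COW mapping as a
     * write, so the COW is broken up front instead of racing later.
     */
    static bool example_should_break_cow(struct vm_area_struct *vma,
                                         unsigned int gup_flags)
    {
            /* private mapping that may be written but is not shared => COW */
            bool is_cow = (vma->vm_flags & (VM_MAYWRITE | VM_SHARED)) == VM_MAYWRITE;

            return is_cow && (gup_flags & FOLL_GET);
    }

    /* in the slow-path walk, before following the page:
     *      if (example_should_break_cow(vma, foll_flags))
     *              foll_flags |= FOLL_WRITE;
     */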
-
Ben Hutchings authored
This reverts commit 9bbd42e7, which was commit 17839856 upstream. The backport was incorrect and incomplete: * It forced the write flag on in the generic __get_user_pages_fast(), whereas only get_user_pages_fast() was supposed to do that. * It only fixed the generic RCU-based implementation used by arm, arm64, and powerpc. Before Linux 4.13, several other architectures had their own implementations: mips, s390, sparc, sh, and x86. This will be followed by a (hopefully) correct backport. Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Cc: Suren Baghdasaryan <surenb@google.com> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Miaoqian Lin authored
commit 99218cbf upstream. platform_get_irq() returns a negative error number instead of 0 on failure, and the documentation of platform_get_irq() provides a usage example:

    int irq = platform_get_irq(pdev, 0);
    if (irq < 0)
            return irq;

Fix the check of the return value to catch errors correctly. Fixes: 11597885 ("i825xx: Move the Intel 82586/82593/82596 based drivers") Signed-off-by:
Miaoqian Lin <linmq006@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Matthias Schiffer authored
commit d8adf5b9 upstream. dtx_diff suggests to use <(...) syntax to pipe two inputs into it, but this has never worked: The /proc/self/fds/... paths passed by the shell will fail the `[ -f "${dtx}" ] && [ -r "${dtx}" ]` check in compile_to_dts, but even with this check removed, the function cannot work: hexdump will eat up the DTB magic, making the subsequent dtc call fail, as a pipe cannot be rewound. Simply remove this broken example, as there is already an alternative one that works fine. Fixes: 10eadc25 ("dtc: create tool to diff device trees") Signed-off-by:
Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Reviewed-by:
Frank Rowand <frank.rowand@sony.com> Signed-off-by:
Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220113081918.10387-1-matthias.schiffer@ew.tq-group.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sergey Shtylyov authored
commit 9deb48b5 upstream. The driver neglects to check the result of platform_get_irq_optional()'s call and blithely passes the negative error codes to devm_request_irq() (which takes *unsigned* IRQ #), causing it to fail with -EINVAL. Stop calling devm_request_irq() with the invalid IRQ #s. Fixes: 8562056f ("net: bcmgenet: request Wake-on-LAN interrupt") Signed-off-by:
Sergey Shtylyov <s.shtylyov@omp.ru> Acked-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kevin Bracey authored
commit fb80445c upstream. commit 56b765b7 ("htb: improved accuracy at high rates") broke "overhead X", "linklayer atm" and "mpu X" attributes. "overhead X" and "linklayer atm" have already been fixed. This restores the "mpu X" handling, as might be used by DOCSIS or Ethernet shaping: tc class add ... htb rate X overhead 4 mpu 64 The code being fixed is used by htb, tbf and act_police. Cake has its own mpu handling. qdisc_calculate_pkt_len still uses the size table containing values adjusted for mpu by user space. iproute2 tc has always passed mpu into the kernel via a tc_ratespec structure, but the kernel never directly acted on it, merely stored it so that it could be read back by `tc class show`. Rather, tc would generate length-to-time tables that included the mpu (and linklayer) in their construction, and the kernel used those tables. Since v3.7, the tables were no longer used. Along with "mpu", this also broke "overhead" and "linklayer" which were fixed in 01cb71d2 ("net_sched: restore "overhead xxx" handling", v3.10) and 8a8e3d84 ("net_sched: restore "linklayer atm" handling", v3.11). "overhead" was fixed by simply restoring use of tc_ratespec::overhead - this had originally been used by the kernel but was initially omitted from the new non-table-based calculations. "linklayer" had been handled in the table like "mpu", but the mode was not originally passed in tc_ratespec. The new implementation was made to handle it by getting new versions of tc to pass the mode in an extended tc_ratespec, and for older versions of tc the table contents were analysed at load time to deduce linklayer. As "mpu" has always been given to the kernel in tc_ratespec, accompanying the mpu-based table, we can restore system functionality with no userspace change by making the kernel act on the tc_ratespec value. Fixes: 56b765b7 ("htb: improved accuracy at high rates") Signed-off-by:
Kevin Bracey <kevin@bracey.fi> Cc: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Vimalkumar <j.vimal@gmail.com> Link: https://lore.kernel.org/r/20220112170210.1014351-1-kevin@bracey.fi Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
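A hedged, simplified sketch of where the restored mpu takes effect in the length-to-time conversion; the real helper in include/net/sch_generic.h also handles the ATM linklayer case, and the field names follow struct psched_ratecfg, with mpu being the value this fix starts honouring:

    /* Simplified sketch, not the verbatim kernel helper. */
    static u64 example_len_to_ns(const struct psched_ratecfg *r, unsigned int len)
    {
            len += r->overhead;     /* per-packet framing overhead */

            if (len < r->mpu)
                    len = r->mpu;   /* never charge less than the minimum packet unit */

            return ((u64)len * r->mult) >> r->shift;
    }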
-
Tudor Ambarus authored
commit 912f7c6f upstream. The hardware channel next descriptor view structure contains just fields of 32 bits, while dma_addr_t can be of type u64 or u32 depending on CONFIG_ARCH_DMA_ADDR_T_64BIT. Force u32 to comply with what the hardware expects. Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Signed-off-by:
Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/20211215110115.191749-11-tudor.ambarus@microchip.com Signed-off-by:
Vinod Koul <vkoul@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tudor Ambarus authored
commit 1385eb4d upstream. AT_XDMAC_CNDC_NDVIEW_NDV3 was set even for AT_XDMAC_MBR_UBC_NDV2, because of the wrong bit handling. Fix it. Fixes: ee0fe35c ("dmaengine: xdmac: Handle descriptor's view 3 registers") Signed-off-by:
Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/20211215110115.191749-10-tudor.ambarus@microchip.com Signed-off-by:
Vinod Koul <vkoul@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tudor Ambarus authored
commit 5edc24ac upstream. It is desirable to do the prints without the lock held if possible, so move the print after the lock is released. Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Signed-off-by:
Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/20211215110115.191749-4-tudor.ambarus@microchip.com Signed-off-by:
Vinod Koul <vkoul@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tudor Ambarus authored
commit bccfb96b upstream. tx_submit is supposed to push the current transaction descriptor to a pending queue, waiting for issue_pending() to be called. issue_pending() must start the transfer, not tx_submit(), thus remove at_xdmac_start_xfer() from at_xdmac_tx_submit(). Clients of at_xdmac that assume that tx_submit() starts the transfer must be updated and call dma_async_issue_pending() if they miss to call it (one example is atmel_serial). As the at_xdmac_start_xfer() is now called only from at_xdmac_advance_work() when !at_xdmac_chan_is_enabled(), the at_xdmac_chan_is_enabled() check is no longer needed in at_xdmac_start_xfer(), thus remove it. Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Signed-off-by:
Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/20211215110115.191749-2-tudor.ambarus@microchip.com Signed-off-by:
Vinod Koul <vkoul@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
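To illustrate the client-side contract this change enforces, a hedged sketch of generic dmaengine usage (not at_xdmac or atmel_serial code; the scatterlist setup is assumed to exist already):

    /*
     * Generic dmaengine client pattern: submit only queues the
     * descriptor, issue_pending is what actually starts the hardware.
     */
    static int start_dma(struct dma_chan *chan, struct scatterlist *sgl,
                         unsigned int nents)
    {
            struct dma_async_tx_descriptor *desc;
            dma_cookie_t cookie;

            desc = dmaengine_prep_slave_sg(chan, sgl, nents, DMA_MEM_TO_DEV,
                                           DMA_PREP_INTERRUPT);
            if (!desc)
                    return -ENOMEM;

            cookie = dmaengine_submit(desc);        /* only queues the transfer */
            if (dma_submit_error(cookie))
                    return -EIO;

            dma_async_issue_pending(chan);          /* this is what starts it now */
            return 0;
    }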
-
Guillaume Nault authored
commit a915deaa upstream. Mask the ECN bits before calling ip_route_output_ports(). The tos variable might be passed directly from an IPv4 header, so it may have the last ECN bit set. This interferes with the route lookup process as ip_route_output_key_hash() interprets this bit specially (to restrict the route scope). Found by code inspection, compile-tested only. Fixes: 804c2f3e ("libcxgb,iw_cxgb4,cxgbit: add cxgb_find_route()") Signed-off-by:
Guillaume Nault <gnault@redhat.com> Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
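A hedged illustration of the masking described above (simplified; the flow setup around the real cxgb_find_route() call is omitted):

    #include <net/inet_ecn.h>

    /* Sketch: strip the ECN bits so the low tos bits cannot be misread
     * as route-scope information by the route lookup. */
    static u8 example_route_tos(u8 tos)
    {
            return tos & ~INET_ECN_MASK;    /* keep only the DSCP part */
    }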
-
Eric Dumazet authored
commit 2836615a upstream. When under stress, cleanup_net() can have to dismantle netns in big numbers. ops_exit_list() currently calls many helpers [1] that have no schedule point, and we can end up with soft lockups, particularly on hosts with many cpus. Even for moderate amount of netns processed by cleanup_net() this patch avoids latency spikes. [1] Some of these helpers like fib_sync_up() and fib_sync_down_dev() are very slow because net/ipv4/fib_semantics.c uses host-wide hash tables, and ifindex is used as the only input of two hash functions. ifindexes tend to be the same for all netns (lo.ifindex==1 per instance) This will be fixed in a separate patch. Fixes: 72ad937a ("net: Add support for batching network namespace cleanups") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
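A hedged sketch of the pattern the patch applies in net/core/net_namespace.c (close to, but not necessarily a verbatim copy of, the real ops_exit_list()):

    /* Sketch: add a scheduling point between per-netns exit handlers so
     * a large batch of dying namespaces cannot trigger soft lockups. */
    static void ops_exit_list(const struct pernet_operations *ops,
                              struct list_head *net_exit_list)
    {
            struct net *net;

            if (ops->exit) {
                    list_for_each_entry(net, net_exit_list, exit_list) {
                            ops->exit(net);
                            cond_resched();         /* the added schedule point */
                    }
            }
            if (ops->exit_batch)
                    ops->exit_batch(net_exit_list);
    }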
-
Robert Hancock authored
commit aba57a82 upstream. The check for the number of available TX ring slots was off by 1 since a slot is required for the skb header as well as each fragment. This could result in overwriting a TX ring slot that was still in use. Fixes: 8a3b7a25 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by:
Robert Hancock <robert.hancock@calian.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Robert Hancock authored
commit b400c2f4 upstream. When resetting the device, wait for the PhyRstCmplt bit to be set in the interrupt status register before continuing initialization, to ensure that the core is actually ready. When using an external PHY, this also ensures we do not start trying to access the PHY while it is still in reset. The PHY reset is initiated by the core reset which is triggered just above, but remains asserted for 5ms after the core is reset according to the documentation. The MgtRdy bit could also be waited for, but unfortunately when using 7-series devices, the bit does not appear to work as documented (it seems to behave as some sort of link state indication and not just an indication the transceiver is ready) so it can't really be relied on for this purpose. Fixes: 8a3b7a25 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by:
Robert Hancock <robert.hancock@calian.com> Reviewed-by:
Andrew Lunn <andrew@lunn.ch> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
commit 9d6d7f1c upstream. wait_for_unix_gc() reads unix_tot_inflight & gc_in_progress without synchronization. Adds READ_ONCE()/WRITE_ONCE() and their associated comments to better document the intent. BUG: KCSAN: data-race in unix_inflight / wait_for_unix_gc write to 0xffffffff86e2b7c0 of 4 bytes by task 9380 on cpu 0: unix_inflight+0x1e8/0x260 net/unix/scm.c:63 unix_attach_fds+0x10c/0x1e0 net/unix/scm.c:121 unix_scm_to_skb net/unix/af_unix.c:1674 [inline] unix_dgram_sendmsg+0x679/0x16b0 net/unix/af_unix.c:1817 unix_seqpacket_sendmsg+0xcc/0x110 net/unix/af_unix.c:2258 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2409 ___sys_sendmsg net/socket.c:2463 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2549 __do_sys_sendmmsg net/socket.c:2578 [inline] __se_sys_sendmmsg net/socket.c:2575 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2575 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffffffff86e2b7c0 of 4 bytes by task 9375 on cpu 1: wait_for_unix_gc+0x24/0x160 net/unix/garbage.c:196 unix_dgram_sendmsg+0x8e/0x16b0 net/unix/af_unix.c:1772 unix_seqpacket_sendmsg+0xcc/0x110 net/unix/af_unix.c:2258 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2409 ___sys_sendmsg net/socket.c:2463 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2549 __do_sys_sendmmsg net/socket.c:2578 [inline] __se_sys_sendmmsg net/socket.c:2575 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2575 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00000002 -> 0x00000004 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 9375 Comm: syz-executor.1 Not tainted 5.16.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 9915672d ("af_unix: limit unix_tot_inflight") Signed-off-by:
Eric Dumazet <edumazet@google.com> Reported-by:
syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220114164328.2038499-1-eric.dumazet@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
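A hedged sketch of the annotation pattern being added (simplified; the real code in net/unix/garbage.c also annotates unix_tot_inflight and documents the pairing):

    /* Sketch of the READ_ONCE()/WRITE_ONCE() pattern for a lockless reader. */
    static bool gc_in_progress;

    static bool gc_is_running(void)
    {
            /* lockless read from the sendmsg path */
            return READ_ONCE(gc_in_progress);
    }

    static void gc_set_running(bool running)
    {
            /* paired write, done under the gc lock */
            WRITE_ONCE(gc_in_progress, running);
    }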
-
Miaoqian Lin authored
commit d24846a4 upstream. kobject_init_and_add() takes a reference even when it fails. According to the documentation of kobject_init_and_add(): If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Fix the memory leak by calling kobject_put(). Fixes: 73f368cf ("Kobject: change drivers/parisc/pdc_stable.c to use kobject_init_and_add") Signed-off-by:
Miaoqian Lin <linmq006@gmail.com> Signed-off-by:
Helge Deller <deller@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
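A hedged sketch of the corrected error handling (the ktype, parent and name used here are placeholders, not the pdc_stable ones):

    /* Sketch: kobject_init_and_add() takes a reference even on failure,
     * so the error path must drop it with kobject_put(). */
    static int register_example_kobject(struct kobject *kobj,
                                        struct kobj_type *ktype,
                                        struct kobject *parent)
    {
            int ret;

            ret = kobject_init_and_add(kobj, ktype, parent, "example");
            if (ret)
                    kobject_put(kobj);      /* cleans up the half-initialised kobject */

            return ret;
    }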
-
Tobias Waldekranz authored
commit 3f7c239c upstream. As reported by sparse: In the remove path, the driver would attempt to unmap its own priv pointer - instead of the io memory that it mapped in probe. Fixes: 9f35a734 ("net/fsl: introduce Freescale 10G MDIO driver") Signed-off-by:
Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by:
Andrew Lunn <andrew@lunn.ch> Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tobias Waldekranz authored
commit 0d375d61 upstream. This block is used in (at least) T1024 and T1040, including their variants like T1023 etc. Fixes: d55ad296 ("powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan") Signed-off-by:
Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chengguang Xu authored
commit 8d1cfb88 upstream. There is a redundant ']' in the name of opcode IB_OPCODE_RC_SEND_MIDDLE, so just fix it. Fixes: 8700e3e7 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20211218112320.3558770-1-cgxu519@mykernel.net Signed-off-by:
Chengguang Xu <cgxu519@mykernel.net> Acked-by:
Zhu Yanjun <zyjzyj2000@gmail.com> Reviewed-by:
Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yixing Liu authored
commit 39d5534b upstream. It is more general for ARM device drivers to use the device attribute to map PCI BAR spaces. Fixes: 9a443537 ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/20211206133652.27476-1-liangwenpeng@huawei.com Signed-off-by:
Yixing Liu <liuyixing1@huawei.com> Signed-off-by:
Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christian König authored
commit 4722f463 upstream. The return value was never initialized, so the cleanup code executed even when it wasn't necessary. Just add proper error handling. Fixes: ab50cb9d ("drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()") Signed-off-by:
Christian König <christian.koenig@amd.com> Tested-by:
Jan Stancek <jstancek@redhat.com> Tested-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Amir Goldstein authored
commit 775c5033 upstream. Commit 5d069dbe ("fuse: fix bad inode") replaced make_bad_inode() in fuse_iget() with a private implementation fuse_make_bad(). The private implementation fails to remove the bad inode from inode cache, so the retry loop with iget5_locked() finds the same bad inode and marks it bad forever. kmsg snip: [ ] rcu: INFO: rcu_sched self-detected stall on CPU ... [ ] ? bit_wait_io+0x50/0x50 [ ] ? fuse_init_file_inode+0x70/0x70 [ ] ? find_inode.isra.32+0x60/0xb0 [ ] ? fuse_init_file_inode+0x70/0x70 [ ] ilookup5_nowait+0x65/0x90 [ ] ? fuse_init_file_inode+0x70/0x70 [ ] ilookup5.part.36+0x2e/0x80 [ ] ? fuse_init_file_inode+0x70/0x70 [ ] ? fuse_inode_eq+0x20/0x20 [ ] iget5_locked+0x21/0x80 [ ] ? fuse_inode_eq+0x20/0x20 [ ] fuse_iget+0x96/0x1b0 Fixes: 5d069dbe ("fuse: fix bad inode") Cc: stable@vger.kernel.org # 5.10+ Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
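A hedged sketch of what the fix amounts to; FUSE_I_BAD and get_fuse_inode() come from the "fuse: fix bad inode" change backported below, and the exact body should be treated as illustrative:

    /* Sketch: unhash the inode when marking it bad so a later
     * iget5_locked() allocates a fresh inode instead of finding this
     * one forever. */
    static void fuse_make_bad(struct inode *inode)
    {
            remove_inode_hash(inode);
            set_bit(FUSE_I_BAD, &get_fuse_inode(inode)->state);
    }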
-
Miklos Szeredi authored
commit 5d069dbe upstream. Jan Kara's analysis of the syzbot report (edited): The reproducer opens a directory on FUSE filesystem, it then attaches dnotify mark to the open directory. After that a fuse_do_getattr() call finds that attributes returned by the server are inconsistent, and calls make_bad_inode() which, among other things does: inode->i_mode = S_IFREG; This then confuses dnotify which doesn't tear down its structures properly and eventually crashes. Avoid calling make_bad_inode() on a live inode: switch to a private flag on the fuse inode. Also add the test to ops which the bad_inode_ops would have caught. This bug goes back to the initial merge of fuse in 2.6.14... Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com> Tested-by:
Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> [bwh: Backported to 4.9: - Drop changes in fuse_dir_fsync(), fuse_readahead(), fuse_evict_inode() - In fuse_get_link(), return ERR_PTR(-EIO) for bad inodes - Convert some additional calls to is_bad_inode() - Adjust filename, context] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Theodore Ts'o authored
commit 6eeaf88f upstream. We probably want to remove the indirect block to extents migration feature after a deprecation window, but until then, let's fix a potential data loss problem caused by the fact that we put the tmp_inode on the orphan list. In the unlikely case where we crash and do a journal recovery, the data blocks belonging to the inode being migrated are also represented in the tmp_inode on the orphan list --- and so its data blocks will get marked unallocated, and available for reuse. Instead, stop putting the tmp_inode on the orphan list. So in the case where we crash while migrating the inode, we'll leak an inode, which is not a disaster. It will be easily fixed the next time we run fsck, and it's better than potentially having blocks getting claimed by two different files, and losing data as a result. Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Reviewed-by:
Lukas Czerner <lczerner@redhat.com> Cc: stable@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ye Bin authored
commit 380a0091 upstream. We got issue as follows when run syzkaller: [ 167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user [ 167.938306] EXT4-fs (loop0): Remounting filesystem read-only [ 167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0' [ 167.983601] ------------[ cut here ]------------ [ 167.984245] kernel BUG at fs/ext4/inode.c:847! [ 167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G B 5.16.0-rc5-next-20211217+ #123 [ 167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 167.988590] RIP: 0010:ext4_getblk+0x17e/0x504 [ 167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244 [ 167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282 [ 167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000 [ 167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66 [ 167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09 [ 167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8 [ 167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001 [ 167.997158] FS: 00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000 [ 167.998238] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0 [ 167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 168.001899] Call Trace: [ 168.002235] <TASK> [ 168.007167] ext4_bread+0xd/0x53 [ 168.007612] ext4_quota_write+0x20c/0x5c0 [ 168.010457] write_blk+0x100/0x220 [ 168.010944] remove_free_dqentry+0x1c6/0x440 [ 168.011525] free_dqentry.isra.0+0x565/0x830 [ 168.012133] remove_tree+0x318/0x6d0 [ 168.014744] remove_tree+0x1eb/0x6d0 [ 168.017346] remove_tree+0x1eb/0x6d0 [ 168.019969] remove_tree+0x1eb/0x6d0 [ 168.022128] qtree_release_dquot+0x291/0x340 [ 168.023297] v2_release_dquot+0xce/0x120 [ 168.023847] dquot_release+0x197/0x3e0 [ 168.024358] ext4_release_dquot+0x22a/0x2d0 [ 168.024932] dqput.part.0+0x1c9/0x900 [ 168.025430] __dquot_drop+0x120/0x190 [ 168.025942] ext4_clear_inode+0x86/0x220 [ 168.026472] ext4_evict_inode+0x9e8/0xa22 [ 168.028200] evict+0x29e/0x4f0 [ 168.028625] dispose_list+0x102/0x1f0 [ 168.029148] evict_inodes+0x2c1/0x3e0 [ 168.030188] generic_shutdown_super+0xa4/0x3b0 [ 168.030817] kill_block_super+0x95/0xd0 [ 168.031360] deactivate_locked_super+0x85/0xd0 [ 168.031977] cleanup_mnt+0x2bc/0x480 [ 168.033062] task_work_run+0xd1/0x170 [ 168.033565] do_exit+0xa4f/0x2b50 [ 168.037155] do_group_exit+0xef/0x2d0 [ 168.037666] __x64_sys_exit_group+0x3a/0x50 [ 168.038237] do_syscall_64+0x3b/0x90 [ 168.038751] entry_SYSCALL_64_after_hwframe+0x44/0xae In order to reproduce this problem, the following conditions need to be met: 1. Ext4 filesystem with no journal; 2. Filesystem image with incorrect quota data; 3. Abort filesystem forced by user; 4. umount filesystem; As in ext4_quota_write: ... 
    if (EXT4_SB(sb)->s_journal && !handle) {
            ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
                     " cancelled because transaction is not started",
                     (unsigned long long)off, (unsigned long long)len);
            return -EIO;
    }
...
We only check whether handle is NULL when the filesystem has a journal. The handle needs to be checked for NULL even when the filesystem has no journal. Signed-off-by:
Ye Bin <yebin10@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
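A hedged sketch of the corrected condition in ext4_quota_write(), shown relative to the snippet quoted above (illustrative, not a verbatim diff):

    /* the s_journal test is gone: a missing handle is fatal either way */
    if (!handle) {
            ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
                     " cancelled because transaction is not started",
                     (unsigned long long)off, (unsigned long long)len);
            return -EIO;
    }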
-