1. 18 Aug, 2021 (40 commits)
    • genirq/timings: Prevent potential array overflow in __irq_timings_store() · 2d2c6684
      Ben Dai authored
      commit b9cc7d8a upstream.
      
      When the interrupt interval is greater than 2 ^ PREDICTION_BUFFER_SIZE *
      PREDICTION_FACTOR us and less than 1s, the calculated index will be greater
      than the length of irqs->ema_time[]. Check the calculated index before
      using it to prevent array overflow.
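
A minimal userspace sketch of the bounds check described above. The values of PREDICTION_BUFFER_SIZE and PREDICTION_FACTOR, the helper names and the way a sample is stored are assumptions for illustration; only the shape of the check (compute a log2 bucket from the interval, then reject any index past the end of ema_time[]) follows the message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real constants are defined in
 * kernel/irq/timings.c and may differ between kernel versions. */
#define PREDICTION_BUFFER_SIZE 16
#define PREDICTION_FACTOR      4

/* floor(log2(v)) for v > 0, a stand-in for the kernel's ilog2(). */
static int ilog2_u64(uint64_t v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/* Map an interval in nanoseconds to an ema_time[] bucket; ">> 10"
 * approximates a nanosecond-to-microsecond conversion. */
static int interval_index(uint64_t interval_ns)
{
	uint64_t interval_us = (interval_ns >> 10) / PREDICTION_FACTOR;

	return interval_us ? ilog2_u64(interval_us) : 0;
}

/* Returns 1 if the sample was stored, 0 if its index was out of range. */
static int timings_store(uint64_t ema_time[PREDICTION_BUFFER_SIZE],
			 uint64_t interval_ns)
{
	int index = interval_index(interval_ns);

	/* The kind of bounds check the fix adds: never index past the array. */
	if (index > PREDICTION_BUFFER_SIZE - 1)
		return 0;

	ema_time[index] = interval_ns;	/* the real code updates an average here */
	return 1;
}
```

With these assumed constants, a 0.9 s interval maps to bucket 17, past the 16-entry array: it falls in the "greater than 2 ^ PREDICTION_BUFFER_SIZE * PREDICTION_FACTOR us and less than 1s" window the message describes.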
      
      Fixes: 23aa3b9a ("genirq/timings: Encapsulate storing function")
      Signed-off-by: Ben Dai <ben.dai@unisoc.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210425150903.25456-1-ben.dai9703@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • genirq/msi: Ensure deactivation on teardown · 35575419
      Bixuan Cui authored
      commit dbbc9357 upstream.
      
      msi_domain_alloc_irqs() invokes irq_domain_activate_irq(), but
      msi_domain_free_irqs() does not enforce deactivation before tearing down
      the interrupts.
      
      This happens when PCI/MSI interrupts are set up and never used before being
      torn down again, e.g. in error handling paths. The only place which cleans
      that up is the error handling path in msi_domain_alloc_irqs().
      
      Move the cleanup from msi_domain_alloc_irqs() into msi_domain_free_irqs()
      to cure that.
      
      Fixes: f3b0946d ("genirq/msi: Make sure PCI MSIs are activated early")
      Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210518033117.78104-1-cuibixuan@huawei.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/resctrl: Fix default monitoring groups reporting · f0736bed
      Babu Moger authored
      commit 064855a6 upstream.
      
      Creating a new sub monitoring group in the root /sys/fs/resctrl leads to
      getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes
      on the entire filesystem.
      
      Steps to reproduce:
      
        1. mount -t resctrl resctrl /sys/fs/resctrl/
      
        2. cd /sys/fs/resctrl/
      
        3. cat mon_data/mon_L3_00/mbm_total_bytes
           23189832
      
        4. Create sub monitor group:
           mkdir mon_groups/test1
      
        5. cat mon_data/mon_L3_00/mbm_total_bytes
           Unavailable
      
      When a new monitoring group is created, a new RMID is assigned to the
      new group. But the RMID is not active yet. When the events are read on
      the new RMID, it is expected to report the status as "Unavailable".
      
      When the user reads the events on the default monitoring group with
      multiple subgroups, the events on all subgroups are consolidated
      together. Currently, if any of the RMID reads report as "Unavailable",
      then everything will be reported as "Unavailable".
      
      Fix the issue by discarding the "Unavailable" reads and reporting all
      the successful RMID reads. This is not a problem on Intel systems, as
      Intel reports 0 on inactive RMIDs.
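
A sketch of the aggregation rule described above: sum every RMID that read successfully and report an error only when none did. struct toy_rmid_read and sum_group_reads() are hypothetical names for illustration and do not correspond to the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-RMID result; not a kernel data structure. */
struct toy_rmid_read {
	bool available;		/* false models an "Unavailable" (inactive) RMID */
	uint64_t bytes;
};

/*
 * Sum the parent group's RMID and all subgroup RMIDs, skipping the ones
 * that report "Unavailable". Returns false only if no RMID produced a
 * valid reading, in which case "Unavailable" is still the right answer.
 */
static bool sum_group_reads(const struct toy_rmid_read *reads, int n,
			    uint64_t *total)
{
	bool any_ok = false;

	*total = 0;
	for (int i = 0; i < n; i++) {
		if (!reads[i].available)
			continue;	/* before the fix, this failed the whole read */
		*total += reads[i].bytes;
		any_ok = true;
	}
	return any_ok;
}
```

In the reproducer's terms, the root group's valid count is still reported after test1 is created, even though the new RMID for test1 is not active yet.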
      
      Fixes: d89b7379 ("x86/intel_rdt/cqm: Add mon_data")
      Reported-by: Paweł Szulik <pawel.szulik@intel.com>
      Signed-off-by: Babu Moger <Babu.Moger@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Reinette Chatre <reinette.chatre@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311
      Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/ioapic: Force affinity setup before startup · 25216ed9
      Thomas Gleixner authored
      commit 0c0e37dc upstream.
      
      The IO/APIC cannot handle interrupt affinity changes safely after startup
      other than from an interrupt handler. The startup sequence in the generic
      interrupt code violates that assumption.
      
      Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that
      the default interrupt setting happens before the interrupt is started up
      for the first time.
      
      Fixes: 18404756 ("genirq: Expose default irq affinity mask (take 3)")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210729222542.832143400@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/msi: Force affinity setup before startup · 19fb5dab
      Thomas Gleixner authored
      commit ff363f48 upstream.
      
      The X86 MSI mechanism cannot handle interrupt affinity changes safely after
      startup other than from an interrupt handler, unless interrupt remapping is
      enabled. The startup sequence in the generic interrupt code violates that
      assumption.
      
      Mark the irq chips with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that
      the default interrupt setting happens before the interrupt is started up
      for the first time.
      
      While the interrupt remapping MSI chip does not require this, there is no
      point in treating it differently as this might spare an interrupt to a CPU
      which is not in the default affinity mask.
      
      For the non-remapping case, go to the direct write path when the interrupt
      is not yet started, similar to the not-yet-activated case.
      
      Fixes: 18404756 ("genirq: Expose default irq affinity mask (take 3)")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210729222542.886722080@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP · 4e52a4fe
      Thomas Gleixner authored
      commit 826da771 upstream.
      
      X86 IO/APIC and MSI interrupts (when used without interrupt remapping)
      require that the affinity setup on startup is done before the interrupt is
      enabled for the first time as the non-remapped operation mode cannot safely
      migrate enabled interrupts from arbitrary contexts. Provide a new irq chip
      flag which allows affected hardware to request this.
      
      This has to be opt-in because there have been reports in the past that some
      interrupt chips cannot handle affinity setting before startup.
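
The ordering that the opt-in flag selects can be sketched as a toy startup routine. The flag's numeric value, struct toy_chip, struct toy_irq and the logging helpers are hypothetical; only the rule itself (opted-in chips get their affinity set before the first startup, all other chips keep the previous startup-then-affinity order) comes from the description above.

```c
#include <assert.h>
#include <string.h>

/* Flag value assumed for this sketch; not taken from the kernel headers. */
#define IRQCHIP_AFFINITY_PRE_STARTUP (1u << 10)

struct toy_chip {
	unsigned int flags;
};

/* Records the order of operations so it can be inspected afterwards. */
struct toy_irq {
	const struct toy_chip *chip;
	char log[32];
};

static void set_affinity(struct toy_irq *d) { strcat(d->log, "A"); }
static void startup(struct toy_irq *d)      { strcat(d->log, "S"); }

/*
 * Opt-in chips get their default affinity programmed before the first
 * startup; all other chips keep the old startup-then-affinity order.
 */
static void toy_irq_startup(struct toy_irq *d)
{
	if (d->chip->flags & IRQCHIP_AFFINITY_PRE_STARTUP) {
		set_affinity(d);
		startup(d);
	} else {
		startup(d);
		set_affinity(d);
	}
}
```

Keeping the old order as the default reflects the opt-in design: chips that reportedly cannot handle affinity setting before startup are left untouched.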
      
      Fixes: 18404756 ("genirq: Expose default irq affinity mask (take 3)")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210729222542.779791738@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/tools: Fix objdump version check again · 2a28b523
      Randy Dunlap authored
      [ Upstream commit 839ad22f ]
      
      Skip (omit) any version string info that is parenthesized.
      
      Warning: objdump version 15) is older than 2.19
      Warning: Skipping posttest.
      
      where 'objdump -v' says:
      GNU objdump (GNU Binutils; SUSE Linux Enterprise 15) 2.35.1.20201123-7.18
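
The real check is an awk script (chkobjdump.awk); this C sketch only mirrors the idea of ignoring parenthesized text before looking for the version number. The function name and the "first whitespace-separated token that starts with a digit" rule are assumptions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract "major.minor" from an `objdump -v` banner, ignoring anything
 * in parentheses. Returns 0 on success, -1 if no version was found.
 */
static int objdump_version(const char *banner, int *vmajor, int *vminor)
{
	char buf[256];
	size_t n = 0;
	int depth = 0;

	/* Copy the banner, dropping parenthesized text. */
	for (const char *p = banner; *p && n < sizeof(buf) - 1; p++) {
		if (*p == '(')
			depth++;
		else if (*p == ')' && depth > 0)
			depth--;
		else if (depth == 0)
			buf[n++] = *p;
	}
	buf[n] = '\0';

	/* The version is the first token that starts with a digit. */
	for (char *tok = strtok(buf, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
		if (isdigit((unsigned char)tok[0]))
			return sscanf(tok, "%d.%d", vmajor, vminor) == 2 ? 0 : -1;
	}
	return -1;
}
```

Without the parenthesis handling, the first digit-leading token on the SUSE banner would be "15)", which matches the bogus "objdump version 15)" in the warning above.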
      
      Fixes: 8bee738b ("x86: Fix objdump version check in chkobjdump.awk for different formats.")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20210731000146.2720-1-rdunlap@infradead.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc/kprobes: Fix kprobe Oops happens in booke · 4acc0d98
      Pu Lehui authored
      [ Upstream commit 43e8f760 ]
      
      When using kprobes on a powerpc booke series processor, an Oops happens
      as shown below:
      
      / # echo "p:myprobe do_nanosleep" > /sys/kernel/debug/tracing/kprobe_events
      / # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
      / # sleep 1
      [   50.076730] Oops: Exception in kernel mode, sig: 5 [#1]
      [   50.077017] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
      [   50.077221] Modules linked in:
      [   50.077462] CPU: 0 PID: 77 Comm: sleep Not tainted 5.14.0-rc4-00022-g251a1524 #21
      [   50.077887] NIP:  c0b9c4e0 LR: c00ebecc CTR: 00000000
      [   50.078067] REGS: c3883de0 TRAP: 0700   Not tainted (5.14.0-rc4-00022-g251a1524)
      [   50.078349] MSR:  00029000 <CE,EE,ME>  CR: 24000228  XER: 20000000
      [   50.078675]
      [   50.078675] GPR00: c00ebdf0 c3883e90 c313e300 c3883ea0 00000001 00000000 c3883ecc 00000001
      [   50.078675] GPR08: c100598c c00ea250 00000004 00000000 24000222 102490c2 bff4180c 101e60d4
      [   50.078675] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000
      [   50.078675] GPR24: 00000002 00000000 c3883ea0 00000001 00000000 0000c350 3b9b8d50 00000000
      [   50.080151] NIP [c0b9c4e0] do_nanosleep+0x0/0x190
      [   50.080352] LR [c00ebecc] hrtimer_nanosleep+0x14c/0x1e0
      [   50.080638] Call Trace:
      [   50.080801] [c3883e90] [c00ebdf0] hrtimer_nanosleep+0x70/0x1e0 (unreliable)
      [   50.081110] [c3883f00] [c00ec004] sys_nanosleep_time32+0xa4/0x110
      [   50.081336] [c3883f40] [c001509c] ret_from_syscall+0x0/0x28
      [   50.081541] --- interrupt: c00 at 0x100a4d08
      [   50.081749] NIP:  100a4d08 LR: 101b5234 CTR: 00000003
      [   50.081931] REGS: c3883f50 TRAP: 0c00   Not tainted (5.14.0-rc4-00022-g251a1524)
      [   50.082183] MSR:  0002f902 <CE,EE,PR,FP,ME>  CR: 24000222  XER: 00000000
      [   50.082457]
      [   50.082457] GPR00: 000000a2 bf980040 1024b4d0 bf980084 bf980084 64000000 00555345 fefefeff
      [   50.082457] GPR08: 7f7f7f7f 101e0000 00000069 00000003 28000422 102490c2 bff4180c 101e60d4
      [   50.082457] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000
      [   50.082457] GPR24: 00000002 bf9803f4 10240000 00000000 00000000 100039e0 00000000 102444e8
      [   50.083789] NIP [100a4d08] 0x100a4d08
      [   50.083917] LR [101b5234] 0x101b5234
      [   50.084042] --- interrupt: c00
      [   50.084238] Instruction dump:
      [   50.084483] 4bfffc40 60000000 60000000 60000000 9421fff0 39400402 914200c0 38210010
      [   50.084841] 4bfffc20 00000000 00000000 00000000 <7fe00008> 7c0802a6 7c892378 93c10048
      [   50.085487] ---[ end trace f6fffe98e2fa8f3e ]---
      [   50.085678]
      Trace/breakpoint trap
      
      There is no real mode on the booke arch, and the MMU translation is
      always on. The corresponding MSR_IS/MSR_DS bits on booke are used
      to switch the address space, not to indicate real mode.
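
A sketch of the distinction drawn above. The is_booke parameter stands in for a build-time configuration check, and the bit values are illustrative assumptions. The MSR value 0x00029000 is taken from the Oops above: neither bit is set there, so a test that reads those bits as translation enables classifies the trap as a real-mode one even though translation was on.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values chosen for this sketch; on booke the same positions are
 * used for MSR_IS/MSR_DS (address-space select), not translation enables. */
#define MSR_IR (1UL << 5)
#define MSR_DR (1UL << 4)

/*
 * Should the kprobe handler ignore a trap because it happened in real
 * mode? booke has no real mode (translation is always on), so the
 * IS/DS bits must not be interpreted that way there.
 */
static bool trap_in_real_mode(unsigned long msr, bool is_booke)
{
	if (is_booke)
		return false;
	return !(msr & MSR_IR) || !(msr & MSR_DR);
}
```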
      
      Fixes: 21f8b2fa ("powerpc/kprobes: Ignore traps that happened in real mode")
      Signed-off-by: Pu Lehui <pulehui@huawei.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210809023658.218915-1-pulehui@huawei.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • efi/libstub: arm64: Relax 2M alignment again for relocatable kernels · 015e2c90
      Ard Biesheuvel authored
      [ Upstream commit 3a262423 ]
      
      Commit 82046702 ("efi/libstub/arm64: Replace 'preferred' offset with
      alignment check") simplified the way the stub moves the kernel image
      around in memory before booting it, given that a relocatable image does
      not need to be copied to a 2M aligned offset if it was loaded on a 64k
      boundary by EFI.
      
      Commit d32de913 ("efi/arm64: libstub: Deal gracefully with
      EFI_RNG_PROTOCOL failure") inadvertently defeated this logic by
      overriding the value of efi_nokaslr if EFI_RNG_PROTOCOL is not
      available, which was mistaken by the loader logic as an explicit request
      on the part of the user to disable KASLR and any associated relocation
      of an Image not loaded on a 2M boundary.
      
      So let's reinstate this functionality by capturing the value of
      efi_nokaslr at function entry to choose the minimum alignment.
      
      Fixes: d32de913 ("efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure")
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • efi/libstub: arm64: Force Image reallocation if BSS was not reserved · feb4a01d
      Ard Biesheuvel authored
      [ Upstream commit 5b94046e ]
      
      Distro versions of GRUB replace the usual LoadImage/StartImage calls
      used to load the kernel image with some local code that fails to honor
      the allocation requirements described in the PE/COFF header, as it
      does not account for the image's BSS section at all: it fails to
      allocate space for it, and fails to zero initialize it.
      
      Since the EFI stub itself is allocated in the .init segment, which is
      in the middle of the image, its BSS section is not impacted by this,
      and the main consequence of this omission is that the BSS section may
      overlap with memory regions that are already used by the firmware.
      
      So let's warn about this condition, and force image reallocation to
      occur in this case, which works around the problem.
      
      Fixes: 82046702 ("efi/libstub/arm64: Replace 'preferred' offset with alignment check")
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • arm64: efi: kaslr: Fix occasional random alloc (and boot) failure · afcb84e6
      Benjamin Herrenschmidt authored
      [ Upstream commit 4152433c ]
      
      The EFI stub random allocator used for kaslr on arm64 has a subtle
      bug. In function get_entry_num_slots() which counts the number of
      possible allocation "slots" for the image in a given chunk of free
      EFI memory, "last_slot" can become negative if the chunk is smaller
      than the requested allocation size.
      
      The test "if (first_slot > last_slot)" doesn't catch it because
      both first_slot and last_slot are unsigned.
      
      I chose not to make them signed to avoid problems if this is ever
      used on architectures where there are meaningful addresses with the
      top bit set. Instead, fix it with an additional test against the
      allocation size.
      
      This can cause a boot failure in addition to a loss of randomisation
      due to another bug in the arm64 stub fixed separately.
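
A simplified model of the slot counter described above, showing how the unsigned subtraction wraps for a small region at a low address. count_slots(), its parameters and the with_fix toggle are hypothetical (the real function takes an EFI memory descriptor), and expressing the "additional test against the allocation size" as region_end < size is an assumption about its exact form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Count the 2^align_shift-aligned start addresses at which 'size' bytes
 * fit inside [base, base + len). 'with_fix' toggles the extra size test
 * so the two behaviours can be compared.
 */
static uint64_t count_slots(uint64_t base, uint64_t len, uint64_t size,
			    unsigned int align_shift, bool with_fix)
{
	uint64_t align = 1ULL << align_shift;
	uint64_t region_end = base + len - 1;
	uint64_t first_slot, last_slot;

	/* Assumed form of the additional test: if the region's last byte
	 * lies below 'size', the subtraction below would wrap around. */
	if (with_fix && region_end < size)
		return 0;

	first_slot = (base + align - 1) & ~(align - 1);		/* round up */
	last_slot = (region_end - size + 1) & ~(align - 1);	/* round down */

	/* Cannot catch the wrap: both values are unsigned. */
	if (first_slot > last_slot)
		return 0;

	return ((last_slot - first_slot) >> align_shift) + 1;
}
```

For a 4 KiB region at 0x1000 and a 2 MiB image with 64 KiB alignment, the unfixed path reports roughly 2^48 slots instead of none, which is how a bogus slot could end up being chosen.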

      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Fixes: 2ddbfc81 ("efi: stub: add implementation of efi_random_alloc()")
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nbd: Avoid double completion of a request · e0ee8d9c
      Xie Yongji authored
      [ Upstream commit cddce011 ]
      
      There is a race between iterating over requests in
      nbd_clear_que() and completing requests in recv_work(),
      which can lead to double completion of a request.
      
      To fix it, flush the recv worker before iterating over
      the requests and don't abort the completed request
      while iterating.
      
      Fixes: 96d97e17 ("nbd: clear_sock on netlink disconnect")
      Reported-by: Jiang Yadong <jiangyadong@bytedance.com>
      Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • vsock/virtio: avoid potential deadlock when vsock device remove · f5cefe9a
      Longpeng(Mike) authored
      [ Upstream commit 49b0b6ff ]
      
      There's a potential deadlock when removing the vsock device or
      processing the RESET event:
      
        vsock_for_each_connected_socket:
            spin_lock_bh(&vsock_table_lock) ----------- (1)
            ...
                virtio_vsock_reset_sock:
                    lock_sock(sk) --------------------- (2)
            ...
            spin_unlock_bh(&vsock_table_lock)
      
      lock_sock() may schedule when 'sk' is owned by another thread at
      the same time, in which case we would receive a "scheduling while
      atomic" warning.
      
      Even worse, if the next task (selected by the scheduler) tries to
      release an 'sk', it needs to acquire vsock_table_lock, and the
      deadlock occurs, putting the system into a soft lockup state.
        Call trace:
         queued_spin_lock_slowpath
         vsock_remove_bound
         vsock_remove_sock
         virtio_transport_release
         __vsock_release
         vsock_release
         __sock_release
         sock_close
         __fput
         ____fput
      
      So we should not take sk_lock in this case, matching the behavior
      of vhost_vsock and vmci.
      
      Fixes: 0ea9e1d3 ("VSOCK: Introduce virtio_transport.ko")
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xen/events: Fix race in set_evtchn_to_irq · dff830e5
      Maximilian Heyne authored
      [ Upstream commit 88ca2521 ]
      
      There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq
      mapping are lazily allocated in this function. The check whether the row
      is already present and the row initialization are not synchronized. Two
      threads can at the same time allocate a new row for evtchn_to_irq and
      add the irq mapping to their newly allocated row. One thread will
      overwrite what the other has set for evtchn_to_irq[row] and therefore
      the irq mapping is lost. This will trigger a BUG_ON later in
      bind_evtchn_to_cpu:
      
        INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802
        INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002)
        INFO: nvme nvme77: 1/0/0 default/read/poll queues
        CRIT: kernel BUG at drivers/xen/events/events_base.c:427!
        WARN: invalid opcode: 0000 [#1] SMP NOPTI
        WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme]
        WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0
        WARN: Call Trace:
        WARN:  set_affinity_irq+0x121/0x150
        WARN:  irq_do_set_affinity+0x37/0xe0
        WARN:  irq_setup_affinity+0xf6/0x170
        WARN:  irq_startup+0x64/0xe0
        WARN:  __setup_irq+0x69e/0x740
        WARN:  ? request_threaded_irq+0xad/0x160
        WARN:  request_threaded_irq+0xf5/0x160
        WARN:  ? nvme_timeout+0x2f0/0x2f0 [nvme]
        WARN:  pci_request_irq+0xa9/0xf0
        WARN:  ? pci_alloc_irq_vectors_affinity+0xbb/0x130
        WARN:  queue_request_irq+0x4c/0x70 [nvme]
        WARN:  nvme_reset_work+0x82d/0x1550 [nvme]
        WARN:  ? check_preempt_wakeup+0x14f/0x230
        WARN:  ? check_preempt_curr+0x29/0x80
        WARN:  ? nvme_irq_check+0x30/0x30 [nvme]
        WARN:  process_one_work+0x18e/0x3c0
        WARN:  worker_thread+0x30/0x3a0
        WARN:  ? process_one_work+0x3c0/0x3c0
        WARN:  kthread+0x113/0x130
        WARN:  ? kthread_park+0x90/0x90
        WARN:  ret_from_fork+0x3a/0x50
      
      This patch sets evtchn_to_irq rows via a cmpxchg operation so that they
      will be set only once. The row is now cleared before writing it to
      evtchn_to_irq in order to not create a race once the row is visible for
      other threads.
      
      While at it, do not require the page to be zeroed, because it will be
      overwritten with -1's in clear_evtchn_to_irq_row anyway.
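
A userspace sketch of the publish-once pattern described above, with C11 atomic_compare_exchange_strong() standing in for the kernel's cmpxchg(). The table geometry, the names and the malloc-based allocation are assumptions; the points carried over from the message are that the row is filled with -1 before it becomes visible, that only one racing thread can install it, and that the memory does not need to be zeroed first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define ROWS 4
#define COLS 8

/* Hypothetical stand-in for the lazily allocated evtchn_to_irq table. */
static _Atomic(int *) table[ROWS];

/*
 * Install a row exactly once. The new row is fully initialised before it
 * is published; if another thread wins the race, our copy is freed and
 * the winner's row is used instead.
 */
static int *get_or_alloc_row(int row)
{
	int *cur = atomic_load(&table[row]);

	if (cur)
		return cur;

	int *fresh = malloc(COLS * sizeof(*fresh));	/* no zeroing needed */
	if (!fresh)
		return NULL;
	for (int i = 0; i < COLS; i++)
		fresh[i] = -1;			/* clear before publishing */

	int *expected = NULL;
	if (!atomic_compare_exchange_strong(&table[row], &expected, fresh)) {
		free(fresh);			/* another thread won the race */
		return expected;		/* now holds the winner's row */
	}
	return fresh;
}

static int set_evtchn_irq(int evtchn, int irq)
{
	int *r = get_or_alloc_row(evtchn / COLS);

	if (!r)
		return -1;
	r[evtchn % COLS] = irq;
	return 0;
}
```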

      Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
      Fixes: d0b075ff ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated")
      Link: https://lore.kernel.org/r/20210812130930.127134-1-mheyne@amazon.de
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/i915: Only access SFC_DONE when media domain is not fused off · 65395b05
      Matt Roper authored
      [ Upstream commit 24d032e2 ]
      
      The SFC_DONE register lives within the corresponding VD0/VD2/VD4/VD6
      forcewake domain and is not accessible if the vdbox in that domain is
      fused off and the forcewake is not initialized.
      
      This mistake went unnoticed because until recently we were using the
      wrong register offset for the SFC_DONE register; once the register
      offset was corrected, we started hitting errors like
      
        <4> [544.989065] i915 0000:cc:00.0: Uninitialized forcewake domain(s) 0x80 accessed at 0x1ce000
      
      on parts with fused-off vdbox engines.
      
      Fixes: e50dbdbf ("drm/i915/tgl: Add SFC instdone to error state")
      Fixes: 9c9c6d0a ("drm/i915: Correct SFC_DONE register offset")
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210806174130.1058960-1-matthew.d.roper@intel.com
      Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
      (cherry picked from commit c5589bb5)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      [Changed Fixes tag to match the cherry-picked 82929a21]
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: igmp: increase size of mr_ifc_count · 4344440d
      Eric Dumazet authored
      [ Upstream commit b69dd5b3 ]
      
      Some arches support cmpxchg() only on 4-byte and 8-byte quantities.
      Increase the mr_ifc_count width to 32 bits to fix this problem.
      
      Fixes: 4a2b285e ("net: igmp: fix data-race in igmp_ifc_timer_expire()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20210811195715.3684218-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets · 696afe28
      Neal Cardwell authored
      [ Upstream commit 6de035fe ]
      
      Currently if BBR congestion control is initialized after more than 2B
      packets have been delivered, depending on the phase of the
      tp->delivered counter the tracking of BBR round trips can get stuck.
      
      The bug arises because if tp->delivered is between 2^31 and 2^32 at
      the time the BBR congestion control module is initialized, then the
      initialization of bbr->next_rtt_delivered to 0 will cause the logic to
      believe that the end of the round trip is still billions of packets in
      the future. More specifically, the following check will fail
      repeatedly:
      
        !before(rs->prior_delivered, bbr->next_rtt_delivered)
      
      and thus the connection will take up to 2B packets delivered before
      that check will pass and the connection will set:
      
        bbr->round_start = 1;
      
      This could cause many mechanisms in BBR to fail to trigger, for
      example bbr_check_full_bw_reached() would likely never exit STARTUP.
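
The wrap-around can be reproduced with the serial-number comparison that before() performs (the sign of a 32-bit difference). This is a standalone sketch, not tcp_bbr.c; seeding next_rtt_delivered from the current delivered count is shown as one way to avoid the stuck round and is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-aware "seq1 comes before seq2", modelled on the kernel's before().
 * Converting an out-of-range uint32_t to int32_t is implementation-defined
 * in ISO C but yields the two's-complement value on mainstream compilers. */
static bool before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* The round-end test from the message:
 * !before(rs->prior_delivered, bbr->next_rtt_delivered). */
static bool round_ends(uint32_t prior_delivered, uint32_t next_rtt_delivered)
{
	return !before(prior_delivered, next_rtt_delivered);
}
```

With 3,000,000,000 packets delivered, a round end initialised to 0 appears about 1.3 billion packets in the future, so the check keeps failing; a round end initialised to the current count passes immediately.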
      
      This bug is 5 years old and has not been observed, and as a practical
      matter this would likely rarely trigger, since it would require
      transferring at least 2B packets, or likely more than 3 terabytes of
      data, before switching congestion control algorithms to BBR.
      
      This patch is a stable candidate for kernels as far back as v4.9,
      when tcp_bbr.c was added.
      
      Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Kevin Yang <yyd@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20210811024056.235161-1-ncardwell@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: linkwatch: fix failure to restore device state across suspend/resume · 8976606c
      Willy Tarreau authored
      [ Upstream commit 6922110d ]
      
      After migrating my laptop from 4.19-LTS to 5.4-LTS a while ago I noticed
      that my Ethernet port to which a bond and a VLAN interface are attached
      appeared to remain up after resuming from suspend with the cable unplugged
      (and that problem still persists with 5.10-LTS).
      
      What happens is the following:
      
        - the network driver (e1000e here) prepares to suspend, calls e1000e_down()
          which calls netif_carrier_off() to signal that the link is going down.
        - netif_carrier_off() adds a link_watch event to the list of events for
          this device
        - the device is completely stopped.
        - the machine suspends
        - the cable is unplugged and the machine brought to another location
        - the machine is resumed
        - the queued linkwatch events are processed for the device
        - the device doesn't yet have the __LINK_STATE_PRESENT bit and its events
          are silently dropped
        - the device is resumed with its link down
        - the upper VLAN and bond interfaces are never notified that the link had
          been turned down and remain up
        - the only way to provoke a change is to physically connect the machine
          to a port and possibly unplug it.
      
      The state after resume looks like this:
        $ ip -br li | egrep 'bond|eth'
        bond0            UP             e8:6a:64:64:64:64 <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>
        eth0             DOWN           e8:6a:64:64:64:64 <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP>
        eth0.2@eth0      UP             e8:6a:64:64:64:64 <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP>
      
      Placing an explicit call to netdev_state_change() either in the suspend
      or the resume code in the NIC driver worked around this, but the solution
      is not satisfying.
      
      The issue really is in link_watch, which loses events that it ought
      not to. The test for the device being present was added by commit
      124eee3f ("net: linkwatch: add check for netdevice being present to
      linkwatch_do_dev") in 4.20 to avoid accessing devices that are not
      present.
      
      Instead of dropping events, this patch proceeds slightly differently by
      postponing their handling so that they happen after the device is fully
      resumed.
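
A toy model of the behavioural change: an event for a device that is not present yet is kept queued instead of dropped, so it is delivered once the device has resumed. struct toy_dev and linkwatch_run() are hypothetical; the real code works on a list of net_device structures and the __LINK_STATE_PRESENT bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified device with one queued event. */
struct toy_dev {
	bool present;		/* models __LINK_STATE_PRESENT */
	bool pending_event;	/* a queued linkwatch event */
	int notified;		/* events that reached the upper devices */
};

/*
 * Process the queued event. Old behaviour (postpone == false): drop it if
 * the device is not present. New behaviour: keep it queued for later.
 * Returns true if the event is still pending afterwards.
 */
static bool linkwatch_run(struct toy_dev *d, bool postpone)
{
	if (!d->pending_event)
		return false;
	if (!d->present) {
		if (!postpone)
			d->pending_event = false;	/* event silently lost */
		return d->pending_event;
	}
	d->pending_event = false;
	d->notified++;				/* upper VLAN/bond learn of it */
	return false;
}
```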
      
      Fixes: 124eee3f ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev")
      Link: https://lists.openwall.net/netdev/2018/03/15/62
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Link: https://lore.kernel.org/r/20210809160628.22623-1-w@1wt.eu
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: bridge: fix memleak in br_add_if() · 4c2af901
      Yang Yingliang authored
      [ Upstream commit 519133de ]
      
      I got a memleak report:
      
      BUG: memory leak
      unreferenced object 0x607ee521a658 (size 240):
      comm "syz-executor.0", pid 955, jiffies 4294780569 (age 16.449s)
      hex dump (first 32 bytes, cpu 1):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      backtrace:
      [<00000000d830ea5a>] br_multicast_add_port+0x1c2/0x300 net/bridge/br_multicast.c:1693
      [<00000000274d9a71>] new_nbp net/bridge/br_if.c:435 [inline]
      [<00000000274d9a71>] br_add_if+0x670/0x1740 net/bridge/br_if.c:611
      [<0000000012ce888e>] do_set_master net/core/rtnetlink.c:2513 [inline]
      [<0000000012ce888e>] do_set_master+0x1aa/0x210 net/core/rtnetlink.c:2487
      [<0000000099d1cafc>] __rtnl_newlink+0x1095/0x13e0 net/core/rtnetlink.c:3457
      [<00000000a01facc0>] rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3488
      [<00000000acc9186c>] rtnetlink_rcv_msg+0x369/0xa10 net/core/rtnetlink.c:5550
      [<00000000d4aabb9c>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504
      [<00000000bc2e12a3>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
      [<00000000bc2e12a3>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340
      [<00000000e4dc2d0e>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929
      [<000000000d22c8b3>] sock_sendmsg_nosec net/socket.c:654 [inline]
      [<000000000d22c8b3>] sock_sendmsg+0x139/0x170 net/socket.c:674
      [<00000000e281417a>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350
      [<00000000237aa2ab>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404
      [<000000004f2dc381>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433
      [<0000000005feca6c>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47
      [<000000007304477d>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      On the error path of br_add_if(), p->mcast_stats, which is allocated in
      new_nbp(), needs to be freed; otherwise it is leaked.
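
A minimal userspace sketch of the error-path rule, assuming a simplified stand-in for the bridge port: `struct port`, `port_new()` and `port_add()` are hypothetical names loosely modelling new_nbp() and br_add_if(). The point is that the unwind label frees every allocation the constructor made, not only the outer object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net_bridge_port: only the field that
 * leaked is modelled. */
struct port {
	int *mcast_stats; /* allocated together with the port */
};

/* Stand-in for new_nbp(): allocates the port and its multicast stats. */
static struct port *port_new(void)
{
	struct port *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->mcast_stats = calloc(4, sizeof(*p->mcast_stats));
	if (!p->mcast_stats) {
		free(p);
		return NULL;
	}
	return p;
}

/* Stand-in for br_add_if(): a later step may fail (simulated by
 * fail_later). On failure, everything port_new() allocated is released,
 * including mcast_stats, which is the free that was missing. */
static int port_add(int fail_later, struct port **out)
{
	struct port *p;

	assert(out != NULL);
	p = port_new();
	if (!p)
		return -1;
	if (fail_later)
		goto err_free_port;
	*out = p;
	return 0;

err_free_port:
	free(p->mcast_stats);
	free(p);
	return -1;
}
```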
      
      Fixes: 1080ab95 ("net: bridge: add support for IGMP/MLD stats and export them via netlink")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20210809132023.978546-1-yangyingliang@huawei.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4c2af901
    • Nikolay Aleksandrov
      net: bridge: fix flags interpretation for extern learn fdb entries · f333a5ca
      Nikolay Aleksandrov authored
      [ Upstream commit 45a68787 ]
      
      Ignore fdb flags when adding port extern learn entries and always set
      BR_FDB_LOCAL flag when adding bridge extern learn entries. This is
      closest to the behaviour we had before and avoids breaking any use cases
      which were allowed.
      
      This patch fixes iproute2 calls which assume NUD_PERMANENT and were
      allowed before, for example:
      $ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn
      
      Extern learn entries are allowed to roam, but do not expire, so static
      or dynamic flags make no sense for them.
      
      Also add a comment for future reference.
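
The flag interpretation described above can be sketched as a pure function. The flag values `FDB_LOCAL`/`FDB_STATIC` and the helper `extern_learn_flags()` are hypothetical; the kernel uses its own BR_FDB_* bits, and this is an illustration of the rule rather than the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the kernel's BR_FDB_* bits. */
#define FDB_LOCAL  (1u << 0)
#define FDB_STATIC (1u << 1)

static_assert((FDB_LOCAL & FDB_STATIC) == 0, "flag bits must be distinct");

/* Extern learn entries: user-supplied flags are ignored. Entries towards
 * the bridge device are always local; entries towards a port get neither
 * local nor static (they may roam but never expire). */
static unsigned int extern_learn_flags(bool towards_bridge,
				       unsigned int user_flags)
{
	(void)user_flags; /* deliberately ignored */
	return towards_bridge ? FDB_LOCAL : 0u;
}
```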
      
      Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space")
      Fixes: 0541a629 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry")
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.org
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f333a5ca
    • Vladimir Oltean
      net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry · e3b949b8
      Vladimir Oltean authored
      [ Upstream commit 0541a629 ]
      
      Currently it is possible to add broken extern_learn FDB entries to the
      bridge in two ways:
      
      1. Entries pointing towards the bridge device that are not local/permanent:
      
      ip link add br0 type bridge
      bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn static
      
      2. Entries pointing towards the bridge device or towards a port that
      are marked as local/permanent, however the bridge does not process the
      'permanent' bit in any way, therefore they are recorded as though they
      aren't permanent:
      
      ip link add br0 type bridge
      bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn permanent
      
      Since commit 52e4bec1 ("net: bridge: switchdev: treat local FDBs the
      same as entries towards the bridge"), these incorrect FDB entries can
      even trigger NULL pointer dereferences inside the kernel.
      
      This is because that commit made the assumption that all FDB entries
      that are not local/permanent have a valid destination port. For context,
      local / permanent FDB entries either have fdb->dst == NULL, and these
      point towards the bridge device and are therefore local and not to be
      used for forwarding, or have fdb->dst == a net_bridge_port structure
      (but are to be treated in the same way, i.e. not for forwarding).
      
      That assumption _is_ correct as long as things are working correctly in
      the bridge driver, i.e. we cannot logically have fdb->dst == NULL under
      any circumstance for FDB entries that are not local. However, the
      extern_learn code path, where FDB entries are managed by a user space
      controller, shows that it is possible for the bridge kernel driver to
      misinterpret the NUD flags of an entry transmitted by user space, and
      end up having fdb->dst == NULL while not being a local entry. This is
      invalid and should be rejected.
      
      Before, the two commands listed above both crashed the kernel in this
      check from br_switchdev_fdb_notify:
      
      	struct net_device *dev = info.is_local ? br->dev : dst->dev;
      
      info.is_local == false, dst == NULL.
      
      After this patch, the invalid entry added by the first command is
      rejected:
      
      ip link add br0 type bridge && bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn static; ip link del br0
      Error: bridge: FDB entry towards bridge must be permanent.
      
      and the valid entry added by the second command is properly treated as a
      local address and does not crash br_switchdev_fdb_notify anymore:
      
      ip link add br0 type bridge && bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn permanent; ip link del br0
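
A hedged sketch of the validation rule, using a hypothetical `struct fdb_entry` reduced to the two fields that matter here. `fdb_validate()` rejects a non-local entry with no destination port, and `fdb_notify_dev()` mirrors the device-selection expression quoted above (simplified to return the port pointer itself), which only yields a usable pointer once that check has passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified FDB entry: dst == NULL means "towards the
 * bridge device itself". */
struct fdb_entry {
	const void *dst; /* destination port, or NULL for the bridge */
	bool is_local;   /* local/permanent entry */
};

/* Returns 0 if the entry is acceptable, -1 if it must be refused:
 * an entry towards the bridge (dst == NULL) must be local/permanent. */
static int fdb_validate(const struct fdb_entry *e)
{
	assert(e != NULL);
	if (e->dst == NULL && !e->is_local)
		return -1; /* "FDB entry towards bridge must be permanent" */
	return 0;
}

/* Simplified version of "info.is_local ? br->dev : dst->dev": for a
 * validated entry this never hands back a NULL port. */
static const void *fdb_notify_dev(const struct fdb_entry *e,
				  const void *br_dev)
{
	return e->is_local ? br_dev : e->dst;
}
```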
      
      Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space")
      Reported-by: syzbot+9ba1174359adba5a5b7c@syzkaller.appspotmail.com
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20210801231730.7493-1-vladimir.oltean@nxp.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e3b949b8
    • Vladimir Oltean
      net: dsa: sja1105: fix broken backpressure in .port_fdb_dump · 1cad01ac
      Vladimir Oltean authored
      [ Upstream commit 21b52fed ]
      
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since the "cb" passed to drivers points to dsa_slave_port_fdb_do_dump(),
      drivers must check for the -EMSGSIZE error code it returns. Otherwise,
      once a netlink skb becomes full, DSA no longer saves newly dumped FDB
      entries to it while the driver keeps dumping, so FDB entries go missing
      from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allowing rtnl_fdb_dump() to restart the FDB dump with a new skb.
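
The driver-side half of the fix can be sketched as a table walk that stops on the first callback error. `fdb_dump_cb_t`, `driver_port_fdb_dump()` and `toy_cb()` are hypothetical userspace stand-ins for the DSA callback and a switch driver; only the -EMSGSIZE convention comes from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Modelled on the "cb" DSA passes to .port_fdb_dump: 0 on success, or a
 * negative errno such as -EMSGSIZE when the current skb is full. */
typedef int (*fdb_dump_cb_t)(const unsigned char *addr, unsigned short vid,
			     void *data);

/* Hypothetical driver table walk. The early return is the fix: the
 * callback's error is propagated instead of ignored, so the core can
 * restart the dump with a fresh skb. */
static int driver_port_fdb_dump(const unsigned char (*macs)[6], size_t n,
				fdb_dump_cb_t cb, void *data)
{
	for (size_t i = 0; i < n; i++) {
		int err = cb(macs[i], 1, data);

		if (err)
			return err; /* e.g. -EMSGSIZE: stop, let the core retry */
	}
	return 0;
}

/* Toy callback: accepts *capacity entries, then reports -EMSGSIZE. */
static int toy_cb(const unsigned char *addr, unsigned short vid, void *data)
{
	int *capacity = data;

	assert(capacity != NULL);
	(void)addr;
	(void)vid;
	if (*capacity == 0)
		return -EMSGSIZE;
	(*capacity)--;
	return 0;
}
```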
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1cad01ac
    • Vladimir Oltean
      net: dsa: lantiq: fix broken backpressure in .port_fdb_dump · 56cc3408
      Vladimir Oltean authored
      [ Upstream commit 871a73a1 ]
      
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since the "cb" passed to drivers points to dsa_slave_port_fdb_do_dump(),
      drivers must check for the -EMSGSIZE error code it returns. Otherwise,
      once a netlink skb becomes full, DSA no longer saves newly dumped FDB
      entries to it while the driver keeps dumping, so FDB entries go missing
      from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allowing rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: 58c59ef9 ("net: dsa: lantiq: Add Forwarding Database access")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      56cc3408
    • Vladimir Oltean
      net: dsa: lan9303: fix broken backpressure in .port_fdb_dump · f7720b35
      Vladimir Oltean authored
      [ Upstream commit ada2fee1 ]
      
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since the "cb" passed to drivers points to dsa_slave_port_fdb_do_dump(),
      drivers must check for the -EMSGSIZE error code it returns. Otherwise,
      once a netlink skb becomes full, DSA no longer saves newly dumped FDB
      entries to it while the driver keeps dumping, so FDB entries go missing
      from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allowing rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: ab335349 ("net: dsa: lan9303: Add port_fast_age and port_fdb_dump methods")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f7720b35
    • Eric Dumazet
      net: igmp: fix data-race in igmp_ifc_timer_expire() · 24e1b7db
      Eric Dumazet authored
      [ Upstream commit 4a2b285e ]
      
      Fix the data-race reported by syzbot [1].
      The issue here is that igmp_ifc_timer_expire() can update
      in_dev->mr_ifc_count while another change has just occurred from
      another context.
      
      in_dev->mr_ifc_count is only 8 bits wide, so the race had little
      consequence.
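
One common way to address this class of KCSAN report is to read the shared field once into a local and publish the new value with a single marked store. The sketch below is a userspace analogue under that assumption: C11 relaxed atomics stand in for the kernel's READ_ONCE()/WRITE_ONCE(), and `ifc_timer_expire()`/`ifc_event()` are simplified hypothetical versions of the functions in the report, not necessarily the exact upstream diff.

```c
#include <assert.h>
#include <stdatomic.h>

/* 8-bit shared counter, like in_dev->mr_ifc_count. */
static _Atomic unsigned char mr_ifc_count;

/* Timer side: load once, decrement the local copy, store once. Returns 1
 * when a report would be sent and the timer re-armed. */
static int ifc_timer_expire(void)
{
	unsigned char count = atomic_load_explicit(&mr_ifc_count,
						    memory_order_relaxed);

	if (count) {
		count--;
		atomic_store_explicit(&mr_ifc_count, count,
				      memory_order_relaxed);
		return 1;
	}
	return 0;
}

/* Event side: (re)arm the counter from another context. */
static void ifc_event(unsigned char robustness)
{
	atomic_store_explicit(&mr_ifc_count, robustness, memory_order_relaxed);
}
```

The load/decrement/store sequence is still not atomic as a whole; it only removes the unmarked concurrent accesses, which fits the commit's view that the race itself has little consequence.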
      
      [1]
      BUG: KCSAN: data-race in igmp_ifc_event / igmp_ifc_timer_expire
      
      write to 0xffff8881051e3062 of 1 bytes by task 12547 on cpu 0:
       igmp_ifc_event+0x1d5/0x290 net/ipv4/igmp.c:821
       igmp_group_added+0x462/0x490 net/ipv4/igmp.c:1356
       ____ip_mc_inc_group+0x3ff/0x500 net/ipv4/igmp.c:1461
       __ip_mc_join_group+0x24d/0x2c0 net/ipv4/igmp.c:2199
       ip_mc_join_group_ssm+0x20/0x30 net/ipv4/igmp.c:2218
       do_ip_setsockopt net/ipv4/ip_sockglue.c:1285 [inline]
       ip_setsockopt+0x1827/0x2a80 net/ipv4/ip_sockglue.c:1423
       tcp_setsockopt+0x8c/0xa0 net/ipv4/tcp.c:3657
       sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3362
       __sys_setsockopt+0x18f/0x200 net/socket.c:2159
       __do_sys_setsockopt net/socket.c:2170 [inline]
       __se_sys_setsockopt net/socket.c:2167 [inline]
       __x64_sys_setsockopt+0x62/0x70 net/socket.c:2167
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff8881051e3062 of 1 bytes by interrupt on cpu 1:
       igmp_ifc_timer_expire+0x706/0xa30 net/ipv4/igmp.c:808
       call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1419
       expire_timers+0x135/0x250 kernel/time/timer.c:1464
       __run_timers+0x358/0x420 kernel/time/timer.c:1732
       run_timer_softirq+0x19/0x30 kernel/time/timer.c:1745
       __do_softirq+0x12c/0x26e kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x9a/0xb0 kernel/softirq.c:636
       sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1100
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
       console_unlock+0x8e8/0xb30 kernel/printk/printk.c:2646
       vprintk_emit+0x125/0x3d0 kernel/printk/printk.c:2174
       vprintk_default+0x22/0x30 kernel/printk/printk.c:2185
       vprintk+0x15a/0x170 kernel/printk/printk_safe.c:392
       printk+0x62/0x87 kernel/printk/printk.c:2216
       selinux_netlink_send+0x399/0x400 security/selinux/hooks.c:6041
       security_netlink_send+0x42/0x90 security/security.c:2070
       netlink_sendmsg+0x59e/0x7c0 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:703 [inline]
       sock_sendmsg net/socket.c:723 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
       ___sys_sendmsg net/socket.c:2446 [inline]
       __sys_sendmsg+0x1ed/0x270 net/socket.c:2475
       __do_sys_sendmsg net/socket.c:2484 [inline]
       __se_sys_sendmsg net/socket.c:2482 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x01 -> 0x02
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 12539 Comm: syz-executor.1 Not tainted 5.14.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      24e1b7db
    • Takeshi Misawa
      net: Fix memory leak in ieee802154_raw_deliver · 69b13167
      Takeshi Misawa authored
      [ Upstream commit 1090340f ]
      
      If an IEEE-802.15.4-RAW socket is closed before a queued skb is
      received, that skb is leaked. Fix this by freeing sk_receive_queue in
      sk->sk_destruct().
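
A minimal userspace model of the fix, assuming a singly linked list in place of `sk_receive_queue`: `struct sock_model`, `deliver()`, `queue_purge()` and `raw_sock_destruct()` are hypothetical names (in the kernel the purge would be done with `skb_queue_purge()`). Whatever is still queued when the socket is destroyed is freed by the destructor.

```c
#include <assert.h>
#include <stdlib.h>

struct buf {
	struct buf *next;
};

struct sock_model {
	struct buf *rx_head; /* delivered but not yet read by userspace */
	void (*destruct)(struct sock_model *);
};

/* Free every buffer still on the queue. */
static void queue_purge(struct buf **head)
{
	while (*head) {
		struct buf *b = *head;

		*head = b->next;
		free(b);
	}
}

/* Destructor: without this purge, buffers queued before close leak. */
static void raw_sock_destruct(struct sock_model *sk)
{
	queue_purge(&sk->rx_head);
}

/* Receive path: queue a new buffer (pushed at the head; ordering does
 * not matter for this sketch). */
static int deliver(struct sock_model *sk)
{
	struct buf *b;

	assert(sk != NULL);
	b = malloc(sizeof(*b));
	if (!b)
		return -1;
	b->next = sk->rx_head;
	sk->rx_head = b;
	return 0;
}
```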
      
      syzbot report:
      BUG: memory leak
      unreferenced object 0xffff88810f644600 (size 232):
        comm "softirq", pid 0, jiffies 4294967032 (age 81.270s)
        hex dump (first 32 bytes):
          10 7d 4b 12 81 88 ff ff 10 7d 4b 12 81 88 ff ff  .}K......}K.....
          00 00 00 00 00 00 00 00 40 7c 4b 12 81 88 ff ff  ........@|K.....
        backtrace:
          [<ffffffff83651d4a>] skb_clone+0xaa/0x2b0 net/core/skbuff.c:1496
          [<ffffffff83fe1b80>] ieee802154_raw_deliver net/ieee802154/socket.c:369 [inline]
          [<ffffffff83fe1b80>] ieee802154_rcv+0x100/0x340 net/ieee802154/socket.c:1070
          [<ffffffff8367cc7a>] __netif_receive_skb_one_core+0x6a/0xa0 net/core/dev.c:5384
          [<ffffffff8367cd07>] __netif_receive_skb+0x27/0xa0 net/core/dev.c:5498
          [<ffffffff8367cdd9>] netif_receive_skb_internal net/core/dev.c:5603 [inline]
          [<ffffffff8367cdd9>] netif_receive_skb+0x59/0x260 net/core/dev.c:5662
          [<ffffffff83fe6302>] ieee802154_deliver_skb net/mac802154/rx.c:29 [inline]
          [<ffffffff83fe6302>] ieee802154_subif_frame net/mac802154/rx.c:102 [inline]
          [<ffffffff83fe6302>] __ieee802154_rx_handle_packet net/mac802154/rx.c:212 [inline]
          [<ffffffff83fe6302>] ieee802154_rx+0x612/0x620 net/mac802154/rx.c:284
          [<ffffffff83fe59a6>] ieee802154_tasklet_handler+0x86/0xa0 net/mac802154/main.c:35
          [<ffffffff81232aab>] tasklet_action_common.constprop.0+0x5b/0x100 kernel/softirq.c:557
          [<ffffffff846000bf>] __do_softirq+0xbf/0x2ab kernel/softirq.c:345
          [<ffffffff81232f4c>] do_softirq kernel/softirq.c:248 [inline]
          [<ffffffff81232f4c>] do_softirq+0x5c/0x80 kernel/softirq.c:235
          [<ffffffff81232fc1>] __local_bh_enable_ip+0x51/0x60 kernel/softirq.c:198
          [<ffffffff8367a9a4>] local_bh_enable include/linux/bottom_half.h:32 [inline]
          [<ffffffff8367a9a4>] rcu_read_unlock_bh include/linux/rcupdate.h:745 [inline]
          [<ffffffff8367a9a4>] __dev_queue_xmit+0x7f4/0xf60 net/core/dev.c:4221
          [<ffffffff83fe2db4>] raw_sendmsg+0x1f4/0x2b0 net/ieee802154/socket.c:295
          [<ffffffff8363af16>] sock_sendmsg_nosec net/socket.c:654 [inline]
          [<ffffffff8363af16>] sock_sendmsg+0x56/0x80 net/socket.c:674
          [<ffffffff8363deec>] __sys_sendto+0x15c/0x200 net/socket.c:1977
          [<ffffffff8363dfb6>] __do_sys_sendto net/socket.c:1989 [inline]
          [<ffffffff8363dfb6>] __se_sys_sendto net/socket.c:1985 [inline]
          [<ffffffff8363dfb6>] __x64_sys_sendto+0x26/0x30 net/socket.c:1985
      
      Fixes: 9ec76716 ("net: add IEEE 802.15.4 socket family implementation")
      Reported-and-tested-by: syzbot+1f68113fa907bf0695a8@syzkaller.appspotmail.com
      Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
      Acked-by: Alexander Aring <aahringo@redhat.com>
      Link: https://lore.kernel.org/r/20210805075414.GA15796@DESKTOP
      
      Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      69b13167
    • Ben Hutchings
      net: dsa: microchip: ksz8795: Fix VLAN filtering · dbfaf7a6
      Ben Hutchings authored
      [ Upstream commit 16484413 ]
      
      Currently ksz8_port_vlan_filtering() sets or clears the VLAN Enable
      hardware flag.  That controls discarding of packets with a VID that
      has not been enabled for any port on the switch.
      
      Since it is a global flag, set the dsa_switch::vlan_filtering_is_global
      flag so that the DSA core understands this can't be controlled per
      port.
      
      When VLAN filtering is enabled, the switch should also discard packets
      with a VID that's not enabled on the ingress port.  Set or clear each
      external port's VLAN Ingress Filter flag in ksz8_port_vlan_filtering()
      to make that happen.
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dbfaf7a6
    • Ben Hutchings
      net: dsa: microchip: Fix ksz_read64() · ccc1fe82
      Ben Hutchings authored
      [ Upstream commit c34f674c ]
      
      ksz_read64() currently does some dubious byte-swapping on the two
      halves of a 64-bit register, and then only returns the high bits.
      Replace this with a straightforward expression.
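
A straightforward way to build the 64-bit value is to shift the high 32-bit half into place and OR in the low half, with no byte swapping. A minimal sketch, assuming the first word read from the register is the high half (as the parameter names of the hypothetical `combine_read64()` state):

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
	      "the two halves must exactly fill the 64-bit value");

/* Combine two 32-bit register halves into one 64-bit value. */
static uint64_t combine_read64(uint32_t high, uint32_t low)
{
	return ((uint64_t)high << 32) | low;
}
```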
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ccc1fe82
    • Christian Hewitt
      drm/meson: fix colour distortion from HDR set during vendor u-boot · 558092b8
      Christian Hewitt authored
      [ Upstream commit bf33677a ]
      
      Add support for the OSD1 HDR registers so meson DRM can handle the HDR
      properties set by Amlogic u-boot on G12A and newer devices, which
      otherwise result in blue/green/pink colour distortion of the display
      output.
      
      This takes the original patch submissions from Mathias [0] and [1] with
      corrections for formatting and the missing description and attribution
      needed for merge.
      
      [0] https://lore.kernel.org/linux-amlogic/59dfd7e6-fc91-3d61-04c4-94e078a3188c@baylibre.com/T/
      [1] https://lore.kernel.org/linux-amlogic/CAOKfEHBx_fboUqkENEMd-OC-NSrf46nto+vDLgvgttzPe99kXg@mail.gmail.com/T/#u
      
      Fixes: 72888394 ("drm/meson: Add G12A Support for VIU setup")
      Suggested-by: Mathias Steiger <mathias.steiger@googlemail.com>
      Signed-off-by: Christian Hewitt <christianshewitt@gmail.com>
      Tested-by: Neil Armstrong <narmstrong@baylibre.com>
      Tested-by: Philip Milev <milev.philip@gmail.com>
      [narmsrong: adding missing space on second tested-by tag]
      Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210806094005.7136-1-christianshewitt@gmail.com
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      558092b8
    • Aya Levin
      net/mlx5: Fix return value from tracer initialization · 6e188646
      Aya Levin authored
      [ Upstream commit bd37c288 ]
      
      Check the return value of mlx5_fw_tracer_start(), set up an error path,
      and fix the return value of mlx5_fw_tracer_init() accordingly.
      
      Fixes: c71ad41c ("net/mlx5: FW tracer, events handling")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6e188646
    • Shay Drory
      net/mlx5: Synchronize correct IRQ when destroying CQ · 303ba011
      Shay Drory authored
      [ Upstream commit 563476ae ]
      
      The CQ destroy is performed based on the IRQ number that is stored in
      cq->irqn. That number wasn't set explicitly during CQ creation and as
      expected some of the API users of mlx5_core_create_cq() forgot to update
      it.
      
      This caused the synchronization call to target the wrong IRQ, number 0,
      instead of the real one.
      
      As a fix, set the IRQ number directly in the mlx5_core_create_cq() and
      update all users accordingly.
      
      Fixes: 1a86b377 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
      Fixes: ef1659ad ("IB/mlx5: Add DEVX support for CQ events")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      303ba011
    • Guillaume Nault
      bareudp: Fix invalid read beyond skb's linear data · 00a0c11d
      Guillaume Nault authored
      [ Upstream commit 143a8526 ]
      
      Data beyond the UDP header might not be part of the skb's linear data.
      Use skb_copy_bits() instead of direct access to skb->data+X, so that
      we read the correct bytes even on a fragmented skb.
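
To illustrate why direct `skb->data + X` access is unsafe, the sketch below models a buffer whose bytes are split between a linear head and one fragment, and copies a range across that boundary the way skb_copy_bits() does. `struct frag_buf` and `buf_copy_bits()` are hypothetical userspace stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Payload split between a linear area and a single fragment. */
struct frag_buf {
	const unsigned char *head;
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/* Copy len bytes starting at offset, wherever they live.
 * Returns 0 on success, -1 if the range is out of bounds. */
static int buf_copy_bits(const struct frag_buf *b, size_t offset,
			 void *to, size_t len)
{
	unsigned char *out = to;
	size_t total;

	assert(b != NULL);
	total = b->head_len + b->frag_len;
	if (offset > total || len > total - offset)
		return -1;
	while (len) {
		if (offset < b->head_len) {
			/* Part of the range lies in the linear area. */
			size_t n = b->head_len - offset;

			if (n > len)
				n = len;
			memcpy(out, b->head + offset, n);
			out += n;
			offset += n;
			len -= n;
		} else {
			/* The rest lies in the fragment (bounds checked above). */
			memcpy(out, b->frag + (offset - b->head_len), len);
			len = 0;
		}
	}
	return 0;
}
```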
      
      Fixes: 4b5f6723 ("net: Special handling for IP & MPLS.")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Link: https://lore.kernel.org/r/7741c46545c6ef02e70c80a9b32814b22d9616b3.1628264975.git.gnault@redhat.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      00a0c11d
    • Roi Dayan
      psample: Add a fwd declaration for skbuff · 30b1fc47
      Roi Dayan authored
      [ Upstream commit beb7f2de ]
      
      Without this, there is a warning if source files include psample.h
      before skbuff.h or don't include skbuff.h at all.
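
A minimal illustration of the forward-declaration fix, assuming a hypothetical `sample_packet()` prototype in place of the real psample.h declarations. Declaring `struct sk_buff;` at file scope makes the tag visible before it appears in a parameter list; without it, the compiler warns that the struct is declared inside the parameter list and is not visible outside it.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: an incomplete type is enough for pointers. */
struct sk_buff;

/* Hypothetical header prototype using the incomplete type. */
int sample_packet(struct sk_buff *skb, unsigned int rate);

/* This translation unit never dereferences skb, so it never needs the
 * full definition of struct sk_buff. */
int sample_packet(struct sk_buff *skb, unsigned int rate)
{
	assert(rate != 0); /* hypothetical precondition for the sketch */
	return skb == NULL ? -1 : (int)rate;
}
```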
      
      Fixes: 6ae0a628 ("net: Introduce psample, a new genetlink channel for packet sampling")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Link: https://lore.kernel.org/r/20210808065242.1522535-1-roid@nvidia.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      30b1fc47
    • Md Fahad Iqbal Polash
      iavf: Set RSS LUT and key in reset handle path · b3f0b170
      Md Fahad Iqbal Polash authored
      [ Upstream commit a7550f8b ]
      
      The iavf driver should set the RSS LUT and key unconditionally in the
      reset path. Currently, the driver does not do that. This patch fixes
      this issue.
      
      Fixes: 2c86ac3c ("i40evf: create a generic config RSS function")
      Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b3f0b170
    • Brett Creeley
      ice: don't remove netdev->dev_addr from uc sync list · a6192bae
      Brett Creeley authored
      [ Upstream commit 3ba7f53f ]
      
      In some circumstances, such as with bridging, it's possible that the
      stack will add the device's own MAC address to its unicast address list.
      
      If, later, the stack deletes this address, the driver will receive a
      request to remove this address.
      
      The driver stores its current MAC address as part of the VSI MAC filter
      list instead of separately. So, this causes a problem when the device's
      MAC address is deleted unexpectedly, which results in traffic failure in
      some cases.
      
      The following configuration steps will reproduce the previously
      mentioned problem:
      
      > ip link set eth0 up
      > ip link add dev br0 type bridge
      > ip link set br0 up
      > ip addr flush dev eth0
      > ip link set eth0 master br0
      > echo 1 > /sys/class/net/br0/bridge/vlan_filtering
      > modprobe -r veth
      > modprobe -r bridge
      > ip addr add 192.168.1.100/24 dev eth0
      
      The following ping command fails due to the netdev->dev_addr being
      deleted when removing the bridge module.
      > ping <link partner>
      
      Fix this by making sure not to delete the netdev->dev_addr during MAC
      address sync. After fixing this issue, it was noticed that the
      netdev_warn() in .set_mac was overly verbose, so make it a
      netdev_dbg().
      
      Also, there is a possibility of a race condition between .set_mac and
      .set_rx_mode. Fix this by calling netif_addr_lock_bh() and
      netif_addr_unlock_bh() on the device's netdev when the netdev->dev_addr
      is going to be updated in .set_mac.
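
The unsync-side check can be sketched as follows. `addr_unsync()` is a hypothetical helper and `ETH_ALEN` is defined locally; the locking around `.set_mac` described above is omitted. The sketch only shows that a request to drop the device's own MAC is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Returns true if the filter for addr would be scheduled for removal;
 * a request to remove the device's own MAC (dev_addr) is ignored so
 * traffic to the port keeps flowing. */
static bool addr_unsync(const unsigned char dev_addr[ETH_ALEN],
			const unsigned char addr[ETH_ALEN])
{
	assert(dev_addr != NULL && addr != NULL);
	if (memcmp(dev_addr, addr, ETH_ALEN) == 0)
		return false; /* never remove our own MAC filter */
	/* ... queue the filter for deletion here ... */
	return true;
}
```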
      
      Fixes: e94d4478 ("ice: Implement filter sync, NDO operations and bump version")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Liang Li <liali@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a6192bae
    • Anirudh Venkataramanan
      ice: Prevent probing virtual functions · bae5b521
      Anirudh Venkataramanan authored
      [ Upstream commit 50ac7479 ]
      
      The userspace utility "driverctl" can be used to change/override the
      system's default driver choices. This is useful in some situations
      (buggy driver, old driver missing a device ID, trying a workaround,
      etc.) where the user needs to load a different driver.
      
      However, this is also prone to user error, where a driver is mapped
      to a device it's not designed to drive. For example, if the ice driver
      is mapped to iavf devices, the ice driver crashes.
      
      Add a check to return an error if the ice driver is being used to
      probe a virtual function.
      
      Fixes: 837f08fd ("ice: Add basic driver framework for Intel(R) E800 Series")
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      bae5b521
    • Hangbin Liu
      net: sched: act_mirred: Reset ct info when mirror/redirect skb · 059238c5
      Hangbin Liu authored
      [ Upstream commit d09c548d ]
      
      When mirroring/redirecting an skb to a different port, the ct info should
      be reset for reclassification; otherwise the packets will match unexpected
      rules. For example, with the following topology and commands:
      
          -----------
                    |
             veth0 -+-------
                    |
             veth1 -+-------
                    |
         ------------
      
       tc qdisc add dev veth0 clsact
       # The same with "action mirred egress mirror dev veth1" or "action mirred ingress redirect dev veth1"
       tc filter add dev veth0 egress chain 1 protocol ip flower ct_state +trk action mirred ingress mirror dev veth1
       tc filter add dev veth0 egress chain 0 protocol ip flower ct_state -inv action ct commit action goto chain 1
       tc qdisc add dev veth1 clsact
       tc filter add dev veth1 ingress chain 0 protocol ip flower ct_state +trk action drop
      
       ping <remote ip via veth0> &
       tc -s filter show dev veth1 ingress
      
      With 'tc -s filter show', we can see that the packets were dropped on
      veth1.
      
      Fixes: b57dc7c1 ("net/sched: Introduce action ct")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      059238c5
    • Karsten Graul
      net/smc: fix wait on already cleared link · f15f7716
      Karsten Graul authored
      [ Upstream commit 8f3d65c1 ]
      
      There can be a race between the waiters for a tx work request buffer
      and the link down processing that finally clears the link. Although
      all waiters are woken up before the link is cleared, some of them may
      not have regained control yet and are still waiting. This results in
      an access to a cleared wait queue head.
      
      Fix this by introducing atomic reference counting around the wait calls,
      and wait with the link clear processing until all waiters have finished.
      Move the work request layer related calls into smc_wr.c and set the
      link state to INACTIVE before calling smcr_link_clear() in
      smc_llc_srv_add_link().
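
The reference-counting scheme can be modelled in userspace with C11 atomics. `struct link_model`, `wait_get()`, `wait_put()` and `link_try_clear()` are hypothetical names; a real implementation would sleep until the count drops to zero instead of returning false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct link_model {
	atomic_int wr_tx_refcnt; /* waiters currently inside a wait call */
	bool wait_queue_valid;   /* stands in for the wait queue head */
};

/* Taken before a waiter starts waiting on the link. */
static void wait_get(struct link_model *l)
{
	atomic_fetch_add(&l->wr_tx_refcnt, 1);
}

/* Dropped once the waiter has returned from the wait call. */
static void wait_put(struct link_model *l)
{
	int old = atomic_fetch_sub(&l->wr_tx_refcnt, 1);

	assert(old > 0); /* catches an unbalanced put */
	(void)old;
}

/* Link clearing: only tear down the wait queue when no waiter can still
 * touch it. Returns true if the link was cleared. */
static bool link_try_clear(struct link_model *l)
{
	if (atomic_load(&l->wr_tx_refcnt) != 0)
		return false;
	l->wait_queue_valid = false;
	return true;
}
```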
      
      Fixes: 15e1b99a ("net/smc: no WR buffer wait for terminating link group")
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f15f7716
    • Pali Rohár
      ppp: Fix generating ifname when empty IFLA_IFNAME is specified · 51f4965d
      Pali Rohár authored
      [ Upstream commit 2459dcb9 ]
      
      IFLA_IFNAME is a nul-terminated string, which means that the IFLA_IFNAME
      buffer can be larger than the length of the string it contains.
      
      The function __rtnl_newlink() generates its own new ifname if either
      IFLA_IFNAME was not specified at all or userspace passed an empty
      nul-terminated string.
      
      If userspace does not specify an ifname for a new ppp netdev, the kernel
      is expected to generate one in the format "ppp<id>", where id matches the
      ppp unit id, which can later be obtained with the PPPIOCGUNIT ioctl.
      
      This works when IFLA_IFNAME is not specified at all, but it does not
      work when IFLA_IFNAME is specified as an empty string.
      
      So fix this logic in ppp_nl_newlink() to also handle an empty
      IFLA_IFNAME, and correctly generate the ifname from the ppp unit
      identifier when userspace did not provide a preferred ifname.
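      A minimal sketch of the ifname choice described above. ppp_pick_ifname()
      is a hypothetical helper, not kernel code: requested == NULL models an
      absent IFLA_IFNAME, and an empty string models an IFLA_IFNAME that
      carries only the terminating nul. Both cases are treated the same way
      and the name is derived from the ppp unit id.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the kernel's ppp_nl_newlink(). 'requested' is
 * NULL when IFLA_IFNAME is absent, or "" when it holds only the nul byte. */
static void ppp_pick_ifname(const char *requested, int unit,
			    char *out, size_t outlen)
{
	assert(out != NULL && outlen > 0);

	if (requested == NULL || strlen(requested) == 0)
		/* No usable name: derive it from the ppp unit id so it
		 * matches the value later returned by PPPIOCGUNIT. */
		snprintf(out, outlen, "ppp%d", unit);
	else
		/* Userspace supplied a preferred name: use it (truncated
		 * to outlen - 1 characters if needed). */
		snprintf(out, outlen, "%s", requested);
}
```

      For example, ppp_pick_ifname("", 7, buf, sizeof(buf)) leaves "ppp7" in
      buf, the same result as when the attribute is omitted entirely.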
      
      Without this patch, when IFLA_IFNAME was specified as an empty string,
      the kernel created a new ppp interface in the format "ppp<id>", but the
      id did not match the ppp unit id returned by the PPPIOCGUNIT ioctl. In
      this case, the id was some number generated by __rtnl_newlink().
      
      Signed-off-by: Pali Rohár <pali@kernel.org>
      Fixes: bb8082f6 ("ppp: build ifname using unit identifier for rtnl based devices")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      51f4965d
    • Ben Hutchings
      net: phy: micrel: Fix link detection on ksz87xx switch · 046579c9
      Ben Hutchings authored
      [ Upstream commit 2383cb94 ]
      
      Commit a5e63c7d "net: phy: micrel: Fix detection of ksz87xx
      switch" broke link detection on the external ports of the KSZ8795.
      
      The previously unused phy_driver structure for these devices specifies
      config_aneg and read_status functions that appear to be designed for a
      fixed link and do not work with the embedded PHYs in the KSZ8795.
      
      Delete the use of these functions in favour of the generic PHY
      implementations which were used previously.
      
      Fixes: a5e63c7d ("net: phy: micrel: Fix detection of ksz87xx switch")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      046579c9