1. 22 Jun, 2022 17 commits
  2. 16 Jun, 2022 12 commits
  3. 14 Jun, 2022 11 commits
    • xsk: Fix possible crash when multiple sockets are created · f7019562
      Maciej Fijalkowski authored
      commit ba3beec2 upstream.
      
      Fix a crash that happens if an Rx only socket is created first, then a
      second socket is created that is Tx only and bound to the same umem as
      the first socket and also the same netdev and queue_id together with the
      XDP_SHARED_UMEM flag. In this specific case, the tx_descs array page
      pool was not created by the first socket as it was an Rx only socket.
      When the second socket is bound it needs this tx_descs array of this
      shared page pool as it has a Tx component, but unfortunately it was
      never allocated, leading to a crash. Note that this array is only used
      for zero-copy drivers using the batched Tx APIs, currently only ice and
      i40e.
      
      [ 5511.150360] BUG: kernel NULL pointer dereference, address: 0000000000000008
      [ 5511.158419] #PF: supervisor write access in kernel mode
      [ 5511.164472] #PF: error_code(0x0002) - not-present page
      [ 5511.170416] PGD 0 P4D 0
      [ 5511.173347] Oops: 0002 [#1] PREEMPT SMP PTI
      [ 5511.178186] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G            E     5.18.0-rc1+ #97
      [ 5511.187245] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016
      [ 5511.198418] RIP: 0010:xsk_tx_peek_release_desc_batch+0x198/0x310
      [ 5511.205375] Code: c0 83 c6 01 84 c2 74 6d 8d 46 ff 23 07 44 89 e1 48 83 c0 14 48 c1 e1 04 48 c1 e0 04 48 03 47 10 4c 01 c1 48 8b 50 08 48 8b 00 <48> 89 51 08 48 89 01 41 80 bd d7 00 00 00 00 75 82 48 8b 19 49 8b
      [ 5511.227091] RSP: 0018:ffffc90000003dd0 EFLAGS: 00010246
      [ 5511.233135] RAX: 0000000000000000 RBX: ffff88810c8da600 RCX: 0000000000000000
      [ 5511.241384] RDX: 000000000000003c RSI: 0000000000000001 RDI: ffff888115f555c0
      [ 5511.249634] RBP: ffffc90000003e08 R08: 0000000000000000 R09: ffff889092296b48
      [ 5511.257886] R10: 0000ffffffffffff R11: ffff889092296800 R12: 0000000000000000
      [ 5511.266138] R13: ffff88810c8db500 R14: 0000000000000040 R15: 0000000000000100
      [ 5511.274387] FS:  0000000000000000(0000) GS:ffff88903f800000(0000) knlGS:0000000000000000
      [ 5511.283746] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5511.290389] CR2: 0000000000000008 CR3: 00000001046e2001 CR4: 00000000003706f0
      [ 5511.298640] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 5511.306892] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 5511.315142] Call Trace:
      [ 5511.317972]  <IRQ>
      [ 5511.320301]  ice_xmit_zc+0x68/0x2f0 [ice]
      [ 5511.324977]  ? ktime_get+0x38/0xa0
      [ 5511.328913]  ice_napi_poll+0x7a/0x6a0 [ice]
      [ 5511.333784]  __napi_poll+0x2c/0x160
      [ 5511.337821]  net_rx_action+0xdd/0x200
      [ 5511.342058]  __do_softirq+0xe6/0x2dd
      [ 5511.346198]  irq_exit_rcu+0xb5/0x100
      [ 5511.350339]  common_interrupt+0xa4/0xc0
      [ 5511.354777]  </IRQ>
      [ 5511.357201]  <TASK>
      [ 5511.359625]  asm_common_interrupt+0x1e/0x40
      [ 5511.364466] RIP: 0010:cpuidle_enter_state+0xd2/0x360
      [ 5511.370211] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 e9 00 7b ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 72 02 00 00 31 ff e8 02 0c 80 ff fb 45 85 f6 <0f> 88 11 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49
      [ 5511.391921] RSP: 0018:ffffffff82a03e60 EFLAGS: 00000202
      [ 5511.397962] RAX: ffff88903f800000 RBX: 0000000000000001 RCX: 000000000000001f
      [ 5511.406214] RDX: 0000000000000000 RSI: ffffffff823400b9 RDI: ffffffff8234c046
      [ 5511.424646] RBP: ffff88810a384800 R08: 000005032a28c046 R09: 0000000000000008
      [ 5511.443233] R10: 000000000000000b R11: 0000000000000006 R12: ffffffff82bcf700
      [ 5511.461922] R13: 000005032a28c046 R14: 0000000000000001 R15: 0000000000000000
      [ 5511.480300]  cpuidle_enter+0x29/0x40
      [ 5511.494329]  do_idle+0x1c7/0x250
      [ 5511.507610]  cpu_startup_entry+0x19/0x20
      [ 5511.521394]  start_kernel+0x649/0x66e
      [ 5511.534626]  secondary_startup_64_no_verify+0xc3/0xcb
      [ 5511.549230]  </TASK>
      
      Detect such case during bind() and allocate this memory region via newly
      introduced xp_alloc_tx_descs(). Also, use kvcalloc instead of kcalloc as
      for other buffer pool allocations, so that it matches the kvfree() from
      xp_destroy().
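The bind-time check described above can be illustrated with a small user-space sketch. It is not the kernel code: `struct pool`, `struct desc`, `pool_alloc_tx_descs()` and `bind_shared()` are hypothetical stand-ins, and `calloc()` takes the place of the kernel's kvcalloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct desc {
    unsigned long long addr;
    unsigned int len;
    unsigned int options;
};

struct pool {
    struct desc *tx_descs;  /* stays NULL while only Rx-only sockets are bound */
    size_t tx_nentries;
};

/* Allocate the shared Tx descriptor array; returns 0 on success, -1 on failure. */
static int pool_alloc_tx_descs(struct pool *p, size_t nentries)
{
    assert(p != NULL);
    p->tx_descs = calloc(nentries, sizeof(*p->tx_descs));
    if (!p->tx_descs)
        return -1;
    p->tx_nentries = nentries;
    return 0;
}

/*
 * Bind another socket to an already existing, shared pool. If the new
 * socket has a Tx ring and an earlier Rx-only socket never allocated the
 * array, allocate it now instead of leaving a NULL pointer behind.
 */
static int bind_shared(struct pool *p, bool has_tx, size_t tx_nentries)
{
    assert(p != NULL);
    if (has_tx && !p->tx_descs)
        return pool_alloc_tx_descs(p, tx_nentries);
    return 0;
}
```

In this sketch, binding an Rx-only socket first leaves `tx_descs` as NULL, and a later Tx-capable bind fills it in before anything can dereference it.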
      
      Fixes: d1bc532e ("i40e: xsk: Move tmp desc array from driver to pool")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20220425153745.481322-1-maciej.fijalkowski@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd · 90385f2b
      Eric Dumazet authored
      commit 11825765 upstream.
      
      syzbot got a new report [1] finally pointing to a very old bug,
      added in initial support for MTU probing.
      
      tcp_mtu_probe() has checks about starting an MTU probe if
      tcp_snd_cwnd(tp) >= 11.
      
      But nothing prevents tcp_snd_cwnd(tp) from being reduced later,
      before the MTU probe succeeds.
      
      This bug would lead to potential zero-divides.
      
      Debugging added in commit 40570375 ("tcp: add accessors
      to read/set tp->snd_cwnd") has paid off :)
      
      While we are at it, address potential overflows in this code.
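As a rough illustration of the two hazards named above (a congestion window that rounds down to zero, and an overflowing product), here is a user-space sketch. `rescale_cwnd()` and its parameters are hypothetical names chosen for this example; the kernel change itself operates on the TCP socket state and may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: rescale a congestion window (counted in packets)
 * when the packet size grows from old_mtu to probe_size bytes.
 */
static uint32_t rescale_cwnd(uint32_t cwnd, uint32_t old_mtu, uint32_t probe_size)
{
    uint64_t val;

    assert(probe_size != 0);

    /* Do the multiplication in 64 bits so the product cannot overflow. */
    val = (uint64_t)cwnd * old_mtu;
    val /= probe_size;

    if (val > UINT32_MAX)
        val = UINT32_MAX;

    /* Never return 0: a zero window would later be used as a divisor. */
    return val < 1 ? 1 : (uint32_t)val;
}
```

With a small window such as `rescale_cwnd(1, 1500, 9000)`, plain integer division would give 0; the clamp keeps the result at 1.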
      
      [1]
      WARNING: CPU: 1 PID: 14132 at include/net/tcp.h:1219 tcp_mtup_probe_success+0x366/0x570 net/ipv4/tcp_input.c:2712
      Modules linked in:
      CPU: 1 PID: 14132 Comm: syz-executor.2 Not tainted 5.18.0-syzkaller-07857-gbabf0bb9 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:tcp_snd_cwnd_set include/net/tcp.h:1219 [inline]
      RIP: 0010:tcp_mtup_probe_success+0x366/0x570 net/ipv4/tcp_input.c:2712
      Code: 74 08 48 89 ef e8 da 80 17 f9 48 8b 45 00 65 48 ff 80 80 03 00 00 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 aa b0 c5 f8 <0f> 0b e9 16 fe ff ff 48 8b 4c 24 08 80 e1 07 38 c1 0f 8c c7 fc ff
      RSP: 0018:ffffc900079e70f8 EFLAGS: 00010287
      RAX: ffffffff88c0f7f6 RBX: ffff8880756e7a80 RCX: 0000000000040000
      RDX: ffffc9000c6c4000 RSI: 0000000000031f9e RDI: 0000000000031f9f
      RBP: 0000000000000000 R08: ffffffff88c0f606 R09: ffffc900079e7520
      R10: ffffed101011226d R11: 1ffff1101011226c R12: 1ffff1100eadcf50
      R13: ffff8880756e72c0 R14: 1ffff1100eadcf89 R15: dffffc0000000000
      FS:  00007f643236e700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f1ab3f1e2a0 CR3: 0000000064fe7000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       tcp_clean_rtx_queue+0x223a/0x2da0 net/ipv4/tcp_input.c:3356
       tcp_ack+0x1962/0x3c90 net/ipv4/tcp_input.c:3861
       tcp_rcv_established+0x7c8/0x1ac0 net/ipv4/tcp_input.c:5973
       tcp_v6_do_rcv+0x57b/0x1210 net/ipv6/tcp_ipv6.c:1476
       sk_backlog_rcv include/net/sock.h:1061 [inline]
       __release_sock+0x1d8/0x4c0 net/core/sock.c:2849
       release_sock+0x5d/0x1c0 net/core/sock.c:3404
       sk_stream_wait_memory+0x700/0xdc0 net/core/stream.c:145
       tcp_sendmsg_locked+0x111d/0x3fc0 net/ipv4/tcp.c:1410
       tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1448
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg net/socket.c:734 [inline]
       __sys_sendto+0x439/0x5c0 net/socket.c:2119
       __do_sys_sendto net/socket.c:2131 [inline]
       __se_sys_sendto net/socket.c:2127 [inline]
       __x64_sys_sendto+0xda/0xf0 net/socket.c:2127
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f6431289109
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f643236e168 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007f643139c100 RCX: 00007f6431289109
      RDX: 00000000d0d0c2ac RSI: 0000000020000080 RDI: 000000000000000a
      RBP: 00007f64312e308d R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fff372533af R14: 00007f643236e300 R15: 0000000000022000
      
      Fixes: 5d424d5a ("[TCP]: MTU probing")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dmaengine: idxd: add missing callback function to support DMA_INTERRUPT · cfe3dd8b
      Dave Jiang authored
      commit 2112b8f4 upstream.
      
      When setting DMA_INTERRUPT capability, a callback function
      dma->device_prep_dma_interrupt() is needed to support this capability.
      Without setting the callback, dma_async_device_register() will fail dma
      capability check.
      
      Fixes: 4e5a4eb2 ("dmaengine: idxd: set DMA_INTERRUPT cap bit")
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Link: https://lore.kernel.org/r/165101232637.3951447.15765792791591763119.stgit@djiang5-desk3.ch.intel.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iov_iter: fix build issue due to possible type mis-match · fb5e51c0
      Linus Torvalds authored
      commit 1c27f1fc upstream.
      
      Commit 6c776766 ("iov_iter: Fix iter_xarray_get_pages{,_alloc}()")
      introduced a problem on some 32-bit architectures (at least arm, xtensa,
      csky, sparc and mips), that have a 'size_t' that is 'unsigned int'.
      
      The reason is that we now do
      
          min(nr * PAGE_SIZE - offset, maxsize);
      
      where 'nr' and 'offset' are both 'unsigned int', and PAGE_SIZE is
      'unsigned long'.  As a result, the normal C type rules mean that the
      first argument to 'min()' ends up being 'unsigned long'.
      
      In contrast, 'maxsize' is of type 'size_t'.
      
      Now, 'size_t' and 'unsigned long' are always the same physical type in
      the kernel, so you'd think this doesn't matter, and from an actual
      arithmetic standpoint it doesn't.
      
      But on 32-bit architectures 'size_t' is commonly 'unsigned int', even if
      it could also be 'unsigned long'.  In that situation, both are unsigned
      32-bit types, but they are not the *same* type.
      
      And as a result 'min()' will complain about the distinct types (ignore
      the "pointer types" part of the error message: that's an artifact of the
      way we have made 'min()' check types for being the same):
      
        lib/iov_iter.c: In function 'iter_xarray_get_pages':
        include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror]
           20 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
              |                                   ^~
        lib/iov_iter.c:1464:16: note: in expansion of macro 'min'
         1464 |         return min(nr * PAGE_SIZE - offset, maxsize);
              |                ^~~
      
      This was not visible on 64-bit architectures (where we always define
      'size_t' to be 'unsigned long').
      
      Force these cases to use 'min_t(size_t, x, y)' to make the type explicit
      and avoid the issue.
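The effect of forcing one explicit type can be shown outside the kernel with a simplified macro. `MIN_T`, `SKETCH_PAGE_SIZE` and `bytes_available()` are hypothetical names for this sketch; the kernel's real `min_t()` is more careful (for instance, it avoids evaluating its arguments twice).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for min_t(): cast both operands to one named type. */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Assumed page size; 'unsigned long', like the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

static size_t bytes_available(unsigned int nr, unsigned int offset, size_t maxsize)
{
    /* Guard against unsigned wrap-around in the subtraction below. */
    assert(nr * SKETCH_PAGE_SIZE >= offset);

    /*
     * The first operand has type 'unsigned long', the second 'size_t'.
     * Casting both to size_t makes the comparison well-typed even where
     * size_t is 'unsigned int'.
     */
    return MIN_T(size_t, nr * SKETCH_PAGE_SIZE - offset, maxsize);
}
```

For example, `bytes_available(2, 100, 10000)` evaluates `min(8092, 10000)` in `size_t`, regardless of which underlying type `size_t` maps to.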
      
      [ Nit-picky note: technically 'size_t' doesn't have to match 'unsigned
        long' arithmetically. We've certainly historically seen environments
        with 16-bit address spaces and 32-bit 'unsigned long'.
      
        Similarly, even in 64-bit modern environments, 'size_t' could be its
        own type distinct from 'unsigned long', even if it were arithmetically
        identical.
      
        So the above type commentary is only really descriptive of the kernel
        environment, not some kind of universal truth for the kinds of wild
        and crazy situations that are allowed by the C standard ]
      
      Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Link: https://lore.kernel.org/all/YqRyL2sIqQNDfky2@debian/
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • zonefs: fix handling of explicit_open option on mount · 7f36e2e1
      Damien Le Moal authored
      commit a2a513be upstream.
      
      Ignoring the explicit_open mount option on mount for devices that do not
      have a limit on the number of open zones must be done after the mount
      options are parsed and set in s_mount_opts. Move the check to ignore
      the explicit_open option after the call to zonefs_parse_options() in
      zonefs_fill_super().
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: qcom: Fix pipe clock imbalance · 9e4810b4
      Johan Hovold authored
      commit fdf6a2f5 upstream.
      
      Fix a clock imbalance introduced by ed8cc3b1 ("PCI: qcom: Add support
      for SDM845 PCIe controller"), which enables the pipe clock both in init()
      and in post_init() but only disables in post_deinit().
      
      Note that the pipe clock was also never disabled in the init() error
      paths and that enabling the clock before powering up the PHY looks
      questionable.
      
      Link: https://lore.kernel.org/r/20220401133351.10113-1-johan+linaro@kernel.org
      Fixes: ed8cc3b1 ("PCI: qcom: Add support for SDM845 PCIe controller")
      Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Cc: stable@vger.kernel.org      # 5.6
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/sched: act_police: more accurate MTU policing · 42c0160d
      Davide Caratti authored
      commit 4ddc844e upstream.
      
      In current Linux, MTU policing does not take into account that packets at
      the TC ingress have the L2 header pulled. Thus, the same TC police action
      (with the same value of tcfp_mtu) behaves differently for ingress/egress.
      In addition, the full GSO size is compared to tcfp_mtu: as a consequence,
      the policer drops GSO packets even when individual segments have the L2 +
      L3 + L4 + payload length below the configured value of tcfp_mtu.
      
      Improve the accuracy of MTU policing as follows:
       - account for mac_len for non-GSO packets at TC ingress.
       - compare MTU threshold with the segmented size for GSO packets.
      Also, add a kselftest that verifies the correct behavior.
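The two rules listed above can be sketched in user space as follows. `struct pkt` and `mtu_check()` are hypothetical stand-ins for this example; the kernel works on `struct sk_buff` and its own GSO helpers rather than on precomputed fields like these.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet summary used only for this sketch. */
struct pkt {
    uint32_t len;      /* length as seen at the hook (L2 header pulled at ingress) */
    uint32_t mac_len;  /* L2 header length */
    bool is_gso;       /* true for GSO packets */
    uint32_t seg_len;  /* GSO only: L2 + L3 + L4 + payload of the largest segment */
    bool at_ingress;   /* true when the packet is at TC ingress */
};

/* Return true if the packet conforms to the MTU limit. */
static bool mtu_check(const struct pkt *p, uint32_t limit)
{
    uint32_t len;

    assert(p != NULL);

    /* GSO: compare the size of an individual segment, not the full GSO size. */
    if (p->is_gso)
        return p->seg_len <= limit;

    /* Non-GSO: add the pulled L2 header back when at ingress. */
    len = p->len;
    if (p->at_ingress)
        len += p->mac_len;

    return len <= limit;
}
```

With this logic, a 1486-byte ingress packet with a 14-byte L2 header is judged as 1500 bytes, the same as it would be at egress.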
      
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md/raid0: Ignore RAID0 layout if the second zone has only one device · 4c106eb8
      Pascal Hambourg authored
      commit ea23994e upstream.
      
      The RAID0 layout is irrelevant if all members have the same size so the
      array has only one zone. It is *also* irrelevant if the array has two
      zones and the second zone has only one device, for example if the array
      has two members of different sizes.
      
      So in that case it makes sense to allow assembly even when the layout is
      undefined, like what is done when the array has only one zone.
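The condition described above can be written as a small predicate. This is a user-space sketch with hypothetical names (`struct zone`, `layout_is_irrelevant()`), not the md driver's own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical zone descriptor: only the device count matters here. */
struct zone {
    unsigned int nb_dev;
};

/*
 * Return true when the RAID0 layout setting cannot change where data
 * lands: either there is a single zone, or there are two zones and the
 * second one spans exactly one device.
 */
static bool layout_is_irrelevant(const struct zone *zones, size_t nr_zones)
{
    assert(zones != NULL && nr_zones >= 1);

    if (nr_zones == 1)
        return true;

    return nr_zones == 2 && zones[1].nb_dev == 1;
}
```

An array with two members of different sizes has a first zone striped over both devices and a second zone on the larger device only, so the predicate returns true for it.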
      
      Reviewed-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
      Signed-off-by: Song Liu <song@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • random: account for arch randomness in bits · 51e55727
      Jason A. Donenfeld authored
      commit 77fc95f8 upstream.
      
      Rather than accounting in bytes and multiplying (shifting), we can just
      account in bits and avoid the shift. The main motivation for this is
      there are other patches in flux that expand this code a bit, and
      avoiding the duplication of "* 8" everywhere makes things a bit clearer.
      
      Cc: stable@vger.kernel.org
      Fixes: 12e45a2a ("random: credit architectural init the exact amount")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • random: mark bootloader randomness code as __init · e59a120f
      Jason A. Donenfeld authored
      commit 39e0f991 upstream.
      
      add_bootloader_randomness() and the variables it touches are only used
      during __init and not after, so mark these as __init. At the same time,
      unexport this, since it's only called by other __init code that's
      built-in.
      
      Cc: stable@vger.kernel.org
      Fixes: 428826f5 ("fdt: add support for rng-seed")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>