1. 14 Oct, 2020 40 commits
    • xfrm: clone XFRMA_REPLAY_ESN_VAL in xfrm_do_migrate · c1becfeb
      Antony Antony authored
      [ Upstream commit 91a46c6d ]
      
      XFRMA_REPLAY_ESN_VAL was not cloned completely from the old state to
      the new one. Migrate this attribute during XFRMA_MSG_MIGRATE.
      
      v1->v2:
       - move curleft cloning to a separate patch
      
      Fixes: af2f464e ("xfrm: Assign esn pointers when cloning a state")
      Signed-off-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: clone XFRMA_SET_MARK in xfrm_do_migrate · 0bea401a
      Antony Antony authored
      [ Upstream commit 545e5c57 ]
      
      XFRMA_SET_MARK and XFRMA_SET_MARK_MASK were not cloned from the old
      state to the new one. Migrate these two attributes during
      XFRMA_MSG_MIGRATE.
      
      Fixes: 9b42c1f1 ("xfrm: Extend the output_mark to support input direction and masking.")
      Signed-off-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb() · f825fd53
      Lu Baolu authored
      [ Upstream commit 1a3f2fd7 ]
      
      Locking &iommu->lock without disabling IRQs causes lockdep warnings.
      
      [   12.703950] ========================================================
      [   12.703962] WARNING: possible irq lock inversion dependency detected
      [   12.703975] 5.9.0-rc6+ #659 Not tainted
      [   12.703983] --------------------------------------------------------
      [   12.703995] systemd-udevd/284 just changed the state of lock:
      [   12.704007] ffffffffbd6ff4d8 (device_domain_lock){..-.}-{2:2}, at:
                     iommu_flush_dev_iotlb.part.57+0x2e/0x90
      [   12.704031] but this lock took another, SOFTIRQ-unsafe lock in the past:
      [   12.704043]  (&iommu->lock){+.+.}-{2:2}
      [   12.704045]
      
                     and interrupts could create inverse lock ordering between
                     them.
      
      [   12.704073]
                     other info that might help us debug this:
      [   12.704085]  Possible interrupt unsafe locking scenario:
      
      [   12.704097]        CPU0                    CPU1
      [   12.704106]        ----                    ----
      [   12.704115]   lock(&iommu->lock);
      [   12.704123]                                local_irq_disable();
      [   12.704134]                                lock(device_domain_lock);
      [   12.704146]                                lock(&iommu->lock);
      [   12.704158]   <Interrupt>
      [   12.704164]     lock(device_domain_lock);
      [   12.704174]
                      *** DEADLOCK ***

      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20200927062428.13713-1-baolu.lu@linux.intel.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/amdgpu: prevent double kfree ttm->sg · bdffb36b
      Philip Yang authored
      [ Upstream commit 1d0e16ac ]
      
      Set ttm->sg to NULL after kfree, to avoid memory corruption backtrace:
      
      [  420.932812] kernel BUG at
      /build/linux-do9eLF/linux-4.15.0/mm/slub.c:295!
      [  420.934182] invalid opcode: 0000 [#1] SMP NOPTI
      [  420.935445] Modules linked in: xt_conntrack ipt_MASQUERADE
      [  420.951332] Hardware name: Dell Inc. PowerEdge R7525/0PYVT1, BIOS
      1.5.4 07/09/2020
      [  420.952887] RIP: 0010:__slab_free+0x180/0x2d0
      [  420.954419] RSP: 0018:ffffbe426291fa60 EFLAGS: 00010246
      [  420.955963] RAX: ffff9e29263e9c30 RBX: ffff9e29263e9c30 RCX:
      000000018100004b
      [  420.957512] RDX: ffff9e29263e9c30 RSI: fffff3d33e98fa40 RDI:
      ffff9e297e407a80
      [  420.959055] RBP: ffffbe426291fb00 R08: 0000000000000001 R09:
      ffffffffc0d39ade
      [  420.960587] R10: ffffbe426291fb20 R11: ffff9e49ffdd4000 R12:
      ffff9e297e407a80
      [  420.962105] R13: fffff3d33e98fa40 R14: ffff9e29263e9c30 R15:
      ffff9e2954464fd8
      [  420.963611] FS:  00007fa2ea097780(0000) GS:ffff9e297e840000(0000)
      knlGS:0000000000000000
      [  420.965144] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  420.966663] CR2: 00007f16bfffefb8 CR3: 0000001ff0c62000 CR4:
      0000000000340ee0
      [  420.968193] Call Trace:
      [  420.969703]  ? __page_cache_release+0x3c/0x220
      [  420.971294]  ? amdgpu_ttm_tt_unpopulate+0x5e/0x80 [amdgpu]
      [  420.972789]  kfree+0x168/0x180
      [  420.974353]  ? amdgpu_ttm_tt_set_user_pages+0x64/0xc0 [amdgpu]
      [  420.975850]  ? kfree+0x168/0x180
      [  420.977403]  amdgpu_ttm_tt_unpopulate+0x5e/0x80 [amdgpu]
      [  420.978888]  ttm_tt_unpopulate.part.10+0x53/0x60 [amdttm]
      [  420.980357]  ttm_tt_destroy.part.11+0x4f/0x60 [amdttm]
      [  420.981814]  ttm_tt_destroy+0x13/0x20 [amdttm]
      [  420.983273]  ttm_bo_cleanup_memtype_use+0x36/0x80 [amdttm]
      [  420.984725]  ttm_bo_release+0x1c9/0x360 [amdttm]
      [  420.986167]  amdttm_bo_put+0x24/0x30 [amdttm]
      [  420.987663]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
      [  420.989165]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x9ca/0xb10
      [amdgpu]
      [  420.990666]  kfd_ioctl_alloc_memory_of_gpu+0xef/0x2c0 [amdgpu]

      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
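
The idea of clearing a pointer right after freeing it can be illustrated in plain userspace C. This is only a sketch: `struct fake_tt`, `fake_tt_unpopulate()` and `fake_tt_teardown_twice()` are hypothetical names, and `free()` stands in for `kfree()`; it is not the amdgpu code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the object that owns the sg table. */
struct fake_tt {
    int *sg;
};

/* Release the table and clear the pointer so a repeated call is harmless. */
static void fake_tt_unpopulate(struct fake_tt *tt)
{
    free(tt->sg);   /* free(NULL) is a defined no-op in C */
    tt->sg = NULL;  /* without this line, a second call would free twice */
}

/* Runs the teardown twice and reports whether the pointer ended up NULL. */
static int fake_tt_teardown_twice(void)
{
    struct fake_tt tt = { .sg = malloc(sizeof(int)) };

    assert(tt.sg != NULL);
    fake_tt_unpopulate(&tt);
    fake_tt_unpopulate(&tt);
    return tt.sg == NULL;
}
```

Because the second call sees a NULL pointer, the repeated teardown path becomes safe without any extra bookkeeping.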
    • openvswitch: handle DNAT tuple collision · 4034664a
      Dumitru Ceara authored
      commit 8aa7b526 upstream.
      
      With multiple DNAT rules it's possible that after destination
      translation the resulting tuples collide.
      
      For example, two openvswitch flows:
      nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
      nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
      
      Assuming two TCP clients initiating the following connections:
      10.0.0.10:5000->10.0.0.10:10
      10.0.0.10:5000->10.0.0.20:10
      
      Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing
      nf_conntrack_confirm() to fail because of tuple collision.
      
      Netfilter handles this case by allocating a null binding for SNAT at
      egress by default.  Perform the same operation in openvswitch for DNAT
      if no explicit SNAT is requested by the user and allocate a null binding
      for SNAT for packets in the "original" direction.
      
      Reported-at: https://bugzilla.redhat.com/1877128
      Suggested-by: Florian Westphal <fw@strlen.de>
      Fixes: 05752523 ("openvswitch: Interface with NAT.")
      Signed-off-by: Dumitru Ceara <dceara@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
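
The collision described in the openvswitch commit can be reproduced on paper with a small userspace model. This is a sketch only: `struct fake_tuple`, `fake_dnat()` and the other names are hypothetical, and nothing here uses the conntrack API; it just applies the two example DNAT rules to the two example connections.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Packs a dotted-quad IPv4 address into one 32-bit value. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, simplified connection tuple. */
struct fake_tuple {
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
};

/* Rewrites only the destination, as a DNAT rule does. */
static struct fake_tuple fake_dnat(struct fake_tuple t, uint32_t ip,
                                   uint16_t port)
{
    t.dst_ip = ip;
    t.dst_port = port;
    return t;
}

static bool fake_tuple_equal(const struct fake_tuple *a,
                             const struct fake_tuple *b)
{
    return a->src_ip == b->src_ip && a->src_port == b->src_port &&
           a->dst_ip == b->dst_ip && a->dst_port == b->dst_port;
}

/* Applies nat(dst=20.0.0.1:20) to both example connections. */
static bool dnat_example_collides(void)
{
    struct fake_tuple c1 = { IP4(10, 0, 0, 10), 5000, IP4(10, 0, 0, 10), 10 };
    struct fake_tuple c2 = { IP4(10, 0, 0, 10), 5000, IP4(10, 0, 0, 20), 10 };
    struct fake_tuple t1 = fake_dnat(c1, IP4(20, 0, 0, 1), 20);
    struct fake_tuple t2 = fake_dnat(c2, IP4(20, 0, 0, 1), 20);

    assert(!fake_tuple_equal(&c1, &c2)); /* distinct before translation */
    return fake_tuple_equal(&t1, &t2);   /* identical afterwards */
}
```

The two connections differ only in their destination, so once DNAT rewrites the destination nothing distinguishes them; a null SNAT binding gives the stack a chance to pick a different source port for one of them.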
    • net: team: fix memory leak in __team_options_register · f89128ad
      Anant Thazhemadam authored
      commit 9a9e7749 upstream.
      
      The variable "i" isn't initialized back correctly after the first loop
      under the label inst_rollback gets executed.
      
      The value of "i" is assigned to be option_count - 1, and the ensuing
      loop (under alloc_rollback) begins by decrementing it (i--).
      Thus, the value of i when the loop begins execution will now become
      i = option_count - 2.
      
      Thus, when kfree(dst_opts[i]) is called in the second loop in this
      order, (i.e., inst_rollback followed by alloc_rollback),
      dst_opts[option_count - 2] is the first element freed, and
      dst_opts[option_count - 1] does not get freed, and thus, a memory
      leak is caused.
      
      This memory leak can be fixed, by assigning i = option_count (instead of
      option_count - 1).
      
      Fixes: 80f7c668 ("team: add support for per-port options")
      Reported-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com
      Tested-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com
      Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
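
The off-by-one in the rollback index is easy to check with a small userspace model. This is a sketch under assumptions: `rollback_from()` and `leaked_after_rollback()` are hypothetical helpers, `free()` stands in for `kfree()`, and the loop shape (`for (i--; i >= 0; i--)`) follows the description in the commit message rather than the driver source.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Frees dst[i - 1] down to dst[0], mirroring a rollback loop that
 * starts with "i--". Returns how many slots were freed.
 */
static int rollback_from(int **dst, int i)
{
    int freed = 0;

    for (i--; i >= 0; i--) {
        free(dst[i]);
        dst[i] = NULL;
        freed++;
    }
    return freed;
}

/* Allocates option_count slots, rolls back from start, counts leaks. */
static int leaked_after_rollback(int option_count, int start)
{
    int **dst = calloc(option_count, sizeof(*dst));
    int i, leaked = 0;

    assert(dst != NULL);
    for (i = 0; i < option_count; i++) {
        dst[i] = malloc(sizeof(int));
        assert(dst[i] != NULL);
    }

    rollback_from(dst, start);

    for (i = 0; i < option_count; i++) {
        if (dst[i] != NULL) {
            leaked++;
            free(dst[i]); /* clean up so the sketch itself does not leak */
        }
    }
    free(dst);
    return leaked;
}
```

With four options, starting the rollback at `option_count - 1` leaves one slot unfreed, while starting at `option_count` frees all of them.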
    • team: set dev->needed_headroom in team_setup_by_port() · 003269d8
      Eric Dumazet authored
      commit 89d01748 upstream.
      
      Some devices set needed_headroom. If we ignore it, we might
      end up crashing in various skb_push() for example in ipgre_header()
      since some layers assume enough headroom has been reserved.
      
      Fixes: 1d76efe1 ("team: add support for non-ethernet devices")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sctp: fix sctp_auth_init_hmacs() error path · fb3681c2
      Eric Dumazet authored
      commit d42ee76e upstream.
      
      After freeing ep->auth_hmacs we have to clear the pointer
      or risk use-after-free as reported by syzbot:
      
      BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline]
      BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
      BUG: KASAN: use-after-free in sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
      Read of size 8 at addr ffff8880a8ff52c0 by task syz-executor941/6874
      
      CPU: 0 PID: 6874 Comm: syz-executor941 Not tainted 5.9.0-rc8-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x198/0x1fd lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline]
       sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
       sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
       sctp_endpoint_destroy+0x95/0x240 net/sctp/endpointola.c:203
       sctp_endpoint_put net/sctp/endpointola.c:236 [inline]
       sctp_endpoint_free+0xd6/0x110 net/sctp/endpointola.c:183
       sctp_destroy_sock+0x9c/0x3c0 net/sctp/socket.c:4981
       sctp_v6_destroy_sock+0x11/0x20 net/sctp/socket.c:9415
       sk_common_release+0x64/0x390 net/core/sock.c:3254
       sctp_close+0x4ce/0x8b0 net/sctp/socket.c:1533
       inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
       inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:475
       __sock_release+0xcd/0x280 net/socket.c:596
       sock_close+0x18/0x20 net/socket.c:1277
       __fput+0x285/0x920 fs/file_table.c:281
       task_work_run+0xdd/0x190 kernel/task_work.c:141
       exit_task_work include/linux/task_work.h:25 [inline]
       do_exit+0xb7d/0x29f0 kernel/exit.c:806
       do_group_exit+0x125/0x310 kernel/exit.c:903
       __do_sys_exit_group kernel/exit.c:914 [inline]
       __se_sys_exit_group kernel/exit.c:912 [inline]
       __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:912
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x43f278
      Code: Bad RIP value.
      RSP: 002b:00007fffe0995c38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043f278
      RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
      RBP: 00000000004bf068 R08: 00000000000000e7 R09: ffffffffffffffd0
      R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000001
      R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 6874:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
       kmem_cache_alloc_trace+0x174/0x300 mm/slab.c:3554
       kmalloc include/linux/slab.h:554 [inline]
       kmalloc_array include/linux/slab.h:593 [inline]
       kcalloc include/linux/slab.h:605 [inline]
       sctp_auth_init_hmacs+0xdb/0x3b0 net/sctp/auth.c:464
       sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
       sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
       sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
       __sys_setsockopt+0x2db/0x610 net/socket.c:2132
       __do_sys_setsockopt net/socket.c:2143 [inline]
       __se_sys_setsockopt net/socket.c:2140 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 6874:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
       kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
       __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
       __cache_free mm/slab.c:3422 [inline]
       kfree+0x10e/0x2b0 mm/slab.c:3760
       sctp_auth_destroy_hmacs net/sctp/auth.c:511 [inline]
       sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
       sctp_auth_init_hmacs net/sctp/auth.c:496 [inline]
       sctp_auth_init_hmacs+0x2b7/0x3b0 net/sctp/auth.c:454
       sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
       sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
       sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
       __sys_setsockopt+0x2db/0x610 net/socket.c:2132
       __do_sys_setsockopt net/socket.c:2143 [inline]
       __se_sys_setsockopt net/socket.c:2140 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1f485649 ("[SCTP]: Implement SCTP-AUTH internals")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • i2c: owl: Clear NACK and BUS error bits · 040e3110
      Cristian Ciocaltea authored
      commit f5b3f433 upstream.
      
      When the NACK and BUS error bits are set by the hardware, the driver is
      responsible for clearing them by writing "1" into the corresponding
      status registers.
      
      Hence perform the necessary operations in owl_i2c_interrupt().
      
      Fixes: d211e62a ("i2c: Add Actions Semiconductor Owl family S900 I2C driver")
      Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • i2c: meson: fixup rate calculation with filter delay · abe997f6
      Nicolas Belin authored
      commit 1334d3b4 upstream.
      
      Apparently, 15 cycles of the peripheral clock are used by the controller
      for sampling and filtering. Because this was not known before, the rate
      calculation is slightly off.
      
      Clean up and fix the calculation taking this filtering delay into account.
      
      Fixes: 30021e37 ("i2c: add support for Amlogic Meson I2C controller")
      Signed-off-by: Nicolas Belin <nbelin@baylibre.com>
      Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • i2c: meson: fix clock setting overwrite · 6db69c39
      Jerome Brunet authored
      commit 28683e84 upstream.
      
      When the slave address is written in do_start(), SLAVE_ADDR is written
      completely. This may overwrite some setting related to the clock rate
      or signal filtering.
      
      Fix this by writing only the bits related to the slave address. To avoid
      causing unexpected changes, explicitly disable filtering and high/low
      clock mode, which may have been left over by the bootloader.
      
      Fixes: 30021e37 ("i2c: add support for Amlogic Meson I2C controller")
      Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
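
Writing only the address bits amounts to a read-modify-write with a mask. The following is a minimal sketch, assuming a made-up register layout: `FAKE_ADDR_MASK`, `fake_reg` and `fake_set_mask()` are hypothetical and do not reflect the real Meson register map or driver helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 8 bits hold the address, the rest other settings. */
#define FAKE_ADDR_MASK 0x000000ffu

static uint32_t fake_reg; /* stands in for a memory-mapped register */

/* Updates only the bits selected by mask and leaves the others untouched. */
static void fake_set_mask(uint32_t mask, uint32_t val)
{
    uint32_t data = fake_reg;

    data &= ~mask;
    data |= val & mask;
    fake_reg = data;
}

/* Seeds the register, writes a new address, and returns the result. */
static uint32_t write_addr_keep_settings(uint32_t initial, uint32_t addr)
{
    fake_reg = initial;
    fake_set_mask(FAKE_ADDR_MASK, addr);
    return fake_reg;
}
```

A plain store of the address would zero the upper bits; the masked update keeps whatever clock or filter configuration was already there.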
    • cifs: Fix incomplete memory allocation on setxattr path · 209549c1
      Vladimir Zapolskiy authored
      commit 64b7f674 upstream.
      
      On the setxattr() syscall path, due to an apparent typo, the size of a
      dynamically allocated memory chunk for storing a struct
      smb2_file_full_ea_info object is computed incorrectly: the first addend
      is the size of a pointer instead of the wanted object size.
      Coincidentally it makes no difference on 64-bit platforms, but on 32-bit
      targets the following memcpy() writes 4 bytes of data outside of the
      dynamically allocated memory.
      
        =============================================================================
        BUG kmalloc-16 (Not tainted): Redzone overwritten
        -----------------------------------------------------------------------------
      
        Disabling lock debugging due to kernel taint
        INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc
        INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201
        INFO: Object 0x6f171df3 @offset=352 fp=0x00000000
      
        Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
        Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69  ........snrub.fi
        Redzone 79e69a6f: 73 68 32 0a                                      sh2.
        Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
        CPU: 0 PID: 8196 Comm: attr Tainted: G    B             5.9.0-rc8+ #3
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
        Call Trace:
         dump_stack+0x54/0x6e
         print_trailer+0x12c/0x134
         check_bytes_and_report.cold+0x3e/0x69
         check_object+0x18c/0x250
         free_debug_processing+0xfe/0x230
         __slab_free+0x1c0/0x300
         kfree+0x1d3/0x220
         smb2_set_ea+0x27d/0x540
         cifs_xattr_set+0x57f/0x620
         __vfs_setxattr+0x4e/0x60
         __vfs_setxattr_noperm+0x4e/0x100
         __vfs_setxattr_locked+0xae/0xd0
         vfs_setxattr+0x4e/0xe0
         setxattr+0x12c/0x1a0
         path_setxattr+0xa4/0xc0
         __ia32_sys_lsetxattr+0x1d/0x20
         __do_fast_syscall_32+0x40/0x70
         do_fast_syscall_32+0x29/0x60
         do_SYSENTER_32+0x15/0x20
         entry_SYSENTER_32+0x9f/0xf2
      
      Fixes: 5517554e ("cifs: Add support for writing attributes on SMB2+")
      Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
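
The difference between `sizeof(ptr)` and `sizeof(*ptr)` is the whole bug, and it can be shown in a few lines. This is a sketch with assumptions: `struct fake_ea_info` is a hypothetical 8-byte header, and the `name_len + 1 + value_len` payload arithmetic is illustrative rather than copied from the CIFS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte header followed by name and value bytes. */
struct fake_ea_info {
    uint32_t next_entry_offset;
    uint8_t  flags;
    uint8_t  name_len;
    uint16_t value_len;
    char     data[];
};

/* Buggy form: sizeof(ea) measures the pointer, not the object. */
static size_t ea_len_buggy(size_t name_len, size_t value_len)
{
    struct fake_ea_info *ea = NULL;

    return sizeof(ea) + name_len + 1 + value_len;
}

/* Intended form: sizeof(*ea) measures the header itself. */
static size_t ea_len_fixed(size_t name_len, size_t value_len)
{
    struct fake_ea_info *ea = NULL;

    return sizeof(*ea) + name_len + 1 + value_len;
}

/* Bytes by which the needed size exceeds the buggy allocation size. */
static size_t ea_overrun(size_t name_len, size_t value_len)
{
    size_t good = ea_len_fixed(name_len, value_len);
    size_t bad = ea_len_buggy(name_len, value_len);

    return good > bad ? good - bad : 0;
}
```

Where pointers are 8 bytes the two sizes happen to match, which is why the bug stayed hidden; where pointers are 4 bytes the allocation comes up 4 bytes short.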
    • xfrmi: drop ignore_df check before updating pmtu · 0afdda28
      Sabrina Dubroca authored
      commit 45a36a18 upstream.
      
      xfrm interfaces currently test for !skb->ignore_df when deciding
      whether to update the pmtu on the skb's dst. Because of this, no pmtu
      exception is created when we do something like:
      
          ping -s 1438 <dest>
      
      By dropping this check, the pmtu exception will be created and the
      next ping attempt will work.
      
      Fixes: f203b76d ("xfrm: Add virtual xfrm interfaces")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage() · 49af88ac
      Coly Li authored
      commit 7d4194ab upstream.
      
      Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to
      send slab pages. But pages allocated by __get_free_pages() without
      __GFP_COMP, which also have a refcount of 0, are still sent by
      kernel_sendpage() to the remote end, which is problematic.
      
      The newly introduced helper sendpage_ok() checks both the PageSlab tag
      and the page_count counter, and returns true if the page being checked
      is OK to be sent by kernel_sendpage().
      
      This patch fixes the page checking issue of nvme_tcp_try_send_data()
      with sendpage_ok(). If sendpage_ok() returns true, send this page by
      kernel_sendpage(), otherwise use sock_no_sendpage() to handle this page.

      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Vlastimil Babka <vbabka@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: use sendpage_ok() to detect misused .sendpage · 15cac17d
      Coly Li authored
      commit cf83a17e upstream.
      
      commit a10674bf ("tcp: detecting the misuse of .sendpage for Slab
      objects") adds the checks for Slab pages, but pages that have no
      page_count are still missing from the check.
      
      The network layer's sendpage method is not designed to send page_count 0
      pages either, therefore both PageSlab() and page_count() should be
      checked for the page being sent. This is exactly what sendpage_ok()
      does.
      
      This patch uses sendpage_ok() in do_tcp_sendpages() to detect misused
      .sendpage, to make the code more robust.
      
      Fixes: a10674bf ("tcp: detecting the misuse of .sendpage for Slab objects")
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: introduce helper sendpage_ok() in include/linux/net.h · d23dd386
      Coly Li authored
      commit c381b079 upstream.
      
      The original problem was in the nvme-over-tcp code, which mistakenly uses
      kernel_sendpage() to send pages allocated by __get_free_pages() without
      the __GFP_COMP flag. Such pages don't have a refcount (page_count is 0) on
      tail pages, so sending them by kernel_sendpage() may trigger a kernel panic
      from a corrupted kernel heap, because these pages are incorrectly freed
      in the network stack as page_count 0 pages.
      
      This patch introduces a helper sendpage_ok(), which returns true if the
      page being checked
      - is not a slab page: PageSlab(page) is false.
      - has a page refcount: page_count(page) is not zero.
      
      All drivers that want to send a page to the remote end by kernel_sendpage()
      may use this helper to check whether the page is OK. If the helper does
      not return true, the driver should try another non-sendpage method (e.g.
      sock_no_sendpage()) to handle the page.

      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Vlastimil Babka <vbabka@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
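
The two conditions that sendpage_ok() is described as checking, and the fallback a caller takes when they fail, can be modeled in userspace. This is a sketch only: `struct fake_page`, `fake_sendpage_ok()` and `choose_send_path()` are hypothetical stand-ins, since the real helper operates on `struct page` inside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two page properties checked. */
struct fake_page {
    bool is_slab;   /* models PageSlab(page) */
    int  refcount;  /* models page_count(page) */
};

/* Mirrors the two conditions listed in the commit message. */
static bool fake_sendpage_ok(const struct fake_page *page)
{
    return !page->is_slab && page->refcount >= 1;
}

enum send_path { SEND_ZEROCOPY, SEND_COPY };

/* Picks the zero-copy path only for pages that pass the check. */
static enum send_path choose_send_path(const struct fake_page *page)
{
    return fake_sendpage_ok(page) ? SEND_ZEROCOPY : SEND_COPY;
}
```

Here `SEND_ZEROCOPY` plays the role of kernel_sendpage() and `SEND_COPY` the role of a copying fallback such as sock_no_sendpage(); a slab page or a page with a zero refcount always takes the copying path.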
    • mm/khugepaged: fix filemap page_to_pgoff(page) != offset · 5c62d335
      Hugh Dickins authored
      commit 033b5d77 upstream.
      
      There have been elusive reports of filemap_fault() hitting its
      VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
      with CONFIG_READ_ONLY_THP_FOR_FS=y.
      
      Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
      CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
      without NUMA reuses the same huge page after collapse_file() failed
      (whereas NUMA targets its allocation to the respective node each time).
      And most of us were usually testing with CONFIG_NUMA=y kernels.
      
      collapse_file(old start)
        new_page = khugepaged_alloc_page(hpage)
        __SetPageLocked(new_page)
        new_page->index = start // hpage->index=old offset
        new_page->mapping = mapping
        xas_store(&xas, new_page)
      
                                filemap_fault
                                  page = find_get_page(mapping, offset)
                                  // if offset falls inside hpage then
                                  // compound_head(page) == hpage
                                  lock_page_maybe_drop_mmap()
                                    __lock_page(page)
      
        // collapse fails
        xas_store(&xas, old page)
        new_page->mapping = NULL
        unlock_page(new_page)
      
      collapse_file(new start)
        new_page = khugepaged_alloc_page(hpage)
        __SetPageLocked(new_page)
        new_page->index = start // hpage->index=new offset
        new_page->mapping = mapping // mapping becomes valid again
      
                                  // since compound_head(page) == hpage
                                  // page_to_pgoff(page) got changed
                                  VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
      
      An initial patch replaced __SetPageLocked() by lock_page(), which did
      fix the race which Suren illustrates above.  But testing showed that it's
      not good enough: if the racing task's __lock_page() gets delayed long
      after its find_get_page(), then it may follow collapse_file(new start)'s
      successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.
      
      It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
      check and retry (as is done for mapping), with similar relaxations in
      find_lock_entry() and pagecache_get_page(): but it's not obvious what
      else might get caught out; and khugepaged non-NUMA appears to be unique
      in exposing a page to page cache, then revoking, without going through
      a full cycle of freeing before reuse.
      
      Instead, non-NUMA khugepaged_prealloc_page() releases the old page
      if anyone else has a reference to it (1% of cases when I tested).
      
      Although never reported on huge tmpfs, I believe its find_lock_entry()
      has been at similar risk; but huge tmpfs does not rely on khugepaged
      for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.

      Reported-by: Denis Lisov <dennis.lissov@gmail.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
      Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
      Reported-by: Qian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
      Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
      Fixes: 87c460a0 ("mm/khugepaged: collapse_shmem() without freezing new_page")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # v4.9+
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • macsec: avoid use-after-free in macsec_handle_frame() · 1317469f
      Eric Dumazet authored
      commit c7cc9200 upstream.
      
      De-referencing skb after call to gro_cells_receive() is not allowed.
      We need to fetch skb->len earlier.
      
      Fixes: 5491e7c6 ("macsec: enable GRO and RPS on macsec devices")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
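
The rule "read what you need before handing the buffer off" can be sketched in userspace. The names below (`struct fake_skb`, `fake_receive()`, `receive_and_count()`) are hypothetical; `fake_receive()` merely models a consumer that takes ownership and may free the buffer, as gro_cells_receive() is described as doing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical packet buffer; only the length matters here. */
struct fake_skb {
    unsigned int len;
};

/* Hypothetical consumer: takes ownership and frees the buffer. */
static int fake_receive(struct fake_skb *skb)
{
    free(skb);
    return 0; /* 0 models a successful hand-off */
}

/* Reads the length first, then hands the buffer off and updates stats. */
static unsigned long receive_and_count(struct fake_skb *skb,
                                       unsigned long rx_bytes)
{
    unsigned int len = skb->len; /* fetch before ownership is lost */

    if (fake_receive(skb) == 0)
        rx_bytes += len;         /* skb must not be touched here */
    return rx_bytes;
}

/* Builds a 1500-byte packet and accounts it on top of 100 earlier bytes. */
static unsigned long receive_demo(void)
{
    struct fake_skb *skb = malloc(sizeof(*skb));

    assert(skb != NULL);
    skb->len = 1500;
    return receive_and_count(skb, 100);
}
```

Reading `skb->len` after the hand-off would dereference freed memory; caching it in a local beforehand keeps the statistics update safe.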
    • nvme-core: put ctrl ref when module ref get fail · 20f96fee
      Chaitanya Kulkarni authored
      commit 4bab6909 upstream.
      
      When try_module_get() fails in nvme_dev_open(), the function returns
      without releasing the ctrl reference which was taken earlier.
      
      Put the ctrl reference, which is taken before calling
      try_module_get(), in the error return code path.
      
      Fixes: 52a3974f "nvme-core: get/put ctrl and transport module in nvme_dev_open/release()"
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Qu Wenruo's avatar
      btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation · c0f3c538
      Qu Wenruo authored
      commit 6d4572a9 upstream.
      
      [BUG]
      When the data space is exhausted, even if the inode has NOCOW attribute,
      we will still refuse to truncate unaligned range due to ENOSPC.
      
      The following script can reproduce it pretty easily:
        #!/bin/bash
      
        dev=/dev/test/test
        mnt=/mnt/btrfs
      
        umount $dev &> /dev/null
        umount $mnt &> /dev/null
      
        mkfs.btrfs -f $dev -b 1G
        mount -o nospace_cache $dev $mnt
        touch $mnt/foobar
        chattr +C $mnt/foobar
      
        xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
        xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
        sync
      
        xfs_io -c "fpunch 0 2k" $mnt/foobar
        umount $mnt
      
      Currently this will fail at the fpunch part.
      
      [CAUSE]
      Because btrfs_truncate_block() always reserves space without checking
      the NOCOW attribute.
      
      Since the writeback path follows NOCOW bit, we only need to bother the
      space reservation code in btrfs_truncate_block().
      
      [FIX]
      Make btrfs_truncate_block() follow btrfs_buffered_write() to try to
      reserve data space first, and fall back to NOCOW check only when we
      don't have enough space.
      
      Such always-try-reserve is an optimization introduced in
      btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
      
      This patch will export check_can_nocow() as btrfs_check_can_nocow(), and
      use it in btrfs_truncate_block() to fix the problem.
      Reported-by: Martin Doucha <martin.doucha@suse.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0f3c538
    • Filipe Manana's avatar
      btrfs: fix RWF_NOWAIT write not failling when we need to cow · e531fd7f
      Filipe Manana authored
      commit 260a6339 upstream.
      
      If we attempt to do a RWF_NOWAIT write against a file range for which we
      can only do NOCOW for a part of it, due to the existence of holes or
      shared extents for example, we proceed with the write as if it were
      possible to NOCOW the whole range.
      
      Example:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ touch /mnt/bar
        $ chattr +C /mnt/bar
      
        $ xfs_io -d -c "pwrite -S 0xab -b 256K 0 256K" /mnt/bar
        wrote 262144/262144 bytes at offset 0
        256 KiB, 1 ops; 0.0003 sec (694.444 MiB/sec and 2777.7778 ops/sec)
      
        $ xfs_io -c "fpunch 64K 64K" /mnt/bar
        $ sync
      
        $ xfs_io -d -c "pwrite -N -V 1 -b 128K -S 0xfe 0 128K" /mnt/bar
        wrote 131072/131072 bytes at offset 0
        128 KiB, 1 ops; 0.0007 sec (160.051 MiB/sec and 1280.4097 ops/sec)
      
      This last write should fail with -EAGAIN since the file range from 64K to
      128K is a hole. On xfs it fails, as expected, but on ext4 it currently
      succeeds because apparently it is expensive to check if there are extents
      allocated for the whole range, but I'll check with the ext4 people.
      
      Fix the issue by checking if check_can_nocow() returns a number of
      NOCOW'able bytes smaller than the requested number of bytes, and if it
      does, return -EAGAIN.
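
      The check described above can be sketched as follows. The helper
      name nowait_write_check() and its parameters are hypothetical; the
      sketch only models the comparison, not the btrfs write path.

```c
#include <errno.h>
#include <stddef.h>

/* nocow_bytes: how many bytes from the start of the range can be written
 * without COW. requested_bytes: size of the RWF_NOWAIT write. */
static int nowait_write_check(size_t nocow_bytes, size_t requested_bytes)
{
    if (nocow_bytes < requested_bytes)
        return -EAGAIN;   /* part of the range would need COW */
    return 0;             /* the whole range can be written NOCOW */
}
```

      In the example above, only the first 64K can be written NOCOW
      before the hole, so a 128K request is rejected with -EAGAIN.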
      
      Fixes: edf064e7 ("btrfs: nowait aio support")
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e531fd7f
    • Qu Wenruo's avatar
      btrfs: Ensure we trim ranges across block group boundary · 1f90600e
      Qu Wenruo authored
      commit 6b7faadd upstream.
      
      [BUG]
      When deleting large files (which cross block group boundary) with
      discard mount option, we find some btrfs_discard_extent() calls only
      trimmed part of its space, not the whole range:
      
        btrfs_discard_extent: type=0x1 start=19626196992 len=2144530432 trimmed=1073741824 ratio=50%
      
      type:		bbio->map_type, in above case, it's SINGLE DATA.
      start:		Logical address of this trim
      len:		Logical length of this trim
      trimmed:	Physically trimmed bytes
      ratio:		trimmed / len
      
      Thus leaving some unused space not discarded.
      
      [CAUSE]
      When discard mount option is specified, after a transaction is fully
      committed (super block written to disk), we begin to cleanup pinned
      extents in the following call chain:
      
      btrfs_commit_transaction()
      |- btrfs_finish_extent_commit()
         |- find_first_extent_bit(unpin, 0, &start, &end, EXTENT_DIRTY);
         |- btrfs_discard_extent()
      
      However, pinned extents are recorded in an extent_io_tree, which can
      merge adjacent extent states.
      
      When a large file gets deleted and it has adjacent file extents across
      block group boundary, we will get a large merged range like this:
      
            |<---    BG1    --->|<---      BG2     --->|
            |//////|<--   Range to discard   --->|/////|
      
      To discard that range, we have the following calls:
      
        btrfs_discard_extent()
        |- btrfs_map_block()
        |  The returned bbio will end at BG1's end, as btrfs_map_block()
        |  never returns a result across a block group boundary.
        |- btrfs_issue_discard()
           Issue discard for each stripe.
      
      So we will only discard the range in BG1, not the remaining part in BG2.
      
      Furthermore, this bug is not reliably observable: in the above case, if
      there is no other extent in BG2, BG2 will be empty and btrfs will trim
      all space of BG2, covering up the bug.
      
      [FIX]
      - Allow __btrfs_map_block_for_discard() to modify @length parameter
        btrfs_map_block() uses its @length parameter to notify the caller how
        many bytes are mapped in the current call.
        With __btrfs_map_block_for_discard() also modifying the @length,
        btrfs_discard_extent() now understands when to do extra trim.

      - Call btrfs_map_block() in a loop until we hit the range end
        Since we now know how many bytes are mapped each time, we can iterate
        through each block group boundary and issue correct trim for each
        range.
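
      The two steps above can be sketched in userspace C. This is an
      illustration under simplifying assumptions, not the btrfs code:
      BG_SIZE, map_for_discard() and discard_range() are hypothetical,
      and block groups are assumed to be fixed-size and aligned.

```c
#include <stdint.h>

/* Hypothetical block group size for the sketch (1 GiB). */
#define BG_SIZE (1024ULL * 1024 * 1024)

/* Hypothetical mapper: clamps *length so the mapped range never crosses a
 * block group boundary, and reports the clamped value back to the caller. */
static void map_for_discard(uint64_t logical, uint64_t *length)
{
    uint64_t bg_end = (logical / BG_SIZE + 1) * BG_SIZE;

    if (*length > bg_end - logical)
        *length = bg_end - logical;
}

/* Walk the whole range, one mapping per iteration. Returns the number of
 * bytes covered; a real implementation would issue a discard per mapping. */
static uint64_t discard_range(uint64_t start, uint64_t len)
{
    uint64_t cur = start;
    uint64_t end = start + len;
    uint64_t covered = 0;

    while (cur < end) {
        uint64_t mapped = end - cur;

        map_for_discard(cur, &mapped);
        covered += mapped;
        cur += mapped;
    }
    return covered;
}
```

      Because the mapper reports how much it actually mapped, the loop
      can resume at the next block group instead of stopping at the
      first boundary.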
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Tested-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f90600e
    • Qu Wenruo's avatar
      btrfs: volumes: Use more straightforward way to calculate map length · 6a0f5da2
      Qu Wenruo authored
      commit 2d974619 upstream.
      
      The old code goes:
      
       	offset = logical - em->start;
      	length = min_t(u64, em->len - offset, length);
      
      Where @length calculation is dependent on offset, it can take reader
      several more seconds to find it's just the same code as:
      
       	offset = logical - em->start;
      	length = min_t(u64, em->start + em->len - logical, length);
      
      Use the above code to make the length calculation independent of other
      variables, thus slightly increasing readability.
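
      The equivalence of the two forms can be checked with a small
      standalone sketch. struct extent_map_sketch and min_u64() are
      stand-ins introduced here for the kernel's extent map and min_t().

```c
#include <stdint.h>

/* Minimal stand-in for the extent map fields used above. */
struct extent_map_sketch {
    uint64_t start;
    uint64_t len;
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Old form: the length depends on the previously computed offset. */
static uint64_t map_length_old(const struct extent_map_sketch *em,
                               uint64_t logical, uint64_t length)
{
    uint64_t offset = logical - em->start;

    return min_u64(em->len - offset, length);
}

/* New form: the length is expressed directly from em and logical. */
static uint64_t map_length_new(const struct extent_map_sketch *em,
                               uint64_t logical, uint64_t length)
{
    return min_u64(em->start + em->len - logical, length);
}
```

      Substituting offset = logical - em->start into em->len - offset
      gives em->start + em->len - logical, so both functions return the
      same value for any logical address inside the extent map.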
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a0f5da2
    • Filipe Manana's avatar
      Btrfs: send, fix emission of invalid clone operations within the same file · 5aefd1fa
      Filipe Manana authored
      commit 9722b101 upstream.
      
      When doing an incremental send and a file has extents shared with itself
      at different file offsets, it's possible for send to emit clone operations
      that will fail at the destination because the source range goes beyond the
      file's current size. This happens when the file size has increased in the
      send snapshot, there is a hole between the shared extents and both shared
      extents are at file offsets which are greater than the file's size in the
      parent snapshot.
      
      Example:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt/sdb
      
        $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar
        $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base
        $ btrfs send -f /tmp/1.snap /mnt/sdb/base
      
        # Create a 320K extent at file offset 512K.
        $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar
        $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar
        $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar
        $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar
        $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar
      
        # Clone part of that 320K extent into a lower file offset (192K).
        # This file offset is greater than the file's size in the parent
        # snapshot (64K). Also the clone range is a bit behind the offset of
        # the 320K extent so that we leave a hole between the shared extents.
        $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar
      
        $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr
        $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt/sdc
      
        $ btrfs receive -f /tmp/1.snap /mnt/sdc
        $ btrfs receive -f /tmp/2.snap /mnt/sdc
        ERROR: failed to clone extents to foobar: Invalid argument
      
      The problem is that after processing the extent at file offset 256K, which
      refers to the first 128K of the 320K extent created by the buffered write
      operations, we have 'cur_inode_next_write_offset' set to 384K, which
      corresponds to the end offset of the partially shared extent (256K + 128K)
      and to the current file size in the receiver. Then when we process the
      extent at offset 512K, we do extent backreference iteration to figure out
      if we can clone the extent from some other inode or from the same inode,
      and we consider the extent at offset 256K of the same inode as a valid
      source for a clone operation, which is not correct because at that point
      the current file size in the receiver is 384K, which corresponds to the
      end of last processed extent (at file offset 256K), so using a clone
      source range from 256K to 256K + 320K is invalid because that goes past
      the current size of the file (384K) - this makes the receiver get an
      -EINVAL error when attempting the clone operation.
      
      So fix this by excluding clone sources that have a range that goes beyond
      the current file size in the receiver when iterating extent backreferences.
      
      A test case for fstests follows soon.
      
      Fixes: 11f2069c ("Btrfs: send, allow clone operations within the same file")
      CC: stable@vger.kernel.org # 5.5+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5aefd1fa
    • Filipe Manana's avatar
      Btrfs: send, allow clone operations within the same file · 19d84126
      Filipe Manana authored
      commit 11f2069c upstream.
      
      For send we currently skip clone operations when the source and
      destination files are the same. This is so because clone didn't support
      this case in its early days, but support for it was added back in May
      2013 by commit a96fbc72 ("Btrfs: allow file data clone within a
      file"). This change adds support for it.
      
      Example:
      
        $ mkfs.btrfs -f /dev/sdd
        $ mount /dev/sdd /mnt/sdd
      
        $ xfs_io -f -c "pwrite -S 0xab -b 64K 0 64K" /mnt/sdd/foobar
        $ xfs_io -c "reflink /mnt/sdd/foobar 0 64K 64K" /mnt/sdd/foobar
      
        $ btrfs subvolume snapshot -r /mnt/sdd /mnt/sdd/snap
      
        $ mkfs.btrfs -f /dev/sde
        $ mount /dev/sde /mnt/sde
      
        $ btrfs send /mnt/sdd/snap | btrfs receive /mnt/sde
      
      Without this change file foobar at the destination has a single 128Kb
      extent:
      
        $ filefrag -v /mnt/sde/snap/foobar
        Filesystem type is: 9123683e
        File size of /mnt/sde/snap/foobar is 131072 (32 blocks of 4096 bytes)
         ext:     logical_offset:        physical_offset: length:   expected: flags:
           0:        0..      31:          0..        31:     32:             last,unknown_loc,delalloc,eof
        /mnt/sde/snap/foobar: 1 extent found
      
      With this we get a single 64Kb extent that is shared at file offsets 0
      and 64K, just like in the source filesystem:
      
        $ filefrag -v /mnt/sde/snap/foobar
        Filesystem type is: 9123683e
        File size of /mnt/sde/snap/foobar is 131072 (32 blocks of 4096 bytes)
         ext:     logical_offset:        physical_offset: length:   expected: flags:
           0:        0..      15:       3328..      3343:     16:             shared
           1:       16..      31:       3328..      3343:     16:       3344: last,shared,eof
        /mnt/sde/snap/foobar: 2 extents found
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      19d84126
    • Dinh Nguyen's avatar
      arm64: dts: stratix10: add status to qspi dts node · f02dc39b
      Dinh Nguyen authored
      commit 263a0269 upstream.
      
      Add status = "okay" to QSPI node.
      
      Fixes: 0cb140d0 ("arm64: dts: stratix10: Add QSPI support for Stratix10")
      Cc: linux-stable <stable@vger.kernel.org> # >= v5.6
      Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
      [iwamatsu: Drop arch/arm64/boot/dts/altera/socfpga_stratix10_socdk_nand.dts]
      Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f02dc39b
    • Jean Delvare's avatar
      i2c: i801: Exclude device from suspend direct complete optimization · e8e1d16e
      Jean Delvare authored
      commit 845b8912 upstream.
      
      By default, PCI drivers with runtime PM enabled will skip the calls
      to suspend and resume on system PM. For this driver, we don't want
      that, as we need to perform additional steps for system PM to work
      properly on all systems. So instruct the PM core to not skip these
      calls.
      
      Fixes: a9c8088c ("i2c: i801: Don't restore config registers on runtime PM")
      Reported-by: Volker Rümelin <volker.ruemelin@googlemail.com>
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      [iwamatsu: Use DPM_FLAG_NEVER_SKIP instead of DPM_FLAG_NO_DIRECT_COMPLETE]
      Signed-off-by: Nobuhiro Iwamatsu (CIP) <nobuhiro1.iwamatsu@toshiba.co.jp>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8e1d16e
    • Tommi Rantala's avatar
      perf top: Fix stdio interface input handling with glibc 2.28+ · 2118c7ba
      Tommi Rantala authored
      commit 29b4f5f1 upstream.
      
      Since glibc 2.28 when running 'perf top --stdio', input handling no
      longer works, but hitting any key always just prints the "Mapped keys"
      help text.
      
      To fix it, call clearerr() in the display_thread() loop to clear any EOF
      sticky errors, as instructed in the glibc NEWS file
      (https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS):
      
       * All stdio functions now treat end-of-file as a sticky condition.  If you
         read from a file until EOF, and then the file is enlarged by another
         process, you must call clearerr or another function with the same effect
         (e.g. fseek, rewind) before you can read the additional data.  This
         corrects a longstanding C99 conformance bug.  It is most likely to affect
         programs that use stdio to read interactive input from a terminal.
         (Bug #1190.)
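
      The sticky end-of-file behaviour quoted above can be demonstrated
      with standard C stdio. This is a sketch, not the perf change
      itself: read_char_retry() is a hypothetical helper that clears the
      EOF indicator before retrying a read.

```c
#include <stdio.h>

/* Read one character. If a previous read hit end-of-file, clear the sticky
 * EOF indicator first so that newly appended data can be seen. */
static int read_char_retry(FILE *fp)
{
    if (feof(fp))
        clearerr(fp);
    return fgetc(fp);
}
```

      A caller that polls an input stream in a loop can use such a
      helper so that one EOF does not make every later read fail.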
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200305083714.9381-2-tommi.t.rantala@nokia.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2118c7ba
    • Tommi Rantala's avatar
      perf test session topology: Fix data path · 2499c151
      Tommi Rantala authored
      commit dbd660e6 upstream.
      
      Commit 2d4f2799 ("perf data: Add global path holder") missed path
      conversion in tests/topology.c, causing the "Session topology" testcase
      to "hang" (waits forever for input from stdin) when doing "ssh $VM perf
      test".
      
      Can be reproduced by running "cat | perf test topo", and crashed by
      replacing cat with true:
      
        $ true | perf test -v topo
        40: Session topology                                      :
        --- start ---
        test child forked, pid 3638
        templ file: /tmp/perf-test-QPvAch
        incompatible file format
        incompatible file format (rerun with -v to learn more)
        free(): invalid pointer
        test child interrupted
        ---- end ----
        Session topology: FAILED!
      
      Committer testing:
      
      Reproduced the above result before the patch and after it is back
      working:
      
        # true | perf test -v topo
        41: Session topology                                      :
        --- start ---
        test child forked, pid 19374
        templ file: /tmp/perf-test-YOTEQg
        CPU 0, core 0, socket 0
        CPU 1, core 1, socket 0
        CPU 2, core 2, socket 0
        CPU 3, core 3, socket 0
        CPU 4, core 0, socket 0
        CPU 5, core 1, socket 0
        CPU 6, core 2, socket 0
        CPU 7, core 3, socket 0
        test child finished with 0
        ---- end ----
        Session topology: Ok
        #
      
      Fixes: 2d4f2799 ("perf data: Add global path holder")
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200423115341.562782-1-tommi.t.rantala@nokia.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2499c151
    • Tetsuo Handa's avatar
      driver core: Fix probe_count imbalance in really_probe() · 7c1847aa
      Tetsuo Handa authored
      commit b292b50b upstream.
      
      syzbot is reporting hung task in wait_for_device_probe() [1]. At least,
      we always need to decrement probe_count if we incremented probe_count in
      really_probe().
      
      However, since I can't find "Resources present before probing" message in
      the console log, both "this message simply flowed off" and "syzbot is not
      hitting this path" will be possible. Therefore, while we are at it, let's
      also prepare for concurrent wait_for_device_probe() calls by replacing
      wake_up() with wake_up_all().
      
      [1] https://syzkaller.appspot.com/bug?id=25c833f1983c9c1d512f4ff860dd0d7f5a2e2c0f
      
      Reported-by: syzbot <syzbot+805f5f6ae37411f15b64@syzkaller.appspotmail.com>
      Fixes: 7c35e699 ("driver core: Print device when resources present in really_probe()")
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: stable <stable@kernel.org>
      Link: https://lore.kernel.org/r/20200713021254.3444-1-penguin-kernel@I-love.SAKURA.ne.jp
      [iwamatsu: Drop patch for deferred_probe_timeout_work_func()]
      Signed-off-by: Nobuhiro Iwamatsu (CIP) <nobuhiro1.iwamatsu@toshiba.co.jp>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c1847aa
    • Aaron Ma's avatar
      platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse · 3fd2647f
      Aaron Ma authored
      commit 720ef73d upstream.
      
      Evaluating ACPI _BCL could fail, in which case the ACPI buffer size will
      be set to 0. When this ACPI buffer is reused, AE_BUFFER_OVERFLOW will be
      triggered.
      
      Re-initializing the buffer size lets the ACPI evaluation succeed.
      
      Fixes: 46445b6b ("thinkpad-acpi: fix handle locate for video and query of _BCL")
      Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3fd2647f
    • Hans de Goede's avatar
      platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting · da4cdc87
      Hans de Goede authored
      commit 8169bd3e upstream.
      
      2 recent commits:
      cfae58ed ("platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE
      on the 9 / "Laptop" chasis-type")
      1fac39fd ("platform/x86: intel-vbtn: Also handle tablet-mode switch on
      "Detachable" and "Portable" chassis-types")
      
      Enabled reporting of SW_TABLET_MODE on more devices since the vbtn ACPI
      interface is used by the firmware on some of those devices to report this.
      
      Testing has shown that unconditionally enabling SW_TABLET_MODE reporting
      on all devices with a chassis type of 8 ("Portable") or 10 ("Notebook")
      which support the VGBS method is a very bad idea.
      
      Many of these devices are normal laptops (non 2-in-1) models with a VGBS
      which always returns 0, which we translate to SW_TABLET_MODE=1. This in
      turn causes userspace (libinput) to suppress events from the builtin
      keyboard and touchpad, making the laptop essentially unusable.
      
      Since wrongly reporting SW_TABLET_MODE=1 in combination with libinput
      leads to a non-usable system, whereas OTOH many people will not even
      notice when SW_TABLET_MODE is not being reported, this commit changes
      intel_vbtn_has_switches() to use a DMI based allow-list.
      
      The new DMI based allow-list matches on the 31 ("Convertible") and
      32 ("Detachable") chassis-types, as these clearly are 2-in-1s and
      so far if they support the intel-vbtn ACPI interface they all have
      properly working SW_TABLET_MODE reporting.
      
      Besides these 2 generic matches, it also contains model specific matches
      for 2-in-1 models which use a different chassis-type and which are known
      to have properly working SW_TABLET_MODE reporting.
      
      This has been tested on the following 2-in-1 devices:
      
      Dell Venue 11 Pro 7130 vPro
      HP Pavilion X2 10-p002nd
      HP Stream x360 Convertible PC 11
      Medion E1239T
      
      Fixes: cfae58ed ("platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type")
      BugLink: https://forum.manjaro.org/t/keyboard-and-touchpad-only-work-on-kernel-5-6/22668
      BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1175599
      
      
      Cc: Barnabás Pőcze <pobrn@protonmail.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      da4cdc87
    • Tony Ambardar's avatar
      bpf: Prevent .BTF section elimination · 6440fb9b
      Tony Ambardar authored
      commit 65c20439 upstream.
      
      Systems with memory or disk constraints often reduce the kernel footprint
      by configuring LD_DEAD_CODE_DATA_ELIMINATION. However, this can result in
      removal of any BTF information.
      
      Use the KEEP() macro to preserve the BTF data as done with other important
      sections, while still allowing for smaller kernels.
      
      Fixes: 90ceddcb ("bpf: Support llvm-objcopy for vmlinux BTF")
      Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/a635b5d3e2da044e7b51ec1315e8910fbce0083f.1600417359.git.Tony.Ambardar@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6440fb9b
    • Tony Ambardar's avatar
      bpf: Fix sysfs export of empty BTF section · 67a57230
      Tony Ambardar authored
      commit e23bb04b upstream.
      
      If BTF data is missing or removed from the ELF section it is still exported
      via sysfs as a zero-length file:
      
        root@OpenWrt:/# ls -l /sys/kernel/btf/vmlinux
        -r--r--r--    1 root    root    0 Jul 18 02:59 /sys/kernel/btf/vmlinux
      
      Moreover, reads from this file succeed and leak kernel data:
      
        root@OpenWrt:/# hexdump -C /sys/kernel/btf/vmlinux|head -10
        000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
        *
        000cc0 00 00 00 00 00 00 00 00 00 00 00 00 80 83 b0 80 |................|
        000cd0 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
        000ce0 00 00 00 00 00 00 00 00 00 00 00 00 57 ac 6e 9d |............W.n.|
        000cf0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
        *
        002650 00 00 00 00 00 00 00 10 00 00 00 01 00 00 00 01 |................|
        002660 80 82 9a c4 80 85 97 80 81 a9 51 68 00 00 00 02 |..........Qh....|
        002670 80 25 44 dc 80 85 97 80 81 a9 50 24 81 ab c4 60 |.%D.......P$...`|
      
      This situation was first observed with kernel 5.4.x, cross-compiled for a
      MIPS target system. Fix by adding a sanity-check for export of zero-length
      data sections.
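
      A rough sketch of such a sanity check, assuming the section is
      delimited by a start and a stop pointer. btf_export_size() is a
      hypothetical helper and does not reproduce the sysfs code.

```c
#include <stddef.h>

/* Returns 1 and stores the section size if there is data to export,
 * or 0 if the section is empty and no file should be created. */
static int btf_export_size(const char *start, const char *stop, size_t *size_out)
{
    size_t size = (size_t)(stop - start);

    if (size == 0)
        return 0;      /* empty section: skip the export entirely */

    *size_out = size;
    return 1;
}
```

      Skipping the export for an empty section means no zero-length
      file is created for readers to open.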
      
      Fixes: 341dfcf8 ("btf: expose BTF info through sysfs")
      Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/b38db205a66238f70823039a8c531535864eaac5.1600417359.git.Tony.Ambardar@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      67a57230
    • Tom Rix's avatar
      platform/x86: thinkpad_acpi: initialize tp_nvram_state variable · 9bd694cc
      Tom Rix authored
      commit 5f38b06d upstream.
      
      clang static analysis flags this representative problem
      thinkpad_acpi.c:2523:7: warning: Branch condition evaluates
        to a garbage value
                      if (!oldn->mute ||
                          ^~~~~~~~~~~
      
      In hotkey_kthread() mute is conditionally set by hotkey_read_nvram()
      but unconditionally checked by hotkey_compare_and_issue_event().
      So the tp_nvram_state variable s[2] needs to be initialized.
      
      Fixes: 01e88f25 ("ACPI: thinkpad-acpi: add CMOS NVRAM polling for hot keys (v9)")
      Signed-off-by: Tom Rix <trix@redhat.com>
      Reviewed-by: Hans de Goede <hdegoede@redhat.com>
      Acked-by: mark gross <mgross@linux.intel.com>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9bd694cc
    • Hans de Goede's avatar
      platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360 · d101961c
      Hans de Goede authored
      commit d8233468 upstream.
      
      Commit cfae58ed ("platform/x86: intel-vbtn: Only blacklist
      SW_TABLET_MODE on the 9 / "Laptop" chasis-type") restored SW_TABLET_MODE
      reporting on the HP stream x360 11 series on which it was previously broken
      by commit de9647ef ("platform/x86: intel-vbtn: Only activate tablet
      mode switch on 2-in-1's").
      
      It turns out that enabling SW_TABLET_MODE reporting on devices with a
      chassis-type of 10 ("Notebook") causes SW_TABLET_MODE to always report 1
      at boot on the HP Pavilion 11 x360, which causes libinput to disable the
      kbd and touchpad.
      
      The HP Pavilion 11 x360's ACPI VGBS method sets bit 4 instead of bit 6 when
      NOT in tablet mode at boot. Inspecting all the DSDTs in my DSDT collection
      shows only one other model, the Medion E1239T ever setting bit 4 and it
      always sets this together with bit 6.
      
      So let's treat bit 4 as a second bit which, when set, indicates the
      device not being in tablet-mode, as we already do for bit 6.
      
      While at it also prefix all VGBS constant defines with "VGBS_".
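      
      A minimal sketch of the bit test described above, assuming bits are
      numbered from 0 so that bit 4 is 0x10 and bit 6 is 0x40. The macro and
      function names are illustrative and are not claimed to match the
      driver's actual defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names only; either bit, when set, means "not in tablet mode". */
#define VGBS_NOT_TABLET_BIT4_SKETCH (1u << 4)
#define VGBS_NOT_TABLET_BIT6_SKETCH (1u << 6)
#define VGBS_NOT_TABLET_MASK_SKETCH \
    (VGBS_NOT_TABLET_BIT4_SKETCH | VGBS_NOT_TABLET_BIT6_SKETCH)

/* Tablet mode is reported only when neither "not tablet" bit is set. */
bool vgbs_in_tablet_mode_sketch(uint32_t vgbs)
{
    return !(vgbs & VGBS_NOT_TABLET_MASK_SKETCH);
}
```

      A value with only bit 4 set (the HP Pavilion 11 x360 case) and a value
      with bits 4 and 6 both set (the Medion E1239T case) both come out as
      "not in tablet mode".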
      
      Fixes: cfae58ed ("platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type")
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Acked-by: Mark Gross <mgross@linux.intel.com>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d101961c
    • Platform: OLPC: Fix memleak in olpc_ec_probe · 22932723
      Dinghao Liu authored
      commit 4fd9ac6b upstream.
      
      When devm_regulator_register() fails, ec should be
      freed just like when olpc_ec_cmd() fails.
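      
      The shape of such a fix can be sketched in userspace C with a single
      shared error label. Everything here is hypothetical (the struct, the
      failure flags standing in for olpc_ec_cmd() and
      devm_regulator_register(), and the allocation counter); it only shows
      both failure paths releasing the same allocation.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical private data; the counter makes leaks observable. */
struct ec_priv_sketch { int version; };
int live_allocs_sketch;

/*
 * Returns 0 and hands ownership to *out on success. On any failure the
 * allocation is freed and *out is set to NULL.
 */
int probe_sketch(bool cmd_fails, bool regulator_fails,
                 struct ec_priv_sketch **out)
{
    struct ec_priv_sketch *ec = malloc(sizeof(*ec));
    int err;

    *out = NULL;
    if (!ec)
        return -1;
    live_allocs_sketch++;

    err = cmd_fails ? -2 : 0;          /* stands in for the EC command */
    if (err)
        goto error;

    err = regulator_fails ? -3 : 0;    /* stands in for regulator setup */
    if (err)
        goto error;                    /* the path that used to leak */

    *out = ec;
    return 0;

error:
    free(ec);
    live_allocs_sketch--;
    return err;
}
```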
      
      Fixes: 231c0c21 ("Platform: OLPC: Add a regulator for the DCON")
      Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      22932723
    • usermodehelper: reset umask to default before executing user process · ce843291
      Linus Torvalds authored
      commit 4013c149 upstream.
      
      Kernel threads intentionally do CLONE_FS in order to follow any changes
      that 'init' does to set up the root directory (or cwd).
      
      It is admittedly a bit odd, but it avoids the situation where 'init'
      does some extensive setup to initialize the system environment, and then
      we execute a usermode helper program, and it uses the original FS setup
      from boot time that may be very limited and incomplete.
      
      [ Both Al Viro and Eric Biederman point out that 'pivot_root()' will
        follow the root regardless, since it fixes up other users of root (see
        chroot_fs_refs() for details), but overmounting root and doing a
        chroot() would not. ]
      
      However, Vegard Nossum noticed that the CLONE_FS not only means that we
      follow the root and current working directories, it also means we share
      umask with whatever init changed it to. That wasn't intentional.
      
      Just reset umask to the original default (0022) before actually starting
      the usermode helper program.
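      
      A rough userspace analogue of the idea, assuming a POSIX system with
      /bin/sh available for the usage example. The function name is
      hypothetical. Note the difference from the kernel case: fork() gives
      the child a copy of the parent's umask rather than sharing it as
      CLONE_FS does, so this only illustrates resetting the mask to 0022
      immediately before the helper is executed.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs a helper program with umask forced to 0022 in the child only.
 * Returns the helper's exit status, or -1 on fork/wait failure or
 * abnormal termination.
 */
int run_helper_default_umask_sketch(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        umask(0022);        /* drop whatever mask was inherited */
        execv(path, argv);
        _exit(127);         /* reached only if execv() failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

      For instance, a caller running with umask 0077 could launch
      sh -c '[ "$(umask)" = 0022 ]' through this function; the helper sees
      0022 while the caller's own mask stays at 0077.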
      
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce843291
    • vhost: Use vhost_get_used_size() in vhost_vring_set_addr() · 920a61dd
      Greg Kurz authored
      commit 71878fa4 upstream.
      
      The open-coded computation of the used size doesn't take the event
      into account when the VIRTIO_RING_F_EVENT_IDX feature is present.
      Fix that by using vhost_get_used_size().
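      
      The size difference can be sketched as below. This assumes the split
      virtqueue used-ring layout (a 16-bit flags field, a 16-bit index, an
      array of 8-byte elements, then a trailing 16-bit event field when the
      event index feature is negotiated). The struct and function names are
      illustrative and this is not the body of vhost_get_used_size().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layouts for the used ring header and one ring element. */
struct used_hdr_sketch  { uint16_t flags; uint16_t idx; };
struct used_elem_sketch { uint32_t id; uint32_t len; };

/* Bytes occupied by a used ring with num entries. */
size_t used_size_sketch(unsigned int num, bool event_idx)
{
    /* The trailing event field exists only with the event index feature. */
    size_t event = event_idx ? sizeof(uint16_t) : 0;

    return sizeof(struct used_hdr_sketch) +
           sizeof(struct used_elem_sketch) * num + event;
}
```

      A computation that omits the event term comes up two bytes short
      whenever the feature is enabled.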
      
      Fixes: 8ea8cf89 ("vhost: support event index")
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Link: https://lore.kernel.org/r/160171932300.284610.11846106312938909461.stgit@bahia.lan
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      920a61dd
    • vhost: Don't call access_ok() when using IOTLB · 57b47abc
      Greg Kurz authored
      commit 0210a8db upstream.
      
      When the IOTLB device is enabled, the vring addresses we get
      from userspace are GIOVAs. It is thus wrong to pass them down
      to access_ok() which only takes HVAs.
      
      Access validation is done at prefetch time with IOTLB. Teach
      vq_access_ok() about that by moving the (vq->iotlb) check
      from vhost_vq_access_ok() to vq_access_ok(). This prevents
      vhost_vring_set_addr() from failing when verifying the accesses.
      No behavior change for vhost_vq_access_ok().
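      
      The control flow described above can be sketched as follows. The
      struct, the address limit, and the range checker are hypothetical
      placeholders (hva_range_ok_sketch() merely stands in for access_ok());
      the point is only that the IOTLB check short-circuits before any
      address is treated as an HVA.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary illustrative upper bound for host virtual addresses. */
#define HVA_LIMIT_SKETCH ((uint64_t)1 << 47)

/* Hypothetical, simplified view of a virtqueue's ring addresses. */
struct vq_sketch {
    bool iotlb;                 /* true when a device IOTLB is in use */
    uint64_t desc, avail, used;
    size_t desc_size, avail_size, used_size;
};

/* Stand-in for access_ok(): the whole range must sit below the limit. */
bool hva_range_ok_sketch(uint64_t addr, size_t len)
{
    return addr <= HVA_LIMIT_SKETCH && len <= HVA_LIMIT_SKETCH - addr;
}

bool vq_access_ok_sketch(const struct vq_sketch *vq)
{
    /* With an IOTLB the addresses are GIOVAs, validated at prefetch time. */
    if (vq->iotlb)
        return true;

    return hva_range_ok_sketch(vq->desc, vq->desc_size) &&
           hva_range_ok_sketch(vq->avail, vq->avail_size) &&
           hva_range_ok_sketch(vq->used, vq->used_size);
}
```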
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1883084
      Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
      Cc: jasowang@redhat.com
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/160171931213.284610.2052489816407219136.stgit@bahia.lan
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      57b47abc