1. 08 Sep, 2021 1 commit
  2. 03 Sep, 2021 39 commits
    • bpf: Fix potentially incorrect results with bpf_get_local_storage() · 0c9a876f
      Yonghong Song authored
      commit a2baf4e8 upstream.
      
      Commit b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
      helper") fixed a bug in the bpf_get_local_storage() helper so that different
      tasks won't mess up each other's percpu local storage.
      
      The percpu data contains 8 slots so it can hold up to 8 contexts (same or
      different tasks), for 8 different program runs, at the same time. This in
      general is sufficient. But our internal testing showed the following warning
      multiple times:
      
        [...]
        warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193
           __cgroup_bpf_run_filter_sock_ops+0x13e/0x180
        RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180
        <IRQ>
         tcp_call_bpf.constprop.99+0x93/0xc0
         tcp_conn_request+0x41e/0xa50
         ? tcp_rcv_state_process+0x203/0xe00
         tcp_rcv_state_process+0x203/0xe00
         ? sk_filter_trim_cap+0xbc/0x210
         ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160
         tcp_v6_do_rcv+0x181/0x3e0
         tcp_v6_rcv+0xc65/0xcb0
         ip6_protocol_deliver_rcu+0xbd/0x450
         ip6_input_finish+0x11/0x20
         ip6_input+0xb5/0xc0
         ip6_sublist_rcv_finish+0x37/0x50
         ip6_sublist_rcv+0x1dc/0x270
         ipv6_list_rcv+0x113/0x140
         __netif_receive_skb_list_core+0x1a0/0x210
         netif_receive_skb_list_internal+0x186/0x2a0
         gro_normal_list.part.170+0x19/0x40
         napi_complete_done+0x65/0x150
         mlx5e_napi_poll+0x1ae/0x680
         __napi_poll+0x25/0x120
         net_rx_action+0x11e/0x280
         __do_softirq+0xbb/0x271
         irq_exit_rcu+0x97/0xa0
         common_interrupt+0x7f/0xa0
         </IRQ>
         asm_common_interrupt+0x1e/0x40
        RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac
         ? __cgroup_bpf_run_filter_skb+0x378/0x4e0
         ? do_softirq+0x34/0x70
         ? ip6_finish_output2+0x266/0x590
         ? ip6_finish_output+0x66/0xa0
         ? ip6_output+0x6c/0x130
         ? ip6_xmit+0x279/0x550
         ? ip6_dst_check+0x61/0xd0
        [...]
      
      Using drgn [0] to dump the percpu buffer contents showed that on this CPU
      slot 0 is still available, but slots 1-7 are occupied and those tasks in
      slots 1-7 mostly don't exist any more. So we might have issues in
      bpf_cgroup_storage_unset().
      
      Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset().
      Currently, it tries to unset the "current" slot by searching from the start.
      So the following sequence is possible:
      
        1. A task is running and claims slot 0.
        2. The running BPF program is done; it has checked that slot 0 holds the
           "task" and is ready to reset it to NULL (but has not done so yet).
        3. An interrupt happens, and another BPF program runs and claims slot 1
           with the *same* task.
        4. The unset() in interrupt context releases slot 0, since it matches "task".
        5. The interrupt is done, and the task in process context resets slot 0.
      
      In the end, slot 1 is never reset, and the same process can go on to occupy
      slots 2-7. Finally, when steps 1-5 are repeated again, the BPF program in
      step 3 won't be able to claim an empty slot and a warning will be issued.
      
      To fix the issue, the unset() function should traverse the slots from the
      last one to the first. This way, the above issue can be avoided.
      
      The same reverse traversal should also be done in bpf_get_local_storage() helper
      itself. Otherwise, incorrect local storage may be returned to BPF program.
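      The ordering problem in steps 1-5 can be replayed with a small, single-threaded
      user-space model. This is a sketch, not the kernel code. slot_task, slot_claim(),
      slot_find_forward(), slot_find_reverse() and replay_race() are hypothetical
      stand-ins for the 8 per-CPU slots and the set/unset logic. The "interrupt" is
      simulated by calling them in the interleaved order described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: 8 per-CPU slots, each recording its owning task. */
#define NR_SLOTS 8

static const void *slot_task[NR_SLOTS];

/* Claim the first free slot for @task; -1 means none is free (the WARN case). */
static int slot_claim(const void *task)
{
	for (int i = 0; i < NR_SLOTS; i++) {
		if (slot_task[i] == NULL) {
			slot_task[i] = task;
			return i;
		}
	}
	return -1;
}

/* Buggy lookup: scans from slot 0, so an outer run's slot is found first. */
static int slot_find_forward(const void *task)
{
	for (int i = 0; i < NR_SLOTS; i++)
		if (slot_task[i] == task)
			return i;
	return -1;
}

/* Fixed lookup: scans from the last slot, so the most recently claimed
 * (innermost, e.g. interrupt-context) slot for @task is found first. */
static int slot_find_reverse(const void *task)
{
	for (int i = NR_SLOTS - 1; i >= 0; i--)
		if (slot_task[i] == task)
			return i;
	return -1;
}

static void slot_release(int i)
{
	assert(i >= 0 && i < NR_SLOTS);
	slot_task[i] = NULL;
}

/* Replay steps 1-5 with the given lookup; return how many slots leak. */
static int replay_race(int (*find)(const void *))
{
	static const char task;		/* its address stands in for "current" */
	int leaked = 0;

	for (int i = 0; i < NR_SLOTS; i++)
		slot_task[i] = NULL;

	slot_claim(&task);		/* 1. process context claims slot 0 */
	int outer = find(&task);	/* 2. outer unset() has picked its slot */
	slot_claim(&task);		/* 3. interrupt claims slot 1 */
	slot_release(find(&task));	/* 4. interrupt's unset() */
	slot_release(outer);		/* 5. outer unset() finishes */

	for (int i = 0; i < NR_SLOTS; i++)
		if (slot_task[i] != NULL)
			leaked++;
	return leaked;
}
```

      With the forward scan, one slot leaks per nested run. Repeating the sequence
      eventually makes slot_claim() return -1, which is the model's equivalent of the
      warning. With the reverse scan, nothing leaks.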
      
        [0] https://github.com/osandov/drgn
      
      Fixes: b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210810010413.1976277-1-yhs@fb.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • audit: move put_tree() to avoid trim_trees refcount underflow and UAF · 38c1915d
      Richard Guy Briggs authored
      commit 67d69e9d upstream.
      
      AUDIT_TRIM is expected to be idempotent, but multiple executions resulted
      in a refcount underflow and use-after-free.
      
      git bisect fingered commit fb041bb7 ("locking/refcount: Consolidate
      implementations of refcount_t"). However, that patch, with its more
      thorough checking that wasn't in the x86 assembly code, merely exposed a
      pre-existing tree refcount imbalance in the tree-trimming code. That code
      was refactored to remove a tree via prune_one() in
      commit 8432c700 ("audit: Simplify locking around untag_chunk()").
      
      Move the put_tree() to cover only the prune_one() case.
      
      Passes audit-testsuite and 3 passes of "auditctl -t" with at least one
      directory watch.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Seiji Nishikawa <snishika@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 8432c700 ("audit: Simplify locking around untag_chunk()")
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      [PM: reformatted/cleaned-up the commit description]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: don't unconditionally copy_from_user a struct ifreq for socket ioctls · 1890ee7f
      Peter Collingbourne authored
      commit d0efb162 upstream.
      
      A common implementation of isatty(3) involves calling an ioctl, passing
      a dummy struct argument and checking whether the syscall failed --
      bionic and glibc use TCGETS (passing a struct termios), and musl uses
      TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will
      copy sizeof(struct ifreq) bytes of data from the argument and return
      -EFAULT if that fails. The result is that the isatty implementations
      may return a non-POSIX-compliant value in errno in the case where part
      of the dummy struct argument is inaccessible, as both struct termios
      and struct winsize are smaller than struct ifreq (at least on arm64).
      
      Although there is usually enough stack space following the argument
      on the stack that this did not present a practical problem up to now,
      with MTE stack instrumentation it's more likely for the copy to fail,
      as the memory following the struct may have a different tag.
      
      Fix the problem by adding an early check for whether the ioctl is a
      valid socket ioctl, and return -ENOTTY if it isn't.
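      For context, the isatty(3) pattern described above boils down to issuing a
      terminal ioctl on the FD and treating failure as "not a tty". Below is a
      minimal sketch built around a hypothetical helper, probe_tty(). It is modelled
      loosely on the TIOCGWINSZ approach attributed to musl above, and it returns
      the raw errno so the value the kernel reports for a non-tty FD is visible. On
      a kernel with this fix, a socket FD is expected to report ENOTTY.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if @fd is a terminal. Otherwise it returns 0
 * and stores the raw errno from the TIOCGWINSZ ioctl in *raw_errno, i.e. the
 * value an isatty() implementation would either keep or normalize.
 */
static int probe_tty(int fd, int *raw_errno)
{
	/* The small "dummy" argument; per the text it is smaller than struct ifreq. */
	struct winsize ws;

	if (ioctl(fd, TIOCGWINSZ, &ws) == 0) {
		*raw_errno = 0;
		return 1;
	}
	*raw_errno = errno;
	return 0;
}
```

      A pipe or a socket should both be reported as "not a tty". The case the commit
      is about is the socket FD, where older kernels could fail the
      sizeof(struct ifreq) copy and yield EFAULT instead of ENOTTY.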
      
      Fixes: 44c02a2c ("dev_ioctl(): move copyin/copyout to callers")
      Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4d
      Signed-off-by: Peter Collingbourne <pcc@google.com>
      Cc: <stable@vger.kernel.org> # 4.19
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "parisc: Add assembly implementations for memset, strlen, strcpy, strncpy and strcat" · 0085646e
      Helge Deller authored
      commit f6a3308d upstream.
      
      This reverts commit 83af58f8.
      
      It turns out that at least the assembly implementation for strncpy() was
      buggy.  Revert the whole commit and go back to the default implementations.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: <stable@vger.kernel.org> # v5.4+
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "floppy: reintroduce O_NDELAY fix" · 17982c66
      Denis Efremov authored
      commit c7e9d002 upstream.
      
      The patch breaks userspace implementations (e.g. fdutils) and introduces
      regressions in behaviour. Previously, it was possible to O_NDELAY open a
      floppy device with no media inserted or with write protected media without
      an error. Some userspace tools use this particular behavior for probing.
      
      This is not the first time this patch has been reverted. The previous revert
      is commit f2791e7e (Revert "floppy: refactor open() flags handling").
      
      This reverts commit 8a0c014c.
      
      Link: https://lore.kernel.org/linux-block/de10cb47-34d1-5a88-7751-225ca380f735@compro.net/
      Reported-by: Mark Hounschell <markh@compro.net>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Wim Osterholt <wim@djo.tudelft.nl>
      Cc: Kurt Garloff <kurt@garloff.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Denis Efremov <efremov@linux.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • kthread: Fix PF_KTHREAD vs to_kthread() race · 709c162d
      Peter Zijlstra authored
      commit 3a7956e2 upstream.
      
      The kthread_is_per_cpu() construct relies on only being called on
      PF_KTHREAD tasks (per the WARN in to_kthread). This gives rise to the
      following usage pattern:
      
      	if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
      
      However, as reported by syzkaller, this is broken. The scenario is:
      
      	CPU0				CPU1 (running p)
      
      	(p->flags & PF_KTHREAD) // true
      
      					begin_new_exec()
      					  me->flags &= ~(PF_KTHREAD|...);
      	kthread_is_per_cpu(p)
      	  to_kthread(p)
      	    WARN(!(p->flags & PF_KTHREAD)) <-- *SPLAT*
      
      Introduce __to_kthread() that omits the WARN and is sure to check both
      values.
      
      Use this to remove the problematic pattern for kthread_is_per_cpu()
      and fix a number of other kthread_*() functions that have similar
      issues but are currently not used in ways that would expose the
      problem.
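      The lookup can be illustrated with a user-space model. This is a sketch under
      assumptions: struct task_model, its kthread pointer, model_to_kthread() and
      model_kthread_is_per_cpu() are hypothetical stand-ins. The model also assumes
      that "both values" means the per-task kthread pointer and the PF_KTHREAD flag.
      The ideas taken from the text are that the helper returns NULL instead of
      WARNing, and that callers test its result rather than pre-checking p->flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_KTHREAD 0x00200000u	/* flag value used for this model only */

struct kthread_model {
	bool is_per_cpu;
};

struct task_model {
	unsigned int flags;
	struct kthread_model *kthread;	/* hypothetical: NULL for user tasks */
};

/* No WARN: returns NULL unless both the pointer and PF_KTHREAD are present. */
static struct kthread_model *model_to_kthread(const struct task_model *p)
{
	struct kthread_model *k = p->kthread;

	if (k && !(p->flags & PF_KTHREAD))
		k = NULL;
	return k;
}

/* Replaces the "(p->flags & PF_KTHREAD) && kthread_is_per_cpu(p)" pattern. */
static bool model_kthread_is_per_cpu(const struct task_model *p)
{
	struct kthread_model *k = model_to_kthread(p);

	return k ? k->is_per_cpu : false;
}
```

      The model is single-threaded. It shows only that a task whose PF_KTHREAD was
      cleared by begin_new_exec() now yields "not a per-CPU kthread" instead of a
      splat.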
      
      Notably kthread_func() is only ever called on 'current', while
      kthread_probe_data() is only used for PF_WQ_WORKER, which implies the
      task is from kthread_create*().
      
      Fixes: ac687e6e ("kthread: Extract KTHREAD_IS_PER_CPU")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
      Link: https://lkml.kernel.org/r/YH6WJc825C4P0FCK@hirez.programming.kicks-ass.net
      [ Drop the balance_push() hunk as it is not needed. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix NULL pointer dereference when deleting device by invalid id · c43add24
      Qu Wenruo authored
      commit e4571b8c upstream.
      
      [BUG]
      It's easy to trigger a NULL pointer dereference just by removing a
      non-existent device id:
      
       # mkfs.btrfs -f -m single -d single /dev/test/scratch1 \
      				     /dev/test/scratch2
       # mount /dev/test/scratch1 /mnt/btrfs
       # btrfs device remove 3 /mnt/btrfs
      
      Then we have the following kernel NULL pointer dereference:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP NOPTI
       CPU: 9 PID: 649 Comm: btrfs Not tainted 5.14.0-rc3-custom+ #35
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
       RIP: 0010:btrfs_rm_device+0x4de/0x6b0 [btrfs]
        btrfs_ioctl+0x18bb/0x3190 [btrfs]
        ? lock_is_held_type+0xa5/0x120
        ? find_held_lock.constprop.0+0x2b/0x80
        ? do_user_addr_fault+0x201/0x6a0
        ? lock_release+0xd2/0x2d0
        ? __x64_sys_ioctl+0x83/0xb0
        __x64_sys_ioctl+0x83/0xb0
        do_syscall_64+0x3b/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [CAUSE]
      Commit a27a94c2 ("btrfs: Make btrfs_find_device_by_devspec return
      btrfs_device directly") moves the "missing" device path check into
      btrfs_rm_device().
      
      But btrfs_rm_device() itself can be called with only @devid, with NULL as
      @device_path.
      
      In that case, calling strcmp() on NULL will trigger the NULL pointer
      dereference.
      
      Before that commit, we handled the "missing" case inside
      btrfs_find_device_by_devspec(), which does not check @device_path at all
      if @devid is provided, so there was no way to trigger the bug.
      
      [FIX]
      Before calling strcmp(), also make sure @device_path is not NULL.
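      The fix relies on short-circuit evaluation: the NULL test must come before
      strcmp(). Below is a minimal sketch with a hypothetical helper,
      is_missing_request(); it is not the btrfs code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true only when the caller asked to remove the
 * "missing" device by path. A removal by devid passes device_path == NULL,
 * so the NULL check must be evaluated before strcmp().
 */
static bool is_missing_request(const char *device_path)
{
	return device_path != NULL && strcmp(device_path, "missing") == 0;
}
```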
      
      Fixes: a27a94c2 ("btrfs: Make btrfs_find_device_by_devspec return btrfs_device directly")
      CC: stable@vger.kernel.org # 5.4+
      Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: dts: qcom: msm8994-angler: Fix gpio-reserved-ranges 85-88 · 1604c42a
      Petr Vorel authored
      commit f890f89d upstream.
      
      Reserve GPIO pins 85-88, as these aren't meant to be accessible from the
      application CPUs (accessing them causes a reboot). This is yet another fix
      similar to 91345867 and 5f8d3ab1, and it is needed to allow angler to boot
      after 3edfb7bd ("gpiolib: Show correct direction from the beginning").
      
      Fixes: feeaf56a ("arm64: dts: msm8994 SoC and Huawei Angler (Nexus 6P) support")
      Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
      Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org>
      Link: https://lore.kernel.org/r/20210415193913.1836153-1-petr.vorel@gmail.com
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lkdtm: Enable DOUBLE_FAULT on all architectures · f760c110
      Kees Cook authored
      commit f123c42b upstream
      
      Where feasible, I prefer to have all tests visible on all architectures,
      but to have them wired to XFAIL. DOUBLE_FAULT was set up to XFAIL, but
      wasn't actually being added to the test list.
      
      Fixes: cea23efb ("lkdtm/bugs: Make double-fault test always available")
      Cc: stable@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210623203936.3151093-7-keescook@chromium.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [sudip: adjust context]
      Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: dsa: mt7530: fix VLAN traffic leaks again · b6c657ab
      DENG Qingfang authored
      commit 7428022b upstream.
      
      When a port leaves a VLAN-aware bridge, the current code does not clear
      other ports' matrix field bit. If the bridge is later set to VLAN-unaware
      mode, traffic in the bridge may leak to that port.
      
      Remove the VLAN filtering check in mt7530_port_bridge_leave.
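      A toy model of a port forwarding matrix shows why the leave path must clear the
      leaving port's bit in the other ports' matrices unconditionally. Everything here
      is hypothetical: matrix, NUM_PORTS, model_bridge_join() and model_bridge_leave().
      The model assumes that the removed check skipped clearing the other ports' bits
      while the bridge was VLAN-aware, as the text describes. It is not the mt7530
      register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 7			/* hypothetical port count */
#define BIT(n) (1u << (n))

/* matrix[i]: bitmask of ports that port i may forward traffic to. */
static uint32_t matrix[NUM_PORTS];

static void matrix_reset(void)
{
	for (int i = 0; i < NUM_PORTS; i++)
		matrix[i] = 0;
}

/* @port joins a bridge whose current members are @members. */
static void model_bridge_join(int port, uint32_t members)
{
	for (int i = 0; i < NUM_PORTS; i++) {
		if (i == port || !(members & BIT(i)))
			continue;
		matrix[i] |= BIT(port);
		matrix[port] |= BIT(i);
	}
}

/* @old_vlan_check models the removed condition: when true, other ports'
 * bits are only cleared if the bridge is VLAN-unaware. */
static void model_bridge_leave(int port, bool vlan_aware, bool old_vlan_check)
{
	for (int i = 0; i < NUM_PORTS; i++) {
		if (i == port)
			continue;
		if (!old_vlan_check || !vlan_aware)
			matrix[i] &= ~BIT(port);
	}
	matrix[port] = 0;
}
```

      With the old check, port 0 still forwards to a port that left a VLAN-aware
      bridge. If the bridge later becomes VLAN-unaware, nothing filters that path any
      more, which is the leak described above.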
      
      Fixes: 474a2dda ("net: dsa: mt7530: fix VLAN traffic leaks")
      Fixes: 83163f7d ("net: dsa: mediatek: add VLAN support for MT7530")
      Signed-off-by: DENG Qingfang <dqfext@gmail.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • usb: typec: ucsi: Clear pending after acking connector change · f8242f55
      Bjorn Andersson authored
      commit 8c9b3caa upstream.
      
      It's possible that the interrupt handler for the UCSI driver signals a
      connector change after the handler clears the PENDING bit, but before
      it has sent the acknowledge request. The result is that the handler is
      invoked yet again, to ack the same connector change.
      
      At least some versions of the Qualcomm UCSI firmware will not handle the
      second ("spurious") acknowledgment gracefully. So make sure not to
      clear the pending flag until the change is acknowledged.
      
      Any connector changes coming in after the acknowledgment, which would
      have the pending flag incorrectly cleared, would, as far as I can tell,
      be covered by the subsequent connector status check.
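      The ordering argument can be replayed with a deterministic toy model. All names
      are hypothetical (struct ucsi_model, model_irq(), model_work(), model_run()).
      The model assumes that the firmware keeps signalling a change until it is
      acknowledged, and that the racing interrupt lands between the two steps of the
      work item.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one connector change and the driver's PENDING bit. */
struct ucsi_model {
	bool pending;		/* work already scheduled or running */
	bool change_acked;	/* firmware considers the change acknowledged */
	int works_queued;
	int acks_sent;
};

/* Interrupt: the firmware signals the change while it is still unacked. */
static void model_irq(struct ucsi_model *m)
{
	if (m->change_acked)
		return;			/* nothing left to signal */
	if (!m->pending) {
		m->pending = true;
		m->works_queued++;
	}
}

static void model_ack(struct ucsi_model *m)
{
	m->acks_sent++;
	m->change_acked = true;
}

/* One work run; @clear_first selects the old ordering. The racing
 * interrupt is injected between the two steps. */
static void model_work(struct ucsi_model *m, bool clear_first)
{
	if (clear_first) {
		m->pending = false;
		model_irq(m);
		model_ack(m);
	} else {
		model_ack(m);
		model_irq(m);
		m->pending = false;
	}
}

/* Signal one change, run queued work until idle, return the acks sent. */
static int model_run(bool clear_first)
{
	struct ucsi_model m = { 0 };

	model_irq(&m);
	while (m.works_queued > 0) {
		m.works_queued--;
		model_work(&m, clear_first);
	}
	return m.acks_sent;
}
```

      In this model, the old ordering sends a second, spurious ack for the same
      change. Acking before clearing PENDING sends exactly one.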
      
      Fixes: 217504a0 ("usb: typec: ucsi: Work around PPM losing change information")
      Cc: stable <stable@vger.kernel.org>
      Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Acked-by: Benjamin Berg <bberg@redhat.com>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Link: https://lore.kernel.org/r/20210516040953.622409-1-bjorn.andersson@linaro.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • usb: typec: ucsi: Work around PPM losing change information · e15e32d5
      Benjamin Berg authored
      commit 217504a0 upstream.
      
      Some/many PPMs simply clear the change bitfield when a notification on
      a port is acknowledged. Unfortunately, doing so means that any changes
      between the GET_CONNECTOR_STATUS and ACK_CC_CI commands are simply lost.
      
      Work around this by re-fetching the connector status afterwards. We can
      then infer any changes that we see have happened but that may not be
      represented in the change bitfield.
      
      We end up with the following actions:
       1. UCSI_GET_CONNECTOR_STATUS, store result, update unprocessed_changes
       2. UCSI_GET_CAM_SUPPORTED, discard result
       3. ACK connector change
       4. UCSI_GET_CONNECTOR_STATUS, store result
       5. Infer lost changes by comparing UCSI_GET_CONNECTOR_STATUS results
       6. If PPM reported a new change, then restart in order to ACK
       7. Process everything as usual.
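      Step 5 can be sketched as a field-by-field comparison of the two status reads.
      The struct layout, the field subset and the CHG_* bit names below are
      hypothetical; the real UCSI connector status has its own layout and change
      bits. In the driver, the inferred bits would then be merged into the
      unprocessed changes from step 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical change bits for this sketch. */
#define CHG_CONNECT	(1u << 0)
#define CHG_PWR_OPMODE	(1u << 1)
#define CHG_PWR_DIR	(1u << 2)

/* Hypothetical subset of a GET_CONNECTOR_STATUS result. */
struct conn_status {
	bool connected;
	uint8_t pwr_opmode;
	bool pwr_dir;
};

/* Derive change bits the PPM may have dropped on ACK by comparing the status
 * read before the ACK (@before) with the one read after it (@after). */
static uint16_t infer_lost_changes(const struct conn_status *before,
				   const struct conn_status *after)
{
	uint16_t lost = 0;

	if (before->connected != after->connected)
		lost |= CHG_CONNECT;
	if (before->pwr_opmode != after->pwr_opmode)
		lost |= CHG_PWR_OPMODE;
	if (before->pwr_dir != after->pwr_dir)
		lost |= CHG_PWR_DIR;
	return lost;
}
```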
      
      The worker is also changed to re-schedule itself if a new change
      notification happened while it was running.
      
      Doing this fixes quite commonly occurring issues where, e.g., the UCSI
      power supply would remain online even though the Thunderbolt cable was
      unplugged.
      
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Benjamin Berg <bberg@redhat.com>
      Link: https://lore.kernel.org/r/20201009144047.505957-3-benjamin@sipsolutions.net
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • usb: typec: ucsi: acpi: Always decode connector change information · 08953884
      Benjamin Berg authored
      commit 47ea2929 upstream.
      
      Normal commands may be reporting that a connector has changed. Always
      call the ucsi_connector_change handler and let it take care of
      scheduling the work when needed.
      Doing this makes the ACPI code path identical to the CCG one.
      
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Benjamin Berg <bberg@redhat.com>
      Link: https://lore.kernel.org/r/20201009144047.505957-2-benjamin@sipsolutions.net
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracepoint: Use rcu get state and cond sync for static call updates · 9a4f1dc8
      Mathieu Desnoyers authored
      commit 7b40066c upstream.
      
      State transitions from 1->0->1 and N->2->1 callbacks require RCU
      synchronization. Rather than performing the RCU synchronization every
      time the state change occurs, which is quite slow when many tracepoints
      are registered in batch, instead keep a snapshot of the RCU state on the
      most recent transitions which belong to a chain, and conditionally wait
      for a grace period on the last transition of the chain if one grace
      period has not elapsed since the last snapshot.
      
      This applies to both RCU and SRCU.
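      The saving comes from the "get state, then conditionally synchronize" pattern
      (get_state_synchronize_rcu() and cond_synchronize_rcu() in the kernel). The
      sketch below is a single-threaded user-space model, not kernel code. All names
      in it are hypothetical, and grace periods are treated as instantaneous, so "a
      full grace period has elapsed since snapshot s" simply means gp_completed > s.

```c
#include <assert.h>

static unsigned long gp_completed;	/* grace periods finished so far */
static unsigned long blocking_syncs;	/* times the updater had to wait */

static void gp_model_reset(void)
{
	gp_completed = 0;
	blocking_syncs = 0;
}

static unsigned long model_get_state(void)
{
	return gp_completed;
}

static void model_synchronize(void)
{
	blocking_syncs++;
	gp_completed++;
}

/* Wait only if no grace period has elapsed since @snap. */
static void model_cond_synchronize(unsigned long snap)
{
	if (gp_completed == snap)
		model_synchronize();
}

/* Old behaviour: synchronize on every state transition. */
static void register_batch_eager(int n)
{
	for (int i = 0; i < n; i++)
		model_synchronize();
}

/* New behaviour: snapshot on each transition of the chain, then wait
 * conditionally once, on the last transition. */
static void register_batch_snapshot(int n)
{
	unsigned long snap = 0;
	int ongoing = 0;

	for (int i = 0; i < n; i++) {
		snap = model_get_state();	/* most recent transition */
		ongoing = 1;
	}
	if (ongoing)
		model_cond_synchronize(snap);
}
```

      For a batch of 100 transitions, the model blocks 100 times in the eager version
      and once in the snapshot version. It blocks zero times if an unrelated grace
      period happens to complete after the last snapshot.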
      
      This brings performance, which regressed because of commit 231264d6
      ("Fix: tracepoint: static call function vs data state mismatch"), back to
      what it was originally.
      
      Before this commit:
      
        # trace-cmd start -e all
        # time trace-cmd start -p nop
      
        real	0m10.593s
        user	0m0.017s
        sys	0m0.259s
      
      After this commit:
      
        # trace-cmd start -e all
        # time trace-cmd start -p nop
      
        real	0m0.878s
        user	0m0.000s
        sys	0m0.103s
      
      Link: https://lkml.kernel.org/r/20210805192954.30688-1-mathieu.desnoyers@efficios.com
      Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Stefan Metzmacher <metze@samba.org>
      Fixes: 231264d6 ("Fix: tracepoint: static call function vs data state mismatch")
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • srcu: Provide polling interfaces for Tiny SRCU grace periods · b6ae3854
      Paul E. McKenney authored
      commit 8b5bd67c upstream.
      
      There is a need for a polling interface for SRCU grace
      periods, so this commit supplies get_state_synchronize_srcu(),
      start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
      purpose.  The first can be used if future grace periods are inevitable
      (perhaps due to a later call_srcu() invocation), the second if future
      grace periods might not otherwise happen, and the third to check if a
      grace period has elapsed since the corresponding call to either of the
      first two.
      
      As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
      the return value from either get_state_synchronize_srcu() or
      start_poll_synchronize_srcu() must be passed in to a later call to
      poll_state_synchronize_srcu().
      
      Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/
      Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
      [ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
      [ paulmck: Apply feedback from Neeraj Upadhyay. ]
      Link: https://lore.kernel.org/lkml/20201117004017.GA7444@paulmck-ThinkPad-P72/
      Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • srcu: Make Tiny SRCU use multi-bit grace-period counter · 450948b0
      Paul E. McKenney authored
      commit 74612a07 upstream.
      
      There is a need for a polling interface for SRCU grace periods.  This
      polling needs to distinguish between an SRCU instance being idle on the
      one hand or in the middle of a grace period on the other.  This commit
      therefore converts the Tiny SRCU srcu_struct structure's srcu_idx from
      a de facto boolean to a free-running counter, using the bottom bit to
      indicate that a grace period is in progress.  The second-from-bottom
      bit is thus used as the index returned by srcu_read_lock().
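      The counter layout can be modelled in a few lines. This is a hypothetical
      user-space sketch: struct tiny_srcu_model and the tiny_* functions are not the
      kernel's. The bit meanings follow the text above. The cookie formula
      (idx + 3) & ~1 is this sketch's assumption of how a polling cookie can use the
      idle/in-progress distinction. An idle (even) counter needs one full grace
      period, idx + 2. An in-progress (odd) counter must first finish the current
      grace period, then run one more, idx + 3. Counter wrap is ignored.

```c
#include <assert.h>
#include <stdbool.h>

struct tiny_srcu_model {
	unsigned long srcu_idx;	/* bit 0: GP in progress; bit 1: reader index */
};

static bool tiny_gp_in_progress(const struct tiny_srcu_model *s)
{
	return s->srcu_idx & 0x1;
}

/* Reader index taken from the second-from-bottom bit. */
static int tiny_read_lock_idx(const struct tiny_srcu_model *s)
{
	return (int)((s->srcu_idx & 0x2) >> 1);
}

static void tiny_gp_start(struct tiny_srcu_model *s)
{
	if (!tiny_gp_in_progress(s))
		s->srcu_idx++;		/* even -> odd */
}

static void tiny_gp_end(struct tiny_srcu_model *s)
{
	if (tiny_gp_in_progress(s))
		s->srcu_idx++;		/* odd -> even, which flips bit 1 */
}

/* Assumed cookie: counter value at which a full GP starting later has ended. */
static unsigned long tiny_get_state(const struct tiny_srcu_model *s)
{
	return (s->srcu_idx + 3) & ~0x1UL;
}

static bool tiny_poll_state(const struct tiny_srcu_model *s, unsigned long cookie)
{
	return s->srcu_idx >= cookie;	/* counter wrap ignored in this model */
}
```

      A cookie taken mid-grace-period is not satisfied when that grace period ends,
      only after the next one. A plain boolean could not express this, and it is why
      the counter is now multi-bit.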
      
      Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/
      Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
      [ paulmck: Fix ->srcu_lock_nesting[] indexing per Neeraj Upadhyay. ]
      Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • srcu: Provide internal interface to start a Tiny SRCU grace period · 641e1d88
      Paul E. McKenney authored
      commit 1a893c71 upstream.
      
      There is a need for a polling interface for SRCU grace periods.
      This polling needs to initiate an SRCU grace period without
      having to queue (and manage) a callback.  This commit therefore
      splits the Tiny SRCU call_srcu() function into callback-queuing and
      start-grace-period portions, with the latter in a new function named
      srcu_gp_start_if_needed().
      
      Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/
      Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
      Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • srcu: Provide polling interfaces for Tree SRCU grace periods · f789de3b
      Paul E. McKenney authored
      commit 5358c9fa upstream.
      
      There is a need for a polling interface for SRCU grace
      periods, so this commit supplies get_state_synchronize_srcu(),
      start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
      purpose.  The first can be used if future grace periods are inevitable
      (perhaps due to a later call_srcu() invocation), the second if future
      grace periods might not otherwise happen, and the third to check if a
      grace period has elapsed since the corresponding call to either of the
      first two.
      
      As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
      the return value from either get_state_synchronize_srcu() or
      start_poll_synchronize_srcu() must be passed in to a later call to
      poll_state_synchronize_srcu().
      
      Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/
      Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
      [ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
      [ paulmck: Apply feedback from Neeraj Upadhyay. ]
      Link: https://lore.kernel.org/lkml/20201117004017.GA7444@paulmck-ThinkPad-P72/
      Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • srcu: Provide internal interface to start a Tree SRCU grace period · fdf66e5a
      Paul E. McKenney authored
      commit 29d2bb94 upstream.
      
      There is a need for a polling interface for SRCU grace periods.
      This polling needs to initiate an SRCU grace period without having
      to queue (and manage) a callback.  This commit therefore splits the
      Tree SRCU __call_srcu() function into callback-initialization and
      queuing/start-grace-period portions, with the latter in a new function
      named srcu_gp_start_if_needed().  This function may be passed a NULL
      callback pointer, in which case it will refrain from queuing anything.
      
      Why have the new function mess with queuing?  Locking considerations,
      of course!
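      The split can be illustrated with a toy model in which a helper optionally
      queues a callback and then starts a grace period only if none is under way.
      struct srcu_model, struct cb_model, model_gp_start_if_needed() and
      model_call_srcu() are hypothetical names. The model is single-threaded, so it
      ignores the locking that motivates the real design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cb_model {
	struct cb_model *next;
};

/* Hypothetical model of an SRCU instance's callback list and GP state. */
struct srcu_model {
	struct cb_model *head;
	size_t queued;
	bool gp_in_progress;
	unsigned long gps_started;
};

/* Queue @cb if non-NULL, then make sure a grace period is under way.
 * Passing NULL only starts a grace period, which is what polling needs. */
static void model_gp_start_if_needed(struct srcu_model *s, struct cb_model *cb)
{
	if (cb) {
		cb->next = s->head;
		s->head = cb;
		s->queued++;
	}
	if (!s->gp_in_progress) {
		s->gp_in_progress = true;
		s->gps_started++;
	}
}

/* call_srcu()-style wrapper: callback initialization, then the shared helper. */
static void model_call_srcu(struct srcu_model *s, struct cb_model *cb)
{
	cb->next = NULL;
	model_gp_start_if_needed(s, cb);
}
```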
      
      Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/
      Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
      Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/perf: Invoke per-CPU variable access with disabled interrupts · d3c38d85
      Athira Rajeev authored
      commit f66de7ac upstream.
      
      The power_pmu_event_init() callback accesses a per-cpu variable
      (cpu_hw_events) to check for event constraints and Branch Stack
      (BHRB). The current code disables preemption when accessing the
      per-cpu variable, but this does not prevent a timer callback from
      interrupting event_init. Fix this by using 'local_irq_save/restore'
      to make sure the code path is invoked with interrupts disabled.
      
      This change was tested in the mambo simulator to ensure that, if a timer
      interrupt comes in during the per-cpu access in event_init, it will be
      soft masked and replayed later. For testing purposes, a udelay() was
      introduced in power_pmu_event_init() to make sure a timer interrupt arrives
      while in the per-cpu variable access code between local_irq_save/restore.
      As expected, the timer interrupt was replayed later during local_irq_restore
      called from power_pmu_event_init. This was confirmed by adding a
      breakpoint in mambo and checking the backtrace when timer_interrupt
      was hit.
      
      Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1606814880-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com
      Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf annotate: Fix jump parsing for C++ code. · 77b77d45
      Martin Liška authored
      commit 1f0e6edc upstream.
      
      Considering the following testcase:
      
        int
        foo(int a, int b)
        {
           for (unsigned i = 0; i < 1000000000; i++)
             a += b;
           return a;
        }
      
        int main()
        {
           foo (3, 4);
           return 0;
        }
      
      'perf annotate' displays:
      
        86.52 │40055e: → ja   40056c <foo(int, int)+0x26>
        13.37 │400560:   mov  -0x18(%rbp),%eax
              │400563:   add  %eax,-0x14(%rbp)
              │400566:   addl $0x1,-0x4(%rbp)
         0.11 │40056a: → jmp  400557 <foo(int, int)+0x11>
              │40056c:   mov  -0x14(%rbp),%eax
              │40056f:   pop  %rbp
      
      and the 'ja 40056c' does not link to the location in the function. It's
      caused by the fact that the comma is wrongly parsed as an operand
      separator, while it is actually part of the function signature.
      
      With my patch I see:
      
        86.52 │   ┌──ja   26
        13.37 │   │  mov  -0x18(%rbp),%eax
              │   │  add  %eax,-0x14(%rbp)
              │   │  addl $0x1,-0x4(%rbp)
         0.11 │   │↑ jmp  11
              │26:└─→mov  -0x14(%rbp),%eax
      
      and the 'o' output prints:
      
        86.52 │4005┌── ↓ ja   40056c <foo(int, int)+0x26>
        13.37 │4005│0:   mov  -0x18(%rbp),%eax
              │4005│3:   add  %eax,-0x14(%rbp)
              │4005│6:   addl $0x1,-0x4(%rbp)
         0.11 │4005│a: ↑ jmp  400557 <foo(int, int)+0x11>
              │4005└─→   mov  -0x14(%rbp),%eax
      
      By contrast, when the very same file is compiled with gcc -x c, the
      parsing is fine because function arguments are not displayed:
      
        jmp  400543 <foo+0x1d>
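      A minimal sketch of the parsing problem, assuming the target address is
      the hex number after an optional leading "reg," operand (as in an
      ARM-style "cbz w0, 400557 <foo+0x11>"). The helper names are
      hypothetical and this is not perf's actual jump parser; it only shows
      why commas inside the <...> symbol annotation have to be skipped.

```c
#include <stdlib.h>

/* Return the first ',' in a jump's operand string that is not inside a
 * "<symbol+offset>" annotation, or NULL if there is none. */
static const char *find_operand_comma(const char *ops)
{
	int depth = 0;

	for (const char *p = ops; *p; p++) {
		if (*p == '<')
			depth++;
		else if (*p == '>' && depth > 0)
			depth--;
		else if (*p == ',' && depth == 0)
			return p;
	}
	return NULL;
}

/* Parse the hex target, skipping a leading "reg," operand if present. */
static unsigned long parse_jump_target(const char *ops)
{
	const char *comma = find_operand_comma(ops);
	const char *start = comma ? comma + 1 : ops;

	/* strtoul() skips leading blanks and stops at the space before '<'. */
	return strtoul(start, NULL, 16);
}
```

      With a plain strchr(ops, ','), "40056c <foo(int, int)+0x26>" would be
      split at the comma in the signature, and strtoul() on " int)+0x26>"
      would return 0 instead of 0x40056c.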
      
      Committer testing:
      
      Before:
      
        $ cat cpp_args_annotate.c
        int
        foo(int a, int b)
        {
           for (unsigned i = 0; i < 1000000000; i++)
             a += b;
           return a;
        }
      
        int main()
        {
           foo (3, 4);
           return 0;
        }
        $ gcc --version |& head -1
        gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)
        $ gcc -g cpp_args_annotate.c -o cpp_args_annotate
        $ perf record ./cpp_args_annotate
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.275 MB perf.data (7188 samples) ]
        $ perf annotate --stdio2 foo
        Samples: 7K of event 'cycles:u', 4000 Hz, Event count (approx.): 7468429289, [percent: local period]
        foo() /home/acme/c/cpp_args_annotate
        Percent
                    0000000000401106 <foo>:
                    foo():
                    int
                    foo(int a, int b)
                    {
                      push %rbp
                      mov  %rsp,%rbp
                      mov  %edi,-0x14(%rbp)
                      mov  %esi,-0x18(%rbp)
                    for (unsigned i = 0; i < 1000000000; i++)
                      movl $0x0,-0x4(%rbp)
                    ↓ jmp  1d
                    a += b;
         13.45  13:   mov  -0x18(%rbp),%eax
                      add  %eax,-0x14(%rbp)
                    for (unsigned i = 0; i < 1000000000; i++)
                      addl $0x1,-0x4(%rbp)
          0.09  1d:   cmpl $0x3b9ac9ff,-0x4(%rbp)
         86.46      ↑ jbe  13
                    return a;
                      mov  -0x14(%rbp),%eax
                    }
                      pop  %rbp
                    ← retq
        $
      
      I.e. it works for C; now let's switch to C++:
      
        $ g++ -g cpp_args_annotate.c -o cpp_args_annotate
        $ perf record ./cpp_args_annotate
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.268 MB perf.data (6976 samples) ]
        $ perf annotate --stdio2 foo
        Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period]
        foo() /home/acme/c/cpp_args_annotate
        Percent
                    0000000000401106 <foo(int, int)>:
                    foo(int, int):
                    int
                    foo(int a, int b)
                    {
                      push %rbp
                      mov  %rsp,%rbp
                      mov  %edi,-0x14(%rbp)
                      mov  %esi,-0x18(%rbp)
                    for (unsigned i = 0; i < 1000000000; i++)
                      movl $0x0,-0x4(%rbp)
                      cmpl $0x3b9ac9ff,-0x4(%rbp)
         86.53      → ja   40112c <foo(int, int)+0x26>
                    a += b;
         13.32        mov  -0x18(%rbp),%eax
          0.00        add  %eax,-0x14(%rbp)
                    for (unsigned i = 0; i < 1000000000; i++)
                      addl $0x1,-0x4(%rbp)
          0.15      → jmp  401117 <foo(int, int)+0x11>
                    return a;
                      mov  -0x14(%rbp),%eax
                    }
                      pop  %rbp
                    ← retq
        $
      
      Reproduced.
      
      Now with this patch:
      
      Reusing the C++ built binary, as we can see here:
      
        $ readelf -wi cpp_args_annotate | grep producer
          <c>   DW_AT_producer    : (indirect string, offset: 0x2e): GNU C++14 10.2.1 20201125 (Red Hat 10.2.1-9) -mtune=generic -march=x86-64 -g
        $
      
      And furthermore:
      
        $ file cpp_args_annotate
        cpp_args_annotate: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4fe3cab260204765605ec630d0dc7a7e93c361a9, for GNU/Linux 3.2.0, with debug_info, not stripped
        $ perf buildid-list -i cpp_args_annotate
        4fe3cab260204765605ec630d0dc7a7e93c361a9
        $ perf buildid-list | grep cpp_args_annotate
        4fe3cab260204765605ec630d0dc7a7e93c361a9 /home/acme/c/cpp_args_annotate
        $
      
      It now works:
      
        $ perf annotate --stdio2 foo
        Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period]
        foo() /home/acme/c/cpp_args_annotate
        Percent
                    0000000000401106 <foo(int, int)>:
                    foo(int, int):
                    int
                    foo(int a, int b)
                    {
                      push %rbp
                      mov  %rsp,%rbp
                      mov  %edi,-0x14(%rbp)
                      mov  %esi,-0x18(%rbp)
                    for (unsigned i = 0; i < 1000000000; i++)
                      movl $0x0,-0x4(%rbp)
                11:   cmpl $0x3b9ac9ff,-0x4(%rbp)
         86.53      ↓ ja   26
                    a += b;
         13.32        mov  -0x18(%rbp),%eax
          0.00        add  %eax,-0x14(%rbp)
                    for (unsigned i = 0; i < 1000000000; i++)
                      addl $0x1,-0x4(%rbp)
          0.15      ↑ jmp  11
                    return a;
                26:   mov  -0x14(%rbp),%eax
                    }
                      pop  %rbp
                    ← retq
        $
      Signed-off-by: Martin Liška <mliska@suse.cz>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Link: http://lore.kernel.org/lkml/13e1a405-edf9-e4c2-4327-a9b454353730@suse.cz
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jianlin Lv's avatar
      perf tools: Fix arm64 build error with gcc-11 · 9f9e40dd
      Jianlin Lv authored
      commit 06701297 upstream.
      
      gcc version: 11.0.0 20210208 (experimental) (GCC)
      
      The following build error occurs on arm64:
      
      .......
      In function ‘printf’,
          inlined from ‘regs_dump__printf’ at util/session.c:1141:3,
          inlined from ‘regs__printf’ at util/session.c:1169:2:
      /usr/include/aarch64-linux-gnu/bits/stdio2.h:107:10: \
        error: ‘%-5s’ directive argument is null [-Werror=format-overflow=]
      
      107 |   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, \
                      __va_arg_pack ());
      
      ......
      In function ‘fprintf’,
        inlined from ‘perf_sample__fprintf_regs.isra’ at \
          builtin-script.c:622:14:
      /usr/include/aarch64-linux-gnu/bits/stdio2.h:100:10: \
          error: ‘%5s’ directive argument is null [-Werror=format-overflow=]
        100 |   return __fprintf_chk (__stream, __USE_FORTIFY_LEVEL - 1, __fmt,
        101 |                         __va_arg_pack ());
      
      cc1: all warnings being treated as errors
      .......
      
      This patch fixes the -Wformat-overflow warnings by adding a helper
      function that converts NULL to "unknown".
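      The idea can be sketched as follows. Passing NULL to a %s directive is
      undefined behavior, which is what gcc-11 flags here. lookup_reg_name(),
      reg_name_or_unknown() and the register table are hypothetical stand-ins,
      not the helper this patch adds.

```c
#include <stdio.h>

/* Hypothetical name lookup that returns NULL for ids it does not know. */
static const char *lookup_reg_name(int id)
{
	static const char *const names[] = { "x0", "x1", "sp" };

	if (id < 0 || id >= (int)(sizeof(names) / sizeof(names[0])))
		return NULL;
	return names[id];
}

/* Never hand NULL to a "%s" directive: map it to "unknown" instead. */
static const char *reg_name_or_unknown(int id)
{
	const char *name = lookup_reg_name(id);

	return name ? name : "unknown";
}

static void print_reg(FILE *out, int id, unsigned long long val)
{
	fprintf(out, "%-5s0x%016llx\n", reg_name_or_unknown(id), val);
}
```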
      Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
      Reviewed-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: iecedge@gmail.com
      Cc: linux-csky@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210218031245.2078492-1-Jianlin.Lv@arm.com
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Namhyung Kim's avatar
      perf record: Fix memory leak in vDSO found using ASAN · 94687c49
      Namhyung Kim authored
      commit 41d58541 upstream.
      
      I got several memory leak reports from ASan with a simple command. They
      were caused by the vDSO dso never being released because of its
      refcount: as in __dsos_addnew_id(), the reference should be put after
      adding the dso to the list.
      
        $ perf record true
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ]
      
        =================================================================
        ==692599==ERROR: LeakSanitizer: detected memory leaks
      
        Direct leak of 439 byte(s) in 1 object(s) allocated from:
          #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
          #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256
          #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132
          #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347
          #4 0x559bce50826c in map__new util/map.c:175
          #5 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
          #6 0x559bce512f6b in machines__deliver_event util/session.c:1481
          #7 0x559bce515107 in perf_session__deliver_event util/session.c:1551
          #8 0x559bce51d4d2 in do_flush util/ordered-events.c:244
          #9 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
          #10 0x559bce519bea in __perf_session__process_events util/session.c:2268
          #11 0x559bce519bea in perf_session__process_events util/session.c:2297
          #12 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
          #13 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
          #14 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
          #15 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
          #16 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
          #17 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
          #18 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
          #19 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
          #20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308
      
        Indirect leak of 32 byte(s) in 1 object(s) allocated from:
          #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
          #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169
          #2 0x559bce50821b in map__new util/map.c:168
          #3 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
          #4 0x559bce512f6b in machines__deliver_event util/session.c:1481
          #5 0x559bce515107 in perf_session__deliver_event util/session.c:1551
          #6 0x559bce51d4d2 in do_flush util/ordered-events.c:244
          #7 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
          #8 0x559bce519bea in __perf_session__process_events util/session.c:2268
          #9 0x559bce519bea in perf_session__process_events util/session.c:2297
          #10 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
          #11 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
          #12 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
          #13 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
          #14 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
          #15 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
          #16 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
          #17 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
          #18 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308
      
        SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).
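      The ownership rule can be modeled with a minimal refcounted list. The
      obj_*() and list helpers below are hypothetical and only mirror the
      pattern; they are not perf's dso__get()/dso__put() code.

```c
#include <stdlib.h>

struct obj {
	int refcnt;
	struct obj *next;
};

static int live_objs;	/* objects allocated and not yet freed */

static struct obj *obj_new(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o) {
		o->refcnt = 1;	/* the creator's reference */
		live_objs++;
	}
	return o;
}

static struct obj *obj_get(struct obj *o)
{
	o->refcnt++;
	return o;
}

static void obj_put(struct obj *o)
{
	if (o && --o->refcnt == 0) {
		free(o);
		live_objs--;
	}
}

/* The list takes its own reference to anything added to it. */
static void list_add(struct obj **head, struct obj *o)
{
	o->next = *head;
	*head = obj_get(o);
}

/* Create and publish an object, then drop the creator's reference so the
 * list is the sole owner. Omitting obj_put() here leaks the object. */
static struct obj *addnew(struct obj **head)
{
	struct obj *o = obj_new();

	if (!o)
		return NULL;
	list_add(head, o);
	obj_put(o);
	return o;
}

static void list_purge(struct obj **head)
{
	while (*head) {
		struct obj *o = *head;

		*head = o->next;	/* read next before dropping the ref */
		obj_put(o);
	}
}
```

      After two addnew() calls each object's refcnt is 1 (held by the list),
      so list_purge() frees both; without the obj_put() in addnew() both
      objects would survive the purge, which is the kind of leak ASan reports.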
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210315045641.700430-1-namhyung@kernel.org
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Riccardo Mancini's avatar
      perf symbol-elf: Fix memory leak by freeing sdt_note.args · e0ca6703
      Riccardo Mancini authored
      commit 69c9ffed upstream.
      
      Reported by ASan.
      Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Fabian Hemmer <copy@copy.sh>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Remi Bernon <rbernon@codeweavers.com>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Link: http://lore.kernel.org/lkml/20210602220833.285226-1-rickyman7@gmail.com
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Riccardo Mancini's avatar
      perf env: Fix memory leak of bpf_prog_info_linear member · 0d8e39bb
      Riccardo Mancini authored
      commit 67069a1f upstream.
      
      ASan reported a memory leak caused by info_linear not being deallocated.
      
      info_linear is allocated in perf_event__synthesize_one_bpf_prog().
      
      This patch adds the corresponding free() when bpf_prog_info_node
      is freed in perf_env__purge_bpf().
      
        $ sudo ./perf record -- sleep 5
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.025 MB perf.data (8 samples) ]
      
        =================================================================
        ==297735==ERROR: LeakSanitizer: detected memory leaks
      
        Direct leak of 7688 byte(s) in 19 object(s) allocated from:
            #0 0x4f420f in malloc (/home/user/linux/tools/perf/perf+0x4f420f)
            #1 0xc06a74 in bpf_program__get_prog_info_linear /home/user/linux/tools/lib/bpf/libbpf.c:11113:16
            #2 0xb426fe in perf_event__synthesize_one_bpf_prog /home/user/linux/tools/perf/util/bpf-event.c:191:16
            #3 0xb42008 in perf_event__synthesize_bpf_events /home/user/linux/tools/perf/util/bpf-event.c:410:9
            #4 0x594596 in record__synthesize /home/user/linux/tools/perf/builtin-record.c:1490:8
            #5 0x58c9ac in __cmd_record /home/user/linux/tools/perf/builtin-record.c:1798:8
            #6 0x58990b in cmd_record /home/user/linux/tools/perf/builtin-record.c:2901:8
            #7 0x7b2a20 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
            #8 0x7b12ff in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
            #9 0x7b2583 in run_argv /home/user/linux/tools/perf/perf.c:409:2
            #10 0x7b0d79 in main /home/user/linux/tools/perf/perf.c:539:3
            #11 0x7fa357ef6b74 in __libc_start_main /usr/src/debug/glibc-2.33-8.fc34.x86_64/csu/../csu/libc-start.c:332:16
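      A minimal sketch of the pattern the fix follows, assuming a node that
      owns a separately malloc()ed buffer. struct info_node and its helpers
      are hypothetical stand-ins for bpf_prog_info_node and
      perf_env__purge_bpf().

```c
#include <stdlib.h>

struct info_node {
	void *info_linear;		/* owned by the node */
	struct info_node *next;
};

static struct info_node *info_node_new(size_t info_size,
				       struct info_node *next)
{
	struct info_node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->info_linear = malloc(info_size);
	if (!n->info_linear) {
		free(n);
		return NULL;
	}
	n->next = next;
	return n;
}

/* Free every node together with the buffer it owns; freeing only the node
 * would leak info_linear. Returns the number of nodes freed. */
static int purge_info_nodes(struct info_node **head)
{
	int freed = 0;

	while (*head) {
		struct info_node *n = *head;

		*head = n->next;
		free(n->info_linear);	/* the free() the fix adds */
		free(n);
		freed++;
	}
	return freed;
}
```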
      Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: KP Singh <kpsingh@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: http://lore.kernel.org/lkml/20210602224024.300485-1-rickyman7@gmail.com
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Guo Ren's avatar
      riscv: Fixup patch_text panic in ftrace · 133d7f93
      Guo Ren authored
      commit 5ad84adf upstream.
      
      Just like on arm64, functions in the patch_text path must not be traced.
      
      Here is the bug log:
      
      [   45.234334] Unable to handle kernel paging request at virtual address ffffffd38ae80900
      [   45.242313] Oops [#1]
      [   45.244600] Modules linked in:
      [   45.247678] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-00025-g9b7db83-dirty #215
      [   45.255797] epc: ffffffe00021689a ra : ffffffe00021718e sp : ffffffe01afabb58
      [   45.262955]  gp : ffffffe00136afa0 tp : ffffffe01af94d00 t0 : 0000000000000002
      [   45.270200]  t1 : 0000000000000000 t2 : 0000000000000001 s0 : ffffffe01afabc08
      [   45.277443]  s1 : ffffffe0013718a8 a0 : 0000000000000000 a1 : ffffffe01afabba8
      [   45.284686]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : c4c16ad38ae80900
      [   45.291929]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000052464e43
      [   45.299173]  s2 : 0000000000000001 s3 : ffffffe000206a60 s4 : ffffffe000206a60
      [   45.306415]  s5 : 00000000000009ec s6 : ffffffe0013718a8 s7 : c4c16ad38ae80900
      [   45.313658]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000001
      [   45.320902]  s11: 0000000000000003 t3 : 0000000000000001 t4 : ffffffffd192fe79
      [   45.328144]  t5 : ffffffffb8f80000 t6 : 0000000000040000
      [   45.333472] status: 0000000200000100 badaddr: ffffffd38ae80900 cause: 000000000000000f
      [   45.341514] ---[ end trace d95102172248fdcf ]---
      [   45.346176] note: migration/0[11] exited with preempt_count 1
      
      (gdb) x /2i $pc
      => 0xffffffe00021689a <__do_proc_dointvec+196>: sd      zero,0(s7)
         0xffffffe00021689e <__do_proc_dointvec+200>: li      s11,0
      
      (gdb) bt
      0  __do_proc_dointvec (tbl_data=0x0, table=0xffffffe01afabba8,
      write=0, buffer=0x0, lenp=0x7bf897061f9a0800, ppos=0x4, conv=0x0,
      data=0x52464e43) at kernel/sysctl.c:581
      1  0xffffffe00021718e in do_proc_dointvec (data=<optimized out>,
      conv=<optimized out>, ppos=<optimized out>, lenp=<optimized out>,
      buffer=<optimized out>, write=<optimized out>, table=<optimized out>)
      at kernel/sysctl.c:964
      2  proc_dointvec_minmax (ppos=<optimized out>, lenp=<optimized out>,
      buffer=<optimized out>, write=<optimized out>, table=<optimized out>)
      at kernel/sysctl.c:964
      3  proc_do_static_key (table=<optimized out>, write=1, buffer=0x0,
      lenp=0x0, ppos=0x7bf897061f9a0800) at kernel/sysctl.c:1643
      4  0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
      addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
      5  0xffffffe0002c9c04 in __ftrace_replace_code
      (rec=0xffffffe01ae40c30, enable=3) at kernel/trace/ftrace.c:2503
      6  0xffffffe0002ca0b2 in ftrace_replace_code (mod_flags=<optimized
      out>) at kernel/trace/ftrace.c:2530
      7  0xffffffe0002ca26a in ftrace_modify_all_code (command=5) at
      kernel/trace/ftrace.c:2677
      8  0xffffffe0002ca30e in __ftrace_modify_code (data=<optimized out>)
      at kernel/trace/ftrace.c:2703
      9  0xffffffe0002c13b0 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
      10 0xffffffe0002c0fde in cpu_stopper_thread (cpu=<optimized out>) at
      kernel/stop_machine.c:491
      11 0xffffffe0002343de in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
      12 0xffffffe00022f8b4 in kthread (_create=0xffffffe01af0c040) at
      kernel/kthread.c:292
      13 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
      
         0xffffffe00020678a <+114>:   auipc   ra,0xffffe
         0xffffffe00020678e <+118>:   jalr    -118(ra) # 0xffffffe000204714 <patch_text_nosync>
         0xffffffe000206792 <+122>:   snez    a0,a0
      
      (gdb) disassemble patch_text_nosync
      Dump of assembler code for function patch_text_nosync:
         0xffffffe000204714 <+0>:     addi    sp,sp,-32
         0xffffffe000204716 <+2>:     sd      s0,16(sp)
         0xffffffe000204718 <+4>:     sd      ra,24(sp)
         0xffffffe00020471a <+6>:     addi    s0,sp,32
         0xffffffe00020471c <+8>:     auipc   ra,0x0
         0xffffffe000204720 <+12>:    jalr    -384(ra) # 0xffffffe00020459c <patch_insn_write>
         0xffffffe000204724 <+16>:    beqz    a0,0xffffffe00020472e <patch_text_nosync+26>
         0xffffffe000204726 <+18>:    ld      ra,24(sp)
         0xffffffe000204728 <+20>:    ld      s0,16(sp)
         0xffffffe00020472a <+22>:    addi    sp,sp,32
         0xffffffe00020472c <+24>:    ret
         0xffffffe00020472e <+26>:    sd      a0,-24(s0)
         0xffffffe000204732 <+30>:    auipc   ra,0x4
         0xffffffe000204736 <+34>:    jalr    -1464(ra) # 0xffffffe00020817a <flush_icache_all>
         0xffffffe00020473a <+38>:    ld      a0,-24(s0)
         0xffffffe00020473e <+42>:    ld      ra,24(sp)
         0xffffffe000204740 <+44>:    ld      s0,16(sp)
         0xffffffe000204742 <+46>:    addi    sp,sp,32
         0xffffffe000204744 <+48>:    ret
      
      (gdb) disassemble flush_icache_all-4
      Dump of assembler code for function flush_icache_all:
         0xffffffe00020817a <+0>:     addi    sp,sp,-8
         0xffffffe00020817c <+2>:     sd      ra,0(sp)
         0xffffffe00020817e <+4>:     auipc   ra,0xfffff
         0xffffffe000208182 <+8>:     jalr    -1822(ra) # 0xffffffe000206a60 <ftrace_caller>
         0xffffffe000208186 <+12>:    ld      ra,0(sp)
         0xffffffe000208188 <+14>:    addi    sp,sp,8
         0xffffffe00020818a <+0>:     addi    sp,sp,-16
         0xffffffe00020818c <+2>:     sd      s0,0(sp)
         0xffffffe00020818e <+4>:     sd      ra,8(sp)
         0xffffffe000208190 <+6>:     addi    s0,sp,16
         0xffffffe000208192 <+8>:     li      a0,0
         0xffffffe000208194 <+10>:    auipc   ra,0xfffff
         0xffffffe000208198 <+14>:    jalr    -410(ra) # 0xffffffe000206ffa <sbi_remote_fence_i>
         0xffffffe00020819c <+18>:    ld      s0,0(sp)
         0xffffffe00020819e <+20>:    ld      ra,8(sp)
         0xffffffe0002081a0 <+22>:    addi    sp,sp,16
         0xffffffe0002081a2 <+24>:    ret
      
      (gdb) frame 5
      (rec=0xffffffe01ae40c30, enable=3) at kernel/trace/ftrace.c:2503
      2503                    return ftrace_make_call(rec, ftrace_addr);
      (gdb) p /x rec->ip
      $2 = 0xffffffe00020817a -> flush_icache_all !
      
      When we modified flush_icache_all's patchable entry to call ftrace_caller:
       - ftrace_caller was inserted into the flush_icache_all prologue.
       - flush_icache_all was then called to sync the I/D caches, but
         flush_icache_all was exactly the function we had only half modified.
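      A toy model of the hazard, with hypothetical names throughout:
      make_call() rewrites the target's entry in two steps and calls a flush
      helper in between, loosely mirroring ftrace_make_call() reaching
      flush_icache_all() through patch_text_nosync() in the backtrace above.
      Keeping a function out of ftrace patching (notrace), which is the
      spirit of this fix, avoids the problem.

```c
#include <stdbool.h>

enum patch_state { INTACT, HALF_PATCHED, PATCHED };

struct func {
	const char *name;
	bool notrace;		/* excluded from ftrace patching */
	enum patch_state state;
};

/* Calling a half-patched function "faults" (returns false). */
static bool call_func(const struct func *f)
{
	return f->state != HALF_PATCHED;
}

/* Enable tracing on target; flush stands in for flush_icache_all. */
static bool make_call(struct func *target, const struct func *flush)
{
	bool ok;

	if (target->notrace)
		return true;			/* never patched */
	target->state = HALF_PATCHED;		/* first part written */
	ok = call_func(flush);			/* cache sync mid-patch */
	target->state = PATCHED;
	return ok;
}
```

      make_call(&f, &f) fails when f is the traced flush helper itself, and
      succeeds, leaving f untouched, once f is marked notrace.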
      
      Link: https://lore.kernel.org/linux-riscv/CAJF2gTT=oDWesWe0JVWvTpGi60-gpbNhYLdFWN_5EbyeqoEDdw@mail.gmail.com/T/#t
      
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Reviewed-by: Atish Patra <atish.patra@wdc.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Guo Ren's avatar
      riscv: Fixup wrong ftrace remove cflag · 7e208724
      Guo Ren authored
      commit 67d94577 upstream.
      
      We must use $(CC_FLAGS_FTRACE) instead of using -pg directly: removing
      only -pg leaves -fpatchable-function-entry in place, which causes errors.
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Pauli Virtanen's avatar
      Bluetooth: btusb: check conditions before enabling USB ALT 3 for WBS · b42fde92
      Pauli Virtanen authored
      commit 55981d35 upstream.
      
      Some USB BT adapters don't satisfy the MTU requirement mentioned in
      commit e848dbd3 ("Bluetooth: btusb: Add support USB ALT 3 for WBS")
      and have an ALT 3 setting that produces no or garbled audio. Some
      adapters with a larger MTU were also reported to have problems with
      ALT 3.
      
      Add a flag, and check both it and the MTU before selecting ALT 3,
      falling back to ALT 1 otherwise. Enable the flag for Realtek,
      restoring the previous behavior for non-Realtek devices.
      
      Tested with USB adapters (mtu<72, no/garbled sound with ALT3, ALT1
      works) BCM20702A1 0b05:17cb, CSR8510A10 0a12:0001, and (mtu>=72, ALT3
      works) RTL8761BU 0bda:8771, Intel AX200 8087:0029 (after disabling
      ALT6). Also got reports for (mtu>=72, ALT 3 reported to produce bad
      audio) Intel 8087:0a2b.
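      The selection logic can be sketched as below. WBS_ALT3_ALLOWED and
      select_wbs_altsetting() are hypothetical names, and the 72-byte MTU
      threshold is an assumption taken from the test results above rather
      than from the driver source.

```c
/* Hypothetical quirk bit standing in for the new per-device flag. */
#define WBS_ALT3_ALLOWED	(1u << 0)
/* Assumed threshold, from the mtu<72 / mtu>=72 test notes. */
#define WBS_ALT3_MIN_MTU	72u

/* Pick the USB alternate setting for wideband speech (WBS). */
static int select_wbs_altsetting(unsigned int quirks, unsigned int mtu)
{
	if ((quirks & WBS_ALT3_ALLOWED) && mtu >= WBS_ALT3_MIN_MTU)
		return 3;
	return 1;	/* conservative fallback */
}
```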
      Signed-off-by: Pauli Virtanen <pav@iki.fi>
      Fixes: e848dbd3 ("Bluetooth: btusb: Add support USB ALT 3 for WBS")
      Tested-by: Michał Kępień <kernel@kempniu.pl>
      Tested-by: Jonathan Lampérth <jon@h4n.dev>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Linus Torvalds's avatar
      vt_kdsetmode: extend console locking · 60d69cb4
      Linus Torvalds authored
      commit 2287a51b upstream.
      
      As per the long-suffering comment.
      Reported-by: Minh Yuan <yuanmingbuaa@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Xin Long's avatar
      tipc: call tipc_wait_for_connect only when dlen is not 0 · 0a178a01
      Xin Long authored
      commit 7387a72c upstream.
      
      __tipc_sendmsg() is called to send a SYN packet by either tipc_sendmsg()
      or tipc_connect(). The difference is that tipc_connect() calls
      tipc_wait_for_connect() after __tipc_sendmsg() to wait until the
      connection is established, so there's no need to wait in
      __tipc_sendmsg() in this case.
      
      This patch fixes it by calling tipc_wait_for_connect() in
      __tipc_sendmsg() only when dlen is not 0; dlen is 0 when it is called
      by tipc_connect().
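      A toy model of the call flow, with hypothetical function names, counting
      how many times each path waits for the connection to complete:

```c
#include <stddef.h>

static int waits;	/* times a caller blocked waiting for connect */

static void wait_for_connect(void)
{
	waits++;
}

static int sendmsg_core(size_t dlen)
{
	/* ... build and send the SYN ... */
	if (dlen)		/* implicit connect from sendmsg() with data */
		wait_for_connect();
	return 0;
}

static int do_sendmsg(size_t dlen)
{
	return sendmsg_core(dlen);
}

static int do_connect(void)
{
	int err = sendmsg_core(0);	/* SYN without data: dlen == 0 */

	if (!err)
		wait_for_connect();	/* connect() does its own wait */
	return err;
}
```

      Both paths now wait exactly once; if sendmsg_core() also waited for
      dlen == 0, the connect path would block twice.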
      
      Note this also fixes the failure in tipcutils/test/ptts/:
      
        # ./tipcTS &
        # ./tipcTC 9
        (hang)
      
      Fixes: 36239dab6da7 ("tipc: fix implicit-connect for SYN+")
      Reported-by: Shuang Li <shuali@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Frieder Schrempf's avatar
      mtd: spinand: Fix incorrect parameters for on-die ECC · ded6da21
      Frieder Schrempf authored
      The new generic NAND ECC framework stores the configuration and
      requirements in separate places since commit 93ef92f6 ("mtd: nand: Use
      the new generic ECC object"). In 5.10.x, the SPI NAND layer still uses
      only the requirements to track the ECC properties. This mismatch leads
      to values of zero being used for ECC strength and step_size in the SPI
      NAND layer wherever nanddev_get_ecc_conf() is used, and therefore
      breaks the SPI NAND on-die ECC support in 5.10.x.
      
      By using nanddev_get_ecc_requirements() instead of nanddev_get_ecc_conf()
      for SPI NAND, we make sure that the correct parameters for the detected
      chip are used. In later versions (5.11.x) this is fixed anyway with the
      implementation of the SPI NAND on-die ECC engine.
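      A simplified, hypothetical model of the mismatch: the detected chip's
      values end up only in requirements, so any reader of conf sees zeros.
      The struct layout and accessor names are illustrative only.

```c
struct ecc_props {
	unsigned int strength;	/* correctable bits per step */
	unsigned int step_size;	/* bytes per ECC step */
};

struct nand_model {
	struct ecc_props requirements;	/* filled from the detected chip */
	struct ecc_props conf;		/* never set by this layer: stays 0 */
};

static const struct ecc_props *model_get_ecc_conf(const struct nand_model *n)
{
	return &n->conf;
}

static const struct ecc_props *
model_get_ecc_requirements(const struct nand_model *n)
{
	return &n->requirements;
}
```

      For a chip initialized with only requirements set,
      model_get_ecc_conf() reports a strength of 0, while
      model_get_ecc_requirements() reports the detected value.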
      
      Cc: stable@vger.kernel.org # 5.10.x
      Reported-by: voice INTER connect GmbH <developer@voiceinterconnect.de>
      Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
      Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Linus Torvalds's avatar
      pipe: do FASYNC notifications for every pipe IO, not just state changes · 3b2018f9
      Linus Torvalds authored
      commit fe67f4dd upstream.
      
      It turns out that the SIGIO/FASYNC situation is almost exactly the same
      as the EPOLLET case was: user space really wants to be notified after
      every operation.
      
      Now, in a perfect world it should be sufficient to only notify user
      space on "state transitions" when the IO state changes (ie when a pipe
      goes from unreadable to readable, or from unwritable to writable).  User
      space should then do as much as possible - fully emptying the buffer or
      what not - and we'll notify it again the next time the state changes.
      
      But as with EPOLLET, we have at least one case (stress-ng) where the
      kernel sent SIGIO due to the pipe being marked for asynchronous
      notification, but the user space signal handler then didn't
      necessarily read it all before returning (it read more than what was
      written, but since there could be multiple writes, it could leave data
      pending).
      
      The user space code then expected to get another SIGIO for subsequent
      writes - even though the pipe had been readable the whole time - and
      would only then read more.
      
      This is arguably a user space bug - and Colin King already fixed the
      stress-ng code in question - but the kernel regression rules are clear:
      it doesn't matter if kernel people think that user space did something
      silly and wrong.  What matters is that it used to work.
      
      So if user space depends on specific historical kernel behavior, it's a
      regression when that behavior changes.  It's on us: we were silly to
      have that non-optimal historical behavior, and our old kernel behavior
      was what user space was tested against.
      
      Because of how the FASYNC notification was tied to wakeup behavior, this
      was first broken by commits f467a6a6 and 1b6b26ae ("pipe: fix
      and clarify pipe read/write wakeup logic"), but at the time it seems
      nobody noticed.  Probably because the stress-ng problem case ends up
      being timing-dependent too.
      
      It was then unwittingly fixed by commit 3a34b13a ("pipe: make pipe
      writes always wake up readers") only to be broken again by commit
      3b844826 ("pipe: avoid unnecessary EPOLLET wakeups under normal
      loads").
      
      And at that point the kernel test robot noticed the performance
      regression in the stress-ng.sigio.ops_per_sec case.  So the "Fixes" tag
      below is somewhat ad hoc, but it matches when the issue was noticed.
      
      Fix it for good (knock wood) by simply making the kill_fasync() case
      separate from the wakeup case.  FASYNC is quite rare, and we clearly
      shouldn't even try to use the "avoid unnecessary wakeups" logic for it.
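
      A toy C model of the change, for illustration only: `struct toy_pipe`
      and both write functions are hypothetical and are not the kernel's pipe
      code. The old variant sends SIGIO only together with the
      empty-to-non-empty wakeup; the new one keeps that wakeup rule but
      notifies FASYNC readers after every write.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy pipe: NOT struct pipe_inode_info. */
struct toy_pipe {
	size_t unread;         /* bytes written but not yet read */
	bool fasync;           /* a reader asked for SIGIO (O_ASYNC/FASYNC) */
	unsigned int wakeups;  /* reader wakeups issued */
	unsigned int sigios;   /* SIGIO notifications (kill_fasync) issued */
};

/* Old behaviour: SIGIO piggybacks on the "pipe was empty" wakeup, so a
 * reader that leaves data behind gets no further SIGIO. */
static void toy_write_old(struct toy_pipe *p, size_t len)
{
	bool was_empty = (p->unread == 0);

	p->unread += len;
	if (was_empty) {
		p->wakeups++;
		if (p->fasync)
			p->sigios++;
	}
}

/* New behaviour: the wakeup still happens only on the empty -> non-empty
 * transition, but the FASYNC notification is sent after every write. */
static void toy_write_new(struct toy_pipe *p, size_t len)
{
	bool was_empty = (p->unread == 0);

	p->unread += len;
	if (was_empty)
		p->wakeups++;
	if (p->fasync)
		p->sigios++;
}

/* A reader that may consume less than what is pending. */
static void toy_read(struct toy_pipe *p, size_t len)
{
	p->unread = (len >= p->unread) ? 0 : p->unread - len;
}
```

      Writing twice, reading only part of the data, and writing again gives
      one SIGIO in the old model and three in the new one, while the wakeup
      count stays at one in both.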
      
      Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/
      Fixes: 3b844826 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Tested-by: Oliver Sang <oliver.sang@intel.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • pipe: avoid unnecessary EPOLLET wakeups under normal loads · e91da23c
      Linus Torvalds authored
      commit 3b844826 upstream.
      
      I had forgotten just how sensitive hackbench is to extra pipe wakeups,
      and commit 3a34b13a ("pipe: make pipe writes always wake up
      readers") ended up causing a quite noticeable regression on larger
      machines.
      
      Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's
      not clear that this matters in real life all that much, but as Mel
      points out, it's used often enough when comparing kernels and so the
      performance regression shows up like a sore thumb.
      
      It's easy enough to fix at least for the common cases where pipes are
      used purely for data transfer, and you never have any exciting poll
      usage at all.  So set a special 'poll_usage' flag when there is polling
      activity, and make the ugly "EPOLLET has crazy legacy expectations"
      semantics explicit to only that case.
      
      I would love to limit it to just the broken EPOLLET case, but the pipe
      code can't see the difference between epoll and regular select/poll, so
      any non-read/write waiting will trigger the extra wakeup behavior.  That
      is sufficient for at least the hackbench case.
      
      Apart from making the odd extra wakeup cases more explicitly about
      EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe
      write, not at the first write chunk.  That is actually much saner
      semantics (as much as you can call any of the legacy edge-triggered
      expectations for EPOLLET "sane") since it means that you know the wakeup
      will happen once the write is done, rather than possibly in the middle
      of one.
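
      A minimal sketch of the 'poll_usage' idea, assuming a toy pipe:
      `struct toy_pipe`, `toy_poll()` and `toy_write()` are hypothetical and
      simplified, not the kernel's functions. The wakeup decision is taken
      once per write, after all chunks have been copied.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy pipe: NOT struct pipe_inode_info. */
struct toy_pipe {
	size_t unread;        /* bytes written but not yet read */
	bool poll_usage;      /* set once anybody has polled this pipe */
	unsigned int wakeups; /* reader wakeups issued */
};

/* Any poll()/select()/epoll activity marks the pipe; the pipe code cannot
 * tell EPOLLET apart from ordinary polling. */
static void toy_poll(struct toy_pipe *p)
{
	p->poll_usage = true;
}

/* Write path sketch: data is copied in chunks (chunk must be non-zero) and
 * the reader wakeup is decided once, at the end of the whole write. Plain
 * data-transfer pipes only wake readers when the pipe was empty; polled
 * pipes always get the extra wakeup that legacy EPOLLET users expect. */
static void toy_write(struct toy_pipe *p, size_t len, size_t chunk)
{
	bool was_empty = (p->unread == 0);

	while (len > 0) {
		size_t n = len < chunk ? len : chunk;

		p->unread += n;
		len -= n;
	}
	if (was_empty || p->poll_usage)
		p->wakeups++;
}
```

      Three back-to-back writes to a never-polled pipe cause a single wakeup;
      once the pipe has been polled, every write ends with one.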
      
      [ For stable people: I'm putting a "Fixes" tag on this, but I leave it
        up to you to decide whether you actually want to backport it or not.
        It likely has no impact outside of synthetic benchmarks  - Linus ]
      
      Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/
      Fixes: 3a34b13a ("pipe: make pipe writes always wake up readers")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Tested-by: Sandeep Patil <sspatil@android.com>
      Tested-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix race between marking inode needs to be logged and log syncing · d845f89d
      Filipe Manana authored
      commit bc0939fc upstream.
      
      We have a race between marking that an inode needs to be logged, either
      at btrfs_set_inode_last_trans() or at btrfs_page_mkwrite(), and
      btrfs_sync_log(). The following steps describe how the race happens.
      
      1) We are at transaction N;
      
      2) Inode I was previously fsynced in the current transaction so it has:
      
          inode->logged_trans set to N;
      
      3) The inode's root currently has:
      
         root->log_transid set to 1
         root->last_log_commit set to 0
      
         Which means only one log transaction was committed so far, log
         transaction 0. When a log tree is created we set ->log_transid and
         ->last_log_commit of its parent root to 0 (at btrfs_add_log_tree());
      
      4) One more range of pages is dirtied in inode I;
      
      5) Some task A starts an fsync against some other inode J (same root), and
         so it joins log transaction 1.
      
         Before task A calls btrfs_sync_log()...
      
      6) Task B starts an fsync against inode I, which currently has the full
         sync flag set, so it starts delalloc and waits for the ordered extent
         to complete before calling btrfs_inode_in_log() at btrfs_sync_file();
      
      7) During ordered extent completion we have btrfs_update_inode() called
         against inode I, which in turn calls btrfs_set_inode_last_trans(),
         which does the following:
      
           spin_lock(&inode->lock);
           inode->last_trans = trans->transaction->transid;
           inode->last_sub_trans = inode->root->log_transid;
           inode->last_log_commit = inode->root->last_log_commit;
           spin_unlock(&inode->lock);
      
         So ->last_trans is set to N and ->last_sub_trans set to 1.
         But before setting ->last_log_commit...
      
      8) Task A is at btrfs_sync_log():
      
         - it increments root->log_transid to 2
         - starts writeback for all log tree extent buffers
         - waits for the writeback to complete
         - writes the super blocks
         - updates root->last_log_commit to 1
      
         There are a lot of slow steps between updating root->log_transid and
         root->last_log_commit;
      
      9) The task doing the ordered extent completion, currently at
         btrfs_set_inode_last_trans(), then finally runs:
      
           inode->last_log_commit = inode->root->last_log_commit;
           spin_unlock(&inode->lock);
      
         Which results in inode->last_log_commit being set to 1.
         The ordered extent completes;
      
      10) Task B is resumed, and it calls btrfs_inode_in_log() which returns
          true because we have all the following conditions met:
      
          inode->logged_trans == N which matches fs_info->generation &&
          inode->last_sub_trans (1) <= inode->last_log_commit (1) &&
          inode->last_sub_trans (1) <= root->last_log_commit (1) &&
          list inode->extent_tree.modified_extents is empty
      
          And as a consequence we return without logging the inode, so the
          existing logged version of the inode does not point to the extent
          that was written after the previous fsync.
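
      The interleaving in steps 7 to 10 can be replayed sequentially in a toy
      C model. The structures are hypothetical reductions holding only the
      fields named in the steps, and `toy_inode_in_log()` merely encodes the
      four conditions from step 10; it is not the kernel's
      btrfs_inode_in_log().

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy versions of the fields named above, NOT struct btrfs_root/btrfs_inode. */
struct toy_root {
	int log_transid;
	int last_log_commit;
};

struct toy_inode {
	uint64_t logged_trans;
	int last_sub_trans;
	int last_log_commit;
	bool has_modified_extents; /* stands in for the modified_extents list */
	struct toy_root *root;
};

/* The four conditions listed in step 10. */
static bool toy_inode_in_log(const struct toy_inode *inode, uint64_t generation)
{
	return inode->logged_trans == generation &&
	       inode->last_sub_trans <= inode->last_log_commit &&
	       inode->last_sub_trans <= inode->root->last_log_commit &&
	       !inode->has_modified_extents;
}

/* Sequential replay of steps 7 to 9: the inode reads root->log_transid,
 * task A then runs btrfs_sync_log() to completion, and only afterwards
 * does the inode read root->last_log_commit. */
static void toy_replay_race(struct toy_inode *inode)
{
	struct toy_root *root = inode->root;

	inode->last_sub_trans = root->log_transid;      /* step 7: reads 1 */
	root->log_transid++;                            /* step 8: 1 -> 2 */
	root->last_log_commit = root->log_transid - 1;  /* step 8: 0 -> 1 */
	inode->last_log_commit = root->last_log_commit; /* step 9: reads 1 */
}
```

      Starting from log_transid 1, last_log_commit 0 and transaction N = 7,
      the replay leaves both inode fields at 1 and the check reports the
      inode as already logged, so the fsync would skip the new extent.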
      
      It should be impossible in practice for one task to make so much
      progress in btrfs_sync_log() while another task is at
      btrfs_set_inode_last_trans() right after it reads root->log_transid and
      before it reads root->last_log_commit. Even if kernel preemption is enabled,
      we know the task at btrfs_set_inode_last_trans() cannot be preempted
      because it is holding the inode's spinlock.
      
      However there is another place where we do the same without holding the
      spinlock, which is in the memory mapped write path at:
      
        vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
        {
           (...)
           BTRFS_I(inode)->last_trans = fs_info->generation;
           BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
           BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
           (...)
      
      So with preemption happening after setting ->last_sub_trans and before
      setting ->last_log_commit, it is less of a stretch to have another task
      make enough progress in btrfs_sync_log() such that the task doing the memory
      mapped write ends up with ->last_sub_trans and ->last_log_commit set to
      the same value. It is still a big stretch to get there, as the task doing
      btrfs_sync_log() has to start writeback, wait for its completion and write
      the super blocks.
      
      So fix this in two different ways:
      
      1) For btrfs_set_inode_last_trans(), simply set ->last_log_commit to the
         value of ->last_sub_trans minus 1;
      
      2) For btrfs_page_mkwrite() only set the inode's ->last_sub_trans, just
         like we do for buffered and direct writes at btrfs_file_write_iter(),
         which is all we need to make sure multiple writes and fsyncs to an
         inode in the same transaction never result in an fsync missing that
         the inode changed and needs to be logged. Turn this into a helper
         function and use it both at btrfs_page_mkwrite() and at
         btrfs_file_write_iter() - this also fixes the problem that at
         btrfs_page_mkwrite() we were setting those fields without the
         protection of the inode's spinlock.
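
      Both fixes can be sketched against the same kind of toy model,
      assuming hypothetical structures and helper names (the kernel helper
      may be named differently, and the real code holds inode->lock, which
      the toy omits). Because fix 1 never rereads root->last_log_commit,
      task A's progress no longer matters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy versions of the relevant fields, NOT the real btrfs structures. */
struct toy_root {
	int log_transid;
	int last_log_commit;
};

struct toy_inode {
	uint64_t logged_trans;
	int last_sub_trans;
	int last_log_commit;
	bool has_modified_extents;
	struct toy_root *root;
};

/* Same four conditions as checked by btrfs_inode_in_log(). */
static bool toy_inode_in_log(const struct toy_inode *inode, uint64_t generation)
{
	return inode->logged_trans == generation &&
	       inode->last_sub_trans <= inode->last_log_commit &&
	       inode->last_sub_trans <= inode->root->last_log_commit &&
	       !inode->has_modified_extents;
}

/* Fix 1 (btrfs_set_inode_last_trans()): derive last_log_commit from the
 * value just read instead of rereading root->last_log_commit. */
static void toy_set_inode_last_trans(struct toy_inode *inode)
{
	inode->last_sub_trans = inode->root->log_transid;
	inode->last_log_commit = inode->last_sub_trans - 1;
}

/* Fix 2 (hypothetical helper name): write paths, including the mmap one,
 * only record last_sub_trans. */
static void toy_set_inode_last_sub_trans(struct toy_inode *inode)
{
	inode->last_sub_trans = inode->root->log_transid;
}
```

      Even if task A commits log transaction 1 right after the update, the
      inode keeps last_log_commit at 0, so the fsync sees that the inode
      still needs to be logged.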
      
      This is an extremely unlikely race to happen in practice.

      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/rds: dma_map_sg is entitled to merge entries · 6f38d95f
      Gerd Rausch authored
      [ Upstream commit fb4b1373 ]
      
      Function "dma_map_sg" is entitled to merge adjacent entries
      and return a value smaller than what was passed as "nents".
      
      Subsequently "ib_map_mr_sg" needs to work with this value ("sg_dma_len")
      rather than the original "nents" parameter ("sg_len").
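
      A toy illustration of why the returned count matters, assuming an
      in-place merge for simplicity (the real dma_map_sg() records merged
      addresses and lengths in the scatterlist's DMA fields instead of
      overwriting entries). `struct toy_sg`, `toy_dma_map_sg()` and
      `toy_map_mr_sg()` are hypothetical.

```c
#include <stddef.h>

/* Toy scatterlist entry: a DMA address and a length. */
struct toy_sg {
	unsigned long dma_addr;
	size_t len;
};

/* Stand-in for dma_map_sg(): merges DMA-contiguous neighbours in place
 * and returns how many entries remain (possibly fewer than nents). */
static int toy_dma_map_sg(struct toy_sg *sg, int nents)
{
	int out = 0;

	for (int i = 0; i < nents; i++) {
		if (out > 0 &&
		    sg[out - 1].dma_addr + sg[out - 1].len == sg[i].dma_addr)
			sg[out - 1].len += sg[i].len;
		else
			sg[out++] = sg[i];
	}
	return out;
}

/* Stand-in for ib_map_mr_sg(): walks exactly 'count' entries and returns
 * the number of bytes it would register. */
static size_t toy_map_mr_sg(const struct toy_sg *sg, int count)
{
	size_t total = 0;

	for (int i = 0; i < count; i++)
		total += sg[i].len;
	return total;
}
```

      With three 0x1000-byte entries of which the first two are contiguous,
      the mapping returns 2 and walking 2 entries covers 0x3000 bytes.
      Walking the original 3 re-reads a stale slot and counts 0x4000, the
      kind of mismatch the fix avoids by passing the returned value.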
      
      This old RDS bug was exposed and reliably causes kernel panics
      (using RDMA operations "rds-stress -D") on x86_64 starting with:
      commit c588072b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
      
      Simply put: Linux 5.11 and later.

      Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Link: https://lore.kernel.org/r/60efc69f-1f35-529d-a7ef-da0549cad143@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences · b882dda2
      Ben Skeggs authored
      [ Upstream commit e78b1b54 ]
      
      Should fix some initial modeset failures on (at least) Ampere boards.

      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Reviewed-by: Lyude Paul <lyude@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/nouveau/disp: power down unused DP links during init · 7f422cda
      Ben Skeggs authored
      [ Upstream commit 6eaa1f3c ]
      
      When booted with multiple displays attached, the EFI GOP driver on (at
      least) Ampere can leave DP links powered up that aren't being used to
      display anything.  This confuses our tracking of SOR routing, with the
      likely result being a failed modeset and display engine hang.
      
      Fix this by (ab?)using the DisableLT IED script to power down the link,
      restoring HW to a state the driver expects.

      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Reviewed-by: Lyude Paul <lyude@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm: Copy drm_wait_vblank to user before returning · 6fd6e205
      Mark Yacoub authored
      [ Upstream commit fa0b1ef5 ]
      
      [Why]
      Userspace should get back a copy of drm_wait_vblank that's been modified
      even when drm_wait_vblank_ioctl returns a failure.
      
      Rationale:
      drm_wait_vblank_ioctl modifies the request and expects the user to read
      it back. When the type is RELATIVE, it changes it to ABSOLUTE and updates
      the sequence to current_vblank_count + sequence, turning the relative
      sequence into an absolute one.
      drmWaitVBlank (in libdrm) expects this to be the case: it modifies the
      request to be ABSOLUTE, so it expects the sequence to have been
      updated.
      
      The change is in compat_drm_wait_vblank, which is called by
      drm_compat_ioctl. This change of copying the data back regardless of the
      return value brings it on par with drm_ioctl, which always copies the
      data before returning.
      
      [How]
      Return from the function after everything has been copied to user.
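
      A toy C sketch of the fixed control flow, assuming simplified types:
      `struct toy_wait_vblank` and both functions are hypothetical, and
      memcpy() stands in for copy_from_user()/copy_to_user(). The point is
      that the copy back happens before the error code is returned.

```c
#include <errno.h>
#include <string.h>

enum toy_vbl_type { TOY_VBLANK_ABSOLUTE, TOY_VBLANK_RELATIVE };

struct toy_wait_vblank {
	enum toy_vbl_type type;
	unsigned int sequence;
};

/* Stand-in for drm_wait_vblank_ioctl(): rewrites a RELATIVE request into
 * an ABSOLUTE one and may then still fail (e.g. with -EINTR). */
static int toy_wait_vblank_ioctl(struct toy_wait_vblank *req,
				 unsigned int current_count, int fail_with)
{
	if (req->type == TOY_VBLANK_RELATIVE) {
		req->sequence += current_count;
		req->type = TOY_VBLANK_ABSOLUTE;
	}
	return fail_with;
}

/* Compat wrapper after the fix: the possibly rewritten request is copied
 * back to the caller whatever the inner call returned. */
static int toy_compat_wait_vblank(struct toy_wait_vblank *user_req,
				  unsigned int current_count, int fail_with)
{
	struct toy_wait_vblank req;
	int ret;

	memcpy(&req, user_req, sizeof(req));  /* copy_from_user() */
	ret = toy_wait_vblank_ioctl(&req, current_count, fail_with);
	memcpy(user_req, &req, sizeof(req));  /* copy_to_user(), unconditional */
	return ret;
}
```

      A RELATIVE request for sequence 1 at vblank count 100 that is
      interrupted still comes back as ABSOLUTE 101, so a caller retrying
      after -EINTR waits for the same vblank instead of a later one.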
      
      Fixes IGT:kms_flip::modeset-vs-vblank-race-interruptible
      Tested on ChromeOS Trogdor(msm)

      Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
      Signed-off-by: Mark Yacoub <markyacoub@chromium.org>
      Signed-off-by: Sean Paul <seanpaul@chromium.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210812194917.1703356-1-markyacoub@chromium.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>