1. 03 Sep, 2020 40 commits
    • x86/irq: Unbreak interrupt affinity setting · bbf423c2
      Thomas Gleixner authored
      commit e027ffff upstream.
      
      Several people reported that 5.8 broke the interrupt affinity setting
      mechanism.
      
      The consolidation of the entry code reused the regular exception entry code
      for device interrupts and changed how the vector number is conveyed: from
      ptregs->orig_ax to a function argument.
      
      The low level entry uses the hardware error code slot to push the vector
      number onto the stack which is retrieved from there into a function
      argument and the slot on stack is set to -1.
      
      The reason for setting it to -1 is that the error code slot is at the
      position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax
      indicates that the entry came via a syscall. If it's not set to a negative
      value then a signal delivery on return to userspace would try to restart a
      syscall. But there are other places which rely on pt_regs::orig_ax being a
      valid indicator for syscall entry.
      
      But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the
      interrupt affinity setting mechanism, which was overlooked when this change
      was made.
      
      Moving interrupts on x86 happens in several steps. A new vector on a
      different CPU is allocated and the relevant interrupt source is
      reprogrammed to that. But that's racy and there might be an interrupt
      already in flight to the old vector. So the old vector is preserved until
      the first interrupt arrives on the new vector and the new target CPU. Once
      that happens the old vector is cleaned up, but this cleanup still depends
      on the vector number being stored in pt_regs::orig_ax, which is now -1.
      
      That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector
      always false. As a consequence the interrupt is moved once, but then it
      cannot be moved anymore because the cleanup of the old vector never
      happens.
      
      There would be several ways to convey the vector information to that place
      in the guts of the interrupt handling, but on deeper inspection it turned
      out that this check is pointless and a leftover from the old affinity model
      of X86 which supported multi-CPU affinities. Under this model it was
      possible that an interrupt had an old and a new vector on the same CPU, so
      the vector match was required.
      
      Under the new model the effective affinity of an interrupt is always a
      single CPU from the requested affinity mask. If the affinity mask changes
      then either the interrupt stays on the CPU and on the same vector when that
      CPU is still in the new affinity mask or it is moved to a different CPU, but
      it is never moved to a different vector on the same CPU.
      
      Ergo the cleanup check for the matching vector number is not required and
      can be removed, which makes the dependency on pt_regs::orig_ax go away.

      The remaining check for new_cpu == smp_processor_id() is completely
      sufficient. If it matches then the interrupt was successfully migrated and
      the cleanup can proceed.

      For paranoia's sake add a warning into the vector assignment code to
      validate that the assumption of never moving to a different vector on
      the same CPU holds.
      
      Fixes: 633260fa ("x86/irq: Convey vector as argument and not in ptregs")
      Reported-by: Alex bykov <alex.bykov@scylladb.com>
      Reported-by: Avi Kivity <avi@scylladb.com>
      Reported-by: Alexander Graf <graf@amazon.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Alexander Graf <graf@amazon.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
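The difference between the old and the new cleanup condition described in this commit can be sketched as below. This is a minimal model, not the kernel code: `struct vec_state` and both helper functions are hypothetical names, and the only facts taken from the commit message are that pt_regs::orig_ax now holds -1 and that the CPU comparison alone is sufficient.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-interrupt vector state. */
struct vec_state {
	unsigned int new_cpu;     /* CPU the interrupt was moved to */
	unsigned int new_vector;  /* vector allocated on that CPU */
	bool move_in_progress;    /* old vector still awaits cleanup */
};

/* Old condition: also compares the vector with pt_regs::orig_ax. */
static bool cleanup_needed_old(const struct vec_state *s, long orig_ax,
			       unsigned int this_cpu)
{
	return s->move_in_progress &&
	       orig_ax == (long)s->new_vector && this_cpu == s->new_cpu;
}

/* New condition: only the CPU has to match. */
static bool cleanup_needed_new(const struct vec_state *s,
			       unsigned int this_cpu)
{
	return s->move_in_progress && this_cpu == s->new_cpu;
}
```

With orig_ax fixed at -1 the old condition can never become true, so in this model the old vector would never be released, while the new condition fires on the first interrupt that arrives on the target CPU.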
    • irqchip/stm32-exti: Avoid losing interrupts due to clearing pending bits by mistake · 66e1e9bd
      qiuguorui1 authored
      commit e579076a upstream.
      
      In the current code, when the eoi callback of the exti clears the pending
      bit of the current interrupt, it first reads the values of fpr and
      rpr, then ORs in the bit corresponding to the interrupt number,
      and finally writes the result back to fpr and rpr.

      We found through experiments that this loses interrupts when two exti
      interrupts, call them int1 and int2, arrive at almost the same time. In
      our scenario the time difference is 30 microseconds, with int1 triggered
      first.

      In the extreme case, the pending bits of both interrupts are set to 1.
      The irq handler of int1 runs first, followed by its eoi handler. At that
      moment all pending bits are cleared, although int2 has not yet been
      reported to the CPU, so int2 is eventually lost.

      According to stm32's TRM description of rpr and fpr: Writing a 1 to this
      bit will trigger a rising edge event on event x. Writing 0 has no
      effect.

      Therefore, when clearing the pending bit, we only need to clear the
      pending bit of the irq.
      
      Fixes: 927abfc4 ("irqchip/stm32: Add stm32mp1 support with hierarchy domain")
      Signed-off-by: qiuguorui1 <qiuguorui1@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org # v4.18+
      Link: https://lore.kernel.org/r/20200820031629.15582-1-qiuguorui1@huawei.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
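The lost-interrupt scenario can be reproduced with a small simulation. The sketch below assumes that the pending register behaves as write-1-to-clear (a written 1 clears that pending bit, a written 0 leaves it alone); the register variable and all function names are illustrative and are not the driver's actual helpers.

```c
#include <stdint.h>

/* Simulated pending register with assumed write-1-to-clear semantics. */
static uint32_t pending;

static uint32_t reg_read(void)
{
	return pending;
}

static void reg_write(uint32_t val)
{
	pending &= ~val;  /* every 1 written clears that pending bit */
}

/* Buggy variant: read-modify-write also writes back all other set bits. */
static void clear_pending_rmw(unsigned int bit)
{
	uint32_t val = reg_read();

	val |= UINT32_C(1) << bit;
	reg_write(val);
}

/* Fixed variant: write only the bit that belongs to this interrupt. */
static void clear_pending_bit(unsigned int bit)
{
	reg_write(UINT32_C(1) << bit);
}
```

Starting from two pending bits, the read-modify-write variant clears both of them when only one interrupt has been handled, whereas the single-bit write leaves the second interrupt pending.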
    • genirq/matrix: Deal with the sillyness of for_each_cpu() on UP · a1b11651
      Thomas Gleixner authored
      commit 784a0830 upstream.
      
      Most of the CPU mask operations behave the same way, but for_each_cpu() and
      its variants ignore the cpumask argument and claim that CPU0 is always in
      the mask. This is historical, inconsistent and annoying behaviour.
      
      The matrix allocator uses for_each_cpu() and can be called on UP with an
      empty cpumask. The calling code does not expect that this succeeds but
      until commit e027ffff ("x86/irq: Unbreak interrupt affinity setting")
      this went unnoticed. That commit added a WARN_ON() to catch cases which
      move an interrupt from one vector to another on the same CPU. The warning
      triggers on UP.
      
      Add a check for the cpumask being empty to prevent this.
      
      Fixes: 2f75d9e1 ("genirq: Implement bitmap matrix allocator")
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
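The uniprocessor pitfall can be illustrated with a toy model. Everything here is a hypothetical stand-in: `struct cpumask` is reduced to a single word, `for_each_cpu_up()` mimics an iterator that ignores its mask, and the two lookup functions are not the matrix allocator's real interface.

```c
#include <stdbool.h>

/* Toy one-word cpumask for a uniprocessor build (hypothetical model). */
struct cpumask {
	unsigned long bits;
};

static bool cpumask_empty_up(const struct cpumask *m)
{
	return m->bits == 0;
}

/* Mimics the UP iterator: the mask is ignored and CPU0 is always visited. */
#define for_each_cpu_up(cpu, mask) \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

/* Returns the first CPU the iterator yields, or -1 if it yields none. */
static int find_cpu_unchecked(const struct cpumask *m)
{
	int cpu;

	for_each_cpu_up(cpu, m)
		return cpu;
	return -1;
}

/* Same lookup, guarded by an explicit emptiness check. */
static int find_cpu_checked(const struct cpumask *m)
{
	if (cpumask_empty_up(m))
		return -1;
	return find_cpu_unchecked(m);
}
```

In this model the unchecked lookup reports CPU0 even for an empty mask, which is the unexpected success the commit describes; the guarded version fails as the caller expects.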
    • usbip: Implement a match function to fix usbip · 8cb3561d
      M. Vefa Bicakci authored
      commit 7a2f2974 upstream.
      
      Commit 88b7381a ("USB: Select better matching USB drivers when
      available") introduced the use of a "match" function to select a
      non-generic/better driver for a particular USB device. This
      unfortunately breaks the operation of usbip in general, as reported in
      the kernel bugzilla with bug 208267 (linked below).
      
      Upon inspecting the aforementioned commit, one can observe that the
      original code in the usb_device_match function used to return 1
      unconditionally, but the aforementioned commit makes the usb_device_match
      function use identifier tables and "match" virtual functions, if either of
      them are available.
      
      Hence, this commit implements a match function for usbip that
      unconditionally returns true to ensure that usbip is functional again.
      
      This change has been verified to restore usbip functionality, with a
      v5.7.y kernel on an up-to-date version of Qubes OS 4.0, which uses
      usbip to redirect USB devices between VMs.
      
      Thanks to Jonathan Dieter for the effort in bisecting this issue down
      to the aforementioned commit.
      
      Fixes: 88b7381a ("USB: Select better matching USB drivers when available")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=208267
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1856443
      Link: https://github.com/QubesOS/qubes-issues/issues/5905
      
      Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
      Cc: <stable@vger.kernel.org> # 5.7
      Cc: Valentina Manea <valentina.manea.m@gmail.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Reviewed-by: Bastien Nocera <hadess@hadess.net>
      Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
      Link: https://lore.kernel.org/r/20200810160017.46002-1-m.v.b@runbox.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: af_alg - Work around empty control messages without MSG_MORE · 3c491c44
      Herbert Xu authored
      commit c195d66a upstream.
      
      The iwd daemon uses libell which sets up the skcipher operation with
      two separate control messages.  As the first control message is sent
      without MSG_MORE, it is interpreted as an empty request.
      
      While libell should be fixed to use MSG_MORE where appropriate, this
      patch works around the bug in the kernel so that existing binaries
      continue to work.
      
      We will print a warning however.
      
      A separate issue is that the new kernel code no longer allows the
      control message to be sent twice within the same request.  This
      restriction is obviously incompatible with what iwd was doing (first
      setting an IV and then sending the real control message).  This
      patch changes the kernel so that this is explicitly allowed.

      Reported-by: Caleb Jorden <caljorden@hotmail.com>
      Fixes: f3c802a1 ("crypto: algif_aead - Only wake up when...")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • device property: Fix the secondary firmware node handling in set_primary_fwnode() · 1d35dfde
      Heikki Krogerus authored
      commit c15e1bdd upstream.
      
      When the primary firmware node pointer is removed from a
      device (set to NULL) the secondary firmware node pointer,
      when it exists, is made the primary node for the device.
      However, the secondary firmware node pointer of the original
      primary firmware node is never cleared (set to NULL).
      
      To avoid a situation where the secondary firmware node pointer
      points to a non-existing object, clear it properly
      when the primary node is removed from a device in
      set_primary_fwnode().
      
      Fixes: 97badf87 ("device property: Make it possible to use secondary firmware nodes")
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
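The stale-pointer problem can be shown with a stripped-down model. The structures and `remove_primary_fwnode()` below are hypothetical simplifications; in particular the model assumes that `dev->fwnode` always points at the primary node, which the real set_primary_fwnode() has to determine itself.

```c
#include <stddef.h>

/* Hypothetical, simplified model of a firmware node and a device. */
struct fwnode {
	struct fwnode *secondary;
};

struct device {
	struct fwnode *fwnode;
};

/* Remove the primary node: promote its secondary and drop the stale link. */
static void remove_primary_fwnode(struct device *dev)
{
	struct fwnode *primary = dev->fwnode;

	if (!primary)
		return;

	dev->fwnode = primary->secondary;
	primary->secondary = NULL;  /* the step the fix adds */
}
```

Without the last assignment the removed primary node would keep pointing at a secondary node whose lifetime it no longer controls.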
    • powerpc/perf: Fix crashes with generic_compat_pmu & BHRB · 9a9cc8c9
      Alexey Kardashevskiy authored
      commit b460b512 upstream.
      
      The bhrb_filter_map ("The Branch History Rolling Buffer") callback is
      only defined in raw CPUs' power_pmu structs. The "architected" CPUs
      use generic_compat_pmu, which does not have this callback, and crashes
      occur if a user tries to enable branch stack for an event.
      
      This adds a NULL pointer check for bhrb_filter_map() which behaves as
      if the callback returned an error.
      
      This does not add the same check for config_bhrb() as the only caller
      checks for cpuhw->bhrb_users which remains zero if bhrb_filter_map==0.
      
      Fixes: be80e758 ("powerpc/perf: Add generic compat mode pmu driver")
      Cc: stable@vger.kernel.org # v5.2+
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200602025612.62707-1-aik@ozlabs.ru
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
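The guard described in this commit follows a common optional-callback pattern, sketched below. `struct pmu_desc`, `BHRB_FILTER_ERROR`, `get_bhrb_filter()` and `raw_filter_map()` are hypothetical names chosen for illustration; only the idea of treating a missing callback like a failing one comes from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a PMU descriptor; the callback may be NULL. */
struct pmu_desc {
	uint64_t (*bhrb_filter_map)(uint64_t branch_sample_type);
};

/* Illustrative error value returned when no filter can be produced. */
#define BHRB_FILTER_ERROR ((uint64_t)-1)

static uint64_t get_bhrb_filter(const struct pmu_desc *pmu, uint64_t type)
{
	/* A missing callback is treated as if the callback had failed. */
	if (!pmu->bhrb_filter_map)
		return BHRB_FILTER_ERROR;
	return pmu->bhrb_filter_map(type);
}

/* Example callback for a PMU that does implement the filter. */
static uint64_t raw_filter_map(uint64_t type)
{
	return type & 0xff;
}
```

A descriptor without the callback now yields the error value instead of a NULL dereference, so the caller can reject the event.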
    • powerpc/32s: Disable VMAP stack which CONFIG_ADB_PMU · bdae0167
      Christophe Leroy authored
      commit 4a133eb3 upstream.
      
      low_sleep_handler() can't restore the context from virtual
      stack because the stack can hardly be accessed with MMU OFF.
      
      For now, disable VMAP stack when CONFIG_ADB_PMU is selected.
      
      Fixes: cd08f109 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
      Cc: stable@vger.kernel.org # v5.6+
      Reported-by: Giuseppe Sacco <giuseppe@sguazz.it>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/ec96c15bfa1a7415ab604ee1c98cd45779c08be0.1598553015.git.christophe.leroy@csgroup.eu
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PM: sleep: core: Fix the handling of pending runtime resume requests · 3b555853
      Rafael J. Wysocki authored
      commit e3eb6e8f upstream.
      
      It has been reported that system-wide suspend may be aborted in the
      absence of any wakeup events due to unforeseen interactions of it with
      the runtime PM framework.
      
      One failing scenario is when there are multiple devices sharing an
      ACPI power resource and runtime-resume needs to be carried out for
      one of them during system-wide suspend (for example, because it needs
      to be reconfigured before the whole system goes to sleep).  In that
      case, the runtime-resume of that device involves turning the ACPI
      power resource "on" which in turn causes runtime-resume requests
      to be queued up for all of the other devices sharing it.  Those
      requests go to the runtime PM workqueue which is frozen during
      system-wide suspend, so they are not actually taken care of until
      the resume of the whole system, but the pm_runtime_barrier()
      call in __device_suspend() sees them and triggers system wakeup
      events for them which then cause the system-wide suspend to be
      aborted if wakeup source objects are in active use.
      
      Of course, the logic that leads to triggering those wakeup events is
      questionable in the first place, because clearly there are cases in
      which a pending runtime resume request for a device is not connected
      to any real wakeup events in any way (like the one above).  Moreover,
      it is racy, because the device may be resuming already by the time
      the pm_runtime_barrier() runs and so if the driver doesn't take care
      of signaling the wakeup event as appropriate, it will be lost.
      However, if the driver does take care of that, the extra
      pm_wakeup_event() call in the core is redundant.
      
      Accordingly, drop the conditional pm_wakeup_event() call from
      __device_suspend() and make the latter call pm_runtime_barrier()
      alone.  Also modify the comment next to that call to reflect the new
      code and extend it to mention the need to avoid unwanted interactions
      between runtime PM and system-wide device suspend callbacks.
      
      Fixes: 1e2ef05b ("PM: Limit race conditions between runtime PM and system sleep (v2)")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Reported-by: Utkarsh H Patel <utkarsh.h.patel@intel.com>
      Tested-by: Utkarsh H Patel <utkarsh.h.patel@intel.com>
      Tested-by: Pengfei Xu <pengfei.xu@intel.com>
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: vdso32: make vdso32 install conditional · 17d66e62
      Frank van der Linden authored
      commit 5d28ba5f upstream.
      
      vdso32 should only be installed if CONFIG_COMPAT_VDSO is enabled,
      since it's not even supposed to be compiled otherwise, and arm64
      builds without a 32bit crosscompiler will fail.
      
      Fixes: 8d75785a ("ARM64: vdso32: Install vdso32 from vdso_install")
      Signed-off-by: Frank van der Linden <fllinden@amazon.com>
      Cc: stable@vger.kernel.org [5.4+]
      Link: https://lore.kernel.org/r/20200827234012.19757-1-fllinden@amazon.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: arm64: Set HCR_EL2.PTW to prevent AT taking synchronous exception · d36a6712
      James Morse authored
      commit 71a7f8cb upstream.
      
      AT instructions do a translation table walk and return the result, or
      the fault in PAR_EL1. KVM uses these to find the IPA when the value is
      not provided by the CPU in HPFAR_EL1.
      
      If a translation table walk causes an external abort it is taken as an
      exception, even if it was due to an AT instruction. (DDI0487F.a's D5.2.11
      "Synchronous faults generated by address translation instructions")
      
      While we previously made KVM resilient to exceptions taken due to AT
      instructions, the device access causes mismatched attributes, and may
      occur speculatively. Prevent this, by forbidding a walk through memory
      described as device at stage2. Now such AT instructions will report a
      stage2 fault.
      
      Such a fault will cause KVM to restart the guest. If the AT instructions
      always walk the page tables, but guest execution uses the translation cached
      in the TLB, the guest can't make forward progress until the TLB entry is
      evicted. This isn't a problem, as since commit 5dcd0fdb ("KVM: arm64:
      Defer guest entry when an asynchronous exception is pending"), KVM will
      return to the host to process IRQs allowing the rest of the system to keep
      running.
      
      Cc: stable@vger.kernel.org # <v5.3: 5dcd0fdb ("KVM: arm64: Defer guest entry when an asynchronous exception is pending")
      Signed-off-by: James Morse <james.morse@arm.com>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • io-wq: fix hang after cancelling pending hashed work · 13f35a2c
      Pavel Begunkov authored
      commit 204361a7 upstream.
      
      Don't forget to update wqe->hash_tail after cancelling a pending work
      item, if it was hashed.
      
      Cc: stable@vger.kernel.org # 5.7+
      Reported-by: Dmitry Shulyak <yashulyak@gmail.com>
      Fixes: 86f3cd1b ("io-wq: handle hashed writes in chains")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xhci: Always restore EP_SOFT_CLEAR_TOGGLE even if ep reset failed · 96d020dd
      Ding Hui authored
      commit f1ec7ae6 upstream.
      
      Some device drivers call libusb_clear_halt when the target ep queue
      is not empty (e.g. a spice client connected to qemu for usb redir).

      Before commit f5249461 ("xhci: Clear the host side toggle
      manually when endpoint is soft reset"), that worked well.
      But now we get the error log:
      
          EP not empty, refuse reset
      
      xhci_endpoint_reset failed and left ep_state's EP_SOFT_CLEAR_TOGGLE
      bit still set.

      So all the subsequent URB submissions to the ep will fail with the
      warning log:
      
          Can't enqueue URB while manually clearing toggle
      
      We need to clear ep_state EP_SOFT_CLEAR_TOGGLE bit after
      xhci_endpoint_reset, even if it failed.
      
      Fixes: f5249461 ("xhci: Clear the host side toggle manually when endpoint is soft reset")
      Cc: stable <stable@vger.kernel.org> # v4.17+
      Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Link: https://lore.kernel.org/r/20200821091549.20556-4-mathias.nyman@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
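The "always restore the flag" idea can be sketched with a single cleanup path. This is an illustrative model only: `struct ep_model`, the bit position of `EP_SOFT_CLEAR_TOGGLE` and both functions are assumptions, and the flag is assumed to have been set by an earlier step before the reset is attempted.

```c
#include <stdbool.h>

#define EP_SOFT_CLEAR_TOGGLE (1u << 0)  /* illustrative bit position */

/* Hypothetical, reduced endpoint state. */
struct ep_model {
	unsigned int ep_state;
	unsigned int pending_urbs;
};

/* Models the reset step; returns false when the reset is refused. */
static bool endpoint_reset(struct ep_model *ep)
{
	bool ok = false;

	if (ep->pending_urbs != 0)
		goto cleanup;  /* "EP not empty, refuse reset" */

	ok = true;  /* the toggle itself would be cleared here */
cleanup:
	/* Restore the flag on every exit path, success or failure. */
	ep->ep_state &= ~EP_SOFT_CLEAR_TOGGLE;
	return ok;
}

/* Enqueueing is refused while the flag is set. */
static bool can_enqueue_urb(const struct ep_model *ep)
{
	return !(ep->ep_state & EP_SOFT_CLEAR_TOGGLE);
}
```

Routing the failure case through the same cleanup label keeps a refused reset from blocking every later URB submission.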
    • xhci: Do warm-reset when both CAS and XDEV_RESUME are set · 7d31eaa7
      Kai-Heng Feng authored
      commit 904df64a upstream.
      
      Sometimes re-plugging a USB device during system sleep renders the device
      useless:
      [  173.418345] xhci_hcd 0000:00:14.0: Get port status 2-4 read: 0x14203e2, return 0x10262
      ...
      [  176.496485] usb 2-4: Waited 2000ms for CONNECT
      [  176.496781] usb usb2-port4: status 0000.0262 after resume, -19
      [  176.497103] usb 2-4: can't resume, status -19
      [  176.497438] usb usb2-port4: logical disconnect
      
      Because PLS equals XDEV_RESUME, the xHCI driver reports U3 to usbcore,
      even though the CAS bit is flagged.

      So prioritize CAS over XDEV_RESUME to let usbcore handle warm-reset for
      the port.
      
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Link: https://lore.kernel.org/r/20200821091549.20556-3-mathias.nyman@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
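The ordering of the two checks can be sketched against the port status value from the log above (0x14203e2). The bit definitions below reflect a reading of the PORTSC layout (PLS in bits 5 to 8, CAS in bit 24) and should be treated as assumptions; `enum port_action` and `classify_port()` are hypothetical and do not mirror the driver's code structure.

```c
#include <stdint.h>

/* Assumed PORTSC layout: link state in bits 5..8, CAS in bit 24. */
#define PORT_PLS_MASK (0xfu << 5)
#define XDEV_RESUME   (0xfu << 5)
#define PORT_CAS      (1u << 24)

enum port_action { PORT_NORMAL, PORT_REPORT_U3, PORT_WARM_RESET };

static enum port_action classify_port(uint32_t portsc)
{
	/* CAS takes priority over the reported link state. */
	if (portsc & PORT_CAS)
		return PORT_WARM_RESET;
	if ((portsc & PORT_PLS_MASK) == XDEV_RESUME)
		return PORT_REPORT_U3;
	return PORT_NORMAL;
}
```

Under these assumptions the logged value has both CAS and XDEV_RESUME set, and checking CAS first selects the warm-reset path instead of reporting U3.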
    • usb: host: xhci: fix ep context print mismatch in debugfs · 50a7c735
      Li Jun authored
      commit 0077b1b2 upstream.
      
      dci is 0 based and xhci_get_ep_ctx() will do ep index increment to get
      the ep context.
      
      [rename dci to ep_index -Mathias]
      Cc: stable <stable@vger.kernel.org> # v4.15+
      Fixes: 02b6fdc2 ("usb: xhci: Add debugfs interface for xHCI driver")
      Signed-off-by: Li Jun <jun.li@nxp.com>
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Link: https://lore.kernel.org/r/20200821091549.20556-2-mathias.nyman@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • usb: host: xhci-tegra: fix tegra_xusb_get_phy() · 55c0eeab
      JC Kuo authored
      commit d54343a8 upstream.
      
      tegra_xusb_get_phy() should take input argument "name".

      Signed-off-by: JC Kuo <jckuo@nvidia.com>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200811092553.657762-1-jckuo@nvidia.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • usb: host: xhci-tegra: otg usb2/usb3 port init · c08e590b
      JC Kuo authored
      commit 316a2868 upstream.
      
      tegra_xusb_init_usb_phy() should initialize "otg_usb2_port" and
      "otg_usb3_port" with -EINVAL because "0" is a valid value that
      represents usb2 port 0 or usb3 port 0.

      Signed-off-by: JC Kuo <jckuo@nvidia.com>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200811093143.699541-1-jckuo@nvidia.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
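Why a zero default is ambiguous here can be shown in a few lines. The structure and helpers below are hypothetical; only the two field names and the choice of -EINVAL as the "no port" marker come from the commit message.

```c
#include <errno.h>

/* Hypothetical holder for the two OTG port indices. */
struct xusb_model {
	int otg_usb2_port;
	int otg_usb3_port;
};

/* Mark both ports as "not assigned"; 0 would mean "port 0". */
static void init_usb_phy(struct xusb_model *x)
{
	x->otg_usb2_port = -EINVAL;
	x->otg_usb3_port = -EINVAL;
}

/* Any non-negative value, including 0, is a real port index. */
static int has_otg_usb2_port(const struct xusb_model *x)
{
	return x->otg_usb2_port >= 0;
}
```

With a zero-initialized structure the check would wrongly report that port 0 is the OTG port even though none was ever configured.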
    • usb: renesas-xhci: remove version check · 68adec46
      Vinod Koul authored
      commit d66a57be upstream.
      
      Some devices in the wild report a bunch of different firmware versions,
      so remove the version check from the driver.

      Reported-by: Anastasios Vacharakis <vacharakis@gmail.com>
      Reported-by: Glen Journeay <journeay@gmail.com>
      Fixes: 2478be82 ("usb: renesas-xhci: Add ROM loader for uPD720201")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208911
      
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200818071739.789720-1-vkoul@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • XEN uses irqdesc::irq_data_common::handler_data to store a per interrupt XEN... · 2b32323f
      Thomas Gleixner authored
      XEN uses irqdesc::irq_data_common::handler_data to store a per interrupt XEN data pointer which contains XEN specific information.
      
      commit c330fb1d upstream.
      
      handler data is meant for interrupt handlers and not for storing irq chip
      specific information as some devices require handler data to store internal
      per interrupt information, e.g. pinctrl/GPIO chained interrupt handlers.
      
      This obviously creates a conflict of interests and crashes the machine
      because the XEN pointer is overwritten by the driver pointer.
      
      As the XEN data is not handler specific it should be stored in
      irqdesc::irq_data::chip_data instead.
      
      A simple sed s/irq_[sg]et_handler_data/irq_[sg]et_chip_data/ cures that.
      
      Cc: stable@vger.kernel.org
      Reported-by: Roman Shaposhnik <roman@zededa.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Roman Shaposhnik <roman@zededa.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Link: https://lore.kernel.org/r/87lfi2yckt.fsf@nanos.tec.linutronix.de
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
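The ownership conflict between the two data slots can be modelled as follows. `struct irq_model` is a hypothetical, flattened stand-in for the real descriptor, and the helper names are illustrative; they are not the irq_set_handler_data()/irq_set_chip_data() accessors named in the commit.

```c
#include <stddef.h>

/* Hypothetical, flattened stand-in for the per-interrupt descriptor. */
struct irq_model {
	void *handler_data;  /* owned by whoever installs the flow handler */
	void *chip_data;     /* owned by the irq chip implementation */
};

/* A chained-handler driver stores its own pointer in handler_data. */
static void install_chained_handler(struct irq_model *irq, void *driver_ptr)
{
	irq->handler_data = driver_ptr;
}

/* Chip-private information lives in its own slot. */
static void set_chip_info(struct irq_model *irq, void *info)
{
	irq->chip_data = info;
}

static void *get_chip_info(const struct irq_model *irq)
{
	return irq->chip_data;
}
```

Because the chip-private pointer no longer shares a slot with the handler's pointer, installing a chained handler cannot overwrite it.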
    • writeback: Fix sync livelock due to b_dirty_time processing · 05ae7cf3
      Jan Kara authored
      commit f9cae926 upstream.
      
      When we are processing writeback for sync(2), move_expired_inodes()
      didn't set any inode expiry value (older_than_this). This can result in
      writeback never completing if there's a steady stream of inodes added to
      the b_dirty_time list, as writeback rechecks dirty lists after each
      writeback round to see whether there's more work to be done. Fix the
      problem by using the sync(2) start time as the inode expiry value when
      processing the b_dirty_time list, similarly as for ordinarily dirtied
      inodes. This requires some refactoring of older_than_this handling, which
      simplifies the code noticeably as a bonus.
      
      Fixes: 0ae45f63 ("vfs: add support for a lazytime mount option")
      CC: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • writeback: Avoid skipping inode writeback · d74c235b
      Jan Kara authored
      commit 5afced3b upstream.
      
      Inode's i_io_list list head is used to attach inode to several different
      lists - wb->{b_dirty, b_dirty_time, b_io, b_more_io}. When flush worker
      prepares a list of inodes to writeback e.g. for sync(2), it moves inodes
      to b_io list. Thus it is critical for sync(2) data integrity guarantees
      that inode is not requeued to any other writeback list when inode is
      queued for processing by flush worker. That's the reason why
      writeback_single_inode() does not touch i_io_list (unless the inode is
      completely clean) and why __mark_inode_dirty() does not touch i_io_list
      if I_SYNC flag is set.
      
      However there are two flaws in the current logic:
      
      1) When inode has only I_DIRTY_TIME set but it is already queued in b_io
      list due to sync(2), concurrent __mark_inode_dirty(inode, I_DIRTY_SYNC)
      can still move inode back to b_dirty list resulting in skipping
      writeback of inode time stamps during sync(2).
      
      2) When inode is on b_dirty_time list and writeback_single_inode() races
      with __mark_inode_dirty() like:
      
      writeback_single_inode()		__mark_inode_dirty(inode, I_DIRTY_PAGES)
        inode->i_state |= I_SYNC
        __writeback_single_inode()
      					  inode->i_state |= I_DIRTY_PAGES;
      					  if (inode->i_state & I_SYNC)
      					    bail
        if (!(inode->i_state & I_DIRTY_ALL))
        - not true so nothing done
      
      We end up with I_DIRTY_PAGES inode on b_dirty_time list and thus
      standard background writeback will not writeback this inode leading to
      possible dirty throttling stalls etc. (thanks to Martijn Coenen for this
      analysis).
      
      Fix these problems by tracking whether inode is queued in b_io or
      b_more_io lists in a new I_SYNC_QUEUED flag. When this flag is set, we
      know flush worker has queued inode and we should not touch i_io_list.
      On the other hand we also know that once flush worker is done with the
      inode it will requeue the inode to appropriate dirty list. When
      I_SYNC_QUEUED is not set, __mark_inode_dirty() can (and must) move inode
      to appropriate dirty list.

      Reported-by: Martijn Coenen <maco@android.com>
      Reviewed-by: Martijn Coenen <maco@android.com>
      Tested-by: Martijn Coenen <maco@android.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Fixes: 0ae45f63 ("vfs: add support for a lazytime mount option")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • writeback: Protect inode->i_io_list with inode->i_lock · dd71dd1d
      Jan Kara authored
      commit b35250c0 upstream.
      
      Currently, operations on inode->i_io_list are protected by
      wb->list_lock. In the following patches we'll need to maintain
      consistency between inode->i_state and inode->i_io_list so change the
      code so that inode->i_lock also protects all of the inode's i_io_list handling.

      Reviewed-by: Martijn Coenen <maco@android.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      CC: stable@vger.kernel.org # Prerequisite for "writeback: Avoid skipping inode writeback"
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jens Axboe's avatar
      io_uring: clear req->result on IOPOLL re-issue · 1506fdcd
      Jens Axboe authored
      commit 56450c20 upstream.
      
      Make sure we clear req->result, which was set to -EAGAIN for retry
      purposes, when moving it to the reissue list. Otherwise we can end up
      retrying a request more than once, which leads to weird results in
      the io-wq handling (and other spots).
      
      Cc: stable@vger.kernel.org
      Reported-by: Andres Freund <andres@anarazel.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1506fdcd
    • Sergey Senozhatsky's avatar
      serial: 8250: change lock order in serial8250_do_startup() · 116790cf
      Sergey Senozhatsky authored
      commit 205d300a upstream.
      
      We have a number of "uart.port->desc.lock vs desc.lock->uart.port"
      lockdep reports coming from 8250 driver; this causes a bit of trouble
      to people, so let's fix it.
      
      The problem is reverse lock order in two different call paths:
      
      chain #1:
      
       serial8250_do_startup()
        spin_lock_irqsave(&port->lock);
         disable_irq_nosync(port->irq);
          raw_spin_lock_irqsave(&desc->lock)
      
      chain #2:
      
        __report_bad_irq()
         raw_spin_lock_irqsave(&desc->lock)
          for_each_action_of_desc()
           printk()
            spin_lock_irqsave(&port->lock);
      
      Fix this by changing the order of locks in serial8250_do_startup():
      do disable_irq_nosync() first, which grabs desc->lock, and grab the
      uart port lock (port->lock) after that, so that chain #1 and chain #2
      have the same lock order.
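
The inversion described above can be illustrated with a toy dependency graph, in the spirit of what lockdep tracks. This is a minimal userspace sketch, not kernel code: `lock_graph`, `record_dependency()` and the `lock_class` enum are hypothetical names introduced only for illustration. An edge "held -> acquired" is recorded whenever one lock class is taken while another is held, and a new edge is rejected if it would close a cycle.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical lock classes, named after the ones in the report. */
enum lock_class { PORT_LOCK, DESC_LOCK, CONSOLE_OWNER };

/* dep[a][b] is true when class b was acquired while class a was held. */
struct lock_graph {
    bool dep[MAX_CLASSES][MAX_CLASSES];
};

static void graph_init(struct lock_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++) {
        if (g->dep[from][next] && !seen[next] &&
            reachable(g, next, to, seen))
            return true;
    }
    return false;
}

/*
 * Record "acquired was taken while held was held".
 * Returns false (and records nothing) if the new edge would close a
 * cycle, i.e. if 'held' is already reachable from 'acquired'.
 */
static bool record_dependency(struct lock_graph *g, int held, int acquired)
{
    bool seen[MAX_CLASSES] = { false };

    if (reachable(g, acquired, held, seen))
        return false;
    g->dep[held][acquired] = true;
    return true;
}
```

In this model, chain #2 records DESC_LOCK -> PORT_LOCK, so the old chain #1 order (PORT_LOCK -> DESC_LOCK) is rejected as a cycle. Once both paths take the locks in the same order, no cycle can form.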
      
      Full lockdep splat:
      
       ======================================================
       WARNING: possible circular locking dependency detected
       5.4.39 #55 Not tainted
       ======================================================
      
       swapper/0/0 is trying to acquire lock:
       ffffffffab65b6c0 (console_owner){-...}, at: console_lock_spinning_enable+0x31/0x57
      
       but task is already holding lock:
       ffff88810a8e34c0 (&irq_desc_lock_class){-.-.}, at: __report_bad_irq+0x5b/0xba
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #2 (&irq_desc_lock_class){-.-.}:
              _raw_spin_lock_irqsave+0x61/0x8d
              __irq_get_desc_lock+0x65/0x89
              __disable_irq_nosync+0x3b/0x93
              serial8250_do_startup+0x451/0x75c
              uart_startup+0x1b4/0x2ff
              uart_port_activate+0x73/0xa0
              tty_port_open+0xae/0x10a
              uart_open+0x1b/0x26
              tty_open+0x24d/0x3a0
              chrdev_open+0xd5/0x1cc
              do_dentry_open+0x299/0x3c8
              path_openat+0x434/0x1100
              do_filp_open+0x9b/0x10a
              do_sys_open+0x15f/0x3d7
              kernel_init_freeable+0x157/0x1dd
              kernel_init+0xe/0x105
              ret_from_fork+0x27/0x50
      
       -> #1 (&port_lock_key){-.-.}:
              _raw_spin_lock_irqsave+0x61/0x8d
              serial8250_console_write+0xa7/0x2a0
              console_unlock+0x3b7/0x528
              vprintk_emit+0x111/0x17f
              printk+0x59/0x73
              register_console+0x336/0x3a4
              uart_add_one_port+0x51b/0x5be
              serial8250_register_8250_port+0x454/0x55e
              dw8250_probe+0x4dc/0x5b9
              platform_drv_probe+0x67/0x8b
              really_probe+0x14a/0x422
              driver_probe_device+0x66/0x130
              device_driver_attach+0x42/0x5b
              __driver_attach+0xca/0x139
              bus_for_each_dev+0x97/0xc9
              bus_add_driver+0x12b/0x228
              driver_register+0x64/0xed
              do_one_initcall+0x20c/0x4a6
              do_initcall_level+0xb5/0xc5
              do_basic_setup+0x4c/0x58
              kernel_init_freeable+0x13f/0x1dd
              kernel_init+0xe/0x105
              ret_from_fork+0x27/0x50
      
       -> #0 (console_owner){-...}:
              __lock_acquire+0x118d/0x2714
              lock_acquire+0x203/0x258
              console_lock_spinning_enable+0x51/0x57
              console_unlock+0x25d/0x528
              vprintk_emit+0x111/0x17f
              printk+0x59/0x73
              __report_bad_irq+0xa3/0xba
              note_interrupt+0x19a/0x1d6
              handle_irq_event_percpu+0x57/0x79
              handle_irq_event+0x36/0x55
              handle_fasteoi_irq+0xc2/0x18a
              do_IRQ+0xb3/0x157
              ret_from_intr+0x0/0x1d
              cpuidle_enter_state+0x12f/0x1fd
              cpuidle_enter+0x2e/0x3d
              do_idle+0x1ce/0x2ce
              cpu_startup_entry+0x1d/0x1f
              start_kernel+0x406/0x46a
              secondary_startup_64+0xa4/0xb0
      
       other info that might help us debug this:
      
       Chain exists of:
         console_owner --> &port_lock_key --> &irq_desc_lock_class
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&irq_desc_lock_class);
                                      lock(&port_lock_key);
                                      lock(&irq_desc_lock_class);
         lock(console_owner);
      
        *** DEADLOCK ***
      
       2 locks held by swapper/0/0:
        #0: ffff88810a8e34c0 (&irq_desc_lock_class){-.-.}, at: __report_bad_irq+0x5b/0xba
        #1: ffffffffab65b5c0 (console_lock){+.+.}, at: console_trylock_spinning+0x20/0x181
      
       stack backtrace:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.39 #55
       Hardware name: XXXXXX
       Call Trace:
        <IRQ>
        dump_stack+0xbf/0x133
        ? print_circular_bug+0xd6/0xe9
        check_noncircular+0x1b9/0x1c3
        __lock_acquire+0x118d/0x2714
        lock_acquire+0x203/0x258
        ? console_lock_spinning_enable+0x31/0x57
        console_lock_spinning_enable+0x51/0x57
        ? console_lock_spinning_enable+0x31/0x57
        console_unlock+0x25d/0x528
        ? console_trylock+0x18/0x4e
        vprintk_emit+0x111/0x17f
        ? lock_acquire+0x203/0x258
        printk+0x59/0x73
        __report_bad_irq+0xa3/0xba
        note_interrupt+0x19a/0x1d6
        handle_irq_event_percpu+0x57/0x79
        handle_irq_event+0x36/0x55
        handle_fasteoi_irq+0xc2/0x18a
        do_IRQ+0xb3/0x157
        common_interrupt+0xf/0xf
        </IRQ>

      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Fixes: 768aec0b ("serial: 8250: fix shared interrupts issues with SMP and RT kernels")
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Reported-by: Raul Rangel <rrangel@google.com>
      BugLink: https://bugs.chromium.org/p/chromium/issues/detail?id=1114800
      Link: https://lore.kernel.org/lkml/CAHQZ30BnfX+gxjPm1DUd5psOTqbyDh4EJE=2=VAMW_VDafctkA@mail.gmail.com/T/#u
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200817022646.1484638-1-sergey.senozhatsky@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      116790cf
    • Valmer Huhn's avatar
      serial: 8250_exar: Fix number of ports for Commtech PCIe cards · 89171ef8
      Valmer Huhn authored
      commit c6b9e95d upstream.
      
      The following in 8250_exar.c line 589 is used to determine the number
      of ports for each Exar board:
      
      nr_ports = board->num_ports ? board->num_ports : pcidev->device & 0x0f;
      
      If the number of ports a card has is not explicitly specified, it defaults
      to the rightmost 4 bits of the PCI device ID. This is prone to error since
      not all PCI device IDs contain a number which corresponds to the number of
      ports that card provides.
      
      This particular case involves COMMTECH_4222PCIE, COMMTECH_4224PCIE and
      COMMTECH_4228PCIE cards with device IDs 0x0022, 0x0020 and 0x0021.
      Currently the multiport cards receive 2, 0 and 1 port instead of 2, 4 and
      8 ports respectively.
      
      To fix this, each Commtech Fastcom PCIe card is given a struct where the
      number of ports is explicitly specified. This ensures 'board->num_ports'
      is used instead of the default 'pcidev->device & 0x0f'.
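
The arithmetic behind the wrong port counts can be checked in isolation. The sketch below reuses the expression quoted above; `struct board_info` and `nr_ports()` are hypothetical stand-ins for the driver's own data structures, kept only detailed enough to show the fallback.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-board description. */
struct board_info {
    unsigned int num_ports; /* 0 means "not explicitly specified" */
};

/*
 * Same selection logic as the expression quoted in the commit message:
 * use the explicit count if present, else the low 4 bits of the
 * PCI device ID.
 */
static unsigned int nr_ports(const struct board_info *board, uint16_t device_id)
{
    return board->num_ports ? board->num_ports
                            : (unsigned int)(device_id & 0x0f);
}
```

With no explicit count, device IDs 0x0022, 0x0020 and 0x0021 yield 2, 0 and 1 ports. With `num_ports` set to 2, 4 and 8, the device ID no longer matters.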
      
      Fixes: d0aeaa83 ("serial: exar: split out the exar code from 8250_pci")
      Signed-off-by: Valmer Huhn <valmer.huhn@concurrent-rt.com>
      Tested-by: Valmer Huhn <valmer.huhn@concurrent-rt.com>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200813165255.GC345440@icarus.concurrent-rt.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      89171ef8
    • Holger Assmann's avatar
      serial: stm32: avoid kernel warning on absence of optional IRQ · 0a60539b
      Holger Assmann authored
      commit fdf16d78 upstream.
      
      stm32_init_port() of the stm32-usart may trigger a warning in
      platform_get_irq() when the device tree specifies no wakeup interrupt.
      
      The wakeup interrupt is usually a board-specific GPIO and the driver
      functions correctly in its absence. The mainline stm32mp151.dtsi does
      not specify it, so all mainline device trees trigger an unnecessary
      kernel warning. Use of platform_get_irq_optional() avoids this.
      
      Fixes: 2c58e560 ("serial: stm32: fix the get_irq error case")
      Signed-off-by: Holger Assmann <h.assmann@pengutronix.de>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200813152757.32751-1-h.assmann@pengutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a60539b
    • Lukas Wunner's avatar
      serial: pl011: Don't leak amba_ports entry on driver register error · df264303
      Lukas Wunner authored
      commit 89efbe70 upstream.
      
      pl011_probe() calls pl011_setup_port() to reserve an amba_ports[] entry,
      then calls pl011_register_port() to register the uart driver with the
      tty layer.
      
      If registration of the uart driver fails, the amba_ports[] entry is not
      released.  If this happens 14 times (value of UART_NR macro), then all
      amba_ports[] entries will have been leaked and driver probing is no
      longer possible.  (To be fair, that can only happen if the DeviceTree
      doesn't contain alias IDs since they cause the same entry to be used for
      a given port.)   Fix it.
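
The leak pattern is easy to reproduce in a userspace model. The following is a minimal sketch, not the driver code: `reserve_slot()`, `release_slot()`, `probe()` and the two `register_*` helpers are hypothetical names, and the table size reuses the UART_NR value quoted above. The point is only that the error path must give the reserved slot back.

```c
#define UART_NR 14 /* value quoted in the commit message */

static int slot_used[UART_NR];

/* Reserve the first free slot; return its index, or -1 if none is left. */
static int reserve_slot(void)
{
    for (int i = 0; i < UART_NR; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            return i;
        }
    }
    return -1;
}

static void release_slot(int idx)
{
    slot_used[idx] = 0;
}

/* Hypothetical registration callbacks: 0 on success, negative on error. */
static int register_ok(void)    { return 0; }
static int register_fails(void) { return -5; }

/* Model of a probe path that releases its slot when registration fails. */
static int probe(int (*register_fn)(void))
{
    int idx = reserve_slot();
    int ret;

    if (idx < 0)
        return -1;

    ret = register_fn();
    if (ret) {
        release_slot(idx); /* without this, every failure leaks a slot */
        return ret;
    }
    return 0;
}
```

If the `release_slot()` call is removed, the fifteenth call to `probe()` fails in `reserve_slot()` regardless of whether registration would succeed.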
      
      Fixes: ef2889f7 ("serial: pl011: Move uart_register_driver call to device")
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: stable@vger.kernel.org # v3.15+
      Cc: Tushar Behera <tushar.behera@linaro.org>
      Link: https://lore.kernel.org/r/138f8c15afb2f184d8102583f8301575566064a6.1597316167.git.lukas@wunner.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      df264303
    • Lukas Wunner's avatar
      serial: pl011: Fix oops on -EPROBE_DEFER · 6648c599
      Lukas Wunner authored
      commit 27afac93 upstream.
      
      If probing of a pl011 gets deferred until after free_initmem(), an oops
      ensues because pl011_console_match() is called after its code has been
      freed.
      
      Fix by removing the __init attribute from the function and those it
      calls.
      
      Commit 10879ae5 ("serial: pl011: add console matching function")
      introduced pl011_console_match() not just for early consoles but
      regular preferred consoles, such as those added by acpi_parse_spcr().
      Regular consoles may be registered after free_initmem() for various
      reasons, one being deferred probing, another being dynamic enablement
      of serial ports using a DeviceTree overlay.
      
      Thus, pl011_console_match() must not be declared __init and the
      functions it calls mustn't either.
      
      Stack trace for posterity:
      
      Unable to handle kernel paging request at virtual address 80c38b58
      Internal error: Oops: 8000000d [#1] PREEMPT SMP ARM
      PC is at pl011_console_match+0x0/0xfc
      LR is at register_console+0x150/0x468
      [<80187004>] (register_console)
      [<805a8184>] (uart_add_one_port)
      [<805b2b68>] (pl011_register_port)
      [<805b3ce4>] (pl011_probe)
      [<80569214>] (amba_probe)
      [<805ca088>] (really_probe)
      [<805ca2ec>] (driver_probe_device)
      [<805ca5b0>] (__device_attach_driver)
      [<805c8060>] (bus_for_each_drv)
      [<805c9dfc>] (__device_attach)
      [<805ca630>] (device_initial_probe)
      [<805c90a8>] (bus_probe_device)
      [<805c95a8>] (deferred_probe_work_func)
      
      Fixes: 10879ae5 ("serial: pl011: add console matching function")
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: stable@vger.kernel.org # v4.10+
      Cc: Aleksey Makarov <amakarov@marvell.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Christopher Covington <cov@codeaurora.org>
      Link: https://lore.kernel.org/r/f827ff09da55b8c57d316a1b008a137677b58921.1597315557.git.lukas@wunner.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6648c599
    • Tamseel Shams's avatar
      serial: samsung: Removes the IRQ not found warning · e3f8041d
      Tamseel Shams authored
      commit 8c6c378b upstream.
      
      A few older Samsung SoCs, such as s3c2410, s3c2412 and s3c2440, have
      a UART IP with 2 interrupt lines. Other SoCs, such as s3c6400,
      s5pv210, exynos5433 and exynos4210, have a UART with only 1 interrupt
      line. Because of this, the "platform_get_irq(platdev, 1)" call in the
      driver gives the following false-positive error on newer SoCs:
      "IRQ index 1 not found".

      This patch adds a condition so that the Tx interrupt is checked
      only for those SoCs which have 2 interrupt lines.

      Tested-by: Alim Akhtar <alim.akhtar@samsung.com>
      Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
      Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
      Signed-off-by: Tamseel Shams <m.shams@samsung.com>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200810030021.45348-1-m.shams@samsung.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e3f8041d
    • George Kennedy's avatar
      vt_ioctl: change VT_RESIZEX ioctl to check for error return from vc_resize() · edc8a4eb
      George Kennedy authored
      commit bc5269ca upstream.
      
      vc_resize() can return with an error after failure. Change VT_RESIZEX ioctl
      to save struct vc_data values that are modified and restore the original
      values in case of error.
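
The save-and-restore pattern described above can be sketched as follows. This is a simplified userspace model under stated assumptions: `struct console_geom`, its field names, `apply_geometry()` and the two `resize_*` callbacks are hypothetical and do not mirror struct vc_data or vc_resize().

```c
/* Hypothetical subset of the console state that a resize request modifies. */
struct console_geom {
    unsigned int scan_lines;
    unsigned int font_height;
};

/* Hypothetical resize callbacks: 0 on success, negative on failure. */
static int resize_ok(const struct console_geom *vc)
{
    (void)vc;
    return 0;
}

static int resize_fails(const struct console_geom *vc)
{
    (void)vc;
    return -12;
}

/*
 * Apply new values, attempt the resize, and roll the values back if the
 * resize reports an error, so the state never describes a geometry that
 * was not actually applied.
 */
static int apply_geometry(struct console_geom *vc,
                          unsigned int scan_lines,
                          unsigned int font_height,
                          int (*resize)(const struct console_geom *))
{
    struct console_geom saved = *vc; /* save before modifying */
    int ret;

    vc->scan_lines = scan_lines;
    vc->font_height = font_height;

    ret = resize(vc);
    if (ret)
        *vc = saved; /* restore the original values on error */
    return ret;
}
```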

      Signed-off-by: George Kennedy <george.kennedy@oracle.com>
      Cc: stable <stable@vger.kernel.org>
      Reported-by: syzbot+38a3699c7eaf165b97a6@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/r/1596213192-6635-2-git-send-email-george.kennedy@oracle.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      edc8a4eb
    • Tetsuo Handa's avatar
      vt: defer kfree() of vc_screenbuf in vc_do_resize() · 2392eea6
      Tetsuo Handa authored
      commit f8d1653d upstream.
      
      syzbot is reporting UAF bug in set_origin() from vc_do_resize() [1], for
      vc_do_resize() calls kfree(vc->vc_screenbuf) before calling set_origin().
      
      Unfortunately, in set_origin(), vc->vc_sw->con_set_origin() might access
      vc->vc_pos when scroll is involved in order to manipulate cursor, but
      vc->vc_pos still refers to the already released vc->vc_screenbuf until
      vc->vc_pos gets updated based on the result of
      vc->vc_sw->con_set_origin().
      
      Preserving old buffer and tolerating outdated vc members until set_origin()
      completes would be easier than preventing vc->vc_sw->con_set_origin() from
      accessing outdated vc members.
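
The idea of keeping the old buffer alive until the callback has run can be sketched in userspace C. All names here (`struct screen`, `screen_resize()`, `read_cursor()`, `last_seen`) are hypothetical, and the hook only stands in for a callback that may still dereference the stale cursor pointer.

```c
#include <stdlib.h>

struct screen {
    unsigned short *buf; /* screen buffer */
    unsigned short *pos; /* cursor, points into buf */
    size_t cells;
};

/* Hypothetical hook that may still read through the stale cursor. */
typedef void (*origin_hook)(struct screen *s);

static unsigned short last_seen;

static void read_cursor(struct screen *s)
{
    last_seen = *s->pos;
}

static int screen_resize(struct screen *s, size_t new_cells, origin_hook hook)
{
    unsigned short *old = s->buf;
    unsigned short *fresh = calloc(new_cells, sizeof(*fresh));

    if (!fresh)
        return -1;

    s->buf = fresh;
    s->cells = new_cells;

    /* s->pos still points into 'old', which has not been freed yet. */
    if (hook)
        hook(s);

    s->pos = s->buf; /* cursor re-derived from the new buffer (start, in this sketch) */
    free(old);       /* deferred until nothing can reference the old buffer */
    return 0;
}
```

Freeing `old` before calling the hook would turn the read in `read_cursor()` into a use-after-free, which is the bug class the commit addresses.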
      
      [1] https://syzkaller.appspot.com/bug?id=6649da2081e2ebdc65c0642c214b27fe91099db3
      
      Reported-by: syzbot <syzbot+9116ecc1978ca3a12f43@syzkaller.appspotmail.com>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/1596034621-4714-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2392eea6
    • Evgeny Novikov's avatar
      USB: lvtest: return proper error code in probe · e863ac5f
      Evgeny Novikov authored
      commit 53141249 upstream.
      
      lvs_rh_probe() can return some nonnegative value from usb_control_msg()
      when it is less than "USB_DT_HUB_NONVAR_SIZE + 2" that is considered as
      a failure. Make lvs_rh_probe() return -EINVAL in this case.
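
The return-value handling can be reduced to a small helper. This is a sketch only: `check_descriptor_read()` and its `min_len` parameter are hypothetical, with `min_len` standing in for the "USB_DT_HUB_NONVAR_SIZE + 2" threshold mentioned above.

```c
#include <errno.h>

/*
 * 'ret' models the result of a control transfer: a negative errno on
 * failure, otherwise the number of bytes transferred.
 * 'min_len' is the smallest length that counts as a usable descriptor.
 */
static int check_descriptor_read(int ret, int min_len)
{
    if (ret < 0)
        return ret;     /* propagate the transfer error unchanged */
    if (ret < min_len)
        return -EINVAL; /* short read: a failure, not a positive "success" */
    return 0;
}
```

Returning the short, nonnegative byte count directly from a probe function would be read by the caller as something other than an error, which is why it is mapped to -EINVAL.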
      
      Found by Linux Driver Verification project (linuxtesting.org).

      Signed-off-by: Evgeny Novikov <novikov@ispras.ru>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200805090643.3432-1-novikov@ispras.ru
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e863ac5f
    • George Kennedy's avatar
      fbcon: prevent user font height or width change from causing potential out-of-bounds access · 34cf1aff
      George Kennedy authored
      commit 39b3cffb upstream.
      
      Add a check to fbcon_resize() to ensure that a possible change to user font
      height or user font width will not allow a font data out-of-bounds access.
      NOTE: must use original charcount in calculation as font charcount can
      change and cannot be used to determine the font data allocated size.

      Signed-off-by: George Kennedy <george.kennedy@oracle.com>
      Cc: stable <stable@vger.kernel.org>
      Reported-by: syzbot+38a3699c7eaf165b97a6@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/r/1596213192-6635-1-git-send-email-george.kennedy@oracle.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      34cf1aff
    • Boris Burkov's avatar
      btrfs: detect nocow for swap after snapshot delete · bb77dd02
      Boris Burkov authored
      commit a84d5d42 upstream.
      
      can_nocow_extent and btrfs_cross_ref_exist both rely on a heuristic for
      detecting a must cow condition which is not exactly accurate, but saves
      unnecessary tree traversal. The incorrect assumption is that if the
      extent was created in a generation smaller than the last snapshot
      generation, it must be referenced by that snapshot. That is true, except
      the snapshot could have since been deleted, without affecting the last
      snapshot generation.
      
      The original patch claimed a performance win from this check, but it
      also leads to a bug where you are unable to use a swapfile if you ever
      snapshotted the subvolume it's in. Make the check slower and more strict
      for the swapon case, without modifying the general cow checks as a
      compromise. Turning swap on does not seem to be a particularly
      performance sensitive operation, so incurring a possibly unnecessary
      btrfs_search_slot seems worthwhile for the added usability.
      
      Note: Until the snapshot is completely cleaned after deletion,
      check_committed_refs will still cause the logic to think that cow is
      necessary, so the user must wait until 'btrfs subvolume sync' has
      finished before activating the swapfile with swapon.
      
      CC: stable@vger.kernel.org # 5.4+
      Suggested-by: Omar Sandoval <osandov@osandov.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb77dd02
    • Filipe Manana's avatar
      btrfs: fix space cache memory leak after transaction abort · b40d12b7
      Filipe Manana authored
      commit bbc37d6e upstream.
      
      If a transaction aborts it can cause a memory leak of the pages array of
      a block group's io_ctl structure. The following steps explain how that can
      happen:
      
      1) Transaction N is committing, currently in state TRANS_STATE_UNBLOCKED
         and it's about to start writing out dirty extent buffers;
      
      2) Transaction N + 1 already started and another task, task A, just called
         btrfs_commit_transaction() on it;
      
      3) Block group B was dirtied (extents allocated from it) by transaction
         N + 1, so when task A calls btrfs_start_dirty_block_groups(), at the
         very beginning of the transaction commit, it starts writeback for the
         block group's space cache by calling btrfs_write_out_cache(), which
         allocates the pages array for the block group's io_ctl with a call to
         io_ctl_init(). Block group B is added to the io_list of transaction
         N + 1 by btrfs_start_dirty_block_groups();
      
      4) While transaction N's commit is writing out the extent buffers, it gets
         an IO error and aborts transaction N, also setting the file system to
         RO mode;
      
      5) Task A has already returned from btrfs_start_dirty_block_groups(), is at
         btrfs_commit_transaction() and has set transaction N + 1 state to
         TRANS_STATE_COMMIT_START. Immediately after that it checks that the
         filesystem was turned to RO mode, due to transaction N's abort, and
         jumps to the "cleanup_transaction" label. After that we end up at
         btrfs_cleanup_one_transaction() which calls btrfs_cleanup_dirty_bgs().
         That helper finds block group B in the transaction's io_list but it
         never releases the pages array of the block group's io_ctl, resulting in
         a memory leak.
      
      In fact at the point when we are at btrfs_cleanup_dirty_bgs(), the pages
      array points to pages that were already released by us at
      __btrfs_write_out_cache() through the call to io_ctl_drop_pages(). We end
      up freeing the pages array only after waiting for the ordered extent to
      complete through btrfs_wait_cache_io(), which calls io_ctl_free() to do
      that. But in the transaction abort case we don't wait for the space cache's
      ordered extent to complete through a call to btrfs_wait_cache_io(), so
      that's why we end up with a memory leak - we wait for the ordered extent
      to complete indirectly by shutting down the work queues and waiting for
      any jobs in them to complete before returning from close_ctree().
      
      We can solve the leak simply by freeing the pages array right after
      releasing the pages (with the call to io_ctl_drop_pages()) at
      __btrfs_write_out_cache(), since we will never use it anymore after that
      and the pages array points to already released pages at that point, which
      is currently not a problem since no one will use it after that, but not a
      good practice anyway since it can easily lead to use-after-free issues.
      
      So fix this by freeing the pages array right after releasing the pages at
      __btrfs_write_out_cache().
      
      This issue can often be reproduced with test case generic/475 from fstests
      and kmemleak can detect it and reports it with the following trace:
      
      unreferenced object 0xffff9bbf009fa600 (size 512):
        comm "fsstress", pid 38807, jiffies 4298504428 (age 22.028s)
        hex dump (first 32 bytes):
          00 a0 7c 4d 3d ed ff ff 40 a0 7c 4d 3d ed ff ff  ..|M=...@.|M=...
          80 a0 7c 4d 3d ed ff ff c0 a0 7c 4d 3d ed ff ff  ..|M=.....|M=...
        backtrace:
          [<00000000f4b5cfe2>] __kmalloc+0x1a8/0x3e0
          [<0000000028665e7f>] io_ctl_init+0xa7/0x120 [btrfs]
          [<00000000a1f95b2d>] __btrfs_write_out_cache+0x86/0x4a0 [btrfs]
          [<00000000207ea1b0>] btrfs_write_out_cache+0x7f/0xf0 [btrfs]
          [<00000000af21f534>] btrfs_start_dirty_block_groups+0x27b/0x580 [btrfs]
          [<00000000c3c23d44>] btrfs_commit_transaction+0xa6f/0xe70 [btrfs]
          [<000000009588930c>] create_subvol+0x581/0x9a0 [btrfs]
          [<000000009ef2fd7f>] btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
          [<00000000474e5187>] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
          [<00000000708ee349>] btrfs_ioctl_snap_create_v2+0xb0/0xf0 [btrfs]
          [<00000000ea60106f>] btrfs_ioctl+0x12c/0x3130 [btrfs]
          [<000000005c923d6d>] __x64_sys_ioctl+0x83/0xb0
          [<0000000043ace2c9>] do_syscall_64+0x33/0x80
          [<00000000904efbce>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b40d12b7
    • Josef Bacik's avatar
      btrfs: check the right error variable in btrfs_del_dir_entries_in_log · c7e8c6f4
      Josef Bacik authored
      commit fb2fecba upstream.
      
      With my new locking code dbench is so much faster that I tripped over a
      transaction abort from ENOSPC.  This turned out to be because
      btrfs_del_dir_entries_in_log was checking for ret == -ENOSPC, but this
      function sets err on error, and returns err.  So instead of properly
      marking the inode as needing a full commit, we were returning -ENOSPC
      and aborting in __btrfs_unlink_inode.  Fix this by checking the proper
      variable so that we return the correct thing in the case of ENOSPC.
      
      The ENOENT needs to be checked, because btrfs_lookup_dir_item_index()
      can return -ENOENT if the dir item isn't in the tree log (which would
      happen if we hadn't fsync'ed this guy).  We actually handle that case in
      __btrfs_unlink_inode, so it's an expected error to get back.
      
      Fixes: 4a500fd1 ("Btrfs: Metadata ENOSPC handling for tree log")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add note and comment about ENOENT ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c7e8c6f4
    • Marcos Paulo de Souza's avatar
      btrfs: reset compression level for lzo on remount · 204ed5f3
      Marcos Paulo de Souza authored
      commit 282dd7d7 upstream.
      
      Currently a user can set mount "-o compress" which will set the
      compression algorithm to zlib, and use the default compress level for
      zlib (3):
      
        relatime,compress=zlib:3,space_cache
      
      If the user remounts the fs using "-o compress=lzo", then the old
      compress_level is used:
      
        relatime,compress=lzo:3,space_cache
      
      But lzo does not expose any tunable compression level. The same happens
      if we set any compress argument with different level, also with zstd.
      
      Fix this by resetting the compress_level when compress=lzo is
      specified.  With the fix applied, lzo is shown without compress level:
      
        relatime,compress=lzo,space_cache
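
The before-and-after option strings can be reproduced with a small model of the option handling. This is a sketch under stated assumptions: `struct compress_opts`, `set_compress()`, `format_compress()` and `ZLIB_DEFAULT_LEVEL` are hypothetical names, only "zlib[:N]" and "lzo" are handled, and the default level of 3 is the one quoted above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ZLIB_DEFAULT_LEVEL 3 /* default zlib level quoted in the text */

struct compress_opts {
    char algo[8];
    unsigned int level; /* 0 means "no level shown" */
};

/* Accepts "zlib", "zlib:N" or "lzo"; returns 0 on success, -1 otherwise. */
static int set_compress(struct compress_opts *o, const char *arg)
{
    const char *colon = strchr(arg, ':');
    size_t n = colon ? (size_t)(colon - arg) : strlen(arg);

    if (n == 3 && strncmp(arg, "lzo", 3) == 0) {
        strcpy(o->algo, "lzo");
        o->level = 0; /* lzo has no tunable level: drop any old value */
        return 0;
    }
    if (n == 4 && strncmp(arg, "zlib", 4) == 0) {
        strcpy(o->algo, "zlib");
        o->level = colon ? (unsigned int)strtoul(colon + 1, NULL, 10)
                         : ZLIB_DEFAULT_LEVEL;
        return 0;
    }
    return -1;
}

/* Render the option the way it would appear in the mount options. */
static void format_compress(const struct compress_opts *o, char *buf, size_t len)
{
    if (o->level)
        snprintf(buf, len, "compress=%s:%u", o->algo, o->level);
    else
        snprintf(buf, len, "compress=%s", o->algo);
}
```

Without the `o->level = 0` assignment in the lzo branch, a remount from zlib to lzo would keep the old level and render as "compress=lzo:3".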
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      204ed5f3
    • Ming Lei's avatar
      blk-mq: order adding requests to hctx->dispatch and checking SCHED_RESTART · b4cbbc13
      Ming Lei authored
      commit d7d8535f upstream.
      
      The SCHED_RESTART code path is relied on to re-run the queue for
      dispatching requests in hctx->dispatch. Meanwhile, the SCHED_RESTART
      flag is checked when adding requests to hctx->dispatch.

      Memory barriers have to be used to order the following two pairs of
      operations:
      
      1) adding requests to hctx->dispatch and checking SCHED_RESTART in
      blk_mq_dispatch_rq_list()
      
      2) clearing SCHED_RESTART and checking if there is request in hctx->dispatch
      in blk_mq_sched_restart().
      
      Without the added memory barrier, either:

      1) blk_mq_sched_restart() may miss requests added to hctx->dispatch
      while blk_mq_dispatch_rq_list() observes SCHED_RESTART and does not
      run the queue on the dispatch side

      or

      2) blk_mq_dispatch_rq_list() still sees SCHED_RESTART and does not run
      the queue on the dispatch side, while the check for requests in
      hctx->dispatch from blk_mq_sched_restart() is missed.
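
The two pairs of operations form the classic store-then-load pattern on both sides. The sketch below models it with C11 atomics in userspace; it is not the kernel code. `struct hw_queue`, `hw_queue_init()`, `dispatch_add_request()` and `restart_check()` are hypothetical names, a counter stands in for the hctx->dispatch list, and `atomic_thread_fence(memory_order_seq_cst)` plays the role of the added full memory barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct hw_queue {
    atomic_int dispatch_len;   /* stands in for the hctx->dispatch list */
    atomic_bool sched_restart; /* stands in for the SCHED_RESTART flag */
};

static void hw_queue_init(struct hw_queue *q, bool restart_pending)
{
    atomic_init(&q->dispatch_len, 0);
    atomic_init(&q->sched_restart, restart_pending);
}

/*
 * Dispatch side: add a request, then check the restart flag.
 * Returns true when the caller has to run the queue itself because no
 * restart is pending.
 */
static bool dispatch_add_request(struct hw_queue *q)
{
    atomic_fetch_add_explicit(&q->dispatch_len, 1, memory_order_relaxed);
    /* Full barrier: the add must be visible before the flag is read. */
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_load_explicit(&q->sched_restart, memory_order_relaxed);
}

/*
 * Restart side: clear the flag, then check for pending requests.
 * Returns true when requests are pending and the queue must be run.
 */
static bool restart_check(struct hw_queue *q)
{
    atomic_store_explicit(&q->sched_restart, false, memory_order_relaxed);
    /* Full barrier: the clear must be visible before the list is read. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&q->dispatch_len, memory_order_relaxed) > 0;
}
```

With a full fence between the store and the load on both sides, the two loads cannot both observe the old values when the functions race, so at least one side ends up running the queue. Without the fences, both sides can miss each other's update, which matches the hang scenario described above.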
      
      IO hang in ltp/fs_fill test is reported by kernel test robot:
      
      	https://lkml.org/lkml/2020/7/26/77
      
      Turns out it is caused by the above out-of-order OPs. And the IO hang
      can't be observed any more after applying this patch.
      
      Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers")
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Jeffery <djeffery@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4cbbc13
    • Hans de Goede's avatar
      HID: i2c-hid: Always sleep 60ms after I2C_HID_PWR_ON commands · 649b6c86
      Hans de Goede authored
      commit eef40162 upstream.
      
      Before this commit i2c_hid_parse() consists of the following steps:
      
      1. Send power on cmd
      2. usleep_range(1000, 5000)
      3. Send reset cmd
      4. Wait for reset to complete (device interrupt, or msleep(100))
      5. Send power on cmd
      6. Try to read HID descriptor
      
      Notice how there is an usleep_range(1000, 5000) after the first power-on
      command, but not after the second power-on command.
      
      Testing has shown that at least on the BMAX Y13 laptop's i2c-hid touchpad,
      not having a delay after the second power-on command causes the HID
      descriptor to read as all zeros.
      
      In case we hit this on other devices too, the descriptor being all zeros
      can be recognized by the following message being logged many, many times:
      
      hid-generic 0018:0911:5288.0002: unknown main item tag 0x0
      
      At the same time as the BMAX Y13's touchpad issue was debugged,
      Kai-Heng was working on debugging some issues with Goodix i2c-hid
      touchpads. It turns out that these need a delay after a PWR_ON command
      too, otherwise they stop working after a suspend/resume cycle.
      According to Goodix, a delay of at least 60ms is needed.
      
      Having multiple cases where we need a delay after sending the power-on
      command seems to indicate that we should always sleep after the power-on
      command.
      
      This commit fixes the mentioned issues by moving the existing 1ms sleep to
      the i2c_hid_set_power() function and changing it to a 60ms sleep.
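
      The effect of moving the sleep can be sketched with a small userspace
      model. This is an illustration only, not the driver code: all model_*
      names are hypothetical, the command transfer is a stub, and the
      "sleep" just records the requested delay so that the sequence can be
      checked without real hardware or real waiting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real driver talks to a device over I2C. */
enum model_power_state { MODEL_PWR_ON, MODEL_PWR_SLEEP };

#define MODEL_PWR_ON_DELAY_MS 60u   /* the 60ms from the commit message */

struct model_dev {
    unsigned int power_on_cmds;     /* number of power-on commands sent */
    unsigned int delay_ms_total;    /* total delay requested so far */
};

/* Records the delay instead of actually sleeping. */
static void model_msleep(struct model_dev *dev, unsigned int ms)
{
    dev->delay_ms_total += ms;
}

/* Stub for the command transfer; always succeeds in this model. */
static int model_send_power_cmd(struct model_dev *dev,
                                enum model_power_state state)
{
    if (state == MODEL_PWR_ON)
        dev->power_on_cmds++;
    return 0;
}

/* The delay lives here, so every successful power-on is followed by it. */
static int model_set_power(struct model_dev *dev, enum model_power_state state)
{
    int ret;

    assert(dev != NULL);
    ret = model_send_power_cmd(dev, state);
    if (ret == 0 && state == MODEL_PWR_ON)
        model_msleep(dev, MODEL_PWR_ON_DELAY_MS);
    return ret;
}

/* Mirrors the step list above; reset and descriptor read are omitted. */
static int model_parse(struct model_dev *dev)
{
    int ret = model_set_power(dev, MODEL_PWR_ON);   /* steps 1 and 2 */

    if (ret)
        return ret;
    /* steps 3 and 4 (reset, wait for completion) would go here */
    ret = model_set_power(dev, MODEL_PWR_ON);       /* step 5, now delayed too */
    if (ret)
        return ret;
    /* step 6 (read HID descriptor) would go here */
    return 0;
}
```

      Because the delay is attached to the power-on command itself, callers
      such as the parse path cannot forget it after the second power-on.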
      
      Cc: stable@vger.kernel.org
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208247
      
      Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Reported-and-tested-by: Andrea Borgia <andrea@borgia.bo.it>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ming Lei's avatar
      block: loop: set discard granularity and alignment for block device backed loop · 7aaaf975
      Ming Lei authored
      commit bcb21c8c upstream.
      
      When the backing file is a block device that supports write zeroes, the
      loop device sets QUEUE_FLAG_DISCARD on its queue. However,
      limits.discard_granularity is not set up, which is wrong according to
      the following description in Documentation/ABI/testing/sysfs-block:
      
      	A discard_granularity of 0 means that the device does not support
      	discard functionality.
      
      In particular, 9b15d109 ("block: improve discard bio alignment in
      __blkdev_issue_discard()") started using q->limits.discard_granularity
      to compute the maximum discard sectors. A zero discard granularity may
      therefore cause a kernel oops or a failed discard request, even though
      the loop queue claims discard support via QUEUE_FLAG_DISCARD.
      
      Fix the issue by setting up the discard granularity and alignment.
      
      Fixes: c52abf56 ("loop: Better discard support for block devices")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Coly Li <colyli@suse.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Xiao Ni <xni@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Evan Green <evgreen@chromium.org>
      Cc: Gwendal Grignou <gwendal@chromium.org>
      Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>