1. 10 Nov, 2018 40 commits
    • Linux 3.18.125 · 78e0897d
      Greg Kroah-Hartman authored
    • sched/fair: Fix throttle_list starvation with low CFS quota · 6937db48
      Phil Auld authored
      commit baa9be4f upstream.
      
      With a very low cpu.cfs_quota_us setting, such as the minimum of 1000,
      distribute_cfs_runtime may not empty the throttled_list before it runs
      out of runtime to distribute. In that case, due to the change from
      c06f04c7 to put throttled entries at the head of the list, later entries
      on the list will starve.  Essentially, the same X processes will get pulled
      off the list, given CPU time and then, when expired, get put back on the
      head of the list where distribute_cfs_runtime will give runtime to the same
      set of processes leaving the rest.
      
      Fix the issue by setting a bit in struct cfs_bandwidth when
      distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can
      decide to put the throttled entry on the tail or the head of the list.  The
      bit is set/cleared by the callers of distribute_cfs_runtime while they hold
      cfs_bandwidth->lock.
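
The head-or-tail decision described above can be sketched in plain C. This is a minimal illustration, not the kernel code: `struct bandwidth`, its `distribute_running` flag, the array-backed list, and `throttle()` are hypothetical stand-ins, and the sketch assumes the entry goes to the head while a distribution pass is running and to the tail otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_THROTTLED 16

/* Hypothetical stand-in for struct cfs_bandwidth: an ordered list of
 * throttled entry ids plus the "distribution in progress" bit. */
struct bandwidth {
    int throttled[MAX_THROTTLED];
    size_t len;
    bool distribute_running;
};

/* Queue a newly throttled entry (assumed to be called with the lock held). */
void throttle(struct bandwidth *b, int id)
{
    assert(b != NULL && b->len < MAX_THROTTLED);
    if (b->distribute_running) {
        /* Head: a pass that is already walking the list will not
         * hand runtime to this entry again in the same pass. */
        memmove(&b->throttled[1], &b->throttled[0],
                b->len * sizeof(b->throttled[0]));
        b->throttled[0] = id;
    } else {
        /* Tail: entries that were already waiting are served first. */
        b->throttled[b->len] = id;
    }
    b->len++;
}
```

With tail insertion as the default, an entry that exhausts its runtime again queues behind the entries that have been waiting, which is what removes the starvation.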
      
      This is easy to reproduce with a handful of CPU consumers. I use 'crash' on
      the live system. In some cases you can simply look at the throttled list and
      see the later entries are not changing:
      
        crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
          1     ffff90b56cb2d200  -976050
          2     ffff90b56cb2cc00  -484925
          3     ffff90b56cb2bc00  -658814
          4     ffff90b56cb2ba00  -275365
          5     ffff90b166a45600  -135138
          6     ffff90b56cb2da00  -282505
          7     ffff90b56cb2e000  -148065
          8     ffff90b56cb2fa00  -872591
          9     ffff90b56cb2c000  -84687
         10     ffff90b56cb2f000  -87237
         11     ffff90b166a40a00  -164582
      
        crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
          1     ffff90b56cb2d200  -994147
          2     ffff90b56cb2cc00  -306051
          3     ffff90b56cb2bc00  -961321
          4     ffff90b56cb2ba00  -24490
          5     ffff90b166a45600  -135138
          6     ffff90b56cb2da00  -282505
          7     ffff90b56cb2e000  -148065
          8     ffff90b56cb2fa00  -872591
          9     ffff90b56cb2c000  -84687
         10     ffff90b56cb2f000  -87237
         11     ffff90b166a40a00  -164582
      
      Sometimes it is easier to see by finding a process getting starved and looking
      at the sched_info:
      
        crash> task ffff8eb765994500 sched_info
        PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
          sched_info = {
            pcount = 8,
            run_delay = 697094208,
            last_arrival = 240260125039,
            last_queued = 240260327513
          },
        crash> task ffff8eb765994500 sched_info
        PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
          sched_info = {
            pcount = 8,
            run_delay = 697094208,
            last_arrival = 240260125039,
            last_queued = 240260327513
          },
      Signed-off-by: Phil Auld <pauld@redhat.com>
      Reviewed-by: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: c06f04c7 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
      Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csb
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • USB: fix the usbfs flag sanitization for control transfers · 341d66bb
      Alan Stern authored
      commit 665c365a upstream.
      
      Commit 7a68d9fb ("USB: usbdevfs: sanitize flags more") checks the
      transfer flags for URBs submitted from userspace via usbfs.  However,
      the check for whether the USBDEVFS_URB_SHORT_NOT_OK flag should be
      allowed for a control transfer was added in the wrong place, before
      the code has properly determined the direction of the control
      transfer.  (Control transfers are special because for them, the
      direction is set by the bRequestType byte of the Setup packet rather
      than the direction bit of the endpoint address.)
      
      This patch moves code which sets up the allow_short flag for control
      transfers down after is_in has been set to the correct value.
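
The ordering issue can be illustrated with a small sketch. It is not the usbfs code: `struct transfer`, `transfer_is_in()`, and `allow_short_not_ok()` are hypothetical names, and the only fact taken from USB itself is that bit 7 (0x80) marks the device-to-host direction in both bRequestType and the endpoint address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DIR_IN 0x80u  /* bit 7 set means device-to-host (IN) */

/* Hypothetical, reduced view of a submitted transfer. */
struct transfer {
    bool is_control;
    uint8_t ep_address;    /* endpoint address, direction in bit 7 */
    uint8_t bRequestType;  /* only meaningful for control transfers */
};

/* Direction: control transfers take it from the Setup packet,
 * all other types from the endpoint address. */
bool transfer_is_in(const struct transfer *t)
{
    assert(t != NULL);
    if (t->is_control)
        return (t->bRequestType & DIR_IN) != 0;
    return (t->ep_address & DIR_IN) != 0;
}

/* Decide on the short-not-OK flag only after the direction is known. */
bool allow_short_not_ok(const struct transfer *t)
{
    bool is_in = transfer_is_in(t);
    return is_in;  /* the flag only makes sense for IN transfers */
}
```

A control transfer on endpoint 0 with bRequestType 0xC0 is an IN transfer even though its endpoint address carries no direction bit, which is why a check placed before the Setup packet is examined gives the wrong answer.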
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Reported-and-tested-by: syzbot+24a30223a4b609bb802e@syzkaller.appspotmail.com
      Fixes: 7a68d9fb ("USB: usbdevfs: sanitize flags more")
      CC: Oliver Neukum <oneukum@suse.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cdc-acm: correct counting of UART states in serial state notification · 07e37034
      Tobias Herzog authored
      commit f976d0e5 upstream.

      The USB standard ("Universal Serial Bus Class Definitions for Communication
      Devices") distinguishes between "consistent signals" (DSR, DCD), and
      "irregular signals" (break, ring, parity error, framing error, overrun).
      The bits of "irregular signals" are set if this error/event occurred on
      the device side and are immediately unset once the serial state notification
      was sent.
      Like other drivers of real serial ports do, just the occurrence of those
      events should be counted in serial_icounter_struct (but no 1->0
      transitions).
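
A minimal sketch of the counting rule, under stated assumptions: the `ST_*` bit values, `struct counters`, and `count_serial_state()` are illustrative names and are not taken from the driver. Consistent signals are counted on every change, irregular signals only when the notification reports them set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Serial-state bits; the values are assumed here for illustration. */
#define ST_DCD     0x01u
#define ST_DSR     0x02u
#define ST_BRK     0x04u
#define ST_RI      0x08u
#define ST_FRAMING 0x10u
#define ST_PARITY  0x20u
#define ST_OVERRUN 0x40u

struct counters {
    unsigned dcd, dsr, brk, rng, frame, parity, overrun;
};

void count_serial_state(struct counters *c, uint16_t old_state,
                        uint16_t new_state)
{
    uint16_t changed = old_state ^ new_state;

    assert(c != NULL);

    /* Consistent signals: every transition counts, in either direction. */
    if (changed & ST_DSR)
        c->dsr++;
    if (changed & ST_DCD)
        c->dcd++;

    /* Irregular signals: count the event itself, never the 1->0 clearing. */
    if (new_state & ST_BRK)
        c->brk++;
    if (new_state & ST_RI)
        c->rng++;
    if (new_state & ST_FRAMING)
        c->frame++;
    if (new_state & ST_PARITY)
        c->parity++;
    if (new_state & ST_OVERRUN)
        c->overrun++;
}
```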
      Signed-off-by: Tobias Herzog <t-herzog@gmx.de>
      Acked-by: Oliver Neukum <oneukum@suse.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cachefiles: fix the race between cachefiles_bury_object() and rmdir(2) · fdbbd418
      Al Viro authored
      commit 169b8033 upstream.
      
      the victim might've been rmdir'ed just before the lock_rename();
      unlike the normal callers, we do not look the source up after the
      parents are locked - we know it beforehand and just recheck that it's
      still the child of what used to be its parent.  Unfortunately,
      the check is too weak - we don't spot a dead directory since its
      ->d_parent is unchanged, dentry is positive, etc.  So we sail all
      the way to ->rename(), with hosting filesystems _not_ expecting
      to be asked to rename an rmdir'ed subdirectory.
      
      The fix is easy, fortunately - the lock on parent is sufficient for
      making IS_DEADDIR() on child safe.
      
      Cc: stable@vger.kernel.org
      Fixes: 9ae326a6 (CacheFiles: A cache that backs onto a mounted filesystem)
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: sched: gred: pass the right attribute to gred_change_table_def() · fa6810e7
      Jakub Kicinski authored
      [ Upstream commit 38b4f18d ]
      
      gred_change_table_def() takes a pointer to TCA_GRED_DPS attribute,
      and expects it will be able to interpret its contents as
      struct tc_gred_sopt.  Pass the correct gred attribute, instead of
      TCA_OPTIONS.
      
      This bug meant the table definition could never be changed after
      Qdisc was initialized (unless whatever TCA_OPTIONS contained both
      passed netlink validation and was a valid struct tc_gred_sopt...).
      
      Old behaviour:
      $ ip link add type dummy
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      RTNETLINK answers: Invalid argument
      
      Now:
      $ ip link add type dummy
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      
      Fixes: f62d6b93 ("[PKT_SCHED]: GRED: Use central VQ change procedure")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rtnetlink: Disallow FDB configuration for non-Ethernet device · 0e71018a
      Ido Schimmel authored
      [ Upstream commit da715775 ]
      
      When an FDB entry is configured, the address is validated to have the
      length of an Ethernet address, but the device for which the address is
      configured can be of any type.
      
      The above can result in the use of uninitialized memory when the address
      is later compared against existing addresses since 'dev->addr_len' is
      used and it may be greater than ETH_ALEN, as with ip6tnl devices.
      
      Fix this by making sure that FDB entries are only configured for
      Ethernet devices.
      
      BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
      CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x14b/0x190 lib/dump_stack.c:113
        kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
        __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
        memcmp+0x11d/0x180 lib/string.c:863
        dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
        ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
        rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
        rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
        netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
        rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
        netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
        netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
        netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg net/socket.c:631 [inline]
        ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
        __sys_sendmsg net/socket.c:2152 [inline]
        __do_sys_sendmsg net/socket.c:2161 [inline]
        __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
        do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
        entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x440ee9
      Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
      RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
      R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
        kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
        kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
        kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
        slab_post_alloc_hook mm/slab.h:446 [inline]
        slab_alloc_node mm/slub.c:2718 [inline]
        __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
        __kmalloc_reserve net/core/skbuff.c:138 [inline]
        __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
        alloc_skb include/linux/skbuff.h:996 [inline]
        netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
        netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg net/socket.c:631 [inline]
        ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
        __sys_sendmsg net/socket.c:2152 [inline]
        __do_sys_sendmsg net/socket.c:2161 [inline]
        __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
        do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
        entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      v2:
      * Make error message more specific (David)
      
      Fixes: 090096bf ("net: generic fdb support for drivers without ndo_fdb_<op>")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: drop skb on failure in ip_check_defrag() · d64a8204
      Cong Wang authored
      [ Upstream commit 7de414a9 ]
      
      Most callers of pskb_trim_rcsum() simply drop the skb when
      it fails; however, ip_check_defrag() still continues to pass
      the skb up the stack. This is suspicious.

      In ip_check_defrag(), after we learn the skb is an IP fragment,
      passing the skb to callers makes no sense, because callers expect
      fragments to be defrag'ed on success. So, dropping the skb when we
      can't defrag it is reasonable.
      
      Note, prior to commit 88078d98, this is not a big problem as
      checksum will be fixed up anyway. After it, the checksum is not
      correct on failure.
      
      Found this during code review.
      
      Fixes: 88078d98 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sctp: fix race on sctp_id2asoc · 55eb3e7e
      Marcelo Ricardo Leitner authored
      [ Upstream commit b336deca ]
      
      syzbot reported a use-after-free involving sctp_id2asoc.  Dmitry Vyukov
      helped to root cause it and it is because of reading the asoc after it
      was freed:
      
              CPU 1                       CPU 2
      (working on socket 1)            (working on socket 2)
                                       sctp_association_destroy
      sctp_id2asoc
         spin lock
           grab the asoc from idr
         spin unlock
                                         spin lock
                                           remove asoc from idr
                                         spin unlock
                                         free(asoc)
         if asoc->base.sk != sk ... [*]
      
      This can only be hit if trying to fetch asocs from different sockets. As
      we have a single IDR for all asocs, in all SCTP sockets, their id is
      unique on the system. An application can try to send stuff on an id
      that matches on another socket, and the if in [*] will protect from such
      usage. But it didn't consider that as that asoc may belong to another
      socket, it may be freed in parallel (read: under another socket lock).
      
      We fix it by moving the checks in [*] into the protected region. This
      fixes it because the asoc cannot be freed while the lock is held.
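
The pattern of moving the ownership check into the locked region can be sketched as follows. This is an illustration only: the fixed-size `table`, the C11 `atomic_flag` spin lock, `struct assoc`, and `id2asoc()` are hypothetical stand-ins for the shared IDR, its lock, and the real lookup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct assoc {
    int id;
    int owner_sock;  /* socket the association belongs to */
};

#define TABLE_SIZE 8
static struct assoc *table[TABLE_SIZE];  /* stand-in for the shared IDR */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;

static void lock(void)
{
    while (atomic_flag_test_and_set_explicit(&table_lock,
                                             memory_order_acquire))
        ;  /* spin until the flag is released */
}

static void unlock(void)
{
    atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Return the association with this id only if it belongs to sock. */
struct assoc *id2asoc(int sock, int id)
{
    struct assoc *a;

    assert(id >= 0 && id < TABLE_SIZE);
    lock();
    a = table[id];
    /* The ownership check runs while the lock is held, so a remover
     * that takes the same lock cannot free the entry underneath it. */
    if (a != NULL && a->owner_sock != sock)
        a = NULL;
    unlock();
    return a;
}
```

The sketch only protects the check itself. It assumes, as the commit message implies, that an association owned by the calling socket is kept alive by that socket's own locking after the lookup returns.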
      
      Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • r8169: fix NAPI handling under high load · e29dae46
      Heiner Kallweit authored
      [ Upstream commit 6b839b6c ]
      
      rtl_rx() and rtl_tx() are called only if the respective bits are set
      in the interrupt status register. Under high load NAPI may not be
      able to process all data (work_done == budget) and it will schedule
      subsequent calls to the poll callback.
      rtl_ack_events() however resets the bits in the interrupt status
      register, therefore subsequent calls to rtl8169_poll() won't call
      rtl_rx() and rtl_tx() - chip interrupts are still disabled.
      
      Fix this by calling rtl_rx() and rtl_tx() independent of the bits
      set in the interrupt status register. Both functions will detect
      if there's nothing to do for them.
      
      Fixes: da78dbff ("r8169: remove work from irq handler.")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules · 410155b7
      Niklas Cassel authored
      [ Upstream commit 30549aab ]
      
      When building stmmac, it is only possible to select CONFIG_DWMAC_GENERIC,
      or any of the glue drivers, when CONFIG_STMMAC_PLATFORM is set.
      The only exception is CONFIG_STMMAC_PCI.
      
      When calling of_mdiobus_register(), it will call our ->reset()
      callback, which is set to stmmac_mdio_reset().
      
      Most of the code in stmmac_mdio_reset() is protected by a
      "#if defined(CONFIG_STMMAC_PLATFORM)", which will evaluate
      to false when CONFIG_STMMAC_PLATFORM=m.
      
      Because of this, the phy reset gpio will only be pulled when
      stmmac is built as built-in, but not when built as modules.
      
      Fix this by using "#if IS_ENABLED()" instead of "#if defined()".
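
The difference between the two preprocessor tests can be reproduced outside the kernel. The sketch below re-creates a simplified form of the IS_ENABLED() helper macros under different internal names, and uses a made-up option, CONFIG_DEMO_PLATFORM, built as a module. For a =m option Kconfig defines only the `_MODULE` variant of the symbol, which is why a plain `defined()` test misses it.

```c
#include <assert.h>

/* Simplified re-creation of the kernel's IS_ENABLED() machinery;
 * the helper names here are this sketch's own. */
#define PLACEHOLDER_1 0,
#define PICK_SECOND(ignored, val, ...) val
#define IS_DEFINED_TO_1(x) IS_DEFINED_STEP1(x)
#define IS_DEFINED_STEP1(val) IS_DEFINED_STEP2(PLACEHOLDER_##val)
#define IS_DEFINED_STEP2(arg1_or_junk) PICK_SECOND(arg1_or_junk 1, 0, 0)

#define IS_BUILTIN(option) IS_DEFINED_TO_1(option)
#define IS_MODULE(option)  IS_DEFINED_TO_1(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* What a CONFIG_DEMO_PLATFORM=m build sees: only the _MODULE symbol. */
#define CONFIG_DEMO_PLATFORM_MODULE 1

/* Returns 1 if the guarded code would be compiled in. */
int reset_compiled_with_defined(void)
{
#if defined(CONFIG_DEMO_PLATFORM)
    return 1;
#else
    return 0;
#endif
}

int reset_compiled_with_is_enabled(void)
{
#if IS_ENABLED(CONFIG_DEMO_PLATFORM)
    return 1;
#else
    return 0;
#endif
}

static_assert(IS_ENABLED(CONFIG_DEMO_PLATFORM) == 1,
              "a module build counts as enabled");
```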
      Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: socket: fix a missing-check bug · 242384b3
      Wenwen Wang authored
      [ Upstream commit b6168562 ]
      
      In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
      statement to see whether it is necessary to pre-process the ethtool
      structure, because, as mentioned in the comment, the structure
      ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
      is allocated through compat_alloc_user_space(). One thing to note here is
      that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
      partially determined by 'rule_cnt', which is actually acquired from the
      user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
      get_user(). After 'rxnfc' is allocated, the data in the original user-space
      buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
      including the 'rule_cnt' field. However, after this copy, no check is
      re-enforced on 'rxnfc->rule_cnt'. So it is possible that a malicious user
      could race to change the value in 'compat_rxnfc->rule_cnt' between these two
      copies. In this way, the attacker can bypass the previous check on
      'rule_cnt' and inject malicious data. This can cause undefined behavior of
      the kernel and introduce a potential security risk.

      This patch avoids the above issue by copying the value acquired by
      get_user() to 'rxnfc->rule_cnt', if 'ethcmd' is ETHTOOL_GRXCLSRLALL.
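
The double-fetch problem and the fix can be modelled in a few lines of userspace C. Everything here is hypothetical scaffolding: `struct rx_req`, `copy_request()`, the `race_hook` callback that stands in for a concurrent userspace writer, and `demo_attacker()`. Only the idea, overwriting the copied count with the value that was read first, comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout; only rule_cnt matters here. */
struct rx_req {
    uint32_t cmd;
    uint32_t rule_cnt;
};

/* Stand-in for a userspace thread running between the two reads. */
typedef void (*race_hook)(struct rx_req *user);

/* Example writer that inflates the count after it has been read once. */
void demo_attacker(struct rx_req *user)
{
    user->rule_cnt = 1000;
}

/* Copy a request whose size depends on rule_cnt and return the count
 * that ends up in the private copy. */
uint32_t copy_request(struct rx_req *user, struct rx_req *copy,
                      race_hook hook, int pin_count)
{
    uint32_t checked;

    assert(user != NULL && copy != NULL);
    checked = user->rule_cnt;           /* first fetch: used for sizing */
    if (hook)
        hook(user);                     /* the field may change here */
    memcpy(copy, user, sizeof(*copy));  /* second fetch: whole structure */
    if (pin_count)
        copy->rule_cnt = checked;       /* keep the value that was checked */
    return copy->rule_cnt;
}
```

With `pin_count` set to 0 the private copy carries whatever the writer stored during the window; with it set to 1 the copy always carries the count that the buffer was sized for.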
      Signed-off-by: Wenwen Wang <wang6495@umn.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs · f28d8265
      David Ahern authored
      [ Upstream commit 4ba4c566 ]
      
      The loop wants to skip previously dumped addresses, so it loops until
      current index >= saved index. If the message fills, it wants to save
      the index for the next address to dump, i.e., the one that did not
      fit in the current message.

      Currently, it is incrementing the index counter before comparing to the
      saved index, and then the saved index is off by 1 - it assumes the
      current address is going to fit in the message.

      Change the index handling to increment only after a successful dump.
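
A resumable dump with the corrected index handling might look like the sketch below. It is a userspace model, not the kernel function: `dump_addrs()`, the integer "addresses", and the fixed-capacity `msg` buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Write as many of addrs[0..n) as fit into msg[0..cap), skipping the
 * entries already dumped by earlier calls. *saved_idx is updated to the
 * index of the first entry that still needs dumping. */
size_t dump_addrs(const int *addrs, size_t n, int *msg, size_t cap,
                  size_t *saved_idx)
{
    size_t idx = 0, written = 0;

    assert(saved_idx != NULL);
    for (size_t i = 0; i < n; i++) {
        if (idx >= *saved_idx) {
            if (written == cap)
                break;              /* message full: idx is NOT advanced */
            msg[written++] = addrs[i];
        }
        idx++;                      /* advance only past skipped or dumped entries */
    }
    *saved_idx = idx;
    return written;
}
```

Because the counter moves only after an entry was skipped or written, the saved index names the entry that did not fit, so repeated calls cover every address exactly once.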
      
      Fixes: 502a2ffd ("ipv6: convert idev_list to list macros")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called · b719ba2b
      Stefano Brivio authored
      [ Upstream commit ee1abcf6 ]
      
      Commit a61bbcf2 ("[NET]: Store skb->timestamp as offset to a base
      timestamp") introduces a neighbour control buffer and zeroes it out in
      ndisc_rcv(), as ndisc_recv_ns() uses it.
      
      Commit f2776ff0 ("[IPV6]: Fix address/interface handling in UDP and
      DCCP, according to the scoping architecture.") introduces the usage of the
      IPv6 control buffer in protocol error handlers (e.g. inet6_iif() in
      present-day __udp6_lib_err()).
      
      Now, with commit b94f1c09 ("ipv6: Use icmpv6_notify() to propagate
      redirect, instead of rt6_redirect()."), we call protocol error handlers
      from ndisc_redirect_rcv(), after the control buffer is already stolen and
      some parts are already zeroed out. This implies that inet6_iif() on this
      path will always return zero.
      
      This gives unexpected results on UDP socket lookup in __udp6_lib_err(), as
      we might actually need to match sockets for a given interface.
      
      Instead of always claiming the control buffer in ndisc_rcv(), do that only
      when needed.
      
      Fixes: b94f1c09 ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: mcast: fix a use-after-free in inet6_mc_check · 03628bad
      Eric Dumazet authored
      [ Upstream commit dc012f36 ]
      
      syzbot found a use-after-free in inet6_mc_check [1]
      
      The problem here is that inet6_mc_check() uses rcu
      and read_lock(&iml->sflock)
      
      So the fact that ip6_mc_leave_src() is called under RTNL
      and the socket lock does not help us, we need to acquire
      iml->sflock in write mode.
      
      In the future, we should convert all this stuff to RCU.
      
      [1]
      BUG: KASAN: use-after-free in ipv6_addr_equal include/net/ipv6.h:521 [inline]
      BUG: KASAN: use-after-free in inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
      Read of size 8 at addr ffff8801ce7f2510 by task syz-executor0/22432
      
      CPU: 1 PID: 22432 Comm: syz-executor0 Not tainted 4.19.0-rc7+ #280
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
       print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       ipv6_addr_equal include/net/ipv6.h:521 [inline]
       inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
       __raw_v6_lookup+0x320/0x3f0 net/ipv6/raw.c:98
       ipv6_raw_deliver net/ipv6/raw.c:183 [inline]
       raw6_local_deliver+0x3d3/0xcb0 net/ipv6/raw.c:240
       ip6_input_finish+0x467/0x1aa0 net/ipv6/ip6_input.c:345
       NF_HOOK include/linux/netfilter.h:289 [inline]
       ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:426
       ip6_mc_input+0x48a/0xd20 net/ipv6/ip6_input.c:503
       dst_input include/net/dst.h:450 [inline]
       ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
       NF_HOOK include/linux/netfilter.h:289 [inline]
       ipv6_rcv+0x120/0x640 net/ipv6/ip6_input.c:271
       __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023
       netif_receive_skb_internal+0x12c/0x620 net/core/dev.c:5126
       napi_frags_finish net/core/dev.c:5664 [inline]
       napi_gro_frags+0x75a/0xc90 net/core/dev.c:5737
       tun_get_user+0x3189/0x4250 drivers/net/tun.c:1923
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1968
       call_write_iter include/linux/fs.h:1808 [inline]
       do_iter_readv_writev+0x8b0/0xa80 fs/read_write.c:680
       do_iter_write+0x185/0x5f0 fs/read_write.c:959
       vfs_writev+0x1f1/0x360 fs/read_write.c:1004
       do_writev+0x11a/0x310 fs/read_write.c:1039
       __do_sys_writev fs/read_write.c:1112 [inline]
       __se_sys_writev fs/read_write.c:1109 [inline]
       __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457421
      Code: 75 14 b8 14 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 34 b5 fb ff c3 48 83 ec 08 e8 1a 2d 00 00 48 89 04 24 b8 14 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 63 2d 00 00 48 89 d0 48 83 c4 08 48 3d 01
      RSP: 002b:00007f2d30ecaba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 000000000000003e RCX: 0000000000457421
      RDX: 0000000000000001 RSI: 00007f2d30ecabf0 RDI: 00000000000000f0
      RBP: 0000000020000500 R08: 00000000000000f0 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000293 R12: 00007f2d30ecb6d4
      R13: 00000000004c4890 R14: 00000000004d7b90 R15: 00000000ffffffff
      
      Allocated by task 22437:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
       __do_kmalloc mm/slab.c:3718 [inline]
       __kmalloc+0x14e/0x760 mm/slab.c:3727
       kmalloc include/linux/slab.h:518 [inline]
       sock_kmalloc+0x15a/0x1f0 net/core/sock.c:1983
       ip6_mc_source+0x14dd/0x1960 net/ipv6/mcast.c:427
       do_ipv6_setsockopt.isra.9+0x3afb/0x45d0 net/ipv6/ipv6_sockglue.c:743
       ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:933
       rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1069
       sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3038
       __sys_setsockopt+0x1ba/0x3c0 net/socket.c:1902
       __do_sys_setsockopt net/socket.c:1913 [inline]
       __se_sys_setsockopt net/socket.c:1910 [inline]
       __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1910
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 22430:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kfree+0xcf/0x230 mm/slab.c:3813
       __sock_kfree_s net/core/sock.c:2004 [inline]
       sock_kfree_s+0x29/0x60 net/core/sock.c:2010
       ip6_mc_leave_src+0x11a/0x1d0 net/ipv6/mcast.c:2448
       __ipv6_sock_mc_close+0x20b/0x4e0 net/ipv6/mcast.c:310
       ipv6_sock_mc_close+0x158/0x1d0 net/ipv6/mcast.c:328
       inet6_release+0x40/0x70 net/ipv6/af_inet6.c:452
       __sock_release+0xd7/0x250 net/socket.c:579
       sock_close+0x19/0x20 net/socket.c:1141
       __fput+0x385/0xa30 fs/file_table.c:278
       ____fput+0x15/0x20 fs/file_table.c:309
       task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:193 [inline]
       exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
       prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
       do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801ce7f2500
       which belongs to the cache kmalloc-192 of size 192
      The buggy address is located 16 bytes inside of
       192-byte region [ffff8801ce7f2500, ffff8801ce7f25c0)
      The buggy address belongs to the page:
      page:ffffea000739fc80 count:1 mapcount:0 mapping:ffff8801da800040 index:0x0
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0006f6e548 ffffea000737b948 ffff8801da800040
      raw: 0000000000000000 ffff8801ce7f2000 0000000100000010 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801ce7f2400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801ce7f2480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      >ffff8801ce7f2500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
       ffff8801ce7f2580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8801ce7f2600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mremap: properly flush TLB before releasing the page · 0f1490a7
      Linus Torvalds authored
      Commit eb66ae03 upstream.
      
      This is a backport to stable 3.18.y, based on Will Deacon's 4.4.y
      backport.
      
      Jann Horn points out that our TLB flushing was subtly wrong for the
      mremap() case.  What makes mremap() special is that we don't follow the
      usual "add page to list of pages to be freed, then flush tlb, and then
      free pages".  No, mremap() obviously just _moves_ the page from one page
      table location to another.
      
      That matters, because mremap() thus doesn't directly control the
      lifetime of the moved page with a freelist: instead, the lifetime of the
      page is controlled by the page table locking, that serializes access to
      the entry.
      
      As a result, we need to flush the TLB not just before releasing the lock
      for the source location (to avoid any concurrent accesses to the entry),
      but also before we release the destination page table lock (to avoid the
      TLB being flushed after somebody else has already done something to that
      page).
      
      This also makes the whole "need_flush" logic unnecessary, since we now
      always end up flushing the TLB for every valid entry.
      Reported-and-tested-by: Jann Horn <jannh@google.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [will: backport to 4.4 stable]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      [ghackmann@google.com: adjust context]
      Signed-off-by: Greg Hackmann <ghackmann@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f1490a7
    • Linus Torvalds's avatar
      /proc/iomem: only expose physical resource addresses to privileged users · 01393bd2
      Linus Torvalds authored
      commit 51d7b120 upstream.
      
      In commit c4004b02 ("x86: remove the kernel code/data/bss resources
      from /proc/iomem") I was hoping to remove the physical kernel address
      data from /proc/iomem entirely, but that had to be reverted because some
      system programs actually use it.
      
      This limits all the detailed resource information to properly
      credentialed users instead.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mark Salyzyn <salyzyn@android.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      01393bd2
    • Rasmus Villemoes's avatar
      perf tools: Disable parallelism for 'make clean' · df10e01a
      Rasmus Villemoes authored
      [ Upstream commit da15fc2f ]
      
      The Yocto build system does a 'make clean' when rebuilding due to
      changed dependencies, and that consistently fails for me (causing the
      whole BSP build to fail) with errors such as
      
      | find: '[...]/perf/1.0-r9/perf-1.0/plugin_mac80211.so': No such file or directory
      | find: '[...]/perf/1.0-r9/perf-1.0/plugin_mac80211.so': No such file or directory
      | find: find: '[...]/perf/1.0-r9/perf-1.0/libtraceevent.a''[...]/perf/1.0-r9/perf-1.0/libtraceevent.a': No such file or directory: No such file or directory
      |
      [...]
      | find: cannot delete '/mnt/xfs/devel/pil/yocto/tmp-glibc/work/wandboard-oe-linux-gnueabi/perf/1.0-r9/perf-1.0/util/.pstack.o.cmd': No such file or directory
      
      Apparently (despite the comment), 'make clean' ends up launching
      multiple sub-makes that all want to remove the same things - perhaps
      this only happens in combination with a O=... parameter. In any case, we
      don't lose much by explicitly disabling the parallelism for the clean
      target, and it makes automated builds much more reliable.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180705131527.19749-1-linux@rasmusvillemoes.dk
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      df10e01a
    • Khazhismel Kumykov's avatar
      fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters() · b9e6f13b
      Khazhismel Kumykov authored
      [ Upstream commit ac081c3b ]
      
      On non-preempt kernels this loop can take a long time (more than 50 ticks)
      processing through entries.
      
      Link: http://lkml.kernel.org/r/20181010172623.57033-1-khazhy@google.com
      
      Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
      Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b9e6f13b
    • Hannes Frederic Sowa's avatar
      unix: correctly track in-flight fds in sending process user_struct · 32378362
      Hannes Frederic Sowa authored
      [ Upstream commit 415e3d3e ]
      
      The commit referenced in the Fixes tag incorrectly accounted the number
      of in-flight fds over a unix domain socket to the original opener
      of the file-descriptor. This allows another process to arbitrarily
      deplete the original file-opener's resource limit for the maximum of
      open files. Instead the sending process and its struct cred should
      be credited.
      
      To do so, we add a reference counted struct user_struct pointer to the
      scm_fp_list and use it to account for the number of inflight unix fds.
      
      Fixes: 712f4aad ("unix: properly account for FDs passed over unix sockets")
      Reported-by: David Herrmann <dh.herrmann@gmail.com>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      32378362
    • Prarit Bhargava's avatar
      x86/PCI: Mark Broadwell-EP Home Agent 1 as having non-compliant BARs · 7746e511
      Prarit Bhargava authored
      [ Upstream commit da77b671 ]
      
      Commit b8941571 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having
      non-compliant BARs") marked Home Agent 0 & PCU as having non-compliant
      BARs.  Home Agent 1 also has non-compliant BARs.
      
      Mark Home Agent 1 as having non-compliant BARs so the PCI core doesn't
      touch them.
      
      The problem with these devices is documented in the Xeon v4 specification
      update:
      
        BDF2          PCI BARs in the Home Agent Will Return Non-Zero Values
                      During Enumeration
      
        Problem:      During system initialization the Operating System may access
                      the standard PCI BARs (Base Address Registers).  Due to
                      this erratum, accesses to the Home Agent BAR registers (Bus
                      1; Device 18; Function 0,4; Offsets (0x14-0x24) will return
                      non-zero values.
      
        Implication:  The operating system may issue a warning.  Intel has not
                      observed any functional failures due to this erratum.
      
      Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-spec-update.html
      Fixes: b8941571 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs")
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: "H. Peter Anvin" <hpa@zytor.com>
      CC: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7746e511
    • Hannes Frederic Sowa's avatar
      net: fix warnings in 'make htmldocs' by moving macro definition out of field declaration · a5bb227c
      Hannes Frederic Sowa authored
      [ Upstream commit 7bbadd2d ]
      
      Docbook does not like the definition of macros inside a field declaration
      and adds a warning. Move the definition out.
      
      Fixes: 79462ad0 ("net: add validation for the socket syscall protocol argument")
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a5bb227c
    • Alan Stern's avatar
      USB: hub: fix up early-exit pathway in hub_activate · e57bb991
      Alan Stern authored
      [ Upstream commit ca5cbc8b ]
      
      The early-exit pathway in hub_activate, added by commit e50293ef
      ("USB: fix invalid memory access in hub_activate()") needs
      improvement.  It duplicates code that is already present at the end of
      the subroutine, and it neglects to undo the effect of a
      usb_autopm_get_interface_no_resume() call.
      
      This patch fixes both problems by making the early-exit pathway jump
      directly to the end of the subroutine.  It simplifies the code at the
      end by merging two conditionals that actually test the same condition
      although they appear different: If type < HUB_INIT3 then type must be
      either HUB_INIT2 or HUB_INIT, and it can't be HUB_INIT because in that
      case the subroutine would have exited earlier.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      CC: <stable@vger.kernel.org> #4.4+
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e57bb991
    • Eric Biggers's avatar
      KEYS: put keyring if install_session_keyring_to_cred() fails · 7938ba3f
      Eric Biggers authored
      [ Upstream commit d636bd9f ]
      
      In join_session_keyring(), if install_session_keyring_to_cred() were to
      fail, we would leak the keyring reference, just like in the bug fixed by
      commit 23567fd0 ("KEYS: Fix keyring ref leak in
      join_session_keyring()").  Fortunately this cannot happen currently, but
      we really should be more careful.  Do this by adding and using a new
      error label at which the keyring reference is dropped.
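
      The error-label idea can be sketched in plain C as follows. This is only
      an illustration: struct demo_keyring, demo_get(), demo_put(),
      demo_install() and demo_join() are hypothetical stand-ins, not the
      kernel's key API, and the real function has more cleanup steps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted keyring. */
struct demo_keyring {
    int refcount;
};

static void demo_get(struct demo_keyring *k) { k->refcount++; }
static void demo_put(struct demo_keyring *k) { k->refcount--; }

/* Hypothetical install step: fails when install_ok is zero. */
static int demo_install(const struct demo_keyring *k, int install_ok)
{
    (void)k;
    return install_ok ? 0 : -1;
}

/*
 * Take a temporary reference, try to install the keyring, and drop the
 * reference on both the success path and the failure path.
 */
int demo_join(struct demo_keyring *k, int install_ok)
{
    int ret;

    assert(k != NULL);
    demo_get(k);

    ret = demo_install(k, install_ok);
    if (ret < 0)
        goto error_put;     /* dedicated label: release the reference */

    demo_put(k);            /* success: temporary reference dropped */
    return 0;

error_put:
    demo_put(k);
    return ret;
}
```

      Whichever way demo_install() returns, the reference count after
      demo_join() equals the count before the call.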
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7938ba3f
    • Jan Beulich's avatar
      igb: fix NULL derefs due to skipped SR-IOV enabling · 7b8052e1
      Jan Beulich authored
      [ Upstream commit be06998f ]
      
      The combined effect of commits 6423fc34 ("igb: do not re-init SR-IOV
      during probe") and ceee3450 ("igb: make sure SR-IOV init uses the
      right number of queues") causes VFs to no longer get set up, leading
      to NULL pointer dereferences due to the adapter's ->vf_data being NULL
      while ->vfs_allocated_count is non-zero. The first commit not only
      neglected the side effect of igb_sriov_reinit() that the second commit
      tried to account for, but also that of setting IGB_FLAG_HAS_MSIX,
      without which igb_enable_sriov() is effectively a no-op. Calling
      igb_{,re}set_interrupt_capability() as done here seems to address this,
      but I'm not sure whether this is better than simply reverting the other
      two commits.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7b8052e1
    • Miklos Szeredi's avatar
      ovl: fix open in stacked overlay · d255d18a
      Miklos Szeredi authored
      [ Upstream commit 1c8a47df ]
      
      If two overlayfs filesystems are stacked on top of each other, then we need
      recursion in ovl_d_select_inode().
      
      I guess d_backing_inode() is supposed to do that.  But currently it doesn't
      and that functionality is open coded in vfs_open().  This is now copied
      into ovl_d_select_inode() to fix this regression.
      Reported-by: Alban Crequy <alban.crequy@gmail.com>
      Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
      Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay...")
      Cc: David Howells <dhowells@redhat.com>
      Cc: <stable@vger.kernel.org> # v4.2+
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d255d18a
    • Arik Nemtsov's avatar
      iwlwifi: pcie: correctly define 7265-D cfg · 61fde28f
      Arik Nemtsov authored
      [ Upstream commit 2b0e2b0f ]
      
      The trans cfg was not replaced for 7265-D cards. This led to a check of
      the min-NVM version against a 7265-C card, causing very-old 7265-D cards
      to operate incorrectly with the driver.
      
      Fixes: 3fd0d3c1 ("iwlwifi: pcie: support 7265-D devices")
      Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
      Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      61fde28f
    • Xin Long's avatar
      sctp: translate network order to host order when users get a hmacid · beb685c8
      Xin Long authored
      [ Upstream commit 7a84bd46 ]
      
      Commit ed5a377d ("sctp: translate host order to network order when
      setting a hmacid") corrected the hmacid byte-order when setting a hmacid,
      but the same issue also exists on getting a hmacid.
      
      We fix it by changing hmacids to host order when users get them with
      getsockopt.
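
      The conversion on the read side can be sketched in userspace C as
      below. The function name and buffers are illustrative assumptions, not
      the kernel's getsockopt code; only ntohs() is a real library call.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy HMAC identifiers stored in network byte order into a caller
 * buffer in host byte order. Names are illustrative, not the kernel's.
 */
void hmac_ids_to_host(const uint16_t *net_ids, uint16_t *host_ids, size_t n)
{
    assert(net_ids != NULL && host_ids != NULL);
    for (size_t i = 0; i < n; i++)
        host_ids[i] = ntohs(net_ids[i]);   /* network -> host order */
}
```

      On a big-endian host ntohs() is a no-op, which is why a missing
      conversion of this kind only shows up on little-endian machines.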
      
      Fixes: ed5a377d ("sctp: translate host order to network order when setting a hmacid")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      beb685c8
    • Jan Kara's avatar
      vfs: Make sendfile(2) killable even better · ce2c2e07
      Jan Kara authored
      [ Upstream commit c725bfce ]
      
      Commit 296291cd (mm: make sendfile(2) killable) fixed an issue where
      sendfile(2) was doing a lot of tiny writes into a filesystem and thus
      was unkillable for a long time. However sendfile(2) can be (mis)used to
      issue lots of writes into an arbitrary file descriptor such as eventfd or
      similar special file descriptors which never hit the standard filesystem
      write path and thus are still unkillable. E.g. the following example
      from Dmitry burns CPU for ~16s on my test system without possibility to
      be killed:
      
              int r1 = eventfd(0, 0);
              int r2 = memfd_create("", 0);
              unsigned long n = 1<<30;
              fallocate(r2, 0, 0, n);
              sendfile(r1, r2, 0, n);
      
      There are actually quite a few tests for pending signals in sendfile
      code; however, when data to write is always available, none of them seems
      to trigger. So fix the problem by adding a test for pending signal into
      splice_from_pipe_next() also before the loop waiting for pipe buffers to
      be available. This should fix all the lockup issues with sendfile of the
      do-ton-of-tiny-writes nature.
      
      CC: stable@vger.kernel.org
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ce2c2e07
    • Alex Williamson's avatar
      PCI: Fix devfn for VPD access through function 0 · ffad2775
      Alex Williamson authored
      [ Upstream commit 9d924075 ]
      
      Commit 932c435c ("PCI: Add dev_flags bit to access VPD through function
      0") passes PCI_SLOT(devfn) for the devfn parameter of pci_get_slot().
      Generally this works because we're fairly well guaranteed that a PCIe
      device is at slot address 0, but for the general case, including
      conventional PCI, it's incorrect.  We need to get the slot and then convert
      it back into a devfn.
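
      A small worked example of the encoding may help. The three macros below
      are redeclared locally with the same bit layout as the kernel's
      PCI_DEVFN/PCI_SLOT/PCI_FUNC; function0_devfn() is a hypothetical helper
      name used only for this sketch.

```c
#include <assert.h>

/* Same encoding as the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros:
 * bits 7..3 hold the slot (device) number, bits 2..0 the function. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/* devfn of function 0 in the same slot as the given devfn. */
unsigned int function0_devfn(unsigned int devfn)
{
    assert(devfn <= 0xff);
    return PCI_DEVFN(PCI_SLOT(devfn), 0);
}
```

      For devfn 0x19 (slot 3, function 1), PCI_SLOT() yields 3, which would be
      read as slot 0, function 3 if passed where a devfn is expected;
      function0_devfn() yields 0x18 instead. The two only coincide when the
      slot number is 0.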
      
      Fixes: 932c435c ("PCI: Add dev_flags bit to access VPD through function 0")
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
      Acked-by: Myron Stowe <myron.stowe@redhat.com>
      Acked-by: Mark Rustad <mark.d.rustad@intel.com>
      CC: stable@vger.kernel.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ffad2775
    • Jan Beulich's avatar
      x86/ldt: Fix small LDT allocation for Xen · c7f6eab8
      Jan Beulich authored
      [ Upstream commit f454b478 ]
      
      While the following commit:

        37868fe1 ("x86/ldt: Make modify_ldt synchronous")
      
      added a nice comment explaining that Xen needs page-aligned
      whole page chunks for guest descriptor tables, it then
      nevertheless used kzalloc() on the small size path.
      
      As I'm unaware of guarantees for kmalloc(PAGE_SIZE, ) to return
      page-aligned memory blocks, I believe this needs to be switched
      back to __get_free_page() (or better get_zeroed_page()).
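
      As a loose userspace analogy only (not kernel code), the difference
      between "a block of page size" and "a page-aligned block" can be checked
      as below. posix_memalign() and sysconf() are real POSIX calls;
      is_page_aligned() and alloc_zeroed_page() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Nonzero when p sits on a page_size boundary (page_size: power of two). */
int is_page_aligned(const void *p, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return ((uintptr_t)p & (page_size - 1)) == 0;
}

/* One zeroed, page-aligned page, or NULL on failure. Release with free(). */
void *alloc_zeroed_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = NULL;

    if (page <= 0 || posix_memalign(&p, (size_t)page, (size_t)page) != 0)
        return NULL;
    memset(p, 0, (size_t)page);
    return p;
}
```

      A general-purpose allocator asked for a page-sized block promises the
      size but not the alignment, so an explicit aligned allocation is the
      safer choice when alignment is a hard requirement.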
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/55E735D6020000780009F1E6@prv-mh.provo.novell.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c7f6eab8
    • Ken Xue's avatar
      Revert "SCSI: Fix NULL pointer dereference in runtime PM" · 1c857dc0
      Ken Xue authored
      [ Upstream commit 1c69d3b6 ]
      
      This reverts commit 49718f0f ("SCSI: Fix NULL pointer dereference in
      runtime PM")
      
      The old commit may lead to an issue where blk_{pre|post}_runtime_suspend
      and blk_{pre|post}_runtime_resume may not be called in pairs.
      
      Take the sr device as an example: when the sr device goes to runtime
      suspend, blk_{pre|post}_runtime_suspend will be called since the sr device
      defines pm->runtime_suspend. But blk_{pre|post}_runtime_resume will not be
      called since the sr device doesn't have pm->runtime_resume, so the sr
      device cannot resume correctly anymore.
      
      More discussion can be found at the link below.
      http://marc.info/?l=linux-scsi&m=144163730531875&w=2
      
      Signed-off-by: Ken Xue <Ken.Xue@amd.com>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Cc: Xiangliang Yu <Xiangliang.Yu@amd.com>
      Cc: James E.J. Bottomley <JBottomley@odin.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Michael Terry <Michael.terry@canonical.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1c857dc0
    • Naoya Horiguchi's avatar
      mm: migrate: hugetlb: putback destination hugepage to active list · f46c09f2
      Naoya Horiguchi authored
      [ Upstream commit 3aaa76e1 ]
      
      Since commit bcc54222 ("mm: hugetlb: introduce page_huge_active")
      each hugetlb page maintains its active flag to avoid a race condition
      between multiple calls of isolate_huge_page(), but current kernel
      doesn't set the flag on a hugepage allocated by migration because the
      proper putback routine isn't called.  This means that users could
      still encounter the race referred to by bcc54222 in this special
      case, so this patch fixes it.
      
      Fixes: bcc54222 ("mm: hugetlb: introduce page_huge_active")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>  [4.1.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f46c09f2
    • Peter Zijlstra's avatar
      perf: Fix PERF_EVENT_IOC_PERIOD deadlock · 73c72ba6
      Peter Zijlstra authored
      [ Upstream commit 642c2d67 ]
      
      Dmitry reported a fairly silly recursive lock deadlock for
      PERF_EVENT_IOC_PERIOD, fix this by explicitly doing the inactive part of
      __perf_event_period() instead of calling that function.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: c7999c6f ("perf: Fix PERF_EVENT_IOC_PERIOD migration race")
      Link: http://lkml.kernel.org/r/20151130115615.GJ17308@twins.programming.kicks-ass.net
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      73c72ba6
    • Sudip Mukherjee's avatar
      libata: blacklist Micron 500IT SSD with MU01 firmware · 4ac4abf7
      Sudip Mukherjee authored
      [ Upstream commit 136d769e ]
      
      While whitelisting Micron M500DC drives, the tweaked blacklist entry
      enabled queued TRIM from M500IT variants also. But these do not support
      queued TRIM. And while using those SSDs with the latest kernel we have
      seen errors and even the partition table getting corrupted.
      
      Some part from the dmesg:
      [    6.727384] ata1.00: ATA-9: Micron_M500IT_MTFDDAK060MBD, MU01, max UDMA/133
      [    6.727390] ata1.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
      [    6.741026] ata1.00: supports DRM functions and may not be fully accessible
      [    6.759887] ata1.00: configured for UDMA/133
      [    6.762256] scsi 0:0:0:0: Direct-Access     ATA      Micron_M500IT_MT MU01 PQ: 0 ANSI: 5
      
      and then for the error:
      [  120.860334] ata1.00: exception Emask 0x1 SAct 0x7ffc0007 SErr 0x0 action 0x6 frozen
      [  120.860338] ata1.00: irq_stat 0x40000008
      [  120.860342] ata1.00: failed command: SEND FPDMA QUEUED
      [  120.860351] ata1.00: cmd 64/01:00:00:00:00/00:00:00:00:00/a0 tag 0 ncq dma 512 out
               res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x5 (timeout)
      [  120.860353] ata1.00: status: { DRDY }
      [  120.860543] ata1: hard resetting link
      [  121.166128] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
      [  121.166376] ata1.00: supports DRM functions and may not be fully accessible
      [  121.186238] ata1.00: supports DRM functions and may not be fully accessible
      [  121.204445] ata1.00: configured for UDMA/133
      [  121.204454] ata1.00: device reported invalid CHS sector 0
      [  121.204541] sd 0:0:0:0: [sda] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
      [  121.204546] sd 0:0:0:0: [sda] tag#18 Sense Key : 0x5 [current]
      [  121.204550] sd 0:0:0:0: [sda] tag#18 ASC=0x21 ASCQ=0x4
      [  121.204555] sd 0:0:0:0: [sda] tag#18 CDB: opcode=0x93 93 08 00 00 00 00 00 04 28 80 00 00 00 30 00 00
      [  121.204559] print_req_error: I/O error, dev sda, sector 272512
      
      After a few reboots with these errors, the SSD is corrupted.
      After blacklisting it, the errors are not seen and the SSD does not get
      corrupted any more.
      
      Fixes: 243918be ("libata: Do not blacklist Micron M500DC")
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4ac4abf7
    • Shota Suzuki's avatar
      igb: Unpair the queues when changing the number of queues · 3bf6a2fa
      Shota Suzuki authored
      [ Upstream commit 37a5d163 ]
      
      Since commit 72ddef05 ("igb: Fix oops caused by missing queue
      pairing"), the IGB_FLAG_QUEUE_PAIRS flag can be set when changing the
      number of queues by "ethtool -L", but it is never cleared unless the igb
      driver is reloaded.
      This patch clears it if queue pairing becomes unnecessary as a result of
      "ethtool -L".
      Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3bf6a2fa
    • Filipe Manana's avatar
      Btrfs: do not ignore errors from btrfs_lookup_xattr in do_setxattr · b48138a2
      Filipe Manana authored
      [ Upstream commit 5cdf83ed ]
      
      The return value from btrfs_lookup_xattr() can be a pointer encoding an
      error, therefore deal with it. This fixes commit 5f5bc6b1
      ("Btrfs: make xattr replace operations atomic").
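
      For readers unfamiliar with error-encoding pointers, here is a userspace
      imitation of the idea. err_ptr(), ptr_err() and is_err() mimic the
      kernel's ERR_PTR/PTR_ERR/IS_ERR helpers; demo_lookup() and
      demo_setxattr() are hypothetical and only show that a caller has three
      outcomes to tell apart, not what the Btrfs code does in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR/PTR_ERR/IS_ERR helpers:
 * small negative errno values live in the top 4095 addresses. */
static inline void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int demo_item = 42;

/* Hypothetical lookup: error pointer on failure, NULL when not found. */
void *demo_lookup(int mode)
{
    if (mode < 0)
        return err_ptr(-EIO);
    if (mode == 0)
        return NULL;
    return &demo_item;
}

/* Caller that handles all three outcomes instead of only NULL. */
int demo_setxattr(int mode)
{
    void *di = demo_lookup(mode);

    if (is_err(di))
        return (int)ptr_err(di);   /* propagate the encoded error */
    if (di == NULL)
        return -ENODATA;           /* nothing to replace */
    return 0;
}
```

      A caller that only tests for NULL would treat the error pointer as a
      valid item and dereference it.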
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b48138a2
    • Peter Hurley's avatar
      tty: audit: Fix audit source · 52a25e71
      Peter Hurley authored
      [ Upstream commit 6b2a3d62 ]
      
      The data to audit/record is in the 'from' buffer (ie., the input
      read buffer).
      
      Fixes: 72586c60 ("n_tty: Fix auditing support for cannonical mode")
      Cc: stable <stable@vger.kernel.org> # 4.1+
      Cc: Miloslav Trmač <mitr@redhat.com>
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Acked-by: Laura Abbott <labbott@fedoraproject.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      52a25e71
    • Anssi Hannula's avatar
      ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly · 21bfce66
      Anssi Hannula authored
      [ Upstream commit 42e3121d ]
      
      AudioQuest DragonFly DAC reports a volume control range of 0..50
      (0x0000..0x0032) which in USB Audio means a range of 0 .. 0.2dB, which
      is obviously incorrect and would cause software using the dB information
      in e.g. volume sliders to have a massive volume difference in 100..102%
      range.
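
      The 0.2 dB figure follows from USB Audio volume values being expressed
      in steps of 1/256 dB. A minimal sketch of that arithmetic, with a
      hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Width in dB of a USB Audio volume range, where raw values count
 * steps of 1/256 dB. usb_range_db() is an illustrative helper name.
 */
double usb_range_db(int16_t min_raw, int16_t max_raw)
{
    assert(max_raw >= min_raw);
    return (double)(max_raw - min_raw) / 256.0;
}
```

      For the reported range, usb_range_db(0x0000, 0x0032) is 50/256, or about
      0.195 dB, which is far too narrow for a real volume control.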
      
      Commit 2d1cb7f6 ("ALSA: usb-audio: add dB range mapping for some
      devices") added a dB range mapping for it with range 0..50 dB.
      
      However, the actual volume mapping seems to be neither linear volume nor
      linear dB scale, but instead quite close to the cubic mapping e.g.
      alsamixer uses, with a range of approx. -53...0 dB.
      
      Replace the previous quirk with a custom dB mapping based on some basic
      output measurements, using a 10-item range TLV (which will still fit in
      alsa-lib MAX_TLV_RANGE_SIZE).
      
      Tested on AudioQuest DragonFly HW v1.2. The quirk is only applied if the
      range is 0..50, so if this gets fixed/changed in later HW revisions it
      will no longer be applied.
      
      v2: incorporated Takashi Iwai's suggestion for the quirk application
      method
      Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      21bfce66
    • Mateusz Sylwestrzak's avatar
      ALSA: hda - Add headset mic support for Acer Aspire V5-573G · 7e746c55
      Mateusz Sylwestrzak authored
      [ Upstream commit 0420694d ]
      
      Acer Aspire V5 with the ALC282 codec is given the wrong value for the
      0x19 PIN by the laptop's BIOS. Overriding it with the correct value
      adds support for the headset microphone which would not otherwise be
      visible in the system.
      
      The fix is based on commit 7819717b with a similar quirk for Acer
      Aspire with the ALC269 codec.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96201
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Mateusz Sylwestrzak <matisec7@gmail.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7e746c55