1. 21 Oct, 2017 28 commits
    • Linux 3.18.77 · 6f457819
      Greg Kroah-Hartman authored
    • Revert "tty: goldfish: Fix a parameter of a call to free_irq" · 3cde5529
      Greg Kroah-Hartman authored
      This reverts commit 09610721 which is
      commit 1a5c2d1d upstream.
      
      Ben writes:
      	This fixes a bug introduced in 4.6 by commit 465893e1 ("tty:
      	goldfish: support platform_device with id -1").  For earlier
      	kernel versions, it *introduces* a bug.
      
      So let's drop it.

      Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Cc: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • target/iscsi: Fix unsolicited data seq_end_offset calculation · e524bfab
      Varun Prakash authored
      [ Upstream commit 4d65491c ]
      
      In case of unsolicited data for the first sequence,
      seq_end_offset must be set to the minimum of the total data length
      and FirstBurstLength, so do not add cmd->write_data_done
      to the minimum of the total data length and FirstBurstLength.
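
A minimal sketch of the arithmetic, not the kernel code. The function and parameter names are hypothetical stand-ins for cmd->write_data_done, the total data length, and FirstBurstLength. Assuming a 128k write, a 64k FirstBurstLength, and 8k of data already received, the old formula yields 73728 and the corrected one yields 65536.

```c
#include <stdint.h>

/* Hypothetical helper: smaller of two 32-bit values. */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Old behaviour as described: write_data_done is added on top of the minimum. */
static uint32_t seq_end_offset_old(uint32_t write_data_done,
                                   uint32_t data_length,
                                   uint32_t first_burst_length)
{
    return write_data_done + min_u32(data_length, first_burst_length);
}

/* Fixed behaviour: the first unsolicited sequence ends at the plain minimum. */
static uint32_t seq_end_offset_fixed(uint32_t data_length,
                                     uint32_t first_burst_length)
{
    return min_u32(data_length, first_burst_length);
}
```

When the whole write is smaller than FirstBurstLength, the fixed helper simply returns the data length.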
      
      With ImmediateData=Yes, InitialR2T=No, and
      MaxXmitDataSegmentLength < FirstBurstLength, this patch prevents a
      WRITE command with an I/O size above FirstBurstLength from triggering
      sequence error messages. For example:
      
      Set the following parameters on the target (linux-4.8.12):
      ImmediateData = Yes
      InitialR2T = No
      MaxXmitDataSegmentLength = 8k
      FirstBurstLength = 64k
      
      Log in from an Open-iSCSI initiator and execute:
      dd if=/dev/zero of=/dev/sdb bs=128k count=1 oflag=direct
      
      Error messages on the target:
      Command ITT: 0x00000035 with Offset: 65536, Length: 8192 outside
      of Sequence 73728:131072 while DataSequenceInOrder=Yes.
      Command ITT: 0x00000035, received DataSN: 0x00000001 higher than
      expected 0x00000000.
      Unable to perform within-command recovery while ERL=0.

      Signed-off-by: Varun Prakash <varun@chelsio.com>
      [ bvanassche: Use min() instead of open-coding it / edited patch description ]
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • uapi: fix linux/mroute6.h userspace compilation errors · 1405b8e7
      Dmitry V. Levin authored
      [ Upstream commit 72aa107d ]
      
      Include <linux/in6.h> to fix the following linux/mroute6.h userspace
      compilation errors:
      
      /usr/include/linux/mroute6.h:80:22: error: field 'mf6cc_origin' has incomplete type
        struct sockaddr_in6 mf6cc_origin;  /* Origin of mcast */
      /usr/include/linux/mroute6.h:81:22: error: field 'mf6cc_mcastgrp' has incomplete type
        struct sockaddr_in6 mf6cc_mcastgrp;  /* Group in question */
      /usr/include/linux/mroute6.h:91:22: error: field 'src' has incomplete type
        struct sockaddr_in6 src;
      /usr/include/linux/mroute6.h:92:22: error: field 'grp' has incomplete type
        struct sockaddr_in6 grp;
      /usr/include/linux/mroute6.h:132:18: error: field 'im6_src' has incomplete type
        struct in6_addr im6_src, im6_dst;
      /usr/include/linux/mroute6.h:132:27: error: field 'im6_dst' has incomplete type
        struct in6_addr im6_src, im6_dst;

      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • uapi: fix linux/rds.h userspace compilation errors · d090846a
      Dmitry V. Levin authored
      [ Upstream commit feb0869d ]
      
      Consistently use types from linux/types.h to fix the following
      linux/rds.h userspace compilation errors:
      
      /usr/include/linux/rds.h:106:2: error: unknown type name 'uint8_t'
        uint8_t name[32];
      /usr/include/linux/rds.h:107:2: error: unknown type name 'uint64_t'
        uint64_t value;
      /usr/include/linux/rds.h:117:2: error: unknown type name 'uint64_t'
        uint64_t next_tx_seq;
      /usr/include/linux/rds.h:118:2: error: unknown type name 'uint64_t'
        uint64_t next_rx_seq;
      /usr/include/linux/rds.h:121:2: error: unknown type name 'uint8_t'
        uint8_t transport[TRANSNAMSIZ];  /* null term ascii */
      /usr/include/linux/rds.h:122:2: error: unknown type name 'uint8_t'
        uint8_t flags;
      /usr/include/linux/rds.h:129:2: error: unknown type name 'uint64_t'
        uint64_t seq;
      /usr/include/linux/rds.h:130:2: error: unknown type name 'uint32_t'
        uint32_t len;
      /usr/include/linux/rds.h:135:2: error: unknown type name 'uint8_t'
        uint8_t flags;
      /usr/include/linux/rds.h:139:2: error: unknown type name 'uint32_t'
        uint32_t sndbuf;
      /usr/include/linux/rds.h:144:2: error: unknown type name 'uint32_t'
        uint32_t rcvbuf;
      /usr/include/linux/rds.h:145:2: error: unknown type name 'uint64_t'
        uint64_t inum;
      /usr/include/linux/rds.h:153:2: error: unknown type name 'uint64_t'
        uint64_t       hdr_rem;
      /usr/include/linux/rds.h:154:2: error: unknown type name 'uint64_t'
        uint64_t       data_rem;
      /usr/include/linux/rds.h:155:2: error: unknown type name 'uint32_t'
        uint32_t       last_sent_nxt;
      /usr/include/linux/rds.h:156:2: error: unknown type name 'uint32_t'
        uint32_t       last_expected_una;
      /usr/include/linux/rds.h:157:2: error: unknown type name 'uint32_t'
        uint32_t       last_seen_una;
      /usr/include/linux/rds.h:164:2: error: unknown type name 'uint8_t'
        uint8_t  src_gid[RDS_IB_GID_LEN];
      /usr/include/linux/rds.h:165:2: error: unknown type name 'uint8_t'
        uint8_t  dst_gid[RDS_IB_GID_LEN];
      /usr/include/linux/rds.h:167:2: error: unknown type name 'uint32_t'
        uint32_t max_send_wr;
      /usr/include/linux/rds.h:168:2: error: unknown type name 'uint32_t'
        uint32_t max_recv_wr;
      /usr/include/linux/rds.h:169:2: error: unknown type name 'uint32_t'
        uint32_t max_send_sge;
      /usr/include/linux/rds.h:170:2: error: unknown type name 'uint32_t'
        uint32_t rdma_mr_max;
      /usr/include/linux/rds.h:171:2: error: unknown type name 'uint32_t'
        uint32_t rdma_mr_size;
      /usr/include/linux/rds.h:212:9: error: unknown type name 'uint64_t'
       typedef uint64_t rds_rdma_cookie_t;
      /usr/include/linux/rds.h:215:2: error: unknown type name 'uint64_t'
        uint64_t addr;
      /usr/include/linux/rds.h:216:2: error: unknown type name 'uint64_t'
        uint64_t bytes;
      /usr/include/linux/rds.h:221:2: error: unknown type name 'uint64_t'
        uint64_t cookie_addr;
      /usr/include/linux/rds.h:222:2: error: unknown type name 'uint64_t'
        uint64_t flags;
      /usr/include/linux/rds.h:228:2: error: unknown type name 'uint64_t'
        uint64_t  cookie_addr;
      /usr/include/linux/rds.h:229:2: error: unknown type name 'uint64_t'
        uint64_t  flags;
      /usr/include/linux/rds.h:234:2: error: unknown type name 'uint64_t'
        uint64_t flags;
      /usr/include/linux/rds.h:240:2: error: unknown type name 'uint64_t'
        uint64_t local_vec_addr;
      /usr/include/linux/rds.h:241:2: error: unknown type name 'uint64_t'
        uint64_t nr_local;
      /usr/include/linux/rds.h:242:2: error: unknown type name 'uint64_t'
        uint64_t flags;
      /usr/include/linux/rds.h:243:2: error: unknown type name 'uint64_t'
        uint64_t user_token;
      /usr/include/linux/rds.h:248:2: error: unknown type name 'uint64_t'
        uint64_t  local_addr;
      /usr/include/linux/rds.h:249:2: error: unknown type name 'uint64_t'
        uint64_t  remote_addr;
      /usr/include/linux/rds.h:252:4: error: unknown type name 'uint64_t'
          uint64_t compare;
      /usr/include/linux/rds.h:253:4: error: unknown type name 'uint64_t'
          uint64_t swap;
      /usr/include/linux/rds.h:256:4: error: unknown type name 'uint64_t'
          uint64_t add;
      /usr/include/linux/rds.h:259:4: error: unknown type name 'uint64_t'
          uint64_t compare;
      /usr/include/linux/rds.h:260:4: error: unknown type name 'uint64_t'
          uint64_t swap;
      /usr/include/linux/rds.h:261:4: error: unknown type name 'uint64_t'
          uint64_t compare_mask;
      /usr/include/linux/rds.h:262:4: error: unknown type name 'uint64_t'
          uint64_t swap_mask;
      /usr/include/linux/rds.h:265:4: error: unknown type name 'uint64_t'
          uint64_t add;
      /usr/include/linux/rds.h:266:4: error: unknown type name 'uint64_t'
          uint64_t nocarry_mask;
      /usr/include/linux/rds.h:269:2: error: unknown type name 'uint64_t'
        uint64_t flags;
      /usr/include/linux/rds.h:270:2: error: unknown type name 'uint64_t'
        uint64_t user_token;
      /usr/include/linux/rds.h:274:2: error: unknown type name 'uint64_t'
        uint64_t user_token;
      /usr/include/linux/rds.h:275:2: error: unknown type name 'int32_t'
        int32_t  status;
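
A sketch of the kind of declaration this fix implies, assuming a Linux userspace toolchain with the kernel UAPI headers installed. The struct and function names are hypothetical and only mimic the style of the header. The point is that __u8 and __u64 come from <linux/types.h>, so the declaration compiles without <stdint.h>.

```c
#include <linux/types.h>

/* Hypothetical structure in the style of a UAPI header. */
struct example_info_counter {
    __u8  name[32];
    __u64 value;
};

/* Size of the hypothetical structure: 32 name bytes plus one 64-bit value. */
static unsigned long example_counter_size(void)
{
    return sizeof(struct example_info_counter);
}
```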

      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: scsi_dh_emc: return success in clariion_std_inquiry() · ffc669a3
      Dan Carpenter authored
      [ Upstream commit 4d7d39a1 ]
      
      We accidentally return an uninitialized variable on success.
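
A minimal sketch of this class of bug, not the driver code. All names are hypothetical. The status variable is initialised to the success value so that the success path returns a defined result.

```c
#include <stddef.h>

/* Hypothetical status codes standing in for the driver's return values. */
enum { EXAMPLE_OK = 0, EXAMPLE_IO_ERROR = 1 };

/*
 * Hypothetical inquiry routine. 'err' starts out as the success value,
 * so a path that never assigns it still returns something well defined.
 */
static int example_std_inquiry(const unsigned char *buf, size_t len)
{
    int err = EXAMPLE_OK; /* without this initialiser, success would return garbage */

    if (buf == NULL || len < 4)
        err = EXAMPLE_IO_ERROR;

    return err;
}
```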
      
      Fixes: b6ff1b14 ("[SCSI] scsi_dh: Update EMC handler")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock · b370a037
      Eric Ren authored
      [ Upstream commit 439a36b8 ]
      
      We are in the situation that we have to avoid recursive cluster locking,
      but there is no way to check if a cluster lock has been taken by a process
      already.
      
      Mostly, we can avoid recursive locking by writing code carefully.
      However, we found that it's very hard to handle the routines that are
      invoked directly by vfs code.  For instance:
      
        const struct inode_operations ocfs2_file_iops = {
            .permission     = ocfs2_permission,
            .get_acl        = ocfs2_iop_get_acl,
            .set_acl        = ocfs2_iop_set_acl,
        };
      
      Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
      
        do_sys_open
         may_open
          inode_permission
           ocfs2_permission
            ocfs2_inode_lock() <=== first time
             generic_permission
              get_acl
               ocfs2_iop_get_acl
        	ocfs2_inode_lock() <=== recursive one
      
      A deadlock will occur if a remote EX request comes in between the two
      calls to ocfs2_inode_lock().  Briefly, the deadlock forms as follows:
      
      On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
      BAST (ocfs2_generic_handle_bast) when downconvert is started on behalf of
      the remote EX lock request.  On the other hand, the recursive cluster lock
      (the second one) will be blocked in __ocfs2_cluster_lock() because of
      OCFS2_LOCK_BLOCKED.  But the downconvert never completes, because
      there is no chance for the first cluster lock on this node to be
      unlocked: we block ourselves in the code path.
      
      The idea to fix this issue is mostly taken from gfs2 code.
      
      1. Introduce a new field, struct ocfs2_lock_res.l_holders, to keep track
         of the PIDs of the processes that have taken the cluster lock of this
         lock resource.
      
      2. Introduce a new flag for ocfs2_inode_lock_full,
         OCFS2_META_LOCK_GETBH; it means just getting back the disk inode bh
         for us if we've already got the cluster lock.
      
      3. Export a helper, ocfs2_is_locked_by_me(), used to check whether we
         have already got the cluster lock in the upper code path.
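
The holder-tracking idea in the list above can be sketched in plain C. This is an illustration under stated assumptions, not the ocfs2 implementation: every type and function name here is hypothetical, a singly linked list stands in for the kernel list, and no locking of the list itself is shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical holder record: one per process that took the cluster lock. */
struct example_holder {
    int pid;
    struct example_holder *next;
};

/* Hypothetical lock resource carrying the list of current holders. */
struct example_lock_res {
    struct example_holder *holders;
};

/* Record that 'pid' now holds the lock by pushing 'h' onto the list. */
static void example_add_holder(struct example_lock_res *res,
                               struct example_holder *h, int pid)
{
    h->pid = pid;
    h->next = res->holders;
    res->holders = h;
}

/* Forget a holder when the lock is dropped. */
static void example_remove_holder(struct example_lock_res *res,
                                  struct example_holder *h)
{
    struct example_holder **pp = &res->holders;

    while (*pp != NULL) {
        if (*pp == h) {
            *pp = h->next;
            return;
        }
        pp = &(*pp)->next;
    }
}

/* True if 'pid' already holds the lock, so the caller must not lock again. */
static bool example_is_locked_by_me(const struct example_lock_res *res, int pid)
{
    const struct example_holder *h;

    for (h = res->holders; h != NULL; h = h->next)
        if (h->pid == pid)
            return true;
    return false;
}
```

A caller in a nested code path would consult the check first and skip the second lock acquisition when it returns true.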
      
      The tracking logic should be used by some of the ocfs2 vfs's callbacks,
      to solve the recursive locking issue caused by the fact that vfs
      routines can call into each other.
      
      The performance penalty of processing the holder list should only be
      seen at a few cases where the tracking logic is used, such as get/set
      acl.
      
      You may ask: what if the first time we got a PR lock, and the second time
      we want an EX lock?  Fortunately, this case never happens in the real
      world, as far as I can see, including permission check and
      (get|set)_(acl|attr), and the gfs2 code does the same.
      
      [sfr@canb.auug.org.au remove some inlines]
      Link: http://lkml.kernel.org/r/20170117100948.11657-2-zren@suse.com
      
      Signed-off-by: Eric Ren <zren@suse.com>
      Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: xts - Add ECB dependency · 7cab7260
      Milan Broz authored
      [ Upstream commit 12cb3a1c ]
      
      Since the
         commit f1c131b4
         crypto: xts - Convert to skcipher
      the XTS mode is based on ECB, so the mode must select
      ECB otherwise it can fail to initialize.

      Signed-off-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs · bcd17067
      Majd Dibbiny authored
      [ Upstream commit 95f1ba9a ]
      
      In the VF driver, module parameter mlx4_log_num_mgm_entry_size was
      mistakenly overwritten -- and in a manner which overrode the
      device-managed flow steering option encoded in the parameter.
      
      log_num_mgm_entry_size is a global module parameter which
      affects all ConnectX-3 PFs installed on that host.
      If a VF changes log_num_mgm_entry_size, this will affect all PFs
      which are probed subsequent to the change (by disabling DMFS for
      those PFs).
      
      Fixes: 3c439b55 ("mlx4_core: Allow choosing flow steering mode")
      Signed-off-by: Majd Dibbiny <majd@mellanox.com>
      Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: send, fix failure to rename top level inode due to name collision · 2d9e4b59
      Robbie Ko authored
      [ Upstream commit 4dd9920d ]
      
      Under certain situations, an incremental send operation can fail due to a
      premature attempt to create a new top level inode (a direct child of the
      subvolume/snapshot root) whose name collides with another inode that was
      removed from the send snapshot.
      
      Consider the following example scenario.
      
      Parent snapshot:
      
        .                 (ino 256, gen 8)
        |---- a1/         (ino 257, gen 9)
        |---- a2/         (ino 258, gen 9)
      
      Send snapshot:
      
        .                 (ino 256, gen 3)
        |---- a2/         (ino 257, gen 7)
      
      In this scenario, when receiving the incremental send stream, the btrfs
      receive command fails like this (ran in verbose mode, -vv argument):
      
        rmdir a1
        mkfile o257-7-0
        rename o257-7-0 -> a2
        ERROR: rename o257-7-0 -> a2 failed: Is a directory
      
      What happens when computing the incremental send stream is:
      
      1) An operation to remove the directory with inode number 257 and
         generation 9 is issued.
      
      2) An operation to create the inode with number 257 and generation 7 is
         issued. This creates the inode with an orphanized name of "o257-7-0".
      
      3) An operation to rename the new inode 257 to its final name, "a2", is
         issued. This is incorrect because inode 258, which has the same name
         and is a child of the same parent (root inode 256), was not yet
         processed and therefore no rmdir operation for it was yet issued.
         The rename operation is issued because we fail to detect that the
         name of the new inode 257 collides with inode 258, because their
         parent, a subvolume/snapshot root (inode 256) has a different
         generation in both snapshots.
      
      So fix this by ignoring the generation value of a parent directory that
      matches a root inode (number 256) when we are checking if the name of the
      inode currently being processed collides with the name of some other
      inode that was not yet processed.
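
A minimal sketch of the comparison rule described in the fix, not the btrfs send code. The struct, the macro, and the function name are hypothetical. Two parent references are treated as the same directory when their inode numbers match and either the directory is the root inode (number 256) or their generations match.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_ROOT_INO 256u /* inode number of a subvolume/snapshot root */

/* Hypothetical (inode number, generation) pair identifying a directory. */
struct example_dir_ref {
    uint64_t ino;
    uint64_t gen;
};

/*
 * Decide whether two parent references denote the same directory for the
 * purpose of a name-collision check. For the root inode the generation is
 * ignored, because it may differ between the parent and send snapshots.
 */
static bool example_same_parent(struct example_dir_ref a,
                                struct example_dir_ref b)
{
    if (a.ino != b.ino)
        return false;
    if (a.ino == EXAMPLE_ROOT_INO)
        return true;
    return a.gen == b.gen;
}
```

With the numbers from the scenario, the roots (256, gen 8) and (256, gen 3) compare as the same parent, so a collision between "a2" entries under them can be detected.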
      
      We can achieve this scenario of different inodes with the same number but
      different generation values either by mounting a filesystem with the inode
      cache option (-o inode_cache) or by creating and sending snapshots across
      different filesystems, like in the following example:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
        $ mkdir /mnt/a1
        $ mkdir /mnt/a2
        $ btrfs subvolume snapshot -r /mnt /mnt/snap1
        $ btrfs send /mnt/snap1 -f /tmp/1.snap
        $ umount /mnt
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt
        $ touch /mnt/a2
        $ btrfs subvolume snapshot -r /mnt /mnt/snap2
        $ btrfs receive /mnt -f /tmp/1.snap
        # Take note that once the filesystem is created, its current
        # generation has value 7 so the inode from the second snapshot has
        # a generation value of 7. And after receiving the first snapshot
        # the filesystem is at a generation value of 10, because the call to
        # create the second snapshot bumps the generation to 8 (the snapshot
        # creation ioctl does a transaction commit), the receive command calls
        # the snapshot creation ioctl to create the first snapshot, which bumps
        # the filesystem's generation to 9, and finally when the receive
        # operation finishes it calls an ioctl to transition the first snapshot
        # (snap1) from RW mode to RO mode, which does another transaction commit
        # and bumps the filesystem's generation to 10.
        $ rm -f /tmp/1.snap
        $ btrfs send /mnt/snap1 -f /tmp/1.snap
        $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.snap
        $ umount /mnt
      
        $ mkfs.btrfs -f /dev/sdd
        $ mount /dev/sdd /mnt
        $ btrfs receive /mnt /tmp/1.snap
        # Receive of snapshot snap2 used to fail.
        $ btrfs receive /mnt /tmp/2.snap

      Signed-off-by: Robbie Ko <robbieko@synology.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      [Rewrote changelog to be more precise and clear]
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iio: adc: xilinx: Fix error handling · 21bf5707
      Christophe JAILLET authored
      [ Upstream commit ca1c39ef ]
      
      Reorder error handling labels in order to match the way resources have
      been allocated.
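
The general pattern can be sketched as follows. This is an illustration, not the driver code: the device struct, the two resources, and the fail_at parameter are hypothetical. Resources are acquired in order, and the error labels release them in the reverse order, so each failure point jumps to the label that undoes exactly what exists so far.

```c
#include <stdlib.h>

/* Hypothetical device with two resources acquired in order: buffer, then trigger. */
struct example_dev {
    void *buffer;
    void *trigger;
};

/*
 * Acquire resources in order and release them in reverse order on failure.
 * 'fail_at' is a hypothetical knob to simulate a failure at step 1, 2 or 3
 * (0 means no failure).
 */
static int example_probe(struct example_dev *dev, int fail_at)
{
    dev->buffer = (fail_at == 1) ? NULL : malloc(16);
    if (dev->buffer == NULL)
        return -1;                 /* nothing acquired yet */

    dev->trigger = (fail_at == 2) ? NULL : malloc(16);
    if (dev->trigger == NULL)
        goto err_free_buffer;      /* only the buffer exists */

    if (fail_at == 3)
        goto err_free_trigger;     /* both exist: undo the newest first */

    return 0;

err_free_trigger:
    free(dev->trigger);
    dev->trigger = NULL;
err_free_buffer:
    free(dev->buffer);
    dev->buffer = NULL;
    return -1;
}
```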

      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value. · 9a504374
      Jarno Rajahalme authored
      [ Upstream commit 4b86c459 ]
      
      Commit 4dee62b1 ("netfilter: nf_ct_expect: nf_ct_expect_insert()
      returns void") inadvertently changed the successful return value of
      nf_ct_expect_related_report() from 0 to 1 due to
      __nf_ct_expect_check() returning 1 on success.  Prevent this
      regression in the future by changing the return value of
      __nf_ct_expect_check() to 0 on success.
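
A simplified sketch of how such a regression can arise, with hypothetical function names rather than the netfilter code. Once the insert step returns void, nothing overwrites the value left by the check, so a check that signals success with 1 leaks that 1 to the caller.

```c
/* Hypothetical check in the old style: 1 means "ok to insert", <= 0 is an error. */
static int example_check_old(int valid)
{
    return valid ? 1 : -1;
}

/* Style described in the commit message: 0 on success, negative on error. */
static int example_check_new(int valid)
{
    return valid ? 0 : -1;
}

/* Insert step that returns no status, so it cannot overwrite 'ret'. */
static void example_insert(int *table_len)
{
    (*table_len)++;
}

/* Caller that forwards the check's value; with the old check, success leaks out as 1. */
static int example_report_old(int valid, int *table_len)
{
    int ret = example_check_old(valid);

    if (ret <= 0)
        return ret;
    example_insert(table_len);
    return ret;
}

/* Same caller with the new check: success is reported as 0. */
static int example_report_new(int valid, int *table_len)
{
    int ret = example_check_new(valid);

    if (ret < 0)
        return ret;
    example_insert(table_len);
    return ret;
}
```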

      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • irqchip/crossbar: Fix incorrect type of local variables · ef042316
      Franck Demathieu authored
      [ Upstream commit b28ace12 ]
      
      The max and entry variables are unsigned according to the dt-bindings.
      Fix the following 3 sparse issues (-Wtypesign):
      
        drivers/irqchip/irq-crossbar.c:222:52: warning: incorrect type in argument 3 (different signedness)
        drivers/irqchip/irq-crossbar.c:222:52:    expected unsigned int [usertype] *out_value
        drivers/irqchip/irq-crossbar.c:222:52:    got int *<noident>
      
        drivers/irqchip/irq-crossbar.c:245:56: warning: incorrect type in argument 4 (different signedness)
        drivers/irqchip/irq-crossbar.c:245:56:    expected unsigned int [usertype] *out_value
        drivers/irqchip/irq-crossbar.c:245:56:    got int *<noident>
      
        drivers/irqchip/irq-crossbar.c:263:56: warning: incorrect type in argument 4 (different signedness)
        drivers/irqchip/irq-crossbar.c:263:56:    expected unsigned int [usertype] *out_value
        drivers/irqchip/irq-crossbar.c:263:56:    got int *<noident>

      Signed-off-by: Franck Demathieu <fdemathieu@gmail.com>
      Cc: marc.zyngier@arm.com
      Cc: jason@lakedaemon.net
      Link: http://lkml.kernel.org/r/20170223094855.6546-1-fdemathieu@gmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watchdog: kempld: fix gcc-4.3 build · 5961adef
      Arnd Bergmann authored
      [ Upstream commit 3736d4eb ]
      
      gcc-4.3 can't decide whether the constant value in
      kempld_prescaler[PRESCALER_21] is a build-time constant or
      not, and gets confused by the logic in do_div():
      
      drivers/watchdog/kempld_wdt.o: In function `kempld_wdt_set_stage_timeout':
      kempld_wdt.c:(.text.kempld_wdt_set_stage_timeout+0x130): undefined reference to `__aeabi_uldivmod'
      
      This adds a call to ACCESS_ONCE() to force it to not consider
      it to be constant, and leaves the more efficient normal case
      in place for modern compilers, using an #ifdef to annotate
      why we do this hack.

      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • locking/lockdep: Add nest_lock integrity test · 98119907
      Peter Zijlstra authored
      [ Upstream commit 7fb4a2ce ]
      
      Boqun reported that hlock->references can overflow. Add a debug test
      for that to generate a clear error when this happens.
      
      Without this, lockdep is likely to report a mysterious failure on
      unlock.
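
A small sketch of an overflow guard on a narrow counter, not the lockdep code. The 12-bit width is an assumption made for illustration, and the struct and function names are hypothetical. The increment is refused once the bit-field is at its maximum, instead of silently wrapping to zero.

```c
#include <stdbool.h>

#define EXAMPLE_REF_BITS 12 /* assumed width, for illustration only */
#define EXAMPLE_REF_MAX  ((1u << EXAMPLE_REF_BITS) - 1)

/* Hypothetical held-lock record with a narrow reference counter. */
struct example_held_lock {
    unsigned int references : EXAMPLE_REF_BITS;
};

/* Increment the counter; refuse and report if it is already at its maximum. */
static bool example_get_reference(struct example_held_lock *hlock)
{
    if (hlock->references == EXAMPLE_REF_MAX)
        return false;
    hlock->references++;
    return true;
}
```

A caller that sees false can print a clear diagnostic at the point of the overflow rather than failing later on unlock.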

      Reported-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "bsg-lib: don't free job in bsg_prepare_job" · 4a8a916d
      Greg Kroah-Hartman authored
      This reverts commit d9100405 which was
      commit f507b54d upstream.
      
      Ben reports:
      	That function doesn't exist here (it was introduced in 4.13).
      	Instead, this backport has modified bsg_create_job(), creating a
      	leak.  Please revert this on the 3.18, 4.4 and 4.9 stable
      	branches.
      
      So I'm dropping it from here.

      Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: Set sk_prot_creator when cloning sockets to the right proto · bc8a5a45
      Christoph Paasch authored
      [ Upstream commit 9d538fa6 ]
      
      sk->sk_prot and sk->sk_prot_creator can differ when the app uses
      IPV6_ADDRFORM (transforming an IPv6-socket to an IPv4-one).
      This is why sk_prot_creator exists: it makes sure that sk_prot_free()
      does the kmem_cache_free() on the right kmem_cache slab.
      
      Now, if such a socket gets transformed back to a listening socket (using
      connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through
      sk_clone_lock() when a new connection comes in. But sk_prot_creator will
      still point to the IPv6 kmem_cache (as everything got copied in
      sk_clone_lock()). When freeing, we will thus put this
      memory back into the IPv6 kmem_cache although it was allocated in the
      IPv4 cache. I have seen memory corruption happening because of this.
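
The problem can be modelled in a few lines of plain C. This is a sketch with hypothetical types and names, not the kernel's socket code: a clone that copies every field inherits the parent's creator pointer, so the creator has to be reset to the protocol the clone was actually allocated for.

```c
#include <string.h>

/* Hypothetical stand-in for struct proto. */
struct example_proto {
    const char *cache_name; /* names the allocator cache this protocol uses */
};

/* Hypothetical stand-in for struct sock. */
struct example_sock {
    const struct example_proto *prot;         /* protocol currently in use */
    const struct example_proto *prot_creator; /* protocol whose cache owns the memory */
};

/*
 * Clone 'parent' into 'child'. The child belongs to parent->prot's cache,
 * so after the bulk copy the creator is reset to parent->prot rather than
 * inherited from parent->prot_creator.
 */
static void example_clone(struct example_sock *child,
                          const struct example_sock *parent)
{
    memcpy(child, parent, sizeof(*child));
    child->prot_creator = parent->prot;
}
```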
      
      With slub-debugging and MEMCG_KMEM enabled this gives the warning
      	"cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP"
      
      A C-program to trigger this:
      
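      #include <arpa/inet.h>
      #include <netinet/in.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <unistd.h>

      int main(void)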
      {
              int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);
              int new_fd, newest_fd, client_fd;
              struct sockaddr_in6 bind_addr;
              struct sockaddr_in bind_addr4, client_addr1, client_addr2;
              struct sockaddr unsp;
              int val;
      
              memset(&bind_addr, 0, sizeof(bind_addr));
              bind_addr.sin6_family = AF_INET6;
              bind_addr.sin6_port = ntohs(42424);
      
              memset(&client_addr1, 0, sizeof(client_addr1));
              client_addr1.sin_family = AF_INET;
              client_addr1.sin_port = ntohs(42424);
              client_addr1.sin_addr.s_addr = inet_addr("127.0.0.1");
      
              memset(&client_addr2, 0, sizeof(client_addr2));
              client_addr2.sin_family = AF_INET;
              client_addr2.sin_port = ntohs(42421);
              client_addr2.sin_addr.s_addr = inet_addr("127.0.0.1");
      
              memset(&unsp, 0, sizeof(unsp));
              unsp.sa_family = AF_UNSPEC;
      
              bind(fd, (struct sockaddr *)&bind_addr, sizeof(bind_addr));
      
              listen(fd, 5);
      
              client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
              connect(client_fd, (struct sockaddr *)&client_addr1, sizeof(client_addr1));
              new_fd = accept(fd, NULL, NULL);
              close(fd);
      
              val = AF_INET;
              setsockopt(new_fd, SOL_IPV6, IPV6_ADDRFORM, &val, sizeof(val));
      
              connect(new_fd, &unsp, sizeof(unsp));
      
              memset(&bind_addr4, 0, sizeof(bind_addr4));
              bind_addr4.sin_family = AF_INET;
              bind_addr4.sin_port = ntohs(42421);
              bind(new_fd, (struct sockaddr *)&bind_addr4, sizeof(bind_addr4));
      
              listen(new_fd, 5);
      
              client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
              connect(client_fd, (struct sockaddr *)&client_addr2, sizeof(client_addr2));
      
              newest_fd = accept(new_fd, NULL, NULL);
              close(new_fd);
      
              close(client_fd);
              close(new_fd);
      }
      
      As far as I can see, this bug has been there since the beginning of the
      git-days.

      Signed-off-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • packet: in packet_do_bind, test fanout with bind_lock held · b0763909
      Willem de Bruijn authored
      [ Upstream commit 4971613c ]
      
      Once a socket has po->fanout set, it remains a member of the group
      until it is destroyed. The prot_hook must be constant and identical
      across sockets in the group.
      
      If fanout_add races with packet_do_bind between the test of po->fanout
      and taking the lock, the bind call may make type or dev inconsistent
      with that of the fanout group.
      
      Hold po->bind_lock when testing po->fanout to avoid this race.
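
A userspace sketch of the locking rule, using a pthread mutex in place of the kernel spinlock. The struct and function names are hypothetical and the error value is a placeholder. The fanout test and the update happen inside the same critical section, so a join that takes the same lock cannot land between them.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical socket state guarded by bind_lock. */
struct example_packet_sock {
    pthread_mutex_t bind_lock;
    bool fanout;   /* set once the socket has joined a fanout group */
    int proto;     /* protocol the hook is bound to */
};

/*
 * Rebind to 'proto'. The fanout test happens with bind_lock held, so a
 * concurrent fanout join that takes the same lock cannot slip in between
 * the test and the update.
 */
static int example_do_bind(struct example_packet_sock *po, int proto)
{
    int ret = 0;

    pthread_mutex_lock(&po->bind_lock);
    if (po->fanout)
        ret = -1;          /* group members must keep their hook unchanged */
    else
        po->proto = proto;
    pthread_mutex_unlock(&po->bind_lock);
    return ret;
}

/* Join a fanout group under the same lock. */
static void example_fanout_add(struct example_packet_sock *po)
{
    pthread_mutex_lock(&po->bind_lock);
    po->fanout = true;
    pthread_mutex_unlock(&po->bind_lock);
}
```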
      
      I had to introduce artificial delay (local_bh_enable) to actually
      observe the race.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0763909
    • Sabrina Dubroca's avatar
      l2tp: fix race condition in l2tp_tunnel_delete · 5eaafe64
      Sabrina Dubroca authored
      [ Upstream commit 62b982ee ]
      
      If we try to delete the same tunnel twice, the first delete operation
      does a lookup (l2tp_tunnel_get), finds the tunnel, calls
      l2tp_tunnel_delete, which queues it for deletion by
      l2tp_tunnel_del_work.
      
      The second delete operation also finds the tunnel and calls
      l2tp_tunnel_delete. If the workqueue has already fired and started
      running l2tp_tunnel_del_work, then l2tp_tunnel_delete will queue the
      same tunnel a second time, and try to free the socket again.
      
      Add a dead flag to prevent firing the workqueue twice. Then we can
      remove the check of queue_work's result that was meant to prevent that
      race but doesn't.
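
As a rough user-space sketch of the dead-flag idea, assuming a C11 atomic_flag in place of whatever primitive the kernel patch uses: only the first caller to set the flag queues the deletion. The struct and function names are hypothetical, and a counter stands in for queue_work().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct l2tp_tunnel; not kernel code. */
struct sketch_tunnel {
    atomic_flag dead; /* set once deletion has been requested */
    int queued;       /* counts how often deletion work was queued */
};

/* Returns true only for the caller that actually queued the deletion. */
static bool sketch_tunnel_delete(struct sketch_tunnel *t)
{
    /* atomic_flag_test_and_set() returns the previous value of the flag. */
    if (atomic_flag_test_and_set(&t->dead))
        return false; /* someone else already requested deletion */

    t->queued++; /* stand-in for queue_work() */
    return true;
}
```

Because the flag is never cleared, a second delete request is rejected regardless of whether the work item has already started running.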
      
      Reproducer:
      
          ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 local 192.168.0.2 remote 192.168.0.1 encap udp udp_sport 5000 udp_dport 6000
          ip l2tp add session name l2tp1 tunnel_id 3000 session_id 1000 peer_session_id 2000
          ip link set l2tp1 up
          ip l2tp del tunnel tunnel_id 3000
          ip l2tp del tunnel tunnel_id 3000
      
      Fixes: f8ccac0e ("l2tp: put tunnel socket release on a workqueue")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Acked-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5eaafe64
    • Ridge Kennedy's avatar
      l2tp: Avoid schedule while atomic in exit_net · 88db37f8
      Ridge Kennedy authored
      [ Upstream commit 12d656af ]
      
      While destroying a network namespace that contains an L2TP tunnel, a
      "BUG: scheduling while atomic" can be observed.
      
      Enabling lockdep shows that this is happening because l2tp_exit_net()
      is calling l2tp_tunnel_closeall() (via l2tp_tunnel_delete()) from
      within an RCU critical section.
      
      l2tp_exit_net() takes rcu_read_lock_bh()
        << list_for_each_entry_rcu() >>
        l2tp_tunnel_delete()
          l2tp_tunnel_closeall()
            __l2tp_session_unhash()
              synchronize_rcu() << Illegal inside RCU critical section >>
      
      BUG: sleeping function called from invalid context
      in_atomic(): 1, irqs_disabled(): 0, pid: 86, name: kworker/u16:2
      INFO: lockdep is turned off.
      CPU: 2 PID: 86 Comm: kworker/u16:2 Tainted: G        W  O    4.4.6-at1 #2
      Hardware name: Xen HVM domU, BIOS 4.6.1-xs125300 05/09/2016
      Workqueue: netns cleanup_net
       0000000000000000 ffff880202417b90 ffffffff812b0013 ffff880202410ac0
       ffffffff81870de8 ffff880202417bb8 ffffffff8107aee8 ffffffff81870de8
       0000000000000c51 0000000000000000 ffff880202417be0 ffffffff8107b024
      Call Trace:
       [<ffffffff812b0013>] dump_stack+0x85/0xc2
       [<ffffffff8107aee8>] ___might_sleep+0x148/0x240
       [<ffffffff8107b024>] __might_sleep+0x44/0x80
       [<ffffffff810b21bd>] synchronize_sched+0x2d/0xe0
       [<ffffffff8109be6d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff8105c7bb>] ? __local_bh_enable_ip+0x6b/0xc0
       [<ffffffff816a1b00>] ? _raw_spin_unlock_bh+0x30/0x40
       [<ffffffff81667482>] __l2tp_session_unhash+0x172/0x220
       [<ffffffff81667397>] ? __l2tp_session_unhash+0x87/0x220
       [<ffffffff8166888b>] l2tp_tunnel_closeall+0x9b/0x140
       [<ffffffff81668c74>] l2tp_tunnel_delete+0x14/0x60
       [<ffffffff81668dd0>] l2tp_exit_net+0x110/0x270
       [<ffffffff81668d5c>] ? l2tp_exit_net+0x9c/0x270
       [<ffffffff815001c3>] ops_exit_list.isra.6+0x33/0x60
       [<ffffffff81501166>] cleanup_net+0x1b6/0x280
       ...
      
      This bug can easily be reproduced with a few steps:
      
       $ sudo unshare -n bash  # Create a shell in a new namespace
       # ip link set lo up
       # ip addr add 127.0.0.1 dev lo
       # ip l2tp add tunnel remote 127.0.0.1 local 127.0.0.1 tunnel_id 1 \
          peer_tunnel_id 1 udp_sport 50000 udp_dport 50000
       # ip l2tp add session name foo tunnel_id 1 session_id 1 \
          peer_session_id 1
       # ip link set foo up
       # exit  # Exit the shell, in turn exiting the namespace
       $ dmesg
       ...
       [942121.089216] BUG: scheduling while atomic: kworker/u16:3/13872/0x00000200
       ...
      
      To fix this, move the call to l2tp_tunnel_closeall() out of the RCU
      critical section, and instead call it from l2tp_tunnel_del_work(), which
      is running from the l2tp_wq workqueue.
      
      Fixes: 2b551c6e ("l2tp: close sessions before initiating tunnel delete")
      Signed-off-by: Ridge Kennedy <ridge.kennedy@alliedtelesis.co.nz>
      Acked-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      88db37f8
    • Alexey Kodanev's avatar
      vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit · 6132b4ff
      Alexey Kodanev authored
      [ Upstream commit 36f6ee22 ]
      
      When running LTP IPsec tests, KASan might report:
      
      BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
      Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0
      ...
      Call Trace:
        <IRQ>
        dump_stack+0x63/0x89
        print_address_description+0x7c/0x290
        kasan_report+0x28d/0x370
        ? vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
        __asan_report_load4_noabort+0x19/0x20
        vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
        ? vti_init_net+0x190/0x190 [ip_vti]
        ? save_stack_trace+0x1b/0x20
        ? save_stack+0x46/0xd0
        dev_hard_start_xmit+0x147/0x510
        ? icmp_echo.part.24+0x1f0/0x210
        __dev_queue_xmit+0x1394/0x1c60
      ...
      Freed by task 0:
        save_stack_trace+0x1b/0x20
        save_stack+0x46/0xd0
        kasan_slab_free+0x70/0xc0
        kmem_cache_free+0x81/0x1e0
        kfree_skbmem+0xb1/0xe0
        kfree_skb+0x75/0x170
        kfree_skb_list+0x3e/0x60
        __dev_queue_xmit+0x1298/0x1c60
        dev_queue_xmit+0x10/0x20
        neigh_resolve_output+0x3a8/0x740
        ip_finish_output2+0x5c0/0xe70
        ip_finish_output+0x4ba/0x680
        ip_output+0x1c1/0x3a0
        xfrm_output_resume+0xc65/0x13d0
        xfrm_output+0x1e4/0x380
        xfrm4_output_finish+0x5c/0x70
      
      This can be fixed by reading skb->len before calling dst_output().
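
The ordering matters because the output path may free the buffer. A minimal user-space sketch of the same idea, with a hypothetical sketch_buf in place of an skb and sketch_output() in place of dst_output():

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; not kernel code. */
struct sketch_buf {
    size_t len;
};

/* Stand-in for dst_output(): consumes the buffer and frees it. */
static int sketch_output(struct sketch_buf *b)
{
    free(b);
    return 0;
}

/* Transmit a buffer and account its length; returns the length sent. */
static size_t sketch_xmit(struct sketch_buf *b, size_t *tx_bytes)
{
    size_t pkt_len = b->len;    /* read while we still own the buffer */
    int err = sketch_output(b); /* b must not be touched after this call */

    if (err == 0)
        *tx_bytes += pkt_len;   /* uses the saved copy, not b->len */
    return pkt_len;
}
```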
      
      Fixes: b9959fd3 ("vti: switch to new ip tunnel code")
      Fixes: 22e1b23d ("vti6: Support inter address family tunneling.")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6132b4ff
    • Meng Xu's avatar
      isdn/i4l: fetch the ppp_write buffer in one shot · ba554fc7
      Meng Xu authored
      [ Upstream commit 02388bf8 ]
      
      In isdn_ppp_write(), the header (i.e., protobuf) of the buffer is
      fetched twice from userspace. The first fetch is used to peek at the
      protocol of the message and reset the huptimer if necessary; while the
      second fetch copies in the whole buffer. However, given that buf resides
      in userspace memory, a user process can race to change its memory content
      across fetches. By doing so, we can either avoid resetting the huptimer
      for any type of packets (by first setting proto to PPP_LCP and later
      change to the actual type) or force resetting the huptimer for LCP
      packets.
      
      This patch changes this double-fetch behavior into two single fetches
      decided by condition (lp->isdn_device < 0 || lp->isdn_channel < 0).
      A more detailed discussion can be found at
      https://marc.info/?l=linux-kernel&m=150586376926123&w=2
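
The general remedy for a double fetch is to copy the data once and make every decision on the private copy. The sketch below illustrates only that principle in user space. The sketch_* names are hypothetical, memcpy() stands in for copy_from_user(), and the header layout (protocol in bytes 2 and 3) is an assumption for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HDR_LEN 4 /* assumed header size for this illustration */

/* Stand-in for copy_from_user(); returns 0 on success. */
static int sketch_copy_in(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return 0;
}

/*
 * Copy the whole message once, then read the protocol field from the
 * private copy. Later changes to ubuf cannot affect the result.
 * Returns the 16-bit protocol value, or -1 on a bad length.
 */
static int sketch_fetch_once(uint8_t *kbuf, size_t kbuf_len,
                             const uint8_t *ubuf, size_t count)
{
    if (count < SKETCH_HDR_LEN || count > kbuf_len)
        return -1;
    if (sketch_copy_in(kbuf, ubuf, count) != 0)
        return -1;
    return (kbuf[2] << 8) | kbuf[3];
}
```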
      
      Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba554fc7
    • Willem de Bruijn's avatar
      packet: hold bind lock when rebinding to fanout hook · e4ffdf9e
      Willem de Bruijn authored
      [ Upstream commit 008ba2a1 ]
      
      Packet socket bind operations must hold the po->bind_lock. This keeps
      po->running consistent with whether the socket is actually on a ptype
      list to receive packets.
      
      fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
      binds the fanout object to receive through packet_rcv_fanout.
      
      Make it hold the po->bind_lock when testing po->running and rebinding.
      Else, it can race with other rebind operations, such as that in
      packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
      can result in a socket being added to a fanout group twice, causing
      use-after-free KASAN bug reports, among others.
      
      Reported independently by both trinity and syzkaller.
      Verified that the syzkaller reproducer passes after this patch.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Reported-by: nixioaming <nixiaoming@huawei.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4ffdf9e
    • Edward Cree's avatar
      bpf/verifier: reject BPF_ALU64|BPF_END · 63692f8b
      Edward Cree authored
      [ Upstream commit e67b8a68 ]
      
      Neither ___bpf_prog_run nor the JITs accept it.
      Also adds a new test case.
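
For illustration, the kind of opcode test involved can be sketched as below. This is not the verifier code. The macros mirror the eBPF opcode encoding (instruction class in the low three bits, operation in the high four bits), and the SK_* and sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode field extraction, mirroring the eBPF encoding. */
#define SK_BPF_CLASS(code) ((code) & 0x07)
#define SK_BPF_OP(code)    ((code) & 0xf0)

#define SK_BPF_ALU   0x04 /* 32-bit ALU class */
#define SK_BPF_ALU64 0x07 /* 64-bit ALU class */
#define SK_BPF_END   0xd0 /* endianness conversion operation */

/*
 * Returns false for an endianness conversion encoded in any class other
 * than the 32-bit ALU class, and true otherwise. Opcodes that are not
 * BPF_END are outside the scope of this check and are passed through.
 */
static bool sketch_end_op_ok(uint8_t code)
{
    if (SK_BPF_OP(code) != SK_BPF_END)
        return true;
    return SK_BPF_CLASS(code) == SK_BPF_ALU;
}
```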
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      63692f8b
    • Dan Carpenter's avatar
      sctp: potential read out of bounds in sctp_ulpevent_type_enabled() · db273979
      Dan Carpenter authored
      [ Upstream commit fa5f7b51 ]
      
      This code causes a static checker warning because Smatch doesn't trust
      anything that comes from skb->data.  I've reviewed this code and I do
      think skb->data can be controlled by the user here.
      
      The sctp_event_subscribe struct has 13 __u8 fields and we want to see
      if ours is non-zero.  sn_type can be any value in the 0-USHRT_MAX range.
      We're subtracting SCTP_SN_TYPE_BASE which is 1 << 15 so we could read
      either before the start of the struct or after the end.
      
      This is a very old bug and it's surprising that it would go undetected
      for so long but my theory is that it just doesn't have a big impact so
      it would be hard to notice.
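
A bounds check on both sides of the offset closes the hole described above. The following is a minimal sketch rather than the actual patch: the struct, constants, and function are hypothetical stand-ins, and treating an out-of-range type as "not enabled" is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SN_TYPE_BASE (1 << 15) /* same value as SCTP_SN_TYPE_BASE */
#define SKETCH_NUM_EVENTS   13        /* number of __u8 fields in the struct */

/* Hypothetical stand-in for struct sctp_event_subscribe. */
struct sketch_subscribe {
    uint8_t on[SKETCH_NUM_EVENTS];
};

/* Returns true if the event type is in range and its byte is non-zero. */
static bool sketch_type_enabled(uint16_t sn_type,
                                const struct sketch_subscribe *mask)
{
    /* Widen before subtracting so values below the base go negative. */
    int offset = (int)sn_type - SKETCH_SN_TYPE_BASE;

    if (offset < 0 || offset >= SKETCH_NUM_EVENTS)
        return false; /* would read before or after the struct */
    return mask->on[offset] != 0;
}
```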
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      db273979
    • Jan Kara's avatar
      ext4: avoid deadlock when expanding inode size · 15121294
      Jan Kara authored
      [ Upstream commit 2e81a4ee ]
      
      When we need to move xattrs into external xattr block, we call
      ext4_xattr_block_set() from ext4_expand_extra_isize_ea(). That may end
      up calling ext4_mark_inode_dirty() again which will recurse back into
      the inode expansion code leading to deadlocks.
      
      Protect from recursion using EXT4_STATE_NO_EXPAND inode flag and move
      its management into ext4_expand_extra_isize_ea() since its manipulation
      is safe there (due to xattr_sem) from possible races with
      ext4_xattr_set_handle() which plays with it as well.
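
The recursion guard can be pictured with a small user-space sketch. It is only an illustration of the flag pattern: the struct and the sketch_* functions are hypothetical, a plain bool stands in for the EXT4_STATE_NO_EXPAND inode state, and the locking that makes the flag safe in ext4 (xattr_sem) is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an inode; not ext4 code. */
struct sketch_inode {
    bool no_expand; /* plays the role of EXT4_STATE_NO_EXPAND */
    int expansions; /* counts how often the expansion body ran */
};

static void sketch_mark_dirty(struct sketch_inode *inode);

/* Expand the inode; re-entry while an expansion is running is a no-op. */
static void sketch_expand(struct sketch_inode *inode)
{
    if (inode->no_expand)
        return; /* already expanding: do not recurse */

    inode->no_expand = true;
    inode->expansions++;
    sketch_mark_dirty(inode); /* may call back into sketch_expand() */
    inode->no_expand = false;
}

/* Marking the inode dirty tries to expand it, as in the scenario above. */
static void sketch_mark_dirty(struct sketch_inode *inode)
{
    sketch_expand(inode);
}
```

Without the flag test at the top of sketch_expand(), the two functions would call each other without end.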
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      15121294
    • Harry Wentland's avatar
      drm/dp/mst: save vcpi with payloads · 9b7e3d75
      Harry Wentland authored
      commit 6cecdf7a upstream.
      
      This makes it possible for drivers to find the associated
      mst_port by looking at the payload allocation table.
      Signed-off-by: Harry Wentland <harry.wentland@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1449514552-10236-3-git-send-email-harry.wentland@amd.com
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Kai Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b7e3d75
    • Sebastian Andrzej Siewior's avatar
      x86/mm: Disable preemption during CR3 read+write · 608c3080
      Sebastian Andrzej Siewior authored
      commit 5cf0791d upstream.
      
      There's a subtle preemption race on UP kernels:
      
      Usually current->mm (and therefore mm->pgd) stays the same during the
      lifetime of a task so it does not matter if a task gets preempted during
      the read and write of the CR3.
      
      But then, there is this scenario on x86-UP:
      
      TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:
      
       -> mmput()
       -> exit_mmap()
       -> tlb_finish_mmu()
       -> tlb_flush_mmu()
       -> tlb_flush_mmu_tlbonly()
       -> tlb_flush()
       -> flush_tlb_mm_range()
       -> __flush_tlb_up()
       -> __flush_tlb()
       ->  __native_flush_tlb()
      
      At this point current->mm is NULL but current->active_mm still points to
      the "old" mm.
      
      Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
      own mm so CR3 has changed.
      
      Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
      mm and so CR3 remains unchanged. Once taskA gets active it continues
      where it was interrupted and that means it writes its old CR3 value
      back. Everything is fine because userland won't need its memory
      anymore.
      
      Now the fun part:
      
      Let's preempt taskA one more time and get back to taskB. This
      time switch_mm() won't do a thing because oldmm (->active_mm)
      is the same as mm (as per context_switch()). So we remain
      with a bad CR3 / PGD and return to userland.
      
      The next thing that happens is handle_mm_fault() with an address for
      the execution of its code in userland. handle_mm_fault() realizes that
      it has a PTE with proper rights so it returns doing nothing. But the
      CPU looks at the wrong PGD and insists that something is wrong and
      faults again. And again. And one more time…
      
      This pagefault circle continues until the scheduler gets tired of it and
      puts another task on the CPU. It gets a little difficult if the task is a
      RT task with a high priority. The system will either freeze or it gets
      fixed by the software watchdog thread which usually runs at RT-max prio.
      But waiting for the watchdog will increase the latency of the RT task
      which is no good.
      
      Fix this by disabling preemption across the critical code section.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
      
      
      [ Prettified the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Bernhard Kaindl <bernhard.kaindl@thalesgroup.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      608c3080
  2. 18 Oct, 2017 12 commits