1. 06 Jun, 2022 14 commits
    • assoc_array: Fix BUG_ON during garbage collect · 33c6a5ee
      Stephen Brennan authored
      commit d1dc8776 upstream.
      
      A rare BUG_ON triggered in assoc_array_gc:
      
          [3430308.818153] kernel BUG at lib/assoc_array.c:1609!
      
      Which corresponded to the statement currently at line 1593 upstream:
      
          BUG_ON(assoc_array_ptr_is_meta(p));
      
      Using the data from the core dump, I was able to generate a userspace
      reproducer[1] and determine the cause of the bug.
      
      [1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc
      
      After running the iterator on the entire branch, an internal tree node
      looked like the following:
      
          NODE (nr_leaves_on_branch: 3)
            SLOT [0] NODE (2 leaves)
            SLOT [1] NODE (1 leaf)
            SLOT [2..f] NODE (empty)
      
      In the userspace reproducer, the pr_devel output when compressing this
      node was:
      
          -- compress node 0x5607cc089380 --
          free=0, leaves=0
          [0] retain node 2/1 [nx 0]
          [1] fold node 1/1 [nx 0]
          [2] fold node 0/1 [nx 2]
          [3] fold node 0/2 [nx 2]
          [4] fold node 0/3 [nx 2]
          [5] fold node 0/4 [nx 2]
          [6] fold node 0/5 [nx 2]
          [7] fold node 0/6 [nx 2]
          [8] fold node 0/7 [nx 2]
          [9] fold node 0/8 [nx 2]
          [10] fold node 0/9 [nx 2]
          [11] fold node 0/10 [nx 2]
          [12] fold node 0/11 [nx 2]
          [13] fold node 0/12 [nx 2]
          [14] fold node 0/13 [nx 2]
          [15] fold node 0/14 [nx 2]
          after: 3
      
      At slot 0, an internal node with 2 leaves could not be folded into the
      node, because there was only one available slot (slot 0). Thus, the
      internal node was retained. At slot 1, the node had one leaf, and was
      able to be folded in successfully. The remaining nodes had no leaves,
      and so were removed. By the end of the compression stage, there were 14
      free slots, and only 3 leaf nodes. The tree was ascended and then its
      parent node was compressed. When this node was seen, it could not be
      folded, due to the internal node it contained.
      
      The invariant for compression in this function is: whenever
      nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
      leaf nodes. The compression step currently cannot guarantee this, given
      the corner case shown above.
      
      To fix this issue, retry compression whenever we have retained a node,
      and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
      compression will then allow the node in slot 1 to be folded in,
      satisfying the invariant. Below is the output of the reproducer once the
      fix is applied:
      
          -- compress node 0x560e9c562380 --
          free=0, leaves=0
          [0] retain node 2/1 [nx 0]
          [1] fold node 1/1 [nx 0]
          [2] fold node 0/1 [nx 2]
          [3] fold node 0/2 [nx 2]
          [4] fold node 0/3 [nx 2]
          [5] fold node 0/4 [nx 2]
          [6] fold node 0/5 [nx 2]
          [7] fold node 0/6 [nx 2]
          [8] fold node 0/7 [nx 2]
          [9] fold node 0/8 [nx 2]
          [10] fold node 0/9 [nx 2]
          [11] fold node 0/10 [nx 2]
          [12] fold node 0/11 [nx 2]
          [13] fold node 0/12 [nx 2]
          [14] fold node 0/13 [nx 2]
          [15] fold node 0/14 [nx 2]
          internal nodes remain despite enough space, retrying
          -- compress node 0x560e9c562380 --
          free=14, leaves=1
          [0] fold node 2/15 [nx 0]
          after: 3
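
The retry described above can be sketched in userspace as follows. This is a simplified model, not the code in lib/assoc_array.c: the `slot` struct, `compress_once()` and `compress()` are hypothetical names, child nodes are reduced to a leaf count, and shortcuts are ignored. It only illustrates why a second pass lets the retained node in slot 0 fold in.

```c
#include <assert.h>
#include <stdbool.h>

#define FAN_OUT 16 /* stands in for ASSOC_ARRAY_FAN_OUT */

/* Hypothetical simplified slot: empty, a leaf, or a child node that
 * holds only leaves (represented by their count). */
enum slot_kind { SLOT_EMPTY, SLOT_LEAF, SLOT_NODE };
struct slot { enum slot_kind kind; int child_leaves; };

/* One compression pass. Returns true if any child node was retained. */
static bool compress_once(struct slot *slots)
{
    int nr_free = 0, next_slot = 0;
    bool retained = false;

    for (int i = 0; i < FAN_OUT; i++)
        if (slots[i].kind == SLOT_EMPTY)
            nr_free++;

    for (int i = 0; i < FAN_OUT; i++) {
        if (slots[i].kind != SLOT_NODE)
            continue;
        int n = slots[i].child_leaves;
        if (n <= nr_free + 1) {
            /* Fold: the child's own slot becomes free, then its
             * leaves are moved into the lowest free slots. */
            slots[i].kind = SLOT_EMPTY;
            slots[i].child_leaves = 0;
            nr_free++;
            if (i < next_slot)
                next_slot = i;
            while (n-- > 0) {
                while (slots[next_slot].kind != SLOT_EMPTY)
                    next_slot++;
                assert(next_slot < FAN_OUT);
                slots[next_slot++].kind = SLOT_LEAF;
                nr_free--;
            }
        } else {
            retained = true;
        }
    }
    return retained;
}

/* Retry while a node was retained although the whole branch would fit.
 * Returns the number of passes that were needed. */
static int compress(struct slot *slots, int nr_leaves_on_branch)
{
    int passes = 0;
    bool retained;

    do {
        retained = compress_once(slots);
        passes++;
    } while (retained && nr_leaves_on_branch <= FAN_OUT);
    return passes;
}
```

Fed the node from the dump above (slot 0 with 2 leaves, slot 1 with 1 leaf, the rest empty child nodes), the first pass retains slot 0 because only one slot is free at that point; the second pass starts with 14 free slots and folds it.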
      
      Changes
      =======
      DH:
       - Use false instead of 0.
       - Reorder the inserted lines in a couple of places to put retained before
         next_slot.
      
      ver #2)
       - Fix typo in pr_devel, correct comparison to "<="
      
      Fixes: 3cb98950 ("Add a generic associative array implementation.")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Andrew Morton <akpm@linux-foundation.org>
      cc: keyrings@vger.kernel.org
      Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.com/ # v1
      Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.com/ # v2
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • i2c: ismt: prevent memory corruption in ismt_access() · fc2f9ee7
      Dan Carpenter authored
      commit 690b2549 upstream.
      
      The "data->block[0]" variable comes from the user and is a number
      between 0-255.  It needs to be capped to prevent writing beyond the end
      of dma_buffer[].
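
A minimal userspace sketch of the kind of bounds check described above. It is not the driver code: `copy_block_to_dma()`, `BLOCK_MAX` and the buffer handling are hypothetical, with `BLOCK_MAX` assumed to mirror the kernel's I2C_SMBUS_BLOCK_MAX (32).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_MAX 32 /* assumed to mirror I2C_SMBUS_BLOCK_MAX */

/* Copy an SMBus-style block into a fixed-size DMA buffer.
 * block[0] is the caller-supplied length (0-255) and is untrusted,
 * so it is validated before it is used as a copy size. */
static int copy_block_to_dma(uint8_t *dma_buffer, const uint8_t *block)
{
    assert(dma_buffer && block);

    uint8_t len = block[0];
    if (len < 1 || len > BLOCK_MAX)
        return -EINVAL;

    memcpy(dma_buffer, &block[1], len);
    return len;
}
```

With this check a length byte of 200 is rejected with -EINVAL instead of copying 200 bytes into a buffer sized for 32.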
      
      Fixes: 5e9a97b1 ("i2c: ismt: Adding support for I2C_SMBUS_BLOCK_PROC_CALL")
      Reported-and-tested-by: Zheyu Ma <zheyuma97@gmail.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nf_tables: disallow non-stateful expression in sets earlier · d8db0465
      Pablo Neira Ayuso authored
      commit 52077804 upstream.
      
      Since 3e135cd4 ("netfilter: nft_dynset: dynamic stateful expression
      instantiation"), it is possible to attach stateful expressions to set
      elements.
      
      cd5125d8 ("netfilter: nf_tables: split set destruction in deactivate
      and destroy phase") introduces conditional destruction on the object to
      accommodate transaction semantics.
      
      nft_expr_init() calls expr->ops->init() first, then checks for
      NFT_STATEFUL_EXPR. This still allows initializing a non-stateful
      lookup expression which points to a set, which might lead to UAF since
      the set is not properly detached from the set->binding for this case.
      Anyway, this combination is nonsense from the nf_tables perspective.
      
      This patch fixes this problem by checking for NFT_STATEFUL_EXPR before
      expr->ops->init() is called.
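
The ordering change can be illustrated with a small userspace sketch. The types and names here (`expr_ops_sketch`, `EXPR_STATEFUL`, `set_expr_init_sketch()`, `demo_init()`) are hypothetical stand-ins, not the nf_tables structures; the point is only that the flag is tested before any init callback can run and create state that would later need undoing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EXPR_STATEFUL 0x1u /* stands in for NFT_STATEFUL_EXPR */

struct expr_type_sketch {
    unsigned int flags;
};

struct expr_ops_sketch {
    const struct expr_type_sketch *type;
    int (*init)(void *ctx);
};

/* Demo init callback that records how often it ran. */
static int init_calls;
static int demo_init(void *ctx)
{
    (void)ctx;
    init_calls++;
    return 0;
}

/* Reject non-stateful expressions up front, so init() never runs
 * (and never binds to anything) for an expression that is not allowed. */
static int set_expr_init_sketch(const struct expr_ops_sketch *ops, void *ctx)
{
    assert(ops && ops->type && ops->init);

    if (!(ops->type->flags & EXPR_STATEFUL))
        return -EOPNOTSUPP;

    return ops->init(ctx);
}
```

Checking first means there is nothing to tear down on the rejection path, which is the property the patch relies on.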
      
      The reporter provides a KASAN splat and a poc reproducer (similar to
      those autogenerated by syzbot to report use-after-free errors). It is
      unknown to me if they are using syzbot or if they use a similar automated
      tool to locate the bug that they are reporting.
      
      For the record, this is the KASAN splat.
      
      [   85.431824] ==================================================================
      [   85.432901] BUG: KASAN: use-after-free in nf_tables_bind_set+0x81b/0xa20
      [   85.433825] Write of size 8 at addr ffff8880286f0e98 by task poc/776
      [   85.434756]
      [   85.434999] CPU: 1 PID: 776 Comm: poc Tainted: G        W         5.18.0+ #2
      [   85.436023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
      
      Fixes: 0b2d8a7b ("netfilter: nf_tables: add helper functions for expression handling")
      Reported-and-tested-by: Aaron Adams <edg-e@nccgroup.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers · 9fa0d64f
      Piyush Malgujar authored
      [ Upstream commit 03a35bc8 ]
      
      Due to i2c->adap.dev.fwnode not being set, ACPI_COMPANION() wasn't properly
      found for TWSI controllers.
      Signed-off-by: Szymon Balcerak <sbalcerak@marvell.com>
      Signed-off-by: Piyush Malgujar <pmalgujar@marvell.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging · 9cfcf2ce
      Mika Westerberg authored
      [ Upstream commit 17a0f3ac ]
      
      Before sending an MSI the hardware writes information pertinent to the
      interrupt cause to a memory location pointed to by the SMTICL register. This
      memory holds three double words where the least significant bit tells
      whether the interrupt cause of master/target/error is valid. The driver
      does not use this but we need to set it up because otherwise it will
      perform DMA write to the default address (0) and this will cause an
      IOMMU fault such as below:
      
        DMAR: DRHD: handling fault status reg 2
        DMAR: [DMA Write] Request device [00:12.0] PASID ffffffff fault addr 0
              [fault reason 05] PTE Write access is not set
      
      To prevent this from happening, provide a proper DMA buffer for this
      that then gets mapped by the IOMMU accordingly.
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: ftgmac100: Disable hardware checksum on AST2600 · 07333169
      Joel Stanley authored
      [ Upstream commit 6fd45e79 ]
      
      The AST2600 when using the i210 NIC over NC-SI has been observed to
      produce incorrect checksum results with specific MTU values. This was
      first observed when sending data across a long distance set of networks.
      
      On a local network, the following test was performed using a 1MB file of
      random data.
      
      On the receiver run this script:
      
       #!/bin/bash
       while [ 1 ]; do
              # Zero the stats
              nstat -r  > /dev/null
              nc -l 9899 > test-file
              # Check for checksum errors
              TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
              if [ -z "$TcpInCsumErrors" ]; then
                      echo No TcpInCsumErrors
              else
                      echo TcpInCsumErrors = $TcpInCsumErrors
              fi
       done
      
      On an AST2600 system:
      
       # nc <IP of  receiver host> 9899 < test-file
      
      The test was repeated with various MTU values:
      
       # ip link set mtu 1410 dev eth0
      
      The observed results:
      
       1500 - good
       1434 - bad
       1400 - good
       1410 - bad
       1420 - good
      
      The test was repeated after disabling tx checksumming:
      
       # ethtool -K eth0 tx-checksumming off
      
      And all MTU values tested resulted in transfers without error.
      
      An issue with the driver cannot be ruled out, however there has been no
      bug discovered so far.
      
      David has done the work to take the original bug report of slow data
      transfer between long distance connections and triaged it down to this
      test case.
      
      The vendor suspects this is a hardware issue when using NC-SI. The
      fixes line refers to the patch that introduced AST2600 support.
      Reported-by: David Wilder <wilder@us.ibm.com>
      Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nfc: pn533: Fix buggy cleanup order · 27836e86
      Lin Ma authored
      [ Upstream commit b8cedb70 ]
      
      When removing the pn533 device (i2c or USB), there is a logic error. The
      original code first cancels the worker (flush_delayed_work) and then
      destroys the workqueue (destroy_workqueue), leaving the timer the last
      one to be deleted (del_timer). This results in a possible race condition
      in a multi-core preemptible kernel. That is, if the cleanup
      (pn53x_common_clean) is concurrently run with the timer handler
      (pn533_listen_mode_timer), the timer can queue the poll_work to the
      already destroyed workqueue, causing use-after-free.
      
      This patch reorders the cleanup: it uses del_timer_sync to make sure
      the handler is finished before the routine destroys the workqueue.
      Note that the timer cannot be activated by the worker again.
      
      static void pn533_wq_poll(struct work_struct *work)
      ...
       rc = pn533_send_poll_frame(dev);
       if (rc)
         return;
      
       if (cur_mod->len == 0 && dev->poll_mod_count > 1)
         mod_timer(&dev->listen_timer, ...);
      
      That is, the mod_timer can be called only when pn533_send_poll_frame()
      returns no error, which is impossible because the device is detaching
      and the lower driver should return ENODEV code.
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: af_key: check encryption module availability consistency · 888854fc
      Thomas Bartschies authored
      [ Upstream commit 015c44d7 ]
      
      Since the recent introduction supporting the SM3 and SM4 hash algos for IPsec, the kernel
      produces invalid pfkey acquire messages, when these encryption modules are disabled. This
      happens because the availability of the algos wasn't checked in all necessary functions.
      This patch adds these checks.
      Signed-off-by: Thomas Bartschies <thomas.bartschies@cvk.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • percpu_ref_init(): clean ->percpu_count_ref on failure · 683a786b
      Al Viro authored
      [ Upstream commit a9171431 ]
      
      That way percpu_ref_exit() is safe after failing percpu_ref_init().
      At least one user (cgroup_create()) had a double-free that way;
      there might be other similar bugs.  Easier to fix in percpu_ref_init(),
      rather than playing whack-a-mole in sloppy users...
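
A userspace sketch of the pattern, under stated assumptions: `ref_sketch`, `ref_init_sketch()` and `ref_exit_sketch()` are hypothetical names, and plain `calloc`/`malloc` stand in for the per-CPU and data allocations. The point is that the init failure path clears the pointer it has just freed, so a later exit call cannot free it again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct ref_sketch {
    unsigned long *counters; /* stands in for ->percpu_count_ref */
    void *data;
};

/* fail_data is a test hook that simulates the second allocation failing. */
static int ref_init_sketch(struct ref_sketch *r, bool fail_data)
{
    assert(r);

    r->counters = calloc(4, sizeof(*r->counters));
    if (!r->counters)
        return -ENOMEM;

    r->data = fail_data ? NULL : malloc(64);
    if (!r->data) {
        free(r->counters);
        r->counters = NULL; /* the fix: no stale pointer is left behind */
        return -ENOMEM;
    }
    return 0;
}

/* Safe to call after a failed init because both pointers are NULL then. */
static void ref_exit_sketch(struct ref_sketch *r)
{
    free(r->counters);
    r->counters = NULL;
    free(r->data);
    r->data = NULL;
}
```

Without the `r->counters = NULL` line on the failure path, calling the exit function after a failed init would free the same allocation twice.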
      
      Usual symptoms look like messed-up refcounting in one of the subsystems
      that use percpu allocations (might be percpu-refcount, might be
      something else).  Having refcounts for two different objects share
      memory is Not Nice(tm)...
      
      Reported-by: syzbot+5b1e53987f858500ec00@syzkaller.appspotmail.com
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • KVM: arm64: Don't hypercall before EL2 init · dc8ae359
      Quentin Perret authored
      [ Upstream commit 2e403167 ]
      
      Will reported the following splat when running with Protected KVM
      enabled:
      
      [    2.427181] ------------[ cut here ]------------
      [    2.427668] WARNING: CPU: 3 PID: 1 at arch/arm64/kvm/mmu.c:489 __create_hyp_private_mapping+0x118/0x1ac
      [    2.428424] Modules linked in:
      [    2.429040] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc2-00084-g8635adc4efc7 #1
      [    2.429589] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
      [    2.430286] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [    2.430734] pc : __create_hyp_private_mapping+0x118/0x1ac
      [    2.431091] lr : create_hyp_exec_mappings+0x40/0x80
      [    2.431377] sp : ffff80000803baf0
      [    2.431597] x29: ffff80000803bb00 x28: 0000000000000000 x27: 0000000000000000
      [    2.432156] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
      [    2.432561] x23: ffffcd96c343b000 x22: 0000000000000000 x21: ffff80000803bb40
      [    2.433004] x20: 0000000000000004 x19: 0000000000001800 x18: 0000000000000000
      [    2.433343] x17: 0003e68cf7efdd70 x16: 0000000000000004 x15: fffffc81f602a2c8
      [    2.434053] x14: ffffdf8380000000 x13: ffffcd9573200000 x12: ffffcd96c343b000
      [    2.434401] x11: 0000000000000004 x10: ffffcd96c1738000 x9 : 0000000000000004
      [    2.434812] x8 : ffff80000803bb40 x7 : 7f7f7f7f7f7f7f7f x6 : 544f422effff306b
      [    2.435136] x5 : 000000008020001e x4 : ffff207d80a88c00 x3 : 0000000000000005
      [    2.435480] x2 : 0000000000001800 x1 : 000000014f4ab800 x0 : 000000000badca11
      [    2.436149] Call trace:
      [    2.436600]  __create_hyp_private_mapping+0x118/0x1ac
      [    2.437576]  create_hyp_exec_mappings+0x40/0x80
      [    2.438180]  kvm_init_vector_slots+0x180/0x194
      [    2.458941]  kvm_arch_init+0x80/0x274
      [    2.459220]  kvm_init+0x48/0x354
      [    2.459416]  arm_init+0x20/0x2c
      [    2.459601]  do_one_initcall+0xbc/0x238
      [    2.459809]  do_initcall_level+0x94/0xb4
      [    2.460043]  do_initcalls+0x54/0x94
      [    2.460228]  do_basic_setup+0x1c/0x28
      [    2.460407]  kernel_init_freeable+0x110/0x178
      [    2.460610]  kernel_init+0x20/0x1a0
      [    2.460817]  ret_from_fork+0x10/0x20
      [    2.461274] ---[ end trace 0000000000000000 ]---
      
      Indeed, the Protected KVM mode promotes __create_hyp_private_mapping()
      to a hypercall as EL1 no longer has access to the hypervisor's stage-1
      page-table. However, the call from kvm_init_vector_slots() happens after
      pKVM has been initialized on the primary CPU, but before it has been
      initialized on secondaries. As such, if the KVM initcall procedure is
      migrated from one CPU to another in this window, the hypercall may end up
      running on a CPU for which EL2 has not been initialized.
      
      Fortunately, the pKVM hypervisor doesn't rely on the host to re-map the
      vectors in the private range, so the hypercall in question is in fact
      superfluous. Skip it when pKVM is enabled.
      Reported-by: Will Deacon <will@kernel.org>
      Signed-off-by: Quentin Perret <qperret@google.com>
      [maz: simplified the checks slightly]
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220513092607.35233-1-qperret@google.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • pinctrl: sunxi: fix f1c100s uart2 function · 30ad11d0
      IotaHydrae authored
      [ Upstream commit fa8785e5 ]
      
      Change suniv f1c100s pinctrl, PD14 multiplexing function lvds1 to uart2.
      
      When pins PD13 and PD14 are set up for the uart2 function in the dts,
      the following error occurs:
      1c20800.pinctrl: unsupported function uart2 on pin PD14
      
      This is because 'uart2' is not one of the multiplexing options of PD14,
      so pinctrl doesn't know how to configure it.
      
      So change the pin PD14 lvds1 function to uart2.
      Signed-off-by: IotaHydrae <writeforever@foxmail.com>
      Reviewed-by: Andre Przywara <andre.przywara@arm.com>
      Link: https://lore.kernel.org/r/tencent_70C1308DDA794C81CAEF389049055BACEC09@qq.com
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ALSA: hda/realtek: Add quirk for the Framework Laptop · 7e5a4f00
      Dustin L. Howett authored
      [ Upstream commit 309d7363 ]
      
      Some board revisions of the Framework Laptop have an ALC295 with a
      disconnected or faulty headset mic presence detect.
      
      The "dell-headset-multi" fixup addresses this issue, but also enables an
      inoperative "Headphone Mic" input device whenever a headset is
      connected.
      
      Adding a new quirk chain specific to the Framework Laptop resolves this
      issue. The one introduced here is based on the System76 "no headphone
      mic" quirk chain.
      
      The VID:PID f111:0001 has been allocated to Framework Computer for this
      board revision.
      
      Revision history:
      - v2: Moved to a custom quirk chain to suppress the "Headphone Mic"
        pincfg.
      Signed-off-by: Dustin L. Howett <dustin@howett.net>
      Link: https://lore.kernel.org/r/20220511010759.3554-1-dustin@howett.net
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ALSA: hda/realtek: Add quirk for Dell Latitude 7520 · 118dc796
      Gabriele Mazzotta authored
      [ Upstream commit 1efcdd9c ]
      
      The driver is currently using ALC269_FIXUP_DELL4_MIC_NO_PRESENCE for
      the Latitude 7520, but this fixup chain has some issues:
      
       - The internal mic is really loud and the recorded audio is distorted
         at "standard" audio levels.
      
       - There are pop noises at system startup and when plugging/unplugging
         headphone jacks.
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215885
      
      Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
      Link: https://lore.kernel.org/r/20220501124237.4667-1-gabriele.mzt@gmail.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ALSA: usb-audio: Don't get sample rate for MCT Trigger 5 USB-to-HDMI · 0c6ba757
      Forest Crossman authored
      [ Upstream commit d7be2138 ]
      
      This device doesn't support reading the sample rate, so we need to apply
      this quirk to avoid a 15-second delay waiting for three timeouts.
      Signed-off-by: Forest Crossman <cyrozap@gmail.com>
      Link: https://lore.kernel.org/r/20220504002444.114011-2-cyrozap@gmail.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  2. 30 May, 2022 26 commits
    • Greg Kroah-Hartman's avatar
    • ALSA: ctxfi: Add SB046x PCI ID · d3bbcba9
      Edward Matijevic authored
      commit 1b073ebb upstream.
      
      Adds the PCI ID for X-Fi cards sold under the Platinum and XtremeMusic names.
      
      Before: snd_ctxfi 0000:05:05.0: chip 20K1 model Unknown (1102:0021) is found
      After: snd_ctxfi 0000:05:05.0: chip 20K1 model SB046x (1102:0021) is found
      
      [ This is only about defining the model name string, and the rest is
        handled just like before, as a default unknown device.
        Edward confirmed that the stuff has been working fine -- tiwai ]
      Signed-off-by: Edward Matijevic <motolav@gmail.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/cae7d1a4-8bd9-7dfe-7427-db7e766f7272@gmail.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPI: sysfs: Fix BERT error region memory mapping · 567ae03f
      Lorenzo Pieralisi authored
      commit 1bbc2178 upstream.
      
      Currently the sysfs interface maps the BERT error region as "memory"
      (through acpi_os_map_memory()) in order to copy the error records into
      memory buffers through memory operations (eg memory_read_from_buffer()).
      
      The OS cannot detect whether the BERT error region is part of
      system RAM or is "device memory" (eg BMC memory), and therefore it
      cannot detect which memory attributes the bus to that memory supports
      (and the corresponding kernel mapping), unless firmware provides the
      required information.
      
      The acpi_os_map_memory() arch backend implementation determines the
      mapping attributes. On arm64, if the BERT error region is not present in
      the EFI memory map, the error region is mapped as device-nGnRnE; this
      triggers alignment faults since memcpy unaligned accesses are not
      allowed in device-nGnRnE regions.
      
      The ACPI sysfs code cannot therefore map by default the BERT error
      region with memory semantics but should use a safer default.
      
      Change the sysfs code to map the BERT error region as MMIO (through
      acpi_os_map_iomem()) and use the memcpy_fromio() interface to read the
      error region into the kernel buffer.
      
      Link: https://lore.kernel.org/linux-arm-kernel/31ffe8fc-f5ee-2858-26c5-0fd8bdd68702@arm.com
      Link: https://lore.kernel.org/linux-acpi/CAJZ5v0g+OVbhuUUDrLUCfX_mVqY_e8ubgLTU98=jfjTeb4t+Pw@mail.gmail.com
      
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Tested-by: Veronika Kabatova <vkabatov@redhat.com>
      Tested-by: Aristeu Rozanski <aris@redhat.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: dann frazier <dann.frazier@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • random: check for signals after page of pool writes · 3879d3f9
      Jason A. Donenfeld authored
      commit 1ce6c8d6 upstream.
      
      get_random_bytes_user() checks for signals after producing a PAGE_SIZE
      worth of output, just like /dev/zero does. write_pool() is doing
      basically the same work (actually, slightly more expensive), and so
      should stop to check for signals in the same way. Let's also name it
      write_pool_user() to match get_random_bytes_user(), so this won't be
      misused in the future.
      
      Before this patch, massive writes to /dev/urandom would tie up the
      process for an extremely long time and make it unterminatable. After, it
      can be successfully interrupted. The following test program can be used
      to see this works as intended:
      
        #include <unistd.h>
        #include <fcntl.h>
        #include <signal.h>
        #include <stdio.h>
      
        static unsigned char x[~0U];
      
        static void handle(int sig) { (void)sig; }
      
        int main(int argc, char *argv[])
        {
          pid_t pid = getpid(), child;
          int fd;
          signal(SIGUSR1, handle);
          if (!(child = fork())) {
            for (;;)
              kill(pid, SIGUSR1);
          }
          fd = open("/dev/urandom", O_WRONLY);
          pause();
          printf("interrupted after writing %zd bytes\n", write(fd, x, sizeof(x)));
          close(fd);
          kill(child, SIGTERM);
          return 0;
        }
      
      Result before: "interrupted after writing 2147479552 bytes"
      Result after: "interrupted after writing 4096 bytes"
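
The behaviour change can be modelled in userspace roughly as follows. This is a sketch under assumptions, not the kernel's write_pool_user(): `write_pool_sketch()`, `mix_sketch()` and `signal_pending_flag` are hypothetical, a plain flag stands in for signal_pending(current), and an XOR fold stands in for the real mixing function.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define PAGE_SIZE_SKETCH  4096 /* stands in for PAGE_SIZE */
#define BLOCK_SIZE_SKETCH 64   /* per-iteration copy size in this sketch */

/* Stand-in for signal_pending(current). */
static volatile sig_atomic_t signal_pending_flag;

/* Placeholder for the real mixing function: folds bytes into one value. */
static unsigned char pool_state;
static void mix_sketch(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pool_state ^= p[i];
}

/* Consume buf block by block; after every full page of input, stop
 * early if a signal is pending. Returns the number of bytes consumed. */
static size_t write_pool_sketch(const unsigned char *buf, size_t len)
{
    size_t done = 0;

    assert(PAGE_SIZE_SKETCH % BLOCK_SIZE_SKETCH == 0);

    while (done < len) {
        size_t n = len - done;
        if (n > BLOCK_SIZE_SKETCH)
            n = BLOCK_SIZE_SKETCH;

        mix_sketch(buf + done, n);
        done += n;

        if (done < len && done % PAGE_SIZE_SKETCH == 0 && signal_pending_flag)
            break;
    }
    return done;
}
```

With the flag set, a 10000-byte write stops after the first 4096 bytes, which mirrors the "Result after" line above; with it clear, the whole buffer is consumed.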
      
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • random: wire up fops->splice_{read,write}_iter() · de63c5e7
      Jens Axboe authored
      commit 79025e72 upstream.
      
      Now that random/urandom is using {read,write}_iter, we can wire it up to
      using the generic splice handlers.
      
      Fixes: 36e2c742 ("fs: don't allow splice read/write without explicit ops")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      [Jason: added the splice_write path. Note that sendfile() and such still
       does not work for read, though it does for write, because of a file
       type restriction in splice_direct_to_actor(), which I'll address
       separately.]
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • random: convert to using fops->write_iter() · 27bf1c93
      Jens Axboe authored
      commit 22b0a222 upstream.
      
      Now that the read side has been converted to fix a regression with
      splice, convert the write side as well to have some symmetry in the
      interface used (and help deprecate ->write()).
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      [Jason: cleaned up random_ioctl a bit, require full writes in
       RNDADDENTROPY since it's crediting entropy, simplify control flow of
       write_pool(), and incorporate suggestions from Al.]
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • random: convert to using fops->read_iter() · afc002fd
      Jens Axboe authored
      commit 1b388e77 upstream.
      
      This is a pre-requisite to wiring up splice() again for the random
      and urandom drivers. It also allows us to remove the INT_MAX check in
      getrandom(), because import_single_range() applies capping internally.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      [Jason: rewrote get_random_bytes_user() to simplify and also incorporate
       additional suggestions from Al.]
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • random: unify batched entropy implementations · fb7d06d3
      Jason A. Donenfeld authored
      commit 3092adce upstream.
      
      There are currently two separate batched entropy implementations, for
      u32 and u64, with nearly identical code, with the goal of avoiding
      unaligned memory accesses and letting the buffers be used more
      efficiently. Having to maintain these two functions independently is a
      bit of a hassle though, considering that they always need to be kept in
      sync.
      
      This commit factors them out into a type-generic macro, so that the
      expansion produces the same code as before, such that diffing the
      assembly shows no differences. This will also make it easier in the
      future to add u16 and u8 batches.
      
      This was initially tested using an always_inline function and letting
      gcc constant fold the type size in, but the code gen was less efficient,
      and in general it was more verbose and harder to follow. So this patch
      goes with the boring macro solution, similar to what's already done for
      the _wait functions in random.h.
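
As a rough illustration of the approach, a type-generic macro of this kind might look like the userspace sketch below. It is not the macro from random.c: `DEFINE_BATCHED_ENTROPY`, `BATCH_BYTES` and `refill_batch()` are hypothetical, there is no per-CPU state or locking, and the refill routine is a deterministic placeholder that produces no real randomness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

#define BATCH_BYTES 64 /* hypothetical batch buffer size */

/* Placeholder refill: fills the buffer with a byte counter so the
 * sketch is deterministic. A real implementation would draw from the RNG. */
static unsigned int refill_calls;
static void refill_batch(void *buf, size_t len)
{
    static uint8_t counter;
    uint8_t *p = buf;

    for (size_t i = 0; i < len; i++)
        p[i] = counter++;
    refill_calls++;
}

/* One definition expands into a batch struct, its state, and a getter
 * for the given integer type. */
#define DEFINE_BATCHED_ENTROPY(type)                                  \
struct batch_##type {                                                 \
    type entropy[BATCH_BYTES / sizeof(type)];                         \
    size_t position;                                                  \
};                                                                    \
static struct batch_##type batched_##type = {                         \
    .position = BATCH_BYTES / sizeof(type) /* starts out empty */     \
};                                                                    \
static type get_random_##type(void)                                   \
{                                                                     \
    struct batch_##type *b = &batched_##type;                         \
    const size_t n = sizeof(b->entropy) / sizeof(b->entropy[0]);      \
                                                                      \
    if (b->position >= n) {                                           \
        refill_batch(b->entropy, sizeof(b->entropy));                 \
        b->position = 0;                                              \
    }                                                                 \
    return b->entropy[b->position++];                                 \
}

DEFINE_BATCHED_ENTROPY(u32)
DEFINE_BATCHED_ENTROPY(u64)
```

Adding a u16 or u8 variant in this sketch would be one more `DEFINE_BATCHED_ENTROPY(...)` line plus a typedef, which is the maintenance benefit the commit message points to.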
      
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: move randomize_page() into mm where it belongs · 817191b4
      Jason A. Donenfeld authored
      commit 5ad7dd88 upstream.
      
      randomize_page is an mm function. It is documented like one. It contains
      the history of one. It has the naming convention of one. It looks
      just like another very similar function in mm, randomize_stack_top().
      And it has always been maintained and updated by mm people. There is no
      need for it to be in random.c. In the "which shape does not look like
      the other ones" test, pointing to randomize_page() is correct.
      
      So move randomize_page() into mm/util.c, right next to the similar
      randomize_stack_top() function.
      
      This commit contains no actual code changes.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: move initialization functions out of hot pages · 41f07747
      Jason A. Donenfeld authored
      commit 560181c2 upstream.
      
      Much of random.c is devoted to initializing the rng and accounting for
      when a sufficient amount of entropy has been added. In a perfect world,
      this would all happen during init, and so we could mark these functions
      as __init. But in reality, this isn't the case: sometimes the rng only
      finishes initializing some seconds after system init is finished.
      
      For this reason, at the moment, a whole host of functions that are only
      used relatively close to system init and then never again are intermixed
      with functions that are used in hot code all the time. This creates more
      cache misses than necessary.
      
      In order to pack the hot code closer together, this commit moves the
      initialization functions that can't be marked as __init into
      .text.unlikely by way of the __cold attribute.
      
      Of particular note is moving credit_init_bits() into a macro wrapper
      that inlines the crng_ready() static branch check. This avoids a
      function call to a nop+ret, and most notably prevents extra entropy
      arithmetic from being computed in mix_interrupt_randomness().
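
      The two ideas above (a cold out-of-line function plus a macro wrapper
      that inlines the readiness check) can be sketched in userspace C with
      GCC/Clang attributes. This is only an approximation: a plain bool stands
      in for the static branch, and credit_bits(), credit_init_bits_slow() and
      INIT_BITS_NEEDED are names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define INIT_BITS_NEEDED 256

static bool rng_ready;            /* stand-in for the crng_ready() static branch */
static size_t init_bits;          /* bits credited so far */
static unsigned slow_path_calls;  /* counts entries into the cold function */

/* Only needed around boot, so ask the compiler to keep it out of hot text. */
__attribute__((cold, noinline))
static void credit_init_bits_slow(size_t bits)
{
	slow_path_calls++;
	init_bits += bits;
	if (init_bits >= INIT_BITS_NEEDED)
		rng_ready = true;
}

/* The readiness check is expanded at each call site. Once the generator is
 * ready, neither the call nor the argument expression is evaluated. */
#define credit_bits(bits) do {			\
	if (!rng_ready)				\
		credit_init_bits_slow(bits);	\
} while (0)
```

      Because the argument sits inside the guarded branch, any arithmetic used
      to compute it is skipped after initialization, which is the effect the
      commit message describes for mix_interrupt_randomness().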
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: make consistent use of buf and len · e35c23cb
      Jason A. Donenfeld authored
      commit a1940263 upstream.
      
      The current code was a mix of "nbytes", "count", "size", "buffer", "in",
      and so forth. Instead, let's clean this up by naming input parameters
      "buf" (or "ubuf") and "len", so that you always understand that you're
      reading this variety of function argument.

      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: use proper return types on get_random_{int,long}_wait() · 65d3f67f
      Jason A. Donenfeld authored
      commit 7c3a8a1d upstream.
      
      Previously these returned signed values, but the API is intended to be
      used with unsigned values.

      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: remove extern from functions in header · 245b1ae3
      Jason A. Donenfeld authored
      commit 7782cfec upstream.
      
      According to the kernel style guide, having `extern` on functions in
      headers is old school and deprecated, and doesn't add anything. So remove
      them from random.h, and tidy up the file a little bit too.

      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: use static branch for crng_ready() · 80ec4c64
      Jason A. Donenfeld authored
      commit f5bda35f upstream.
      
      Since crng_ready() is only false briefly during initialization and then
      forever after becomes true, we don't need to evaluate it after, making
      it a prime candidate for a static branch.
      
      One complication, however, is that it changes state in a particular call
      to credit_init_bits(), which might be made from atomic context, which
      means we must kick off a workqueue to change the static key. Further
      complicating things, credit_init_bits() may be called sufficiently early
      on in system initialization such that system_wq is NULL.
      
      Fortunately, there exists the nice function execute_in_process_context(),
      which will immediately execute the function if !in_interrupt(), and
      otherwise defer it to a workqueue. During early init, before workqueues
      are available, in_interrupt() is always false, because interrupts
      haven't even been enabled yet, which means the function in that case
      executes immediately. Later on, after workqueues are available,
      in_interrupt() might be true, but in that case, the work is queued in
      system_wq and all goes well.
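
      The "run immediately unless in interrupt context, otherwise defer"
      behaviour described above can be modelled in userspace as follows. This
      is a toy model, not kernel code: simulated_in_interrupt, the one-slot
      pending queue, run_now_or_defer() and run_pending() are all hypothetical
      stand-ins for in_interrupt(), system_wq and
      execute_in_process_context().

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*work_fn)(void);

static bool simulated_in_interrupt;  /* stand-in for in_interrupt() */
static work_fn pending;              /* one-slot stand-in for a workqueue */
static bool key_enabled;             /* stand-in for the static key state */

static void enable_key(void)
{
	key_enabled = true;
}

/* Returns 0 if fn ran immediately, 1 if it was deferred. */
static int run_now_or_defer(work_fn fn)
{
	if (!simulated_in_interrupt) {
		fn();
		return 0;
	}
	pending = fn;
	return 1;
}

/* Called later from a context where the work is allowed to run. */
static void run_pending(void)
{
	if (pending) {
		work_fn fn = pending;

		pending = NULL;
		fn();
	}
}
```

      In the early-boot case the flag is false, so the function runs on the
      spot and no queue is ever touched, which matches the reasoning in the
      paragraph above.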
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Sultan Alsawaf <sultan@kerneltoast.com>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: credit architectural init the exact amount · d3fc4f46
      Jason A. Donenfeld authored
      commit 12e45a2a upstream.
      
      RDRAND and RDSEED can fail sometimes, which is fine. We currently
      initialize the RNG with 512 bits of RDRAND/RDSEED. We only need 256 bits
      of those to succeed in order to initialize the RNG. Instead of the
      current "all or nothing" approach, credit each contribution with
      exactly the number of bits that it actually supplied.
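
      A minimal sketch of per-word crediting, assuming a hypothetical
      arch_seed_fn callback in place of RDRAND/RDSEED; the sample sources and
      collect_arch_bits() are invented for illustration and no real mixing is
      performed.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SEED_BYTES 64 /* 512 bits are requested in total */

/* Hypothetical hardware source: returns false when the instruction fails. */
typedef bool (*arch_seed_fn)(unsigned long *out);

/* Returns the number of bits the source actually delivered. */
static size_t collect_arch_bits(arch_seed_fn source)
{
	size_t credited = 0;

	for (size_t i = 0; i < SEED_BYTES; i += sizeof(unsigned long)) {
		unsigned long word = 0;

		/* Credit only the words that were really produced. */
		if (source(&word))
			credited += sizeof(word) * CHAR_BIT;
		/* A real implementation would mix word (or a fallback) here. */
		(void)word;
	}
	return credited;
}

/* Sample sources used only to exercise the sketch. */
static bool source_always_ok(unsigned long *out)
{
	*out = 0x5a5aUL;
	return true;
}

static bool source_flaky(unsigned long *out)
{
	static unsigned calls;

	if (calls++ % 2)
		return false; /* every second request fails */
	*out = 0xa5a5UL;
	return true;
}

static bool source_never(unsigned long *out)
{
	(void)out;
	return false;
}
```

      With the flaky source, half of the requests fail and 256 bits are
      credited, which in this sketch would still be enough to initialize.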
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: handle latent entropy and command line from random_init() · e136fbd6
      Jason A. Donenfeld authored
      commit 2f14062b upstream.
      
      Currently, start_kernel() adds latent entropy and the command line to
      the entropy pool *after* the RNG has been initialized, deferring when
      it's actually used by things like stack canaries until the next time
      the pool is seeded. This surely is not intended.
      
      Rather than splitting up which entropy gets added where and when between
      start_kernel() and random_init(), just do everything in random_init(),
      which should eliminate these kinds of bugs in the future.
      
      While we're at it, rename the awkwardly titled "rand_initialize()" to
      the more standard "random_init()" nomenclature.

      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: use proper jiffies comparison macro · e78d195f
      Jason A. Donenfeld authored
      commit 8a5b8a4a upstream.
      
      This expands to exactly the same code that it replaces, but makes things
      consistent by using the same macro for jiffy comparisons throughout.
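
      For readers unfamiliar with why a dedicated comparison macro is used at
      all: tick counters wrap around, so a plain `a > b` gives the wrong
      answer near the wrap point. The helper below is a userspace sketch in
      the spirit of the kernel's time_after() family; tick_after() is a name
      made up for this example, and the cast relies on the usual
      two's-complement conversion that GCC and Clang provide.

```c
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe test for "tick a is later than tick b". The unsigned subtraction
 * wraps, and the signed view of the difference gives the ordering as long as
 * the two ticks are less than half the counter range apart. */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}
```

      For example, with b = ULONG_MAX - 1 and a = 8 (ten ticks later, after
      the counter wrapped), tick_after(a, b) is true even though a > b is
      false.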
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: remove ratelimiting for in-kernel unseeded randomness · b4b11eb0
      Jason A. Donenfeld authored
      commit cc1e127b upstream.
      
      The CONFIG_WARN_ALL_UNSEEDED_RANDOM debug option controls whether the
      kernel warns about all unseeded randomness or just the first instance.
      There's some complicated rate limiting and comparison to the previous
      caller, such that even with CONFIG_WARN_ALL_UNSEEDED_RANDOM enabled,
      developers still don't see all the messages or even an accurate count of
      how many were missed. This is the result of basically parallel
      mechanisms aimed at accomplishing more or less the same thing, added at
      different points in random.c history, which sort of compete with the
      first-instance-only limiting we have now.
      
      It turns out, however, that nobody cares about the first unseeded
      randomness instance of in-kernel users. The same first user has been
      there for ages now, and nobody is doing anything about it. It isn't even
      clear that anybody _can_ do anything about it. Most places that can do
      something about it have switched over to using get_random_bytes_wait()
      or wait_for_random_bytes(), which is the right thing to do, but there is
      still much code that needs randomness sometimes during init, and as a
      general rule, if you're not using one of the _wait functions or the
      readiness notifier callback, you're bound to be doing it wrong just
      based on that fact alone.
      
      So warning about this same first user that can't easily change is simply
      not an effective mechanism for anything at all. Users can't do anything
      about it, as the Kconfig text points out -- the problem isn't in
      userspace code -- and kernel developers don't or more often can't react
      to it.
      
      Instead, show the warning for all instances when
      CONFIG_WARN_ALL_UNSEEDED_RANDOM is set, so that developers can debug
      things if need be, or if it isn't set, don't show a warning at all.
      
      At the same time, CONFIG_WARN_ALL_UNSEEDED_RANDOM now implies
      random.ratelimit_disable=1 by default, since if you care about one
      you probably care about the other too. And we can clean up usage around
      the related urandom_warning ratelimiter as well (whose behavior isn't
      changing), so that it properly counts missed messages after the 10
      message threshold is reached.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: move initialization out of reseeding hot path · e6b205dc
      Jason A. Donenfeld authored
      commit 68c9c8b1 upstream.
      
      Initialization happens once -- by way of credit_init_bits() -- and then
      it never happens again. Therefore, it doesn't need to be in
      crng_reseed(), which is a hot path that is called multiple times. It
      also doesn't make sense to have there, as initialization activity is
      better associated with initialization routines.
      
      After the prior commit, crng_reseed() now won't be called by multiple
      concurrent callers, which means that we can safely move the
      "finalize_init" logic into credit_init_bits() unconditionally.

      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: avoid initializing twice in credit race · 8fe9ac5e
      Jason A. Donenfeld authored
      commit fed7ef06 upstream.
      
      Since all changes of crng_init now go through credit_init_bits(), we can
      fix a long standing race in which two concurrent callers of
      credit_init_bits() have the new bit count >= some threshold, but are
      doing so with crng_init as a lower threshold, checked outside of a lock,
      resulting in crng_reseed() or similar being called twice.
      
      In order to fix this, we can use the original cmpxchg value of the bit
      count, and only change crng_init when the bit count transitions from
      below a threshold to meeting the threshold.
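
      The fix can be illustrated with C11 atomics in userspace: the decision
      is made from the value the compare-and-exchange actually replaced, so
      only the caller whose update crossed the threshold sees the transition.
      READY_BITS, init_bits and credit_and_check() are assumptions for this
      sketch rather than the kernel's own names.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define READY_BITS 256u

static atomic_uint init_bits;

/* Adds bits (capped at READY_BITS) and returns true only for the single
 * caller whose update moved the count from below the threshold to it. */
static bool credit_and_check(unsigned add)
{
	unsigned orig = atomic_load(&init_bits);
	unsigned new;

	do {
		new = orig + add;
		if (new > READY_BITS)
			new = READY_BITS;
		/* On failure, orig is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&init_bits, &orig, new));

	/* orig is the value this caller replaced, not a later re-read. */
	return orig < READY_BITS && new >= READY_BITS;
}
```

      Two concurrent callers cannot both observe orig below the threshold and
      new at it, because each successful exchange replaces a distinct previous
      value.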
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: use symbolic constants for crng_init states · 4f8ab1ca
      Jason A. Donenfeld authored
      commit e3d2c5e7 upstream.
      
      crng_init represents a state machine, with three states, and various
      rules for transitions. For the longest time, we've been managing these
      with "0", "1", and "2", and expecting people to figure it out. To make
      the code more obvious, replace these with proper enum values
      representing the transition, and then redocument what each of these
      states means.

      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      siphash: use one source of truth for siphash permutations · 9ebf07a7
      Jason A. Donenfeld authored
      commit e73aaae2 upstream.
      
      The SipHash family of permutations is currently used in three places:
      
      - siphash.c itself, used in the ordinary way it was intended.
      - random32.c, in a construction from an anonymous contributor.
      - random.c, as part of its fast_mix function.
      
      Each one of these places reinvents the wheel with the same C code, same
      rotation constants, and same symmetry-breaking constants.
      
      This commit tidies things up a bit by placing macros for the
      permutations and constants into siphash.h, where each of the three .c
      users can access them. It also leaves a note dissuading more users of
      them from emerging.
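
      For reference, the shared permutation is the standard 64-bit SipRound,
      and the symmetry-breaking constants are the ASCII string
      "somepseudorandomlygeneratedbytes". The sketch below shows what a single
      macro definition of them could look like in userspace; the macro and
      constant names here are assumptions and may not match siphash.h exactly.

```c
#include <stdint.h>

/* Rotate left; r must be in the range 1..63. */
static inline uint64_t rol64(uint64_t x, unsigned r)
{
	return (x << r) | (x >> (64 - r));
}

/* "somepseu", "dorandom", "lygenera", "tedbytes" */
#define SIP_CONST_0 0x736f6d6570736575ULL
#define SIP_CONST_1 0x646f72616e646f6dULL
#define SIP_CONST_2 0x6c7967656e657261ULL
#define SIP_CONST_3 0x7465646279746573ULL

/* One SipRound over four 64-bit state words. */
#define SIPROUND(v0, v1, v2, v3) do {						\
	(v0) += (v1); (v1) = rol64((v1), 13); (v1) ^= (v0); (v0) = rol64((v0), 32);	\
	(v2) += (v3); (v3) = rol64((v3), 16); (v3) ^= (v2);				\
	(v0) += (v3); (v3) = rol64((v3), 21); (v3) ^= (v0);				\
	(v2) += (v1); (v1) = rol64((v1), 17); (v1) ^= (v2); (v2) = rol64((v2), 32);	\
} while (0)

/* Convenience wrapper applying one round to a 4-word state array. */
static void sipround_state(uint64_t s[4])
{
	SIPROUND(s[0], s[1], s[2], s[3]);
}
```

      The all-zero state is a fixed point of the round, which is why callers
      start from the constants rather than from zeros.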
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: help compiler out with fast_mix() by using simpler arguments · c5ff607d
      Jason A. Donenfeld authored
      commit 791332b3 upstream.
      
      Now that fast_mix() has more than one caller, gcc no longer inlines it.
      That's fine. But it also doesn't handle the compound literal argument we
      pass it very efficiently, nor does it handle the loop as well as it
      could. So just expand the code to spell out this function so that it
      generates the same code as it did before. Performance-wise, this now
      behaves as it did before the last commit. The difference in actual code
      size on x86 is 45 bytes, which is less than a cache line.

      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: do not use input pool from hard IRQs · de2ba595
      Jason A. Donenfeld authored
      commit e3e33fc2 upstream.
      
      Years ago, a separate fast pool was added for interrupts, so that the
      cost associated with taking the input pool spinlocks and mixing into it
      would be avoided in places where latency is critical. However, one
      oversight was that add_input_randomness() and add_disk_randomness()
      are still sometimes called directly from the interrupt handler, rather
      than being deferred to a thread. This means that some unlucky interrupts
      will be caught doing a blake2s_compress() call and potentially spinning
      on input_pool.lock, which can also be taken by unprivileged users by
      writing into /dev/urandom.
      
      In order to fix this, add_timer_randomness() now checks whether it is
      being called from a hard IRQ and if so, just mixes into the per-cpu IRQ
      fast pool using fast_mix(), which is much faster and can be done
      lock-free. A nice consequence of this, as well, is that it means hard
      IRQ context FPU support is likely no longer useful.
      
      The entropy estimation algorithm used by add_timer_randomness() is also
      somewhat different than the one used for add_interrupt_randomness(). The
      former looks at deltas of deltas of deltas, while the latter just waits
      for 64 interrupts for one bit or for one second since the last bit. In
      order to bridge these, and since add_interrupt_randomness() runs after
      an add_timer_randomness() that's called from hard IRQ, we add to the
      fast pool credit the related amount, and then subtract one to account
      for add_interrupt_randomness()'s contribution.
      
      A downside of this, however, is that the num argument is potentially
      attacker controlled, which puts a bit more pressure on the fast_mix()
      sponge to do more than it's really intended to do. As a mitigating
      factor, the first 96 bits of input aren't attacker controlled (a cycle
      counter followed by zeros), which means it's essentially two rounds of
      siphash rather than one, which is somewhat better. It's also not that
      much different from add_interrupt_randomness()'s use of the irq stack
      instruction pointer register.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Filipe Manana <fdmanana@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: order timer entropy functions below interrupt functions · 1705dc1f
      Jason A. Donenfeld authored
      commit a4b5c26b upstream.
      
      There are no code changes here; this is just a reordering of functions,
      so that in subsequent commits, the timer entropy functions can call into
      the interrupt ones.

      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld's avatar
      random: do not pretend to handle premature next security model · c3b04928
      Jason A. Donenfeld authored
      commit e85c0fc1 upstream.
      
      Per the thread linked below, "premature next" is not considered to be a
      realistic threat model, and leads to more serious security problems.
      
      "Premature next" is the scenario in which:
      
      - Attacker compromises the current state of a fully initialized RNG via
        some kind of infoleak.
      - New bits of entropy are added directly to the key used to generate the
        /dev/urandom stream, without any buffering or pooling.
      - Attacker then, somehow having read access to /dev/urandom, samples RNG
        output and brute forces the individual new bits that were added.
      - Result: the RNG never "recovers" from the initial compromise, a
        so-called violation of what academics term "post-compromise security".
      
      The usual solutions to this involve some form of delaying when entropy
      gets mixed into the crng. With Fortuna, this involves multiple input
      buckets. With what the Linux RNG was trying to do prior, this involves
      entropy estimation.
      
      However, by delaying when entropy gets mixed in, it also means that RNG
      compromises are extremely dangerous during the window of time before
      the RNG has gathered enough entropy, during which time nonces may become
      predictable (or repeated), ephemeral keys may not be secret, and so
      forth. Moreover, it's unclear how realistic "premature next" is from an
      attack perspective, if these attacks even make sense in practice.
      
      Put together -- and discussed in more detail in the thread below --
      these constitute grounds for just doing away with the current code that
      pretends to handle premature next. I say "pretends" because it wasn't
      doing an especially great job at it either; should we change our mind
      about this direction, we would probably implement Fortuna to "fix" the
      "problem", in which case, removing the pretend solution still makes
      sense.
      
      This also reduces the crng reseed period from 5 minutes down to 1
      minute. The rationale from the thread might lead us toward reducing that
      even further in the future (or even eliminating it), but that remains a
      topic of a future commit.
      
      At a high level, this patch changes semantics from:
      
          Before: Seed for the first time after 256 "bits" of estimated
          entropy have been accumulated since the system booted. Thereafter,
          reseed once every five minutes, but only if 256 new "bits" have been
          accumulated since the last reseeding.
      
          After: Seed for the first time after 256 "bits" of estimated entropy
          have been accumulated since the system booted. Thereafter, reseed
          once every minute.
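
      The before/after reseeding rules (after the first seeding) can be
      written as two small predicates. This is only a restatement of the text
      above in code; the function and constant names are invented for the
      example and the inputs are plain seconds and bit counts rather than
      kernel state.

```c
#include <stdbool.h>

#define FRESH_BITS_NEEDED 256u
#define OLD_INTERVAL_SEC  300u /* five minutes */
#define NEW_INTERVAL_SEC   60u /* one minute */

/* Before: the interval must have elapsed AND 256 new estimated bits must
 * have been accumulated since the last reseeding. */
static bool should_reseed_before(unsigned secs_since_reseed, unsigned new_bits)
{
	return secs_since_reseed >= OLD_INTERVAL_SEC &&
	       new_bits >= FRESH_BITS_NEEDED;
}

/* After: only elapsed time matters; entropy estimation plays no role. */
static bool should_reseed_after(unsigned secs_since_reseed)
{
	return secs_since_reseed >= NEW_INTERVAL_SEC;
}
```

      Under the old rule a system that gathered few estimated bits could go
      arbitrarily long without reseeding; under the new rule it reseeds every
      minute regardless.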
      
      Most of this patch is renaming and removing: POOL_MIN_BITS becomes
      POOL_INIT_BITS, credit_entropy_bits() becomes credit_init_bits(),
      crng_reseed() loses its "force" parameter since it's now always true,
      the drain_entropy() function no longer has any use so it's removed,
      entropy estimation is skipped if we've already init'd, the various
      notifiers for "low on entropy" are now only active prior to init, and
      finally, some documentation comments are cleaned up here and there.
      
      Link: https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/
      
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Nadia Heninger <nadiah@cs.ucsd.edu>
      Cc: Tom Ristenpart <ristenpart@cornell.edu>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>