1. 08 Apr, 2022 1 commit
  2. 08 Jan, 2022 1 commit
  3. 18 Jun, 2021 1 commit
  4. 12 May, 2021 1 commit
    • ptrace: make ptrace() fail if the tracee changed its pid unexpectedly · dbb5afad
      Oleg Nesterov authored
      
      Suppose we have 2 threads, the group-leader L and a sub-thread T,
      both parked in ptrace_stop(). Debugger tries to resume both threads
      and does
      
      	ptrace(PTRACE_CONT, T);
      	ptrace(PTRACE_CONT, L);
      
      If the sub-thread T execs in between, the 2nd PTRACE_CONT does not
      resume the old leader L, it resumes the post-exec thread T which was
      actually now stopped in PTRACE_EVENT_EXEC. In this case the
      PTRACE_EVENT_EXEC event is lost, and the tracer can't know that the
      tracee changed its pid.
      
      This patch makes ptrace() fail in this case until the debugger does wait()
      and consumes PTRACE_EVENT_EXEC which reports old_pid. This affects all
      ptrace requests except the "asynchronous" PTRACE_INTERRUPT/KILL.
      
      The patch doesn't add the new PTRACE_ option to not complicate the API,
      and I _hope_ this won't cause any noticeable regression:
      
      	- If debugger uses PTRACE_O_TRACEEXEC and the thread did an exec
      	  and the tracer does a ptrace request without having consumed
      	  the exec event, it's 100% sure that the thread the ptracer
      	  thinks it is targeting does not exist anymore, or isn't the
      	  same as the one it thinks it is targeting.
      
      	- To some degree this patch adds nothing new. In the scenario
      	  above ptrace(L) can fail with -ESRCH if it is called after the
      	  execing sub-thread wakes the leader up and before it "steals"
      	  the leader's pid.
      
      Test-case:
      
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <signal.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      	#include <errno.h>
      	#include <pthread.h>
      	#include <assert.h>
      
      	void *tf(void *arg)
      	{
      		execve("/usr/bin/true", NULL, NULL);
      		assert(0);
      
      		return NULL;
      	}
      
      	int main(void)
      	{
      		int leader = fork();
      		if (!leader) {
      			kill(getpid(), SIGSTOP);
      
      			pthread_t th;
      			pthread_create(&th, NULL, tf, NULL);
      			for (;;)
      				pause();
      
      			return 0;
      		}
      
      		waitpid(leader, NULL, WSTOPPED);
      
      		ptrace(PTRACE_SEIZE, leader, 0,
      				PTRACE_O_TRACECLONE | PTRACE_O_TRACEEXEC);
      		waitpid(leader, NULL, 0);
      
      		ptrace(PTRACE_CONT, leader, 0,0);
      		waitpid(leader, NULL, 0);
      
      		int status, thread = waitpid(-1, &status, 0);
      		assert(thread > 0 && thread != leader);
      		assert(status == 0x80137f);
      
      		ptrace(PTRACE_CONT, thread, 0,0);
      		/*
      		 * waitid() because waitpid(leader, &status, WNOWAIT) does not
      		 * report status. Why ????
      		 *
      		 * Why WEXITED? because we have another kernel problem connected
      		 * to mt-exec.
      		 */
      		siginfo_t info;
      		assert(waitid(P_PID, leader, &info, WSTOPPED|WEXITED|WNOWAIT) == 0);
      		assert(info.si_pid == leader && info.si_status == 0x0405);
      
      		/* OK, it sleeps in ptrace(PTRACE_EVENT_EXEC == 0x04) */
      		assert(ptrace(PTRACE_CONT, leader, 0,0) == -1);
      		assert(errno == ESRCH);
      
      		assert(leader == waitpid(leader, &status, WNOHANG));
      		assert(status == 0x04057f);
      
      		assert(ptrace(PTRACE_CONT, leader, 0,0) == 0);
      
      		return 0;
      	}
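
      The status constants asserted in the test case can be unpacked as
      follows. This is a small sketch, not part of the patch: the helper
      names stop_signal() and stop_event() are hypothetical, and the layout
      assumed (stop signal in bits 8-15, ptrace event number above it) is
      the one ptrace(2) describes for event stops.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Signal reported by a stopped tracee, or 0 if the status is not a stop. */
int stop_signal(int status)
{
	return WIFSTOPPED(status) ? WSTOPSIG(status) : 0;
}

/* ptrace event number stored above the stop signal, or 0 if there is none. */
int stop_event(int status)
{
	return WIFSTOPPED(status) ? (status >> 16) & 0xff : 0;
}
```

      With these helpers, 0x04057f decodes to SIGTRAP plus PTRACE_EVENT_EXEC,
      and 0x80137f decodes to PTRACE_EVENT_STOP carrying signal 0x13 (SIGSTOP
      on x86). The 0x0405 checked after waitid() is the exec status shifted
      right by 8 bits.
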
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Simon Marchi <simon.marchi@efficios.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Pedro Alves <palves@redhat.com>
      Acked-by: Simon Marchi <simon.marchi@efficios.com>
      Acked-by: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 27 Mar, 2021 1 commit
  6. 17 Mar, 2021 1 commit
  7. 22 Feb, 2021 1 commit
  8. 15 Dec, 2020 1 commit
  9. 17 Nov, 2020 1 commit
    • ptrace: Set PF_SUPERPRIV when checking capability · cf237052
      Mickaël Salaün authored
      Commit 69f594a3 ("ptrace: do not audit capability check when outputing
      /proc/pid/stat") replaced the use of ns_capable() with
      has_ns_capability{,_noaudit}() which doesn't set PF_SUPERPRIV.
      
      Commit 6b3ad664 ("ptrace: reintroduce usage of subjective credentials in
      ptrace_has_cap()") replaced has_ns_capability{,_noaudit}() with
      security_capable(), which doesn't set PF_SUPERPRIV either.
      
      Since commit 98f368e9 ("kernel: Add noaudit variant of ns_capable()"), a
      new ns_capable_noaudit() helper is available.  Let's use it!
      
      As a result, the signature of ptrace_has_cap() is restored to its original one.
      
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Serge E. Hallyn <serge@hallyn.com>
      Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
      Cc: stable@vger.kernel.org
      Fixes: 6b3ad664 ("ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()")
      Fixes: 69f594a3 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
      Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
      Reviewed-by: Jann Horn <jannh@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20201030123849.770769-2-mic@digikod.net
  10. 16 Nov, 2020 2 commits
  11. 18 Jan, 2020 1 commit
    • ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() · 6b3ad664
      Christian Brauner authored
      Commit 69f594a3 ("ptrace: do not audit capability check when outputing
      /proc/pid/stat") introduced the ability to opt out of audit messages for
      accesses to various proc files since they are not violations of policy.
      While doing so it somehow switched the check from ns_capable() to
      has_ns_capability{_noaudit}(). That means it switched from checking the
      subjective credentials of the task to using the objective credentials. This
      is wrong, since ptrace_has_cap() is currently only used in
      ptrace_may_access() and is used to check whether the calling task (subject)
      has the CAP_SYS_PTRACE capability in the provided user namespace to operate
      on the target task (object). According to the cred.h comments this would
      mean the subjective credentials of the calling task need to be used.

      This switches ptrace_has_cap() to use security_capable(). Because we only
      call ptrace_has_cap() in ptrace_may_access() and in there we already have a
      stable reference to the calling task's creds under rcu_read_lock() there's
      no need to go through another series of dereferences and rcu locking done
      in ns_capable{_noaudit}().
      
      As one example where this might be particularly problematic, Jann pointed
      out that in combination with the upcoming IORING_OP_OPENAT feature, this
      bug might allow unprivileged users to bypass the capability checks while
      asynchronously opening files like /proc/*/mem, because the capability
      checks for this would be performed against kernel credentials.
      
      To illustrate the former point about this being exploitable: When
      io_uring creates a new context it records the subjective credentials of the
      caller. Later on, when it starts to do work it creates a kernel thread and
      registers a callback. The callback runs with kernel creds for
      ktask->real_cred and ktask->cred. To prevent this from becoming a
      full-blown 0-day, io_uring will call override_cred() and override
      ktask->cred with the subjective credentials of the creator of the io_uring
      instance. With ptrace_has_cap() currently looking at ktask->real_cred this
      override will be ineffective and the caller will be able to open arbitrary
      proc files as mentioned above.

      Luckily, this is currently not exploitable but will turn into a 0-day once
      IORING_OP_OPENAT{2} land in v5.6. Fix it now!
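
      The difference between the two credential sets can be modelled in a few
      lines of userspace C. This is a toy sketch under stated assumptions, not
      kernel code: struct toy_cred, struct toy_task, toy_override_creds() and
      the two check helpers are hypothetical stand-ins for struct cred,
      task_struct, the credential override and the capability check.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced credential set. */
struct toy_cred {
	bool cap_sys_ptrace;
};

/* Hypothetical task with the two credential pointers the commit discusses. */
struct toy_task {
	const struct toy_cred *real_cred;	/* objective credentials */
	const struct toy_cred *cred;		/* subjective credentials */
};

/* Install temporary subjective credentials and return the previous ones. */
const struct toy_cred *toy_override_creds(struct toy_task *t,
					  const struct toy_cred *new_cred)
{
	const struct toy_cred *old = t->cred;

	t->cred = new_cred;
	return old;
}

/* Buggy variant: consults the objective creds, so the override is ignored. */
bool has_cap_objective(const struct toy_task *t)
{
	return t->real_cred->cap_sys_ptrace;
}

/* Fixed variant: consults the subjective creds, so the override is honoured. */
bool has_cap_subjective(const struct toy_task *t)
{
	return t->cred->cap_sys_ptrace;
}
```

      For a worker whose real_cred and cred both start out as privileged
      kernel credentials, overriding cred with an unprivileged user's
      credentials changes the answer of has_cap_subjective() only;
      has_cap_objective() keeps reporting the kernel's privileges.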
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Serge Hallyn <serge@hallyn.com>
      Reviewed-by: Jann Horn <jannh@google.com>
      Fixes: 69f594a3 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
  12. 17 Jul, 2019 1 commit
    • ptrace: add PTRACE_GET_SYSCALL_INFO request · 201766a2
      Elvira Khabirova authored
      PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain
      details of the syscall the tracee is blocked in.
      
      There are two reasons for a special syscall-related ptrace request.
      
      Firstly, with the current ptrace API there are cases when ptracer cannot
      retrieve necessary information about syscalls.  Some examples include:
      
       * The notorious int-0x80-from-64-bit-task issue. See [1] for details.
         In short, if a 64-bit task performs a syscall through int 0x80, its
         tracer has no reliable means to find out that the syscall was, in
         fact, a compat syscall, and misidentifies it.
      
       * Syscall-enter-stop and syscall-exit-stop look the same for the
         tracer. Common practice is to keep track of the sequence of
         ptrace-stops in order not to mix the two syscall-stops up. But it is
         not as simple as it looks; for example, strace had a (just recently
         fixed) long-standing bug where attaching strace to a tracee that is
         performing the execve system call led to the tracer identifying the
         following syscall-exit-stop as syscall-enter-stop, which messed up
         all the state tracking.
      
       * Since the introduction of commit 84d77d3f ("ptrace: Don't allow
         accessing an undumpable mm"), both PTRACE_PEEKDATA and
         process_vm_readv become unavailable when the process dumpable flag is
         cleared. On such architectures as ia64 this results in all syscall
         arguments being unavailable for the tracer.
      
      Secondly, ptracers also have to support a lot of arch-specific code for
      obtaining information about the tracee.  For some architectures, this
      requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
      argument and return value.
      
      ptrace(2) man page:
      
      long ptrace(enum __ptrace_request request, pid_t pid,
                  void *addr, void *data);
      ...
      PTRACE_GET_SYSCALL_INFO
             Retrieve information about the syscall that caused the stop.
             The information is placed into the buffer pointed to by the "data"
             argument, which should be a pointer to a buffer of type
             "struct ptrace_syscall_info".
             The "addr" argument contains the size of the buffer pointed to
             by the "data" argument (i.e., sizeof(struct ptrace_syscall_info)).
             The return value contains the number of bytes available
             to be written by the kernel.
             If the size of data to be written by the kernel exceeds the size
             specified by the "addr" argument, the output is truncated.
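
      A tracer might wrap the request roughly as follows. This is a hedged
      sketch, not code from the patch: get_syscall_info() and
      syscall_stop_kind() are hypothetical helper names, the raw syscall() is
      used here only to avoid mixing the glibc and kernel ptrace headers, and
      kernel headers new enough to provide struct ptrace_syscall_info are
      assumed.

```c
#define _GNU_SOURCE
#include <linux/ptrace.h>	/* struct ptrace_syscall_info, PTRACE_GET_SYSCALL_INFO */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel to describe the syscall stop of tracee "pid".
 * "addr" carries the buffer size and "data" the buffer itself.
 * Returns the number of bytes the kernel had available, or -1 on error.
 */
long get_syscall_info(pid_t pid, struct ptrace_syscall_info *info)
{
	return syscall(SYS_ptrace, (long)PTRACE_GET_SYSCALL_INFO, (long)pid,
		       (unsigned long)sizeof(*info), info);
}

/* Name the kind of stop, which the older API could not tell apart. */
const char *syscall_stop_kind(const struct ptrace_syscall_info *info)
{
	switch (info->op) {
	case PTRACE_SYSCALL_INFO_ENTRY:
		return "entry";
	case PTRACE_SYSCALL_INFO_EXIT:
		return "exit";
	case PTRACE_SYSCALL_INFO_SECCOMP:
		return "seccomp";
	default:
		return "none";
	}
}
```

      A caller would compare the return value of get_syscall_info() with
      sizeof(struct ptrace_syscall_info) to learn whether the output was
      truncated, then branch on the op field.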
      
      [ldv@altlinux.org: selftests/seccomp/seccomp_bpf: update for PTRACE_GET_SYSCALL_INFO]
        Link: http://lkml.kernel.org/r/20190708182904.GA12332@altlinux.org
      Link: http://lkml.kernel.org/r/20190510152842.GF28558@altlinux.org
      
      Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
      Co-developed-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: Eugene Syromyatnikov <esyr@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greentime Hu <greentime@andestech.com>
      Cc: Helge Deller <deller@gmx.de>	[parisc]
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: kbuild test robot <lkp@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 04 Jul, 2019 1 commit
    • ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME · 6994eefb
      Jann Horn authored
      Fix two issues:
      
      When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU
      reference to the parent's objective credentials, then give that pointer
      to get_cred().  However, the object lifetime rules for things like
      struct cred do not permit unconditionally turning an RCU reference into
      a stable reference.
      
      PTRACE_TRACEME records the parent's credentials as if the parent was
      acting as the subject, but that's not the case.  If a malicious
      unprivileged child uses PTRACE_TRACEME and the parent is privileged, and
      at a later point, the parent process becomes attacker-controlled
      (because it drops privileges and calls execve()), the attacker ends up
      with control over two processes with a privileged ptrace relationship,
      which can be abused to ptrace a suid binary and obtain root privileges.
      
      Fix both of these by always recording the credentials of the process
      that is requesting the creation of the ptrace relationship:
      current_cred() can't change under us, and current is the proper subject
      for access control.
      
      This change is theoretically userspace-visible, but I am not aware of
      any code that it will actually break.
      
      Fixes: 64b875f7 ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
      Signed-off-by: Jann Horn <jannh@google.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 11 Jun, 2019 1 commit
  15. 05 Jun, 2019 1 commit
  16. 30 May, 2019 1 commit
    • signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO · f6e2aa91
      Eric W. Biederman authored
      Recently syzbot in conjunction with KMSAN reported that
      ptrace_peek_siginfo can copy an uninitialized siginfo to userspace.
      Inspecting ptrace_peek_siginfo confirms this.
      
      The problem is that off, when initialized from args.off, can be
      initialized to a negative value, at which point the "if (off >= 0)"
      test to see if off became negative fails because off started off
      negative.
      
      Prevent the core problem by adding a variable found that is only true
      if a siginfo is found and copied to a temporary in preparation for
      being copied to userspace.
      
      Prevent args.off from being truncated when being assigned to off by
      testing that args.off is <= the maximum possible value of off.  Convert
      off to an unsigned long so that we do not have to truncate args.off,
      so that we have well defined overflow behavior (if we add another check
      we won't risk fighting undefined compiler behavior), and so that we have
      a type whose maximum value is easy to test for.
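
      The two fixes described above can be sketched in userspace C. This is an
      illustration under stated assumptions rather than the kernel code:
      peek_nth() is a hypothetical function and a plain int array stands in
      for the queue of pending signals.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy the entry at position "requested" of the queue into *out.
 * Returns true only if an entry was actually found and copied.
 */
bool peek_nth(const int *queue, size_t len, uint64_t requested, int *out)
{
	unsigned long off;
	bool found = false;

	/* Refuse offsets that the assignment below would truncate. */
	if (requested > ULONG_MAX)
		return false;
	off = (unsigned long)requested;

	for (size_t i = 0; i < len; i++) {
		if (!off--) {		/* reached the requested entry */
			*out = queue[i];
			found = true;
			break;
		}
	}

	/* *out is left untouched unless found is true. */
	return found;
}
```

      Because success is tied to the found flag rather than to the final
      value of off, an out-of-range offset can no longer cause an
      unwritten buffer to be reported as valid.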
      
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+0d602a1b0d8c95bdf299@syzkaller.appspotmail.com
      Fixes: 84c751bd ("ptrace: add ability to retrieve signals without removing from a queue (v4)")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  17. 21 May, 2019 1 commit
  18. 29 Mar, 2019 1 commit
  19. 04 Jan, 2019 1 commit
    • Remove 'type' argument from access_ok() function · 96d4f267
      Linus Torvalds authored
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csk...
  20. 28 Nov, 2018 1 commit
    • ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS · 46f7ecb1
      Thomas Gleixner authored
      
      The IBPB control code in x86 removed the usage. Remove the functionality
      which was introduced for this.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.de
      
  21. 03 Oct, 2018 2 commits
    • signal: Distinguish between kernel_siginfo and siginfo · ae7795bc
      Eric W. Biederman authored
      
      Linus recently observed that if we did not worry about the padding
      member in struct siginfo it is only about 48 bytes, and 48 bytes is
      much nicer than 128 bytes for allocating on the stack and copying
      around in the kernel.
      
      The obvious thing of only adding the padding when userspace is
      including siginfo.h won't work as there are sigframe definitions in
      the kernel that embed struct siginfo.
      
      So split siginfo in two: kernel_siginfo and siginfo.  The userspace
      definition keeps the traditional name, while the version that is used
      internally to the kernel, and ultimately will not be padded to
      128 bytes, is called kernel_siginfo.
      
      I have put the definition of struct kernel_siginfo in include/linux/signal_types.h.
      
      A set of buildtime checks has been added to verify the two structures have
      the same field offsets.
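
      Build-time checks of this kind can be sketched with offsetof() and
      _Static_assert. The two structures below are hypothetical, much-reduced
      stand-ins for siginfo and kernel_siginfo, and CHECK_OFFSET is an
      illustrative macro rather than the one used in the patch. The smaller
      kern_info shows where the series is heading; in this commit itself the
      two structures still have the same size.

```c
#include <stddef.h>

/* Hypothetical userspace-facing layout, padded to a fixed 128 bytes. */
struct user_info {
	int signo;
	int err;
	int code;
	char pad[128 - 3 * sizeof(int)];
};

/* Hypothetical kernel-internal layout: same leading fields, no padding. */
struct kern_info {
	int signo;
	int err;
	int code;
};

/* Fail the build if a shared field sits at different offsets. */
#define CHECK_OFFSET(field) \
	_Static_assert(offsetof(struct user_info, field) == \
		       offsetof(struct kern_info, field), #field)

CHECK_OFFSET(signo);
CHECK_OFFSET(err);
CHECK_OFFSET(code);
```

      If a field were reordered in only one of the structures, compilation
      would stop at the corresponding CHECK_OFFSET line instead of producing
      a kernel that silently copies fields to the wrong place.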
      
      To make it easy to verify the change kernel_siginfo retains the same
      size as siginfo.  The reduction in size comes in a following change.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • signal: Introduce copy_siginfo_from_user and use it's return value · 4cd2e0e7
      Eric W. Biederman authored
      
      In preparation for using a smaller version of siginfo in the kernel
      introduce copy_siginfo_from_user and use it when siginfo is copied from
      userspace.
      
      Make the pattern for using copy_siginfo_from_user and
      copy_siginfo_from_user32 capture the return value and return that
      value on error.
      
      This is a necessary prerequisite for using a smaller siginfo
      in the kernel than the kernel exports to userspace.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  22. 26 Sep, 2018 1 commit
    • x86/speculation: Apply IBPB more strictly to avoid cross-process data leak · dbfe2953
      Jiri Kosina authored
      Currently, IBPB is only issued in cases when switching into a non-dumpable
      process, the rationale being to protect such 'important and security
      sensitive' processes (such as GPG) from data leaking into a different
      userspace process via spectre v2.

      This is however completely insufficient to provide proper userspace-to-userspace
      spectrev2 protection, as any process can poison branch buffers before being
      scheduled out, and the newly scheduled process immediately becomes a spectrev2
      victim.

      In order to minimize the performance impact (for usecases that do require
      spectrev2 protection), issue the barrier only in cases when switching between
      processes where the victim can't be ptraced by the potential attacker (as in
      such cases, the attacker doesn't have to bother with branch buffers at all).
      
      [ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
        PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
        fine-grained ]
      
      Fixes: 18bf3c3e ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
      Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm
  23. 11 Sep, 2018 1 commit
  24. 07 Feb, 2018 1 commit
  25. 16 Jan, 2018 2 commits
  26. 28 Nov, 2017 1 commit
    • ptrace, seccomp: add support for retrieving seccomp metadata · 26500475
      Tycho Andersen authored
      
      With the new SECCOMP_FILTER_FLAG_LOG, we need to be able to extract these
      flags for checkpoint restore, since they describe the state of a filter.
      
      So, let's add PTRACE_SECCOMP_GET_METADATA, similar to ..._GET_FILTER, which
      returns the metadata of the nth filter (right now, just the flags).
      Hopefully this will be future proof, and new per-filter metadata can be
      added to this struct.
      Signed-off-by: Tycho Andersen <tycho@docker.com>
      CC: Kees Cook <keescook@chromium.org>
      CC: Andy Lutomirski <luto@amacapital.net>
      CC: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
  27. 24 Jul, 2017 1 commit
    • signal: Remove kernel interal si_code magic · cc731525
      Eric W. Biederman authored
      
      struct siginfo is a union and the kernel since 2.4 has been hiding a union
      tag in the high 16bits of si_code using the values:
      __SI_KILL
      __SI_TIMER
      __SI_POLL
      __SI_FAULT
      __SI_CHLD
      __SI_RT
      __SI_MESGQ
      __SI_SYS
      
      While this looks plausible on the surface, in practice this situation has
      not worked well.
      
      - Injected positive signals are not copied to user space properly
        unless they have these magic high bits set.
      
      - Injected positive signals are not reported properly by signalfd
        unless they have these magic high bits set.
      
      - These kernel internal values leaked to userspace via ptrace_peek_siginfo
      
      - It was possible to inject these kernel internal values and cause the
        kernel to misbehave.
      
      - Kernel developers got confused and expected these kernel internal values
        in userspace in kernel self tests.
      
      - Kernel developers got confused and set si_code to __SI_FAULT which
        is SI_USER in userspace which causes userspace to think an ordinary user
        sent the signal and that it was not kernel generated.
      
      - The values make it impossible to reorganize the code to transform
        siginfo_copy_to_user into a plain copy_to_user, as si_code must
        be massaged before being passed to userspace.
      
      So remove these kernel internal si codes and make the kernel code simpler
      and more maintainable.
      
      To replace these kernel internal magic si_codes introduce the helper
      function siginfo_layout, that takes a signal number and an si_code and
      computes which union member of siginfo is being used.  Have
      siginfo_layout return an enumeration so that gcc will have enough
      information to warn if a switch statement does not handle all of union
      members.
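
      The shape of such a helper can be sketched as follows. This is a
      deliberately simplified toy, not the kernel's siginfo_layout: the enum
      toy_layout, toy_siginfo_layout() and toy_layout_name() are hypothetical
      names, and only three union members are modelled.

```c
#include <signal.h>

/* Hypothetical subset of the union members a siginfo can use. */
enum toy_layout { TOY_KILL, TOY_FAULT, TOY_CHLD };

/* Derive the union member from the signal number and si_code alone. */
enum toy_layout toy_siginfo_layout(int sig, int si_code)
{
	/* Codes <= 0 (such as SI_USER) are treated as generic in this toy. */
	if (si_code <= 0)
		return TOY_KILL;

	switch (sig) {
	case SIGILL:
	case SIGFPE:
	case SIGSEGV:
	case SIGBUS:
		return TOY_FAULT;
	case SIGCHLD:
		return TOY_CHLD;
	default:
		return TOY_KILL;
	}
}

/* Consumer with no default case, so gcc -Wswitch can flag a missing member. */
const char *toy_layout_name(enum toy_layout layout)
{
	switch (layout) {
	case TOY_KILL:
		return "kill";
	case TOY_FAULT:
		return "fault";
	case TOY_CHLD:
		return "chld";
	}
	return "unknown";
}
```

      Adding a fourth enumerator without extending toy_layout_name() would
      then produce a compiler warning, which is the property the commit relies
      on when it removes the defaults from the switch statements.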
      
      A couple of architectures have a messed up ABI that defines signal
      specific duplications of SI_USER which causes more special cases in
      siginfo_layout than I would like.  The good news is only problem
      architectures pay the cost.
      
      Update all of the code that used the previous magic __SI_ values to
      use the new SIL_ values and to call siginfo_layout to get those
      values.  Except where not all of the cases are handled, remove the
      defaults in the switch statements so that if a new case is missed in
      the future the lack will show up at compile time.
      
      Modify the code that copies siginfo si_code to userspace to just copy
      the value and not cast si_code to a short first.  The high bits are no
      longer used to hold a magic union member.
      
      Fixup the siginfo header files to stop including the __SI_ values in
      their constants and for the headers that were missing it to properly
      update the number of si_codes for each signal type.
      
      The fixes to the copy_siginfo_from_user32 implementations have the
      interesting property that several of them previously should never have
      worked, as the __SI_ values they depended upon were kernel internal.
      With that dependency gone those implementations should work much
      better.
      
      The idea of not passing the __SI_ values out to userspace and then
      not reinserting them has been tested with criu and criu worked without
      changes.
      
      Ref: 2.4.0-test1
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  28. 23 May, 2017 1 commit
    • ptrace: Properly initialize ptracer_cred on fork · c70d9d80
      Eric W. Biederman authored
      
      When I introduced ptracer_cred I failed to consider the weirdness of
      fork where the task_struct copies the old value by default.  This
      winds up leaving ptracer_cred set even when a process forks and
      the child process does not wind up being ptraced.
      
      Because ptracer_cred is not set on non-ptraced processes whose
      parents were ptraced this has broken the ability of the enlightenment
      window manager to start setuid children.
      
      Fix this by properly initializing ptracer_cred in ptrace_init_task
      
      This must be done with a little bit of care to preserve the current value
      of ptracer_cred when ptrace carries through fork.  Re-reading the
      ptracer_cred from the ptracing process at this point is inconsistent
      with how PT_PTRACE_CAP has been maintained all of these years.
      Tested-by: Takashi Iwai <tiwai@suse.de>
      Fixes: 64b875f7 ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  29. 08 Apr, 2017 1 commit
  30. 02 Mar, 2017 3 commits
  31. 22 Nov, 2016 3 commits
    • ptrace: Don't allow accessing an undumpable mm · 84d77d3f
      Eric W. Biederman authored
      
      It is the reasonable expectation that if an executable file is not
      readable there will be no way for a user without special privileges to
      read the file.  This is enforced in ptrace_attach but if ptrace
      is already attached before exec there is no enforcement for read-only
      executables.
      
      As the only way to read such an mm is through access_process_vm,
      spin a variant called ptrace_access_vm that will fail if the
      target process is not being ptraced by the current process, or
      the current process did not have sufficient privileges when ptracing
      began to read the target process's mm.

      In the ptrace implementations replace access_process_vm by
      ptrace_access_vm.  There remain several ptrace sites that still use
      access_process_vm as they are reading the target executable's
      instructions (for kernel consumption) or register stacks.  As such it
      does not appear necessary to add a permission check to those calls.
      
      This bug has always existed in Linux.
      
      Fixes: v1.0
      Cc: stable@vger.kernel.org
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • ptrace: Capture the ptracer's creds not PT_PTRACE_CAP · 64b875f7
      Eric W. Biederman authored
      
      When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
      overlooked.  This can result in incorrect behavior when an application
      like strace traces an exec of a setuid executable.
      
      Further, PT_PTRACE_CAP does not have enough information for making good
      security decisions as it does not report which user namespace the
      capability is in.  This has already allowed one mistake through
      insufficient granularity.
      
      I found this issue when I was testing another corner case of exec and
      discovered that I could not get strace to set PT_PTRACE_CAP even when
      running strace as root with a full set of caps.
      
      This change fixes the above issue with strace, allowing stracing as
      root a setuid executable without disabling setuid.  More fundamentally
      this change allows what is allowable at all times, by using the correct
      information in its decision.
      
      Cc: stable@vger.kernel.org
      Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • mm: Add a user_ns owner to mm_struct and fix ptrace permission checks · bfedb589
      Eric W. Biederman authored
      
      During exec dumpable is cleared if the file that is being executed is
      not readable by the user executing the file.  A bug in
      ptrace_may_access allows reading the file if the executable happens to
      enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
      unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER)).
      
      This problem is fixed with only necessary userspace breakage by adding
      a user namespace owner to mm_struct, captured at the time of exec, so
      it is clear in which user namespace CAP_SYS_PTRACE must be present in
      to be able to safely give read permission to the executable.
      
      The function ptrace_may_access is modified to verify that the ptracer
      has CAP_SYS_PTRACE in task->mm->user_ns instead of task->cred->user_ns.
      This ensures that if the task changes its cred into a subordinate
      user namespace it does not become ptraceable.
      
      The function ptrace_attach is modified to only set PT_PTRACE_CAP when
      CAP_SYS_PTRACE is held over task->mm->user_ns.  The intent of
      PT_PTRACE_CAP is to be a flag to note that whatever permission changes
      the task might go through the tracer has sufficient permissions for
      it not to be an issue.  task->cred->user_ns is always the same
      as or a descendant of mm->user_ns, which guarantees that having
      CAP_SYS_PTRACE over mm->user_ns is the worst case for the task's
      credentials.
      
      To prevent regressions mm->dumpable and mm->user_ns are not considered
      when a task has no mm, as simply failing ptrace_may_attach causes
      regressions in privileged applications attempting to read things
      such as /proc/<pid>/stat.
      
      Cc: stable@vger.kernel.org
      Acked-by: Kees Cook <keescook@chromium.org>
      Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Fixes: 8409cca7 ("userns: allow ptrace from non-init user namespaces")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  32. 19 Oct, 2016 1 commit
  33. 11 Oct, 2016 1 commit