1. 04 Dec, 2014 2 commits
  2. 30 Jul, 2014 1 commit
    • Eric W. Biederman's avatar
      namespaces: Use task_lock and not rcu to protect nsproxy · 728dba3a
      Eric W. Biederman authored
      The synchronous syncrhonize_rcu in switch_task_namespaces makes setns
      a sufficiently expensive system call that people have complained.
      
      Upon inspect nsproxy no longer needs rcu protection for remote reads.
      remote reads are rare.  So optimize for same process reads and write
      by switching using rask_lock instead.
      
      This yields a simpler to understand lock, and a faster setns system call.
      
      In particular this fixes a performance regression observed
      by Rafael David Tinoco <rafael.tinoco@canonical.com>.
      
      This is effectively a revert of Pavel Emelyanov's commit
      cf7b708c
      
       Make access to task's nsproxy lighter
      from 2007.  The race this originialy fixed no longer exists as
      do_notify_parent uses task_active_pid_ns(parent) instead of
      parent->nsproxy.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      728dba3a
  3. 31 Aug, 2013 1 commit
  4. 27 Aug, 2013 2 commits
  5. 01 May, 2013 1 commit
  6. 23 Feb, 2013 1 commit
  7. 22 Feb, 2013 1 commit
  8. 20 Nov, 2012 4 commits
  9. 19 Nov, 2012 6 commits
    • Eric W. Biederman's avatar
      vfs: Add a user namespace reference from struct mnt_namespace · 771b1371
      Eric W. Biederman authored
      
      This will allow for support for unprivileged mounts in a new user namespace.
      Acked-by: default avatar"Serge E. Hallyn" <serge@hallyn.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      771b1371
    • Eric W. Biederman's avatar
      pidns: Support unsharing the pid namespace. · 50804fe3
      Eric W. Biederman authored
      
      Unsharing of the pid namespace unlike unsharing of other namespaces
      does not take affect immediately.  Instead it affects the children
      created with fork and clone.  The first of these children becomes the init
      process of the new pid namespace, the rest become oddball children
      of pid 0.  From the point of view of the new pid namespace the process
      that created it is pid 0, as it's pid does not map.
      
      A couple of different semantics were considered but this one was
      settled on because it is easy to implement and it is usable from
      pam modules.  The core reasons for the existence of unshare.
      
      I took a survey of the callers of pam modules and the following
      appears to be a representative sample of their logic.
      {
      	setup stuff include pam
      	child = fork();
      	if (!child) {
      		setuid()
                      exec /bin/bash
              }
              waitpid(child);
      
              pam and other cleanup
      }
      
      As you can see there is a fork to create the unprivileged user
      space process.  Which means that the unprivileged user space
      process will appear as pid 1 in the new pid namespace.  Further
      most login processes do not cope with extraneous children which
      means shifting the duty of reaping extraneous child process to
      the creator of those extraneous children makes the system more
      comprehensible.
      
      The practical reason for this set of pid namespace semantics is
      that it is simple to implement and verify they work correctly.
      Whereas an implementation that requres changing the struct
      pid on a process comes with a lot more races and pain.  Not
      the least of which is that glibc caches getpid().
      
      These semantics are implemented by having two notions
      of the pid namespace of a proces.  There is task_active_pid_ns
      which is the pid namspace the process was created with
      and the pid namespace that all pids are presented to
      that process in.  The task_active_pid_ns is stored
      in the struct pid of the task.
      
      Then there is the pid namespace that will be used for children
      that pid namespace is stored in task->nsproxy->pid_ns.
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      50804fe3
    • Eric W. Biederman's avatar
      pidns: Use task_active_pid_ns where appropriate · 17cf22c3
      Eric W. Biederman authored
      
      The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
      aka ns_of_pid(task_pid(tsk)) should have the same number of
      cache line misses with the practical difference that
      ns_of_pid(task_pid(tsk)) is released later in a processes life.
      
      Furthermore by using task_active_pid_ns it becomes trivial
      to write an unshare implementation for the the pid namespace.
      
      So I have used task_active_pid_ns everywhere I can.
      
      In fork since the pid has not yet been attached to the
      process I use ns_of_pid, to achieve the same effect.
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      17cf22c3
    • Eric W. Biederman's avatar
      pidns: Capture the user namespace and filter ns_last_pid · 49f4d8b9
      Eric W. Biederman authored
      
      - Capture the the user namespace that creates the pid namespace
      - Use that user namespace to test if it is ok to write to
        /proc/sys/kernel/ns_last_pid.
      
      Zhao Hongjiang <zhaohongjiang@huawei.com> noticed I was missing a put_user_ns
      in when destroying a pid_ns.  I have foloded his patch into this one
      so that bisects will work properly.
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      49f4d8b9
    • Eric W. Biederman's avatar
      userns: make each net (net_ns) belong to a user_ns · 038e7332
      Eric W. Biederman authored
      
      The user namespace which creates a new network namespace owns that
      namespace and all resources created in it.  This way we can target
      capability checks for privileged operations against network resources to
      the user_ns which created the network namespace in which the resource
      lives.  Privilege to the user namespace which owns the network
      namespace, or any parent user namespace thereof, provides the same
      privilege to the network resource.
      
      This patch is reworked from a version originally by
      Serge E. Hallyn <serge.hallyn@canonical.com>
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      038e7332
    • Eric W. Biederman's avatar
      userns: make each net (net_ns) belong to a user_ns · d328b836
      Eric W. Biederman authored
      
      The user namespace which creates a new network namespace owns that
      namespace and all resources created in it.  This way we can target
      capability checks for privileged operations against network resources to
      the user_ns which created the network namespace in which the resource
      lives.  Privilege to the user namespace which owns the network
      namespace, or any parent user namespace thereof, provides the same
      privilege to the network resource.
      
      This patch is reworked from a version originally by
      Serge E. Hallyn <serge.hallyn@canonical.com>
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d328b836
  10. 31 Oct, 2011 1 commit
    • Paul Gortmaker's avatar
      kernel: Map most files to use export.h instead of module.h · 9984de1a
      Paul Gortmaker authored
      
      The changed files were only including linux/module.h for the
      EXPORT_SYMBOL infrastructure, and nothing else.  Revector them
      onto the isolated export header for faster compile times.
      
      Nothing to see here but a whole lot of instances of:
      
        -#include <linux/module.h>
        +#include <linux/export.h>
      
      This commit is only changing the kernel dir; next targets
      will probably be mm, fs, the arch dirs, etc.
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      9984de1a
  11. 20 Jul, 2011 1 commit
  12. 27 May, 2011 1 commit
    • Daniel Lezcano's avatar
      cgroup: remove the ns_cgroup · a77aea92
      Daniel Lezcano authored
      The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
      leads to some problems:
      
        * cgroup creation is out-of-control
        * cgroup name can conflict when pids are looping
        * it is not possible to have a single process handling a lot of
          namespaces without falling in a exponential creation time
        * we may want to create a namespace without creating a cgroup
      
        The ns_cgroup was replaced by a compatibility flag 'clone_children',
        where a newly created cgroup will copy the parent cgroup values.
        The userspace has to manually create a cgroup and add a task to
        the 'tasks' file.
      
      This patch removes the ns_cgroup as suggested in the following thread:
      
      https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html
      
      The 'cgroup_clone' function is removed because it is no longer used.
      
      This is a userspace-visible change.  Commit 45531757 ("cgroup: notify
      ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
      printk warning us...
      a77aea92
  13. 10 May, 2011 1 commit
    • Eric W. Biederman's avatar
      ns: Introduce the setns syscall · 0663c6f8
      Eric W. Biederman authored
      
      With the networking stack today there is demand to handle
      multiple network stacks at a time.  Not in the context
      of containers but in the context of people doing interesting
      things with routing.
      
      There is also demand in the context of containers to have
      an efficient way to execute some code in the container itself.
      If nothing else it is very useful ad a debugging technique.
      
      Both problems can be solved by starting some form of login
      daemon in the namespaces people want access to, or you
      can play games by ptracing a process and getting the
      traced process to do things you want it to do. However
      it turns out that a login daemon or a ptrace puppet
      controller are more code, they are more prone to
      failure, and generally they are less efficient than
      simply changing the namespace of a process to a
      specified one.
      
      Pieces of this puzzle can also be solved by instead of
      coming up with a general purpose system call coming up
      with targed system calls perhaps socketat that solve
      a subset of the larger problem.  Overall that appears
      to be more work for less reward.
      
      int setns(int fd, int nstype);
      
      The fd argument is a file descriptor referring to a proc
      file of the namespace you want to switch the process to.
      
      In the setns system call the nstype is 0 or specifies
      an clone flag of the namespace you intend to change
      to prevent changing a namespace unintentionally.
      
      v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
      v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
      v4: Moved wiring up of the system call to another patch
      v5: Cleaned up the system call arguments
          - Changed the order.
          - Modified nstype to take the standard clone flags.
      v6: Added missing error handling as pointed out by Matt Helsley <matthltc@us.ibm.com>
      Acked-by: default avatarDaniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      0663c6f8
  14. 24 Mar, 2011 4 commits
  15. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        bloc...
      5a0e3ad6
  16. 12 Mar, 2010 1 commit
  17. 18 Jun, 2009 1 commit
  18. 24 Nov, 2008 1 commit
    • Serge Hallyn's avatar
      User namespaces: set of cleanups (v2) · 18b6e041
      Serge Hallyn authored
      
      The user_ns is moved from nsproxy to user_struct, so that a struct
      cred by itself is sufficient to determine access (which it otherwise
      would not be).  Corresponding ecryptfs fixes (by David Howells) are
      here as well.
      
      Fix refcounting.  The following rules now apply:
              1. The task pins the user struct.
              2. The user struct pins its user namespace.
              3. The user namespace pins the struct user which created it.
      
      User namespaces are cloned during copy_creds().  Unsharing a new user_ns
      is no longer possible.  (We could re-add that, but it'll cause code
      duplication and doesn't seem useful if PAM doesn't need to clone user
      namespaces).
      
      When a user namespace is created, its first user (uid 0) gets empty
      keyrings and a clean group_info.
      
      This incorporates a previous patch by David Howells.  Here
      is his original patch description:
      
      >I suggest adding the attached incremental patch.  It makes the following
      >changes:
      >
      > (1) Provides a current_user_ns() macro to wrap accesses to current's user
      >     namespace.
      >
      > (2) Fixes eCryptFS.
      >
      > (3) Renames create_new_userns() to create_user_ns() to be more consistent
      >     with the other associated functions and because the 'new' in the name is
      >     superfluous.
      >
      > (4) Moves the argument and permission checks made for CLONE_NEWUSER to the
      >     beginning of do_fork() so that they're done prior to making any attempts
      >     at allocation.
      >
      > (5) Calls create_user_ns() after prepare_creds(), and gives it the new creds
      >     to fill in rather than have it return the new root user.  I don't imagine
      >     the new root user being used for anything other than filling in a cred
      >     struct.
      >
      >     This also permits me to get rid of a get_uid() and a free_uid(), as the
      >     reference the creds were holding on the old user_struct can just be
      >     transferred to the new namespace's creator pointer.
      >
      > (6) Makes create_user_ns() reset the UIDs and GIDs of the creds under
      >     preparation rather than doing it in copy_creds().
      >
      >David
      
      >Signed-off-by: David Howells <dhowells@redhat.com>
      
      Changelog:
      	Oct 20: integrate dhowells comments
      		1. leave thread_keyring alone
      		2. use current_user_ns() in set_user()
      Signed-off-by: default avatarSerge Hallyn <serue@us.ibm.com>
      18b6e041
  19. 23 Aug, 2008 1 commit
  20. 25 Jul, 2008 1 commit
    • Serge E. Hallyn's avatar
      cgroup_clone: use pid of newly created task for new cgroup · e885dcde
      Serge E. Hallyn authored
      
      cgroup_clone creates a new cgroup with the pid of the task.  This works
      correctly for unshare, but for clone cgroup_clone is called from
      copy_namespaces inside copy_process, which happens before the new pid is
      created.  As a result, the new cgroup was created with current's pid.
      This patch:
      
      	1. Moves the call inside copy_process to after the new pid
      	   is created
      	2. Passes the struct pid into ns_cgroup_clone (as it is not
      	   yet attached to the task)
      	3. Passes a name from ns_cgroup_clone() into cgroup_clone()
      	   so as to keep cgroup_clone() itself simpler
      	4. Uses pid_vnr() to get the process id value, so that the
      	   pid used to name the new cgroup is always the pid as it
      	   would be known to the task which did the cloning or
      	   unsharing.  I think that is the most intuitive thing to
      	   do.  This way, task t1 does clone(CLONE_NEWPID) to get
      	   t2, which does clone(CLONE_NEWPID) to get t3, then the
      	   cgroup for t3 will be named for the pid by which t2 knows
      	   t3.
      
      (Thanks to Dan Smith for finding the main bug)
      
      Changelog:
      	June 11: Incorporate Paul Menage's feedback:  don't pass
      	         NULL to ns_cgroup_clone from unshare, and reduce
      		 patch size by using 'nodename' in cgroup_clone.
      	June 10: Original version
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarSerge Hallyn <serge@us.ibm.com>
      Acked-by: default avatarPaul Menage <menage@google.com>
      Tested-by: default avatarDan Smith <danms@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e885dcde
  21. 29 Apr, 2008 1 commit
    • Serge E. Hallyn's avatar
      ipc: sysvsem: refuse clone(CLONE_SYSVSEM|CLONE_NEWIPC) · 02fdb36a
      Serge E. Hallyn authored
      
      CLONE_NEWIPC|CLONE_SYSVSEM interaction isn't handled properly.  This can cause
      a kernel memory corruption.  CLONE_NEWIPC must detach from the existing undo
      lists.
      
      Fix, part 3: refuse clone(CLONE_SYSVSEM|CLONE_NEWIPC).
      
      With unshare, specifying CLONE_SYSVSEM means unshare the sysvsem.  So it seems
      reasonable that CLONE_NEWIPC without CLONE_SYSVSEM would just imply
      CLONE_SYSVSEM.
      
      However with clone, specifying CLONE_SYSVSEM means *share* the sysvsem.  So
      calling clone(CLONE_SYSVSEM|CLONE_NEWIPC) is explicitly asking for something
      we can't allow.  So return -EINVAL in that case.
      
      [akpm@linux-foundation.org: cleanups]
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Pierre Peiffer <peifferp@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      02fdb36a
  22. 08 Feb, 2008 1 commit
    • Pavel Emelyanov's avatar
      namespaces: move the IPC namespace under IPC_NS option · ae5e1b22
      Pavel Emelyanov authored
      
      Currently the IPC namespace management code is spread over the ipc/*.c files.
      I moved this code into ipc/namespace.c file which is compiled out when needed.
      
      The linux/ipc_namespace.h file is used to store the prototypes of the
      functions in namespace.c and the stubs for NAMESPACES=n case.  This is done
      so, because the stub for copy_ipc_namespace requires the knowledge of the
      CLONE_NEWIPC flag, which is in sched.h.  But the linux/ipc.h file itself in
      included into many many .c files via the sys.h->sem.h sequence so adding the
      sched.h into it will make all these .c depend on sched.h which is not that
      good.  On the other hand the knowledge about the namespaces stuff is required
      in 4 .c files only.
      
      Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
      msg.c and shm.c files.  It turned out that moving these functions into
      namespaces.c is not that easy because they use many other calls and macros
      from the original file.  Moving them would make this patch complicated.  On
      the other hand all these functions can be consolidated, so I will send a
      separate patch doing this a bit later.
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Acked-by: default avatarSerge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ae5e1b22
  23. 19 Oct, 2007 4 commits
    • Pavel Emelyanov's avatar
      pid namespaces: allow cloning of new namespace · 30e49c26
      Pavel Emelyanov authored
      
      When clone() is invoked with CLONE_NEWPID, create a new pid namespace and then
      create a new struct pid for the new process.  Allocate pid_t's for the new
      process in the new pid namespace and all ancestor pid namespaces.  Make the
      newly cloned process the session and process group leader.
      
      Since the active pid namespace is special and expected to be the first entry
      in pid->upid_list, preserve the order of pid namespaces.
      
      The size of 'struct pid' is dependent on the the number of pid namespaces the
      process exists in, so we use multiple pid-caches'.  Only one pid cache is
      created during system startup and this used by processes that exist only in
      init_pid_ns.
      
      When a process clones its pid namespace, we create additional pid caches as
      necessary and use the pid cache to allocate 'struct pids' for that depth.
      
      Note, that with this patch the newly created namespace won't work, since the
      rest of the kernel still uses global pids, but this is to be fixed soon.  Init
      pid namespace still works.
      
      [oleg@tv-sign.ru: merge fix]
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: default avatarSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30e49c26
    • Pavel Emelyanov's avatar
      Make access to task's nsproxy lighter · cf7b708c
      Pavel Emelyanov authored
      
      When someone wants to deal with some other taks's namespaces it has to lock
      the task and then to get the desired namespace if the one exists.  This is
      slow on read-only paths and may be impossible in some cases.
      
      E.g.  Oleg recently noticed a race between unshare() and the (sent for
      review in cgroups) pid namespaces - when the task notifies the parent it
      has to know the parent's namespace, but taking the task_lock() is
      impossible there - the code is under write locked tasklist lock.
      
      On the other hand switching the namespace on task (daemonize) and releasing
      the namespace (after the last task exit) is rather rare operation and we
      can sacrifice its speed to solve the issues above.
      
      The access to other task namespaces is proposed to be performed
      like this:
      
           rcu_read_lock();
           nsproxy = task_nsproxy(tsk);
           if (nsproxy != NULL) {
                   / *
                     * work with the namespaces here
                     * e.g. get the reference on one of them
                     * /
           } / *
               * NULL task_nsproxy() means that this task is
               * almost dead (zombie)
               * /
           rcu_read_unlock();
      
      This patch has passed the review by Eric and Oleg :) and,
      of course, tested.
      
      [clg@fr.ibm.com: fix unshare()]
      [ebiederm@xmission.com: Update get_net_ns_by_pid]
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cf7b708c
    • Sukadev Bhattiprolu's avatar
      pid namespaces: define and use task_active_pid_ns() wrapper · 2894d650
      Sukadev Bhattiprolu authored
      
      With multiple pid namespaces, a process is known by some pid_t in every
      ancestor pid namespace.  Every time the process forks, the child process also
      gets a pid_t in every ancestor pid namespace.
      
      While a process is visible in >=1 pid namespaces, it can see pid_t's in only
      one pid namespace.  We call this pid namespace it's "active pid namespace",
      and it is always the youngest pid namespace in which the process is known.
      
      This patch defines and uses a wrapper to find the active pid namespace of a
      process.  The implementation of the wrapper will be changed in when support
      for multiple pid namespaces are added.
      
      Changelog:
      	2.6.22-rc4-mm2-pidns1:
      	- [Pavel Emelianov, Alexey Dobriyan] Back out the change to use
      	  task_active_pid_ns() in child_reaper() since task->nsproxy
      	  can be NULL during task exit (so child_reaper() continues to
      	  use init_pid_ns).
      
      	  to implement child_reaper() since init_pid_ns.child_reaper to
      	  implement child_reaper() since tsk->nsproxy can be NULL during exit.
      
      	2.6.21-rc6-mm1:
      	- Rename task_pid_ns() to task_active_pid_ns() to reflect that a
      	  process can have multiple pid namespaces.
      Signed-off-by: default avatarSukadev Bhattiprolu <sukadev@us.ibm.com>
      Acked-by: default avatarPavel Emelianov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Herbert Poetzel <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2894d650
    • Serge E. Hallyn's avatar
      cgroups: implement namespace tracking subsystem · 858d72ea
      Serge E. Hallyn authored
      
      When a task enters a new namespace via a clone() or unshare(), a new cgroup
      is created and the task moves into it.
      
      This version names cgroups which are automatically created using
      cgroup_clone() as "node_<pid>" where pid is the pid of the unsharing or
      cloned process.  (Thanks Pavel for the idea) This is safe because if the
      process unshares again, it will create
      
      	/cgroups/(...)/node_<pid>/node_<pid>
      
      The only possibilities (AFAICT) for a -EEXIST on unshare are
      
      	1. pid wraparound
      	2. a process fails an unshare, then tries again.
      
      Case 1 is unlikely enough that I ignore it (at least for now).  In case 2, the
      node_<pid> will be empty and can be rmdir'ed to make the subsequent unshare()
      succeed.
      
      Changelog:
      	Name cloned cgroups as "node_<pid>".
      
      [clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig]
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      858d72ea
  24. 17 Oct, 2007 1 commit