1. 01 Aug, 2012 1 commit
  2. 31 Jul, 2012 2 commits
    • Steven Rostedt's avatar
      sysctl: suppress kmemleak messages · fd4b616b
      Steven Rostedt authored
      
      register_sysctl_table() is a strange function, as it makes internal
      allocations (a header) to register a sysctl_table.  This header is a
      handle to the table that is created, and can be used to unregister the
      table.  But if the table is permanent and never unregistered, the header
      acts the same as a static variable.
      
      Unfortunately, this allocation of memory that is never expected to be
      freed fools kmemleak in thinking that we have leaked memory.  For those
      sysctl tables that are never unregistered, and have no pointer referencing
      them, kmemleak will think that these are memory leaks:
      
      unreferenced object 0xffff880079fb9d40 (size 192):
        comm "swapper/0", pid 0, jiffies 4294667316 (age 12614.152s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff8146b590>] kmemleak_alloc+0x73/0x98
          [<ffffffff8110a935>] kmemleak_alloc_recursive.constprop.42+0x16/0x18
          [<ffffffff8110b852>] __kmalloc+0x107/0x153
          [<ffffffff8116fa72>] kzalloc.constprop.8+0xe/0x10
          [<ffffffff811703c9>] __register_sysctl_paths+0xe1/0x160
          [<ffffffff81170463>] register_sysctl_paths+0x1b/0x1d
          [<ffffffff8117047d>] register_sysctl_table+0x18/0x1a
          [<ffffffff81afb0a1>] sysctl_init+0x10/0x14
          [<ffffffff81b05a6f>] proc_sys_init+0x2f/0x31
          [<ffffffff81b0584c>] proc_root_init+0xa5/0xa7
          [<ffffffff81ae5b7e>] start_kernel+0x3d0/0x40a
          [<ffffffff81ae52a7>] x86_64_start_reservations+0xae/0xb2
          [<ffffffff81ae53ad>] x86_64_start_kernel+0x102/0x111
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      The sysctl_base_table used by sysctl itself is one such instance that
      registers the table to never be unregistered.
      
      Use kmemleak_not_leak() to suppress the kmemleak false positive.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fd4b616b
    • Kees Cook's avatar
      coredump: warn about unsafe suid_dumpable / core_pattern combo · 54b50199
      Kees Cook authored
      
      When suid_dumpable=2, detect unsafe core_pattern settings and warn when
      they are seen.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Suggested-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      54b50199
  3. 05 Apr, 2012 1 commit
  4. 29 Mar, 2012 1 commit
  5. 28 Mar, 2012 2 commits
  6. 13 Feb, 2012 1 commit
  7. 25 Jan, 2012 3 commits
  8. 05 Dec, 2011 1 commit
  9. 01 Nov, 2011 1 commit
    • Dan Ballard's avatar
      kernel/sysctl.c: add cap_last_cap to /proc/sys/kernel · 73efc039
      Dan Ballard authored
      
      Userspace needs to know the highest valid capability of the running
      kernel, which right now cannot reliably be retrieved from the header files
      only.  The fact that this value cannot be determined properly right now
      creates various problems for libraries compiled on newer header files
      which are run on older kernels.  They assume capabilities are available
      which actually aren't.  libcap-ng is one example.  And we ran into the
      same problem with systemd too.
      
      Now the capability is exported in /proc/sys/kernel/cap_last_cap.
      
      [akpm@linux-foundation.org: make cap_last_cap const, per Ulrich]
      Signed-off-by: default avatarDan Ballard <dan@mindstab.net>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Ulrich Drepper <drepper@akkadia.org>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      73efc039
  10. 30 Oct, 2011 1 commit
  11. 14 Aug, 2011 1 commit
  12. 20 Jul, 2011 1 commit
  13. 04 Jun, 2011 1 commit
  14. 23 May, 2011 1 commit
  15. 20 May, 2011 1 commit
    • Chris Metcalf's avatar
      arch/tile: support signal "exception-trace" hook · 571d76ac
      Chris Metcalf authored
      
      This change adds support for /proc/sys/debug/exception-trace to tile.
      Like x86 and sparc, by default it is set to "1", generating a one-line
      printk whenever a user process crashes.  By setting it to "2", we get
      a much more complete userspace diagnostic at crash time, including
      a user-space backtrace, register dump, and memory dump around the
      address of the crash.
      
      Some vestiges of the Tilera-internal version of this support are
      removed with this patch (the show_crashinfo variable and the
      arch_coredump_signal function).  We retain a "crashinfo" boot parameter
      which allows you to set the boot-time value of exception-trace.
      Signed-off-by: default avatarChris Metcalf <cmetcalf@tilera.com>
      571d76ac
  16. 04 Apr, 2011 1 commit
    • Eric Paris's avatar
      capabilites: allow the application of capability limits to usermode helpers · 17f60a7d
      Eric Paris authored
      
      There is no way to limit the capabilities of usermodehelpers. This problem
      reared its head recently when someone complained that any user with
      cap_net_admin was able to load arbitrary kernel modules, even though the user
      didn't have cap_sys_module.  The reason is because the actual load is done by
      a usermode helper and those always have the full cap set.  This patch addes new
      sysctls which allow us to bound the permissions of usermode helpers.
      
      /proc/sys/kernel/usermodehelper/bset
      /proc/sys/kernel/usermodehelper/inheritable
      
      You must have CAP_SYS_MODULE  and CAP_SETPCAP to change these (changes are
      &= ONLY).  When the kernel launches a usermodehelper it will do so with these
      as the bset and pI.
      
      -v2:	make globals static
      	create spinlock to protect globals
      
      -v3:	require both CAP_SETPCAP and CAP_SYS_MODULE
      -v4:	fix the typo s/CAP_SET_PCAP/CAP_SETPCAP/ because I didn't commit
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      No-objection-from: Serge E. Hallyn <serge.hallyn@canonical.com>
      Acked-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSerge E. Hallyn <serge.hallyn@canonical.com>
      Acked-by: default avatarAndrew G. Morgan <morgan@kernel.org>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      17f60a7d
  17. 24 Mar, 2011 2 commits
  18. 08 Mar, 2011 1 commit
    • Al Viro's avatar
      unfuck proc_sysctl ->d_compare() · dfef6dcd
      Al Viro authored
      
      a) struct inode is not going to be freed under ->d_compare();
      however, the thing PROC_I(inode)->sysctl points to just might.
      Fortunately, it's enough to make freeing that sucker delayed,
      provided that we don't step on its ->unregistering, clear
      the pointer to it in PROC_I(inode) before dropping the reference
      and check if it's NULL in ->d_compare().
      
      b) I'm not sure that we *can* walk into NULL inode here (we recheck
      dentry->seq between verifying that it's still hashed / fetching
      dentry->d_inode and passing it to ->d_compare() and there's no
      negative hashed dentries in /proc/sys/*), but if we can walk into
      that, we really should not have ->d_compare() return 0 on it!
      Said that, I really suspect that this check can be simply killed.
      Nick?
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      dfef6dcd
  19. 23 Feb, 2011 1 commit
  20. 16 Feb, 2011 1 commit
  21. 03 Feb, 2011 1 commit
    • Rik van Riel's avatar
      sched: Use a buddy to implement yield_task_fair() · ac53db59
      Rik van Riel authored
      
      Use the buddy mechanism to implement yield_task_fair.  This
      allows us to skip onto the next highest priority se at every
      level in the CFS tree, unless doing so would introduce gross
      unfairness in CPU time distribution.
      
      We order the buddy selection in pick_next_entity to check
      yield first, then last, then next.  We need next to be able
      to override yield, because it is possible for the "next" and
      "yield" task to be different processen in the same sub-tree
      of the CFS tree.  When they are, we need to go into that
      sub-tree regardless of the "yield" hint, and pick the correct
      entity once we get to the right level.
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110201095103.3a79e92a@annuminas.surriel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      ac53db59
  22. 01 Feb, 2011 1 commit
  23. 24 Jan, 2011 1 commit
  24. 13 Jan, 2011 3 commits
  25. 09 Dec, 2010 1 commit
  26. 30 Nov, 2010 1 commit
    • Mike Galbraith's avatar
      sched: Add 'autogroup' scheduling feature: automated per session task groups · 5091faa4
      Mike Galbraith authored
      
      A recurring complaint from CFS users is that parallel kbuild has
      a negative impact on desktop interactivity.  This patch
      implements an idea from Linus, to automatically create task
      groups.  Currently, only per session autogroups are implemented,
      but the patch leaves the way open for enhancement.
      
      Implementation: each task's signal struct contains an inherited
      pointer to a refcounted autogroup struct containing a task group
      pointer, the default for all tasks pointing to the
      init_task_group.  When a task calls setsid(), a new task group
      is created, the process is moved into the new task group, and a
      reference to the preveious task group is dropped.  Child
      processes inherit this task group thereafter, and increase it's
      refcount.  When the last thread of a process exits, the
      process's reference is dropped, such that when the last process
      referencing an autogroup exits, the autogroup is destroyed.
      
      At runqueue selection time, IFF a task has no cgroup assignment,
      its current autogroup is used.
      
      Autogroup bandwidth is controllable via setting it's nice level
      through the proc filesystem:
      
        cat /proc/<pid>/autogroup
      
      Displays the task's group and the group's nice level.
      
        echo <nice level> > /proc/<pid>/autogroup
      
      Sets the task group's shares to the weight of nice <level> task.
      Setting nice level is rate limited for !admin users due to the
      abuse risk of task group locking.
      
      The feature is enabled from boot by default if
      CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
      the boot option noautogroup, and can also be turned on/off on
      the fly via:
      
        echo [01] > /proc/sys/kernel/sched_autogroup_enabled
      
      ... which will automatically move tasks to/from the root task group.
      Signed-off-by: default avatarMike Galbraith <efault@gmx.de>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      5091faa4
  27. 18 Nov, 2010 3 commits
    • Paul Turner's avatar
      sched: Add sysctl_sched_shares_window · a7a4f8a7
      Paul Turner authored
      
      Introduce a new sysctl for the shares window and disambiguate it from
      sched_time_avg.
      
      A 10ms window appears to be a good compromise between accuracy and performance.
      Signed-off-by: default avatarPaul Turner <pjt@google.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234938.112173964@google.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      a7a4f8a7
    • Peter Zijlstra's avatar
      sched: Rewrite tg_shares_up) · 2069dd75
      Peter Zijlstra authored
      
      By tracking a per-cpu load-avg for each cfs_rq and folding it into a
      global task_group load on each tick we can rework tg_shares_up to be
      strictly per-cpu.
      
      This should improve cpu-cgroup performance for smp systems
      significantly.
      
      [ Paul: changed to use queueing cfs_rq + bug fixes ]
      Signed-off-by: default avatarPaul Turner <pjt@google.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.580480400@google.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      2069dd75
    • Don Zickus's avatar
      x86, nmi_watchdog: Remove the old nmi_watchdog · 5f2b0ba4
      Don Zickus authored
      
      Now that we have a new nmi_watchdog that is more generic and
      sits on top of the perf subsystem, we really do not need the old
      nmi_watchdog any more.
      
      In addition, the old nmi_watchdog doesn't really work if you are
      using the default clocksource, hpet.  The old nmi_watchdog code
      relied on local apic interrupts to determine if the cpu is still
      alive.  With hpet as the clocksource, these interrupts don't
      increment any more and the old nmi_watchdog triggers false
      postives.
      
      This piece removes the old nmi_watchdog code and stubs out any
      variables and functions calls.  The stubs are the same ones used
      by the new nmi_watchdog code, so it should be well tested.
      Signed-off-by: default avatarDon Zickus <dzickus@redhat.com>
      Cc: fweisbec@gmail.com
      Cc: gorcunov@openvz.org
      LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      5f2b0ba4
  28. 16 Nov, 2010 1 commit
  29. 12 Nov, 2010 1 commit
  30. 26 Oct, 2010 2 commits
    • Namhyung Kim's avatar
      printk: declare printk_ratelimit_state in ratelimit.h · f5d87d85
      Namhyung Kim authored
      
      Adding declaration of printk_ratelimit_state in ratelimit.h removes
      potential build breakage and following sparse warning:
      
       kernel/printk.c:1426:1: warning: symbol 'printk_ratelimit_state' was not declared. Should it be static?
      
      [akpm@linux-foundation.org: remove unneeded ifdef]
      Signed-off-by: default avatarNamhyung Kim <namhyung@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f5d87d85
    • Eric Dumazet's avatar
      fs: allow for more than 2^31 files · 518de9b3
      Eric Dumazet authored
      Robin Holt tried to boot a 16TB system and found af_unix was overflowing
      a 32bit value :
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
      This causes 2 * get_max_files() to integer overflow.
      
      </quote>
      
      Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
      integers, and change af_unix to use ...
      518de9b3