1. 27 Aug, 2015 1 commit
    • Rafael J. Wysocki's avatar
      ACPI / init: Switch over platform to the ACPI mode later · f9f58191
      Rafael J. Wysocki authored
      [ Upstream commit cdbbeb69 ]
      
      commit b064a8fa upstream.
      
      Commit 73f7d1ca "ACPI / init: Run acpi_early_init() before
      timekeeping_init()" moved the ACPI subsystem initialization,
      including the ACPI mode enabling, to an earlier point in the
      initialization sequence, to allow the timekeeping subsystem
      use ACPI early.  Unfortunately, that resulted in boot regressions
      on some systems and the early ACPI initialization was moved toward
      its original position in the kernel initialization code by commit
      c4e1acbb "ACPI / init: Invoke early ACPI initialization later".
      
      However, that turns out to be insufficient, as boot is still broken
      on the Tyan S8812 mainboard.
      
      To fix that issue, split the ACPI early initialization code into
      two pieces so the majority of it still located in acpi_early_init()
      and the part switching over the platform into the ACPI mode goes into
      a new function, acpi_subsyst...
      f9f58191
  2. 11 Nov, 2014 1 commit
    • Daniel Thompson's avatar
      param: fix crash on bad kernel arguments · 3438cf54
      Daniel Thompson authored
      Currently if the user passes an invalid value on the kernel command line
      then the kernel will crash during argument parsing. On most systems this
      is very hard to debug because the console hasn't been initialized yet.
      
      This is a regression due to commit 51e158c1
      
       ("param: hand arguments
      after -- straight to init") which, in response to the systemd debug
      controversy, made it possible to explicitly pass arguments to init. To
      achieve this parse_args() was extended from simply returning an error
      code to returning a pointer. Regretably the new init args logic does not
      perform a proper validity check on the pointer resulting in a crash.
      
      This patch fixes the validity check. Should the check fail then no arguments
      will be passed to init. This is reasonable and matches how the kernel treats
      its own arguments (i.e. no error recovery).
      Signed-off-by: default avatarDaniel Thompson <daniel.thompson@linaro.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      3438cf54
  3. 14 Oct, 2014 1 commit
  4. 19 Sep, 2014 1 commit
    • Aaron Tomlin's avatar
      init/main.c: Give init_task a canary · d4311ff1
      Aaron Tomlin authored
      Tasks get their end of stack set to STACK_END_MAGIC with the
      aim to catch stack overruns. Currently this feature does not
      apply to init_task. This patch removes this restriction.
      
      Note that a similar patch was posted by Prarit Bhargava
      some time ago but was never merged:
      
        http://marc.info/?l=linux-kernel&m=127144305403241&w=2
      
      Signed-off-by: default avatarAaron Tomlin <atomlin@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dzickus@redhat.com
      Cc: bmr@redhat.com
      Cc: jcastillo@redhat.com
      Cc: jgh@redhat.com
      Cc: minchan@kernel.org
      Cc: tglx@linutronix.de
      Cc: hannes@cmpxchg.org
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Daeseok Youn <daeseok.youn@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d4311ff1
  5. 16 Sep, 2014 1 commit
    • Paul E. McKenney's avatar
      rcu: Fix attempt to avoid unsolicited offloading of callbacks · f4579fc5
      Paul E. McKenney authored
      Commit b58cc46c
      
       (rcu: Don't offload callbacks unless specifically
      requested) failed to adjust the callback lists of the CPUs that are
      known to be no-CBs CPUs only because they are also nohz_full= CPUs.
      This failure can result in callbacks that are posted during early boot
      getting stranded on nxtlist for CPUs whose no-CBs property becomes
      apparent late, and there can also be spurious warnings about offline
      CPUs posting callbacks.
      
      This commit fixes these problems by adding an early-boot rcu_init_nohz()
      that properly initializes the no-CBs CPUs.
      
      Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with
      CONFIG_RCU_NOCB_CPU=n do not exhibit this bug.  Neither do kernels
      booted without the nohz_full= boot parameter.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarPranith Kumar <bobby.prani@gmail.com>
      Tested-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      f4579fc5
  6. 13 Sep, 2014 1 commit
    • Frederic Weisbecker's avatar
      nohz: Move nohz full init call to tick init · a80e49e2
      Frederic Weisbecker authored
      
      This way we unbloat a bit main.c and more importantly we initialize
      nohz full after init_IRQ(). This dependency will be needed in further
      patches because nohz full needs irq work to raise its own IRQ.
      Information about the support for this ability on ARM64 is obtained on
      init_IRQ() which initialize the pointer to __smp_call_function.
      
      Since tick_init() is called right after init_IRQ(), this is a good place
      to call tick_nohz_init() and prepare for that dependency.
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      a80e49e2
  7. 08 Aug, 2014 1 commit
  8. 04 Jun, 2014 6 commits
  9. 05 May, 2014 1 commit
  10. 04 May, 2014 1 commit
  11. 30 Apr, 2014 1 commit
    • H. Peter Anvin's avatar
      x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack · 3891a04a
      H. Peter Anvin authored
      The IRET instruction, when returning to a 16-bit segment, only
      restores the bottom 16 bits of the user space stack pointer.  This
      causes some 16-bit software to break, but it also leaks kernel state
      to user space.  We have a software workaround for that ("espfix") for
      the 32-bit kernel, but it relies on a nonzero stack segment base which
      is not available in 64-bit mode.
      
      In checkin:
      
          b3b42ac2
      
       x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
      
      we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
      the logic that 16-bit support is crippled on 64-bit kernels anyway (no
      V86 support), but it turns out that people are doing stuff like
      running old Win16 binaries under Wine and expect it to work.
      
      This works around this by creating percpu "ministacks", each of which
      is mapped 2^16 times 64K apart.  When we detect that the return SS is
      on the LDT, we copy the IRET frame to the ministack and use the
      relevant alias to return to userspace.  The ministacks are mapped
      readonly, so if IRET faults we promote #GP to #DF which is an IST
      vector and thus has its own stack; we then do the fixup in the #DF
      handler.
      
      (Making #GP an IST exception would make the msr_safe functions unsafe
      in NMI/MC context, and quite possibly have other effects.)
      
      Special thanks to:
      
      - Andy Lutomirski, for the suggestion of using very small stack slots
        and copy (as opposed to map) the IRET frame there, and for the
        suggestion to mark them readonly and let the fault promote to #DF.
      - Konrad Wilk for paravirt fixup and testing.
      - Borislav Petkov for testing help and useful comments.
      Reported-by: default avatarBrian Gerst <brgerst@gmail.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Lutomriski <amluto@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dirk Hohndel <dirk@hohndel.org>
      Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
      Cc: comex <comexk@gmail.com>
      Cc: Alexander van Heukelum <heukelum@fastmail.fm>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: <stable@vger.kernel.org> # consider after upstream merge
      3891a04a
  12. 28 Apr, 2014 1 commit
    • Rusty Russell's avatar
      param: hand arguments after -- straight to init · 51e158c1
      Rusty Russell authored
      
      The kernel passes any args it doesn't need through to init, except it
      assumes anything containing '.' belongs to the kernel (for a module).
      This change means all users can clearly distinguish which arguments
      are for init.
      
      For example, the kernel uses debug ("dee-bug") to mean log everything to
      the console, where systemd uses the debug from the Scandinavian "day-boog"
      meaning "fail to boot".  If a future versions uses argv[] instead of
      reading /proc/cmdline, this confusion will be avoided.
      
      eg: test 'FOO="this is --foo"' -- 'systemd.debug="true true true"'
      
      Gives:
      argv[0] = '/debug-init'
      argv[1] = 'test'
      argv[2] = 'systemd.debug=true true true'
      envp[0] = 'HOME=/'
      envp[1] = 'TERM=linux'
      envp[2] = 'FOO=this is --foo'
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      51e158c1
  13. 12 Mar, 2014 1 commit
  14. 05 Feb, 2014 1 commit
    • Linus Torvalds's avatar
      execve: use 'struct filename *' for executable name passing · c4ad8f98
      Linus Torvalds authored
      
      This changes 'do_execve()' to get the executable name as a 'struct
      filename', and to free it when it is done.  This is what the normal
      users want, and it simplifies and streamlines their error handling.
      
      The controlled lifetime of the executable name also fixes a
      use-after-free problem with the trace_sched_process_exec tracepoint: the
      lifetime of the passed-in string for kernel users was not at all
      obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
      the pathname allocation lifetime with the execve() having finished,
      which in turn meant that the trace point that happened after
      mm_release() of the old process VM ended up using already free'd memory.
      
      To solve the kernel string lifetime issue, this simply introduces
      "getname_kernel()" that works like the normal user-space getname()
      function, except with the source coming from kernel memory.
      
      As Oleg points out, this also means that we could drop the tcomm[] array
      from 'struct linux_binprm', since the pathname lifetime now covers
      setup_new_exec().  That would be a separate cleanup.
      Reported-by: default avatarIgor Zhbanov <i.zhbanov@samsung.com>
      Tested-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c4ad8f98
  15. 28 Jan, 2014 1 commit
  16. 24 Jan, 2014 2 commits
  17. 22 Jan, 2014 2 commits
    • Santosh Shilimkar's avatar
      init/main.c: use memblock apis for early memory allocations · 098b081b
      Santosh Shilimkar authored
      
      Switch to memblock interfaces for early memory allocator instead of
      bootmem allocator.  No functional change in beahvior than what it is in
      current code from bootmem users points of view.
      
      Archs already converted to NO_BOOTMEM now directly use memblock
      interfaces instead of bootmem wrappers build on top of memblock.  And
      the archs which still uses bootmem, these new apis just fall back to
      exiting bootmem APIs.
      Signed-off-by: default avatarSantosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      098b081b
    • Kirill A. Shutemov's avatar
      mm: create a separate slab for page->ptl allocation · b35f1819
      Kirill A. Shutemov authored
      
      If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
      is 72 bytes.  For page->ptl they will be allocated from kmalloc-96 slab,
      so we loose 24 on each.  An average system can easily allocate few tens
      thousands of page->ptl and overhead is significant.
      
      Let's create a separate slab for page->ptl allocation to solve this.
      
      To make sure that it really works this time, some numbers from my test
      machine (just booted, no load):
      
      Before:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        kmalloc-96         31987  32190    128   30    1 : tunables  120   60    8 : slabdata   1073   1073     92
      After:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        page->ptl          27516  28143     72   53    1 : tunables  120   60    8 : slabdata    531    531      9
        kmalloc-96          3853   5280    128   30    1 : tunables  120   60    8 : slabdata    176    176      0
      
      Note that the patch is useful not only for debug case, but also for
      PREEMPT_RT, where spinlock_t is always bloated.
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b35f1819
  18. 16 Jan, 2014 1 commit
  19. 20 Nov, 2013 1 commit
  20. 15 Nov, 2013 1 commit
  21. 13 Nov, 2013 2 commits
  22. 31 Oct, 2013 1 commit
    • Krzysztof Mazur's avatar
      init: fix in-place parameter modification regression · 08746a65
      Krzysztof Mazur authored
      Before commit 026cee00
      
      
      ("params: <level>_initcall-like kernel parameters") the __setup
      parameter parsing code could modify parameter in the
      static_command_line buffer and such modifications were kept. After
      that commit such modifications are destroyed during per-initcall level
      parameter parsing because the same static_command_line buffer is used
      and only parameters for appropriate initcall level are parsed.
      
      That change broke at least parsing "ubd" parameter in the ubd driver
      when the COW file is used.
      
      Now the separate buffer is used for per-initcall parameter parsing.
      Signed-off-by: default avatarKrzysztof Mazur <krzysiek@podlesie.net>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      08746a65
  23. 19 Oct, 2013 1 commit
    • Hannes Frederic Sowa's avatar
      static_key: WARN on usage before jump_label_init was called · c4b2c0c5
      Hannes Frederic Sowa authored
      
      Usage of the static key primitives to toggle a branch must not be used
      before jump_label_init() is called from init/main.c. jump_label_init
      reorganizes and wires up the jump_entries so usage before that could
      have unforeseen consequences.
      
      Following primitives are now checked for correct use:
      * static_key_slow_inc
      * static_key_slow_dec
      * static_key_slow_dec_deferred
      * jump_label_rate_limit
      
      The x86 architecture already checks this by testing if the default_nop
      was already replaced with an optimal nop or with a branch instruction. It
      will panic then. Other architectures don't check for this.
      
      Because we need to relax this check for the x86 arch to allow code to
      transition from default_nop to the enabled state and other architectures
      did not check for this at all this patch introduces checking on the
      static_key primitives in a non-arch dependent manner.
      
      All checked functions are considered slow-path so the additional check
      does no harm to performance.
      
      The warnings are best observed with earlyprintk.
      
      Based on a patch from Andi Kleen.
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c4b2c0c5
  24. 25 Sep, 2013 1 commit
  25. 23 Sep, 2013 1 commit
  26. 14 Aug, 2013 1 commit
    • Frederic Weisbecker's avatar
      context_tracking: Ground setup for static key use · 65f382fd
      Frederic Weisbecker authored
      
      Prepare for using a static key in the context tracking subsystem.
      This will help optimizing the off case on its many users:
      
      * user_enter, user_exit, exception_enter, exception_exit, guest_enter,
        guest_exit, vtime_*()
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      65f382fd
  27. 03 Jul, 2013 1 commit
  28. 12 Jun, 2013 1 commit
  29. 28 May, 2013 1 commit
    • Stephane Eranian's avatar
      perf: Use hrtimers for event multiplexing · 9e630205
      Stephane Eranian authored
      
      The current scheme of using the timer tick was fine for per-thread
      events. However, it was causing bias issues in system-wide mode
      (including for uncore PMUs). Event groups would not get their fair
      share of runtime on the PMU. With tickless kernels, if a core is idle
      there is no timer tick, and thus no event rotation (multiplexing).
      However, there are events (especially uncore events) which do count
      even though cores are asleep.
      
      This patch changes the timer source for multiplexing.  It introduces a
      per-PMU per-cpu hrtimer. The advantage is that even when a core goes
      idle, it will come back to service the hrtimer, thus multiplexing on
      system-wide events works much better.
      
      The per-PMU implementation (suggested by PeterZ) enables adjusting the
      multiplexing interval per PMU. The preferred interval is stashed into
      the struct pmu. If not set, it will be forced to the default interval
      value.
      
      In order to minimize the impact of the hrtimer, it is turned on and
      off on demand. When the PMU on a CPU is overcommited, the hrtimer is
      activated.  It is stopped when the PMU is not overcommitted.
      
      In order for this to work properly, we had to change the order of
      initialization in start_kernel() such that hrtimer_init() is run
      before perf_event_init().
      
      The default interval in milliseconds is set to a timer tick just like
      with the old code. We will provide a sysctl to tune this in another
      patch.
      Signed-off-by: default avatarStephane Eranian <eranian@google.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/1364991694-5876-2-git-send-email-eranian@google.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9e630205
  30. 01 May, 2013 1 commit
  31. 30 Apr, 2013 2 commits