1. 24 Jun, 2009 1 commit
    • percpu: cleanup percpu array definitions · 204fba4a
      Tejun Heo authored
      
      Currently, the following three different ways to define percpu arrays
      are in use.
      
      1. DEFINE_PER_CPU(elem_type[array_len], array_name);
      2. DEFINE_PER_CPU(elem_type, array_name[array_len]);
      3. DEFINE_PER_CPU(elem_type, array_name)[array_len];
      
      Unify to #1 which correctly separates the roles of the two parameters
      and thus allows more flexibility in the way percpu variables are
      defined.
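      A minimal sketch of what style #1 looks like in use (the variable and
      function names below are hypothetical, not taken from the patch):

        #include <linux/percpu.h>

        /* Style #1: the element type carries the array length, the name stays
         * a plain identifier. */
        static DEFINE_PER_CPU(int[4], example_counters);

        static void bump_first_counter(void)
        {
                int *counters;

                /* get_cpu_var() disables preemption while this CPU's copy is used. */
                counters = get_cpu_var(example_counters);
                counters[0]++;
                put_cpu_var(example_counters);
        }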
      
      [ Impact: cleanup ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: linux-mm@kvack.org
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David S. Miller <davem@davemloft.net>
  2. 21 Jun, 2009 1 commit
  3. 16 Jun, 2009 1 commit
    • powerpc: Add configurable -Werror for arch/powerpc · ba55bd74
      Michael Ellerman authored
      
      Add the option to build the code under arch/powerpc with -Werror.
      
      The intention is to make it harder for people to inadvertently introduce
      warnings in the arch/powerpc code. It needs to be configurable so that
      if a warning is introduced, people can easily work around it while it's
      being fixed.
      
      The option is a negative, i.e. "don't enable -Werror", so that it will be
      turned on for allyes and allmodconfig builds, sparing those builds from
      -Werror.
      
      The default is n, in the hope that developers will build with -Werror.
      That will probably lead to some build breaks; I am prepared to be flamed.
      
      It's not enabled for math-emu, which is a steaming pile of warnings.
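      Purely as a hypothetical illustration of what the option controls (this
      snippet is not from the patch): code like the following normally only
      draws a compiler warning, but with -Werror in the flags it stops the build.

        /* gcc warns "unused variable 'ret'" here; with -Werror that warning
         * becomes a hard build error instead of scrolling past. */
        static int example_probe(void)
        {
                int ret = -1;   /* never used, triggers -Wunused-variable */

                return 0;
        }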
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  4. 12 Jun, 2009 1 commit
  5. 11 Jun, 2009 1 commit
  6. 09 Jun, 2009 4 commits
    • powerpc: Shield code specific to 64-bit server processors · 94491685
      Benjamin Herrenschmidt authored
      
      This is a random collection of added ifdefs around portions of code
      that only make sense on server processors, using either
      CONFIG_PPC_STD_MMU_64 or CONFIG_PPC_BOOK3S as seems appropriate.
      
      This is meant to make the future merging of Book3E 64-bit support
      easier.
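      A hypothetical example of the pattern (the guarded function is made up;
      the config symbols are the ones named above):

        #ifdef CONFIG_PPC_STD_MMU_64
        /* Hash-MMU (64-bit server / Book3S) specific path; compiled out on
         * other MMU families, which keeps the Book3E merge from having to
         * stub this out. */
        static void example_flush_hash_slot(unsigned long vaddr)
        {
                /* ... hash page table manipulation would live here ... */
        }
        #endif /* CONFIG_PPC_STD_MMU_64 */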
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Set init_bootmem_done on NUMA platforms as well · d3f6204a
      Benjamin Herrenschmidt authored
      
      For some obscure reason, we only set init_bootmem_done after initializing
      bootmem when NUMA isn't enabled. We even document this next to the declaration
      of that global in system.h, which of course I didn't read before I had to
      debug why some WIP code wasn't working properly...
      
      This patch changes it so that we always set it after bootmem is initialized,
      which should have always been the case... go figure!
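      A minimal sketch of the shape of the change (the helper names here are
      assumptions, not taken from the patch): the flag is now set once bootmem
      is initialized on both the NUMA and non-NUMA paths.

        #include <linux/init.h>

        extern int init_bootmem_done;           /* the global noted above */

        void __init do_init_bootmem(void)
        {
        #ifdef CONFIG_NUMA
                numa_setup_bootmem();           /* hypothetical helper */
        #else
                flat_setup_bootmem();           /* hypothetical helper */
        #endif
                init_bootmem_done = 1;          /* previously set only in the !NUMA case */
        }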
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock · b46b6942
      Benjamin Herrenschmidt authored
      
      The MMU context_lock can be taken from switch_mm() while the
      rq->lock is held. The rq->lock can also be taken from interrupts,
      thus if we get interrupted in destroy_context() with the context
      lock held and that interrupt tries to take the rq->lock, there's
      a possible deadlock scenario with another CPU having the rq->lock
      and calling switch_mm() which takes our context lock.
      
      The fix is to always ensure interrupts are off when taking our
      context lock. The switch_mm() path is already good so this fixes
      the destroy_context() path.
      
      While at it, turn the context lock into a new style spinlock.
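      A hedged sketch of the shape of the fix (structure only, not the literal
      patch): destroy_context() takes the context lock with interrupts disabled,
      and the lock itself is declared with the current DEFINE_SPINLOCK() style.

        #include <linux/spinlock.h>
        #include <linux/mm_types.h>

        static DEFINE_SPINLOCK(context_lock);   /* "new style" static initializer */

        void destroy_context(struct mm_struct *mm)
        {
                unsigned long flags;

                /* IRQs must be off while context_lock is held: an interrupt that
                 * grabs rq->lock here could deadlock against a CPU that already
                 * holds rq->lock and is waiting for context_lock in switch_mm(). */
                spin_lock_irqsave(&context_lock, flags);
                /* ... release the mm's context id / PID bookkeeping ... */
                spin_unlock_irqrestore(&context_lock, flags);
        }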
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Fix some SMP issues with MMU context handling · 3035c863
      Benjamin Herrenschmidt authored
      
      This patch fixes a couple of issues that can happen as a result
      of steal_context() dropping the context_lock when all possible
      PIDs are ineligible for stealing (hopefully an extremely hard to
      hit occurrence).
      
      This case exposes the possibility of a stale context_mm[] entry
      being seen, since destroy_context() doesn't clear it and the free
      map isn't re-tested. It also means steal_context() will not notice
      a context freed while the lock was held, thus possibly trying to
      steal a context when a free one was available.
      
      This fixes it by always returning to the caller from steal_context()
      whenever it has dropped the lock, with a return value that causes the
      caller to re-sample the number of free contexts, along with
      properly clearing the context_mm[] array for destroyed contexts.
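      A rough sketch of the two fixes (the data structures are simplified and
      the sentinel name is an assumption, not the patch's): destroy_context()
      clears its context_mm[] slot, and steal_context() hands back a sentinel
      whenever it had to drop the lock so the caller goes around again.

        #include <linux/spinlock.h>
        #include <linux/mm_types.h>

        #define MMU_NO_CONTEXT  0                       /* assumed sentinel value */

        static struct mm_struct **context_mm;           /* context id -> owning mm */
        static DEFINE_SPINLOCK(context_lock);

        void destroy_context(struct mm_struct *mm)
        {
                unsigned long flags;
                unsigned int id = mm->context.id;

                spin_lock_irqsave(&context_lock, flags);
                if (id != MMU_NO_CONTEXT) {
                        context_mm[id] = NULL;          /* no stale pointer left behind */
                        mm->context.id = MMU_NO_CONTEXT;
                }
                spin_unlock_irqrestore(&context_lock, flags);
        }

        static unsigned int steal_context(void)
        {
                /* If every candidate PID is ineligible we must drop context_lock
                 * and retry; returning the sentinel makes the caller re-sample
                 * how many contexts are actually free before trying again. */
                return MMU_NO_CONTEXT;
        }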
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  7. 27 May, 2009 3 commits
  8. 26 May, 2009 1 commit
  9. 21 May, 2009 1 commit
  10. 18 May, 2009 1 commit
    • powerpc: Do not assert pte_locked for hugepage PTE entries · af3e4aca
      Mel Gorman authored
      
      With CONFIG_DEBUG_VM, an assertion is made when changing the protection
      flags of a PTE that the PTE is locked. Huge pages use a different pagetable
      format and the assertion is bogus and will always trigger with a bug looking
      something like
      
       Unable to handle kernel paging request for data at address 0xf1a00235800006f8
       Faulting instruction address: 0xc000000000034a80
       Oops: Kernel access of bad area, sig: 11 [#1]
       SMP NR_CPUS=32 NUMA Maple
       Modules linked in: dm_snapshot dm_mirror dm_region_hash
        dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic
        pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid
        windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor
        windfarm_cpufreq_clamp windfarm_core i2c_powermac
       NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003
       REGS: c000000003037600 TRAP: 0300   Not tainted (2.6.30-rc3-autokern1)
       MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 28002484  XER: 200fffff
       DAR: f1a00235800006f8, DSISR: 0000000040010000
       TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2
       GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500
       GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001
       GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8
       GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000
       GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20
       GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000
       GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8
       GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880
       NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0
       LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
       Call Trace:
       [c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable)
       [c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
       [c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674
       [c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8
       [c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828
       [c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584
       [c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c
       Instruction dump:
       7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4
       7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000
      
      This patch fixes the problem by not asserting the PTE is locked for VMAs
      backed by huge pages.
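      A hedged sketch of the kind of guard this describes (the exact call site
      in the powerpc pagetable code is assumed, not quoted):

        #include <linux/mm.h>
        #include <linux/hugetlb.h>

        static void example_debug_check(struct vm_area_struct *vma,
                                        unsigned long address)
        {
                /* The pte-locked assertion only makes sense for normal pages;
                 * hugetlb VMAs use a different pagetable format, so skip it. */
                if (!is_vm_hugetlb_page(vma))
                        assert_pte_locked(vma->vm_mm, address);
        }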
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  11. 15 May, 2009 1 commit
  12. 23 Apr, 2009 2 commits
  13. 22 Apr, 2009 1 commit
  14. 08 Apr, 2009 2 commits
  15. 07 Apr, 2009 1 commit
  16. 06 Apr, 2009 2 commits
  17. 24 Mar, 2009 5 commits
  18. 11 Mar, 2009 2 commits
  19. 09 Mar, 2009 1 commit
  20. 23 Feb, 2009 5 commits
    • powerpc: Increase stack gap on 64bit binaries · 002b0ec7
      Anton Blanchard authored
      
      On 64bit there is a possibility that our stack and mmap randomisation will put
      the two close enough together that we can't expand our stack to match the ulimit
      specified.
      
      To avoid this, start the upper mmap address at 1GB + 128MB below the top of our
      address space, so in the worst case we end up with the same ~128MB hole as in
      32bit. This works because we randomise the stack over a 1GB range.
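      The arithmetic, sketched as C (the constant names are assumptions in the
      spirit of the generic mmap-layout code, not the patch's):

        /* The stack can be randomised downwards over a 1GB window, so starting
         * the mmap area 1GB + 128MB below the top of the address space leaves
         * at least ~128MB of stack growth room even in the worst case. */
        #define STACK_RND_SPAN  (1024UL * 1024 * 1024)                  /* 1GB         */
        #define MIN_GAP64       (STACK_RND_SPAN + 128UL * 1024 * 1024)  /* 1GB + 128MB */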
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Ensure random space between stack and mmaps · a5adc91a
      Anton Blanchard authored
      
      get_random_int() returns the same value within a 1 jiffy interval. This means
      that the mmap and stack regions will almost always end up the same distance
      apart, making a relative offset based attack possible.
      
      To fix this, shift the randomness we use for the mmap region by 1 bit.
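      Roughly, the idea (a hypothetical sketch, not the literal diff): both
      offsets may come from get_random_int() calls within the same jiffy and so
      see the same value, but giving the mmap side one extra bit of shift makes
      the stack-to-mmap distance itself random rather than constant.

        #include <linux/mm.h>
        #include <linux/random.h>

        /* Hypothetical offset helpers; the 11-bit mask is illustrative only. */
        static unsigned long example_stack_offset(void)
        {
                return (get_random_int() & 0x7ff) << PAGE_SHIFT;
        }

        static unsigned long example_mmap_offset(void)
        {
                /* One extra bit of shift relative to the stack offset. */
                return ((get_random_int() & 0x7ff) << 1) << PAGE_SHIFT;
        }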
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Randomise mmap start address · 9f14c42d
      Anton Blanchard authored
      
      Randomise mmap start address - 8MB on 32bit and 1GB on 64bit tasks.
      Until ppc32 uses the mmap.c functionality, this is ppc64 specific.
      
      Before:
      
      # ./test & cat /proc/${!}/maps|tail -2|head -1
      f75fe000-f7fff000 rw-p f75fe000 00:00 0
      f75fe000-f7fff000 rw-p f75fe000 00:00 0
      f75fe000-f7fff000 rw-p f75fe000 00:00 0
      f75fe000-f7fff000 rw-p f75fe000 00:00 0
      f75fe000-f7fff000 rw-p f75fe000 00:00 0
      
      After:
      # ./test & cat /proc/${!}/maps|tail -2|head -1
      f718b000-f7b8c000 rw-p f718b000 00:00 0
      f7551000-f7f52000 rw-p f7551000 00:00 0
      f6ee7000-f78e8000 rw-p f6ee7000 00:00 0
      f74d4000-f7ed5000 rw-p f74d4000 00:00 0
      f6e9d000-f789e000 rw-p f6e9d000 00:00 0
      
      Similar for 64bit, but with 1GB of scatter:
      # ./test & cat /proc/${!}/maps|tail -2|head -1
      fffb97b5000-fffb97b6000 rw-p fffb97b5000 00:00 0
      fffce9a3000-fffce9a4000 rw-p fffce9a3000 00:00 0
      fffeaaf2000-fffeaaf3000 rw-p fffeaaf2000 00:00 0
      fffd88ac000-fffd88ad000 rw-p fffd88ac000 00:00 0
      fffbc62e000-fffbc62f000 rw-p fffbc62e000 00:00 0
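      A hedged sketch of the kind of helper this implies (the real layout and
      names may differ): pick a page-aligned random offset inside an 8MB window
      for 32-bit tasks and a 1GB window for 64-bit tasks, to be subtracted from
      the usual mmap base.

        #include <linux/mm.h>
        #include <linux/random.h>
        #include <linux/sched.h>

        static unsigned long example_mmap_rnd(void)
        {
                unsigned long rnd;

                /* 8MB (2^23) of scatter for 32-bit tasks, 1GB (2^30) for
                 * 64-bit tasks, expressed in pages. */
                if (test_thread_flag(TIF_32BIT))
                        rnd = get_random_int() % (1UL << (23 - PAGE_SHIFT));
                else
                        rnd = get_random_int() % (1UL << (30 - PAGE_SHIFT));

                return rnd << PAGE_SHIFT;
        }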
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Rearrange mmap.c · 13a2cb36
      Anton Blanchard authored
      
      Rearrange mmap.c to better match the x86 version.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/numa: Cleanup hot_add_scn_to_nid · 0f16ef7f
      Nathan Fontenot authored
      
      This patch reworks hot_add_scn_to_nid and its supporting functions
      to make them easier to understand.  There are no functional changes in
      this patch, and it has been tested on machines with memory represented in the
      device tree as memory nodes and in the ibm,dynamic-memory property.
      
      My previous patch, which introduced support for hotplug memory add on
      systems whose memory was represented by the ibm,dynamic-memory property
      of the device tree, only made the code harder to follow.  This
      will hopefully make things easier to understand.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  21. 22 Feb, 2009 1 commit
  22. 13 Feb, 2009 1 commit
    • powerpc/mm: Fix numa reserve bootmem page selection · 06eccea6
      Dave Hansen authored
      Fix the powerpc NUMA reserve bootmem page selection logic.
      
      Commit 8f64e1f2 ("powerpc: Reserve in bootmem lmb reserved regions that
      cross NUMA nodes") changed the logic for how the powerpc LMB reserved
      regions were converted to bootmem reserved regions.  As the following
      discussion reports, the new logic was not correct.
      
      mark_reserved_regions_for_nid() goes through each LMB on the
      system that specifies a reserved area.  It searches for
      active regions that intersect with that LMB and are on the
      specified node.  It attempts to bootmem-reserve only the area
      where the active region and the reserved LMB intersect.  We
      can not reserve things on other nodes as they may not have
      bootmem structures allocated, yet.
      
      We base the size of the bootmem reservation on two possible
      things.  Normally, we just make the reservation start and
      stop exactly at the start and end of the LMB.
      
      However, the LMB reservations are not aware of NUMA nodes and
      on occasion a single LMB may cross into several adjacent
      active regions.  Those may even be on different NUMA nodes
      and will require separate calls to the bootmem reserve
      functions.  So, the bootmem reservation must be trimmed to
      fit inside the current active region.
      
      That's all fine and dandy, but we trim the reservation
      in a page-aligned fashion.  That's bad because we start the
      reservation at a non-page-aligned address: physbase.
      
      The reservation may only span 2 bytes, but those bytes
      may span two pfns and cause a reserve_size of 2*PAGE_SIZE.
      
      Take the case where you reserve 0x2 bytes at 0x0fff and
      where the active region ends at 0x1000.  You'll jump into
      that if() statement, but node_ar.end_pfn=0x1 and
      start_pfn=0x0.  You'll end up with a reserve_size=0x1000,
      and then call
      
        reserve_bootmem_node(node, physbase=0xfff, size=0x1000);
      
      0x1000 may not be on the same node as 0xfff.  Oops.
      
      In almost all the vm code, end_<anything> is not inclusive.
      If you have an end_pfn of 0x1234, page 0x1234 is not
      included in the range.  Using PFN_UP instead of the
      (>> PAGE_SHIFT) will make this consistent with the other VM
      code.
      
      We also need to do math for the reserved size with physbase
      instead of start_pfn.  node_ar.end_pfn << PAGE_SHIFT is
      *precisely* the end of the node.  However,
      (start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
      of the reserved area.  That is, of course, physbase.
      If we don't use physbase here, the reserve_size can be
      made too large.
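      A hedged sketch of the corrected arithmetic (simplified, with assumed
      parameter names), using the worked example above (physbase = 0xfff,
      size = 0x2, node end_pfn = 0x1):

        #include <linux/mm.h>
        #include <linux/pfn.h>

        static unsigned long example_reserve_size(unsigned long physbase,
                                                  unsigned long size,
                                                  unsigned long node_end_pfn)
        {
                /* PFN_UP rounds up, so a reservation ending mid-page still counts
                 * as reaching into that page (end_pfn is exclusive in VM code). */
                if (PFN_UP(physbase + size) > node_end_pfn)
                        /* Trim to the node boundary, measuring from physbase rather
                         * than from a page-rounded start_pfn: here this gives
                         * 0x1000 - 0xfff = 1 byte, not 0x1000. */
                        return (node_end_pfn << PAGE_SHIFT) - physbase;

                return size;
        }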
      
      From: Dave Hansen <dave@linux.vnet.ibm.com>
      Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>  Tested on PS3.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  23. 12 Feb, 2009 1 commit