1. 21 Oct, 2017 1 commit
    • Yasuaki Ishimatsu's avatar
      mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next · a5f043b2
      Yasuaki Ishimatsu authored
      [ Upstream commit ddffe98d ]
      
      To identify that pages of page table are allocated from bootmem
      allocator, magic number sets to page->lru.next.
      
      But page->lru list is initialized in reserve_bootmem_region().  So when
      calling free_pagetable(), the function cannot find the magic number of
      pages.  And free_pagetable() frees the pages by free_reserved_page() not
      put_page_bootmem().
      
      But if the pages are allocated from bootmem allocator and used as page
      table, the pages have private flag.  So before freeing the pages, we
      should clear the private flag by put_page_bootmem().
      
      Before applying the commit 7bfec6f4 ("mm, page_alloc: check multiple
      page fields with a single branch"), we could find the following visible
      issue:
      
        BUG: Bad page state in process kworker/u1024:1
        page:ffffea103cfd8040 count:0 mapcount:0 mappi
        flags: 0x6fffff80000800(private)
        page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
        bad because of flags: 0x800(private)
        <snip>
        Call Trace:
        [...] dump_stack+0x63/0x87
        [...] bad_page+0x114/0x130
        [...] free_pages_prepare+0x299/0x2d0
        [...] free_hot_cold_page+0x31/0x150
        [...] __free_pages+0x25/0x30
        [...] free_pagetable+0x6f/0xb4
        [...] remove_pagetable+0x379/0x7ff
        [...] vmemmap_free+0x10/0x20
        [...] sparse_remove_one_section+0x149/0x180
        [...] __remove_pages+0x2e9/0x4f0
        [...] arch_remove_memory+0x63/0xc0
        [...] remove_memory+0x8c/0xc0
        [...] acpi_memory_device_remove+0x79/0xa5
        [...] acpi_bus_trim+0x5a/0x8d
        [...] acpi_bus_trim+0x38/0x8d
        [...] acpi_device_hotplug+0x1b7/0x418
        [...] acpi_hotplug_work_fn+0x1e/0x29
        [...] process_one_work+0x152/0x400
        [...] worker_thread+0x125/0x4b0
        [...] kthread+0xd8/0xf0
        [...] ret_from_fork+0x22/0x40
      
      And the issue still silently occurs.
      
      Until freeing the pages of page table allocated from bootmem allocator,
      the page->freelist is never used.  So the patch sets magic number to
      page->freelist instead of page->lru.next.
      
      [isimatu.yasuaki@jp.fujitsu.com: fix merge issue]
        Link: http://lkml.kernel.org/r/722b1cc4-93ac-dd8b-2be2-7a7e313b3b0b@gmail.com
      Link: http://lkml.kernel.org/r/2c29bd9f-5b67-02d0-18a3-8828e78bbb6f@gmail.com
      
      Signed-off-by: default avatarYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5f043b2
  2. 02 Aug, 2016 1 commit
  3. 28 Jul, 2016 1 commit
  4. 17 Mar, 2016 2 commits
  5. 16 Jan, 2016 1 commit
  6. 07 Apr, 2014 1 commit
  7. 02 Apr, 2014 1 commit
  8. 22 Jan, 2014 1 commit
    • Santosh Shilimkar's avatar
      mm/sparse: use memblock apis for early memory allocations · bb016b84
      Santosh Shilimkar authored
      
      Switch to memblock interfaces for early memory allocator instead of
      bootmem allocator.  No functional change in beahvior than what it is in
      current code from bootmem users points of view.
      
      Archs already converted to NO_BOOTMEM now directly use memblock
      interfaces instead of bootmem wrappers build on top of memblock.  And
      the archs which still uses bootmem, these new apis just fallback to
      exiting bootmem APIs.
      Signed-off-by: default avatarSantosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bb016b84
  9. 13 Nov, 2013 2 commits
    • Zhang Yanfei's avatar
      mm/sparsemem: fix a bug in free_map_bootmem when CONFIG_SPARSEMEM_VMEMMAP · 81556b02
      Zhang Yanfei authored
      
      We pass the number of pages which hold page structs of a memory section
      to free_map_bootmem().  This is right when !CONFIG_SPARSEMEM_VMEMMAP but
      wrong when CONFIG_SPARSEMEM_VMEMMAP.  When CONFIG_SPARSEMEM_VMEMMAP, we
      should pass the number of pages of a memory section to free_map_bootmem.
      
      So the fix is removing the nr_pages parameter.  When
      CONFIG_SPARSEMEM_VMEMMAP, we directly use the prefined marco
      PAGES_PER_SECTION in free_map_bootmem.  When !CONFIG_SPARSEMEM_VMEMMAP,
      we calculate page numbers needed to hold the page structs for a memory
      section and use the value in free_map_bootmem().
      
      This was found by reading the code.  And I have no machine that support
      memory hot-remove to test the bug now.
      Signed-off-by: default avatarZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Reviewed-by: default avatarWanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      81556b02
    • Zhang Yanfei's avatar
      mm/sparsemem: use PAGES_PER_SECTION to remove redundant nr_pages parameter · 85b35fea
      Zhang Yanfei authored
      
      For below functions,
      
      - sparse_add_one_section()
      - kmalloc_section_memmap()
      - __kmalloc_section_memmap()
      - __kfree_section_memmap()
      
      they are always invoked to operate on one memory section, so it is
      redundant to always pass a nr_pages parameter, which is the page numbers
      in one section.  So we can directly use predefined macro PAGES_PER_SECTION
      instead of passing the parameter.
      Signed-off-by: default avatarZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      85b35fea
  10. 11 Sep, 2013 1 commit
    • Wanpeng Li's avatar
      mm/sparse: introduce alloc_usemap_and_memmap · 18732093
      Wanpeng Li authored
      After commit 9bdac914
      
       ("sparsemem: Put mem map for one node
      together."), vmemmap for one node will be allocated together, its logic
      is similar as memory allocation for pageblock flags.  This patch
      introduces alloc_usemap_and_memmap to extract the same logic of memory
      alloction for pageblock flags and vmemmap.
      Signed-off-by: default avatarWanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      18732093
  11. 09 Jul, 2013 1 commit
  12. 03 Jul, 2013 1 commit
  13. 28 May, 2013 1 commit
  14. 29 Apr, 2013 2 commits
    • David Rientjes's avatar
      mm, hotplug: avoid compiling memory hotremove functions when disabled · 4edd7cef
      David Rientjes authored
      
      __remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE.  PowerPC
      pseries will return -EOPNOTSUPP if unsupported.
      
      Adding an #ifdef causes several other functions it depends on to also
      become unnecessary, which saves in .text when disabled (it's disabled in
      most defconfigs besides powerpc, including x86).  remove_memory_block()
      becomes static since it is not referenced outside of
      drivers/base/memory.c.
      
      Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
      and disabled.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarToshi Kani <toshi.kani@hp.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4edd7cef
    • Johannes Weiner's avatar
      sparse-vmemmap: specify vmemmap population range in bytes · 0aad818b
      Johannes Weiner authored
      
      The sparse code, when asking the architecture to populate the vmemmap,
      specifies the section range as a starting page and a number of pages.
      
      This is an awkward interface, because none of the arch-specific code
      actually thinks of the range in terms of 'struct page' units and always
      translates it to bytes first.
      
      In addition, later patches mix huge page and regular page backing for
      the vmemmap.  For this, they need to call vmemmap_populate_basepages()
      on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind.  But these
      are not necessarily multiples of the 'struct page' size and so this unit
      is too coarse.
      
      Just translate the section range into bytes once in the generic sparse
      code, then pass byte ranges down the stack.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Tested-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0aad818b
  15. 24 Feb, 2013 4 commits
    • Xishi Qiu's avatar
      memory-failure: use num_poisoned_pages instead of mce_bad_pages · 293c07e3
      Xishi Qiu authored
      
      Since MCE is an x86 concept, and this code is in mm/, it would be better
      to use the name num_poisoned_pages instead of mce_bad_pages.
      
      [akpm@linux-foundation.org: fix mm/sparse.c]
      Signed-off-by: default avatarXishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Suggested-by: default avatarBorislav Petkov <bp@alien8.de>
      Reviewed-by: default avatarWanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      293c07e3
    • Wen Congyang's avatar
      memory-hotplug: consider compound pages when free memmap · 8a356ce3
      Wen Congyang authored
      
      usemap could also be allocated as compound pages.  Should also consider
      compound pages when freeing memmap.
      
      If we don't fix it, there could be problems when we free vmemmap
      pagetables which are stored in compound pages.  The old pagetables will
      not be freed properly, and when we add the memory again, no new
      pagetable will be created.  And the old pagetable entry is used, than
      the kernel will panic.
      
      The call trace is like the following:
      
        BUG: unable to handle kernel paging request at ffffea0040000000
        IP: [<ffffffff816a483f>] sparse_add_one_section+0xef/0x166
        PGD 7ff7d4067 PUD 78e035067 PMD 78e11d067 PTE 0
        Oops: 0002 [#1] SMP
        Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg lpc_ich mfd_core i2c_i801 i2c_core i7core_edac edac_core ioatdma e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
        CPU 0
        Pid: 4, comm: kworker/0:0 Tainted: G        W 3.8.0-rc3-phy-hot-remove+ #3 FUJITSU-SV PRIMEQUEST 1800E/SB
        RIP: 0010:[<ffffffff816a483f>]  [<ffffffff816a483f>] sparse_add_one_section+0xef/0x166
        RSP: 0018:ffff8807bdcb35d8  EFLAGS: 00010006
        RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000200000
        RDX: ffff88078df01148 RSI: 0000000000000282 RDI: ffffea0040000000
        RBP: ffff8807bdcb3618 R08: 4cf05005b019467a R09: 0cd98fa09631467a
        R10: 0000000000000000 R11: 0000000000030e20 R12: 0000000000008000
        R13: ffffea0040000000 R14: ffff88078df66248 R15: ffff88078ea13b10
        FS:  0000000000000000(0000) GS:ffff8807c1a00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: ffffea0040000000 CR3: 0000000001c0c000 CR4: 00000000000007f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process kworker/0:0 (pid: 4, threadinfo ffff8807bdcb2000, task ffff8807bde18000)
        Call Trace:
          __add_pages+0x85/0x120
          arch_add_memory+0x71/0xf0
          add_memory+0xd6/0x1f0
          acpi_memory_device_add+0x170/0x20c
          acpi_device_probe+0x50/0x18a
          really_probe+0x6c/0x320
          driver_probe_device+0x47/0xa0
          __device_attach+0x53/0x60
          bus_for_each_drv+0x6c/0xa0
          device_attach+0xa8/0xc0
          bus_probe_device+0xb0/0xe0
          device_add+0x301/0x570
          device_register+0x1e/0x30
          acpi_device_register+0x1d8/0x27c
          acpi_add_single_object+0x1df/0x2b9
          acpi_bus_check_add+0x112/0x18f
          acpi_ns_walk_namespace+0x105/0x255
          acpi_walk_namespace+0xcf/0x118
          acpi_bus_scan+0x5b/0x7c
          acpi_bus_add+0x2a/0x2c
          container_notify_cb+0x112/0x1a9
          acpi_ev_notify_dispatch+0x46/0x61
          acpi_os_execute_deferred+0x27/0x34
          process_one_work+0x20e/0x5c0
          worker_thread+0x12e/0x370
          kthread+0xee/0x100
          ret_from_fork+0x7c/0xb0
        Code: 00 00 48 89 df 48 89 45 c8 e8 3e 71 b1 ff 48 89 c2 48 8b 75 c8 b8 ef ff ff ff f6 02 01 75 4b 49 63 cc 31 c0 4c 89 ef 48 c1 e1 06 <f3> aa 48 8b 02 48 83 c8 01 48 85 d2 48 89 02 74 29 a8 01 74 25
        RIP  [<ffffffff816a483f>] sparse_add_one_section+0xef/0x166
         RSP <ffff8807bdcb35d8>
        CR2: ffffea0040000000
        ---[ end trace e7f94e3a34c442d4 ]---
        Kernel panic - not syncing: Fatal exception
      Signed-off-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8a356ce3
    • Tang Chen's avatar
      memory-hotplug: remove memmap of sparse-vmemmap · 0197518c
      Tang Chen authored
      
      Introduce a new API vmemmap_free() to free and remove vmemmap
      pagetables.  Since pagetable implements are different, each architecture
      has to provide its own version of vmemmap_free(), just like
      vmemmap_populate().
      
      Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc.
      
      [mhocko@suse.cz: fix implicit declaration of remove_pagetable]
      Signed-off-by: default avatarYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: default avatarJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0197518c
    • Tang Chen's avatar
      memory-hotplug: move pgdat_resize_lock into sparse_remove_one_section() · cd099682
      Tang Chen authored
      
      In __remove_section(), we locked pgdat_resize_lock when calling
      sparse_remove_one_section().  This lock will disable irq.  But we don't
      need to lock the whole function.  If we do some work to free pagetables
      in free_section_usemap(), we need to call flush_tlb_all(), which need
      irq enabled.  Otherwise the WARN_ON_ONCE() in smp_call_function_many()
      will be triggered.
      
      If we lock the whole sparse_remove_one_section(), then we come to this call trace:
      
        ------------[ cut here ]------------
        WARNING: at kernel/smp.c:461 smp_call_function_many+0xbd/0x260()
        Hardware name: PRIMEQUEST 1800E
        ......
        Call Trace:
          smp_call_function_many+0xbd/0x260
          smp_call_function+0x3b/0x50
          on_each_cpu+0x3b/0xc0
          flush_tlb_all+0x1c/0x20
          remove_pagetable+0x14e/0x1d0
          vmemmap_free+0x18/0x20
          sparse_remove_one_section+0xf7/0x100
          __remove_section+0xa2/0xb0
          __remove_pages+0xa0/0xd0
          arch_remove_memory+0x6b/0xc0
          remove_memory+0xb8/0xf0
          acpi_memory_device_remove+0x53/0x96
          acpi_device_remove+0x90/0xb2
          __device_release_driver+0x7c/0xf0
          device_release_driver+0x2f/0x50
          acpi_bus_remove+0x32/0x6d
          acpi_bus_trim+0x91/0x102
          acpi_bus_hot_remove_device+0x88/0x16b
          acpi_os_execute_deferred+0x27/0x34
          process_one_work+0x20e/0x5c0
          worker_thread+0x12e/0x370
          kthread+0xee/0x100
          ret_from_fork+0x7c/0xb0
        ---[ end trace 25e85300f542aa01 ]---
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Acked-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cd099682
  16. 12 Dec, 2012 2 commits
    • Wen Congyang's avatar
      memory-hotplug, mm/sparse.c: clear the memory to store struct page · 3ac19f8e
      Wen Congyang authored
      
      If sparse memory vmemmap is enabled, we can't free the memory to store
      struct page when a memory device is hotremoved, because we may store
      struct page in the memory to manage the memory which doesn't belong to
      this memory device.  When we hotadded this memory device again, we will
      reuse this memory to store struct page, and struct page may contain some
      obsolete information, and we will get bad-page state:
      
        init_memory_mapping: [mem 0x80000000-0x9fffffff]
        Built 2 zonelists in Node order, mobility grouping on.  Total pages: 547617
        Policy zone: Normal
        BUG: Bad page state in process bash  pfn:9b6dc
        page:ffffea0002200020 count:0 mapcount:0 mapping:          (null) index:0xfdfdfdfdfdfdfdfd
        page flags: 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
        Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk libata virtio_pci virtio_ring virtio scsi_mod
        Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
        Call Trace:
         [<ffffffff810e9b30>] ? bad_page+0xb0/0x100
         [<ffffffff810ea4c3>] ? free_pages_prepare+0xb3/0x100
         [<ffffffff810ea668>] ? free_hot_cold_page+0x48/0x1a0
         [<ffffffff8112cc08>] ? online_pages_range+0x68/0xa0
         [<ffffffff8112cba0>] ? __online_page_increment_counters+0x10/0x10
         [<ffffffff81045561>] ? walk_system_ram_range+0x101/0x110
         [<ffffffff814c4f95>] ? online_pages+0x1a5/0x2b0
         [<ffffffff8135663d>] ? __memory_block_change_state+0x20d/0x270
         [<ffffffff81356756>] ? store_mem_state+0xb6/0xf0
         [<ffffffff8119e482>] ? sysfs_write_file+0xd2/0x160
         [<ffffffff8113769a>] ? vfs_write+0xaa/0x160
         [<ffffffff81137977>] ? sys_write+0x47/0x90
         [<ffffffff814e2f25>] ? async_page_fault+0x25/0x30
         [<ffffffff814ea239>] ? system_call_fastpath+0x16/0x1b
        Disabling lock debugging due to kernel taint
      
      This patch clears the memory to store struct page to avoid unexpected error.
      Signed-off-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reported-by: default avatarVasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3ac19f8e
    • Wen Congyang's avatar
      memory-hotplug: update mce_bad_pages when removing the memory · 95a4774d
      Wen Congyang authored
      
      When we hotremove a memory device, we will free the memory to store struct
      page.  If the page is hwpoisoned page, we should decrease mce_bad_pages.
      
      [akpm@linux-foundation.org: cleanup ifdefs]
      Signed-off-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      95a4774d
  17. 30 Nov, 2012 1 commit
    • Jianguo Wu's avatar
      mm/vmemmap: fix wrong use of virt_to_page · ae64ffca
      Jianguo Wu authored
      
      I enable CONFIG_DEBUG_VIRTUAL and CONFIG_SPARSEMEM_VMEMMAP, when doing
      memory hotremove, there is a kernel BUG at arch/x86/mm/physaddr.c:20.
      
      It is caused by free_section_usemap()->virt_to_page(), virt_to_page() is
      only used for kernel direct mapping address, but sparse-vmemmap uses
      vmemmap address, so it is going wrong here.
      
        ------------[ cut here ]------------
        kernel BUG at arch/x86/mm/physaddr.c:20!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: acpihp_drv acpihp_slot edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse vfat fat loop dm_mod coretemp kvm crc32c_intel ipv6 ixgbe igb iTCO_wdt i7core_edac edac_core pcspkr iTCO_vendor_support ioatdma microcode joydev sr_mod i2c_i801 dca lpc_ich mfd_core mdio tpm_tis i2c_core hid_generic tpm cdrom sg tpm_bios rtc_cmos button ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh ata_generic ata_piix libata megaraid_sas scsi_mod
        CPU 39
        Pid: 6454, comm: sh Not tainted 3.7.0-rc1-acpihp-final+ #45 QCI QSSC-S4R/QSSC-S4R
        RIP: 0010:[<ffffffff8103c908>]  [<ffffffff8103c908>] __phys_addr+0x88/0x90
        RSP: 0018:ffff8804440d7c08  EFLAGS: 00010006
        RAX: 0000000000000006 RBX: ffffea0012000000 RCX: 000000000000002c
        ...
      Signed-off-by: default avatarJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Reviewd-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ae64ffca
  18. 01 Aug, 2012 4 commits
    • Gavin Shan's avatar
      mm/sparse: remove index_init_lock · c1c95183
      Gavin Shan authored
      sparse_index_init() uses the index_init_lock spinlock to protect root
      mem_section assignment.  The lock is not necessary anymore because the
      function is called only during boot (during paging init which is executed
      only from a single CPU) and from the hotplug code (by add_memory() via
      arch_add_memory()) which uses mem_hotplug_mutex.
      
      The lock was introduced by 28ae55c9 ("sparsemem extreme: hotplug
      preparation") and sparse_index_init() was used only during boot at that
      time.
      
      Later when the hotplug code (and add_memory()) was introduced there was no
      synchronization so it was possible to online more sections from the same
      root probably (though I am not 100% sure about that).  The first
      synchronization has been added by 6ad696d2 ("mm: allow memory hotplug and
      hibernation in the same kernel") which was later replaced by the
      mem_hotplug_mutex - 20d6c96b
      
       ("mem-hotplug: introduce
      {un}lock_memory_hotplug()").
      
      Let's remove the lock as it is not needed and it makes the code more
      confusing.
      
      [mhocko@suse.cz: changelog]
      Signed-off-by: default avatarGavin Shan <shangw@linux.vnet.ibm.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c1c95183
    • Gavin Shan's avatar
      mm/sparse: more checks on mem_section number · db36a461
      Gavin Shan authored
      
      __section_nr() was implemented to retrieve the corresponding memory
      section number according to its descriptor.  It's possible that the
      specified memory section descriptor doesn't exist in the global array.  So
      add more checking on that and report an error for a wrong case.
      Signed-off-by: default avatarGavin Shan <shangw@linux.vnet.ibm.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      db36a461
    • Gavin Shan's avatar
      mm/sparse: optimize sparse_index_alloc · 5b760e64
      Gavin Shan authored
      
      With CONFIG_SPARSEMEM_EXTREME, the two levels of memory section
      descriptors are allocated from slab or bootmem.  When allocating from
      slab, let slab/bootmem allocator clear the memory chunk.  We needn't clear
      it explicitly.
      Signed-off-by: default avatarGavin Shan <shangw@linux.vnet.ibm.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b760e64
    • Xishi Qiu's avatar
      mm: setup pageblock_order before it's used by sparsemem · ca57df79
      Xishi Qiu authored
      
      On architectures with CONFIG_HUGETLB_PAGE_SIZE_VARIABLE set, such as
      Itanium, pageblock_order is a variable with default value of 0.  It's set
      to the right value by set_pageblock_order() in function
      free_area_init_core().
      
      But pageblock_order may be used by sparse_init() before free_area_init_core()
      is called along path:
      sparse_init()
          ->sparse_early_usemaps_alloc_node()
      	->usemap_size()
      	    ->SECTION_BLOCKFLAGS_BITS
      		->((1UL << (PFN_SECTION_SHIFT - pageblock_order)) *
      NR_PAGEBLOCK_BITS)
      
      The uninitialized pageblock_size will cause memory wasting because
      usemap_size() returns a much bigger value then it's really needed.
      
      For example, on an Itanium platform,
      sparse_init() pageblock_order=0 usemap_size=24576
      free_area_init_core() before pageblock_order=0, usemap_size=24576
      free_area_init_core() after pageblock_order=12, usemap_size=8
      
      That means 24K memory has been wasted for each section, so fix it by calling
      set_pageblock_order() from sparse_init().
      Signed-off-by: default avatarXishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: default avatarJiang Liu <liuj97@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ca57df79
  19. 11 Jul, 2012 2 commits
    • Yinghai Lu's avatar
      mm: sparse: fix usemap allocation above node descriptor section · 99ab7b19
      Yinghai Lu authored
      After commit f5bf18fa ("bootmem/sparsemem: remove limit constraint
      in alloc_bootmem_section"), usemap allocations may easily be placed
      outside the optimal section that holds the node descriptor, even if
      there is space available in that section.  This results in unnecessary
      hotplug dependencies that need to have the node unplugged before the
      section holding the usemap.
      
      The reason is that the bootmem allocator doesn't guarantee a linear
      search starting from the passed allocation goal but may start out at a
      much higher address absent an upper limit.
      
      Fix this by trying the allocation with the limit at the section end,
      then retry without if that fails.  This keeps the fix from f5bf18fa
      
      
      of not panicking if the allocation does not fit in the section, but
      still makes sure to try to stay within the section at first.
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[3.3.x, 3.4.x]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      99ab7b19
    • Yinghai Lu's avatar
      mm: sparse: fix section usemap placement calculation · 07b4e2bc
      Yinghai Lu authored
      Commit 238305bb
      
       ("mm: remove sparsemem allocation details from the
      bootmem allocator") introduced a bug in the allocation goal calculation
      that put section usemaps not in the same section as the node
      descriptors, creating unnecessary hotplug dependencies between them:
      
        node 0 must be removed before remove section 16399
        node 1 must be removed before remove section 16399
        node 2 must be removed before remove section 16399
        node 3 must be removed before remove section 16399
        node 4 must be removed before remove section 16399
        node 5 must be removed before remove section 16399
        node 6 must be removed before remove section 16399
      
      The reason is that it applies PAGE_SECTION_MASK to the physical address
      of the node descriptor when finding a suitable place to put the usemap,
      when this mask is actually intended to be used with PFNs.  Because the
      PFN mask is wider, the target address will point beyond the wanted
      section holding the node descriptor and the node must be offlined before
      the section holding the usemap can go.
      
      Fix this by extending the mask to address width before use.
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      07b4e2bc
  20. 29 May, 2012 1 commit
  21. 22 Mar, 2012 1 commit
    • Nishanth Aravamudan's avatar
      bootmem/sparsemem: remove limit constraint in alloc_bootmem_section · f5bf18fa
      Nishanth Aravamudan authored
      
      While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
      Overcommit) on powerpc, we tripped the following:
      
        kernel BUG at mm/bootmem.c:483!
        cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
            pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
            lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
            sp: c000000000c03bc0
           msr: 8000000000021032
          current = 0xc000000000b0cce0
          paca    = 0xc000000001d80000
            pid   = 0, comm = swapper
        kernel BUG at mm/bootmem.c:483!
        enter ? for help
        [c000000000c03c80] c000000000a64bcc
        .sparse_early_usemaps_alloc_node+0x84/0x29c
        [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
        [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
        [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
        [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
      
      This is
      
              BUG_ON(limit && goal + size > limit);
      
      and after some debugging, it seems that
      
      	goal = 0x7ffff000000
      	limit = 0x80000000000
      
      and sparse_early_usemaps_alloc_node ->
      sparse_early_usemaps_alloc_pgdat_section calls
      
      	return alloc_bootmem_section(usemap_size() * count, section_nr);
      
      This is on a system with 8TB available via the AMS pool, and as a quirk
      of AMS in firmware, all of that memory shows up in node 0.  So, we end
      up with an allocation that will fail the goal/limit constraints.
      
      In theory, we could "fall-back" to alloc_bootmem_node() in
      sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
      defined, we'll BUG_ON() instead.  A simple solution appears to be to
      unconditionally remove the limit condition in alloc_bootmem_section,
      meaning allocations are allowed to cross section boundaries (necessary
      for systems of this size).
      
      Johannes Weiner pointed out that if alloc_bootmem_section() no longer
      guarantees section-locality, we need check_usemap_section_nr() to print
      possible cross-dependencies between node descriptors and the usemaps
      allocated through it.  That makes the two loops in
      sparse_early_usemaps_alloc_node() identical, so re-factor the code a
      bit.
      
      [akpm@linux-foundation.org: code simplification]
      Signed-off-by: default avatarNishanth Aravamudan <nacc@us.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Anton Blanchard <anton@au1.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>	[3.3.1]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f5bf18fa
  22. 31 Oct, 2011 1 commit
  23. 26 Jul, 2011 1 commit
  24. 31 Mar, 2011 1 commit
  25. 14 Jan, 2011 1 commit
  26. 25 May, 2010 1 commit
    • Yinghai Lu's avatar
      sparsemem: on no vmemmap path put mem_map on node high too · e48e67e0
      Yinghai Lu authored
      
      We need to put mem_map high when virtual memmap is not used.
      
      before this patch
      free mem pfn range on first node:
      [    0.000000]  19 - 1f
      [    0.000000]  28 40 - 80 95
      [    0.000000]  702 740 - 1000 1000
      [    0.000000]  347c - 347e
      [    0.000000]  34e7 3500 - 3b80 3b8b
      [    0.000000]  73b8b 73bc0 - 73c00 73c00
      [    0.000000]  73ddd - 73e00
      [    0.000000]  73fdd - 74000
      [    0.000000]  741dd - 74200
      [    0.000000]  743dd - 74400
      [    0.000000]  745dd - 74600
      [    0.000000]  747dd - 74800
      [    0.000000]  749dd - 74a00
      [    0.000000]  74bdd - 74c00
      [    0.000000]  74ddd - 74e00
      [    0.000000]  74fdd - 75000
      [    0.000000]  751dd - 75200
      [    0.000000]  753dd - 75400
      [    0.000000]  755dd - 75600
      [    0.000000]  757dd - 75800
      [    0.000000]  759dd - 75a00
      [    0.000000]  79bdd 79c00 - 7d540 7d550
      [    0.000000]  7f745 - 7f750
      [    0.000000]  10000b 100040 - 2080000 2080000
      so only 79c00 - 7d540 are major free block under 4g...
      
      after this patch, we will get
      [    0.000000]  19 - 1f
      [    0.000000]  28 40 - 80 95
      [    0.000000]  702 740 - 1000 1000
      [    0.000000]  347c - 347e
      [    0.000000]  34e7 3500 - 3600 3600
      [    0.000000]  37dd - 3800
      [    0.000000]  39dd - 3a00
      [    0.000000]  3bdd - 3c00
      [    0.000000]  3ddd - 3e00
      [    0.000000]  3fdd - 4000
      [    0.000000]  41dd - 4200
      [    0.000000]  43dd - 4400
      [    0.000000]  45dd - 4600
      [    0.000000]  47dd - 4800
      [    0.000000]  49dd - 4a00
      [    0.000000]  4bdd - 4c00
      [    0.000000]  4ddd - 4e00
      [    0.000000]  4fdd - 5000
      [    0.000000]  51dd - 5200
      [    0.000000]  53dd - 5400
      [    0.000000]  95dd 9600 - 7d540 7d550
      [    0.000000]  7f745 - 7f750
      [    0.000000]  17000b 170040 - 2080000 2080000
      we will have 9600 - 7d540 for major free block...
      
      sparse-vmemmap path already used __alloc_bootmem_node_high()
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e48e67e0
  27. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  28. 02 Mar, 2010 1 commit
  29. 12 Feb, 2010 1 commit
    • Yinghai Lu's avatar
      sparsemem: Put mem map for one node together. · 9bdac914
      Yinghai Lu authored
      
      Add vmemmap_alloc_block_buf for mem map only.
      
      It will fallback to the old way if it cannot get a block that big.
      
      Before this patch, when a node have 128g ram installed, memmap are
      split into two parts or more.
      [    0.000000]  [ffffea0000000000-ffffea003fffffff] PMD -> [ffff880100600000-ffff88013e9fffff] on node 1
      [    0.000000]  [ffffea0040000000-ffffea006fffffff] PMD -> [ffff88013ec00000-ffff88016ebfffff] on node 1
      [    0.000000]  [ffffea0070000000-ffffea007fffffff] PMD -> [ffff882000600000-ffff8820105fffff] on node 0
      [    0.000000]  [ffffea0080000000-ffffea00bfffffff] PMD -> [ffff882010800000-ffff8820507fffff] on node 0
      [    0.000000]  [ffffea00c0000000-ffffea00dfffffff] PMD -> [ffff882050a00000-ffff8820709fffff] on node 0
      [    0.000000]  [ffffea00e0000000-ffffea00ffffffff] PMD -> [ffff884000600000-ffff8840205fffff] on node 2
      [    0.000000]  [ffffea0100000000-ffffea013fffffff] PMD -> [ffff884020800000-ffff8840607fffff] on node 2
      [    0.000000]  [ffffea0140000000-ffffea014fffffff] PMD -> [ffff884060a00000-ffff8840709fffff] on node 2
      [    0.000000]  [ffffea0150000000-ffffea017fffffff] PMD -> [ffff886000600000-ffff8860305fffff] on node 3
      [    0.000000]  [ffffea0180000000-ffffea01bfffffff] PMD -> [ffff886030800000-ffff8860707fffff] on node 3
      [    0.000000]  [ffffea01c0000000-ffffea01ffffffff] PMD -> [ffff888000600000-ffff8880405fffff] on node 4
      [    0.000000]  [ffffea0200000000-ffffea022fffffff] PMD -> [ffff888040800000-ffff8880707fffff] on node 4
      [    0.000000]  [ffffea0230000000-ffffea023fffffff] PMD -> [ffff88a000600000-ffff88a0105fffff] on node 5
      [    0.000000]  [ffffea0240000000-ffffea027fffffff] PMD -> [ffff88a010800000-ffff88a0507fffff] on node 5
      [    0.000000]  [ffffea0280000000-ffffea029fffffff] PMD -> [ffff88a050a00000-ffff88a0709fffff] on node 5
      [    0.000000]  [ffffea02a0000000-ffffea02bfffffff] PMD -> [ffff88c000600000-ffff88c0205fffff] on node 6
      [    0.000000]  [ffffea02c0000000-ffffea02ffffffff] PMD -> [ffff88c020800000-ffff88c0607fffff] on node 6
      [    0.000000]  [ffffea0300000000-ffffea030fffffff] PMD -> [ffff88c060a00000-ffff88c0709fffff] on node 6
      [    0.000000]  [ffffea0310000000-ffffea033fffffff] PMD -> [ffff88e000600000-ffff88e0305fffff] on node 7
      [    0.000000]  [ffffea0340000000-ffffea037fffffff] PMD -> [ffff88e030800000-ffff88e0707fffff] on node 7
      
      after patch will get
      [    0.000000]  [ffffea0000000000-ffffea006fffffff] PMD -> [ffff880100200000-ffff88016e5fffff] on node 0
      [    0.000000]  [ffffea0070000000-ffffea00dfffffff] PMD -> [ffff882000200000-ffff8820701fffff] on node 1
      [    0.000000]  [ffffea00e0000000-ffffea014fffffff] PMD -> [ffff884000200000-ffff8840701fffff] on node 2
      [    0.000000]  [ffffea0150000000-ffffea01bfffffff] PMD -> [ffff886000200000-ffff8860701fffff] on node 3
      [    0.000000]  [ffffea01c0000000-ffffea022fffffff] PMD -> [ffff888000200000-ffff8880701fffff] on node 4
      [    0.000000]  [ffffea0230000000-ffffea029fffffff] PMD -> [ffff88a000200000-ffff88a0701fffff] on node 5
      [    0.000000]  [ffffea02a0000000-ffffea030fffffff] PMD -> [ffff88c000200000-ffff88c0701fffff] on node 6
      [    0.000000]  [ffffea0310000000-ffffea037fffffff] PMD -> [ffff88e000200000-ffff88e0701fffff] on node 7
      
      -v2: change buf to vmemmap_buf instead according to Ingo
           also add CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER according to Ingo
      -v3: according to Andrew, use sizeof(name) instead of hard coded 15
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-19-git-send-email-yinghai@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Acked-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      9bdac914