    mm: hwpoison: disable memory error handling on 1GB hugepage · 31286a84
    Naoya Horiguchi authored
    Recently the following BUG was reported:
    
        Injecting memory failure for pfn 0x3c0000 at process virtual address 0x7fe300000000
        Memory failure: 0x3c0000: recovery action for huge page: Recovered
        BUG: unable to handle kernel paging request at ffff8dfcc0003000
        IP: gup_pgd_range+0x1f0/0xc20
        PGD 17ae72067 P4D 17ae72067 PUD 0
        Oops: 0000 [#1] SMP PTI
        ...
        CPU: 3 PID: 5467 Comm: hugetlb_1gb Not tainted 4.15.0-rc8-mm1-abc+ #3
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
    
    You can easily reproduce this by calling madvise(MADV_HWPOISON) twice on
    a 1GB hugepage.  This happens because get_user_pages_fast() is not aware
    of the migration entry on the pud that was created by the first madvise()
    call.
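
The reproduction described above could look roughly like the following userspace sketch. It is not the reporter's actual test program: the function names `hugepage_1gb_flags()` and `poison_1gb_hugepage_twice()` are hypothetical, and the fallback values for `MAP_HUGE_1GB` and `MADV_HWPOISON` are assumed to match the Linux UAPI headers. It also assumes a kernel built with CONFIG_MEMORY_FAILURE, at least one reserved 1GB hugepage, and CAP_SYS_ADMIN. On an affected kernel the second madvise() may oops, so it should only be called inside a disposable VM.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers; values assumed from the Linux UAPI. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)   /* log2(1GB) == 30 */
#endif
#ifndef MADV_HWPOISON
#define MADV_HWPOISON 100
#endif

#define GB_HUGEPAGE_SIZE (1UL << 30)

/* mmap() flags for an anonymous mapping backed by a 1GB hugetlb page. */
int hugepage_1gb_flags(void)
{
    return MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB;
}

/*
 * Hypothetical reproducer: map one 1GB hugepage, fault it in, then inject
 * a memory error into it twice.  Returns 0 if both madvise() calls
 * returned successfully, -1 if the mapping or either call failed.
 * WARNING: this poisons real memory; do not call it on a machine you
 * care about.
 */
int poison_1gb_hugepage_twice(void)
{
    char *p = mmap(NULL, GB_HUGEPAGE_SIZE, PROT_READ | PROT_WRITE,
                   hugepage_1gb_flags(), -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap (is a 1GB hugepage reserved?)");
        return -1;
    }

    p[0] = 1;   /* touch the mapping so the hugepage is actually faulted in */

    for (int i = 1; i <= 2; i++) {
        /* The first call injects the error; the second call is the one
         * that walks the page table again via get_user_pages_fast(). */
        if (madvise(p, 4096, MADV_HWPOISON) != 0) {
            fprintf(stderr, "madvise #%d: %s\n", i, strerror(errno));
            munmap(p, GB_HUGEPAGE_SIZE);
            return -1;
        }
    }

    munmap(p, GB_HUGEPAGE_SIZE);
    return 0;
}
```

Calling `poison_1gb_hugepage_twice()` from a small `main()` as root is enough to exercise the path; without a reserved 1GB hugepage the mmap() simply fails and nothing is poisoned.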
    
    I think that the conversion to a pud-aligned migration entry is working,
    but other MM code that walks page tables isn't prepared for it.  We need
    some time and effort to make all this work properly, so thi...
memory-failure.c 48 KB