    mm/mempolicy.c: convert the shared_policy lock to a rwlock · 4a8c7bb5
    Nathan Zimmer authored
    When running the SPECint_rate gcc benchmark on some very large boxes, we
    noticed that the system was spending a lot of time in
    mpol_shared_policy_lookup().  The gamess benchmark also shows it, and it
    is what I mostly used to chase down the issue, since I found its setup
    easier.
    
    To be clear, the binaries were on tmpfs because of disk I/O requirements.
    We then used text replication to avoid icache misses and to keep all the
    copies from banging on the memory where the instruction code resides.
    As a result, we hit a bottleneck in mpol_shared_policy_lookup(), since
    lookups are serialised by the shared_policy lock.
    
    I have only reproduced this on very large (3k+ core) boxes.  The
    problem starts showing up at just a few hundred ranks and gets worse as
    the rank count grows, until it threatens to livelock.  For example, on
    the gamess benchmark at 128 ranks this area consumes only ~1% of the
    time, at 512 ranks it consumes nearly 13%, and at 2k ranks it is o...