1. 07 Apr, 2020 1 commit
  2. 02 Apr, 2020 5 commits
    • hugetlb_cgroup: add accounting for shared mappings · 075a61d0
      Mina Almasry authored
      
      For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives
      in the resv_map entries, in file_region->reservation_counter.
      
      After a call to region_chg, we charge the appropriate hugetlb_cgroup, and
      if successful, we pass on the hugetlb_cgroup info to a follow up
      region_add call.  When a file_region entry is added to the resv_map via
      region_add, we put the pointer to that cgroup in
      file_region->reservation_counter.  If charging doesn't succeed, we report
      the error to the caller, so that the kernel fails the reservation.
      
      On region_del, which is when the hugetlb memory is unreserved, we also
      uncharge the file_region->reservation_counter.
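      
      The flow above can be sketched as a userspace toy model.  This is not
      the kernel code: toy_counter, toy_region, toy_reserve and toy_unreserve
      are hypothetical names, and the counter is a plain integer rather than
      a page_counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cgroup's reservation counter (hypothetical). */
struct toy_counter {
    long usage;
    long limit;
};

/* Toy stand-in for a file_region entry in a resv_map (hypothetical). */
struct toy_region {
    long from, to;                            /* reserved range, in huge pages */
    struct toy_counter *reservation_counter;  /* counter to uncharge later */
};

/* Charge nr pages; fail without side effects if the limit would be exceeded. */
bool toy_try_charge(struct toy_counter *c, long nr)
{
    if (c->usage + nr > c->limit)
        return false;
    c->usage += nr;
    return true;
}

/* Reserve [from, to): charge first, then record the counter in the region. */
int toy_reserve(struct toy_region *r, struct toy_counter *c, long from, long to)
{
    if (!toy_try_charge(c, to - from))
        return -1;  /* the caller fails the reservation */
    r->from = from;
    r->to = to;
    r->reservation_counter = c;
    return 0;
}

/* Unreserve: uncharge whichever counter the region remembered. */
void toy_unreserve(struct toy_region *r)
{
    if (r->reservation_counter) {
        r->reservation_counter->usage -= r->to - r->from;
        r->reservation_counter = NULL;
    }
}
```

      Storing the counter pointer in the region is what lets the unreserve
      path uncharge the right cgroup without consulting the current task.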
      
      [akpm@linux-foundation.org: forward declare struct file_region]
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: http://lkml.kernel.org/r/20200211213128.73302-5-almasrymina@google.com
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb_cgroup: add reservation accounting for private mappings · e9fe92ae
      Mina Almasry authored
      
      Normally the pointer to the cgroup to uncharge hangs off the struct page,
      and gets queried when it's time to free the page.  With hugetlb_cgroup
      reservations, this is not possible, because a page can be reserved by one
      task and actually faulted in by another task.
      
      The best place to put the hugetlb_cgroup pointer to uncharge for
      reservations is in the resv_map.  But, because the resv_map has different
      semantics for private and shared mappings, the code path to
      charge/uncharge shared and private mappings is different.  This patch
      implements charging and uncharging for private mappings.
      
      For private mappings, the counter to uncharge is in
      resv_map->reservation_counter.  On initializing the resv_map this is set
      to NULL.  On reservation of a region in a private mapping, the task's
      hugetlb_cgroup is charged and the hugetlb_cgroup is placed in
      resv_map->reservation_counter.
      
      On hugetlb_vm_op_close, we uncharge resv_map->reservation_counter.
      
      [akpm@linux-foundation.org: forward declare struct resv_map]
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: http://lkml.kernel.org/r/20200211213128.73302-3-almasrymina@google.com
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb_cgroup: fix hugetlb_cgroup migration · 9808895e
      Mina Almasry authored
      
      Commit c32300516047 ("hugetlb_cgroup: add interface for charge/uncharge
      hugetlb reservations") mistakenly doesn't handle the migration of *both*
      the reservation hugetlb_cgroup and the fault hugetlb_cgroup correctly.
      
      What should happen is that both cgroups should be queried from the old
      page, then both set to NULL on the old page, then both inserted into the
      new page.
      
      The mistake also creates the following warning:
      
      mm/hugetlb_cgroup.c: In function 'hugetlb_cgroup_migrate':
      mm/hugetlb_cgroup.c:777:25: warning: variable 'h_cg' set but not used
      [-Wunused-but-set-variable]
        struct hugetlb_cgroup *h_cg;
                               ^~~~
      
      The solution is to add the missing steps, namely setting the reservation
      hugetlb_cgroup to NULL on the old page, and setting the fault
      hugetlb_cgroup on the new page.
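      
      The intended sequence can be sketched in a few lines of C.  This is a
      minimal illustration under assumed types, not the actual
      hugetlb_cgroup_migrate(): toy_page, toy_hugetlb_cgroup and toy_migrate
      are hypothetical, and locking is omitted.

```c
#include <stddef.h>

struct toy_hugetlb_cgroup { int id; };  /* hypothetical stand-in */

/* Toy page carrying both cgroup pointers (hypothetical layout). */
struct toy_page {
    struct toy_hugetlb_cgroup *fault_cg;  /* charged at fault time */
    struct toy_hugetlb_cgroup *rsvd_cg;   /* charged at reservation time */
};

/* Move both cgroup pointers from the old page to the new page. */
void toy_migrate(struct toy_page *oldp, struct toy_page *newp)
{
    /* 1. Query both cgroups from the old page. */
    struct toy_hugetlb_cgroup *h_cg = oldp->fault_cg;
    struct toy_hugetlb_cgroup *h_cg_rsvd = oldp->rsvd_cg;

    /* 2. Set both to NULL on the old page. */
    oldp->fault_cg = NULL;
    oldp->rsvd_cg = NULL;

    /* 3. Insert both into the new page. */
    newp->fault_cg = h_cg;
    newp->rsvd_cg = h_cg_rsvd;
}
```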
      
      Fixes: c32300516047 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations")
      Reported-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: http://lkml.kernel.org/r/20200218194727.46995-1-almasrymina@google.com
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations · 1adc4d41
      Mina Almasry authored
      
      Augments hugetlb_cgroup_charge_cgroup to be able to charge hugetlb usage
      or hugetlb reservation counter.
      
      Adds a new interface to uncharge a hugetlb_cgroup counter via
      hugetlb_cgroup_uncharge_counter.
      
      Integrates the counter with hugetlb_cgroup, via hugetlb_cgroup_init,
      hugetlb_cgroup_have_usage, and hugetlb_cgroup_css_offline.
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: http://lkml.kernel.org/r/20200211213128.73302-2-almasrymina@google.com
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb_cgroup: add hugetlb_cgroup reservation counter · cdc2fcfe
      Mina Almasry authored
      These counters will track hugetlb reservations rather than hugetlb memory
      faulted in.  This patch only adds the counter; following patches add the
      charging and uncharging of the counter.
      
      This is patch 1 of a 9-patch series.
      
      Problem:
      
      Currently tasks attempting to reserve more hugetlb memory than is
      available get a failure at mmap/shmget time.  This is thanks to Hugetlbfs
      Reservations [1].  However, if a task attempts to reserve more hugetlb
      memory than its hugetlb_cgroup limit allows, the kernel will allow the
      mmap/shmget call, but will SIGBUS the task when it attempts to fault in
      the excess memory.
      
      We have users hitting their hugetlb_cgroup limits and thus we've been
      looking at this failure mode.  We'd like to improve this behavior such
      that users violating the hugetlb_cgroup limits get an error on mmap/shmget
      time, rather than getting SIGBUS'd when they try to fault the excess
      memory in.  This gives the user an opportunity to fall back more gracefully
      to non-hugetlbfs memory, for example.
      
      The underlying problem is that today's hugetlb_cgroup accounting happens
      at hugetlb memory *fault* time, rather than at *reservation* time.  Thus,
      enforcing the hugetlb_cgroup limit only happens at fault time, and the
      offending task gets SIGBUS'd.
      
      Proposed Solution:
      
      A new page counter named
      'hugetlb.xMB.rsvd.[limit|usage|max_usage]_in_bytes'. This counter has
      slightly different semantics than
      'hugetlb.xMB.[limit|usage|max_usage]_in_bytes':
      
      - While usage_in_bytes tracks all *faulted* hugetlb memory,
        rsvd.usage_in_bytes tracks all *reserved* hugetlb memory and hugetlb
        memory faulted in without a prior reservation.
      
      - If a task attempts to reserve more memory than limit_in_bytes allows,
        the kernel will allow it to do so.  But if a task attempts to reserve
        more memory than rsvd.limit_in_bytes, the kernel will fail this
        reservation.
      
      This proposal is implemented in this patch series, with tests to verify
      functionality and show the usage.
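      
      The difference between the two limits can be illustrated with a toy
      model.  This sketch makes simplifying assumptions: toy_hcg, toy_reserve
      and toy_fault are hypothetical names, counts are in huge pages, and the
      case of memory faulted in without a prior reservation is not modelled.

```c
#include <errno.h>

/* Toy model of one hugetlb_cgroup with both counters, in huge pages. */
struct toy_hcg {
    long usage, limit;            /* fault-time accounting */
    long rsvd_usage, rsvd_limit;  /* reservation-time accounting */
};

/* mmap/shmget time: fail the reservation if the rsvd limit is exceeded. */
int toy_reserve(struct toy_hcg *cg, long nr)
{
    if (cg->rsvd_usage + nr > cg->rsvd_limit)
        return -ENOMEM;
    cg->rsvd_usage += nr;
    return 0;
}

/* Fault time: exceeding the fault limit is where a task would get SIGBUS. */
int toy_fault(struct toy_hcg *cg, long nr)
{
    if (cg->usage + nr > cg->limit)
        return -ENOMEM;  /* stands in for SIGBUS in this model */
    cg->usage += nr;
    return 0;
}
```

      With only the fault limit set low, toy_reserve() succeeds and the
      failure surfaces later in toy_fault(); with the rsvd limit set low, the
      failure surfaces immediately in toy_reserve().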
      
      Alternatives considered:
      
      1. A new cgroup, instead of only a new page_counter attached to the
         existing hugetlb_cgroup.  Adding a new cgroup seemed like a lot of code
         duplication with hugetlb_cgroup.  Keeping hugetlb related page counters
         under hugetlb_cgroup seemed cleaner as well.
      
      2. Instead of adding a new counter, we considered adding a sysctl that
         modifies the behavior of hugetlb.xMB.[limit|usage]_in_bytes, to do
         accounting at reservation time rather than fault time.  Adding a new
         page_counter seems better as userspace could, if it wants, choose to
         enforce different cgroups differently: one via limit_in_bytes, and
         another via rsvd.limit_in_bytes.  This could be very useful if you're
         transitioning how hugetlb memory is partitioned on your system one
         cgroup at a time, for example.  Also, someone may find usage for both
         limit_in_bytes and rsvd.limit_in_bytes concurrently, and this approach
         gives them the option to do so.
      
      Testing:
      - Added tests passing.
      - Used libhugetlbfs for regression testing.
      
      [1]: https://www.kernel.org/doc/html/latest/vm/hugetlbfs_reserv.html
      
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20200211213128.73302-1-almasrymina@google.com
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 29 Mar, 2020 1 commit
  4. 16 Dec, 2019 1 commit
    • mm: hugetlb controller for cgroups v2 · faced7e0
      Giuseppe Scrivano authored
      
      In the effort to support cgroups v2 in Kubernetes, I stumbled on
      the lack of a hugetlb controller.
      
      When the controller is enabled, it exposes four new files for each
      hugetlb size on non-root cgroups:
      
      - hugetlb.<hugepagesize>.current
      - hugetlb.<hugepagesize>.max
      - hugetlb.<hugepagesize>.events
      - hugetlb.<hugepagesize>.events.local
      
      The differences with the legacy hierarchy are in the file names and
      using the value "max" instead of "-1" to disable a limit.
      
      The file .limit_in_bytes is renamed to .max.
      
      The file .usage_in_bytes is renamed to .current.
      
      .failcnt is not provided as a single file anymore, but its value can
      be read through the new flat-keyed files .events and .events.local,
      through the "max" key.
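      
      A userspace reader of these files has to handle both conventions.  The
      sketch below parses file contents that are already in memory;
      toy_parse_max and toy_flat_keyed_get are hypothetical helper names, and
      the exact file contents shown in the usage note are assumed for
      illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse the contents of a ".max" file: either the literal "max" (no limit)
 * or a decimal byte count.  Returns 0 on success, -1 on malformed input.
 */
int toy_parse_max(const char *s, int *unlimited, unsigned long long *bytes)
{
    char *end;

    if (strncmp(s, "max", 3) == 0 && (s[3] == '\0' || s[3] == '\n')) {
        *unlimited = 1;
        *bytes = 0;
        return 0;
    }
    if (s[0] < '0' || s[0] > '9')
        return -1;
    *unlimited = 0;
    *bytes = strtoull(s, &end, 10);
    if (*end != '\0' && *end != '\n')
        return -1;
    return 0;
}

/*
 * Look up one key in flat-keyed text ("key value\n" per line).
 * Returns 0 and stores the value on success, -1 if the key is absent.
 */
int toy_flat_keyed_get(const char *text, const char *key,
                       unsigned long long *val)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
            *val = strtoull(line + klen + 1, NULL, 10);
            return 0;
        }
        line = strchr(line, '\n');
        if (line)
            line++;  /* step past the newline to the next line */
    }
    return -1;
}
```

      For example, toy_flat_keyed_get("max 3\n", "max", &v) stores 3 in v,
      which corresponds to what .failcnt used to report.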
      Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  5. 16 Nov, 2019 1 commit
  6. 24 Sep, 2019 1 commit
  7. 08 Jun, 2018 1 commit
  8. 21 May, 2016 1 commit
    • mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size · 297880f4
      David Rientjes authored
      
      The page_counter rounds limits down to page size values.  This makes
      sense, except in the case of hugetlb_cgroup where it's not possible to
      charge partial hugepages.  If the hugetlb_cgroup margin is less than the
      hugepage size being charged, it will fail as expected.
      
      Round the hugetlb_cgroup limit down to hugepage size, since it is the
      effective limit of the cgroup.
      
      For consistency, round down PAGE_COUNTER_MAX as well when a
      hugetlb_cgroup is created: this prevents error reports when a user
      cannot restore the value to the kernel default.
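      
      The rounding itself is simple arithmetic.  A minimal sketch, assuming
      the limit is expressed in bytes and the hugepage size is a power of two
      (toy_round_down_to_hugepage is a hypothetical helper, not the kernel's
      function):

```c
/*
 * Round a byte limit down to a multiple of the hugepage size.
 * Assumes hpage_size is a power of two, so masking the low bits suffices.
 */
unsigned long long toy_round_down_to_hugepage(unsigned long long limit,
                                              unsigned long long hpage_size)
{
    return limit & ~(hpage_size - 1);
}
```

      With 2 MB hugepages, a 5 MB limit becomes an effective 4 MB limit, and
      any limit below 2 MB becomes 0.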
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 07 Nov, 2015 1 commit
    • mm: make compound_head() robust · 1d798ca3
      Kirill A. Shutemov authored
      Hugh has pointed out that a compound_head() call can be unsafe in some
      contexts. Here is one example:
      
      	CPU0					CPU1
      
      isolate_migratepages_block()
        page_count()
          compound_head()
            !!PageTail() == true
      					put_page()
      					  tail->first_page = NULL
            head = tail->first_page
      					alloc_pages(__GFP_COMP)
      					   prep_compound_page()
      					     tail->first_page = head
      					     __SetPageTail(p);
            !!PageTail() == true
          <head == NULL dereferencing>
      
      The race is purely theoretical. I don't think it's possible to trigger it in
      practice. But who knows.
      
      We can fix the race by changing how we encode PageTail() and compound_head()
      within struct page so that they can be updated in one shot.
      
      The patch introduces page->compound_head into third double word block in
      front of compound_dtor and compound_order. Bit 0 encodes PageTail() and
      the rest bits are pointer to head page if bit zero is set.
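      
      The encoding can be sketched in userspace C as a tagged pointer.  This
      is an illustration only: toy_page and the toy_* helpers are
      hypothetical, and the real code must read the word exactly once with
      the appropriate kernel primitive, which a plain load only approximates.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy page: one word holds both the tail flag and the head pointer. */
struct toy_page {
    uintptr_t compound_head;  /* bit 0: tail flag; other bits: head pointer */
};

/* Mark a page as a tail page and record its head in a single store. */
void toy_set_compound_head(struct toy_page *tail, struct toy_page *head)
{
    tail->compound_head = (uintptr_t)head | 1u;
}

/* Clear both the flag and the pointer in a single store. */
void toy_clear_compound_head(struct toy_page *page)
{
    page->compound_head = 0;
}

/* A page is a tail page if bit 0 of the word is set. */
int toy_page_tail(const struct toy_page *page)
{
    return (int)(page->compound_head & 1);
}

/* Return the head page, deciding from one read of the word. */
struct toy_page *toy_compound_head(struct toy_page *page)
{
    uintptr_t head = page->compound_head;  /* single read */

    if (head & 1)
        return (struct toy_page *)(head - 1);
    return page;
}
```

      Because the flag and the pointer live in the same word, a reader can
      never observe the flag set together with a stale or NULL head pointer.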
      
      The patch moves page->pmd_huge_pte out of the word, just in case an
      architecture defines pgtable_t as something that can have bit 0
      set.
      
      hugetlb_cgroup uses page->lru.next in the second tail page to store a
      pointer to struct hugetlb_cgroup. The patch switches it to use page->private
      in the second tail page instead. The space is free since ->first_page is
      removed from the union.
      
      The patch also opens the possibility of removing the HUGETLB_CGROUP_MIN_ORDER
      limitation, since there's now space in the first tail page to store the struct
      hugetlb_cgroup pointer. But that's out of scope for this patch.
      
      That means page->compound_head shares storage space with:
      
       - page->lru.next;
       - page->next;
       - page->rcu_head.next;
      
      That's too long a list to be absolutely sure, but it looks like nobody uses
      bit 0 of the word.
      
      page->rcu_head.next is guaranteed[1] to have bit 0 clean as long as we use
      call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But a future
      call_rcu_lazy() is not allowed, as it makes use of the bit and we could
      get a false positive PageTail().
      
      [1] http://lkml.kernel.org/g/20150827163634.GD4029@linux.vnet.ibm.com
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 06 Nov, 2015 1 commit
  11. 12 Feb, 2015 1 commit
  12. 11 Dec, 2014 1 commit
  13. 29 Aug, 2014 1 commit
  14. 14 Aug, 2014 1 commit
  15. 15 Jul, 2014 1 commit
    • cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() · 2cf669a5
      Tejun Heo authored
      
      Currently, cftypes added by cgroup_add_cftypes() are used for both the
      unified default hierarchy and legacy ones and subsystems can mark each
      file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to
      appear only on one of them.  This is quite hairy and error-prone.
      Also, we may end up exposing interface files to the default hierarchy
      without thinking it through.
      
      cgroup_subsys will grow two separate cftype addition functions and
      apply each only on the hierarchies of the matching type.  This will
      allow organizing cftypes in a lot clearer way and encourage subsystems
      to scrutinize the interface which is being exposed in the new default
      hierarchy.
      
      In preparation, this patch adds cgroup_add_legacy_cftypes() which
      currently is a simple wrapper around cgroup_add_cftypes() and replaces
      all cgroup_add_cftypes() usages with it.
      
      While at it, this patch drops a completely spurious return from
      __hugetlb_cgroup_file_init().
      
      This patch doesn't introduce any functional differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
  16. 16 May, 2014 1 commit
    • cgroup: remove css_parent() · 5c9d535b
      Tejun Heo authored
      
      cgroup in general is moving towards using cgroup_subsys_state as the
      fundamental structural component and css_parent() was introduced to
      convert from using cgroup->parent to css->parent.  It was quite some
      time ago and we're moving forward with making css more prominent.
      
      This patch drops the trivial wrapper css_parent() and lets the users
      dereference css->parent.  While at it, explicitly mark fields of css
      which are public and immutable.
      
      v2: New usage from device_cgroup.c converted.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
  17. 13 May, 2014 3 commits
    • cgroup: replace cftype->trigger() with cftype->write() · 6770c64e
      Tejun Heo authored
      
      cftype->trigger() is pointless.  It's trivial to ignore the input
      buffer from a regular ->write() operation.  Convert all ->trigger()
      users to ->write() and remove ->trigger().
      
      This patch doesn't introduce any visible behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
    • cgroup: replace cftype->write_string() with cftype->write() · 451af504
      Tejun Heo authored
      
      Convert all cftype->write_string() users to the new cftype->write()
      which maps directly to kernfs write operation and has full access to
      kernfs and cgroup contexts.  The conversions are mostly mechanical.
      
      * @css and @cft are accessed using of_css() and of_cft() accessors
        respectively instead of being specified as arguments.
      
      * Should return @nbytes on success instead of 0.
      
      * @buf is not trimmed automatically.  Trim if necessary.  Note that
        blkcg and netprio don't need this as the parsers already handle
        whitespaces.
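      
      The last two points can be shown with a userspace sketch of a
      write-style handler.  The names toy_strip and toy_write are
      hypothetical; a real handler would use the kernel's own string helpers
      and obtain @css and @cft through the accessors mentioned above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the trimmed start. */
char *toy_strip(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/*
 * Toy write-style handler: parse a decimal value from buf.
 * Returns nbytes on success (not 0), or -1 on a parse error.
 */
long toy_write(char *buf, size_t nbytes, unsigned long long *val)
{
    char *end;

    buf = toy_strip(buf);  /* the buffer is not trimmed for us */
    if (*buf == '\0')
        return -1;
    *val = strtoull(buf, &end, 10);
    if (*end != '\0')
        return -1;
    return (long)nbytes;   /* report the whole input as consumed */
}
```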
      
      cftype->write_string() has no users left after the conversions and is
      removed.
      
      While at it, remove unnecessary local variable @p in
      cgroup_subtree_control_write() and stale comment about
      CGROUP_LOCAL_BUFFER_SIZE in cgroup_freezer.c.
      
      This patch doesn't introduce any visible behavior changes.
      
      v2: netprio was missing from conversion.  Converted.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Aristeu Rozanski <arozansk@redhat.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: "David S. Miller" <davem@davemloft.net>
    • cgroup: rename css_tryget*() to css_tryget_online*() · ec903c0c
      Tejun Heo authored
      
      Unlike the more usual refcnting, what css_tryget() provides is the
      distinction between online and offline csses instead of protection
      against upping a refcnt which already reached zero.  cgroup is
      planning to provide actual tryget which fails if the refcnt already
      reached zero.  Let's rename the existing trygets so that they clearly
      indicate that they're about onliness.
      
      I thought about keeping the existing names as-is and introducing new
      names for the planned actual tryget; however, given that each
      controller participates in the synchronization of the online state, it
      seems worthwhile to make it explicit that these functions are about
      on/offline state.
      
      Rename css_tryget() to css_tryget_online() and css_tryget_from_dir()
      to css_tryget_online_from_dir().  This is pure rename.
      
      v2: cgroup_freezer grew new usages of css_tryget().  Update
          accordingly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
  18. 19 Mar, 2014 1 commit
    • cgroup: drop const from @buffer of cftype->write_string() · 4d3bb511
      Tejun Heo authored
      
      cftype->write_string() just passes on the writeable buffer from kernfs
      and there's no reason to add const restriction on the buffer.  The
      only thing const achieves is unnecessarily complicating parsing of the
      buffer.  Drop const from @buffer.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
  19. 08 Feb, 2014 1 commit
    • cgroup: clean up cgroup_subsys names and initialization · 073219e9
      Tejun Heo authored
      cgroup_subsys is a bit messier than it needs to be.
      
      * The name of a subsys can be different from its internal identifier
        defined in cgroup_subsys.h.  Most subsystems use the matching name
        but three - cpu, memory and perf_event - use different ones.
      
      * cgroup_subsys_id enums are postfixed with _subsys_id and each
        cgroup_subsys is postfixed with _subsys.  cgroup.h is widely
        included throughout various subsystems; it doesn't and shouldn't
        have a claim on such generic names which don't have any qualifier
        indicating that they belong to cgroup.
      
      * cgroup_subsys->subsys_id should always equal the matching
        cgroup_subsys_id enum; however, we require each controller to
        initialize it and then BUG if they don't match, which is a bit
        silly.
      
      This patch cleans up cgroup_subsys names and initialization by doing
      the following.
      
      * cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
        cgroup_subsys with _cgrp_subsys.
      
      * With the above, renaming subsys identifiers to match the userland
        visible names doesn't cause any naming conflicts.  All non-matching
        identifiers are renamed to match the official names.
      
        cpu_cgroup -> cpu
        mem_cgroup -> memory
        perf -> perf_event
      
      * controllers no longer need to initialize ->subsys_id and ->name.
        They're generated in cgroup core and set automatically during boot.
      
      * Redundant cgroup_subsys declarations removed.
      
      * While updating BUG_ON()s in cgroup_init_early(), convert them to
        WARN()s.  BUGging that early during boot is stupid - the kernel
        can't print anything, even through serial console and the trap
        handler doesn't even link stack frame properly for back-tracing.
      
      This patch doesn't introduce any behavior changes.
      
      v2: Rebased on top of fe1217c4 ("net: net_cls: move cgroupfs
          classid handling into core").
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Acked-by: Ingo Molnar <mingo@redhat.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Thomas Graf <tgraf@suug.ch>
  20. 24 Jan, 2014 1 commit
  21. 05 Dec, 2013 1 commit
    • hugetlb_cgroup: convert away from cftype->read() · 716f479d
      Tejun Heo authored
      
      In preparation of conversion to kernfs, cgroup file handling is being
      consolidated so that it can be easily mapped to the seq_file based
      interface of kernfs.
      
      All users of cftype->read() can be easily served, usually better, by
      seq_file and other methods.  Update hugetlb_cgroup_read() to return
      u64 instead of printing itself and rename it to
      hugetlb_cgroup_read_u64().
      
      This patch doesn't make any visible behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
  22. 09 Aug, 2013 6 commits
    • cgroup: pass around cgroup_subsys_state instead of cgroup in file methods · 182446d0
      Tejun Heo authored
      
      cgroup is currently in the process of transitioning to using struct
      cgroup_subsys_state * as the primary handle instead of struct cgroup.
      Please see the previous commit which converts the subsystem methods
      for rationale.
      
      This patch converts all cftype file operations to take @css instead of
      @cgroup.  cftypes for the cgroup core files don't have their subsystem
      pointer set.  These will automatically use the dummy_css added by the
      previous patch and can be converted the same way.
      
      Most subsystem conversions are straightforward but there are some
      interesting ones.
      
      * freezer: update_if_frozen() is also converted to take @css instead
        of @cgroup for consistency.  This will make the code look simpler
        too once iterators are converted to use css.
      
      * memory/vmpressure: mem_cgroup_from_css() needs to be exported to
        vmpressure while mem_cgroup_from_cont() can be made static.
        Updated accordingly.
      
      * cpu: cgroup_tg() doesn't have any user left.  Removed.
      
      * cpuacct: cgroup_ca() doesn't have any user left.  Removed.
      
      * hugetlb: hugetlb_cgroup_from_cgroup() doesn't have any user left.
        Removed.
      
      * net_cls: cgrp_cls_state() doesn't have any user left.  Removed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods · eb95419b
      Tejun Heo authored
      cgroup is currently in the process of transitioning to using struct
      cgroup_subsys_state * as the primary handle instead of struct cgroup *
      in subsystem implementations for the following reasons.
      
      * With unified hierarchy, subsystems will be dynamically bound and
        unbound from cgroups and thus css's (cgroup_subsys_state) may be
        created and destroyed dynamically over the lifetime of a cgroup,
        which is different from the current state where all css's are
        allocated and destroyed together with the associated cgroup.  This
        in turn means that cgroup_css() should be synchronized and may
        return NULL, making it more cumbersome to use.
      
      * Differing levels of per-subsystem granularity in the unified
        hierarchy mean that the task and descendant iterators should behave
        differently depending on the specific subsystem the iteration is
        being performed for.
      
      * In the majority of cases, a subsystem only cares about its part in
        the cgroup hierarchy, i.e. the hierarchy of css's.  Subsystem methods
        often obtain the matching css pointer from the cgroup and don't
        bother with the cgroup pointer itself.  Passing around css fits
        much better.
      
      This patch converts all cgroup_subsys methods to take @css instead of
      @cgroup.  The conversions are mostly straight-forward.  A few
      noteworthy changes are
      
      * ->css_alloc() now takes css of the parent cgroup rather than the
        pointer to the new cgroup as the css for the new cgroup doesn't
        exist yet.  Knowing the parent css is enough for all the existing
        subsystems.
      
      * In kernel/cgroup.c::offline_css(), unnecessary open coded css
        dereference is replaced with local variable access.
      
      This patch shouldn't cause any behavior differences.
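
The ->css_alloc() change described above can be pictured with a small userspace sketch. It is not the kernel code: `struct demo_cgroup`, `css_demo()`, `demo_css_alloc()`, `demo_css_free()` and `DEMO_DEFAULT_LIMIT` are hypothetical names, `struct cgroup_subsys_state` is reduced to a single field, and the callback returns NULL on allocation failure where a real controller would return an error pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_DEFAULT_LIMIT 100L	/* hypothetical default for the root */

/* Heavily simplified stand-in for the kernel's struct cgroup_subsys_state. */
struct cgroup_subsys_state {
	unsigned long flags;
};

/* Hypothetical controller state with css embedded as the first member. */
struct demo_cgroup {
	struct cgroup_subsys_state css;
	long limit;
};

static_assert(offsetof(struct demo_cgroup, css) == 0, "css must come first");

/* Userspace equivalent of container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* NULL-safe accessor from css to the controller's own structure. */
struct demo_cgroup *css_demo(struct cgroup_subsys_state *css)
{
	return css ? container_of(css, struct demo_cgroup, css) : NULL;
}

/*
 * New-style allocation callback: it receives the css of the parent
 * (NULL for the root) because the css of the new cgroup does not exist yet.
 */
struct cgroup_subsys_state *demo_css_alloc(struct cgroup_subsys_state *parent_css)
{
	struct demo_cgroup *parent = css_demo(parent_css);
	struct demo_cgroup *dc = calloc(1, sizeof(*dc));

	if (!dc)
		return NULL;	/* sketch only; see the note above */

	/* Everything the new state needs is reachable through the parent css. */
	dc->limit = parent ? parent->limit : DEMO_DEFAULT_LIMIT;
	return &dc->css;
}

/* Matching free callback, also keyed on the css. */
void demo_css_free(struct cgroup_subsys_state *css)
{
	free(css_demo(css));
}
```

In this sketch the callback never touches a cgroup pointer, which is the point of the conversion: the parent css alone is enough to set up the child.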
      
      v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
          with local variable @css as suggested by Li Zefan.
      
          Rebased on top of new for-3.12 which includes for-3.11-fixes so
          that ->css_free() invocation added by da0a12ca ("cgroup: fix a
          leak when percpu_ref_init() fails") is converted too.  Suggested
          by Li Zefan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      eb95419b
    • Tejun Heo's avatar
      cgroup: add css_parent() · 63876986
      Tejun Heo authored
      
      Currently, controllers have to explicitly follow the cgroup hierarchy
      to find the parent of a given css.  cgroup is moving towards using
      cgroup_subsys_state as the main controller interface construct, so
      let's provide a way to climb the hierarchy using just csses.
      
      This patch implements css_parent() which, given a css, returns its
      parent.  The function is guaranteed to return a valid non-NULL parent
      css as long as the target css is not at the top of the hierarchy.
      
      freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
      are converted to use css_parent() instead of accessing cgroup->parent
      directly.
      
      * __parent_ca() is dropped from cpuacct and its usage is replaced with
        parent_ca().  The only difference between the two was NULL test on
        cgroup->parent which is now embedded in css_parent() making the
        distinction moot.  Note that eventually a css->parent field will be
        added to css and the NULL check in css_parent() will go away.
      
      This patch shouldn't cause any behavior differences.
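
A minimal userspace sketch of the idea, assuming heavily simplified stand-ins for `struct cgroup` and `struct cgroup_subsys_state`. The `subsys_id` field stored directly in the css, `DEMO_SUBSYS_COUNT` and `css_depth()` are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SUBSYS_COUNT 2	/* hypothetical; the kernel has more subsystems */

struct cgroup;

/* Simplified stand-in for the kernel's struct cgroup_subsys_state. */
struct cgroup_subsys_state {
	struct cgroup *cgroup;	/* cgroup this css belongs to */
	int subsys_id;		/* index into cgroup->subsys[] (simplified) */
};

/* Simplified stand-in for struct cgroup. */
struct cgroup {
	struct cgroup *parent;	/* NULL at the top of the hierarchy */
	struct cgroup_subsys_state *subsys[DEMO_SUBSYS_COUNT];
};

/*
 * Return the css of the same subsystem in the parent cgroup, or NULL
 * when @css is already at the top of the hierarchy.
 */
struct cgroup_subsys_state *css_parent(struct cgroup_subsys_state *css)
{
	struct cgroup *parent_cgrp;

	assert(css != NULL && css->cgroup != NULL);
	parent_cgrp = css->cgroup->parent;

	return parent_cgrp ? parent_cgrp->subsys[css->subsys_id] : NULL;
}

/* Hypothetical helper: climb the hierarchy using only csses. */
int css_depth(struct cgroup_subsys_state *css)
{
	int depth = 0;

	while ((css = css_parent(css)) != NULL)
		depth++;
	return depth;
}
```

Because the NULL test on the parent cgroup sits inside css_parent(), a caller such as the hypothetical css_depth() needs no separate root check, which mirrors why __parent_ca() became redundant.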
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      63876986
    • Tejun Heo's avatar
      cgroup: add/update accessors which obtain subsys specific data from css · a7c6d554
      Tejun Heo authored
      
      css (cgroup_subsys_state) is usually embedded in a subsys specific
      data structure.  Subsystems either use container_of() directly to cast
      from css to such a data structure or have an accessor function wrapping
      such a cast.  As cgroup as a whole is moving towards using css as the
      main interface handle, add and update such accessors to ease dealing
      with css's.
      
      All accessors explicitly handle NULL input and return NULL in those
      cases.  While this looks like an extra branch in the code, as all
      controller-specific data structures have css as the first field, the
      casting doesn't involve any offsetting and the compiler can trivially
      optimize out the branch.
      
      * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
        accessor.  Added.
      
      * memory, hugetlb and devices already had one but didn't explicitly
        handle NULL input.  Updated.
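
The accessor pattern described above might look roughly like the following userspace sketch. `struct demo_cgroup`, `css_demo()` and `demo_usage()` are hypothetical names, `struct cgroup_subsys_state` is reduced to one field, and `container_of()` is redefined locally since the kernel macro is not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Heavily simplified stand-in for the kernel's struct cgroup_subsys_state. */
struct cgroup_subsys_state {
	unsigned long flags;
};

/* Hypothetical controller structure; css is deliberately the first field. */
struct demo_cgroup {
	struct cgroup_subsys_state css;
	long usage;
};

/* With css at offset 0 the cast needs no pointer adjustment. */
static_assert(offsetof(struct demo_cgroup, css) == 0, "css must come first");

/* Userspace equivalent of container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Accessor that explicitly maps NULL input to NULL output. */
struct demo_cgroup *css_demo(struct cgroup_subsys_state *css)
{
	return css ? container_of(css, struct demo_cgroup, css) : NULL;
}

/* Example caller: tolerates a missing css without a separate check on css. */
long demo_usage(struct cgroup_subsys_state *css)
{
	struct demo_cgroup *dc = css_demo(css);

	return dc ? dc->usage : 0;
}
```

Since the offset is zero, both arms of the conditional in css_demo() yield the same address for NULL input, which is why the message expects the compiler to drop the branch.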
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      a7c6d554
    • Tejun Heo's avatar
      hugetlb_cgroup: pass around @hugetlb_cgroup instead of @cgroup · 3f798518
      Tejun Heo authored
      
      cgroup controller API will be converted to primarily use struct
      cgroup_subsys_state instead of struct cgroup.  In preparation, make
      hugetlb_cgroup functions pass around struct hugetlb_cgroup instead of
      struct cgroup.
      
      This patch shouldn't cause any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      3f798518
    • Tejun Heo's avatar
      cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/ · 8af01f56
      Tejun Heo authored
      
      The names of the two struct cgroup_subsys_state accessors -
      cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
      The former clashes with the type name and the latter doesn't even
      indicate it's somehow related to cgroup.
      
      We're about to revamp a large portion of the cgroup API, so let's rename
      them so that they're less awkward.  Most per-controller usages of the
      accessors are localized in accessor wrappers and given the amount of
      scheduled changes, this isn't gonna add any noticeable headache.

      Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
      to task_css().  This patch is a pure rename.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      8af01f56
  23. 18 Dec, 2012 1 commit
  24. 19 Nov, 2012 1 commit
  25. 05 Nov, 2012 2 commits
  26. 01 Aug, 2012 3 commits