- 06 Feb, 2019 1 commit
-
-
Waiman Long authored
commit 1dbd449c upstream. The nr_dentry_unused per-cpu counter tracks dentries in both the LRU lists and the shrink lists where the DCACHE_LRU_LIST bit is set. The shrink_dcache_sb() function moves dentries from the LRU list to a shrink list and subtracts the dentry count from nr_dentry_unused. This is incorrect as the nr_dentry_unused count will also be decremented in shrink_dentry_list() via d_shrink_del(). To fix this double decrement, the decrement in the shrink_dcache_sb() function is taken out. Fixes: 4e717f5c ("list_lru: remove special case function list_lru_dispose_all." Cc: stable@kernel.org Signed-off-by:
Waiman Long <longman@redhat.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 18 Oct, 2018 1 commit
-
-
Roman Gushchin authored
commit f1782c9b upstream. I received a report about suspicious growth of unreclaimable slabs on some machines. I've found that it happens on machines with low memory pressure, and these unreclaimable slabs are external names attached to dentries. External names are allocated using generic kmalloc() function, so they are accounted as unreclaimable. But they are held by dentries, which are reclaimable, and they will be reclaimed under the memory pressure. In particular, this breaks MemAvailable calculation, as it doesn't take unreclaimable slabs into account. This leads to a silly situation, when a machine is almost idle, has no memory pressure and therefore has a big dentry cache. And the resulting MemAvailable is too low to start a new workload. To address the issue, the NR_INDIRECTLY_RECLAIMABLE_BYTES counter is used to track the amount of memory, consumed by external names. The counter is increased in the dentry allocation path, if an external name structure is allocated; and it's decreased in the dentry freeing path. To reproduce the problem I've used the following Python script: import os for iter in range (0, 10000000): try: name = ("/some_long_name_%d" % iter) + "_" * 220 os.stat(name) except Exception: pass Without this patch: $ cat /proc/meminfo | grep MemAvailable MemAvailable: 7811688 kB $ python indirect.py $ cat /proc/meminfo | grep MemAvailable MemAvailable: 2753052 kB With the patch: $ cat /proc/meminfo | grep MemAvailable MemAvailable: 7809516 kB $ python indirect.py $ cat /proc/meminfo | grep MemAvailable MemAvailable: 7749144 kB [guro@fb.com: fix indirectly reclaimable memory accounting for CONFIG_SLOB] Link: http://lkml.kernel.org/r/20180312194140.19517-1-guro@fb.com [guro@fb.com: fix indirectly reclaimable memory accounting] Link: http://lkml.kernel.org/r/20180313125701.7955-1-guro@fb.com Link: http://lkml.kernel.org/r/20180305133743.12746-5-guro@fb.com Signed-off-by:
Roman Gushchin <guro@fb.com> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 15 Sep, 2018 1 commit
-
-
Tetsuo Handa authored
[ Upstream commit 6cd00a01 ] Since only dentry->d_name.len + 1 bytes out of DNAME_INLINE_LEN bytes are initialized at __d_alloc(), we can't copy the whole size unconditionally. WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff8fa27465ac50) 636f6e66696766732e746d70000000000010000000000000020000000188ffff i i i i i i i i i i i i i u u u u u u u u u u i i i i i u u u u ^ RIP: 0010:take_dentry_name_snapshot+0x28/0x50 RSP: 0018:ffffa83000f5bdf8 EFLAGS: 00010246 RAX: 0000000000000020 RBX: ffff8fa274b20550 RCX: 0000000000000002 RDX: ffffa83000f5be40 RSI: ffff8fa27465ac50 RDI: ffffa83000f5be60 RBP: ffffa83000f5bdf8 R08: ffffa83000f5be48 R09: 0000000000000001 R10: ffff8fa27465ac00 R11: ffff8fa27465acc0 R12: ffff8fa27465ac00 R13: ffff8fa27465acc0 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f79737ac8c0(0000) GS:ffffffff8fc30000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8fa274c0b000 CR3: 0000000134aa7002 CR4: 00000000000606f0 take_dentry_name_snapshot+0x28/0x50 vfs_rename+0x128/0x870 SyS_rename+0x3b2/0x3d0 entry_SYSCALL_64_fastpath+0x1a/0xa4 0xffffffffffffffff Link: http://lkml.kernel.org/r/201709131912.GBG39012.QMJLOVFSFFOOtH@I-love.SAKURA.ne.jp Signed-off-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 15 Aug, 2018 2 commits
-
-
Al Viro authored
commit 4c0d7cd5 upstream. RCU pathwalk relies upon the assumption that anything that changes ->d_inode of a dentry will invalidate its ->d_seq. That's almost true - the one exception is that the final dput() of already unhashed dentry does *not* touch ->d_seq at all. Unhashing does, though, so for anything we'd found by RCU dcache lookup we are fine. Unfortunately, we can *start* with an unhashed dentry or jump into it. We could try and be careful in the (few) places where that could happen. Or we could just make the final dput() invalidate the damn thing, unhashed or not. The latter is much simpler and easier to backport, so let's do it that way. Reported-by:
"Dae R. Jeong" <threeearcat@gmail.com> Cc: stable@vger.kernel.org Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Al Viro authored
commit 90bad5e0 upstream. Since mountpoint crossing can happen without leaving lazy mode, root dentries do need the same protection against having their memory freed without RCU delay as everything else in the tree. It's partially hidden by RCU delay between detaching from the mount tree and dropping the vfsmount reference, but the starting point of pathwalk can be on an already detached mount, in which case umount-caused RCU delay has already passed by the time the lazy pathwalk grabs rcu_read_lock(). If the starting point happens to be at the root of that vfsmount *and* that vfsmount covers the entire filesystem, we get trouble. Fixes: 48a066e7 ("RCU'd vsfmounts") Cc: stable@vger.kernel.org Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 30 May, 2018 3 commits
-
-
Will Deacon authored
[ Upstream commit 8cc07c80 ] i_dir_seq is subject to concurrent modification by a cmpxchg or store-release operation, so ensure that the relaxed access in d_alloc_parallel uses READ_ONCE. Reported-by:
Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Will Deacon <will.deacon@arm.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Will Deacon authored
[ Upstream commit 015555fd ] If d_alloc_parallel runs concurrently with __d_add, it is possible for d_alloc_parallel to continuously retry whilst i_dir_seq has been incremented to an odd value by __d_add: CPU0: __d_add n = start_dir_add(dir); cmpxchg(&dir->i_dir_seq, n, n + 1) == n CPU1: d_alloc_parallel retry: seq = smp_load_acquire(&parent->d_inode->i_dir_seq) & ~1; hlist_bl_lock(b); bit_spin_lock(0, (unsigned long *)b); // Always succeeds CPU0: __d_lookup_done(dentry) hlist_bl_lock bit_spin_lock(0, (unsigned long *)b); // Never succeeds CPU1: if (unlikely(parent->d_inode->i_dir_seq != seq)) { hlist_bl_unlock(b); goto retry; } Since the simple bit_spin_lock used to implement hlist_bl_lock does not provide any fairness guarantees, then CPU1 can starve CPU0 of the lock and prevent it from reaching end_dir_add(dir), therefore CPU1 cannot exit its retry loop because the sequence number always has the bottom bit set. This patch resolves the livelock by not taking hlist_bl_lock in d_alloc_parallel if the sequence counter is odd, since any subsequent masked comparison with i_dir_seq will fail anyway. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by:
Naresh Madhusudana <naresh.madhusudana@arm.com> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by:
Will Deacon <will.deacon@arm.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Al Viro authored
commit 1e2e547a upstream. For anything NFS-exported we do _not_ want to unlock new inode before it has grown an alias; original set of fixes got the ordering right, but missed the nasty complication in case of lockdep being enabled - unlock_new_inode() does lockdep_annotate_inode_mutex_key(inode) which can only be done before anyone gets a chance to touch ->i_mutex. Unfortunately, flipping the order and doing unlock_new_inode() before d_instantiate() opens a window when mkdir can race with open-by-fhandle on a guessed fhandle, leading to multiple aliases for a directory inode and all the breakage that follows from that. Correct solution: a new primitive (d_instantiate_new()) combining these two in the right order - lockdep annotate, then d_instantiate(), then the rest of unlock_new_inode(). All combinations of d_instantiate() with unlock_new_inode() should be converted to that. Cc: stable@kernel.org # 2.6.29 and later Tested-by:
Mike Marshall <hubcap@omnibond.com> Reviewed-by:
Andreas Dilger <adilger@dilger.ca> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 12 Apr, 2018 1 commit
-
-
NeilBrown authored
[ Upstream commit 61647823 ] d_move() will call __d_drop() and then __d_rehash() on the dentry being moved. This creates a small window when the dentry appears to be unhashed. Many tests of d_unhashed() are made under ->d_lock and so are safe from racing with this window, but some aren't. In particular, getcwd() calls d_unlinked() (which calls d_unhashed()) without d_lock protection, so it can race. This races has been seen in practice with lustre, which uses d_move() as part of name lookup. See: https://jira.hpdd.intel.com/browse/LU-9735 It could race with a regular rename(), and result in ENOENT instead of either the 'before' or 'after' name. The race can be demonstrated with a simple program which has two threads, one renaming a directory back and forth while another calls getcwd() within that directory: it should never fail, but does. See: https://patchwork.kernel.org/patch/9455345/ We could fix this race by taking d_lock and rechecking when d_unhashed() reports true. Alternately when can remove the window, which is the approach this patch takes. ___d_drop() is introduce which does *not* clear d_hash.pprev so the dentry still appears to be hashed. __d_drop() calls ___d_drop(), then clears d_hash.pprev. __d_move() now uses ___d_drop() and only clears d_hash.pprev when not rehashing. Signed-off-by:
NeilBrown <neilb@suse.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 21 Mar, 2018 1 commit
-
-
Al Viro authored
commit 3b821409 upstream. In case when dentry passed to lock_parent() is protected from freeing only by the fact that it's on a shrink list and trylock of parent fails, we could get hit by __dentry_kill() (and subsequent dentry_kill(parent)) between unlocking dentry and locking presumed parent. We need to recheck that dentry is alive once we lock both it and parent *and* postpone rcu_read_unlock() until after that point. Otherwise we could return a pointer to struct dentry that already is rcu-scheduled for freeing, with ->d_lock held on it; caller's subsequent attempt to unlock it can end up with memory corruption. Cc: stable@vger.kernel.org # 3.12+, counting backports Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 22 Feb, 2018 1 commit
-
-
Levin, Alexander (Sasha Levin) authored
commit 49502766 upstream. Patch series "kmemcheck: kill kmemcheck", v2. As discussed at LSF/MM, kill kmemcheck. KASan is a replacement that is able to work without the limitation of kmemcheck (single CPU, slow). KASan is already upstream. We are also not aware of any users of kmemcheck (or users who don't consider KASan as a suitable replacement). The only objection was that since KASAN wasn't supported by all GCC versions provided by distros at that time we should hold off for 2 years, and try again. Now that 2 years have passed, and all distros provide gcc that supports KASAN, kill kmemcheck again for the very same reasons. This patch (of 4): Remove kmemcheck annotations, and calls to kmemcheck from the kernel. [alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs] Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com Link: http://lkml.kernel.org/r/20171007030159.22241-2-al...
-
- 25 Dec, 2017 1 commit
-
-
Will Deacon authored
commit 3382290e upstream. [ Note, this is a Git cherry-pick of the following commit: 506458ef ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()") ... for easier x86 PTI code testing and back-porting. ] READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it can be used instead of lockless_dereference() without any change in semantics. Signed-off-by:
Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com Signed-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 10 Jul, 2017 1 commit
-
-
Sahitya Tummala authored
__list_lru_walk_one() acquires nlru spin lock (nlru->lock) for longer duration if there are more number of items in the lru list. As per the current code, it can hold the spin lock for upto maximum UINT_MAX entries at a time. So if there are more number of items in the lru list, then "BUG: spinlock lockup suspected" is observed in the below path: spin_bug+0x90 do_raw_spin_lock+0xfc _raw_spin_lock+0x28 list_lru_add+0x28 dput+0x1c8 path_put+0x20 terminate_walk+0x3c path_lookupat+0x100 filename_lookup+0x6c user_path_at_empty+0x54 SyS_faccessat+0xd0 el0_svc_naked+0x24 This nlru->lock is acquired by another CPU in this path - d_lru_shrink_move+0x34 dentry_lru_isolate_shrink+0x48 __list_lru_walk_one.isra.10+0x94 list_lru_walk_node+0x40 shrink_dcache_sb+0x60 do_remount_sb+0xbc do_emergency_remount+0xb0 process_one_work+0x228 worker_thread+0x2e0 kthread+0xf4 ret_from_fork+0x10 Fix this lockup by reducing the number of entries to be shrinked from the lru list to 1024 at once. Also, add cond_resched() before processing the lru list again. Link: http://marc.info/?t=149722864900001&r=1&w=2 Link: http://lkml.kernel.org/r/1498707575-2472-1-git-send-email-stummala@codeaurora.org Signed-off-by:
Sahitya Tummala <stummala@codeaurora.org> Suggested-by:
Jan Kara <jack@suse.cz> Suggested-by:
Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by:
Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Alexander Polakov <apolyakov@beget.ru> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 08 Jul, 2017 1 commit
-
-
Al Viro authored
take_dentry_name_snapshot() takes a safe snapshot of dentry name; if the name is a short one, it gets copied into caller-supplied structure, otherwise an extra reference to external name is grabbed (those are never modified). In either case the pointer to stable string is stored into the same structure. dentry must be held by the caller of take_dentry_name_snapshot(), but may be freely dropped afterwards - the snapshot will stay until destroyed by release_dentry_name_snapshot(). Intended use: struct name_snapshot s; take_dentry_name_snapshot(&s, dentry); ... access s.name ... release_dentry_name_snapshot(&s); Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name to pass down with event. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 06 Jul, 2017 2 commits
-
-
Pavel Tatashin authored
Update dcache, inode, pid, mountpoint, and mount hash tables to use HASH_ZERO, and remove initialization after allocations. In case of places where HASH_EARLY was used such as in __pv_init_lock_hash the zeroed hash table was already assumed, because memblock zeroes the memory. CPU: SPARC M6, Memory: 7T Before fix: Dentry cache hash table entries: 1073741824 Inode-cache hash table entries: 536870912 Mount-cache hash table entries: 16777216 Mountpoint-cache hash table entries: 16777216 ftrace: allocating 20414 entries in 40 pages Total time: 11.798s After fix: Dentry cache hash table entries: 1073741824 Inode-cache hash table entries: 536870912 Mount-cache hash table entries: 16777216 Mountpoint-cache hash table entries: 16777216 ftrace: allocating 20414 entries in 40 pages Total time: 3.198s CPU: Intel Xeon E5-2630, Memory: 2.2T: Before fix: Dentry cache hash table entries: 536870912 Inode-cache hash table entries: 268435456 Mount-cache hash table entries: 8388608 Mountpoint-cache hash table entries: 8388608 CPU: Physical Processor ID: 0 Total time: 3.245s After fix: Dentry cache hash table entries: 536870912 Inode-cache hash table entries: 268435456 Mount-cache hash table entries: 8388608 Mountpoint-cache hash table entries: 8388608 CPU: Physical Processor ID: 0 Total time: 3.244s Link: http://lkml.kernel.org/r/1488432825-92126-4-git-send-email-pasha.tatashin@oracle.com Signed-off-by:
Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by:
Babu Moger <babu.moger@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Howells authored
Provide an empty name (ie. "") qstr for general use. Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 30 Jun, 2017 1 commit
-
-
Sebastian Andrzej Siewior authored
in_lookup_hashtable was introduced in commit 94bdd655 ("parallel lookups machinery, part 3") and never initialized but since it is in the data it is all zeros. But we need this for -RT. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 15 Jun, 2017 1 commit
-
-
Al Viro authored
It's not hard to trigger a bunch of d_invalidate() on the same dentry in parallel. They end up fighting each other - any dentry picked for removal by one will be skipped by the rest and we'll go for the next iteration through the entire subtree, even if everything is being skipped. Morevoer, we immediately go back to scanning the subtree. The only thing we really need is to dissolve all mounts in the subtree and as soon as we've nothing left to do, we can just unhash the dentry and bugger off. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 03 May, 2017 1 commit
-
-
Josef Bacik authored
By default we set DCACHE_REFERENCED and I_REFERENCED on any dentry or inode we create. This is problematic as this means that it takes two trips through the LRU for any of these objects to be reclaimed, regardless of their actual lifetime. With enough pressure from these caches we can easily evict our working set from page cache with single use objects. So instead only set *REFERENCED if we've already been added to the LRU list. This means that we've been touched since the first time we were accessed, and so more likely to need to hang out in cache. To illustrate this issue I wrote the following scripts https://github.com/josefbacik/debug-scripts/tree/master/cache-pressure on my test box. It is a single socket 4 core CPU with 16gib of RAM and I tested on an Intel 2tib NVME drive. The cache-pressure.sh script creates a new file system and creates 2 6.5gib files in order to take up 13gib of the 16gib of ram with pagecache. Then it runs a test program that reads these 2 files in a loop, and keeps track of how often it has to read bytes for each loop. On an ideal system with no pressure we should have to read 0 bytes indefinitely. The second thing this script does is start a fs_mark job that creates a ton of 0 length files, putting pressure on the system with slab only allocations. On exit the script prints out how many bytes were read by the read-file program. The results are as follows Without patch: /mnt/btrfs-test/reads/file1: total read during loops 27262988288 /mnt/btrfs-test/reads/file2: total read during loops 27262976000 With patch: /mnt/btrfs-test/reads/file2: total read during loops 18640457728 /mnt/btrfs-test/reads/file1: total read during loops 9565376512 This patch results in a 50% reduction of the amount of pages evicted from our working set. Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 10 Jan, 2017 1 commit
-
-
Eric W. Biederman authored
Protecting the mountpoint hashtable with namespace_sem was sufficient until a call to umount_mnt was added to mntput_no_expire. At which point it became possible for multiple calls of put_mountpoint on the same hash chain to happen on the same time. Kristen Johansen <kjlx@templeofstupid.com> reported: > This can cause a panic when simultaneous callers of put_mountpoint > attempt to free the same mountpoint. This occurs because some callers > hold the mount_hash_lock, while others hold the namespace lock. Some > even hold both. > > In this submitter's case, the panic manifested itself as a GP fault in > put_mountpoint() when it called hlist_del() and attempted to dereference > a m_hash.pprev that had been poisioned by another thread. Al Viro observed that the simple fix is to switch from using the namespace_sem to the mount_lock to protect the mountpoint hash table. I have taken Al's suggested patch moved put_mountpoint in pivot_root (instead of taking mount_lock an additional time), and have replaced new_mountpoint with get_mountpoint a function that does the hash table lookup and addition under the mount_lock. The introduction of get_mounptoint ensures that only the mount_lock is needed to manipulate the mountpoint hashtable. d_set_mounted is modified to only set DCACHE_MOUNTED if it is not already set. This allows get_mountpoint to use the setting of DCACHE_MOUNTED to ensure adding a struct mountpoint for a dentry happens exactly once. Cc: stable@vger.kernel.org Fixes: ce07d891 ("mnt: Honor MNT_LOCKED when detaching mounts") Reported-by:
Krister Johansen <kjlx@templeofstupid.com> Suggested-by:
Al Viro <viro@ZenIV.linux.org.uk> Acked-by:
Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com>
-
- 24 Dec, 2016 1 commit
-
-
Linus Torvalds authored
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 04 Dec, 2016 2 commits
-
-
Ian Kent authored
Now that path_has_submounts() has been added have_submounts() is no longer used so remove it. Link: http://lkml.kernel.org/r/20161011053428.27645.12310.stgit@pluto.themaw.net Signed-off-by:
Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Ian Kent authored
d_mountpoint() can only be used reliably to establish if a dentry is not mounted in any namespace. It isn't aware of the possibility there may be multiple mounts using the given dentry, possibly in a different namespace. Add function, path_has_submounts(), that checks is a struct path contains mounts (or is a mountpoint itself) to handle this case. Link: http://lkml.kernel.org/r/20161011053403.27645.55242.stgit@pluto.themaw.net Signed-off-by:
Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 31 Jul, 2016 1 commit
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 29 Jul, 2016 2 commits
-
-
Al Viro authored
The only place where we feed to __d_rehash() something other than d_hash(dentry->d_name.hash) is __d_move(), where we give it d_hash of another dentry. Postpone rehashing until we'd switched the names and we are rid of that exception, along with the need to keep _d_rehash() and __d_rehash() separate. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 24 Jul, 2016 2 commits
-
-
Wei Fang authored
We triggered soft-lockup under stress test which open/access/write/close one file concurrently on more than five different CPUs: WARN: soft lockup - CPU#0 stuck for 11s! [who:30631] ... [<ffffffc0003986f8>] dput+0x100/0x298 [<ffffffc00038c2dc>] terminate_walk+0x4c/0x60 [<ffffffc00038f56c>] path_lookupat+0x5cc/0x7a8 [<ffffffc00038f780>] filename_lookup+0x38/0xf0 [<ffffffc000391180>] user_path_at_empty+0x78/0xd0 [<ffffffc0003911f4>] user_path_at+0x1c/0x28 [<ffffffc00037d4fc>] SyS_faccessat+0xb4/0x230 ->d_lock trylock may failed many times because of concurrently operations, and dput() may execute a long time. Fix this by replacing cpu_relax() with cond_resched(). dput() used to be sleepable, so make it sleepable again should be safe. Cc: <stable@vger.kernel.org> Signed-off-by:
Wei Fang <fangwei1@huawei.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Miklos Szeredi authored
Allow filesystem to initialize dentry at allocation time. Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 21 Jul, 2016 1 commit
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 01 Jul, 2016 2 commits
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
He Kuang authored
lockless_dereference() was added which can be used in place of hard-coding smp_read_barrier_depends(). Signed-off-by:
He Kuang <hekuang@huawei.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 30 Jun, 2016 1 commit
-
-
Miklos Szeredi authored
The two methods essentially do the same: find the real dentry/inode belonging to an overlay dentry. The difference is in the usage: vfs_open() uses ->d_select_inode() and expects the function to perform copy-up if necessary based on the open flags argument. file_dentry() uses ->d_real() passing in the overlay dentry as well as the underlying inode. vfs_rename() uses ->d_select_inode() but passes zero flags. ->d_real() with a zero inode would have worked just as well here. This patch merges the functionality of ->d_select_inode() into ->d_real() by adding an 'open_flags' argument to the latter. [Al Viro] Make the signature of d_real() match that of ->d_real() again. And constify the inode argument, while we are at it. Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
- 20 Jun, 2016 1 commit
-
-
Al Viro authored
Check for d_unhashed() while searching in in-lookup hash was absolutely wrong. Worse, it masked a deadlock on dget() done under bitlock that nests inside ->d_lock. Thanks to J. R. Okajima for spotting it. Spotted-by:
"J. R. Okajima" <hooanon05g@gmail.com> Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 11 Jun, 2016 2 commits
-
-
George Spelvin authored
Noe that we're mixing in the parent pointer earlier, we don't need to use hash_32() to mix its bits. Instead, we can just take the msbits of the hash value directly. For those applications which use the partial_name_hash(), move the multiply to end_name_hash. Signed-off-by:
George Spelvin <linux@sciencehorizons.net> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
We always mixed in the parent pointer into the dentry name hash, but we did it late at lookup time. It turns out that we can simplify that lookup-time action by salting the hash with the parent pointer early instead of late. A few other users of our string hashes also wanted to mix in their own pointers into the hash, and those are updated to use the same mechanism. Hash users that don't have any particular initial salt can just use the NULL pointer as a no-salt. Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 Jun, 2016 1 commit
-
-
Al Viro authored
d_walk() relies upon the tree not getting rearranged under it without rename_lock being touched. And we do grab rename_lock around the places that change the tree topology. Unfortunately, branch reordering is just as bad from d_walk() POV and we have two places that do it without touching rename_lock - one in handling of cursors (for ramfs-style directories) and another in autofs. autofs one is a separate story; this commit deals with the cursors. * mark cursor dentries explicitly at allocation time * make __dentry_kill() leave ->d_child.next pointing to the next non-cursor sibling, making sure that it won't be moved around unnoticed before the parent is relocked on ascend-to-parent path in d_walk(). * make d_walk() skip cursors explicitly; strictly speaking it's not necessary (all callbacks we pass to d_walk() are no-ops on cursors), but it makes analysis easier. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 08 Jun, 2016 1 commit
-
-
Al Viro authored
Ascend-to-parent logics in d_walk() depends on all encountered child dentries not getting freed without an RCU delay. Unfortunately, in quite a few cases it is not true, with hard-to-hit oopsable race as the result. Fortunately, the fix is simiple; right now the rule is "if it ever been hashed, freeing must be delayed" and changing it to "if it ever had a parent, freeing must be delayed" closes that hole and covers all cases the old rule used to cover. Moreover, pipes and sockets remain _not_ covered, so we do not introduce RCU delay in the cases which are the reason for having that delay conditional in the first place. Cc: stable@vger.kernel.org # v3.2+ (and watch out for __d_materialise_dentry()) Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 30 May, 2016 1 commit
-
-
Al Viro authored
There is a lot of duplication between dentry_unlink_inode() and dentry_iput(). The only real difference is that dentry_unlink_inode() bumps ->d_seq and dentry_iput() doesn't. The argument of the latter is known to have been unhashed, so anybody who might've found it in RCU lookup would already be doomed to a ->d_seq mismatch. And we want to avoid pointless smp_rmb() there. This patch makes dentry_unlink_inode() bump ->d_seq only for hashed dentries. It's safe (d_delete() calls that sucker only if we are holding the only reference to dentry, so rehash is not going to happen) and it allows to use dentry_unlink_inode() in __dentry_kill() and get rid of dentry_iput(). The interesting question here is profiling; it *is* a hot path, and extra conditional jumps in there might or might not be painful. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 29 May, 2016 1 commit
-
-
Al Viro authored
fsnotify_d_move()/__fsnotify_d_instantiate()/__fsnotify_update_dcache_flags() are identical to each other, regardless of the config. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 28 May, 2016 1 commit
-
-
George Spelvin authored
We'd like to make more use of the highly-optimized dcache hash functions throughout the kernel, rather than have every subsystem create its own, and a function that hashes basic null-terminated strings is required for that. (The name is to emphasize that it returns both hash and length.) It's actually useful in the dcache itself, specifically d_alloc_name(). Other uses in the next patch. full_name_hash() is also tweaked to make it more generally useful: 1) Take a "char *" rather than "unsigned char *" argument, to be consistent with hash_name(). 2) Handle zero-length inputs. If we want more callers, we don't want to make them worry about corner cases. Signed-off-by:
George Spelvin <linux@sciencehorizons.net>
-