- 18 Sep, 2021 1 commit
-
-
Arnd Bergmann authored
[ Upstream commit dd98d289 ] The ethtool compat ioctl handling is hidden away in net/socket.c, which introduces a couple of minor oddities: - The implementation may end up diverging, as seen in the RXNFC extension in commit 84a1d9c4 ("net: ethtool: extend RXNFC API to support RSS spreading of filter matches") that does not work in compat mode. - Most architectures do not need the compat handling at all because u64 and compat_u64 have the same alignment. - On x86, the conversion is done for both x32 and i386 user space, but it's actually wrong to do it for x32 and cannot work there. - On 32-bit Arm, it never worked for compat oabi user space, since that needs to do the same conversion but does not. - It would be nice to get rid of both compat_alloc_user_space() and copy_in_user() throughout the kernel. None of these actually seems to be a serious problem that real users are likely to encounter, but fixing all of them actually leads to code that is both shorter and more readable. Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- 03 Sep, 2021 1 commit
-
-
Peter Collingbourne authored
commit d0efb162 upstream. A common implementation of isatty(3) involves calling a ioctl passing a dummy struct argument and checking whether the syscall failed -- bionic and glibc use TCGETS (passing a struct termios), and musl uses TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will copy sizeof(struct ifreq) bytes of data from the argument and return -EFAULT if that fails. The result is that the isatty implementations may return a non-POSIX-compliant value in errno in the case where part of the dummy struct argument is inaccessible, as both struct termios and struct winsize are smaller than struct ifreq (at least on arm64). Although there is usually enough stack space following the argument on the stack that this did not present a practical problem up to now, with MTE stack instrumentation it's more likely for the copy to fail, as the memory following the struct may have a different tag. Fix the problem by adding an early check for whether the ioctl is a valid socket ioctl, and return -ENOTTY if it isn't. Fixes: 44c02a2c ("dev_ioctl(): move copyin/copyout to callers") Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4d Signed-off-by:
Peter Collingbourne <pcc@google.com> Cc: <stable@vger.kernel.org> # 4.19 Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 23 Jun, 2021 1 commit
-
-
Changbin Du authored
[ Upstream commit ea6932d7 ] There is a panic in socket ioctl cmd SIOCGSKNS when NET_NS is not enabled. The reason is that nsfs tries to access ns->ops but the proc_ns_operations is not implemented in this case. [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010 [7.670268] pgd = 32b54000 [7.670544] [00000010] *pgd=00000000 [7.671861] Internal error: Oops: 5 [#1] SMP ARM [7.672315] Modules linked in: [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2 #16 [7.673309] Hardware name: Generic DT based system [7.673642] PC is at nsfs_evict+0x24/0x30 [7.674486] LR is at clear_inode+0x20/0x9c The same to tun SIOCGSKNS command. To fix this problem, we make get_net_ns() return -EINVAL when NET_NS is disabled. Meanwhile move it to right place net/core/net_namespace.c. Signed-off-by:
Changbin Du <changbin.du@gmail.com> Fixes: c62cce2c ("net: add an ioctl to get a socket network namespace") Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by:
Jakub Kicinski <kuba@kernel.org> Acked-by:
Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- 02 Oct, 2020 1 commit
-
-
Coly Li authored
If a page sent into kernel_sendpage() is a slab page or it doesn't have ref_count, this page is improper to send by the zero copy sendpage() method. Otherwise such page might be unexpected released in network code path and causes impredictable panic due to kernel memory management data structure corruption. This path adds a WARN_ON() on the sending page before sends it into the concrete zero-copy sendpage() method, if the page is improper for the zero-copy sendpage() method, a warning message can be observed before the consequential unpredictable kernel panic. This patch does not change existing kernel_sendpage() behavior for the improper page zero-copy send, it just provides hint warning message for following potential panic due the kernel memory heap corruption. Signed-off-by:
Coly Li <colyli@suse.de> Cc: Cong Wang <amwang@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: David S. Miller <davem@davemloft.net> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 27 Aug, 2020 1 commit
-
-
Miaohe Lin authored
Fix some comments, including wrong function name, duplicated word and so on. Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 24 Aug, 2020 1 commit
-
-
Luke Hsiao authored
For TCP tx zero-copy, the kernel notifies the process of completions by queuing completion notifications on the socket error queue. This patch allows reading these notifications via recvmsg to support TCP tx zero-copy. Ancillary data was originally disallowed due to privilege escalation via io_uring's offloading of sendmsg() onto a kernel thread with kernel credentials (https://crbug.com/project-zero/1975 ). So, we must ensure that the socket type is one where the ancillary data types that are delivered on recvmsg are plain data (no file descriptors or values that are translated based on the identity of the calling process). This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE with tx zero-copy enabled. Before this patch, we received -EINVALID from this specific code path. After this patch, we could read tcp tx zero-copy completion notifications from the MSG_ERRQUEUE. Signed-off-by:
Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by:
Arjun Roy <arjunroy@google.com> Acked-by:
Eric Dumazet <edumazet@google.com> Reviewed-by:
Jann Horn <jannh@google.com> Reviewed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Luke Hsiao <lukehsiao@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 10 Aug, 2020 1 commit
-
-
Christoph Hellwig authored
This reverts commits 6d04fe15 and a31edb20. It turns out the idea to share a single pointer for both kernel and user space address causes various kinds of problems. So use the slightly less optimal version that uses an extra bit, but which is guaranteed to be safe everywhere. Fixes: 6d04fe15 ("net: optimize the sockptr_t for unified kernel/user address spaces") Reported-by:
Eric Dumazet <edumazet@google.com> Reported-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 08 Aug, 2020 4 commits
-
-
Miaohe Lin authored
Convert the uses of fallthrough comments to fallthrough macro. Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Miaohe Lin authored
The out_fs jump label has nothing to do but goto out. Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Miaohe Lin authored
We should fput() file iff FDPUT_FPUT is set. So we should set fput_needed accordingly. Fixes: 00e188ef ("sockfd_lookup_light(): switch to fdget^W^Waway from fget_light") Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Miaohe Lin authored
Use helper function fdput() to fput() the file iff FDPUT_FPUT is set. Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 28 Jul, 2020 1 commit
-
-
Christoph Hellwig authored
Make sure not just the pointer itself but the whole range lies in the user address space. For that pass the length and then use the access_ok helper to do the check. Fixes: 6d04fe15 ("net: optimize the sockptr_t for unified kernel/user address spaces") Reported-by:
David Laight <David.Laight@ACULAB.COM> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 24 Jul, 2020 3 commits
-
-
Christoph Hellwig authored
For architectures like x86 and arm64 we don't need the separate bit to indicate that a pointer is a kernel pointer as the address spaces are unified. That way the sockptr_t can be reduced to a union of two pointers, which leads to nicer calling conventions. The only caveat is that we need to check that users don't pass in kernel address and thus gain access to kernel memory. Thus the USER_SOCKPTR helper is replaced with a init_user_sockptr function that does this check and returns an error if it fails. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154] Acked-by:
Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 20 Jul, 2020 4 commits
-
-
Christoph Hellwig authored
Just check for a NULL method instead of wiring up sock_no_{get,set}sockopt. Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Now that the ->compat_{get,set}sockopt proto_ops methods are gone there is no good reason left to keep the compat syscalls separate. This fixes the odd use of unsigned int for the compat_setsockopt optlen and the missing sock_use_custom_sol_socket. It would also easily allow running the eBPF hooks for the compat syscalls, but such a large change in behavior does not belong into a consolidation patch like this one. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Return early when sockfd_lookup_light fails to reduce a level of indentation for most of the function body. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Return early when sockfd_lookup_light fails to reduce a level of indentation for most of the function body. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 14 Jul, 2020 1 commit
-
-
Andrew Lunn authored
Fix the warning "Function parameter or member 'inode' not described in '__sock_release'' due to the kerneldoc being placed before __sock_release() not sock_release(), which does not take an inode parameter. Signed-off-by:
Andrew Lunn <andrew@lunn.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 05 Jul, 2020 1 commit
-
-
Florian Westphal authored
setsockopt(mptcp_fd, SOL_SOCKET, ...)... appears to work (returns 0), but it has no effect -- this is because the MPTCP layer never has a chance to copy the settings to the subflow socket. Skip the generic handling for the mptcp case and instead call the mptcp specific handler instead for SOL_SOCKET too. Next patch adds more specific handling for SOL_SOCKET to mptcp. Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 29 May, 2020 1 commit
-
-
Christoph Hellwig authored
No users left. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 27 May, 2020 1 commit
-
-
Christoph Hellwig authored
No users left. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 19 May, 2020 2 commits
-
-
Christoph Hellwig authored
To prepare removing the global routing_ioctl hack start lifting the code into the ipv4 and appletalk ->compat_ioctl handlers. Unlike the existing handler we don't bother copying in the name - there are no compat issues for char arrays. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
To prepare removing the global routing_ioctl hack start lifting the code into a newly added ipv6 ->compat_ioctl handler. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 11 May, 2020 1 commit
-
-
Christoph Hellwig authored
The msg_control field in struct msghdr can either contain a user pointer when used with the recvmsg system call, or a kernel pointer when used with sendmsg. To complicate things further kernel_recvmsg can stuff a kernel pointer in and then use set_fs to make the uaccess helpers accept it. Replace it with a union of a kernel pointer msg_control field, and a user pointer msg_control_user one, and allow kernel_recvmsg operate on a proper kernel pointer using a bitfield to override the normal choice of a user pointer for recvmsg. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 20 Mar, 2020 1 commit
-
-
Jens Axboe authored
Just like commit 4022e7af , this fixes the fact that IORING_OP_ACCEPT ends up using get_unused_fd_flags(), which checks current->signal->rlim[] for limits. Add an extra argument to __sys_accept4_file() that allows us to pass in the proper nofile limit, and grab it at request prep time. Acked-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 10 Mar, 2020 1 commit
-
-
Jens Axboe authored
This splits it into two parts, one that imports the message, and one that imports the iovec. This allows a caller to only do the first part, and import the iovec manually afterwards. No functional changes in this patch. Acked-by:
David Miller <davem@davemloft.net> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 08 Jan, 2020 1 commit
-
-
Arnd Bergmann authored
When procfs is disabled, the fdinfo code causes a harmless warning: net/socket.c:1000:13: error: 'sock_show_fdinfo' defined but not used [-Werror=unused-function] static void sock_show_fdinfo(struct seq_file *m, struct file *f) Move the function definition up so we can use a single #ifdef around it. Fixes: b4653342 ("net: Allow to show socket-specific information in /proc/[pid]/fdinfo/[fd]") Suggested-by:
Al Viro <viro@zeniv.linux.org.uk> Acked-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 13 Dec, 2019 1 commit
-
-
Kirill Tkhai authored
This adds .show_fdinfo to socket_file_ops, so protocols will be able to print their specific data in fdinfo. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 10 Dec, 2019 1 commit
-
-
Jens Axboe authored
The socket read/write helpers only look at the file O_NONBLOCK. not the iocb IOCB_NOWAIT flag. This breaks users like preadv2/pwritev2 and io_uring that rely on not having the file itself marked nonblocking, but rather the iocb itself. Cc: netdev@vger.kernel.org Acked-by:
David Miller <davem@davemloft.net> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 06 Dec, 2019 1 commit
-
-
Eric Dumazet authored
CONFIG_RETPOLINE=y made indirect calls expensive. gcc seems to add an indirect call in ____sys_recvmsg(). Rewriting the code slightly makes sure to avoid this indirection. Alternative would be to not call sock_recvmsg() and instead use security_socket_recvmsg() and sock_recvmsg_nosec(), but this is less readable IMO. Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: David Laight <David.Laight@aculab.com> Acked-by:
Paolo Abeni <pabeni@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 03 Dec, 2019 2 commits
-
-
Jens Axboe authored
Just like commit f67676d1 for read/write requests, this one ensures that the sockaddr data has been copied for IORING_OP_CONNECT if we need to punt the request to async context. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Just like commit f67676d1 for read/write requests, this one ensures that the msghdr data is fully copied if we need to punt a recvmsg or sendmsg system call to async context. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 26 Nov, 2019 3 commits
-
-
Jens Axboe authored
Only io_uring uses (and added) these, and we want to disallow the use of sendmsg/recvmsg for anything but regular data transfers. Use the newly added prep helper to split the msghdr copy out from the core function, to check for msg_control and msg_controllen settings. If either is set, we return -EINVAL. Acked-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This is in preparation for enabling the io_uring helpers for sendmsg and recvmsg to first copy the header for validation before continuing with the operation. There should be no functional changes in this patch. Acked-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This is identical to __sys_connect(), except it takes a struct file instead of an fd, and it also allows passing in extra file->f_flags flags. The latter is done to support masking in O_NONBLOCK without manipulating the original file flags. No functional changes in this patch. Cc: netdev@vger.kernel.org Acked-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 25 Nov, 2019 1 commit
-
-
Linus Torvalds authored
In commit 3975b097 ("convert stream-like files -> stream_open, even if they use noop_llseek") Kirill used a coccinelle script to change "nonseekable_open()" to "stream_open()", which changed the trivial cases of stream-like file descriptors to the new model with FMODE_STREAM. However, the two big cases - sockets and pipes - don't actually have that trivial pattern at all, and were thus never converted to FMODE_STREAM even though it makes lots of sense to do so. That's particularly true when looking forward to the next change: getting rid of FMODE_ATOMIC_POS entirely, and just using FMODE_STREAM to decide whether f_pos updates are needed or not. And if they are, we'll always do them atomically. This came up because KCSAN (correctly) noted that the non-locked f_pos updates are data races: they are clearly benign for the case where we don't care, but it would be good to just not have that issue exist at all. Note that the reason we used FMODE_ATOMIC_POS originally is that only doing it for the minimal required case is "safer" in that it's possible that the f_pos locking can cause unnecessary serialization across the whole write() call. And in the worst case, that kind of serialization can cause deadlock issues: think writers that need readers to empty the state using the same file descriptor. [ Note that the locking is per-file descriptor - because it protects "f_pos", which is obviously per-file descriptor - so it only affects cases where you literally use the same file descriptor to both read and write. So a regular pipe that has separate reading and writing file descriptors doesn't really have this situation even though it's the obvious case of "reader empties what a bit writer concurrently fills" But we want to make pipes as being stream-line anyway, because we don't want the unnecessary overhead of locking, and because a named pipe can be (ab-)used by reading and writing to the same file descriptor. ] There are likely a lot of other cases that might want FMODE_STREAM, and looking for ".llseek = no_llseek" users and other cases that don't have an lseek file operation at all and making them use "stream_open()" might be a good idea. But pipes and sockets are likely to be the two main cases. Cc: Kirill Smelkov <kirr@nexedi.com> Cc: Eic Dumazet <edumazet@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Marco Elver <elver@google.com> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Paul McKenney <paulmck@kernel.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Nov, 2019 2 commits
-
-
Arnd Bergmann authored
The 'timespec' type definition and helpers like ktime_to_timespec() or timespec64_to_timespec() should no longer be used in the kernel so we can remove them and avoid introducing y2038 issues in new code. Change the socket code that needs to pass a timespec to user space for backward compatibility to use __kernel_old_timespec instead. This type has the same layout but with a clearer defined name. Slightly reformat tcp_recv_timestamp() for consistency after the removal of timespec64_to_timespec(). Acked-by:
Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
Arnd Bergmann authored
The CONFIG_64BIT_TIME option is defined on all architectures, and can be removed for simplicity now. Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-