Commits · 4b729a5def9f0cb267e76535dd7704859d4efe81 · Upstream / linux-stable

02 Dec, 2022 1 commit

io_uring: clear TIF_NOTIFY_SIGNAL if set and task_work not available · 4b729a5d

Jens Axboe authored 2 years ago

commit 7cfe7a09 upstream.

With how task_work is added and signaled, we can have TIF_NOTIFY_SIGNAL
set and no task_work pending as it got run in a previous loop. Treat
TIF_NOTIFY_SIGNAL like get_signal(), always clear it if set regardless
of whether or not task_work is pending to run.

Cc: stable@vger.kernel.org
Fixes: 46a525e1 ("io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4b729a5d
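
As a rough illustration of the behaviour this message describes, here is a minimal userspace model, not the kernel code. `struct model_task` and `model_run_task_work()` are hypothetical names: the boolean stands in for TIF_NOTIFY_SIGNAL and the counter for the list of queued task_work items.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state involved. */
struct model_task {
    bool notify_signal;   /* models TIF_NOTIFY_SIGNAL */
    int  pending;         /* models the number of queued task_work items */
};

/*
 * Clear the flag whenever it is set, whether or not anything is queued,
 * then run whatever is pending. Returns 1 if work was run, 0 otherwise.
 */
static int model_run_task_work(struct model_task *t)
{
    if (t->notify_signal)
        t->notify_signal = false;

    if (t->pending > 0) {
        t->pending = 0;   /* "run" all queued items */
        return 1;
    }
    return 0;
}
```

In this model a task that has the flag set but an empty queue leaves the call with the flag cleared, which is the case the commit message singles out.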

26 Nov, 2022 1 commit

io_uring: fix multishot accept request leaks · 0e4626de

Pavel Begunkov authored 2 years ago

commit 91482864 upstream.

Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's a multishot issue when it's not, it can only ask to
skip completion, thus leaking the request. Use issue_flags to mark
multipoll issues.

Cc: stable@vger.kernel.org
Fixes: 390ed29b ("io_uring: add IORING_ACCEPT_MULTISHOT for accept")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

0e4626de

29 Oct, 2022 1 commit

io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL · 5a7d9406

Jens Axboe authored 2 years ago

[ Upstream commit 46a525e1 ]

This isn't a reliable mechanism to tell if we have task_work pending, we
really should be looking at whether we have any items queued. This is
problematic if forward progress is gated on running said task_work. One
such example is reading from a pipe, where the write side has been closed
right before the read is started. The fput() of the file queues TWA_RESUME
task_work, and we need that task_work to be run before ->release() is
called for the pipe. If ->release() isn't called, then the read will sit
forever waiting on data that will never arrive.

Fix this by changing io_run_task_work() so it checks if we have task_work pending
rather than rely on TIF_NOTIFY_SIGNAL for that. The latter obviously
doesn't work for task_work that is queued without TWA_SIGNAL.
Reported-by: Christiano Haesbaert <haesbaert@haesbaert.org>
Cc: stable@vger.kernel.org
Link: https://github.com/axboe/liburing/issues/665

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

5a7d9406
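
The difference between the two checks can be sketched with a small userspace model. This is an assumption-laden illustration rather than the kernel implementation: `struct model_task`, `run_if_flagged()` and `run_if_pending()` are hypothetical names, with the boolean modelling TIF_NOTIFY_SIGNAL and the counter modelling queued task_work.

```c
#include <stdbool.h>

struct model_task {
    bool notify_signal;   /* models TIF_NOTIFY_SIGNAL */
    int  pending;         /* models queued task_work items */
};

/* Gate on the flag only: work queued without the flag is never run. */
static int run_if_flagged(struct model_task *t)
{
    if (!t->notify_signal)
        return 0;
    t->notify_signal = false;
    t->pending = 0;
    return 1;
}

/* Gate on the queue itself: any queued work is run, flag or no flag. */
static int run_if_pending(struct model_task *t)
{
    if (t->pending == 0)
        return 0;
    t->notify_signal = false;
    t->pending = 0;
    return 1;
}
```

With one item queued and the flag clear (the TWA_RESUME situation from the pipe example), `run_if_flagged()` does nothing while `run_if_pending()` runs the item.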

21 Oct, 2022 1 commit

io_uring: fix CQE reordering · 8d8ffa39

Pavel Begunkov authored 2 years ago

[ Upstream commit aa1df3a3 ]

Overflowing CQEs may result in reordering, which is buggy in case of
links, F_MORE and so on. If we guarantee that we don't reorder for
the unlikely event of a CQ ring overflow, then we can further extend
this to not have to terminate multishot requests if it happens. For
other operations, like zerocopy sends, we have no choice but to honor
CQE ordering.
Reported-by: Dylan Yudaken <dylany@fb.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec3bc55687b0768bbe20fb62d7d06cfced7d7e70.1663892031.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

8d8ffa39
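
The message does not spell out the mechanism, so the following is only one plausible way to keep ordering across an overflow, written as a toy userspace model. All names (`struct model_cq`, `model_post()`, `model_reap()`, `model_flush_overflow()`) and the capacities are hypothetical. The idea shown is that once anything sits on the overflow list, new completions queue behind it instead of going straight into the ring.

```c
#include <string.h>

#define MODEL_RING_CAP 2   /* hypothetical tiny CQ ring */
#define MODEL_OVF_CAP  8   /* hypothetical bound on the overflow list */

struct model_cq {
    int ring[MODEL_RING_CAP];
    int ring_len;
    int ovf[MODEL_OVF_CAP];
    int ovf_len;
};

/* Post a completion. Returns 0 if it landed in the ring, 1 if it was
 * queued on the overflow list, -1 if the model ran out of space. */
static int model_post(struct model_cq *cq, int value)
{
    /* Only bypass the overflow list when it is empty; otherwise a new
     * entry could overtake completions that are still waiting there. */
    if (cq->ovf_len == 0 && cq->ring_len < MODEL_RING_CAP) {
        cq->ring[cq->ring_len++] = value;
        return 0;
    }
    if (cq->ovf_len < MODEL_OVF_CAP) {
        cq->ovf[cq->ovf_len++] = value;
        return 1;
    }
    return -1;
}

/* Consumer side: take the oldest ring entry. Returns 1 on success. */
static int model_reap(struct model_cq *cq, int *out)
{
    if (cq->ring_len == 0)
        return 0;
    *out = cq->ring[0];
    cq->ring_len--;
    memmove(cq->ring, cq->ring + 1,
            (size_t)cq->ring_len * sizeof(cq->ring[0]));
    return 1;
}

/* Move overflowed entries into free ring slots, oldest first. */
static void model_flush_overflow(struct model_cq *cq)
{
    int moved = 0;

    while (moved < cq->ovf_len && cq->ring_len < MODEL_RING_CAP)
        cq->ring[cq->ring_len++] = cq->ovf[moved++];
    cq->ovf_len -= moved;
    memmove(cq->ovf, cq->ovf + moved,
            (size_t)cq->ovf_len * sizeof(cq->ovf[0]));
}
```

In this model a completion posted after an overflow is held back even when the ring has a free slot, so the consumer always sees entries in posting order.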

27 Jul, 2022 1 commit

io_uring: export req alloc from core · bd1a3783

Pavel Begunkov authored 2 years ago

We want to do request allocation out of the core io_uring code, make the
allocation functions public for other io_uring parts.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0314fedd3a02a514210ba42d4720332538c65956.1658913593.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

bd1a3783

25 Jul, 2022 35 commits

io_uring: flush notifiers after sendzc · 63809137

Pavel Begunkov authored 2 years ago

Allow flushing notifiers as part of a sendzc request by setting the
IORING_SENDZC_FLUSH flag. When the sendzc request succeeds, it will
flush the used [active] notifier.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e0b4d9a6797e2fd6092824fe42953db7a519bbc8.1657643355.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

63809137

io_uring: add zc notification infrastructure · eb42cebb

Pavel Begunkov authored 2 years ago

Add the internal part of send zerocopy notifications. There are two main
concepts. The first is struct io_notif, which embeds a struct ubuf_info
and maps 1:1 to it. io_uring will bind a number of zerocopy send requests
to it and later ask to complete (aka flush) it. Once it is flushed and all
attached requests and skbs complete, it generates one and only one CQE.
Notifiers are intended to be passed into the network layer as
struct msghdr::msg_ubuf.

The second concept is notification slots. Userspace will be able to
register an array of slots and subsequently address them by their index
in the array. Slots are independent of each other. Each slot can have
only one notifier at a time (called the active notifier) but many notifiers
during its lifetime. While active, a notifier is not going to post any
completion, but userspace can attach requests to it by specifying
the corresponding slot while issuing send zc requests. Eventually,
userspace will want to "flush" the notifier, losing any way to attach
new requests to it; however, it can use the next automatically added
notifier of this slot or of any other slot.

When the network layer is done with all enqueued skbs attached to a
notifier and no longer needs the user data they reference, the flushed
notifier will post a CQE.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3ecf54c31a85762bf679b0a432c9f43ecf7e61cc.1657643355.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

eb42cebb
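
The slot and notifier lifecycle described in this message can be sketched as a small userspace model. This is not the kernel's data layout: `struct model_notif`, `struct model_slot`, the helper names and the fixed-size notifier array are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NOTIFS 4  /* hypothetical cap on notifiers per slot */

struct model_notif {
    int  inflight;  /* sends attached and not yet completed */
    bool flushed;   /* set once userspace flushes the notifier */
    int  cqes;      /* CQEs posted by this notifier (0 or 1) */
};

struct model_slot {
    struct model_notif notifs[MODEL_MAX_NOTIFS]; /* notifiers over time */
    size_t active;                               /* index of the active one */
};

/* Post exactly one CQE once the notifier is flushed and nothing is in flight. */
static void model_notif_check(struct model_notif *n)
{
    if (n->flushed && n->inflight == 0 && n->cqes == 0)
        n->cqes = 1;
}

/* Attach a send request to the slot's active notifier. */
static struct model_notif *model_slot_attach(struct model_slot *s)
{
    struct model_notif *n = &s->notifs[s->active];

    n->inflight++;
    return n;
}

/* The network layer is done with one attached send. */
static void model_send_done(struct model_notif *n)
{
    n->inflight--;
    model_notif_check(n);
}

/* Flush the active notifier and make the next one active.
 * Returns the flushed notifier, or NULL if the model has none left. */
static struct model_notif *model_slot_flush(struct model_slot *s)
{
    struct model_notif *n;

    if (s->active + 1 >= MODEL_MAX_NOTIFS)
        return NULL;
    n = &s->notifs[s->active++];
    n->flushed = true;
    model_notif_check(n);
    return n;
}
```

After a flush, new sends attach to the next notifier of the slot, and the flushed one posts its single CQE only when its last in-flight send completes.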

io_uring: export io_put_task() · e70cb608

Pavel Begunkov authored 2 years ago

Make io_put_task() available to non-core parts of io_uring, we'll need
it for notification infrastructure.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3686807d4c03b72e389947b0e8692d4d44334ef0.1657643355.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

e70cb608

io_uring: ensure REQ_F_ISREG is set async offload · f6b543fd

Jens Axboe authored 2 years ago

If we're offloading requests directly to io-wq because IOSQE_ASYNC was
set in the sqe, we can miss hashing writes appropriately because we
haven't set REQ_F_ISREG yet. This can cause a performance regression
with buffered writes, as io-wq then no longer correctly serializes writes
to that file.

Ensure that we set the flags in io_prep_async_work(), which will cause
the io-wq work item to be hashed appropriately.

Fixes: 584b0180 ("io_uring: move read/write file prep state into actual opcode handler")
Link: https://lore.kernel.org/io-uring/20220608080054.GB22428@xsang-OptiPlex-9020/

Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f6b543fd

io_uring: only trace one of complete or overflow · e0486f3f

Dylan Yudaken authored 2 years ago


In overflow we see a duplicate line in the trace, and in some cases 3
lines (if the initial io_post_aux_cqe fails).
Instead, just trace once for each CQE.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-13-dylany@fb.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

e0486f3f

io_uring: add allow_overflow to io_post_aux_cqe · 52120f0f

Dylan Yudaken authored 2 years ago


Some use cases of io_post_aux_cqe would not want to overflow as is, but
might want to change the flags/result. For example, multishot receive
requires in-order CQEs, so if there is an overflow it would need to
stop receiving until the overflow is taken care of.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-8-dylany@fb.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

52120f0f

io_uring: add IOU_STOP_MULTISHOT return code · 114eccdf

Dylan Yudaken authored 2 years ago


For multishot we want a way to signal to the caller that multishot has
ended, even though this might not be an error return.

For example, sockets return 0 when closed, which should end a multishot
recv but still post a CQE with result 0.

Introduce IOU_STOP_MULTISHOT, which does this and indicates that the return
code is stored inside req->cqe.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-7-dylany@fb.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

114eccdf

io_uring: remove priority tw list optimisation · ed5ccb3b

Dylan Yudaken authored 3 years ago


This optimisation has some built in assumptions that make it easy to
introduce bugs. It also does not have clear wins that make it worth keeping.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220622134028.2013417-2-dylany@fb.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

ed5ccb3b

io_uring: move list helpers to a separate file · a6b21fbb

Pavel Begunkov authored 3 years ago

It's annoying to have io-wq.h as a dependency every time we want some of
struct io_wq_work_list helpers, move them into a separate file.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c1d891ce12b30767d1d2a3b7db2ca3abc1ecc4a2.1655802465.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

a6b21fbb

io_uring: improve io_run_task_work() · 625d38b3

Pavel Begunkov authored 3 years ago

Since SQPOLL now uses TWA_SIGNAL_NO_IPI, there won't be task work items
without TIF_NOTIFY_SIGNAL. Simplify io_run_task_work() by removing the
task->task_works check. Even though it looks like it doesn't cause extra
cache bouncing, it's still nice to not touch it an extra time when it
might not be cached.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/75d4f34b0c671075892821a409e28da6cb1d64fe.1655802465.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

625d38b3

io_uring: consistent naming for inline completion · 9da070b1

Pavel Begunkov authored 3 years ago

Improve naming of the inline/deferred completion helper so it's
consistent with its *_post counterpart. Add some comments and extra
lockdeps to ensure the locking is done right.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/797c619943dac06529e9d3fcb16e4c3cde6ad1a3.1655684496.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

9da070b1

io_uring: add io_commit_cqring_flush() · 46929b08

Pavel Begunkov authored 3 years ago


Since __io_commit_cqring_flush users moved to different files, introduce
io_commit_cqring_flush() helper and encapsulate all flags testing details
inside.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0da03887435dd9869ffe46dcd3962bf104afcca3.1655684496.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

46929b08

io_uring: introduce locking helpers for CQE posting · 25399321

Pavel Begunkov authored 3 years ago


spin_lock(&ctx->completion_lock);
/* post CQEs */
io_commit_cqring(ctx);
spin_unlock(&ctx->completion_lock);
io_cqring_ev_posted(ctx);

We have many places repeating this sequence, and the three-function
unlock section is not ideal from a maintenance perspective and also
makes it harder to add new locking/sync tricks.

Introduce two helpers: io_cq_lock(), which is simple and only grabs
->completion_lock, and io_cq_unlock_post(), which encapsulates the
three-call section.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fe0c682bf7f7b55d9be55b0d034be9c1949277dc.1655684496.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

25399321
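
The shape of the two helpers can be sketched in userspace. This is a model under stated assumptions, not the kernel implementation: `struct model_ctx` and both function names are hypothetical, a C11 `atomic_flag` spinlock stands in for ->completion_lock, and plain counters stand in for io_commit_cqring() and io_cqring_ev_posted().

```c
#include <stdatomic.h>

struct model_ctx {
    atomic_flag completion_lock;  /* stands in for ->completion_lock */
    unsigned    cached_cq_tail;   /* CQEs filled while holding the lock */
    unsigned    cq_tail;          /* tail visible to the consumer */
    unsigned    wakeups;          /* counts "event posted" notifications */
};

/* Simple helper: only grabs the completion lock. */
static void model_cq_lock(struct model_ctx *ctx)
{
    while (atomic_flag_test_and_set_explicit(&ctx->completion_lock,
                                             memory_order_acquire))
        ;   /* spin until the lock is free */
}

/* Wraps the three-call tail: commit, unlock, then notify waiters. */
static void model_cq_unlock_post(struct model_ctx *ctx)
{
    ctx->cq_tail = ctx->cached_cq_tail;               /* commit step */
    atomic_flag_clear_explicit(&ctx->completion_lock,
                               memory_order_release); /* unlock step */
    ctx->wakeups++;   /* notify step; a bare counter in this model only */
}
```

A caller then brackets CQE posting with `model_cq_lock()` and `model_cq_unlock_post()` instead of repeating the commit, unlock and notify calls at every site.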

io_uring: remove ->flush_cqes optimisation · d9dee430

Pavel Begunkov authored 3 years ago


It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
the ->flush_cqes flag prevents a completion from being flushed. Sometimes
a high level of concurrency enables it for at least one CQE, but
sometimes it doesn't save much because nobody is waiting on the CQ.

Remove the ->flush_cqes flag and the optimisation; it should benefit the
normal use case. Note that there is no spurious eventfd problem with
that, as checks for spuriousness were incorporated into
io_eventfd_signal().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/692e81eeddccc096f449a7960365fa7b4a18f8e6.1655637157.git.asml.silence@gmail.com


[axboe: remove now dead state->flush_cqes variable]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d9dee430

io_uring: reshuffle io_uring/io_uring.h · 9046c641

Pavel Begunkov authored 3 years ago


It's a good idea to first do forward declarations and then inline
helpers, otherwise we will keep stumbling on dependencies
between them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1d7fa6672ed43f20ccc0c54ae201369ebc3ebfab.1655637157.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

9046c641

io_uring: make io_uring_types.h public · ab1c84d8

Pavel Begunkov authored 3 years ago

Move io_uring types to linux/include, need them public so tracing can
see the definitions and we can clean trace/events/io_uring.h
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a15f12e8cb7289b2de0deaddcc7518d98a132d17.1655384063.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

ab1c84d8

io_uring: change ->cqe_cached invariant for CQE32 · b3659a65

Pavel Begunkov authored 3 years ago

With IORING_SETUP_CQE32 ->cqe_cached doesn't store a real address but
rather an implicit offset into cqes. Store the real cqe pointer and
increment it accordingly if CQE32.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ee1838cba16bed96381a006950b36ba640d998c.1655455613.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

b3659a65
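
The "store the real pointer and increment it accordingly" idea can be illustrated with a short userspace sketch. `struct model_cqe` and `model_get_cqe()` are hypothetical, and the model deliberately ignores ring wrap-around and bounds checks; it only shows the cached pointer stepping by one 16-byte entry normally and by two when 32-byte CQEs are in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models a regular 16-byte CQE; a CQE32 entry occupies two of these. */
struct model_cqe {
    uint64_t user_data;
    int32_t  res;
    uint32_t flags;
};

/*
 * Hand out the entry the cached pointer refers to, then advance the
 * pointer itself: by one entry normally, by two entries for CQE32.
 */
static struct model_cqe *model_get_cqe(struct model_cqe **cached, bool cqe32)
{
    struct model_cqe *cqe = *cached;

    *cached += cqe32 ? 2 : 1;
    return cqe;
}
```

Because the cached value is a real address, the caller can use the returned pointer directly without recomputing an offset into the CQE array.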

io_uring: deduplicate io_get_cqe() calls · e8c328c3

Pavel Begunkov authored 3 years ago

Deduplicate calls to io_get_cqe() from __io_fill_cqe_req().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4fa077986cc3abab7c59ff4e7c390c783885465f.1655455613.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

e8c328c3

io_uring: deduplicate __io_fill_cqe_req tracing · ae5735c6

Pavel Begunkov authored 3 years ago

Deduplicate two trace_io_uring_complete() calls in __io_fill_cqe_req().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/277ed85dba5189ab7d932164b314013a0f0b0fdc.1655455613.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

ae5735c6

io_uring: introduce io_req_cqe_overflow() · 68494a65

Pavel Begunkov authored 3 years ago


__io_fill_cqe_req() is hot and inlined, we want it to be as small as
possible. Add io_req_cqe_overflow(), which accepts only a request and does
all overflow accounting, and use it to replace two calls to the 6-argument
io_cqring_event_overflow().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/048b9fbcce56814d77a1a540409c98c3d383edcb.1655455613.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

68494a65

io_uring: don't inline __io_get_cqe() · faf88dde

Pavel Begunkov authored 3 years ago


__io_get_cqe() is not as hot as io_get_cqe(), no need to inline it, it
sheds ~500B from the binary.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c1ac829198a881b7af8710926f99a3559b9f24c0.1655455613.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

faf88dde

io_uring: don't expose io_fill_cqe_aux() · d245bca6

Pavel Begunkov authored 3 years ago


Deduplicate some code and add a helper for filling an aux CQE, locking
and notification.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b7c6557c8f9dc5c4cfb01292116c682a0ff61081.1655455613.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

d245bca6

io_uring: kill REQ_F_COMPLETE_INLINE · 75d7b3ae

Pavel Begunkov authored 3 years ago


REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list to io_queue_sqe() as __io_req_complete() is inlined and
we don't want to bloat the kernel.

As now we complete in a more centralised fashion in io_issue_sqe() we
can get rid of the flag and queue to the list directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/600ba20a9338b8a39b249b23d3d177803613dde4.1655371007.git.asml.silence@gmail.com

Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

75d7b3ae

io_uring: move small helpers to headers · aa1e90f6

Pavel Begunkov authored 3 years ago

There is a bunch of inline helpers that will be useful not only to the
core of io_uring, move them to headers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/22df99c83723e44cba7e945e8519e64e3642c064.1655310733.git.asml.silence@gmail.com

Signed-off-by: Jens Axboe <axboe@kernel.dk>

aa1e90f6

io_uring: move read/write related opcodes to its own file · f3b44f92

Jens Axboe authored 3 years ago

Signed-off-by: Jens Axboe <axboe@kernel.dk>

f3b44f92

io_uring: move remaining file table manipulation to filetable.c · c98817e6

Jens Axboe authored 3 years ago

Signed-off-by: Jens Axboe <axboe@kernel.dk>

c98817e6

io_uring: move rsrc related data, core, and commands · 73572984

Jens Axboe authored 3 years ago

Signed-off-by: Jens Axboe <axboe@kernel.dk>

73572984

io_uring: split provided buffers handling into its own file · 3b77495a

Jens Axboe authored 3 years ago


Move both the opcodes related to it, and the internals code dealing with
it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3b77495a

io_uring: move cancelation into its own file · 7aaff708

Jens Axboe authored 3 years ago


This also helps cleanup the io_uring.h cancel parts, as we can make
things static in the cancel.c file, mostly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7aaff708

io_uring: move poll handling into its own file · 329061d3

Jens Axboe authored 3 years ago


Add an io_poll_issue() rather than export the general task_work locking
and io_issue_sqe(), and put the io_op_defs definition and structure into
a separate header file so that poll can use it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

329061d3

io_uring: move io_uring_task (tctx) helpers into its own file · c9f06aa7

Jens Axboe authored 3 years ago

Signed-off-by: Jens Axboe <axboe@kernel.dk>

c9f06aa7

io_uring: move SQPOLL related handling into its own file · 17437f31

Jens Axboe authored 3 years ago

Signed-off-by: Jens Axboe <axboe@kernel.dk>

17437f31

io_uring: move timeout opcodes and handling into its own file · 59915143

Jens Axboe authored 3 years ago

Signed-off-by: Jens Axboe <axboe@kernel.dk>

59915143

io_uring: split network related opcodes into its own file · f9ead18c

Jens Axboe authored 3 years ago


While at it, convert the handlers to just use io_eopnotsupp_prep()
if CONFIG_NET isn't set.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f9ead18c

io_uring: move uring_cmd handling to its own file · 99f15d8d

Jens Axboe authored 3 years ago

Signed-off-by: Jens Axboe <axboe@kernel.dk>

99f15d8d