1. 07 Nov, 2016 1 commit
    • Paolo Abeni's avatar
      udp: do fwd memory scheduling on dequeue · 7c13f97f
      Paolo Abeni authored
      
      A new argument is added to __skb_recv_datagram to provide
      an explicit skb destructor, invoked under the receive queue
      lock.
      The UDP protocol uses such argument to perform memory
      reclaiming on dequeue, so that the UDP protocol does not
      set anymore skb->desctructor.
      Instead explicit memory reclaiming is performed at close() time and
      when skbs are removed from the receive queue.
      The in kernel UDP protocol users now need to call a
      skb_recv_udp() variant instead of skb_recv_datagram() to
      properly perform memory accounting on dequeue.
      
      Overall, this allows acquiring only once the receive queue
      lock on dequeue.
      
      Tested using pktgen with random src port, 64 bytes packet,
      wire-speed on a 10G link as sender and udp_sink as the receiver,
      using an l4 tuple rxhash to stress the contention, and one or more
      udp_sink instances with reuseport.
      
      nr sinks	vanilla		patched
      1		440		560
      3		2150		2300
      6		3650		3800
      9		4450		4600
      12		6250		6450
      
      v1 -> v2:
       - do rmem and allocated memory scheduling under the receive lock
       - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
       - avoid the typdef for the dequeue callback
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c13f97f
  2. 22 Oct, 2016 1 commit
    • Paolo Abeni's avatar
      udp: implement memory accounting helpers · f970bd9e
      Paolo Abeni authored
      
      Avoid using the generic helpers.
      Use the receive queue spin lock to protect the memory
      accounting operation, both on enqueue and on dequeue.
      
      On dequeue perform partial memory reclaiming, trying to
      leave a quantum of forward allocated memory.
      
      On enqueue use a custom helper, to allow some optimizations:
      - use a plain spin_lock() variant instead of the slightly
        costly spin_lock_irqsave(),
      - avoid dst_force check, since the calling code has already
        dropped the skb dst
      - avoid orphaning the skb, since skb_steal_sock() already did
        the work for us
      
      The above needs custom memory reclaiming on shutdown, provided
      by the udp_destruct_sock().
      
      v5 -> v6:
        - don't orphan the skb on enqueue
      
      v4 -> v5:
        - replace the mem_lock with the receive queue spin lock
        - ensure that the bh is always allowed to enqueue at least
          a skb, even if sk_rcvbuf is exceeded
      
      v3 -> v4:
        - reworked memory accunting, simplifying the schema
        - provide an helper for both memory scheduling and enqueuing
      
      v1 -> v2:
        - use a udp specific destrctor to perform memory reclaiming
        - remove a couple of helpers, unneeded after the above cleanup
        - do not reclaim memory on dequeue if not under memory
          pressure
        - reworked the fwd accounting schema to avoid potential
          integer overflow
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f970bd9e
  3. 20 Oct, 2016 1 commit
    • Eric Dumazet's avatar
      udp: must lock the socket in udp_disconnect() · 286c72de
      Eric Dumazet authored
      
      Baozeng Ding reported KASAN traces showing uses after free in
      udp_lib_get_port() and other related UDP functions.
      
      A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash.
      
      I could write a reproducer with two threads doing :
      
      static int sock_fd;
      static void *thr1(void *arg)
      {
      	for (;;) {
      		connect(sock_fd, (const struct sockaddr *)arg,
      			sizeof(struct sockaddr_in));
      	}
      }
      
      static void *thr2(void *arg)
      {
      	struct sockaddr_in unspec;
      
      	for (;;) {
      		memset(&unspec, 0, sizeof(unspec));
      	        connect(sock_fd, (const struct sockaddr *)&unspec,
      			sizeof(unspec));
              }
      }
      
      Problem is that udp_disconnect() could run without holding socket lock,
      and this was causing list corruptions.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      286c72de
  4. 24 Aug, 2016 1 commit
  5. 02 Jun, 2016 1 commit
    • Eric Dumazet's avatar
      udp: avoid csum_partial() for validated skb · 595d0b29
      Eric Dumazet authored
      In commit e6afc8ac
      
       ("udp: remove headers from UDP packets before
      queueing"), udp_csum_pull_header() helper was added but missed fact
      that CHECKSUM_UNNECESSARY packets were now converted to CHECKSUM_NONE
      and skb->csum_valid was set to 1 for them.
      
      Since csum_partial() is quite expensive, even for 8-byte area, it is
      worth adding a test.
      
      We also can use skb->data instead of udp_hdr() as we are pulling
      UDP headers, as it is sightly faster.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      595d0b29
  6. 28 Apr, 2016 3 commits
  7. 07 Apr, 2016 2 commits
    • Tom Herbert's avatar
      udp: Add GRO functions to UDP socket · a6024562
      Tom Herbert authored
      
      This patch adds GRO functions (gro_receive and gro_complete) to UDP
      sockets. udp_gro_receive is changed to perform socket lookup on a
      packet. If a socket is found the related GRO functions are called.
      
      This features obsoletes using UDP offload infrastructure for GRO
      (udp_offload). This has the advantage of not being limited to provide
      offload on a per port basis, GRO is now applied to whatever individual
      UDP sockets are bound to.  This also allows the possbility of
      "application defined GRO"-- that is we can attach something like
      a BPF program to a UDP socket to perfrom GRO on an application
      layer protocol.
      Signed-off-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a6024562
    • Tom Herbert's avatar
      udp: Add udp6_lib_lookup_skb and udp4_lib_lookup_skb · 63058308
      Tom Herbert authored
      
      Add externally visible functions to lookup a UDP socket by skb. This
      will be used for GRO in UDP sockets. These functions also check
      if skb->dst is set, and if it is not skb->dev is used to get dev_net.
      This allows calling lookup functions before dst has been set on the
      skbuff.
      Signed-off-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      63058308
  8. 05 Apr, 2016 2 commits
    • samanthakumar's avatar
      udp: remove headers from UDP packets before queueing · e6afc8ac
      samanthakumar authored
      
      Remove UDP transport headers before queueing packets for reception.
      This change simplifies a follow-up patch to add MSG_PEEK support.
      Signed-off-by: default avatarSam Kumar <samanthakumar@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e6afc8ac
    • Eric Dumazet's avatar
      udp: no longer use SLAB_DESTROY_BY_RCU · ca065d0c
      Eric Dumazet authored
      
      Tom Herbert would like not touching UDP socket refcnt for encapsulated
      traffic. For this to happen, we need to use normal RCU rules, with a grace
      period before freeing a socket. UDP sockets are not short lived in the
      high usage case, so the added cost of call_rcu() should not be a concern.
      
      This actually removes a lot of complexity in UDP stack.
      
      Multicast receives no longer need to hold a bucket spinlock.
      
      Note that ip early demux still needs to take a reference on the socket.
      
      Same remark for functions used by xt_socket and xt_PROXY netfilter modules,
      but this might be changed later.
      
      Performance for a single UDP socket receiving flood traffic from
      many RX queues/cpus.
      
      Simple udp_rx using simple recvfrom() loop :
      438 kpps instead of 374 kpps : 17 % increase of the peak rate.
      
      v2: Addressed Willem de Bruijn feedback in multicast handling
       - keep early demux break in __udp4_lib_demux_lookup()
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Tested-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ca065d0c
  9. 11 Feb, 2016 1 commit
  10. 05 Jan, 2016 2 commits
    • Craig Gallek's avatar
      soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF · 538950a1
      Craig Gallek authored
      
      Expose socket options for setting a classic or extended BPF program
      for use when selecting sockets in an SO_REUSEPORT group.  These options
      can be used on the first socket to belong to a group before bind or
      on any socket in the group after bind.
      
      This change includes refactoring of the existing sk_filter code to
      allow reuse of the existing BPF filter validation checks.
      Signed-off-by: default avatarCraig Gallek <kraig@google.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      538950a1
    • Craig Gallek's avatar
      soreuseport: fast reuseport UDP socket selection · e32ea7e7
      Craig Gallek authored
      
      Include a struct sock_reuseport instance when a UDP socket binds to
      a specific address for the first time with the reuseport flag set.
      When selecting a socket for an incoming UDP packet, use the information
      available in sock_reuseport if present.
      
      This required adding an additional field to the UDP source address
      equality function to differentiate between exact and wildcard matches.
      The original use case allowed wildcard matches when checking for
      existing port uses during bind.  The new use case of adding a socket
      to a reuseport group requires exact address matching.
      
      Performance test (using a machine with 2 CPU sockets and a total of
      48 cores):  Create reuseport groups of varying size.  Use one socket
      from this group per user thread (pinning each thread to a different
      core) calling recvmmsg in a tight loop.  Record number of messages
      received per second while saturating a 10G link.
        10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
        20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
        40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)
      
      This work is based off a similar implementation written by
      Ying Cai <ycai@google.com> for implementing policy-based reuseport
      selection.
      Signed-off-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e32ea7e7
  11. 02 Mar, 2015 1 commit
  12. 27 Feb, 2015 1 commit
  13. 02 Oct, 2014 1 commit
    • Tom Herbert's avatar
      udp: Generalize skb_udp_segment · 8bce6d7d
      Tom Herbert authored
      
      skb_udp_segment is the function called from udp4_ufo_fragment to
      segment a UDP tunnel packet. This function currently assumes
      segmentation is transparent Ethernet bridging (i.e. VXLAN
      encapsulation). This patch generalizes the function to
      operate on either Ethertype or IP protocol.
      
      The inner_protocol field must be set to the protocol of the inner
      header. This can now be either an Ethertype or an IP protocol
      (in a union). A new flag in the skbuff indicates which type is
      effective. skb_set_inner_protocol and skb_set_inner_ipproto
      helper functions were added to set the inner_protocol. These
      functions are called from the point where the tunnel encapsulation
      is occuring.
      
      When skb_udp_tunnel_segment is called, the function to segment the
      inner packet is selected based on the inner IP or Ethertype. In the
      case of an IP protocol encapsulation, the function is derived from
      inet[6]_offloads. In the case of Ethertype, skb->protocol is
      set to the inner_protocol and skb_mac_gso_segment is called. (GRE
      currently does this, but it might be possible to lookup the protocol
      in offload_base and call the appropriate segmenation function
      directly).
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8bce6d7d
  14. 25 Aug, 2014 1 commit
  15. 08 Jul, 2014 1 commit
  16. 15 Jun, 2014 1 commit
  17. 05 Jun, 2014 1 commit
  18. 23 May, 2014 1 commit
  19. 07 Nov, 2013 1 commit
  20. 08 Oct, 2013 1 commit
    • Shawn Bohrer's avatar
      udp: ipv4: Add udp early demux · 421b3885
      Shawn Bohrer authored
      
      The removal of the routing cache introduced a performance regression for
      some UDP workloads since a dst lookup must be done for each packet.
      This change caches the dst per socket in a similar manner to what we do
      for TCP by implementing early_demux.
      
      For UDP multicast we can only cache the dst if there is only one
      receiving socket on the host.  Since caching only works when there is
      one receiving socket we do the multicast socket lookup using RCU.
      
      For UDP unicast we only demux sockets with an exact match in order to
      not break forwarding setups.  Additionally since the hash chains may be
      long we only check the first socket to see if it is a match and not
      waste extra time searching the whole chain when we might not find an
      exact match.
      
      Benchmark results from a netperf UDP_RR test:
      Before 87961.22 transactions/s
      After  89789.68 transactions/s
      
      Benchmark results from a fio 1 byte UDP multicast pingpong test
      (Multicast one way unicast response):
      Before 12.97us RTT
      After  12.63us RTT
      Signed-off-by: default avatarShawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      421b3885
  21. 23 Sep, 2013 1 commit
  22. 28 Jul, 2013 1 commit
  23. 02 Jul, 2013 1 commit
    • Hannes Frederic Sowa's avatar
      ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET pending data · 8822b64a
      Hannes Frederic Sowa authored
      
      We accidentally call down to ip6_push_pending_frames when uncorking
      pending AF_INET data on a ipv6 socket. This results in the following
      splat (from Dave Jones):
      
      skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:126!
      invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
      +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
      CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
      task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
      RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
      RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
      RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
      RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
      RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
      R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
      FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      Stack:
       ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
       ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
       ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
      Call Trace:
       [<ffffffff8159a9aa>] skb_push+0x3a/0x40
       [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
       [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
       [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
       [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
       [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
       [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
       [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
       [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
       [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
       [<ffffffff816f5d54>] tracesys+0xdd/0xe2
      Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
      RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
       RSP <ffff8801e6431de8>
      
      This patch adds a check if the pending data is of address family AF_INET
      and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
      if that is the case.
      
      This bug was found by Dave Jones with trinity.
      
      (Also move the initialization of fl6 below the AF_INET check, even if
      not strictly necessary.)
      
      Cc: Dave Jones <davej@redhat.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8822b64a
  24. 12 Jun, 2013 1 commit
  25. 29 Apr, 2012 1 commit
  26. 15 Apr, 2012 1 commit
  27. 13 Apr, 2012 1 commit
    • Eric Dumazet's avatar
      udp: intoduce udp_encap_needed static_key · 447167bf
      Eric Dumazet authored
      
      Most machines dont use UDP encapsulation (L2TP)
      
      Adds a static_key so that udp_queue_rcv_skb() doesnt have to perform a
      test if L2TP never setup the encap_rcv on a socket.
      
      Idea of this patch came after Simon Horman proposal to add a hook on TCP
      as well.
      
      If static_key is not yet enabled, the fast path does a single JMP .
      
      When static_key is enabled, JMP destination is patched to reach the real
      encap_type/encap_rcv logic, possibly adding cache misses.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: dev@openvswitch.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      447167bf
  28. 04 Mar, 2012 1 commit
    • Paul Gortmaker's avatar
      BUG: headers with BUG/BUG_ON etc. need linux/bug.h · 187f1882
      Paul Gortmaker authored
      
      If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
      other BUG variant in a static inline (i.e. not in a #define) then
      that header really should be including <linux/bug.h> and not just
      expecting it to be implicitly present.
      
      We can make this change risk-free, since if the files using these
      headers didn't have exposure to linux/bug.h already, they would have
      been causing compile failures/warnings.
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      187f1882
  29. 11 Dec, 2011 1 commit
  30. 09 Dec, 2011 1 commit
  31. 16 Nov, 2011 1 commit
  32. 01 Nov, 2011 1 commit
  33. 01 Mar, 2011 1 commit
  34. 24 Jan, 2011 1 commit
  35. 10 Nov, 2010 1 commit