[RFC] net: add experimental UDP encapsulation PMD

Message ID 20221011001016.173447-1-stephen@networkplumber.org (mailing list archive)
State Rejected, archived
Delegated to: Thomas Monjalon
Series: [RFC] net: add experimental UDP encapsulation PMD

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation fail Compilation issues
ci/intel-Testing success Testing PASS

Commit Message

Stephen Hemminger Oct. 11, 2022, 12:10 a.m. UTC
  This is a new PMD which can be useful to test a DPDK application
from another test program. The PMD binds to a connected UDP socket
and expects to receive and send raw Ethernet packets over that
socket.

This is especially useful for testing envirionments where you
can't/don't want to give the test driver program route permission.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
This is 1st draft port of some test infrastructure to get
feedback and comments from community.

Later version will include an example and unit tests.

 doc/guides/nics/features/udp.ini |  10 +
 doc/guides/nics/udp.rst          |  30 ++
 drivers/net/meson.build          |   1 +
 drivers/net/udp/meson.build      |  11 +
 drivers/net/udp/rte_eth_udp.c    | 728 +++++++++++++++++++++++++++++++
 drivers/net/udp/version.map      |   3 +
 6 files changed, 783 insertions(+)
 create mode 100644 doc/guides/nics/features/udp.ini
 create mode 100644 doc/guides/nics/udp.rst
 create mode 100644 drivers/net/udp/meson.build
 create mode 100644 drivers/net/udp/rte_eth_udp.c
 create mode 100644 drivers/net/udp/version.map
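
For reference, the intended usage is that the test driver is just an ordinary
UDP peer: it binds to the address given as the PMD's remote= argument, connects
to the PMD's local= address, and treats each datagram payload as one raw
Ethernet frame. A minimal, purely illustrative sketch of such a peer follows;
the addresses, ports and frame contents are made up and not part of the patch.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        /* Where this test driver listens; must match the PMD 'remote=' argument. */
        struct sockaddr_in me = {
                .sin_family = AF_INET,
                .sin_port = htons(9001),
                .sin_addr.s_addr = htonl(INADDR_ANY),
        };
        /* Where the PMD listens; must match the PMD 'local=' argument. */
        struct sockaddr_in pmd = {
                .sin_family = AF_INET,
                .sin_port = htons(9000),
        };
        unsigned char frame[64] = { 0 };   /* would hold dst MAC, src MAC, ethertype, payload */
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        inet_pton(AF_INET, "127.0.0.1", &pmd.sin_addr);

        if (fd < 0 ||
            bind(fd, (struct sockaddr *)&me, sizeof(me)) < 0 ||
            connect(fd, (struct sockaddr *)&pmd, sizeof(pmd)) < 0) {
                perror("socket/bind/connect");
                return 1;
        }

        /* One UDP datagram carries exactly one raw Ethernet frame. */
        send(fd, frame, sizeof(frame), 0);

        /* Frames transmitted by the DPDK application come back the same way. */
        ssize_t n = recv(fd, frame, sizeof(frame), 0);
        printf("received %zd byte frame\n", n);

        close(fd);
        return 0;
}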
  

Comments

Morten Brørup Oct. 11, 2022, 6:47 a.m. UTC | #1
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, 11 October 2022 02.10
> 
> This is a new PMD which can be useful to test a DPDK application
> from another test program. The PMD binds to a connected UDP socket
> and expects to receive and send raw Ethernet packets over that
> socket.
> 
> This is especially useful for testing envirionments where you
> can't/don't want to give the test driver program route permission.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---

Good idea.

Multiple queues are supported, but how does the remote application steer traffic into specific queues (for PMD RX), or identify which queue the packet was supposed to egress on (for PMD TX)?

You could use a range of UDP port numbers for that, so the second queue uses the UDP port number following the configured port number, etc..

Or you could open for feature creep. Here are some thoughts.

Add a metadata header in front of each packet - this might also allow more advanced use in the future, e.g. the remote application could set mbuf hash fields.
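
Purely as an illustration, one possible shape of such a header; the struct and
its fields below are invented, nothing in the patch defines them:

#include <stdint.h>

/* Hypothetical per-packet metadata header, prepended to each frame
 * inside the UDP payload: [udp_pmd_pkt_hdr][Ethernet frame].
 */
struct udp_pmd_pkt_hdr {
        uint16_t version;   /* header format version */
        uint16_t queue_id;  /* Rx queue to steer into / Tx queue of origin */
        uint32_t rss_hash;  /* value for mbuf->hash.rss, if flagged valid */
        uint32_t flags;     /* e.g. bit 0 = rss_hash valid */
} __attribute__((packed));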

Consider if this PMD somehow can be integrated with the TUN/TAP PMD or something similar, and through that existing PMD support more advanced NIC features towards the DPDK application, such as VLAN stripping, GRO, etc..
  
Stephen Hemminger Oct. 11, 2022, 2:06 p.m. UTC | #2
On Tue, 11 Oct 2022 08:47:30 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Tuesday, 11 October 2022 02.10
> > 
> > This is a new PMD which can be useful to test a DPDK application
> > from another test program. The PMD binds to a connected UDP socket
> > and expects to receive and send raw Ethernet packets over that
> > socket.
> > 
> > This is especially useful for testing envirionments where you
> > can't/don't want to give the test driver program route permission.
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---  
> 
> Good idea.
> 
> Multiple queues are supported, but how does the remote application steer traffic into specific queues (for PMD RX), or identify which queue the packet was supposed to egress on (for PMD TX)?

For Tx it relies on the fact that sends on a UDP socket are independent of
each other, so multiple Tx queues just share a single file descriptor.

On Rx there is no steering; multiple threads simply read from the same local
UDP port (separate SO_REUSEPORT sockets), which for testing simulates multiple
receivers being active.
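
In code terms the Rx side boils down to roughly the sketch below (heavily
simplified from the patch: IPv4 only, no connect(), error handling trimmed):
every Rx queue opens its own datagram socket bound to the same local address
with SO_REUSEPORT, and the kernel spreads incoming datagrams across them.

#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* One socket per Rx queue, all bound to the same address:port. */
static int open_rx_socket(const struct sockaddr_in *local)
{
        int on = 1;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        if (fd < 0)
                return -1;

        if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) < 0 ||
            bind(fd, (const struct sockaddr *)local, sizeof(*local)) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}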

> 
> You could use a range of UDP port numbers for that, so the second queue uses the UDP port number following the configured port number, etc..
> 
> Or you could open for feature creep. Here are some thoughts.
> 
> Add a metadata header in front of each packet - this might also allow more advanced use in the future, e.g. the remote application could set mbuf hash fields.
> 
> Consider if this PMD somehow can be integrated with the TUN/TAP PMD or something similar, and through that existing PMD support more advanced NIC features towards the DPDK application, such as VLAN stripping, GRO, etc..

The other alternative is making a VXLAN driver, which is on my TODO list.
  
Ferruh Yigit Oct. 11, 2022, 4:18 p.m. UTC | #3
On 10/11/2022 1:10 AM, Stephen Hemminger wrote:
> This is a new PMD which can be useful to test a DPDK application
> from another test program. The PMD binds to a connected UDP socket
> and expects to receive and send raw Ethernet packets over that
> socket.
> 

Why is the 'connected UDP socket' requirement? I guess this maps with 
'SO_REUSEPORT' in the code.

> This is especially useful for testing envirionments where you
> can't/don't want to give the test driver program route permission.
> 
> Signed-off-by: Stephen Hemminger<stephen@networkplumber.org>
> ---
> This is 1st draft port of some test infrastructure to get
> feedback and comments from community.
> 
> Later version will include an example and unit tests.

I think the pcap PMD can be used for similar testing purposes, but we can 
have a UDP PMD too; it may open other possibilities as a method to 
communicate with other non-DPDK applications.

Although this is a draft, I will put some comments in another reply for the 
next version.
  
Ferruh Yigit Oct. 11, 2022, 4:48 p.m. UTC | #4
On 10/11/2022 1:10 AM, Stephen Hemminger wrote:
> This is a new PMD which can be useful to test a DPDK application
> from another test program. The PMD binds to a connected UDP socket
> and expects to receive and send raw Ethernet packets over that
> socket.
> 
> This is especially useful for testing envirionments where you

s/envirionments/environments/

> can't/don't want to give the test driver program route permission.
> 

root permission?

> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> This is 1st draft port of some test infrastructure to get
> feedback and comments from community.
> 
> Later version will include an example and unit tests.
> 
>   doc/guides/nics/features/udp.ini |  10 +
>   doc/guides/nics/udp.rst          |  30 ++

need to update 'doc/guides/nics/index.rst' and add new file.

>   drivers/net/meson.build          |   1 +
>   drivers/net/udp/meson.build      |  11 +
>   drivers/net/udp/rte_eth_udp.c    | 728 +++++++++++++++++++++++++++++++
>   drivers/net/udp/version.map      |   3 +
>   6 files changed, 783 insertions(+)
>   create mode 100644 doc/guides/nics/features/udp.ini
>   create mode 100644 doc/guides/nics/udp.rst
>   create mode 100644 drivers/net/udp/meson.build
>   create mode 100644 drivers/net/udp/rte_eth_udp.c
>   create mode 100644 drivers/net/udp/version.map
> 
> diff --git a/doc/guides/nics/features/udp.ini b/doc/guides/nics/features/udp.ini
> new file mode 100644
> index 000000000000..dfc39204dacf
> --- /dev/null
> +++ b/doc/guides/nics/features/udp.ini
> @@ -0,0 +1,10 @@
> +;
> +; Supported features of the 'udp' network poll mode driver.
> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +Basic stats          = Y
> +Stats per queue      = Y
> +Multiprocess aware   = Y
> +Scattered Rx         = P
> diff --git a/doc/guides/nics/udp.rst b/doc/guides/nics/udp.rst
> new file mode 100644
> index 000000000000..7b86f5e273e9
> --- /dev/null
> +++ b/doc/guides/nics/udp.rst
> @@ -0,0 +1,30 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2018 Intel Corporation.
> +

Is copyright owner Intel?

> +UDP Poll Mode Driver
> +====================
> +
> +UDP Poll Mode Driver (PMD) is a basic virtual driver useful for testing.
> +It provides a simple bare encapsulation of Ethernet frames in a UDP
> +socket.  This is useful since the test driver application can
> +easily open a local UDP socket and interact with a DPDK application.
> +This can even be done inside a VM or container in automated test setup.
> +
> +
> +Driver Configuration
> +--------------------
> +
> +The driver is a virtual device configured with the --vdev option.
> +The device name must start with the net_udp prefix follwed by numbers

followed

> +or letters The name is unique for each device. Each device can have

or letters. The ...

> +multiple stream options and multiple devices can be used.
> +Multiple device definitions can be arranged using multiple --vdev.
> +
> +Both local and remote address must be specified. Both IPv4 and IPv6
> +are supported examples:
> +
> +.. code-block:: console
> +
> +   ./<build_dir>/app/dpdk-testpmd -l 0-3 -n 4 \
> +       --vdev 'net_udp0,local=127.0.0.1:9000,remote=192.0.2.1:9000',
> +       --vdev 'net_udp1,local=[:0],9000,remote=[2001:DB8::1]:9000'

Maybe good to document that IPv6 addresses should start with '[' (this is 
according to the code below).

> diff --git a/drivers/net/meson.build b/drivers/net/meson.build
> index 35bfa78dee66..36f2d9ed9b96 100644
> --- a/drivers/net/meson.build
> +++ b/drivers/net/meson.build
> @@ -56,6 +56,7 @@ drivers = [
>           'tap',
>           'thunderx',
>           'txgbe',
> +        'udp',
>           'vdev_netvsc',
>           'vhost',
>           'virtio',
> diff --git a/drivers/net/udp/meson.build b/drivers/net/udp/meson.build
> new file mode 100644
> index 000000000000..e7bfd843f4b2
> --- /dev/null
> +++ b/drivers/net/udp/meson.build
> @@ -0,0 +1,11 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +
> +if is_windows
> +    build = false
> +    reason = 'not supported on Windows'
> +    subdir_done()
> +endif
> +
> +sources = files('rte_eth_udp.c')
> +
> +pmd_supports_disable_iova_as_pa = true
> diff --git a/drivers/net/udp/rte_eth_udp.c b/drivers/net/udp/rte_eth_udp.c
> new file mode 100644
> index 000000000000..8ce65721b3ec
> --- /dev/null
> +++ b/drivers/net/udp/rte_eth_udp.c
> @@ -0,0 +1,728 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2022 Microsoft Corp.
> + * All rights reserved.
> + */
> +
> +#include <assert.h>
> +#include <errno.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <sys/socket.h>
> +#include <netinet/in.h>
> +#include <arpa/inet.h>
> +#include <sys/uio.h>
> +#include <unistd.h>
> +#include <netdb.h>
> +#include <string.h>
> +
> +#include <rte_bus_vdev.h>
> +#include <rte_common.h>
> +#include <rte_ethdev.h>
> +#include <ethdev_vdev.h>
> +#include <ethdev_driver.h>
> +#include <bus_vdev_driver.h>
> +#include <rte_ether.h>
> +#include <rte_kvargs.h>
> +#include <rte_log.h>
> +#include <rte_mbuf.h>
> +
> +/* Which strings are valid kvargs for this driver */
> +#define ETH_UDP_LOCAL_ARG	"local"
> +#define ETH_UDP_REMOTE_ARG	"remote"
> +
> +static int eth_udp_logtype;
> +#define PMD_LOG(level, fmt, args...) \
> +	rte_log(RTE_LOG_ ## level, eth_udp_logtype, \
> +		"%s(): " fmt "\n", __func__, ##args)
> +
> +struct pmd_internals;
> +
> +struct udp_queue {
> +	struct rte_mempool *mb_pool;
> +	struct pmd_internals *internals;
> +	uint16_t queue_id;
> +	int sock_fd;
> +
> +	uint64_t pkts;
> +	uint64_t bytes;
> +	uint64_t nobufs;
> +};
> +
> +struct pmd_internals {
> +	uint16_t port_id;
> +	int sock_fd;
> +
> +	struct sockaddr_storage remote;
> +	struct sockaddr_storage local;
> +	struct rte_ether_addr eth_addr;
> +
> +	struct udp_queue rx_queues[RTE_MAX_QUEUES_PER_PORT];
> +	struct udp_queue tx_queues[RTE_MAX_QUEUES_PER_PORT];
> +};
> +
> +static struct rte_eth_link pmd_link = {
> +	.link_speed = RTE_ETH_SPEED_NUM_10G,
> +	.link_duplex = RTE_ETH_LINK_FULL_DUPLEX,
> +	.link_status = RTE_ETH_LINK_DOWN,
> +	.link_autoneg = RTE_ETH_LINK_FIXED,
> +};
> +
> +static int
> +parse_ipv6_address(const char *value, struct sockaddr_in6 *sin6)
> +{
> +	char *str = strdupa(value);
> +	char *endp;
> +
> +	++str;	 /* skip leading '[' */
> +	endp = strchr(str, ']');
> +	if (endp == NULL) {
> +		PMD_LOG(ERR, "missing closing ]");
> +		return -EINVAL;
> +	}
> +
> +	*endp++ = '\0';
> +	sin6->sin6_family = AF_INET6;
> +
> +	if (inet_pton(AF_INET6, ++str, &sin6->sin6_addr) != 1) {
> +		PMD_LOG(ERR, "invalid ipv6 address '%s'", str);
> +		return -EINVAL;
> +	}
> +
> +	/* Handle [ff80::1]:999 as address and port */
> +	if (*endp == ':') {
> +		sin6->sin6_port = htons(strtoul(endp + 1, NULL, 0));
> +	} else if (*endp != '\0') {
> +		PMD_LOG(ERR, "incorrect ipv6 port syntax");
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static int
> +parse_ipv4_address(const char *value, struct sockaddr_in *sin)
> +{
> +	char *str = strdupa(value);
> +	char *endp;
> +
> +	endp = strchr(str, ':');
> +	if (endp)
> +		*endp++ = '\0';
> +
> +	memset(sin, 0, sizeof(*sin));
> +	sin->sin_family = AF_INET;
> +
> +	if (inet_pton(AF_INET, str, &sin->sin_addr) != 1) {
> +		PMD_LOG(ERR, "invalid ipv4 address '%s'", str);
> +		return -EINVAL;
> +	}
> +
> +	if (endp != NULL)
> +		sin->sin_port = htons(strtoul(endp, NULL, 0));
> +
> +	return 0;
> +}
> +
> +/* Addresses are given on Kvargs as:

kvargs

> + *   127.0.0.1:9000
> + *   [::1]:9000
> + */
> +static int
> +get_address_arg(const char *key, const char *value, void *sarg)
> +{
> +	if (value == NULL)
> +		return -EINVAL;
> +
> +	PMD_LOG(DEBUG, "%s='%s'", key, value);
> +
> +	if (*value == '[')
> +		return parse_ipv6_address(value, sarg);
> +	else
> +		return parse_ipv4_address(value, sarg);
> +}
> +
> +/* Helper function to determine how many mbufs are needed per packet  */
> +static uint16_t
> +eth_mbuf_per_pkt(uint16_t port_id,
> +		 struct rte_mempool *mb_pool)
> +{
> +	const struct rte_eth_dev *dev = &rte_eth_devices[port_id];

Not good to access the global device array, 'rte_eth_devices[]'.

Instead, what do you think about storing 'eth_dev->data' in "struct udp_queue" 
and passing it to this function?
Keeping an 'eth_dev' reference in "struct udp_queue" is also an option, but 
that reference differs between primary and secondary processes, which is why 
'eth_dev->data' is safer.
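
For illustration, an untested sketch of that change ('dev_data' is a new field
name invented here):

struct udp_queue {
        struct rte_mempool *mb_pool;
        struct rte_eth_dev_data *dev_data;  /* set in queue setup: dev->data */
        /* ... other existing fields unchanged ... */
};

static uint16_t
eth_mbuf_per_pkt(const struct udp_queue *udp_q, struct rte_mempool *mb_pool)
{
        uint16_t buf_size = rte_pktmbuf_data_room_size(mb_pool);

        return (udp_q->dev_data->mtu + buf_size - 1) / buf_size;
}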

> +	uint16_t buf_size = rte_pktmbuf_data_room_size(mb_pool);
> +
> +	return (dev->data->mtu + buf_size - 1) / buf_size;
> +}
> +
> +
> +/*
> + * Receive packets from socket into mbufs.
> + *
> + * In order to handle multiple packets at a time and scattered receive
> + * this allocates the worst case number of buffers.
> + *
> + * If out of memory, or socket gives error returns 0.
> + */
> +static uint16_t
> +eth_udp_rx(void *queue, struct rte_mbuf **pkts, uint16_t nb_pkts)
> +{
> +	struct udp_queue *udp_q = queue;
> +	uint16_t port_id = udp_q->internals->port_id;
> +	struct rte_mempool *mpool = udp_q->mb_pool;
> +	unsigned int segs_per_pkt = eth_mbuf_per_pkt(port_id, mpool);
> +	unsigned int num_segs = nb_pkts * segs_per_pkt;
> +	struct rte_mbuf *bufs[num_segs];
> +	struct iovec iovecs[num_segs];
> +	struct mmsghdr msgs[nb_pkts];
> +	unsigned int seg_idx = 0, nb_iovs = 0;
> +	uint64_t num_rx_bytes = 0;
> +	int ret;
> +
> +	/* Allocate worst case number of buffers to be used. */
> +	if (rte_pktmbuf_alloc_bulk(mpool, bufs, num_segs) != 0) {
> +		PMD_LOG(ERR, "alloc mbuf failed");
> +		++udp_q->nobufs;
> +		return 0;
> +	}
> +
> +	/* Initialize the multi-packet headers and link the mbufs per packet */
> +	memset(msgs, 0, sizeof(msgs));
> +	for (uint16_t i = 0; i < nb_pkts; i++) {
> +		msgs[i].msg_hdr.msg_iov    = &iovecs[nb_iovs];
> +		msgs[i].msg_hdr.msg_iovlen = segs_per_pkt;
> +
> +		for (unsigned int n = 0; n < segs_per_pkt; n++, nb_iovs++) {
> +			struct rte_mbuf *mb = bufs[nb_iovs];
> +
> +			iovecs[nb_iovs].iov_base = rte_pktmbuf_mtod(mb, void *);
> +			iovecs[nb_iovs].iov_len = rte_pktmbuf_tailroom(mb);
> +		}
> +	}
> +	assert(nb_iovs == num_segs);
> +
> +	ret = recvmmsg(udp_q->sock_fd, msgs, nb_pkts, 0, NULL);
> +	if (ret < 0) {
> +		if (!(errno == EWOULDBLOCK || errno == EINTR))
> +			PMD_LOG(ERR, "recv failed: %s", strerror(errno));
> +
> +		rte_pktmbuf_free_bulk(bufs, num_segs);
> +		return 0;
> +	}
> +	PMD_LOG(DEBUG, "recvmmsg returned %d", ret);
> +

It can be better to use 'RTE_LOG_DP' in datapath.

> +	/* Adjust mbuf length and segments based on result. */
> +	for (int i = 0; i < ret; i++) {
> +		struct rte_mbuf **top = &pkts[i];
> +		struct rte_mbuf *m0, *mb;
> +		unsigned int unfilled;
> +		size_t len;
> +
> +		/* Number of bytes in this packet */
> +		len = msgs[i].msg_len;
> +		num_rx_bytes += len;
> +
> +		m0 = mb = bufs[seg_idx];
> +		m0->pkt_len = len;
> +		m0->port = port_id;
> +		m0->nb_segs = 0;
> +
> +		while (len > 0) {
> +			mb->data_len  = RTE_MIN(len,
> +						rte_pktmbuf_tailroom(mb));
> +			len -= mb->data_len;
> +			*top = mb;
> +			top = &mb->next;
> +
> +			++m0->nb_segs;
> +			mb = bufs[++seg_idx];
> +
> +		}
> +		*top = NULL;
> +
> +		/* Drop rest of chain */
> +		unfilled = segs_per_pkt - m0->nb_segs;
> +		if (unfilled > 0) {
> +			rte_pktmbuf_free_bulk(bufs + seg_idx, unfilled);
> +			seg_idx += unfilled;
> +		}
> +	}
> +
> +	udp_q->pkts += ret;
> +	udp_q->bytes += num_rx_bytes;
> +
> +	/* Free any unused buffers */
> +	if (seg_idx < num_segs)
> +		rte_pktmbuf_free_bulk(bufs + seg_idx, num_segs - seg_idx);
> +
> +	return ret;
> +}
> +
> +/*
> + * Send mbufs over UDP socket.
> + */
> +static uint16_t
> +eth_udp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct udp_queue *udp_q = queue;
> +	struct iovec iovecs[nb_bufs * RTE_MBUF_MAX_NB_SEGS];
> +	struct mmsghdr msgs[nb_bufs];
> +	unsigned int iov_iter;
> +	int ret;
> +
> +	memset(msgs, 0, sizeof(msgs));
> +	iov_iter = 0;
> +	for (uint16_t i = 0; i < nb_bufs; i++) {
> +		struct rte_mbuf *mb = bufs[i];
> +		unsigned int nsegs = mb->nb_segs;
> +
> +		msgs[i].msg_hdr.msg_iov    = &iovecs[iov_iter];
> +		msgs[i].msg_hdr.msg_iovlen = nsegs;
> +
> +		for (unsigned int n = 0; n < nsegs; n++) {
> +			iovecs[iov_iter].iov_base = rte_pktmbuf_mtod(mb, void *);
> +			iovecs[iov_iter].iov_len = rte_pktmbuf_tailroom(mb);
> +			iov_iter++;
> +			mb = mb->next;
> +		}
> +		assert(mb == NULL);
> +	}
> +
> +	ret = sendmmsg(udp_q->sock_fd, msgs, nb_bufs, 0);
> +	if (ret < 0) {
> +		if (!(errno == EWOULDBLOCK || errno == EINTR))
> +			PMD_LOG(ERR, "sendmmsg failed: %s", strerror(errno));
> +		ret = 0;
> +	} else {
> +		uint64_t num_tx_bytes = 0;
> +
> +		for (int i = 0; i < ret; i++)
> +			num_tx_bytes += msgs[i].msg_len;
> +
> +		udp_q->pkts += ret;
> +		udp_q->bytes += num_tx_bytes;
> +	}
> +
> +	if (ret < nb_bufs)
> +		rte_pktmbuf_free_bulk(bufs + ret, nb_bufs - ret);
> +
> +	return ret;
> +}
> +
> +static int
> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
> +{
> +	return 0;
> +}
> +
> +static int
> +eth_dev_start(struct rte_eth_dev *dev)
> +{
> +	dev->data->dev_link.link_status = RTE_ETH_LINK_UP;
> +	return 0;
> +}
> +
> +static int
> +eth_dev_stop(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internals *internal = dev->data->dev_private;
> +	unsigned int i;
> +
> +
> +	for (i = 0; i < dev->data->nb_tx_queues; i++)
> +		internal->tx_queues[i].sock_fd = -1;
> +
> +	for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +		struct udp_queue *rxq = &internal->rx_queues[i];
> +
> +		close(rxq->sock_fd);

The user can do stop/start. If the socket is closed here, will it work with 
the next start?

Should the socket be closed in the 'close()' dev_ops instead?
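
For illustration only, one possible shape of that (a hypothetical
'eth_dev_close' callback; the sockets would then survive stop/start):

static int
eth_dev_close(struct rte_eth_dev *dev)
{
        struct pmd_internals *internal = dev->data->dev_private;
        unsigned int i;

        for (i = 0; i < dev->data->nb_rx_queues; i++) {
                struct udp_queue *rxq = &internal->rx_queues[i];

                if (rxq->sock_fd >= 0) {
                        close(rxq->sock_fd);
                        rxq->sock_fd = -1;
                }
        }
        return 0;
}

/* ... and add '.dev_close = eth_dev_close,' to the eth_dev_ops table. */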

> +		rxq->sock_fd = -1;
> +	}
> +
> +	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
> +	return 0;
> +}
> +
> +static int create_socket(struct pmd_internals *internals)
> +{
> +	socklen_t addrlen;
> +	int family, sock_fd, on = 1;
> +
> +	family = internals->local.ss_family;
> +	sock_fd = socket(family, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
> +	if (sock_fd < 0) {
> +		PMD_LOG(ERR, "socket(): failed %s", strerror(errno));
> +		return -1;
> +	}
> +
> +	if (setsockopt(sock_fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) < 0) {
> +		PMD_LOG(ERR, "setsockopt(SO_REUSEPORT): failed %s", strerror(errno));
> +		goto fail;
> +	}
> +
> +	if (family == AF_INET6)
> +		addrlen = sizeof(struct sockaddr_in6);
> +	else
> +		addrlen = sizeof(struct sockaddr_in);
> +
> +
> +	/* if address family is not set, then local address not specified */
> +	if (bind(sock_fd, (struct sockaddr *)&internals->local, addrlen) < 0) {
> +		PMD_LOG(ERR, "bind: failed %s", strerror(errno));
> +		goto fail;
> +	}
> +
> +	if (connect(sock_fd, (struct sockaddr *)&internals->remote, addrlen) < 0) {
> +		PMD_LOG(ERR, "connect: failed %s", strerror(errno));
> +		goto fail;
> +	}
> +
> +	/* Get actual local family to reuse same address */
> +	addrlen = sizeof(internals->local);
> +	if (getsockname(sock_fd, (struct sockaddr *)&internals->local, &addrlen) < 0) {
> +		PMD_LOG(ERR, "getsockname failed %s", strerror(errno));
> +		goto fail;
> +	}
> +
> +	addrlen = sizeof(internals->remote);
> +	if (getpeername(sock_fd, (struct sockaddr *)&internals->remote, &addrlen) < 0) {
> +		PMD_LOG(ERR, "getsockname failed %s", strerror(errno));
> +		goto fail;
> +	}
> +
> +	return sock_fd;
> +
> +fail:
> +	close(sock_fd);
> +	return -1;
> +}
> +
> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> +		uint16_t nb_rx_desc __rte_unused,
> +		unsigned int socket_id __rte_unused,
> +		const struct rte_eth_rxconf *rx_conf __rte_unused,
> +		struct rte_mempool *mb_pool)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct udp_queue *rx_q = &internals->rx_queues[rx_queue_id];
> +
> +	dev->data->rx_queues[rx_queue_id] = rx_q;
> +	rx_q->internals = internals;
> +	rx_q->queue_id = rx_queue_id;
> +	rx_q->mb_pool = mb_pool;
> +
> +	if (rx_queue_id == 0)
> +		rx_q->sock_fd = internals->sock_fd;
> +	else
> +		rx_q->sock_fd = create_socket(internals);

Ah, I guess the 'SO_REUSEPORT' requirement is because each Rx queue connects 
to the same port.

> +
> +	return (rx_q->sock_fd < 0) ? -1 : 0;
> +}
> +
> +static void
> +eth_rx_queue_release(struct rte_eth_dev *dev, uint16_t rx_queue_id)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct udp_queue *rx_q = &internals->rx_queues[rx_queue_id];
> +
> +	if (rx_q->queue_id > 0)
> +		close(rx_q->sock_fd);
> +
> +	rx_q->sock_fd = -1;
> +}
> +
> +static int
> +eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> +		uint16_t nb_tx_desc __rte_unused,
> +		unsigned int socket_id __rte_unused,
> +		const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct udp_queue *tx_q = &internals->tx_queues[tx_queue_id];
> +
> +	dev->data->tx_queues[tx_queue_id] = tx_q;
> +	tx_q->queue_id = tx_queue_id;
> +	tx_q->internals = internals;
> +	tx_q->sock_fd = internals->sock_fd;
> +
> +	return 0;
> +}
> +
> +static void
> +eth_tx_queue_release(struct rte_eth_dev *dev, uint16_t tx_queue_id)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct udp_queue *tx_q = &internals->tx_queues[tx_queue_id];
> +
> +	tx_q->sock_fd = -1;
> +}
> +
> +static int
> +eth_mtu_set(struct rte_eth_dev *dev __rte_unused, uint16_t mtu __rte_unused)
> +{
> +	return 0;
> +}
> +
> +static int
> +eth_dev_info(struct rte_eth_dev *dev __rte_unused,
> +	     struct rte_eth_dev_info *dev_info)
> +{
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = UINT16_MAX;
> +	dev_info->max_rx_queues = RTE_MAX_QUEUES_PER_PORT;
> +	dev_info->max_tx_queues = RTE_MAX_QUEUES_PER_PORT;
> +	dev_info->min_rx_bufsize = 0;
> +	dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS;
> +	dev_info->rx_offload_capa = RTE_ETH_RX_OFFLOAD_SCATTER;
> +
> +	return 0;
> +}
> +
> +static int
> +eth_link_update(struct rte_eth_dev *dev __rte_unused,
> +		int wait_to_complete __rte_unused)
> +{
> +	return 0;
> +}
> +
> +static int
> +eth_mac_address_set(__rte_unused struct rte_eth_dev *dev,
> +		    __rte_unused struct rte_ether_addr *addr)
> +{
> +	return 0;
> +}
> +
> +static int
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +{
> +	const struct pmd_internals *internal = dev->data->dev_private;
> +	unsigned int i, num_stats;
> +
> +	num_stats = RTE_MIN((unsigned)RTE_ETHDEV_QUEUE_STAT_CNTRS,
> +			    dev->data->nb_rx_queues);

Queue stats shouldn't be part of basic stats anymore; in that case there is 
no need to take RTE_ETHDEV_QUEUE_STAT_CNTRS into account.

> +	for (i = 0; i < num_stats; i++) {
> +		const struct udp_queue *q = &internal->rx_queues[i];
> +
> +		stats->q_ipackets[i] = q->pkts;
> +		stats->ipackets += q->pkts;
> +		stats->q_ibytes[i] += q->bytes;
> +		stats->ibytes += q->bytes;
> +		stats->rx_nombuf += q->nobufs;
> +	}
> +
> +	num_stats = RTE_MIN((unsigned)RTE_ETHDEV_QUEUE_STAT_CNTRS,
> +			    dev->data->nb_tx_queues);
> +	for (i = 0; i < num_stats; i++) {
> +		const struct udp_queue *q = &internal->tx_queues[i];
> +
> +		stats->q_opackets[i] = q->pkts;
> +		stats->opackets += q->pkts;
> +		stats->q_obytes[i] += q->bytes;
> +		stats->obytes += q->bytes;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +eth_stats_reset(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internals *internal = dev->data->dev_private;
> +	unsigned int i;
> +
> +	for (i = 0; i < RTE_DIM(internal->rx_queues); i++) {
> +		struct udp_queue *q = &internal->rx_queues[i];
> +
> +		q->pkts = 0;
> +		q->bytes = 0;
> +		q->nobufs = 0;
> +	}
> +
> +
> +	for (i = 0; i < RTE_DIM(internal->tx_queues); i++) {
> +		struct udp_queue *q = &internal->tx_queues[i];
> +
> +		q->pkts = 0;
> +		q->bytes = 0;
> +		q->nobufs = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct eth_dev_ops ops = {
> +	.dev_start = eth_dev_start,
> +	.dev_stop = eth_dev_stop,
> +	.dev_configure = eth_dev_configure,
> +	.dev_infos_get = eth_dev_info,
> +	.rx_queue_setup = eth_rx_queue_setup,
> +	.tx_queue_setup = eth_tx_queue_setup,
> +	.rx_queue_release = eth_rx_queue_release,
> +	.tx_queue_release = eth_tx_queue_release,
> +	.mtu_set = eth_mtu_set,
> +	.link_update = eth_link_update,
> +	.mac_addr_set = eth_mac_address_set,
> +	.stats_get = eth_stats_get,
> +	.stats_reset = eth_stats_reset,
> +};
> +
> +static int
> +parse_parameters(struct pmd_internals *internals, const char *params)
> +{
> +	static const char * const valid_args[] = {
> +		"local", "remote", NULL
> +	};
> +	struct rte_kvargs *kvlist;
> +	int ret;
> +
> +	if (params == NULL && params[0] == '\0')
> +		return 0;
> +
> +	PMD_LOG(INFO, "parameters \"%s\"", params);
> +	kvlist = rte_kvargs_parse(params, valid_args);
> +	if (kvlist == NULL)
> +		return -1;
> +
> +	ret = rte_kvargs_process(kvlist, ETH_UDP_LOCAL_ARG,
> +				 &get_address_arg, &internals->local);
> +	if (ret < 0)
> +		goto out;
> +
> +	ret = rte_kvargs_process(kvlist, ETH_UDP_REMOTE_ARG,
> +				 &get_address_arg, &internals->remote);
> +
> +out:
> +	rte_kvargs_free(kvlist);
> +	return ret;
> +}
> +
> +static int
> +validate_parameters(struct pmd_internals *internals)
> +{
> +	int family = internals->remote.ss_family;
> +
> +	if (family == AF_UNSPEC) {
> +		PMD_LOG(ERR, "remote address required");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * if no local address is specified,
> +	 * then use same port and  and let kernel choose.
> +	 */
> +	if (internals->local.ss_family == AF_UNSPEC) {
> +		internals->local.ss_family = family;
> +	} else if (internals->local.ss_family != family) {
> +		PMD_LOG(ERR, "Local and remote address family differ");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +rte_pmd_udp_probe(struct rte_vdev_device *dev)
> +{
> +	struct pmd_internals *internals;
> +	struct rte_eth_dev *eth_dev;
> +	struct rte_eth_dev_data *data;
> +	const char *name;
> +	int ret;
> +
> +	name = rte_vdev_device_name(dev);
> +
> +	PMD_LOG(INFO, "Initializing pmd_udp for %s", name);
> +
> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> +		PMD_LOG(ERR, "Secondary not supported");
> +		return -ENOTSUP;
> +	}
> +
> +	eth_dev = rte_eth_vdev_allocate(dev, sizeof(*internals));
> +	if (!eth_dev)
> +		return -ENOMEM;
> +
> +	internals = eth_dev->data->dev_private;
> +	internals->port_id = eth_dev->data->port_id;
> +
> +	ret = parse_parameters(internals, rte_vdev_device_args(dev));
> +	if (ret < 0)
> +		goto fail;
> +
> +	ret = validate_parameters(internals);
> +	if (ret < 0)
> +		goto fail;
> +
> +	/*
> +	 * Note: first socket is used for transmit and for
> +	 * receive queue 0.
> +	 */
> +	internals->sock_fd = create_socket(internals);
> +	if (internals->sock_fd < 0) {
> +		ret = errno ? -errno : -EINVAL;
> +		goto fail;
> +	}
> +
> +	rte_eth_random_addr(internals->eth_addr.addr_bytes);
> +
> +	data = eth_dev->data;
> +	data->dev_link = pmd_link;
> +	data->mac_addrs = &internals->eth_addr;
> +	data->promiscuous = 1;
> +	data->all_multicast = 1;
> +
> +	eth_dev->dev_ops = &ops;
> +	eth_dev->rx_pkt_burst = eth_udp_rx;
> +	eth_dev->tx_pkt_burst = eth_udp_tx;
> +
> +	rte_eth_dev_probing_finish(eth_dev);
> +
> +fail:
> +	if (ret != 0) {
> +		eth_dev->data->mac_addrs = NULL;
> +		rte_eth_dev_release_port(eth_dev);
> +	}
> +
> +	return ret;
> +}
> +
> +static int
> +rte_pmd_udp_remove(struct rte_vdev_device *dev)
> +{
> +	struct rte_eth_dev *eth_dev = NULL;
> +	const char *name;
> +
> +	name = rte_vdev_device_name(dev);
> +
> +	PMD_LOG(INFO, "Closing udp ethdev %s", name);
> +
> +	/* find the ethdev entry */
> +	eth_dev = rte_eth_dev_allocated(name);
> +	if (eth_dev == NULL)
> +		return -1;
> +
> +	/* mac_addrs must not be freed alone because part of dev_private */
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +		eth_dev->data->mac_addrs = NULL;
> +
> +	rte_eth_dev_release_port(eth_dev);
> +
> +	return 0;
> +}
> +
> +static struct rte_vdev_driver pmd_udp_drv = {
> +	.probe = rte_pmd_udp_probe,
> +	.remove = rte_pmd_udp_remove,
> +};
> +
> +RTE_PMD_REGISTER_VDEV(net_udp, pmd_udp_drv);
> +RTE_PMD_REGISTER_ALIAS(net_udp, eth_udp);

alias is for old drivers, new ones shouldn't have it

> +RTE_PMD_REGISTER_PARAM_STRING(net_udp,
> +	"local=<string>"
> +	"remote=<string>");
> +
> +RTE_INIT(eth_udp_init_log)
> +{
> +	eth_udp_logtype = rte_log_register("pmd.net.udp");
> +	if (eth_udp_logtype >= 0)
> +		rte_log_set_level(eth_udp_logtype, RTE_LOG_NOTICE);
> +}

There is the 'RTE_LOG_REGISTER_DEFAULT' macro to help with the above.
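
e.g. the whole RTE_INIT constructor above could be replaced with roughly:

RTE_LOG_REGISTER_DEFAULT(eth_udp_logtype, NOTICE);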

> diff --git a/drivers/net/udp/version.map b/drivers/net/udp/version.map
> new file mode 100644
> index 000000000000..78c3585d7c6b
> --- /dev/null
> +++ b/drivers/net/udp/version.map
> @@ -0,0 +1,3 @@
> +DPDK_23 {
> +	local: *;
> +};
  
Stephen Hemminger Oct. 11, 2022, 5:54 p.m. UTC | #5
On Tue, 11 Oct 2022 17:18:26 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 10/11/2022 1:10 AM, Stephen Hemminger wrote:
> > This is a new PMD which can be useful to test a DPDK application
> > from another test program. The PMD binds to a connected UDP socket
> > and expects to receive and send raw Ethernet packets over that
> > socket.
> >   
> 
> Why is the 'connected UDP socket' requirement? I guess this maps with 
> 'SO_REUSEPORT' in the code.
> 
> > This is especially useful for testing envirionments where you
> > can't/don't want to give the test driver program route permission.
> > 
> > Signed-off-by: Stephen Hemminger<stephen@networkplumber.org>
> > ---
> > This is 1st draft port of some test infrastructure to get
> > feedback and comments from community.
> > 
> > Later version will include an example and unit tests.  
> 
> I think the pcap PMD can be used for similar testing purposes, but we can 
> have a UDP PMD too; it may open other possibilities as a method to 
> communicate with other non-DPDK applications.
> 
> Although this is a draft, I will put some comments in another reply for the 
> next version.

Our use case was unique. We are testing an application running in QEMU on Windows,
and the test driver is a Windows application. The Windows application interacts
with DPDK inside QEMU over UDP. For example, the test driver builds a packet,
sends it to the DPDK appliance and expects a given result: rewrite, drop, etc.
  

Patch

diff --git a/doc/guides/nics/features/udp.ini b/doc/guides/nics/features/udp.ini
new file mode 100644
index 000000000000..dfc39204dacf
--- /dev/null
+++ b/doc/guides/nics/features/udp.ini
@@ -0,0 +1,10 @@ 
+;
+; Supported features of the 'udp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Basic stats          = Y
+Stats per queue      = Y
+Multiprocess aware   = Y
+Scattered Rx         = P
diff --git a/doc/guides/nics/udp.rst b/doc/guides/nics/udp.rst
new file mode 100644
index 000000000000..7b86f5e273e9
--- /dev/null
+++ b/doc/guides/nics/udp.rst
@@ -0,0 +1,30 @@ 
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+UDP Poll Mode Driver
+====================
+
+UDP Poll Mode Driver (PMD) is a basic virtual driver useful for testing.
+It provides a simple bare encapsulation of Ethernet frames in a UDP
+socket.  This is useful since the test driver application can
+easily open a local UDP socket and interact with a DPDK application.
+This can even be done inside a VM or container in automated test setup.
+
+
+Driver Configuration
+--------------------
+
+The driver is a virtual device configured with the --vdev option.
+The device name must start with the net_udp prefix follwed by numbers
+or letters The name is unique for each device. Each device can have
+multiple stream options and multiple devices can be used.
+Multiple device definitions can be arranged using multiple --vdev.
+
+Both local and remote address must be specified. Both IPv4 and IPv6
+are supported examples:
+
+.. code-block:: console
+
+   ./<build_dir>/app/dpdk-testpmd -l 0-3 -n 4 \
+       --vdev 'net_udp0,local=127.0.0.1:9000,remote=192.0.2.1:9000',
+       --vdev 'net_udp1,local=[:0],9000,remote=[2001:DB8::1]:9000'
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 35bfa78dee66..36f2d9ed9b96 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -56,6 +56,7 @@  drivers = [
         'tap',
         'thunderx',
         'txgbe',
+        'udp',
         'vdev_netvsc',
         'vhost',
         'virtio',
diff --git a/drivers/net/udp/meson.build b/drivers/net/udp/meson.build
new file mode 100644
index 000000000000..e7bfd843f4b2
--- /dev/null
+++ b/drivers/net/udp/meson.build
@@ -0,0 +1,11 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files('rte_eth_udp.c')
+
+pmd_supports_disable_iova_as_pa = true
diff --git a/drivers/net/udp/rte_eth_udp.c b/drivers/net/udp/rte_eth_udp.c
new file mode 100644
index 000000000000..8ce65721b3ec
--- /dev/null
+++ b/drivers/net/udp/rte_eth_udp.c
@@ -0,0 +1,728 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2022 Microsoft Corp.
+ * All rights reserved.
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <sys/uio.h>
+#include <unistd.h>
+#include <netdb.h>
+#include <string.h>
+
+#include <rte_bus_vdev.h>
+#include <rte_common.h>
+#include <rte_ethdev.h>
+#include <ethdev_vdev.h>
+#include <ethdev_driver.h>
+#include <bus_vdev_driver.h>
+#include <rte_ether.h>
+#include <rte_kvargs.h>
+#include <rte_log.h>
+#include <rte_mbuf.h>
+
+/* Which strings are valid kvargs for this driver */
+#define ETH_UDP_LOCAL_ARG	"local"
+#define ETH_UDP_REMOTE_ARG	"remote"
+
+static int eth_udp_logtype;
+#define PMD_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, eth_udp_logtype, \
+		"%s(): " fmt "\n", __func__, ##args)
+
+struct pmd_internals;
+
+struct udp_queue {
+	struct rte_mempool *mb_pool;
+	struct pmd_internals *internals;
+	uint16_t queue_id;
+	int sock_fd;
+
+	uint64_t pkts;
+	uint64_t bytes;
+	uint64_t nobufs;
+};
+
+struct pmd_internals {
+	uint16_t port_id;
+	int sock_fd;
+
+	struct sockaddr_storage remote;
+	struct sockaddr_storage local;
+	struct rte_ether_addr eth_addr;
+
+	struct udp_queue rx_queues[RTE_MAX_QUEUES_PER_PORT];
+	struct udp_queue tx_queues[RTE_MAX_QUEUES_PER_PORT];
+};
+
+static struct rte_eth_link pmd_link = {
+	.link_speed = RTE_ETH_SPEED_NUM_10G,
+	.link_duplex = RTE_ETH_LINK_FULL_DUPLEX,
+	.link_status = RTE_ETH_LINK_DOWN,
+	.link_autoneg = RTE_ETH_LINK_FIXED,
+};
+
+static int
+parse_ipv6_address(const char *value, struct sockaddr_in6 *sin6)
+{
+	char *str = strdupa(value);
+	char *endp;
+
+	++str;	 /* skip leading '[' */
+	endp = strchr(str, ']');
+	if (endp == NULL) {
+		PMD_LOG(ERR, "missing closing ]");
+		return -EINVAL;
+	}
+
+	*endp++ = '\0';
+	sin6->sin6_family = AF_INET6;
+
+	if (inet_pton(AF_INET6, ++str, &sin6->sin6_addr) != 1) {
+		PMD_LOG(ERR, "invalid ipv6 address '%s'", str);
+		return -EINVAL;
+	}
+
+	/* Handle [ff80::1]:999 as address and port */
+	if (*endp == ':') {
+		sin6->sin6_port = htons(strtoul(endp + 1, NULL, 0));
+	} else if (*endp != '\0') {
+		PMD_LOG(ERR, "incorrect ipv6 port syntax");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+parse_ipv4_address(const char *value, struct sockaddr_in *sin)
+{
+	char *str = strdupa(value);
+	char *endp;
+
+	endp = strchr(str, ':');
+	if (endp)
+		*endp++ = '\0';
+
+	memset(sin, 0, sizeof(*sin));
+	sin->sin_family = AF_INET;
+
+	if (inet_pton(AF_INET, str, &sin->sin_addr) != 1) {
+		PMD_LOG(ERR, "invalid ipv4 address '%s'", str);
+		return -EINVAL;
+	}
+
+	if (endp != NULL)
+		sin->sin_port = htons(strtoul(endp, NULL, 0));
+
+	return 0;
+}
+
+/* Addresses are given on Kvargs as:
+ *   127.0.0.1:9000
+ *   [::1]:9000
+ */
+static int
+get_address_arg(const char *key, const char *value, void *sarg)
+{
+	if (value == NULL)
+		return -EINVAL;
+
+	PMD_LOG(DEBUG, "%s='%s'", key, value);
+
+	if (*value == '[')
+		return parse_ipv6_address(value, sarg);
+	else
+		return parse_ipv4_address(value, sarg);
+}
+
+/* Helper function to determine how many mbufs are needed per packet  */
+static uint16_t
+eth_mbuf_per_pkt(uint16_t port_id,
+		 struct rte_mempool *mb_pool)
+{
+	const struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	uint16_t buf_size = rte_pktmbuf_data_room_size(mb_pool);
+
+	return (dev->data->mtu + buf_size - 1) / buf_size;
+}
+
+
+/*
+ * Receive packets from socket into mbufs.
+ *
+ * In order to handle multiple packets at a time and scattered receive
+ * this allocates the worst case number of buffers.
+ *
+ * If out of memory, or socket gives error returns 0.
+ */
+static uint16_t
+eth_udp_rx(void *queue, struct rte_mbuf **pkts, uint16_t nb_pkts)
+{
+	struct udp_queue *udp_q = queue;
+	uint16_t port_id = udp_q->internals->port_id;
+	struct rte_mempool *mpool = udp_q->mb_pool;
+	unsigned int segs_per_pkt = eth_mbuf_per_pkt(port_id, mpool);
+	unsigned int num_segs = nb_pkts * segs_per_pkt;
+	struct rte_mbuf *bufs[num_segs];
+	struct iovec iovecs[num_segs];
+	struct mmsghdr msgs[nb_pkts];
+	unsigned int seg_idx = 0, nb_iovs = 0;
+	uint64_t num_rx_bytes = 0;
+	int ret;
+
+	/* Allocate worst case number of buffers to be used. */
+	if (rte_pktmbuf_alloc_bulk(mpool, bufs, num_segs) != 0) {
+		PMD_LOG(ERR, "alloc mbuf failed");
+		++udp_q->nobufs;
+		return 0;
+	}
+
+	/* Initialize the multi-packet headers and link the mbufs per packet */
+	memset(msgs, 0, sizeof(msgs));
+	for (uint16_t i = 0; i < nb_pkts; i++) {
+		msgs[i].msg_hdr.msg_iov    = &iovecs[nb_iovs];
+		msgs[i].msg_hdr.msg_iovlen = segs_per_pkt;
+
+		for (unsigned int n = 0; n < segs_per_pkt; n++, nb_iovs++) {
+			struct rte_mbuf *mb = bufs[nb_iovs];
+
+			iovecs[nb_iovs].iov_base = rte_pktmbuf_mtod(mb, void *);
+			iovecs[nb_iovs].iov_len = rte_pktmbuf_tailroom(mb);
+		}
+	}
+	assert(nb_iovs == num_segs);
+
+	ret = recvmmsg(udp_q->sock_fd, msgs, nb_pkts, 0, NULL);
+	if (ret < 0) {
+		if (!(errno == EWOULDBLOCK || errno == EINTR))
+			PMD_LOG(ERR, "recv failed: %s", strerror(errno));
+
+		rte_pktmbuf_free_bulk(bufs, num_segs);
+		return 0;
+	}
+	PMD_LOG(DEBUG, "recvmmsg returned %d", ret);
+
+	/* Adjust mbuf length and segments based on result. */
+	for (int i = 0; i < ret; i++) {
+		struct rte_mbuf **top = &pkts[i];
+		struct rte_mbuf *m0, *mb;
+		unsigned int unfilled;
+		size_t len;
+
+		/* Number of bytes in this packet */
+		len = msgs[i].msg_len;
+		num_rx_bytes += len;
+
+		m0 = mb = bufs[seg_idx];
+		m0->pkt_len = len;
+		m0->port = port_id;
+		m0->nb_segs = 0;
+
+		while (len > 0) {
+			mb->data_len  = RTE_MIN(len,
+						rte_pktmbuf_tailroom(mb));
+			len -= mb->data_len;
+			*top = mb;
+			top = &mb->next;
+
+			++m0->nb_segs;
+			mb = bufs[++seg_idx];
+
+		}
+		*top = NULL;
+
+		/* Drop rest of chain */
+		unfilled = segs_per_pkt - m0->nb_segs;
+		if (unfilled > 0) {
+			rte_pktmbuf_free_bulk(bufs + seg_idx, unfilled);
+			seg_idx += unfilled;
+		}
+	}
+
+	udp_q->pkts += ret;
+	udp_q->bytes += num_rx_bytes;
+
+	/* Free any unused buffers */
+	if (seg_idx < num_segs)
+		rte_pktmbuf_free_bulk(bufs + seg_idx, num_segs - seg_idx);
+
+	return ret;
+}
+
+/*
+ * Send mbufs over UDP socket.
+ */
+static uint16_t
+eth_udp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct udp_queue *udp_q = queue;
+	struct iovec iovecs[nb_bufs * RTE_MBUF_MAX_NB_SEGS];
+	struct mmsghdr msgs[nb_bufs];
+	unsigned int iov_iter;
+	int ret;
+
+	memset(msgs, 0, sizeof(msgs));
+	iov_iter = 0;
+	for (uint16_t i = 0; i < nb_bufs; i++) {
+		struct rte_mbuf *mb = bufs[i];
+		unsigned int nsegs = mb->nb_segs;
+
+		msgs[i].msg_hdr.msg_iov    = &iovecs[iov_iter];
+		msgs[i].msg_hdr.msg_iovlen = nsegs;
+
+		for (unsigned int n = 0; n < nsegs; n++) {
+			iovecs[iov_iter].iov_base = rte_pktmbuf_mtod(mb, void *);
+			iovecs[iov_iter].iov_len = rte_pktmbuf_tailroom(mb);
+			iov_iter++;
+			mb = mb->next;
+		}
+		assert(mb == NULL);
+	}
+
+	ret = sendmmsg(udp_q->sock_fd, msgs, nb_bufs, 0);
+	if (ret < 0) {
+		if (!(errno == EWOULDBLOCK || errno == EINTR))
+			PMD_LOG(ERR, "sendmmsg failed: %s", strerror(errno));
+		ret = 0;
+	} else {
+		uint64_t num_tx_bytes = 0;
+
+		for (int i = 0; i < ret; i++)
+			num_tx_bytes += msgs[i].msg_len;
+
+		udp_q->pkts += ret;
+		udp_q->bytes += num_tx_bytes;
+	}
+
+	if (ret < nb_bufs)
+		rte_pktmbuf_free_bulk(bufs + ret, nb_bufs - ret);
+
+	return ret;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = RTE_ETH_LINK_UP;
+	return 0;
+}
+
+static int
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internal = dev->data->dev_private;
+	unsigned int i;
+
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++)
+		internal->tx_queues[i].sock_fd = -1;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct udp_queue *rxq = &internal->rx_queues[i];
+
+		close(rxq->sock_fd);
+		rxq->sock_fd = -1;
+	}
+
+	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+	return 0;
+}
+
+static int create_socket(struct pmd_internals *internals)
+{
+	socklen_t addrlen;
+	int family, sock_fd, on = 1;
+
+	family = internals->local.ss_family;
+	sock_fd = socket(family, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
+	if (sock_fd < 0) {
+		PMD_LOG(ERR, "socket(): failed %s", strerror(errno));
+		return -1;
+	}
+
+	if (setsockopt(sock_fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) < 0) {
+		PMD_LOG(ERR, "setsockopt(SO_REUSEPORT): failed %s", strerror(errno));
+		goto fail;
+	}
+
+	if (family == AF_INET6)
+		addrlen = sizeof(struct sockaddr_in6);
+	else
+		addrlen = sizeof(struct sockaddr_in);
+
+
+	/* if address family is not set, then local address not specified */
+	if (bind(sock_fd, (struct sockaddr *)&internals->local, addrlen) < 0) {
+		PMD_LOG(ERR, "bind: failed %s", strerror(errno));
+		goto fail;
+	}
+
+	if (connect(sock_fd, (struct sockaddr *)&internals->remote, addrlen) < 0) {
+		PMD_LOG(ERR, "connect: failed %s", strerror(errno));
+		goto fail;
+	}
+
+	/* Get actual local family to reuse same address */
+	addrlen = sizeof(internals->local);
+	if (getsockname(sock_fd, (struct sockaddr *)&internals->local, &addrlen) < 0) {
+		PMD_LOG(ERR, "getsockname failed %s", strerror(errno));
+		goto fail;
+	}
+
+	addrlen = sizeof(internals->remote);
+	if (getpeername(sock_fd, (struct sockaddr *)&internals->remote, &addrlen) < 0) {
+		PMD_LOG(ERR, "getsockname failed %s", strerror(errno));
+		goto fail;
+	}
+
+	return sock_fd;
+
+fail:
+	close(sock_fd);
+	return -1;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		uint16_t nb_rx_desc __rte_unused,
+		unsigned int socket_id __rte_unused,
+		const struct rte_eth_rxconf *rx_conf __rte_unused,
+		struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct udp_queue *rx_q = &internals->rx_queues[rx_queue_id];
+
+	dev->data->rx_queues[rx_queue_id] = rx_q;
+	rx_q->internals = internals;
+	rx_q->queue_id = rx_queue_id;
+	rx_q->mb_pool = mb_pool;
+
+	if (rx_queue_id == 0)
+		rx_q->sock_fd = internals->sock_fd;
+	else
+		rx_q->sock_fd = create_socket(internals);
+
+	return (rx_q->sock_fd < 0) ? -1 : 0;
+}
+
+static void
+eth_rx_queue_release(struct rte_eth_dev *dev, uint16_t rx_queue_id)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct udp_queue *rx_q = &internals->rx_queues[rx_queue_id];
+
+	if (rx_q->queue_id > 0)
+		close(rx_q->sock_fd);
+
+	rx_q->sock_fd = -1;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		uint16_t nb_tx_desc __rte_unused,
+		unsigned int socket_id __rte_unused,
+		const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct udp_queue *tx_q = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = tx_q;
+	tx_q->queue_id = tx_queue_id;
+	tx_q->internals = internals;
+	tx_q->sock_fd = internals->sock_fd;
+
+	return 0;
+}
+
+static void
+eth_tx_queue_release(struct rte_eth_dev *dev, uint16_t tx_queue_id)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct udp_queue *tx_q = &internals->tx_queues[tx_queue_id];
+
+	tx_q->sock_fd = -1;
+}
+
+static int
+eth_mtu_set(struct rte_eth_dev *dev __rte_unused, uint16_t mtu __rte_unused)
+{
+	return 0;
+}
+
+static int
+eth_dev_info(struct rte_eth_dev *dev __rte_unused,
+	     struct rte_eth_dev_info *dev_info)
+{
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = UINT16_MAX;
+	dev_info->max_rx_queues = RTE_MAX_QUEUES_PER_PORT;
+	dev_info->max_tx_queues = RTE_MAX_QUEUES_PER_PORT;
+	dev_info->min_rx_bufsize = 0;
+	dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS;
+	dev_info->rx_offload_capa = RTE_ETH_RX_OFFLOAD_SCATTER;
+
+	return 0;
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static int
+eth_mac_address_set(__rte_unused struct rte_eth_dev *dev,
+		    __rte_unused struct rte_ether_addr *addr)
+{
+	return 0;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	const struct pmd_internals *internal = dev->data->dev_private;
+	unsigned int i, num_stats;
+
+	num_stats = RTE_MIN((unsigned)RTE_ETHDEV_QUEUE_STAT_CNTRS,
+			    dev->data->nb_rx_queues);
+	for (i = 0; i < num_stats; i++) {
+		const struct udp_queue *q = &internal->rx_queues[i];
+
+		stats->q_ipackets[i] = q->pkts;
+		stats->ipackets += q->pkts;
+		stats->q_ibytes[i] += q->bytes;
+		stats->ibytes += q->bytes;
+		stats->rx_nombuf += q->nobufs;
+	}
+
+	num_stats = RTE_MIN((unsigned)RTE_ETHDEV_QUEUE_STAT_CNTRS,
+			    dev->data->nb_tx_queues);
+	for (i = 0; i < num_stats; i++) {
+		const struct udp_queue *q = &internal->tx_queues[i];
+
+		stats->q_opackets[i] = q->pkts;
+		stats->opackets += q->pkts;
+		stats->q_obytes[i] += q->bytes;
+		stats->obytes += q->bytes;
+	}
+
+	return 0;
+}
+
+static int
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internal = dev->data->dev_private;
+	unsigned int i;
+
+	for (i = 0; i < RTE_DIM(internal->rx_queues); i++) {
+		struct udp_queue *q = &internal->rx_queues[i];
+
+		q->pkts = 0;
+		q->bytes = 0;
+		q->nobufs = 0;
+	}
+
+
+	for (i = 0; i < RTE_DIM(internal->tx_queues); i++) {
+		struct udp_queue *q = &internal->tx_queues[i];
+
+		q->pkts = 0;
+		q->bytes = 0;
+		q->nobufs = 0;
+	}
+
+	return 0;
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_rx_queue_release,
+	.tx_queue_release = eth_tx_queue_release,
+	.mtu_set = eth_mtu_set,
+	.link_update = eth_link_update,
+	.mac_addr_set = eth_mac_address_set,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+parse_parameters(struct pmd_internals *internals, const char *params)
+{
+	static const char * const valid_args[] = {
+		"local", "remote", NULL
+	};
+	struct rte_kvargs *kvlist;
+	int ret;
+
+	if (params == NULL && params[0] == '\0')
+		return 0;
+
+	PMD_LOG(INFO, "parameters \"%s\"", params);
+	kvlist = rte_kvargs_parse(params, valid_args);
+	if (kvlist == NULL)
+		return -1;
+
+	ret = rte_kvargs_process(kvlist, ETH_UDP_LOCAL_ARG,
+				 &get_address_arg, &internals->local);
+	if (ret < 0)
+		goto out;
+
+	ret = rte_kvargs_process(kvlist, ETH_UDP_REMOTE_ARG,
+				 &get_address_arg, &internals->remote);
+
+out:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+validate_parameters(struct pmd_internals *internals)
+{
+	int family = internals->remote.ss_family;
+
+	if (family == AF_UNSPEC) {
+		PMD_LOG(ERR, "remote address required");
+		return -EINVAL;
+	}
+
+	/*
+	 * if no local address is specified,
+	 * then use same port and  and let kernel choose.
+	 */
+	if (internals->local.ss_family == AF_UNSPEC) {
+		internals->local.ss_family = family;
+	} else if (internals->local.ss_family != family) {
+		PMD_LOG(ERR, "Local and remote address family differ");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+rte_pmd_udp_probe(struct rte_vdev_device *dev)
+{
+	struct pmd_internals *internals;
+	struct rte_eth_dev *eth_dev;
+	struct rte_eth_dev_data *data;
+	const char *name;
+	int ret;
+
+	name = rte_vdev_device_name(dev);
+
+	PMD_LOG(INFO, "Initializing pmd_udp for %s", name);
+
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		PMD_LOG(ERR, "Secondary not supported");
+		return -ENOTSUP;
+	}
+
+	eth_dev = rte_eth_vdev_allocate(dev, sizeof(*internals));
+	if (!eth_dev)
+		return -ENOMEM;
+
+	internals = eth_dev->data->dev_private;
+	internals->port_id = eth_dev->data->port_id;
+
+	ret = parse_parameters(internals, rte_vdev_device_args(dev));
+	if (ret < 0)
+		goto fail;
+
+	ret = validate_parameters(internals);
+	if (ret < 0)
+		goto fail;
+
+	/*
+	 * Note: first socket is used for transmit and for
+	 * receive queue 0.
+	 */
+	internals->sock_fd = create_socket(internals);
+	if (internals->sock_fd < 0) {
+		ret = errno ? -errno : -EINVAL;
+		goto fail;
+	}
+
+	rte_eth_random_addr(internals->eth_addr.addr_bytes);
+
+	data = eth_dev->data;
+	data->dev_link = pmd_link;
+	data->mac_addrs = &internals->eth_addr;
+	data->promiscuous = 1;
+	data->all_multicast = 1;
+
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_udp_rx;
+	eth_dev->tx_pkt_burst = eth_udp_tx;
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+fail:
+	if (ret != 0) {
+		eth_dev->data->mac_addrs = NULL;
+		rte_eth_dev_release_port(eth_dev);
+	}
+
+	return ret;
+}
+
+static int
+rte_pmd_udp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+
+	name = rte_vdev_device_name(dev);
+
+	PMD_LOG(INFO, "Closing udp ethdev %s", name);
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -1;
+
+	/* mac_addrs must not be freed alone because part of dev_private */
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		eth_dev->data->mac_addrs = NULL;
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_udp_drv = {
+	.probe = rte_pmd_udp_probe,
+	.remove = rte_pmd_udp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_udp, pmd_udp_drv);
+RTE_PMD_REGISTER_ALIAS(net_udp, eth_udp);
+RTE_PMD_REGISTER_PARAM_STRING(net_udp,
+	"local=<string>"
+	"remote=<string>");
+
+RTE_INIT(eth_udp_init_log)
+{
+	eth_udp_logtype = rte_log_register("pmd.net.udp");
+	if (eth_udp_logtype >= 0)
+		rte_log_set_level(eth_udp_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/udp/version.map b/drivers/net/udp/version.map
new file mode 100644
index 000000000000..78c3585d7c6b
--- /dev/null
+++ b/drivers/net/udp/version.map
@@ -0,0 +1,3 @@ 
+DPDK_23 {
+	local: *;
+};