[v3,1/5] net/af_xdp: introduce AF XDP PMD driver

Message ID 20190321091845.78495-2-xiaolong.ye@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series Introduce AF_XDP PMD

Checks

Context                 Check    Description
ci/Intel-compilation    fail     Compilation issues
ci/Performance-Testing  fail     build patch failure
ci/checkpatch           success  coding style OK

Commit Message

Xiaolong Ye March 21, 2019, 9:18 a.m. UTC
Add a new PMD driver for AF_XDP, which is a proposed faster version of the
AF_PACKET interface in Linux. For more information about AF_XDP, please
refer to [1] and [2].

This is the vanilla version of the PMD, which just uses a raw buffer
registered as the umem.

[1] https://fosdem.org/2018/schedule/event/af_xdp/
[2] https://lwn.net/Articles/745934/

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---
 MAINTAINERS                                   |   6 +
 config/common_base                            |   5 +
 config/common_linux                           |   1 +
 doc/guides/nics/af_xdp.rst                    |  45 +
 doc/guides/nics/features/af_xdp.ini           |  11 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_19_05.rst        |   7 +
 drivers/net/Makefile                          |   1 +
 drivers/net/af_xdp/Makefile                   |  32 +
 drivers/net/af_xdp/meson.build                |  21 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 932 ++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   3 +
 drivers/net/meson.build                       |   1 +
 mk/rte.app.mk                                 |   1 +
 14 files changed, 1067 insertions(+)
 create mode 100644 doc/guides/nics/af_xdp.rst
 create mode 100644 doc/guides/nics/features/af_xdp.ini
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
  

Comments

Stephen Hemminger March 21, 2019, 3:24 p.m. UTC | #1
On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +
> +static inline int
> +reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
> +{
> +	struct xsk_ring_prod *fq = &umem->fq;
> +	uint32_t idx;
> +	void *addr = NULL;
> +	int i, ret;
> +
> +	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
> +	if (!ret) {
> +		RTE_LOG(ERR, AF_XDP, "Failed to reserve enough fq descs.\n");
> +		return ret;
> +	}
> +
> +	for (i = 0; i < reserve_size; i++) {
> +		__u64 *fq_addr;
> +		rte_ring_dequeue(umem->buf_ring, &addr);

You should check the return value of the dequeue; otherwise static checkers will
(rightly) complain that "everyone else checks the return value of rte_ring_dequeue(),
why not here?"
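
A minimal sketch of the kind of check being asked for. It does not use DPDK: `struct tiny_ring`, `tiny_ring_dequeue()` and `fill_descs()` are hypothetical stand-ins that only mirror the `rte_ring_dequeue()` return convention (0 on success, -ENOENT when the ring is empty), so the driver's real fix may look different.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a ring of free buffer addresses. */
struct tiny_ring {
	void *slots[8];
	unsigned int head;	/* next slot to dequeue */
	unsigned int count;	/* entries currently stored */
};

/* Mirrors the rte_ring_dequeue() convention: 0 on success, -ENOENT if empty. */
static int tiny_ring_dequeue(struct tiny_ring *r, void **obj)
{
	if (r->count == 0)
		return -ENOENT;
	*obj = r->slots[r->head];
	r->head = (r->head + 1) % 8;
	r->count--;
	return 0;
}

/*
 * Fill up to n descriptors from the ring, stopping at the first failed
 * dequeue instead of writing a stale address. Returns how many were filled.
 */
static int fill_descs(struct tiny_ring *r, uint64_t *descs, int n)
{
	void *addr = NULL;
	int i;

	for (i = 0; i < n; i++) {
		if (tiny_ring_dequeue(r, &addr) != 0)
			break;
		descs[i] = (uint64_t)(uintptr_t)addr;
	}
	return i;
}
```

The caller can then submit only the number of descriptors that were actually filled, rather than assuming every dequeue succeeded.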
  
Stephen Hemminger March 21, 2019, 3:25 p.m. UTC | #2
On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +	for (i = 0; i < rcvd; i++) {
> +		const struct xdp_desc *desc;
> +		uint64_t addr;
> +		uint32_t len;
> +		void *pkt;
> +
> +		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
> +		addr = desc->addr;
> +		len = desc->len;
> +		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
> +
> +		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);

You could use rte_pktmbuf_alloc_bulk to get the mbufs in one call
before doing this. It saves rcvd-1 atomic operations.
  
Stephen Hemminger March 21, 2019, 3:27 p.m. UTC | #3
On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> static void kick_tx(struct pkt_tx_queue *txq)
> +{
> +	struct xsk_umem_info *umem = txq->pair->umem;
> +
> +	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
> +		      0, MSG_DONTWAIT) < 0) {
> +		/* some thing unexpected */
> +		if (errno != EBUSY && errno != EAGAIN)
> +			break;
> +
> +		/* pull from complete qeueu to leave more space */
> +		if (errno == EAGAIN)
> +			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
> +	}

What about EINTR??
You should retry the send then.
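
A minimal sketch of such a retry loop, assuming a plain non-blocking `send()` on a generic socket. `kick_fd()` and `drain_completions()` are hypothetical stand-ins for `kick_tx()` and `pull_umem_cq()`, and the `max_retries` bound is an addition for the sketch, not something the patch has.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical stand-in for pull_umem_cq(): counts how often we drained. */
static int drain_calls;

static void drain_completions(void)
{
	drain_calls++;
}

/*
 * Kick a socket with a zero-length, non-blocking send().
 * Retries on EINTR (interrupted by a signal), EBUSY and EAGAIN,
 * and gives up on any other error. max_retries bounds the loop so
 * the sketch cannot spin forever.
 * Returns 0 on success or a negative errno value.
 */
static int kick_fd(int fd, int max_retries)
{
	int tries;

	for (tries = 0; tries <= max_retries; tries++) {
		if (send(fd, NULL, 0, MSG_DONTWAIT) >= 0)
			return 0;

		if (errno == EINTR)
			continue;	/* interrupted: just retry */

		if (errno == EAGAIN) {
			drain_completions();	/* make room, then retry */
			continue;
		}

		if (errno == EBUSY)
			continue;	/* busy: retry */

		return -errno;	/* something unexpected */
	}

	return -EAGAIN;
}
```

Treating EINTR like EBUSY keeps a stray signal from silently dropping the kick, which is the case the original loop misses.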
  
Stephen Hemminger March 21, 2019, 3:28 p.m. UTC | #4
On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +		if (ret != 0) {
> +			RTE_LOG(ERR, AF_XDP, "getsockopt() failed for XDP_STATISTICS.\n");
> +			return -1;

You need to use the new dynamic log types and not have a global logtype.
  
Stephen Hemminger March 21, 2019, 3:30 p.m. UTC | #5
On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +
> +	if (ret < 0)
> +		return -EINVAL;
> +
> +	return 0;

You could propagate the kernel errno into DPDK?
	return (ret < 0) ? -errno : 0;
  
Stephen Hemminger March 21, 2019, 3:31 p.m. UTC | #6
On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
> +		RTE_LOG(ERR, AF_XDP, "Invalid name %s, should be less than "
> +			"%u bytes.\n", value, IFNAMSIZ)

Please don't break error message strings across multiple source lines.
It makes it harder to use tools like grep to find errors in source.
  
Stephen Hemminger March 21, 2019, 3:32 p.m. UTC | #7
On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +
> +	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
> +	if (ioctl(sock, SIOCGIFINDEX, &ifr))
> +		goto error;
> +
> +	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
> +		goto error;
> +
> +	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
> +
> +	close(sock);
> +	*if_index = if_nametoindex(if_name);

This seems confused:
	- first you get ifindex with SIOCGIFINDEX, then you ignore the result
	- then get MAC address.
	- then use if_nametoindex() which does SIOCGIFINDEX internally
  
Stephen Hemminger March 21, 2019, 3:36 p.m. UTC | #8
On Thu, 21 Mar 2019 17:18:41 +0800
Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> +
> +RTE_PMD_REGISTER_VDEV(eth_af_xdp, pmd_af_xdp_drv);

The convention in other network drivers is to use net_XXX in the vdev name.
In AF_XDP that would be:

RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);

About naming, I would just drop AF_ from the name everywhere, the driver
is about running over XDP, and the "AF_" is just a prefix for address family.

Why not:
	net/xdp
  
Xiaolong Ye March 22, 2019, 1:49 a.m. UTC | #9
On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +
>> +RTE_PMD_REGISTER_VDEV(eth_af_xdp, pmd_af_xdp_drv);
>
>The convention in other network drivers is to use net_XXX in the vdev name.
>In AF_XDP that would be:
>
>RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);

Got it.

>
>About naming, I would just drop AF_ from the name everywhere, the driver
>is about running over XDP, and the "AF_" is just a prefix for address family.
>
>Why not:
>	net/xdp

Thanks for the advice. Actually, this driver is more about AF_XDP than
XDP: the foundational objects it uses (umem, umem fill ring, umem
completion ring, tx ring, rx ring) are all AF_XDP specific, so I would
rather keep the naming.

Thanks,
Xiaolong
>
  
Xiaolong Ye March 22, 2019, 1:54 a.m. UTC | #10
On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +
>> +	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
>> +	if (ioctl(sock, SIOCGIFINDEX, &ifr))
>> +		goto error;
>> +
>> +	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
>> +		goto error;
>> +
>> +	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
>> +
>> +	close(sock);
>> +	*if_index = if_nametoindex(if_name);
>
>This seems confused:
>	- first you get ifindex with SIOCGIFINDEX, then you ignore the result
>	- then get MAC address.
>	- then use if_nametoindex() which does SIOCGIFINDEX internally

You're right, the code is chaotic here; I will improve it in the next version.

Thanks,
Xiaolong
  
Xiaolong Ye March 22, 2019, 1:55 a.m. UTC | #11
On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
>> +		RTE_LOG(ERR, AF_XDP, "Invalid name %s, should be less than "
>> +			"%u bytes.\n", value, IFNAMSIZ)
>
>Please don't break error message strings across multiple source lines.
>It makes it harder to use tools like grep to find errors in source.

Good point, will keep this in mind.

Thanks,
Xiaolong
  
Xiaolong Ye March 22, 2019, 2:01 a.m. UTC | #12
On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +
>> +	if (ret < 0)
>> +		return -EINVAL;
>> +
>> +	return 0;
>
>You could propagate the kernel errno into DPDK?
>	return (ret < 0) ? -errno : 0;
>

Sorry, could you share the advantage of doing this?

Thanks,
Xiaolong
  
Xiaolong Ye March 22, 2019, 2:04 a.m. UTC | #13
On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> static void kick_tx(struct pkt_tx_queue *txq)
>> +{
>> +	struct xsk_umem_info *umem = txq->pair->umem;
>> +
>> +	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
>> +		      0, MSG_DONTWAIT) < 0) {
>> +		/* some thing unexpected */
>> +		if (errno != EBUSY && errno != EAGAIN)
>> +			break;
>> +
>> +		/* pull from complete qeueu to leave more space */
>> +		if (errno == EAGAIN)
>> +			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
>> +	}
>
>What about EINTR??
>You should retry the send then.

Will do.
  
Xiaolong Ye March 22, 2019, 2:05 a.m. UTC | #14
On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +	for (i = 0; i < rcvd; i++) {
>> +		const struct xdp_desc *desc;
>> +		uint64_t addr;
>> +		uint32_t len;
>> +		void *pkt;
>> +
>> +		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
>> +		addr = desc->addr;
>> +		len = desc->len;
>> +		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
>> +
>> +		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
>
>You could use rte_pktmbuf_alloc_bulk to get the mbufs in one call
>before doing this. It saves rcvd-1 atomic operations.

Got it, will do.
  
Xiaolong Ye March 22, 2019, 2:05 a.m. UTC | #15
On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +
>> +static inline int
>> +reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
>> +{
>> +	struct xsk_ring_prod *fq = &umem->fq;
>> +	uint32_t idx;
>> +	void *addr = NULL;
>> +	int i, ret;
>> +
>> +	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
>> +	if (!ret) {
>> +		RTE_LOG(ERR, AF_XDP, "Failed to reserve enough fq descs.\n");
>> +		return ret;
>> +	}
>> +
>> +	for (i = 0; i < reserve_size; i++) {
>> +		__u64 *fq_addr;
>> +		rte_ring_dequeue(umem->buf_ring, &addr);
>
>You should check the return value of the dequeue; otherwise static checkers will
>(rightly) complain that "everyone else checks the return value of rte_ring_dequeue(),
>why not here?"

Got it, will do.
  
Xiaolong Ye March 22, 2019, 2:15 a.m. UTC | #16
On 03/21, Stephen Hemminger wrote:
>On Thu, 21 Mar 2019 17:18:41 +0800
>Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> +		if (ret != 0) {
>> +			RTE_LOG(ERR, AF_XDP, "getsockopt() failed for XDP_STATISTICS.\n");
>> +			return -1;
>
>You need to use the new dynamic log types and not have a global logtype.

You mean for all the logs in this driver, right? Is it because the global logtype
will be deprecated?

Will investigate and implement the dynamic log type.

Thanks,
Xiaolong
  
Bruce Richardson March 22, 2019, 9:32 a.m. UTC | #17
On Fri, Mar 22, 2019 at 09:49:03AM +0800, Ye Xiaolong wrote:
> On 03/21, Stephen Hemminger wrote:
> >On Thu, 21 Mar 2019 17:18:41 +0800
> >Xiaolong Ye <xiaolong.ye@intel.com> wrote:
> >
> >> +
> >> +RTE_PMD_REGISTER_VDEV(eth_af_xdp, pmd_af_xdp_drv);
> >
> >The convention in other network drivers is to use net_XXX in the vdev name.
> >In AF_XDP that would be:
> >
> >RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
> 
> Got it.
> 
> >
> >About naming, I would just drop AF_ from the name everywhere, the driver
> >is about running over XDP, and the "AF_" is just a prefix for address family.
> >
> >Why not:
> >	net/xdp
> 
> Thanks for the advice. Actually, this driver is more about AF_XDP than
> XDP: the foundational objects it uses (umem, umem fill ring, umem
> completion ring, tx ring, rx ring) are all AF_XDP specific, so I would
> rather keep the naming.
> 
+1 for the naming. AF_XDP is something different from XDP itself, though
the former does use the latter.

/Bruce
  
Stephen Hemminger March 22, 2019, 3:37 p.m. UTC | #18
On Fri, 22 Mar 2019 10:01:57 +0800
Ye Xiaolong <xiaolong.ye@intel.com> wrote:

> On 03/21, Stephen Hemminger wrote:
> >On Thu, 21 Mar 2019 17:18:41 +0800
> >Xiaolong Ye <xiaolong.ye@intel.com> wrote:
> >  
> >> +
> >> +	if (ret < 0)
> >> +		return -EINVAL;
> >> +
> >> +	return 0;  
> >
> >You could propagate the kernel errno into DPDK?
> >	return (ret < 0) ? -errno : 0;
> >  
> 
> Sorry, could you share the advantage of doing this?
> 
> Thanks,
> Xiaolong

Suppose the kernel returned -ENOTSUPP or some other error; it could go back to
the caller rather than just invalid.
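
A minimal sketch of the difference, using `close()` as a stand-in for whatever system call the driver makes. Both wrapper names are hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Collapses every failure into -EINVAL: the caller cannot tell why. */
static int close_fd_generic(int fd)
{
	int ret = close(fd);

	if (ret < 0)
		return -EINVAL;

	return 0;
}

/* Passes the kernel's errno through as a negative return value. */
static int close_fd_propagate(int fd)
{
	int ret = close(fd);

	return (ret < 0) ? -errno : 0;
}
```

With an invalid descriptor such as -1, the first wrapper reports -EINVAL while the second reports -EBADF, which tells the caller what actually went wrong.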
  
Stephen Hemminger March 22, 2019, 3:38 p.m. UTC | #19
On Fri, 22 Mar 2019 10:15:23 +0800
Ye Xiaolong <xiaolong.ye@intel.com> wrote:

> On 03/21, Stephen Hemminger wrote:
> >On Thu, 21 Mar 2019 17:18:41 +0800
> >Xiaolong Ye <xiaolong.ye@intel.com> wrote:
> >  
> >> +		if (ret != 0) {
> >> +			RTE_LOG(ERR, AF_XDP, "getsockopt() failed for XDP_STATISTICS.\n");
> >> +			return -1;  
> >
> >You need to use the new dynamic log types and not have a global logtype.  
> 
> You mean for all the logs in this driver, right? Is it because the global logtype
> will be deprecated?

Global log types should not be used or added by any new code.
  
Xiaolong Ye March 22, 2019, 11:19 p.m. UTC | #20
On 03/22, Stephen Hemminger wrote:
>On Fri, 22 Mar 2019 10:01:57 +0800
>Ye Xiaolong <xiaolong.ye@intel.com> wrote:
>
>> On 03/21, Stephen Hemminger wrote:
>> >On Thu, 21 Mar 2019 17:18:41 +0800
>> >Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>> >  
>> >> +
>> >> +	if (ret < 0)
>> >> +		return -EINVAL;
>> >> +
>> >> +	return 0;  
>> >
>> >You could propagate the kernel errno into DPDK?
>> >	return (ret < 0) ? -errno : 0;
>> >  
>> 
>> Sorry, could you share the advantage of doing this?
>> 
>> Thanks,
>> Xiaolong
>
>Suppose the kernel returned -ENOTSUPP or some other error; it could go back to
>the caller rather than just invalid.

Got it.
  
Xiaolong Ye March 22, 2019, 11:20 p.m. UTC | #21
On 03/22, Stephen Hemminger wrote:
>On Fri, 22 Mar 2019 10:15:23 +0800
>Ye Xiaolong <xiaolong.ye@intel.com> wrote:
>
>> On 03/21, Stephen Hemminger wrote:
>> >On Thu, 21 Mar 2019 17:18:41 +0800
>> >Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>> >  
>> >> +		if (ret != 0) {
>> >> +			RTE_LOG(ERR, AF_XDP, "getsockopt() failed for XDP_STATISTICS.\n");
>> >> +			return -1;  
>> >
>> >You need to use the new dynamic log types and not have a global logtype.  
>> 
>> You mean for all the logs in this driver, right? Is it because the global logtype
>> will be deprecated?
>
>Global log types should not be used or added by any new code.

Got it.
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 452b8eb82..1cc54b439 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -468,6 +468,12 @@  M: John W. Linville <linville@tuxdriver.com>
 F: drivers/net/af_packet/
 F: doc/guides/nics/features/afpacket.ini
 
+Linux AF_XDP
+M: Xiaolong Ye <xiaolong.ye@intel.com>
+M: Qi Zhang <qi.z.zhang@intel.com>
+F: drivers/net/af_xdp/
+F: doc/guides/nics/features/af_xdp.rst
+
 Amazon ENA
 M: Marcin Wojtas <mw@semihalf.com>
 M: Michal Krawczyk <mk@semihalf.com>
diff --git a/config/common_base b/config/common_base
index 0b09a9348..4044de205 100644
--- a/config/common_base
+++ b/config/common_base
@@ -416,6 +416,11 @@  CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 #
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
+#
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
 #
 # Compile link bonding PMD library
 #
diff --git a/config/common_linux b/config/common_linux
index 75334273d..0b1249da0 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@  CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_IFC_PMD=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
 CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
 CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
new file mode 100644
index 000000000..dd5654dd1
--- /dev/null
+++ b/doc/guides/nics/af_xdp.rst
@@ -0,0 +1,45 @@ 
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+AF_XDP Poll Mode Driver
+==========================
+
+AF_XDP is an address family that is optimized for high performance
+packet processing. AF_XDP sockets enable the possibility for XDP program to
+redirect packets to a memory buffer in userspace.
+
+For the full details behind AF_XDP socket, you can refer to
+`AF_XDP documentation in the Kernel
+<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.
+
+This Linux-specific PMD driver creates the AF_XDP socket and binds it to a
+specific netdev queue, it allows a DPDK application to send and receive raw
+packets through the socket which would bypass the kernel network stack.
+Current implementation only supports single queue, multi-queues feature will
+be added later.
+
+Options
+-------
+
+The following options can be provided to set up an af_xdp port in DPDK.
+
+*   ``iface`` - name of the Kernel interface to attach to (required);
+*   ``queue`` - netdev queue id (optional, default 0);
+
+Prerequisites
+-------------
+
+This is a Linux-specific PMD, thus the following prerequisites apply:
+
+*  A Linux Kernel (version > 4.18) with XDP sockets configuration enabled;
+*  libbpf (within kernel version > 5.1) with latest af_xdp support installed
+*  A Kernel bound interface to attach to.
+
+Set up an af_xdp interface
+-----------------------------
+
+The following example will set up an af_xdp interface in DPDK:
+
+.. code-block:: console
+
+    --vdev eth_af_xdp,iface=ens786f1,queue=0
diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
new file mode 100644
index 000000000..36953c2de
--- /dev/null
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -0,0 +1,11 @@ 
+;
+; Supported features of the 'af_xdp' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+MTU update           = Y
+Promiscuous mode     = Y
+Stats per queue      = Y
+x86-64               = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..a4b80a3d0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -12,6 +12,7 @@  Network Interface Controller Drivers
     features
     build_and_test
     af_packet
+    af_xdp
     ark
     atlantic
     avp
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 61a2c7383..062facf89 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -65,6 +65,13 @@  New Features
     process.
   * Added support for Rx packet types list in a secondary process.
 
+* **Added the AF_XDP PMD.**
+
+  Added a Linux-specific PMD driver for AF_XDP, it can create the AF_XDP socket
+  and bind it to a specific netdev queue, it allows a DPDK application to send
+  and receive raw packets through the socket which would bypass the kernel
+  network stack to achieve high performance packet processing.
+
 * **Updated Mellanox drivers.**
 
    New features and improvements were done in mlx4 and mlx5 PMDs:
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..5d401b8c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@  ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 000000000..db7d9aa57
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,32 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+
+# require kernel version >= v5.1-rc1
+CFLAGS += -I$(RTE_KERNELDIR)/tools/include
+CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf
+
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+LDLIBS += -lbpf
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 000000000..635e67483
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,21 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+if host_machine.system() != 'linux'
+	build = false
+endif
+
+bpf_dep = dependency('libbpf', required: false)
+if bpf_dep.found()
+	build = true
+else
+	bpf_dep = cc.find_library('libbpf', required: false)
+	if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep)
+		build = true
+		pkgconfig_extra_libs += '-lbpf'
+	else
+		build = false
+	endif
+endif
+sources = files('rte_eth_af_xdp.c')
+ext_deps += bpf_dep
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
new file mode 100644
index 000000000..5e671670a
--- /dev/null
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -0,0 +1,932 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include <linux/if_ether.h>
+#include <linux/if_xdp.h>
+#include <linux/if_link.h>
+#include <asm/barrier.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <bpf/bpf.h>
+#include <xsk.h>
+
+#define RTE_LOGTYPE_AF_XDP RTE_LOGTYPE_USER1
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+#define ETH_AF_XDP_IFACE_ARG			"iface"
+#define ETH_AF_XDP_QUEUE_IDX_ARG		"queue"
+
+#define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
+#define ETH_AF_XDP_NUM_BUFFERS		4096
+#define ETH_AF_XDP_DATA_HEADROOM	0
+#define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
+#define ETH_AF_XDP_DFLT_QUEUE_IDX	0
+
+#define ETH_AF_XDP_RX_BATCH_SIZE	32
+#define ETH_AF_XDP_TX_BATCH_SIZE	32
+
+#define ETH_AF_XDP_MAX_QUEUE_PAIRS     16
+
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
+	struct rte_ring *buf_ring;
+	void *buffer;
+};
+
+struct rx_stats {
+	uint64_t rx_pkts;
+	uint64_t rx_bytes;
+	uint64_t rx_dropped;
+};
+
+struct pkt_rx_queue {
+	struct xsk_ring_cons rx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
+	struct rte_mempool *mb_pool;
+
+	struct rx_stats stats;
+
+	struct pkt_tx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct tx_stats {
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+	uint64_t tx_bytes;
+};
+
+struct pkt_tx_queue {
+	struct xsk_ring_prod tx;
+
+	struct tx_stats stats;
+
+	struct pkt_rx_queue *pair;
+	uint16_t queue_idx;
+};
+
+struct pmd_internals {
+	int if_index;
+	char if_name[IFNAMSIZ];
+	uint16_t queue_idx;
+	struct ether_addr eth_addr;
+	struct xsk_umem_info *umem;
+	struct rte_mempool *mb_pool_share;
+
+	struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+	struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+};
+
+static const char * const valid_arguments[] = {
+	ETH_AF_XDP_IFACE_ARG,
+	ETH_AF_XDP_QUEUE_IDX_ARG,
+	NULL
+};
+
+static struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_AUTONEG
+};
+
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	void *addr = NULL;
+	int i, ret;
+
+	ret = xsk_ring_prod__reserve(fq, reserve_size, &idx);
+	if (!ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to reserve enough fq descs.\n");
+		return ret;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		__u64 *fq_addr;
+		rte_ring_dequeue(umem->buf_ring, &addr);
+		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
+		*fq_addr = (uint64_t)addr;
+	}
+
+	xsk_ring_prod__submit(fq, reserve_size);
+
+	return 0;
+}
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbuf;
+	unsigned long dropped = 0;
+	unsigned long rx_bytes = 0;
+	uint16_t count = 0;
+	int rcvd, i;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (rcvd == 0)
+		return 0;
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	for (i = 0; i < rcvd; i++) {
+		const struct xdp_desc *desc;
+		uint64_t addr;
+		uint32_t len;
+		void *pkt;
+
+		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
+		addr = desc->addr;
+		len = desc->len;
+		pkt = xsk_umem__get_data(rxq->umem->buffer, addr);
+
+		mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
+		if (mbuf != NULL) {
+			rte_memcpy(rte_pktmbuf_mtod(mbuf, void *), pkt, len);
+			rte_pktmbuf_pkt_len(mbuf) = len;
+			rte_pktmbuf_data_len(mbuf) = len;
+			rx_bytes += len;
+			bufs[count++] = mbuf;
+		} else {
+			dropped++;
+		}
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->stats.rx_pkts += (rcvd - dropped);
+	rxq->stats.rx_bytes += rx_bytes;
+	rxq->stats.rx_dropped += dropped;
+
+	return count;
+}
+
+static void pull_umem_cq(struct xsk_umem_info *umem, int size)
+{
+	struct xsk_ring_cons *cq = &umem->cq;
+	size_t i, n;
+	uint32_t idx_cq;
+
+	n = xsk_ring_cons__peek(cq, size, &idx_cq);
+
+	for (i = 0; i < n; i++) {
+		uint64_t addr;
+		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+	}
+
+	xsk_ring_cons__release(cq, n);
+}
+
+static void kick_tx(struct pkt_tx_queue *txq)
+{
+	struct xsk_umem_info *umem = txq->pair->umem;
+
+	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+		      0, MSG_DONTWAIT) < 0) {
+		/* some thing unexpected */
+		if (errno != EBUSY && errno != EAGAIN)
+			break;
+
+		/* pull from complete qeueu to leave more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
+	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+}
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->pair->umem;
+	struct rte_mbuf *mbuf;
+	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
+	unsigned long tx_bytes = 0;
+	int i, valid = 0;
+	uint32_t idx_tx;
+
+	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
+	pull_umem_cq(umem, nb_pkts);
+
+	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
+					nb_pkts, NULL);
+	if (nb_pkts == 0)
+		return 0;
+
+	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
+		kick_tx(txq);
+		return 0;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct xdp_desc *desc;
+		void *pkt;
+		uint32_t buf_len = ETH_AF_XDP_FRAME_SIZE
+					- ETH_AF_XDP_DATA_HEADROOM;
+		desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i);
+		mbuf = bufs[i];
+		if (mbuf->pkt_len <= buf_len) {
+			desc->addr = (uint64_t)addrs[valid];
+			desc->len = mbuf->pkt_len;
+			pkt = xsk_umem__get_data(umem->buffer,
+						 desc->addr);
+			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+			       desc->len);
+			valid++;
+			tx_bytes += mbuf->pkt_len;
+		}
+		rte_pktmbuf_free(mbuf);
+	}
+
+	xsk_ring_prod__submit(&txq->tx, nb_pkts);
+
+	kick_tx(txq);
+
+	if (valid < nb_pkts)
+		rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid],
+				 nb_pkts - valid, NULL);
+
+	txq->stats.err_pkts += nb_pkts - valid;
+	txq->stats.tx_pkts += valid;
+	txq->stats.tx_bytes += tx_bytes;
+
+	return nb_pkts;
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/* This function gets called when the current port gets stopped. */
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	/* rx/tx must be paired */
+	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = ETH_FRAME_LEN;
+	dev_info->max_rx_queues = 1;
+	dev_info->max_tx_queues = 1;
+
+	dev_info->default_rxportconf.nb_queues = 1;
+	dev_info->default_txportconf.nb_queues = 1;
+	dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+	dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS;
+}
+
+static int
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct xdp_statistics xdp_stats;
+	struct pkt_rx_queue *rxq;
+	socklen_t optlen;
+	int i, ret;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		optlen = sizeof(struct xdp_statistics);
+		rxq = &internals->rx_queues[i];
+		stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts;
+		stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes;
+
+		stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts;
+		stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes;
+
+		stats->ipackets += stats->q_ipackets[i];
+		stats->ibytes += stats->q_ibytes[i];
+		stats->imissed += internals->rx_queues[i].stats.rx_dropped;
+		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
+				XDP_STATISTICS, &xdp_stats, &optlen);
+		if (ret != 0) {
+			RTE_LOG(ERR, AF_XDP, "getsockopt() failed for XDP_STATISTICS.\n");
+			return -1;
+		}
+		stats->imissed += xdp_stats.rx_dropped;
+
+		stats->opackets += stats->q_opackets[i];
+		stats->oerrors += internals->tx_queues[i].stats.err_pkts;
+		stats->obytes += stats->q_obytes[i];
+	}
+
+	return 0;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		memset(&internals->rx_queues[i].stats, 0,
+					sizeof(struct rx_stats));
+		memset(&internals->tx_queues[i].stats, 0,
+					sizeof(struct tx_stats));
+	}
+}
+
+static void remove_xdp_program(struct pmd_internals *internals)
+{
+	uint32_t curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id,
+				XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		RTE_LOG(ERR, AF_XDP, "bpf_get_link_xdp_id failed\n");
+		return;
+	}
+	bpf_set_link_xdp_fd(internals->if_index, -1,
+			XDP_FLAGS_UPDATE_IF_NOEXIST);
+}
+
+static void
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_rx_queue *rxq;
+	int i;
+
+	RTE_LOG(INFO, AF_XDP, "Closing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		rxq = &internals->rx_queues[i];
+		if (rxq->umem == NULL)
+			break;
+		xsk_socket__delete(rxq->xsk);
+	}
+
+	(void)xsk_umem__delete(internals->umem->umem);
+	remove_xdp_program(internals);
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static void xdp_umem_destroy(struct xsk_umem_info *umem)
+{
+	free(umem->buffer);
+	umem->buffer = NULL;
+
+	rte_ring_free(umem->buf_ring);
+	umem->buf_ring = NULL;
+
+	/* umem comes from rte_zmalloc_socket(), not malloc(). */
+	rte_free(umem);
+}
+
+static struct xsk_umem_info *xdp_umem_configure(void)
+{
+	struct xsk_umem_info *umem;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.frame_size = ETH_AF_XDP_FRAME_SIZE,
+		.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
+	void *bufs = NULL;
+	int ret;
+	uint64_t i;
+
+	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
+	if (umem == NULL) {
+		RTE_LOG(ERR, AF_XDP, "Failed to allocate umem info\n");
+		return NULL;
+	}
+
+	umem->buf_ring = rte_ring_create("af_xdp_ring",
+					 ETH_AF_XDP_NUM_BUFFERS,
+					 SOCKET_ID_ANY,
+					 0x0);
+	if (umem->buf_ring == NULL) {
+		RTE_LOG(ERR, AF_XDP,
+			"Failed to create rte_ring\n");
+		goto err;
+	}
+
+	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
+		rte_ring_enqueue(umem->buf_ring,
+				 (void *)(i * ETH_AF_XDP_FRAME_SIZE +
+					  ETH_AF_XDP_DATA_HEADROOM));
+
+	if (posix_memalign(&bufs, getpagesize(),
+			   ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) {
+		RTE_LOG(ERR, AF_XDP, "Failed to allocate memory pool.\n");
+		goto err;
+	}
+	/* Record the buffer first so the err path frees it on failure. */
+	umem->buffer = bufs;
+	ret = xsk_umem__create(&umem->umem, bufs,
+			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+	if (ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to create umem\n");
+		goto err;
+	}
+
+	return umem;
+
+err:
+	xdp_umem_destroy(umem);
+	return NULL;
+}
+
+static int
+xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
+	      int ring_size)
+{
+	struct xsk_socket_config cfg;
+	struct pkt_tx_queue *txq = rxq->pair;
+	int ret = 0;
+	int reserve_size;
+
+	rxq->umem = xdp_umem_configure();
+	if (rxq->umem == NULL) {
+		/* nothing to destroy yet: skip the err path */
+		return -ENOMEM;
+	}
+
+	cfg.rx_size = ring_size;
+	cfg.tx_size = ring_size;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	cfg.bind_flags = 0;
+	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+			internals->queue_idx, rxq->umem->umem, &rxq->rx,
+			&txq->tx, &cfg);
+	if (ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to create xsk socket.\n");
+		goto err;
+	}
+
+	reserve_size = ETH_AF_XDP_DFLT_NUM_DESCS / 2;
+	ret = reserve_fill_queue(rxq->umem, reserve_size);
+	if (ret) {
+		RTE_LOG(ERR, AF_XDP, "Failed to reserve fill queue.\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	xdp_umem_destroy(rxq->umem);
+
+	return ret;
+}
+
+static void
+queue_reset(struct pmd_internals *internals, uint16_t queue_idx)
+{
+	struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx];
+	struct pkt_tx_queue *txq = rxq->pair;
+	int xsk_fd = xsk_socket__fd(rxq->xsk);
+
+	if (xsk_fd >= 0) {
+		close(xsk_fd);
+		if (internals->umem != NULL) {
+			xdp_umem_destroy(internals->umem);
+			internals->umem = NULL;
+		}
+	}
+	memset(rxq, 0, sizeof(*rxq));
+	memset(txq, 0, sizeof(*txq));
+	rxq->pair = txq;
+	txq->pair = rxq;
+	rxq->queue_idx = queue_idx;
+	txq->queue_idx = queue_idx;
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	uint32_t buf_size, data_size;
+	struct pkt_rx_queue *rxq;
+	int ret;
+
+	rxq = &internals->rx_queues[rx_queue_id];
+	queue_reset(internals, rx_queue_id);
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
+		RTE_PKTMBUF_HEADROOM;
+	data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
+
+	if (data_size > buf_size) {
+		RTE_LOG(ERR, AF_XDP,
+			"%s: %u bytes will not fit in mbuf (%u bytes)\n",
+			dev->device->name, data_size, buf_size);
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	rxq->mb_pool = mb_pool;
+
+	if (xsk_configure(internals, rxq, nb_rx_desc)) {
+		RTE_LOG(ERR, AF_XDP,
+			"Failed to configure xdp socket\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	internals->umem = rxq->umem;
+
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	return 0;
+
+err:
+	queue_reset(internals, rx_queue_id);
+	return ret;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct pkt_tx_queue *txq;
+
+	txq = &internals->tx_queues[tx_queue_id];
+
+	dev->data->tx_queues[tx_queue_id] = txq;
+	return 0;
+}
+
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	struct ifreq ifr = { .ifr_mtu = mtu };
+	int ret;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return -EINVAL;
+
+	strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+	ret = ioctl(s, SIOCSIFMTU, &ifr);
+	close(s);
+
+	if (ret < 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+	struct ifreq ifr;
+	int s;
+
+	s = socket(PF_INET, SOCK_DGRAM, 0);
+	if (s < 0)
+		return;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+		goto out;
+	ifr.ifr_flags &= mask;
+	ifr.ifr_flags |= flags;
+	if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+		goto out;
+out:
+	close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_close = eth_dev_close,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.mtu_set = eth_dev_mtu_set,
+	.promiscuous_enable = eth_dev_promiscuous_enable,
+	.promiscuous_disable = eth_dev_promiscuous_disable,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+/** parse integer from integer argument */
+static int
+parse_integer_arg(const char *key __rte_unused,
+		  const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+	char *end;
+
+	*i = strtol(value, &end, 10);
+	if (end == value || *end != '\0' || *i < 0) {
+		RTE_LOG(ERR, AF_XDP, "Argument must be a non-negative integer.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/** parse name argument */
+static int
+parse_name_arg(const char *key __rte_unused,
+	       const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) {
+		RTE_LOG(ERR, AF_XDP, "Invalid name %s, should be less than "
+			"%u bytes.\n", value, IFNAMSIZ);
+		return -EINVAL;
+	}
+
+	strlcpy(name, value, IFNAMSIZ);
+
+	return 0;
+}
+
+static int
+parse_parameters(struct rte_kvargs *kvlist,
+		 char *if_name,
+		 int *queue_idx)
+{
+	int ret;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG,
+				 &parse_name_arg, if_name);
+	if (ret < 0)
+		goto free_kvlist;
+
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG,
+				 &parse_integer_arg, queue_idx);
+	if (ret < 0)
+		goto free_kvlist;
+
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+get_iface_info(const char *if_name,
+	       struct ether_addr *eth_addr,
+	       int *if_index)
+{
+	struct ifreq ifr;
+	int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
+
+	if (sock < 0)
+		return -1;
+
+	strlcpy(ifr.ifr_name, if_name, IFNAMSIZ);
+	if (ioctl(sock, SIOCGIFINDEX, &ifr))
+		goto error;
+
+	if (ioctl(sock, SIOCGIFHWADDR, &ifr))
+		goto error;
+
+	rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+
+	close(sock);
+	*if_index = if_nametoindex(if_name);
+	return 0;
+
+error:
+	close(sock);
+	return -1;
+}
+
+static struct rte_eth_dev *
+init_internals(struct rte_vdev_device *dev,
+	       const char *if_name,
+	       int queue_idx)
+{
+	const char *name = rte_vdev_device_name(dev);
+	const unsigned int numa_node = dev->device.numa_node;
+	struct pmd_internals *internals;
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	int i;
+
+	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
+	if (internals == NULL)
+		return NULL;
+
+	internals->queue_idx = queue_idx;
+	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+
+	for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) {
+		internals->tx_queues[i].pair = &internals->rx_queues[i];
+		internals->rx_queues[i].pair = &internals->tx_queues[i];
+	}
+
+	ret = get_iface_info(if_name, &internals->eth_addr,
+			     &internals->if_index);
+	if (ret)
+		goto err;
+
+	eth_dev = rte_eth_vdev_allocate(dev, 0);
+	if (eth_dev == NULL)
+		goto err;
+
+	eth_dev->data->dev_private = internals;
+	eth_dev->data->dev_link = pmd_link;
+	eth_dev->data->mac_addrs = &internals->eth_addr;
+	eth_dev->dev_ops = &ops;
+	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
+	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
+
+	return eth_dev;
+
+err:
+	rte_free(internals);
+	return NULL;
+}
+
+static int
+rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
+{
+	struct rte_kvargs *kvlist;
+	char if_name[IFNAMSIZ];
+	int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX;
+	struct rte_eth_dev *eth_dev = NULL;
+	const char *name;
+
+	RTE_LOG(INFO, AF_XDP, "Initializing pmd_af_xdp for %s\n",
+		rte_vdev_device_name(dev));
+
+	name = rte_vdev_device_name(dev);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+		strlen(rte_vdev_device_args(dev)) == 0) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			RTE_LOG(ERR, AF_XDP, "Failed to probe %s\n", name);
+			return -EINVAL;
+		}
+		eth_dev->dev_ops = &ops;
+		rte_eth_dev_probing_finish(eth_dev);
+		return 0;
+	}
+
+	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
+	if (kvlist == NULL) {
+		RTE_LOG(ERR, AF_XDP, "Invalid kvargs key\n");
+		return -EINVAL;
+	}
+
+	if (dev->device.numa_node == SOCKET_ID_ANY)
+		dev->device.numa_node = rte_socket_id();
+
+	if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) {
+		RTE_LOG(ERR, AF_XDP, "Invalid kvargs value\n");
+		return -EINVAL;
+	}
+
+	eth_dev = init_internals(dev, if_name, xsk_queue_idx);
+	if (eth_dev == NULL) {
+		RTE_LOG(ERR, AF_XDP, "Failed to init internals\n");
+		return -1;
+	}
+
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
+}
+
+static int
+rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+
+	RTE_LOG(INFO, AF_XDP, "Removing AF_XDP ethdev on numa socket %u\n",
+		rte_socket_id());
+
+	if (dev == NULL)
+		return -1;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (eth_dev == NULL)
+		return -1;
+
+	internals = eth_dev->data->dev_private;
+
+	rte_ring_free(internals->umem->buf_ring);
+	free(internals->umem->buffer);
+	rte_free(internals->umem);
+
+	rte_eth_dev_release_port(eth_dev);
+
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_af_xdp_drv = {
+	.probe = rte_pmd_af_xdp_probe,
+	.remove = rte_pmd_af_xdp_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(eth_af_xdp, pmd_af_xdp_drv);
+RTE_PMD_REGISTER_PARAM_STRING(eth_af_xdp,
+			      "iface=<string> "
+			      "queue=<int> ");
diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
new file mode 100644
index 000000000..c6db030fe
--- /dev/null
+++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map
@@ -0,0 +1,3 @@ 
+DPDK_19.05 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..1105e72d8 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,6 +2,7 @@ 
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet',
+	'af_xdp',
 	'ark',
 	'atlantic',
 	'avp',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..be0af73cc 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,6 +143,7 @@  _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL)  += -lrte_mempool_dpaa2
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP)     += -lrte_pmd_af_xdp -lelf -lbpf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD)   += -lrte_pmd_atlantic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD)        += -lrte_pmd_avp