[v3,0/6] PMD driver for AF_XDP

Message ID 20180816144321.17719-1-qi.z.zhang@intel.com (mailing list archive)
State Not Applicable, archived
Delegated to: Ferruh Yigit

Checks

Context               Check     Description
ci/checkpatch         warning   coding style issues
ci/Intel-compilation  fail      apply issues

Commit Message

Qi Zhang Aug. 16, 2018, 2:43 p.m. UTC
Overview
========

The patch set adds a new PMD driver for AF_XDP, which is a proposed
faster version of the AF_PACKET interface in Linux. See the links below
for a detailed AF_XDP introduction:
https://lwn.net/Articles/750845/
https://fosdem.org/2018/schedule/event/af_xdp/

AF_XDP roadmap
==============
- Kernel 4.18 is out and AF_XDP is included.
  https://kernelnewbies.org/Linux_4.18
- So far no zero-copy capable driver has been merged, but some are
  on the way.

Change logs
===========

v3:
- Rework based on AF_XDP's interface changes.
- Support multiple queues; each DPDK queue has its own xdp socket.
  An xdp socket is always bound to a netdev queue.
  We assume all xdp sockets from the same ethdev are bound to the
  same netdev queue, though a netdev queue can still be bound by
  xdp sockets from different ethdev instances.
  Below is an example of the mapping.
  ------------------------------------------------------
  | dpdk q0 | dpdk q1 | dpdk q0    | dpdk q0 | dpdk q1 |
  ------------------------------------------------------
  | xsk A   | xsk B   | xsk C      | xsk D   | xsk E   |<---|
  ------------------------------------------------------    |
  |  ETHDEV 0         | ETHDEV 1   |  ETHDEV 2         |    |  DPDK
  ------------------------------------------------------------------
  |  netdev queue 0                |   netdev queue 1  |    |  KERNEL
  ------------------------------------------------------    |
  |                  NETDEV eth0                       |    |
  ------------------------------------------------------    |
  |                    key   xsk                       |    |
  |  ----------       --------------                   |    |
  |  |         |      | 0  | xsk A |                   |    |
  |  |         |      --------------                   |    |
  |  |         |      | 2  | xsk B |                   |    |
  |  | ebpf    |      ---------------------------------------
  |  |         |      | 3  | xsk C |                   |
  |  |     redirect ->|--------------                  |
  |  |         |      | 4  | xsk D |                   |
  |  |         |      --------------                   |
  |  |---------|      | 5  | xsk E |                   |
  |                   --------------                   |
  |-----------------------------------------------------

- It is an open question how to load the eBPF program into the kernel
  and link it to a specific netdev in DPDK: should this be part of the
  PMD, or should it be handled by an independent tool? This patchset
  takes the second option: there is a "bind" stage before the AF_XDP
  PMD is started, which includes the steps below.
  a) Load the eBPF program into the kernel (the eBPF program must contain
     the logic to redirect packets to an xdp socket based on a redirect map).
  b) Link the eBPF program to a specific network interface.
  c) Expose the xdp socket redirect map id and number of entries to the
     user, so they can be passed to the PMD; the PMD will create an xdp
     socket for each queue and update the redirect map accordingly.
     (example: --vdev eth_af_xdp,iface=eth0,xsk_map_id=53,xsk_map_key_base=0,xsk_map_key_count=4)

v2:
- Fix license headers.
- Clean up the bpf dependency; the bpf program is embedded, no
  "xdpsock_kern.o" required.
- Clean up the makefile; only linux_header is required.
- Fix all compile warnings.
- Fix the packet count returned by Tx.

How to try
==========

1. Take kernel v4.18.
   Make sure you turn on XDP sockets when compiling:
   Networking support -->
        Networking options -->
                [ * ] XDP sockets
2. In the kernel source code, apply the patch below and compile the bpf
   sample code:
   #make samples/bpf/
   so the sample xdpsock can be used as a bind/unbind tool for the AF_XDP
   PMD. Sorry for this ugly workaround; in future there could be a
   dedicated tool in DPDK, if we agree with the idea that bpf
   configuration in the kernel should be separated from the PMD.

~~~~~~~~~~~~~~~~~~~~~~~PATCH START~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index d69c8d78d3fd..44a6318043e7 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -76,6 +76,8 @@ static int opt_poll;
 static int opt_shared_packet_buffer;
 static int opt_interval = 1;
 static u32 opt_xdp_bind_flags;
+static int opt_bind;
+static int opt_unbind;
 
 struct xdp_umem_uqueue {
 	u32 cached_prod;
@@ -662,6 +664,8 @@ static void usage(const char *prog)
 		"  -S, --xdp-skb=n	Use XDP skb-mod\n"
 		"  -N, --xdp-native=n	Enfore XDP native mode\n"
 		"  -n, --interval=n	Specify statistics update interval (default 1 sec).\n"
+		"  -b, --bind		Bind only.\n"
+		"  -u, --unbind		Unbind only.\n"
 		"\n";
 	fprintf(stderr, str, prog);
 	exit(EXIT_FAILURE);
@@ -674,7 +678,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "rtli:q:psSNn:", long_options,
+		c = getopt_long(argc, argv, "rtli:q:psSNn:bu", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -711,6 +715,12 @@ static void parse_command_line(int argc, char **argv)
 		case 'n':
 			opt_interval = atoi(optarg);
 			break;
+		case 'b':
+			opt_bind = 1;
+			break;
+		case 'u':
+			opt_unbind = 1;
+			break;
 		default:
 			usage(basename(argv[0]));
 		}
@@ -898,6 +908,12 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
+	if (opt_unbind) {
+		bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);

~~~~~~~~~~~~~~~~~~~~~~~PATCH END~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3. Bind:
  #./samples/bpf/xdpsock -i eth0 -b

  In this step, an ebpf binary xdpsock_kern.o is loaded into the kernel
  and linked to eth0; the ebpf source code is /samples/bpf/xdpsock_kern.c,
  which you can modify and re-compile for a different test.
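
  For reference, the core redirect logic of that sample is essentially the
  sketch below: an xskmap plus an XDP program that redirects each packet to
  the socket keyed by its receive queue index. This is a simplified
  illustration (the real xdpsock_kern.c also checks a queue-id config map),
  not the exact file contents.

	#include <uapi/linux/bpf.h>
	#include "bpf_helpers.h"	/* SEC() and bpf_redirect_map(), from samples/bpf */

	struct bpf_map_def SEC("maps") xsks_map = {
		.type		= BPF_MAP_TYPE_XSKMAP,
		.key_size	= sizeof(int),
		.value_size	= sizeof(int),
		.max_entries	= 4,
	};

	SEC("xdp_sock")
	int xdp_sock_prog(struct xdp_md *ctx)
	{
		/* the map key selects which xsk receives this packet */
		return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, 0);
	}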

4. Dump the xdp socket map information:
  #./tools/bpf/bpftool/bpftool map -p
  you will see something like below:

  },{
       "id": 56,
       "type": "xskmap",
       "name": "xsks_map",
       "flags": 0,
       "bytes_key": 4,
       "bytes_value": 4,
       "max_entries": 4,
       "bytes_memlock": 4096
   }

   in this case, 56 is the map id and the map has 4 entries.

5. Start testpmd:

   ./build/app/testpmd -c 0xc -n 4 --vdev eth_af_xdp,iface=enp59s0f0,xsk_map_id=56,xsk_map_key_start=2,xsk_map_key_count=2 -- -i --rxq=2 --txq=2

    in this case, we reserved 2 entries (keys 2 and 3) in the map; they will be mapped to queue 0 and queue 1, as sketched below.
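
    For reference, a minimal sketch of the map update these parameters imply
    is shown below, assuming libbpf's bpf_map_get_fd_by_id() and
    bpf_map_update_elem() helpers (tools/lib/bpf in the kernel tree);
    install_xsk() is a hypothetical helper name, not the actual PMD code.

	#include <bpf/bpf.h>

	static int install_xsk(__u32 map_id, int key, int xsk_fd)
	{
		int map_fd = bpf_map_get_fd_by_id(map_id);

		if (map_fd < 0)
			return map_fd;
		/* the value of an xskmap entry is an AF_XDP socket fd */
		return bpf_map_update_elem(map_fd, &key, &xsk_fd, 0);
	}

	/* with xsk_map_id=56, xsk_map_key_start=2, xsk_map_key_count=2:
	 *   install_xsk(56, 2, xsk_fd_q0);	queue 0 -> map key 2
	 *   install_xsk(56, 3, xsk_fd_q1);	queue 1 -> map key 3
	 */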

6. Unbind after the test:
   ./samples/bpf/xdpsock -i eth0 -u

Performance
===========
Since no zero-copy driver is ready yet, this has so far only been tested
in DRV and SKB mode on an i40e 25G NIC; the results are identical to the
kernel sample "xdpsock".
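
For comparison, the kernel-side numbers come from running the sample itself;
e.g. (the -r/-N/-S flags match the sample's option list, though which
benchmark mode was used here is an assumption):

	./samples/bpf/xdpsock -i eth0 -r -N	# rxdrop, native (DRV) mode
	./samples/bpf/xdpsock -i eth0 -r -S	# rxdrop, SKB mode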

Qi Zhang (6):
  net/af_xdp: new PMD driver
  lib/mbuf: enable parse flags when create mempool
  lib/mempool: allow page size aligned mempool
  net/af_xdp: use mbuf mempool for buffer management
  net/af_xdp: enable zero copy
  app/testpmd: add mempool flags parameter

 app/test-pmd/parameters.c                     |   12 +
 app/test-pmd/testpmd.c                        |   15 +-
 app/test-pmd/testpmd.h                        |    1 +
 config/common_base                            |    5 +
 config/common_linuxapp                        |    1 +
 drivers/net/Makefile                          |    1 +
 drivers/net/af_xdp/Makefile                   |   30 +
 drivers/net/af_xdp/meson.build                |    7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c           | 1345 +++++++++++++++++++++++++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |    4 +
 lib/librte_mbuf/rte_mbuf.c                    |   15 +-
 lib/librte_mbuf/rte_mbuf.h                    |    8 +-
 lib/librte_mempool/rte_mempool.c              |    3 +
 lib/librte_mempool/rte_mempool.h              |    1 +
 mk/rte.app.mk                                 |    1 +
 15 files changed, 1439 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
  

Comments

William Tu Aug. 23, 2018, 4:25 p.m. UTC | #1
Hi Zhang Qi,

I'm not familiar with the DPDK code, but I'm curious about the
benefits of using the AF_XDP PMD; specifically I have a couple of questions:

1) With zero-copy driver support, is the AF_XDP PMD expected to have
similar performance to other PMDs? Since AF_XDP still uses the
native device driver, isn't the interrupt still there, so it is not
"poll-mode" anymore?

2) Does the patch expect users to customize the ebpf/xdp code
so that this becomes another way to extend the dpdk datapath?

Thank you
William

On Thu, Aug 16, 2018 at 7:42 AM Qi Zhang <qi.z.zhang@intel.com> wrote:
> [snip]
  
Qi Zhang Aug. 25, 2018, 6:11 a.m. UTC | #2
Sorry, the patch for the kernel sample code was incomplete. It should be as below:

~~~~~~~~~~~~~~~~~~~~~~~PATCH START ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index d69c8d78d3fd..44a6318043e7 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -76,6 +76,8 @@ static int opt_poll;
 static int opt_shared_packet_buffer;
 static int opt_interval = 1;
 static u32 opt_xdp_bind_flags;
+static int opt_bind;
+static int opt_unbind;
 
 struct xdp_umem_uqueue {
 	u32 cached_prod;
@@ -662,6 +664,8 @@ static void usage(const char *prog)
 		"  -S, --xdp-skb=n	Use XDP skb-mod\n"
 		"  -N, --xdp-native=n	Enfore XDP native mode\n"
 		"  -n, --interval=n	Specify statistics update interval (default 1 sec).\n"
+		"  -b, --bind		Bind only.\n"
+		"  -u, --unbind		Unbind only.\n"
 		"\n";
 	fprintf(stderr, str, prog);
 	exit(EXIT_FAILURE);
@@ -674,7 +678,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "rtli:q:psSNn:", long_options,
+		c = getopt_long(argc, argv, "rtli:q:psSNn:bu", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -711,6 +715,12 @@ static void parse_command_line(int argc, char **argv)
 		case 'n':
 			opt_interval = atoi(optarg);
 			break;
+		case 'b':
+			opt_bind = 1;
+			break;
+		case 'u':
+			opt_unbind = 1;
+			break;
 		default:
 			usage(basename(argv[0]));
 		}
@@ -898,6 +908,12 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
+	if (opt_unbind) {
+		bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
+		printf("unbind.\n");
+		return 0;
+	}
+
 	snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o", argv[0]);
 
 	if (load_bpf_file(xdp_filename)) {
@@ -922,6 +938,11 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
+	if (opt_bind) {
+		printf("bind.\n");
+		return 0;
+	}
+
 	/* Create sockets... */
 	xsks[num_socks++] = xsk_configure(NULL);

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~PATCH END~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
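
After rebuilding the sample (#make samples/bpf/), the two new flags give a
plain bind/unbind tool, used as in steps 3 and 6 of the cover letter:

	./samples/bpf/xdpsock -i eth0 -b	# load xdpsock_kern.o, attach to eth0, exit
	./samples/bpf/xdpsock -i eth0 -u	# detach the XDP program, exit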

BTW, there is a bug in the kernel bpftool on 4.18; it will cause a
segmentation fault when you try to dump a bpf map with
#./tools/bpf/bpftool/bpftool map -p

So, please also apply the patch below:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~PATCH START ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 097b1a5e046b..0c661de58976 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -67,6 +67,7 @@ static const char * const map_type_name[] = {
 	[BPF_MAP_TYPE_SOCKMAP]		= "sockmap",
 	[BPF_MAP_TYPE_CPUMAP]		= "cpumap",
 	[BPF_MAP_TYPE_SOCKHASH]		= "sockhash",
+	[BPF_MAP_TYPE_XSKMAP]		= "xskmap"
 };

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~PATCH END~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
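
After applying this, "bpftool map -p" should list the map with
"type": "xskmap", as in the step-4 dump above, instead of crashing.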

> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Thursday, August 16, 2018 10:43 PM
> To: dev@dpdk.org
> Cc: Karlsson, Magnus <magnus.karlsson@intel.com>; Topel, Bjorn
> <bjorn.topel@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>
> Subject: [PATCH v3 0/6] PMD driver for AF_XDP
> [snip]
  
Qi Zhang Aug. 28, 2018, 2:11 p.m. UTC | #3
Hi William:

> -----Original Message-----
> From: William Tu [mailto:u9012063@gmail.com]
> Sent: Friday, August 24, 2018 12:25 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Karlsson, Magnus <magnus.karlsson@intel.com>; Topel,
> Bjorn <bjorn.topel@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Li,
> Xiaoyun <xiaoyun.li@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 0/6] PMD driver for AF_XDP
> 
> Hi Zhang Qi,
> 
> I'm not familiar with the DPDK code, but I'm curious about the benefits of
> using the AF_XDP PMD; specifically I have a couple of questions:
> 
> 1) With zero-copy driver support, is the AF_XDP PMD expected to have
> similar performance to other PMDs?

Zero-copy will improve performance a lot, but there is still a gap compared
with native DPDK PMDs; basically it is a lower-performance but more flexible
solution.

BTW, patches to enable zero copy for i40e have just been published by Bjorn;
there is some performance data there for your reference.
http://lists.openwall.net/netdev/2018/08/28/62


> Since AF_XDP still uses the
> native device driver, isn't the interrupt still there, so it is not
> "poll-mode" anymore?

Yes, it is still napi->poll triggered by interrupts.

> 
> 2) Does the patch expect users to customize the ebpf/xdp code so that this
> becomes another way to extend the dpdk datapath?

Yes, this provides another option to use the kernel's eBPF ecosystem for
packet filtering, and I think it will be easy for us to develop a tool to
load/link/expose eBPF as part of DPDK.

Regarding the AF_XDP PMD, my view is that since DPDK is very popular, it is
becoming a standard way to develop network applications, so a DPDK PMD is
going to be a bridge for developers to take advantage of the AF_XDP
technology, compared with dealing with the XDP socket and libc directly.
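
For illustration, a user-customized redirect program might look like the
sketch below: it keeps everything except UDP in the kernel stack and
redirects only UDP packets to the xsk map. This is a hypothetical example
of such a customization, not code from the patchset or the kernel sample.

	#include <uapi/linux/bpf.h>
	#include <linux/if_ether.h>
	#include <linux/ip.h>
	#include <linux/in.h>
	#include "bpf_helpers.h"	/* SEC() and bpf_redirect_map(), from samples/bpf */

	struct bpf_map_def SEC("maps") xsks_map = {
		.type		= BPF_MAP_TYPE_XSKMAP,
		.key_size	= sizeof(int),
		.value_size	= sizeof(int),
		.max_entries	= 4,
	};

	SEC("xdp_sock")
	int xdp_sock_filter_prog(struct xdp_md *ctx)
	{
		void *data_end = (void *)(long)ctx->data_end;
		void *data = (void *)(long)ctx->data;
		struct ethhdr *eth = data;
		struct iphdr *iph = (struct iphdr *)(eth + 1);

		if ((void *)(iph + 1) > data_end)
			return XDP_PASS;
		if (eth->h_proto != __constant_htons(ETH_P_IP) ||
		    iph->protocol != IPPROTO_UDP)
			return XDP_PASS;	/* leave non-UDP to the kernel stack */
		return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, 0);
	}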

Regards
Qi

> 
> Thank you
> William
> 
> On Thu, Aug 16, 2018 at 7:42 AM Qi Zhang <qi.z.zhang@intel.com> wrote:
> > [snip]
  
