From patchwork Fri Mar 22 13:01:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiaolong Ye X-Patchwork-Id: 51523 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 64A091B5E2; Fri, 22 Mar 2019 14:05:51 +0100 (CET) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id B77151B57B for ; Fri, 22 Mar 2019 14:05:44 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Mar 2019 06:05:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,256,1549958400"; d="scan'208";a="124954357" Received: from yexl-server.sh.intel.com ([10.67.110.206]) by orsmga007.jf.intel.com with ESMTP; 22 Mar 2019 06:05:43 -0700 From: Xiaolong Ye To: dev@dpdk.org Cc: Qi Zhang , Karlsson Magnus , Topel Bjorn , Xiaolong Ye Date: Fri, 22 Mar 2019 21:01:25 +0800 Message-Id: <20190322130129.109964-2-xiaolong.ye@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190322130129.109964-1-xiaolong.ye@intel.com> References: <20190301080947.91086-1-xiaolong.ye@intel.com> <20190322130129.109964-1-xiaolong.ye@intel.com> Subject: [dpdk-dev] [PATCH v4 1/5] net/af_xdp: introduce AF XDP PMD driver X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev"

Add a new PMD for AF_XDP, a proposed faster alternative to the AF_PACKET interface in Linux. For more information about AF_XDP, see [1] and [2]. This is the vanilla version of the PMD, which simply uses a raw buffer registered as the umem.
[1] https://fosdem.org/2018/schedule/event/af_xdp/ [2] https://lwn.net/Articles/745934/ Signed-off-by: Xiaolong Ye --- MAINTAINERS | 6 + config/common_base | 5 + config/common_linux | 1 + doc/guides/nics/af_xdp.rst | 45 + doc/guides/nics/features/af_xdp.ini | 11 + doc/guides/nics/index.rst | 1 + doc/guides/rel_notes/release_19_05.rst | 7 + drivers/net/Makefile | 1 + drivers/net/af_xdp/Makefile | 32 + drivers/net/af_xdp/meson.build | 21 + drivers/net/af_xdp/rte_eth_af_xdp.c | 940 ++++++++++++++++++ drivers/net/af_xdp/rte_pmd_af_xdp_version.map | 3 + drivers/net/meson.build | 1 + mk/rte.app.mk | 1 + 14 files changed, 1075 insertions(+) create mode 100644 doc/guides/nics/af_xdp.rst create mode 100644 doc/guides/nics/features/af_xdp.ini create mode 100644 drivers/net/af_xdp/Makefile create mode 100644 drivers/net/af_xdp/meson.build create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map diff --git a/MAINTAINERS b/MAINTAINERS index 452b8eb82..1cc54b439 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -468,6 +468,12 @@ M: John W. 
Linville F: drivers/net/af_packet/ F: doc/guides/nics/features/afpacket.ini +Linux AF_XDP +M: Xiaolong Ye +M: Qi Zhang +F: drivers/net/af_xdp/ +F: doc/guides/nics/af_xdp.rst + Amazon ENA M: Marcin Wojtas M: Michal Krawczyk diff --git a/config/common_base b/config/common_base index 0b09a9348..4044de205 100644 --- a/config/common_base +++ b/config/common_base @@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n # CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n +# +# Compile software PMD backed by AF_XDP sockets (Linux only) +# +CONFIG_RTE_LIBRTE_PMD_AF_XDP=n + # # Compile link bonding PMD library # diff --git a/config/common_linux b/config/common_linux index 75334273d..0b1249da0 100644 --- a/config/common_linux +++ b/config/common_linux @@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n CONFIG_RTE_LIBRTE_PMD_VHOST=y CONFIG_RTE_LIBRTE_IFC_PMD=y CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y +CONFIG_RTE_LIBRTE_PMD_AF_XDP=y CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y CONFIG_RTE_LIBRTE_PMD_TAP=y CONFIG_RTE_LIBRTE_AVP_PMD=y diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst new file mode 100644 index 000000000..dd5654dd1 --- /dev/null +++ b/doc/guides/nics/af_xdp.rst @@ -0,0 +1,45 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2018 Intel Corporation. + +AF_XDP Poll Mode Driver +========================== + +AF_XDP is an address family that is optimized for high performance +packet processing. AF_XDP sockets make it possible for an XDP program to +redirect packets to a memory buffer in userspace. + +For full details on AF_XDP sockets, refer to the +`AF_XDP documentation in the Kernel +`_. + +This Linux-specific PMD creates an AF_XDP socket and binds it to a +specific netdev queue. It allows a DPDK application to send and receive raw +packets through the socket, bypassing the kernel network stack. +The current implementation supports only a single queue; multi-queue support +will be added later. 
+ +Options +------- + +The following options can be provided to set up an af_xdp port in DPDK. + +* ``iface`` - name of the Kernel interface to attach to (required); +* ``queue`` - netdev queue id (optional, default 0); + +Prerequisites +------------- + +This is a Linux-specific PMD, thus the following prerequisites apply: + +* A Linux Kernel (version > 4.18) with XDP sockets configuration enabled; +* libbpf (within kernel version > 5.1) with latest af_xdp support installed +* A Kernel bound interface to attach to. + +Set up an af_xdp interface +----------------------------- + +The following example will set up an af_xdp interface in DPDK: + +.. code-block:: console + + --vdev eth_af_xdp,iface=ens786f1,queue=0 diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini new file mode 100644 index 000000000..36953c2de --- /dev/null +++ b/doc/guides/nics/features/af_xdp.ini @@ -0,0 +1,11 @@ +; +; Supported features of the 'af_xdp' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +MTU update = Y +Promiscuous mode = Y +Stats per queue = Y +x86-64 = Y diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst index 5c80e3baa..a4b80a3d0 100644 --- a/doc/guides/nics/index.rst +++ b/doc/guides/nics/index.rst @@ -12,6 +12,7 @@ Network Interface Controller Drivers features build_and_test af_packet + af_xdp ark atlantic avp diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst index 61a2c7383..062facf89 100644 --- a/doc/guides/rel_notes/release_19_05.rst +++ b/doc/guides/rel_notes/release_19_05.rst @@ -65,6 +65,13 @@ New Features process. * Added support for Rx packet types list in a secondary process. 
+* **Added the AF_XDP PMD.** + + Added a Linux-specific PMD for AF_XDP. It creates an AF_XDP socket + and binds it to a specific netdev queue, allowing a DPDK application to send + and receive raw packets through the socket, bypassing the kernel + network stack to achieve high-performance packet processing. + * **Updated Mellanox drivers.** New features and improvements were done in mlx4 and mlx5 PMDs: diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 502869a87..5d401b8c5 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d) endif DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet +DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark DIRS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += atlantic DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile new file mode 100644 index 000000000..db7d9aa57 --- /dev/null +++ b/drivers/net/af_xdp/Makefile @@ -0,0 +1,32 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Intel Corporation + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = librte_pmd_af_xdp.a + +EXPORT_MAP := rte_pmd_af_xdp_version.map + +LIBABIVER := 1 + +CFLAGS += -O3 + +# require kernel version >= v5.1-rc1 +CFLAGS += -I$(RTE_KERNELDIR)/tools/include +CFLAGS += -I$(RTE_KERNELDIR)/tools/lib/bpf + +CFLAGS += $(WERROR_FLAGS) +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs +LDLIBS += -lrte_bus_vdev +LDLIBS += -lbpf + +# +# all source are stored in SRCS-y +# +SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build new file mode 100644 index 000000000..635e67483 --- /dev/null +++ b/drivers/net/af_xdp/meson.build @@ -0,0 +1,21 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 
2018 Intel Corporation + +if host_machine.system() != 'linux' + build = false +endif + +bpf_dep = dependency('libbpf', required: false) +if bpf_dep.found() + build = true +else + bpf_dep = cc.find_library('libbpf', required: false) + if bpf_dep.found() and cc.has_header('xsk.h', dependencies: bpf_dep) + build = true + pkgconfig_extra_libs += '-lbpf' + else + build = false + endif +endif +sources = files('rte_eth_af_xdp.c') +ext_deps += bpf_dep diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c new file mode 100644 index 000000000..9f0012347 --- /dev/null +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c @@ -0,0 +1,940 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2019 Intel Corporation. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef SOL_XDP +#define SOL_XDP 283 +#endif + +#ifndef AF_XDP +#define AF_XDP 44 +#endif + +#ifndef PF_XDP +#define PF_XDP AF_XDP +#endif + +static int af_xdp_logtype; + +#define AF_XDP_LOG(level, fmt, args...) 
\ + rte_log(RTE_LOG_ ## level, af_xdp_logtype, \ + "%s(): " fmt "\n", __func__, ##args) + +#define ETH_AF_XDP_IFACE_ARG "iface" +#define ETH_AF_XDP_QUEUE_IDX_ARG "queue" + +#define ETH_AF_XDP_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE +#define ETH_AF_XDP_NUM_BUFFERS 4096 +#define ETH_AF_XDP_DATA_HEADROOM 0 +#define ETH_AF_XDP_DFLT_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS +#define ETH_AF_XDP_DFLT_QUEUE_IDX 0 + +#define ETH_AF_XDP_RX_BATCH_SIZE 32 +#define ETH_AF_XDP_TX_BATCH_SIZE 32 + +#define ETH_AF_XDP_MAX_QUEUE_PAIRS 16 + +struct xsk_umem_info { + struct xsk_ring_prod fq; + struct xsk_ring_cons cq; + struct xsk_umem *umem; + struct rte_ring *buf_ring; + void *buffer; +}; + +struct rx_stats { + uint64_t rx_pkts; + uint64_t rx_bytes; + uint64_t rx_dropped; +}; + +struct pkt_rx_queue { + struct xsk_ring_cons rx; + struct xsk_umem_info *umem; + struct xsk_socket *xsk; + struct rte_mempool *mb_pool; + + struct rx_stats stats; + + struct pkt_tx_queue *pair; + uint16_t queue_idx; +}; + +struct tx_stats { + uint64_t tx_pkts; + uint64_t err_pkts; + uint64_t tx_bytes; +}; + +struct pkt_tx_queue { + struct xsk_ring_prod tx; + + struct tx_stats stats; + + struct pkt_rx_queue *pair; + uint16_t queue_idx; +}; + +struct pmd_internals { + int if_index; + char if_name[IFNAMSIZ]; + uint16_t queue_idx; + struct ether_addr eth_addr; + struct xsk_umem_info *umem; + struct rte_mempool *mb_pool_share; + + struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS]; + struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS]; +}; + +static const char * const valid_arguments[] = { + ETH_AF_XDP_IFACE_ARG, + ETH_AF_XDP_QUEUE_IDX_ARG, + NULL +}; + +static struct rte_eth_link pmd_link = { + .link_speed = ETH_SPEED_NUM_10G, + .link_duplex = ETH_LINK_FULL_DUPLEX, + .link_status = ETH_LINK_DOWN, + .link_autoneg = ETH_LINK_AUTONEG +}; + +static inline int +reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size) +{ + struct xsk_ring_prod *fq = &umem->fq; + uint32_t idx; + void *addr = NULL; 
+ int i, ret; + + ret = xsk_ring_prod__reserve(fq, reserve_size, &idx); + if (!ret) { + AF_XDP_LOG(ERR, "Failed to reserve enough fq descs.\n"); + return ret; + } + + for (i = 0; i < reserve_size; i++) { + __u64 *fq_addr; + if (rte_ring_dequeue(umem->buf_ring, &addr)) { + i--; + break; + } + fq_addr = xsk_ring_prod__fill_addr(fq, idx++); + *fq_addr = (uint64_t)addr; + } + + xsk_ring_prod__submit(fq, i); + + return 0; +} + +static uint16_t +eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) +{ + struct pkt_rx_queue *rxq = queue; + struct xsk_ring_cons *rx = &rxq->rx; + struct xsk_umem_info *umem = rxq->umem; + struct xsk_ring_prod *fq = &umem->fq; + uint32_t idx_rx; + uint32_t free_thresh = fq->size >> 1; + struct rte_mbuf *mbufs[ETH_AF_XDP_TX_BATCH_SIZE]; + unsigned long dropped = 0; + unsigned long rx_bytes = 0; + uint16_t count = 0; + int rcvd, i; + + nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE); + + rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx); + if (rcvd == 0) + return 0; + + if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh) + (void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE); + + if (rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0) + return 0; + + for (i = 0; i < rcvd; i++) { + const struct xdp_desc *desc; + uint64_t addr; + uint32_t len; + void *pkt; + + desc = xsk_ring_cons__rx_desc(rx, idx_rx++); + addr = desc->addr; + len = desc->len; + pkt = xsk_umem__get_data(rxq->umem->buffer, addr); + + rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len); + rte_pktmbuf_pkt_len(mbufs[i]) = len; + rte_pktmbuf_data_len(mbufs[i]) = len; + rx_bytes += len; + bufs[count++] = mbufs[i]; + + rte_ring_enqueue(umem->buf_ring, (void *)addr); + } + + xsk_ring_cons__release(rx, rcvd); + + /* statistics */ + rxq->stats.rx_pkts += (rcvd - dropped); + rxq->stats.rx_bytes += rx_bytes; + + return count; +} + +static void pull_umem_cq(struct xsk_umem_info *umem, int size) +{ + struct xsk_ring_cons *cq = &umem->cq; + size_t i, n; + 
uint32_t idx_cq; + + n = xsk_ring_cons__peek(cq, size, &idx_cq); + + for (i = 0; i < n; i++) { + uint64_t addr; + addr = *xsk_ring_cons__comp_addr(cq, idx_cq++); + rte_ring_enqueue(umem->buf_ring, (void *)addr); + } + + xsk_ring_cons__release(cq, n); +} + +static void kick_tx(struct pkt_tx_queue *txq) +{ + struct xsk_umem_info *umem = txq->pair->umem; + + while (send(xsk_socket__fd(txq->pair->xsk), NULL, + 0, MSG_DONTWAIT) < 0) { + /* something unexpected */ + if (errno != EBUSY && errno != EAGAIN && errno != EINTR) + break; + + /* pull from completion queue to leave more space */ + if (errno == EAGAIN) + pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE); + } + pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE); +} + +static uint16_t +eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) +{ + struct pkt_tx_queue *txq = queue; + struct xsk_umem_info *umem = txq->pair->umem; + struct rte_mbuf *mbuf; + void *addrs[ETH_AF_XDP_TX_BATCH_SIZE]; + unsigned long tx_bytes = 0; + int i, valid = 0; + uint32_t idx_tx; + + nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE); + + pull_umem_cq(umem, nb_pkts); + + nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs, + nb_pkts, NULL); + if (nb_pkts == 0) + return 0; + + if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) { + kick_tx(txq); + return 0; + } + + for (i = 0; i < nb_pkts; i++) { + struct xdp_desc *desc; + void *pkt; + uint32_t buf_len = ETH_AF_XDP_FRAME_SIZE + - ETH_AF_XDP_DATA_HEADROOM; + desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i); + mbuf = bufs[i]; + if (mbuf->pkt_len <= buf_len) { + desc->addr = (uint64_t)addrs[valid]; + desc->len = mbuf->pkt_len; + pkt = xsk_umem__get_data(umem->buffer, + desc->addr); + rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *), + desc->len); + valid++; + tx_bytes += mbuf->pkt_len; + } + rte_pktmbuf_free(mbuf); + } + + xsk_ring_prod__submit(&txq->tx, nb_pkts); + + kick_tx(txq); + + if (valid < nb_pkts) + rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid], + nb_pkts 
- valid, NULL); + + txq->stats.err_pkts += nb_pkts - valid; + txq->stats.tx_pkts += valid; + txq->stats.tx_bytes += tx_bytes; + + return nb_pkts; +} + +static int +eth_dev_start(struct rte_eth_dev *dev) +{ + dev->data->dev_link.link_status = ETH_LINK_UP; + + return 0; +} + +/* This function gets called when the current port gets stopped. */ +static void +eth_dev_stop(struct rte_eth_dev *dev) +{ + dev->data->dev_link.link_status = ETH_LINK_DOWN; +} + +static int +eth_dev_configure(struct rte_eth_dev *dev __rte_unused) +{ + /* rx/tx must be paired */ + if (dev->data->nb_rx_queues != dev->data->nb_tx_queues) + return -EINVAL; + + return 0; +} + +static void +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) +{ + struct pmd_internals *internals = dev->data->dev_private; + + dev_info->if_index = internals->if_index; + dev_info->max_mac_addrs = 1; + dev_info->max_rx_pktlen = ETH_FRAME_LEN; + dev_info->max_rx_queues = 1; + dev_info->max_tx_queues = 1; + + dev_info->default_rxportconf.nb_queues = 1; + dev_info->default_txportconf.nb_queues = 1; + dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS; + dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS; +} + +static int +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct xdp_statistics xdp_stats; + struct pkt_rx_queue *rxq; + socklen_t optlen; + int i, ret; + + for (i = 0; i < dev->data->nb_rx_queues; i++) { + optlen = sizeof(struct xdp_statistics); + rxq = &internals->rx_queues[i]; + stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts; + stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes; + + stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts; + stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes; + + stats->ipackets += stats->q_ipackets[i]; + stats->ibytes += stats->q_ibytes[i]; + stats->imissed += internals->rx_queues[i].stats.rx_dropped; + ret = 
getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP, + XDP_STATISTICS, &xdp_stats, &optlen); + if (ret != 0) { + AF_XDP_LOG(ERR, "getsockopt() failed for XDP_STATISTICS.\n"); + return -1; + } + stats->imissed += xdp_stats.rx_dropped; + + stats->opackets += stats->q_opackets[i]; + stats->oerrors += internals->tx_queues[i].stats.err_pkts; + stats->obytes += stats->q_obytes[i]; + } + + return 0; +} + +static void +eth_stats_reset(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + int i; + + for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) { + memset(&internals->rx_queues[i].stats, 0, + sizeof(struct rx_stats)); + memset(&internals->tx_queues[i].stats, 0, + sizeof(struct tx_stats)); + } +} + +static void remove_xdp_program(struct pmd_internals *internals) +{ + uint32_t curr_prog_id = 0; + + if (bpf_get_link_xdp_id(internals->if_index, &curr_prog_id, + XDP_FLAGS_UPDATE_IF_NOEXIST)) { + AF_XDP_LOG(ERR, "bpf_get_link_xdp_id failed\n"); + return; + } + bpf_set_link_xdp_fd(internals->if_index, -1, + XDP_FLAGS_UPDATE_IF_NOEXIST); +} + +static void +eth_dev_close(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct pkt_rx_queue *rxq; + int i; + + AF_XDP_LOG(INFO, "Closing AF_XDP ethdev on numa socket %u\n", + rte_socket_id()); + + for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) { + rxq = &internals->rx_queues[i]; + if (rxq->umem == NULL) + break; + xsk_socket__delete(rxq->xsk); + } + + (void)xsk_umem__delete(internals->umem->umem); + remove_xdp_program(internals); +} + +static void +eth_queue_release(void *q __rte_unused) +{ +} + +static int +eth_link_update(struct rte_eth_dev *dev __rte_unused, + int wait_to_complete __rte_unused) +{ + return 0; +} + +static void xdp_umem_destroy(struct xsk_umem_info *umem) +{ + free(umem->buffer); + umem->buffer = NULL; + + rte_ring_free(umem->buf_ring); + umem->buf_ring = NULL; + + rte_free(umem); + umem = NULL; +} + +static struct xsk_umem_info 
*xdp_umem_configure(void) +{ + struct xsk_umem_info *umem; + struct xsk_umem_config usr_config = { + .fill_size = ETH_AF_XDP_DFLT_NUM_DESCS, + .comp_size = ETH_AF_XDP_DFLT_NUM_DESCS, + .frame_size = ETH_AF_XDP_FRAME_SIZE, + .frame_headroom = ETH_AF_XDP_DATA_HEADROOM }; + void *bufs = NULL; + int ret; + uint64_t i; + + umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id()); + if (umem == NULL) { + AF_XDP_LOG(ERR, "Failed to allocate umem info"); + return NULL; + } + + umem->buf_ring = rte_ring_create("af_xdp_ring", + ETH_AF_XDP_NUM_BUFFERS, + SOCKET_ID_ANY, + 0x0); + if (umem->buf_ring == NULL) { + AF_XDP_LOG(ERR, "Failed to create rte_ring\n"); + goto err; + } + + for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++) + rte_ring_enqueue(umem->buf_ring, + (void *)(i * ETH_AF_XDP_FRAME_SIZE + + ETH_AF_XDP_DATA_HEADROOM)); + + if (posix_memalign(&bufs, getpagesize(), + ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) { + AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n"); + goto err; + } + ret = xsk_umem__create(&umem->umem, bufs, + ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE, + &umem->fq, &umem->cq, + &usr_config); + + if (ret) { + AF_XDP_LOG(ERR, "Failed to create umem"); + goto err; + } + umem->buffer = bufs; + + return umem; + +err: + xdp_umem_destroy(umem); + return NULL; +} + +static int +xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq, + int ring_size) +{ + struct xsk_socket_config cfg; + struct pkt_tx_queue *txq = rxq->pair; + int ret = 0; + int reserve_size; + + rxq->umem = xdp_umem_configure(); + if (rxq->umem == NULL) { + ret = -ENOMEM; + goto err; + } + + cfg.rx_size = ring_size; + cfg.tx_size = ring_size; + cfg.libbpf_flags = 0; + cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST; + cfg.bind_flags = 0; + ret = xsk_socket__create(&rxq->xsk, internals->if_name, + internals->queue_idx, rxq->umem->umem, &rxq->rx, + &txq->tx, &cfg); + if (ret) { + AF_XDP_LOG(ERR, "Failed to create xsk socket.\n"); + goto err; + } + + reserve_size = 
ETH_AF_XDP_DFLT_NUM_DESCS / 2; + ret = reserve_fill_queue(rxq->umem, reserve_size); + if (ret) { + AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n"); + goto err; + } + + return 0; + +err: + xdp_umem_destroy(rxq->umem); + + return ret; +} + +static void +queue_reset(struct pmd_internals *internals, uint16_t queue_idx) +{ + struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx]; + struct pkt_tx_queue *txq = rxq->pair; + int xsk_fd = xsk_socket__fd(rxq->xsk); + + if (xsk_fd) { + close(xsk_fd); + if (internals->umem != NULL) { + xdp_umem_destroy(internals->umem); + internals->umem = NULL; + } + } + memset(rxq, 0, sizeof(*rxq)); + memset(txq, 0, sizeof(*txq)); + rxq->pair = txq; + txq->pair = rxq; + rxq->queue_idx = queue_idx; + txq->queue_idx = queue_idx; +} + +static int +eth_rx_queue_setup(struct rte_eth_dev *dev, + uint16_t rx_queue_id, + uint16_t nb_rx_desc, + unsigned int socket_id __rte_unused, + const struct rte_eth_rxconf *rx_conf __rte_unused, + struct rte_mempool *mb_pool) +{ + struct pmd_internals *internals = dev->data->dev_private; + uint32_t buf_size, data_size; + struct pkt_rx_queue *rxq; + int ret; + + rxq = &internals->rx_queues[rx_queue_id]; + queue_reset(internals, rx_queue_id); + + /* Now get the space available for data in the mbuf */ + buf_size = rte_pktmbuf_data_room_size(mb_pool) - + RTE_PKTMBUF_HEADROOM; + data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM; + + if (data_size > buf_size) { + AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d bytes)\n", + dev->device->name, data_size, buf_size); + ret = -ENOMEM; + goto err; + } + + rxq->mb_pool = mb_pool; + + if (xsk_configure(internals, rxq, nb_rx_desc)) { + AF_XDP_LOG(ERR, "Failed to configure xdp socket\n"); + ret = -EINVAL; + goto err; + } + + internals->umem = rxq->umem; + + dev->data->rx_queues[rx_queue_id] = rxq; + return 0; + +err: + queue_reset(internals, rx_queue_id); + return ret; +} + +static int +eth_tx_queue_setup(struct rte_eth_dev *dev, + uint16_t tx_queue_id, 
+ uint16_t nb_tx_desc __rte_unused, + unsigned int socket_id __rte_unused, + const struct rte_eth_txconf *tx_conf __rte_unused) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct pkt_tx_queue *txq; + + txq = &internals->tx_queues[tx_queue_id]; + + dev->data->tx_queues[tx_queue_id] = txq; + return 0; +} + +static int +eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct ifreq ifr = { .ifr_mtu = mtu }; + int ret; + int s; + + s = socket(PF_INET, SOCK_DGRAM, 0); + if (s < 0) + return -EINVAL; + + strlcpy(ifr.ifr_name, internals->if_name, IFNAMSIZ); + ret = ioctl(s, SIOCSIFMTU, &ifr); + close(s); + + return (ret < 0) ? -errno : 0; +} + +static void +eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask) +{ + struct ifreq ifr; + int s; + + s = socket(PF_INET, SOCK_DGRAM, 0); + if (s < 0) + return; + + strlcpy(ifr.ifr_name, if_name, IFNAMSIZ); + if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0) + goto out; + ifr.ifr_flags &= mask; + ifr.ifr_flags |= flags; + if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0) + goto out; +out: + close(s); +} + +static void +eth_dev_promiscuous_enable(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + + eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0); +} + +static void +eth_dev_promiscuous_disable(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + + eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC); +} + +static const struct eth_dev_ops ops = { + .dev_start = eth_dev_start, + .dev_stop = eth_dev_stop, + .dev_close = eth_dev_close, + .dev_configure = eth_dev_configure, + .dev_infos_get = eth_dev_info, + .mtu_set = eth_dev_mtu_set, + .promiscuous_enable = eth_dev_promiscuous_enable, + .promiscuous_disable = eth_dev_promiscuous_disable, + .rx_queue_setup = eth_rx_queue_setup, + .tx_queue_setup = eth_tx_queue_setup, + .rx_queue_release = eth_queue_release, + 
.tx_queue_release = eth_queue_release, + .link_update = eth_link_update, + .stats_get = eth_stats_get, + .stats_reset = eth_stats_reset, +}; + +/** parse integer from integer argument */ +static int +parse_integer_arg(const char *key __rte_unused, + const char *value, void *extra_args) +{ + int *i = (int *)extra_args; + char *end; + + *i = strtol(value, &end, 10); + if (*i < 0) { + AF_XDP_LOG(ERR, "Argument has to be positive.\n"); + return -EINVAL; + } + + return 0; +} + +/** parse name argument */ +static int +parse_name_arg(const char *key __rte_unused, + const char *value, void *extra_args) +{ + char *name = extra_args; + + if (strnlen(value, IFNAMSIZ) > IFNAMSIZ - 1) { + AF_XDP_LOG(ERR, "Invalid name %s, should be less than %u bytes.\n", + value, IFNAMSIZ); + return -EINVAL; + } + + strlcpy(name, value, IFNAMSIZ); + + return 0; +} + +static int +parse_parameters(struct rte_kvargs *kvlist, + char *if_name, + int *queue_idx) +{ + int ret; + + ret = rte_kvargs_process(kvlist, ETH_AF_XDP_IFACE_ARG, + &parse_name_arg, if_name); + if (ret < 0) + goto free_kvlist; + + ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IDX_ARG, + &parse_integer_arg, queue_idx); + if (ret < 0) + goto free_kvlist; + +free_kvlist: + rte_kvargs_free(kvlist); + return ret; +} + +static int +get_iface_info(const char *if_name, + struct ether_addr *eth_addr, + int *if_index) +{ + struct ifreq ifr; + int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP); + + if (sock < 0) + return -1; + + strlcpy(ifr.ifr_name, if_name, IFNAMSIZ); + if (ioctl(sock, SIOCGIFINDEX, &ifr)) + goto error; + + *if_index = ifr.ifr_ifindex; + + if (ioctl(sock, SIOCGIFHWADDR, &ifr)) + goto error; + + rte_memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, ETHER_ADDR_LEN); + + close(sock); + return 0; + +error: + close(sock); + return -1; +} + +static struct rte_eth_dev * +init_internals(struct rte_vdev_device *dev, + const char *if_name, + int queue_idx) +{ + const char *name = rte_vdev_device_name(dev); + const unsigned int numa_node 
= dev->device.numa_node; + struct pmd_internals *internals; + struct rte_eth_dev *eth_dev; + int ret; + int i; + + internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node); + if (internals == NULL) + return NULL; + + internals->queue_idx = queue_idx; + strlcpy(internals->if_name, if_name, IFNAMSIZ); + + for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) { + internals->tx_queues[i].pair = &internals->rx_queues[i]; + internals->rx_queues[i].pair = &internals->tx_queues[i]; + } + + ret = get_iface_info(if_name, &internals->eth_addr, + &internals->if_index); + if (ret) + goto err; + + eth_dev = rte_eth_vdev_allocate(dev, 0); + if (eth_dev == NULL) + goto err; + + eth_dev->data->dev_private = internals; + eth_dev->data->dev_link = pmd_link; + eth_dev->data->mac_addrs = &internals->eth_addr; + eth_dev->dev_ops = &ops; + eth_dev->rx_pkt_burst = eth_af_xdp_rx; + eth_dev->tx_pkt_burst = eth_af_xdp_tx; + + return eth_dev; + +err: + rte_free(internals); + return NULL; +} + +static int +rte_pmd_af_xdp_probe(struct rte_vdev_device *dev) +{ + struct rte_kvargs *kvlist; + char if_name[IFNAMSIZ]; + int xsk_queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX; + struct rte_eth_dev *eth_dev = NULL; + const char *name; + + AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n", + rte_vdev_device_name(dev)); + + name = rte_vdev_device_name(dev); + if (rte_eal_process_type() == RTE_PROC_SECONDARY && + strlen(rte_vdev_device_args(dev)) == 0) { + eth_dev = rte_eth_dev_attach_secondary(name); + if (eth_dev == NULL) { + AF_XDP_LOG(ERR, "Failed to probe %s\n", name); + return -EINVAL; + } + eth_dev->dev_ops = &ops; + rte_eth_dev_probing_finish(eth_dev); + return 0; + } + + kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments); + if (kvlist == NULL) { + AF_XDP_LOG(ERR, "Invalid kvargs key\n"); + return -EINVAL; + } + + if (dev->device.numa_node == SOCKET_ID_ANY) + dev->device.numa_node = rte_socket_id(); + + if (parse_parameters(kvlist, if_name, &xsk_queue_idx) < 0) { + 
AF_XDP_LOG(ERR, "Invalid kvargs value\n"); + return -EINVAL; + } + + eth_dev = init_internals(dev, if_name, xsk_queue_idx); + if (eth_dev == NULL) { + AF_XDP_LOG(ERR, "Failed to init internals\n"); + return -1; + } + + rte_eth_dev_probing_finish(eth_dev); + + return 0; +} + +static int +rte_pmd_af_xdp_remove(struct rte_vdev_device *dev) +{ + struct rte_eth_dev *eth_dev = NULL; + struct pmd_internals *internals; + + AF_XDP_LOG(INFO, "Removing AF_XDP ethdev on numa socket %u\n", + rte_socket_id()); + + if (dev == NULL) + return -1; + + /* find the ethdev entry */ + eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev)); + if (eth_dev == NULL) + return -1; + + internals = eth_dev->data->dev_private; + + rte_ring_free(internals->umem->buf_ring); + rte_free(internals->umem->buffer); + rte_free(internals->umem); + + rte_eth_dev_release_port(eth_dev); + + + return 0; +} + +static struct rte_vdev_driver pmd_af_xdp_drv = { + .probe = rte_pmd_af_xdp_probe, + .remove = rte_pmd_af_xdp_remove, +}; + +RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv); +RTE_PMD_REGISTER_PARAM_STRING(eth_af_xdp, + "iface= " + "queue= "); + +RTE_INIT(af_xdp_init_log) +{ + af_xdp_logtype = rte_log_register("pmd.net.xdp"); + if (af_xdp_logtype >= 0) + rte_log_set_level(af_xdp_logtype, RTE_LOG_NOTICE); +} diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map new file mode 100644 index 000000000..c6db030fe --- /dev/null +++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map @@ -0,0 +1,3 @@ +DPDK_19.05 { + local: *; +}; diff --git a/drivers/net/meson.build b/drivers/net/meson.build index 3ecc78cee..1105e72d8 100644 --- a/drivers/net/meson.build +++ b/drivers/net/meson.build @@ -2,6 +2,7 @@ # Copyright(c) 2017 Intel Corporation drivers = ['af_packet', + 'af_xdp', 'ark', 'atlantic', 'avp', diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 262132fc6..be0af73cc 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -143,6 +143,7 @@ 
_LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL) += -lrte_mempool_dpaa2 endif _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += -lrte_pmd_af_xdp -lelf -lbpf _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += -lrte_pmd_ark _LDLIBS-$(CONFIG_RTE_LIBRTE_ATLANTIC_PMD) += -lrte_pmd_atlantic _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += -lrte_pmd_avp
From patchwork Fri Mar 22 13:01:26 2019 X-Patchwork-Submitter: Xiaolong Ye X-Patchwork-Id: 51524 X-Patchwork-Delegate: ferruh.yigit@amd.com From: Xiaolong Ye To: dev@dpdk.org Cc: Qi Zhang , Karlsson Magnus , Topel Bjorn , Xiaolong Ye Date: Fri, 22 Mar 2019 21:01:26 +0800 Message-Id: <20190322130129.109964-3-xiaolong.ye@intel.com> In-Reply-To: <20190322130129.109964-1-xiaolong.ye@intel.com> References: <20190301080947.91086-1-xiaolong.ye@intel.com> <20190322130129.109964-1-xiaolong.ye@intel.com> Subject: [dpdk-dev] [PATCH v4 2/5] lib/mbuf: introduce helper to create mempool with flags

This allows applications to create an mbuf mempool with specific flags, such as MEMPOOL_F_NO_SPREAD, when they want fixed-size memory objects. Signed-off-by: Qi Zhang Signed-off-by: Xiaolong Ye --- lib/librte_mbuf/rte_mbuf.c | 29 +++++++++++++++++++----- lib/librte_mbuf/rte_mbuf.h | 45 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+), 5 deletions(-) diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index 21f6f7404..c1db9e298 100644 --- a/lib/librte_mbuf/rte_mbuf.c +++ b/lib/librte_mbuf/rte_mbuf.c @@ -106,11 +106,10 @@ rte_pktmbuf_init(struct rte_mempool *mp, m->next = NULL; } -/* Helper to create a mbuf pool with given mempool ops name*/ -struct rte_mempool * -rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n, +static struct rte_mempool * +rte_pktmbuf_pool_create_by_ops_with_flags(const char *name, unsigned int n, unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size, - int socket_id, const char *ops_name) + unsigned int flags, int socket_id, const char *ops_name) { struct rte_mempool *mp; struct rte_pktmbuf_pool_private mbp_priv; @@ -130,7 +129,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n, mbp_priv.mbuf_priv_size = priv_size; mp = rte_mempool_create_empty(name, n, elt_size, cache_size, - sizeof(struct rte_pktmbuf_pool_private), socket_id, 0); + sizeof(struct rte_pktmbuf_pool_private), socket_id, flags); if (mp == NULL) return NULL; @@ -157,6 +156,16 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n, return mp; } +/* Helper to create a mbuf pool with given mempool ops name*/ +struct rte_mempool * +rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n, + unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size, + int socket_id, const char *ops_name) +{ + return 
rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size, + priv_size, data_room_size, 0, socket_id, ops_name); +} + /* helper to create a mbuf pool */ struct rte_mempool * rte_pktmbuf_pool_create(const char *name, unsigned int n, @@ -167,6 +176,16 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n, data_room_size, socket_id, NULL); } +/* helper to create a mbuf pool with flags (e.g. NO_SPREAD) */ +struct rte_mempool * __rte_experimental +rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n, + unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size, + unsigned int flags, int socket_id) +{ + return rte_pktmbuf_pool_create_by_ops_with_flags(name, n, cache_size, + priv_size, data_room_size, flags, socket_id, NULL); +} + /* do some sanity checks on a mbuf: panic if it fails */ void rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index d961ccaf6..105ead6de 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -1266,6 +1266,51 @@ rte_pktmbuf_pool_create(const char *name, unsigned n, unsigned cache_size, uint16_t priv_size, uint16_t data_room_size, int socket_id); +/** + * Create a mbuf pool with flags. + * + * This function creates and initializes a packet mbuf pool. It is + * a wrapper to rte_mempool functions. + * + * @warning + * @b EXPERIMENTAL: This API may change without prior notice. + * + * @param name + * The name of the mbuf pool. + * @param n + * The number of elements in the mbuf pool. The optimum size (in terms + * of memory usage) for a mempool is when n is a power of two minus one: + * n = (2^q - 1). + * @param cache_size + * Size of the per-core object cache. See rte_mempool_create() for + * details. + * @param priv_size + * Size of application private area between the rte_mbuf structure + * and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN. 
+ * @param data_room_size + * Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM. + * @param flags + * Flags controlling the behavior of the mempool. See + * rte_mempool_create() for details. + * @param socket_id + * The socket identifier where the memory should be allocated. The + * value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the + * reserved zone. + * @return + * The pointer to the new allocated mempool, on success. NULL on error + * with rte_errno set appropriately. Possible rte_errno values include: + * - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure + * - E_RTE_SECONDARY - function was called from a secondary process instance + * - EINVAL - cache size provided is too large, or priv_size is not aligned. + * - ENOSPC - the maximum number of memzones has already been allocated + * - EEXIST - a memzone with the same name already exists + * - ENOMEM - no appropriate memory area found in which to create memzone + */ +struct rte_mempool * __rte_experimental +rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n, + unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size, + unsigned int flags, int socket_id); + /** * Create a mbuf pool with a given mempool ops name * From patchwork Fri Mar 22 13:01:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiaolong Ye X-Patchwork-Id: 51525 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 61C971B5F3; Fri, 22 Mar 2019 14:05:57 +0100 (CET) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id B9CC41B5D1 for ; Fri, 22 Mar 2019 14:05:48 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com 
([10.7.209.58]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Mar 2019 06:05:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,256,1549958400"; d="scan'208";a="124954387" Received: from yexl-server.sh.intel.com ([10.67.110.206]) by orsmga007.jf.intel.com with ESMTP; 22 Mar 2019 06:05:47 -0700 From: Xiaolong Ye To: dev@dpdk.org Cc: Qi Zhang , Karlsson Magnus , Topel Bjorn , Xiaolong Ye Date: Fri, 22 Mar 2019 21:01:27 +0800 Message-Id: <20190322130129.109964-4-xiaolong.ye@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190322130129.109964-1-xiaolong.ye@intel.com> References: <20190301080947.91086-1-xiaolong.ye@intel.com> <20190322130129.109964-1-xiaolong.ye@intel.com> Subject: [dpdk-dev] [PATCH v4 3/5] lib/mempool: allow page size aligned mempool X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Allow creating a mempool whose chunk base address is aligned to the page size. 
Signed-off-by: Qi Zhang Signed-off-by: Xiaolong Ye --- lib/librte_mempool/rte_mempool.c | 3 +++ lib/librte_mempool/rte_mempool.h | 1 + 2 files changed, 4 insertions(+) diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index 683b216f9..171ba1057 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -543,6 +543,9 @@ rte_mempool_populate_default(struct rte_mempool *mp) if (try_contig) flags |= RTE_MEMZONE_IOVA_CONTIG; + if (mp->flags & MEMPOOL_F_PAGE_ALIGN) + align = RTE_MAX(align, (size_t)getpagesize()); + mz = rte_memzone_reserve_aligned(mz_name, mem_size, mp->socket_id, flags, align); diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index 7c9cd9a2f..75553b36f 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -264,6 +264,7 @@ struct rte_mempool { #define MEMPOOL_F_POOL_CREATED 0x0010 /**< Internal: pool is created. */ #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */ #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */ +#define MEMPOOL_F_PAGE_ALIGN 0x0040 /**< Chunk's base address is page aligned */ /** * @internal When debug is enabled, store some statistics. 
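The alignment rule added to rte_mempool_populate_default() in the hunk above can be sketched in isolation. This is a minimal illustration only, not DPDK code: chunk_align() and is_page_aligned() are hypothetical helper names, the flag value is copied from the patch, and sysconf(_SC_PAGESIZE) stands in for getpagesize().

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define MEMPOOL_F_PAGE_ALIGN 0x0040 /* value taken from the patch above */

/*
 * Hypothetical helper: choose the chunk alignment the way the hunk does,
 * i.e. raise it to the page size only when the flag is set.
 */
static size_t
chunk_align(unsigned int flags, size_t align)
{
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);

	if ((flags & MEMPOOL_F_PAGE_ALIGN) && align < pg)
		align = pg;
	return align;
}

/* Hypothetical helper: true when addr is a multiple of the page size. */
static int
is_page_aligned(uintptr_t addr)
{
	return addr % (uintptr_t)sysconf(_SC_PAGESIZE) == 0;
}
```

A caller would pass the pool flags and the alignment it already computed; an alignment larger than the page size is left unchanged, which matches the RTE_MAX() in the hunk.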
From patchwork Fri Mar 22 13:01:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiaolong Ye X-Patchwork-Id: 51526 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 870891B5E0; Fri, 22 Mar 2019 14:06:08 +0100 (CET) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 0E12A1B5D1 for ; Fri, 22 Mar 2019 14:05:50 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Mar 2019 06:05:50 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,256,1549958400"; d="scan'208";a="124954403" Received: from yexl-server.sh.intel.com ([10.67.110.206]) by orsmga007.jf.intel.com with ESMTP; 22 Mar 2019 06:05:49 -0700 From: Xiaolong Ye To: dev@dpdk.org Cc: Qi Zhang , Karlsson Magnus , Topel Bjorn , Xiaolong Ye Date: Fri, 22 Mar 2019 21:01:28 +0800 Message-Id: <20190322130129.109964-5-xiaolong.ye@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190322130129.109964-1-xiaolong.ye@intel.com> References: <20190301080947.91086-1-xiaolong.ye@intel.com> <20190322130129.109964-1-xiaolong.ye@intel.com> Subject: [dpdk-dev] [PATCH v4 4/5] net/af_xdp: use mbuf mempool for buffer management X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The memory buffer registered with af_xdp is now managed by rte_mempool. An mbuf allocated from the rte_mempool can be converted to an xdp_desc address and vice versa. 
Signed-off-by: Xiaolong Ye --- drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++----------- 1 file changed, 72 insertions(+), 45 deletions(-) diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c index 9f0012347..6b1bc462a 100644 --- a/drivers/net/af_xdp/rte_eth_af_xdp.c +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c @@ -48,7 +48,11 @@ static int af_xdp_logtype; #define ETH_AF_XDP_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE #define ETH_AF_XDP_NUM_BUFFERS 4096 -#define ETH_AF_XDP_DATA_HEADROOM 0 +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */ +#define ETH_AF_XDP_MBUF_OVERHEAD 192 +/* data start from offset 320 (192 + 128) bytes */ +#define ETH_AF_XDP_DATA_HEADROOM \ + (ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM) #define ETH_AF_XDP_DFLT_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS #define ETH_AF_XDP_DFLT_QUEUE_IDX 0 @@ -61,7 +65,7 @@ struct xsk_umem_info { struct xsk_ring_prod fq; struct xsk_ring_cons cq; struct xsk_umem *umem; - struct rte_ring *buf_ring; + struct rte_mempool *mb_pool; void *buffer; }; @@ -123,12 +127,32 @@ static struct rte_eth_link pmd_link = { .link_autoneg = ETH_LINK_AUTONEG }; +static inline struct rte_mbuf * +addr_to_mbuf(struct xsk_umem_info *umem, uint64_t addr) +{ + uint64_t offset = (addr / ETH_AF_XDP_FRAME_SIZE * + ETH_AF_XDP_FRAME_SIZE); + struct rte_mbuf *mbuf = (struct rte_mbuf *)((uint64_t)umem->buffer + + offset + ETH_AF_XDP_MBUF_OVERHEAD - + sizeof(struct rte_mbuf)); + mbuf->data_off = addr - offset - ETH_AF_XDP_MBUF_OVERHEAD; + return mbuf; +} + +static inline uint64_t +mbuf_to_addr(struct xsk_umem_info *umem, struct rte_mbuf *mbuf) +{ + return (uint64_t)mbuf->buf_addr + mbuf->data_off - + (uint64_t)umem->buffer; +} + static inline int reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size) { struct xsk_ring_prod *fq = &umem->fq; + struct rte_mbuf *mbuf; uint32_t idx; - void *addr = NULL; + uint64_t addr; int i, ret; ret = xsk_ring_prod__reserve(fq, 
reserve_size, &idx); @@ -139,12 +163,14 @@ reserve_fill_queue(struct xsk_umem_info *umem, int reserve_size) for (i = 0; i < reserve_size; i++) { __u64 *fq_addr; - if (rte_ring_dequeue(umem->buf_ring, &addr)) { + mbuf = rte_pktmbuf_alloc(umem->mb_pool); + if (mbuf == NULL) { i--; break; } + addr = mbuf_to_addr(umem, mbuf); fq_addr = xsk_ring_prod__fill_addr(fq, idx++); - *fq_addr = (uint64_t)addr; + *fq_addr = addr; } xsk_ring_prod__submit(fq, i); @@ -196,7 +222,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) rx_bytes += len; bufs[count++] = mbufs[i]; - rte_ring_enqueue(umem->buf_ring, (void *)addr); + rte_pktmbuf_free(addr_to_mbuf(umem, addr)); } xsk_ring_cons__release(rx, rcvd); @@ -219,7 +245,7 @@ static void pull_umem_cq(struct xsk_umem_info *umem, int size) for (i = 0; i < n; i++) { uint64_t addr; addr = *xsk_ring_cons__comp_addr(cq, idx_cq++); - rte_ring_enqueue(umem->buf_ring, (void *)addr); + rte_pktmbuf_free(addr_to_mbuf(umem, addr)); } xsk_ring_cons__release(cq, n); @@ -248,7 +274,7 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) struct pkt_tx_queue *txq = queue; struct xsk_umem_info *umem = txq->pair->umem; struct rte_mbuf *mbuf; - void *addrs[ETH_AF_XDP_TX_BATCH_SIZE]; + struct rte_mbuf *mbuf_to_tx; unsigned long tx_bytes = 0; int i, valid = 0; uint32_t idx_tx; @@ -257,11 +283,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) pull_umem_cq(umem, nb_pkts); - nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs, - nb_pkts, NULL); - if (nb_pkts == 0) - return 0; - if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) { kick_tx(txq); return 0; @@ -275,7 +296,12 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i); mbuf = bufs[i]; if (mbuf->pkt_len <= buf_len) { - desc->addr = (uint64_t)addrs[valid]; + mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool); + if (mbuf_to_tx == NULL) { + rte_pktmbuf_free(mbuf); + 
continue; + } + desc->addr = mbuf_to_addr(umem, mbuf_to_tx); desc->len = mbuf->pkt_len; pkt = xsk_umem__get_data(umem->buffer, desc->addr); @@ -291,10 +317,6 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) kick_tx(txq); - if (valid < nb_pkts) - rte_ring_enqueue_bulk(umem->buf_ring, &addrs[valid], - nb_pkts - valid, NULL); - txq->stats.err_pkts += nb_pkts - valid; txq->stats.tx_pkts += valid; txq->stats.tx_bytes += tx_bytes; @@ -443,16 +465,29 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused, static void xdp_umem_destroy(struct xsk_umem_info *umem) { - free(umem->buffer); - umem->buffer = NULL; - - rte_ring_free(umem->buf_ring); - umem->buf_ring = NULL; + rte_mempool_free(umem->mb_pool); + umem->mb_pool = NULL; rte_free(umem); umem = NULL; } +static inline uint64_t get_base_addr(struct rte_mempool *mp) +{ + struct rte_mempool_memhdr *memhdr; + + memhdr = STAILQ_FIRST(&mp->mem_list); + return (uint64_t)(memhdr->addr); +} + +static inline uint64_t get_len(struct rte_mempool *mp) +{ + struct rte_mempool_memhdr *memhdr; + + memhdr = STAILQ_FIRST(&mp->mem_list); + return (uint64_t)(memhdr->len); +} + static struct xsk_umem_info *xdp_umem_configure(void) { struct xsk_umem_info *umem; @@ -461,9 +496,8 @@ static struct xsk_umem_info *xdp_umem_configure(void) .comp_size = ETH_AF_XDP_DFLT_NUM_DESCS, .frame_size = ETH_AF_XDP_FRAME_SIZE, .frame_headroom = ETH_AF_XDP_DATA_HEADROOM }; - void *bufs = NULL; + void *base_addr = NULL; int ret; - uint64_t i; umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id()); if (umem == NULL) { @@ -471,26 +505,20 @@ static struct xsk_umem_info *xdp_umem_configure(void) return NULL; } - umem->buf_ring = rte_ring_create("af_xdp_ring", - ETH_AF_XDP_NUM_BUFFERS, - SOCKET_ID_ANY, - 0x0); - if (umem->buf_ring == NULL) { - AF_XDP_LOG(ERR, "Failed to create rte_ring\n"); + umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool", + ETH_AF_XDP_NUM_BUFFERS, + 250, 0, + ETH_AF_XDP_FRAME_SIZE - + 
ETH_AF_XDP_MBUF_OVERHEAD, + MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN, + SOCKET_ID_ANY); + if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) { + AF_XDP_LOG(ERR, "Failed to create mempool\n"); goto err; } + base_addr = (void *)get_base_addr(umem->mb_pool); - for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++) - rte_ring_enqueue(umem->buf_ring, - (void *)(i * ETH_AF_XDP_FRAME_SIZE + - ETH_AF_XDP_DATA_HEADROOM)); - - if (posix_memalign(&bufs, getpagesize(), - ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) { - AF_XDP_LOG(ERR, "Failed to allocate memory pool.\n"); - goto err; - } - ret = xsk_umem__create(&umem->umem, bufs, + ret = xsk_umem__create(&umem->umem, base_addr, ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE, &umem->fq, &umem->cq, &usr_config); @@ -499,7 +527,7 @@ static struct xsk_umem_info *xdp_umem_configure(void) AF_XDP_LOG(ERR, "Failed to create umem"); goto err; } - umem->buffer = bufs; + umem->buffer = base_addr; return umem; @@ -912,10 +940,9 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev) internals = eth_dev->data->dev_private; - rte_ring_free(internals->umem->buf_ring); - rte_free(internals->umem->buffer); + rte_mempool_free(internals->umem->mb_pool); rte_free(internals->umem); rte_eth_dev_release_port(eth_dev); From patchwork Fri Mar 22 13:01:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiaolong Ye X-Patchwork-Id: 51527 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id EAE061B610; Fri, 22 Mar 2019 14:06:11 +0100 (CET) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 1D5121B5D5 for ; Fri, 22 Mar 2019 14:05:53 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com 
([10.7.209.58]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Mar 2019 06:05:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,256,1549958400"; d="scan'208";a="124954429" Received: from yexl-server.sh.intel.com ([10.67.110.206]) by orsmga007.jf.intel.com with ESMTP; 22 Mar 2019 06:05:52 -0700 From: Xiaolong Ye To: dev@dpdk.org Cc: Qi Zhang , Karlsson Magnus , Topel Bjorn , Xiaolong Ye Date: Fri, 22 Mar 2019 21:01:29 +0800 Message-Id: <20190322130129.109964-6-xiaolong.ye@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190322130129.109964-1-xiaolong.ye@intel.com> References: <20190301080947.91086-1-xiaolong.ye@intel.com> <20190322130129.109964-1-xiaolong.ye@intel.com> Subject: [dpdk-dev] [PATCH v4 5/5] net/af_xdp: enable zero copy X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Check whether the external mempool (from rx_queue_setup) is suitable for af_xdp. If it is, it is registered to the af_xdp socket directly and no packet data is copied on Rx and Tx. 
Signed-off-by: Xiaolong Ye --- drivers/net/af_xdp/rte_eth_af_xdp.c | 129 ++++++++++++++++++++-------- 1 file changed, 95 insertions(+), 34 deletions(-) diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c index 6b1bc462a..124d341d0 100644 --- a/drivers/net/af_xdp/rte_eth_af_xdp.c +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c @@ -67,6 +67,7 @@ struct xsk_umem_info { struct xsk_umem *umem; struct rte_mempool *mb_pool; void *buffer; + uint8_t zc; }; struct rx_stats { @@ -85,6 +86,7 @@ struct pkt_rx_queue { struct pkt_tx_queue *pair; uint16_t queue_idx; + uint8_t zc; }; struct tx_stats { @@ -202,7 +204,8 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh) (void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE); - if (rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0) + if (!rxq->zc && + rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, rcvd) != 0) return 0; for (i = 0; i < rcvd; i++) { @@ -216,13 +219,23 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) len = desc->len; pkt = xsk_umem__get_data(rxq->umem->buffer, addr); - rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len); - rte_pktmbuf_pkt_len(mbufs[i]) = len; - rte_pktmbuf_data_len(mbufs[i]) = len; - rx_bytes += len; - bufs[count++] = mbufs[i]; - - rte_pktmbuf_free(addr_to_mbuf(umem, addr)); + if (rxq->zc) { + struct rte_mbuf *mbuf; + mbuf = addr_to_mbuf(rxq->umem, addr); + rte_pktmbuf_pkt_len(mbuf) = len; + rte_pktmbuf_data_len(mbuf) = len; + rx_bytes += len; + bufs[count++] = mbuf; + } else { + rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), + pkt, len); + rte_pktmbuf_pkt_len(mbufs[i]) = len; + rte_pktmbuf_data_len(mbufs[i]) = len; + rx_bytes += len; + bufs[count++] = mbufs[i]; + + rte_pktmbuf_free(addr_to_mbuf(umem, addr)); + } } xsk_ring_cons__release(rx, rcvd); @@ -295,22 +308,29 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) - ETH_AF_XDP_DATA_HEADROOM; desc 
= xsk_ring_prod__tx_desc(&txq->tx, idx_tx + i); mbuf = bufs[i]; - if (mbuf->pkt_len <= buf_len) { - mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool); - if (mbuf_to_tx == NULL) { - rte_pktmbuf_free(mbuf); - continue; - } - desc->addr = mbuf_to_addr(umem, mbuf_to_tx); + if (txq->pair->zc && mbuf->pool == umem->mb_pool) { + desc->addr = mbuf_to_addr(umem, mbuf); desc->len = mbuf->pkt_len; - pkt = xsk_umem__get_data(umem->buffer, - desc->addr); - rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *), - desc->len); valid++; tx_bytes += mbuf->pkt_len; + } else { + if (mbuf->pkt_len <= buf_len) { + mbuf_to_tx = rte_pktmbuf_alloc(umem->mb_pool); + if (!mbuf_to_tx) { + rte_pktmbuf_free(mbuf); + continue; + } + desc->addr = mbuf_to_addr(umem, mbuf_to_tx); + desc->len = mbuf->pkt_len; + pkt = xsk_umem__get_data(umem->buffer, + desc->addr); + memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *), + desc->len); + valid++; + tx_bytes += mbuf->pkt_len; + } + rte_pktmbuf_free(mbuf); } - rte_pktmbuf_free(mbuf); } xsk_ring_prod__submit(&txq->tx, nb_pkts); @@ -488,7 +508,7 @@ static inline uint64_t get_len(struct rte_mempool *mp) return (uint64_t)(memhdr->len); } -static struct xsk_umem_info *xdp_umem_configure(void) +static struct xsk_umem_info *xdp_umem_configure(struct rte_mempool *mb_pool) { struct xsk_umem_info *umem; struct xsk_umem_config usr_config = { @@ -505,16 +525,23 @@ static struct xsk_umem_info *xdp_umem_configure(void) return NULL; } - umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool", - ETH_AF_XDP_NUM_BUFFERS, - 250, 0, - ETH_AF_XDP_FRAME_SIZE - - ETH_AF_XDP_MBUF_OVERHEAD, - MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN, - SOCKET_ID_ANY); - if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) { - AF_XDP_LOG(ERR, "Failed to create mempool\n"); - goto err; + if (!mb_pool) { + umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool", + ETH_AF_XDP_NUM_BUFFERS, + 250, 0, + ETH_AF_XDP_FRAME_SIZE - + ETH_AF_XDP_MBUF_OVERHEAD, + MEMPOOL_F_NO_SPREAD | 
MEMPOOL_F_PAGE_ALIGN, + SOCKET_ID_ANY); + + if (umem->mb_pool == NULL || + umem->mb_pool->nb_mem_chunks != 1) { + AF_XDP_LOG(ERR, "Failed to create mempool\n"); + goto err; + } + } else { + umem->mb_pool = mb_pool; + umem->zc = 1; } base_addr = (void *)get_base_addr(umem->mb_pool); @@ -536,16 +563,43 @@ static struct xsk_umem_info *xdp_umem_configure(void) return NULL; } +static uint8_t +check_mempool_zc(struct rte_mempool *mp) +{ + RTE_ASSERT(mp); + + /* must be contiguous */ + if (mp->nb_mem_chunks > 1) + return 0; + + /* check header size */ + if (mp->header_size != RTE_CACHE_LINE_SIZE) + return 0; + + /* check base address */ + if ((uint64_t)get_base_addr(mp) % getpagesize() != 0) + return 0; + + /* check chunk size */ + if ((mp->elt_size + mp->header_size + mp->trailer_size) % + ETH_AF_XDP_FRAME_SIZE != 0) + return 0; + + return 1; +} + static int xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq, - int ring_size) + int ring_size, struct rte_mempool *mb_pool) { struct xsk_socket_config cfg; struct pkt_tx_queue *txq = rxq->pair; + struct rte_mempool *mp; int ret = 0; int reserve_size; - rxq->umem = xdp_umem_configure(); + mp = check_mempool_zc(mb_pool) ? mb_pool : NULL; + rxq->umem = xdp_umem_configure(mp); if (rxq->umem == NULL) { ret = -ENOMEM; goto err; @@ -631,7 +685,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev, rxq->mb_pool = mb_pool; - if (xsk_configure(internals, rxq, nb_rx_desc)) { + if (xsk_configure(internals, rxq, nb_rx_desc, mb_pool)) { AF_XDP_LOG(ERR, "Failed to configure xdp socket\n"); ret = -EINVAL; goto err; @@ -639,6 +693,13 @@ eth_rx_queue_setup(struct rte_eth_dev *dev, internals->umem = rxq->umem; + if (mb_pool == internals->umem->mb_pool) + rxq->zc = internals->umem->zc; + + if (rxq->zc) + AF_XDP_LOG(INFO, + "zero copy enabled on rx queue %d\n", rx_queue_id); + dev->data->rx_queues[rx_queue_id] = rxq; return 0;
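
The zero-copy Rx path relies on the address translation introduced by addr_to_mbuf() and mbuf_to_addr() in patch 4/5. The offset arithmetic can be sketched with plain integers, as below. This is an illustration under stated assumptions, not the driver code: the frame size of 2048 and the 128-byte rte_mbuf size are assumed values, and struct frame_pos, addr_to_pos() and pos_to_addr() are hypothetical names that work on offsets from the umem base instead of real pointers.

```c
#include <stdint.h>

#define FRAME_SIZE    2048u /* assumed value of XSK_UMEM__DEFAULT_FRAME_SIZE */
#define MBUF_OVERHEAD 192u  /* 64-byte object header + 128-byte rte_mbuf */
#define MBUF_SIZE     128u  /* assumed sizeof(struct rte_mbuf) */

/* Hypothetical stand-in for the fields the driver reads or writes. */
struct frame_pos {
	uint64_t mbuf_off; /* offset of the mbuf struct from the umem base */
	uint64_t data_off; /* offset of packet data from the buffer start */
};

/* Mirrors addr_to_mbuf(): locate the frame, then the mbuf inside it. */
static struct frame_pos
addr_to_pos(uint64_t addr)
{
	uint64_t frame = addr / FRAME_SIZE * FRAME_SIZE;
	struct frame_pos p = {
		.mbuf_off = frame + MBUF_OVERHEAD - MBUF_SIZE,
		.data_off = addr - frame - MBUF_OVERHEAD,
	};

	return p;
}

/* Mirrors mbuf_to_addr(): buffer start plus data offset, relative to base. */
static uint64_t
pos_to_addr(struct frame_pos p)
{
	uint64_t buf_off = p.mbuf_off + MBUF_SIZE; /* buf_addr - umem base */

	return buf_off + p.data_off;
}
```

With these assumed sizes, an address 320 bytes into the fourth frame maps to an mbuf at offset 6208 with a data offset of 128, and converting back returns the original address. The round trip only holds when every mempool object occupies exactly one frame, which is what the chunk size check in check_mempool_zc() is meant to guarantee.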