From patchwork Wed Aug 19 03:24:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marvin Liu X-Patchwork-Id: 75696 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 524CEA04AF; Wed, 19 Aug 2020 05:25:03 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id B6F2C1C00D; Wed, 19 Aug 2020 05:24:55 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 460282BAA for ; Wed, 19 Aug 2020 05:24:53 +0200 (CEST) IronPort-SDR: 8DJ4k8J6YpKuCBNtMn4EEh2A6dFogfVsLUlx51BKpLAjGNHXKnVWaWp5L1TJglfbnYbAAJOmIh YRrDLxqjZqSw== X-IronPort-AV: E=McAfee;i="6000,8403,9717"; a="156113170" X-IronPort-AV: E=Sophos;i="5.76,329,1592895600"; d="scan'208";a="156113170" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Aug 2020 20:24:52 -0700 IronPort-SDR: p5oK5gMknBGVfp3F7vN7T0rvJF31DBHKES3s+ww+mTc7UOuREnS+nJMZQnMwTyH/cdajcBp+y3 SJJIKtzTb07Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,329,1592895600"; d="scan'208";a="441452718" Received: from npg-dpdk-virtual-marvin-dev.sh.intel.com ([10.67.119.56]) by orsmga004.jf.intel.com with ESMTP; 18 Aug 2020 20:24:51 -0700 From: Marvin Liu To: maxime.coquelin@redhat.com, chenbo.xia@intel.com, zhihong.wang@intel.com Cc: dev@dpdk.org, Marvin Liu Date: Wed, 19 Aug 2020 11:24:10 +0800 Message-Id: <20200819032414.51430-2-yong.liu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200819032414.51430-1-yong.liu@intel.com> References: <20200819032414.51430-1-yong.liu@intel.com> Subject: [dpdk-dev] [PATCH v1 1/5] vhost: add vectorized data path X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Packed ring operations are split into batch and single functions for performance perspective. Ring operations in batch function can be accelerated by SIMD instructions like AVX512. So introduce vectorized parameter in vhost. Vectorized data path can be selected if platform and ring format matched requirements. Otherwise will fallback to original data path. Signed-off-by: Marvin Liu diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst index d36f3120b2..efdaf4de09 100644 --- a/doc/guides/nics/vhost.rst +++ b/doc/guides/nics/vhost.rst @@ -64,6 +64,11 @@ The user can specify below arguments in `--vdev` option. It is used to enable external buffer support in vhost library. (Default: 0 (disabled)) +#. ``vectorized``: + + It is used to enable vectorized data path support in vhost library. + (Default: 0 (disabled)) + Vhost PMD event handling ------------------------ diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index b892eec67a..d5d421441c 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -162,6 +162,18 @@ The following is an overview of some key Vhost API functions: It is disabled by default. + - ``RTE_VHOST_USER_VECTORIZED`` + Vectorized data path will used when this flag is set. 
When packed ring + enabled, available descriptors are stored from frontend driver in sequence. + SIMD instructions like AVX can be used to handle multiple descriptors + simultaneously. Thus can accelerate the throughput of ring operations. + + * Only packed ring has vectorized data path. + + * Will fallback to normal datapath if no vectorization support. + + It is disabled by default. + * ``rte_vhost_driver_set_features(path, features)`` This function sets the feature bits the vhost-user driver supports. The diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c index e55278af69..2ba5a2a076 100644 --- a/drivers/net/vhost/rte_eth_vhost.c +++ b/drivers/net/vhost/rte_eth_vhost.c @@ -35,6 +35,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; #define ETH_VHOST_VIRTIO_NET_F_HOST_TSO "tso" #define ETH_VHOST_LINEAR_BUF "linear-buffer" #define ETH_VHOST_EXT_BUF "ext-buffer" +#define ETH_VHOST_VECTORIZED "vectorized" #define VHOST_MAX_PKT_BURST 32 static const char *valid_arguments[] = { @@ -47,6 +48,7 @@ static const char *valid_arguments[] = { ETH_VHOST_VIRTIO_NET_F_HOST_TSO, ETH_VHOST_LINEAR_BUF, ETH_VHOST_EXT_BUF, + ETH_VHOST_VECTORIZED, NULL }; @@ -1507,6 +1509,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev) int tso = 0; int linear_buf = 0; int ext_buf = 0; + int vectorized = 0; struct rte_eth_dev *eth_dev; const char *name = rte_vdev_device_name(dev); @@ -1626,6 +1629,17 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev) flags |= RTE_VHOST_USER_EXTBUF_SUPPORT; } + if (rte_kvargs_count(kvlist, ETH_VHOST_VECTORIZED) == 1) { + ret = rte_kvargs_process(kvlist, + ETH_VHOST_VECTORIZED, + &open_int, &vectorized); + if (ret < 0) + goto out_free; + + if (vectorized == 1) + flags |= RTE_VHOST_USER_VECTORIZED; + } + if (dev->device.numa_node == SOCKET_ID_ANY) dev->device.numa_node = rte_socket_id(); @@ -1679,4 +1693,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_vhost, "postcopy-support=<0|1> " "tso=<0|1> " "linear-buffer=<0|1> " - "ext-buffer=<0|1>"); + "ext-buffer=<0|1> " + "vectorized=<0|1>"); diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index a94c84134d..c7f946c6c1 100644 --- a/lib/librte_vhost/rte_vhost.h +++ b/lib/librte_vhost/rte_vhost.h @@ -36,6 +36,7 @@ extern "C" { /* support only linear buffers (no chained mbufs) */ #define RTE_VHOST_USER_LINEARBUF_SUPPORT (1ULL << 6) #define RTE_VHOST_USER_ASYNC_COPY (1ULL << 7) +#define RTE_VHOST_USER_VECTORIZED (1ULL << 8) /* Features. 
*/ #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 73e1dca95e..cc11244693 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -43,6 +43,7 @@ struct vhost_user_socket { bool extbuf; bool linearbuf; bool async_copy; + bool vectorized; /* * The "supported_features" indicates the feature bits the @@ -245,6 +246,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) dev->async_copy = 1; } + if (vsocket->vectorized) + vhost_enable_vectorized(vid); + VHOST_LOG_CONFIG(INFO, "new device, handle is %d\n", vid); if (vsocket->notify_ops->new_connection) { @@ -881,6 +885,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags) vsocket->dequeue_zero_copy = flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY; vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT; vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT; + vsocket->vectorized = flags & RTE_VHOST_USER_VECTORIZED; if (vsocket->dequeue_zero_copy && (flags & RTE_VHOST_USER_IOMMU_SUPPORT)) { diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 8f20a0818f..50bf033a9d 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -752,6 +752,17 @@ vhost_enable_linearbuf(int vid) dev->linearbuf = 1; } +void +vhost_enable_vectorized(int vid) +{ + struct virtio_net *dev = get_device(vid); + + if (dev == NULL) + return; + + dev->vectorized = 1; +} + int rte_vhost_get_mtu(int vid, uint16_t *mtu) { diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 632f66d532..b556eb3bf6 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -383,6 +383,7 @@ struct virtio_net { int async_copy; int extbuf; int linearbuf; + int vectorized; struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2]; struct inflight_mem_info *inflight_info; #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? 
PATH_MAX : IFNAMSIZ) @@ -721,6 +722,7 @@ void vhost_enable_dequeue_zero_copy(int vid); void vhost_set_builtin_virtio_net(int vid, bool enable); void vhost_enable_extbuf(int vid); void vhost_enable_linearbuf(int vid); +void vhost_enable_vectorized(int vid); int vhost_enable_guest_notification(struct virtio_net *dev, struct vhost_virtqueue *vq, int enable); From patchwork Wed Aug 19 03:24:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marvin Liu X-Patchwork-Id: 75697 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 25006A04AF; Wed, 19 Aug 2020 05:25:12 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id EB9221C0AF; Wed, 19 Aug 2020 05:24:57 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 3CC3A1BE3D for ; Wed, 19 Aug 2020 05:24:55 +0200 (CEST) IronPort-SDR: tUdHEVKbumqVhfCFa1oGX0TAPhULC28VQguQiTeFqaoXJktDVZx9GtAHXouaY7Jextg9k9bUd0 XwI2reGvpfxg== X-IronPort-AV: E=McAfee;i="6000,8403,9717"; a="156113176" X-IronPort-AV: E=Sophos;i="5.76,329,1592895600"; d="scan'208";a="156113176" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Aug 2020 20:24:54 -0700 IronPort-SDR: +PK2/I6DlJUJHEmrgnoEWU5tc2a82UZ1qZQgKZTWpBeki8T55ZinCFv1fpaOBmJKWykFMerul2 sefZMIHkmx9w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,329,1592895600"; d="scan'208";a="441452724" Received: from npg-dpdk-virtual-marvin-dev.sh.intel.com ([10.67.119.56]) by orsmga004.jf.intel.com with ESMTP; 18 Aug 2020 20:24:53 -0700 From: Marvin Liu To: maxime.coquelin@redhat.com, chenbo.xia@intel.com, zhihong.wang@intel.com Cc: dev@dpdk.org, Marvin Liu Date: Wed, 19 Aug 2020 11:24:11 +0800 Message-Id: <20200819032414.51430-3-yong.liu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200819032414.51430-1-yong.liu@intel.com> References: <20200819032414.51430-1-yong.liu@intel.com> Subject: [dpdk-dev] [PATCH v1 2/5] vhost: reuse packed ring functions X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Move parse_ethernet, offload, extbuf functions to header file. These functions will be reused by vhost vectorized path. 
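
For context, the vectorized data path that these shared helpers will serve is requested by the application when the vhost-user socket is registered. A minimal sketch of that registration (socket path and error handling are illustrative, not part of this patch) might look like:

#include <rte_vhost.h>

/* Register a vhost-user socket and request the vectorized data path.
 * The library falls back to the regular data path if the platform or
 * ring format does not meet the requirements.
 */
static int
register_vectorized_socket(const char *path)
{
	uint64_t flags = RTE_VHOST_USER_VECTORIZED;

	if (rte_vhost_driver_register(path, flags) != 0)
		return -1;

	return rte_vhost_driver_start(path);
}

The equivalent for the vhost PMD is the new "vectorized=1" vdev argument documented in patch 1.
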
Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index b556eb3bf6..5a5c945551 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -20,6 +20,10 @@ #include #include +#include +#include +#include +#include #include "rte_vhost.h" #include "rte_vdpa.h" #include "rte_vdpa_dev.h" @@ -905,4 +909,215 @@ put_zmbuf(struct zcopy_mbuf *zmbuf) zmbuf->in_use = 0; } +static __rte_always_inline bool +virtio_net_is_inorder(struct virtio_net *dev) +{ + return dev->features & (1ULL << VIRTIO_F_IN_ORDER); +} + +static __rte_always_inline void +parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr) +{ + struct rte_ipv4_hdr *ipv4_hdr; + struct rte_ipv6_hdr *ipv6_hdr; + void *l3_hdr = NULL; + struct rte_ether_hdr *eth_hdr; + uint16_t ethertype; + + eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *); + + m->l2_len = sizeof(struct rte_ether_hdr); + ethertype = rte_be_to_cpu_16(eth_hdr->ether_type); + + if (ethertype == RTE_ETHER_TYPE_VLAN) { + struct rte_vlan_hdr *vlan_hdr = + (struct rte_vlan_hdr *)(eth_hdr + 1); + + m->l2_len += sizeof(struct rte_vlan_hdr); + ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto); + } + + l3_hdr = (char *)eth_hdr + m->l2_len; + + switch (ethertype) { + case RTE_ETHER_TYPE_IPV4: + ipv4_hdr = l3_hdr; + *l4_proto = ipv4_hdr->next_proto_id; + m->l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4; + *l4_hdr = (char *)l3_hdr + m->l3_len; + m->ol_flags |= PKT_TX_IPV4; + break; + case RTE_ETHER_TYPE_IPV6: + ipv6_hdr = l3_hdr; + *l4_proto = ipv6_hdr->proto; + m->l3_len = sizeof(struct rte_ipv6_hdr); + *l4_hdr = (char *)l3_hdr + m->l3_len; + m->ol_flags |= PKT_TX_IPV6; + break; + default: + m->l3_len = 0; + *l4_proto = 0; + *l4_hdr = NULL; + break; + } +} + +static __rte_always_inline bool +virtio_net_with_host_offload(struct virtio_net *dev) +{ + if (dev->features & + ((1ULL << VIRTIO_NET_F_CSUM) | + (1ULL << VIRTIO_NET_F_HOST_ECN) | + (1ULL << VIRTIO_NET_F_HOST_TSO4) | + (1ULL << VIRTIO_NET_F_HOST_TSO6) | + (1ULL << VIRTIO_NET_F_HOST_UFO))) + return true; + + return false; +} + +static __rte_always_inline void +vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) +{ + uint16_t l4_proto = 0; + void *l4_hdr = NULL; + struct rte_tcp_hdr *tcp_hdr = NULL; + + if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE) + return; + + parse_ethernet(m, &l4_proto, &l4_hdr); + if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) { + if (hdr->csum_start == (m->l2_len + m->l3_len)) { + switch (hdr->csum_offset) { + case (offsetof(struct rte_tcp_hdr, cksum)): + if (l4_proto == IPPROTO_TCP) + m->ol_flags |= PKT_TX_TCP_CKSUM; + break; + case (offsetof(struct rte_udp_hdr, dgram_cksum)): + if (l4_proto == IPPROTO_UDP) + m->ol_flags |= PKT_TX_UDP_CKSUM; + break; + case (offsetof(struct rte_sctp_hdr, cksum)): + if (l4_proto == IPPROTO_SCTP) + m->ol_flags |= PKT_TX_SCTP_CKSUM; + break; + default: + break; + } + } + } + + if (l4_hdr && hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) { + switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) { + case VIRTIO_NET_HDR_GSO_TCPV4: + case VIRTIO_NET_HDR_GSO_TCPV6: + tcp_hdr = l4_hdr; + m->ol_flags |= PKT_TX_TCP_SEG; + m->tso_segsz = hdr->gso_size; + m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2; + break; + case VIRTIO_NET_HDR_GSO_UDP: + m->ol_flags |= PKT_TX_UDP_SEG; + m->tso_segsz = hdr->gso_size; + m->l4_len = sizeof(struct rte_udp_hdr); + break; + default: + VHOST_LOG_DATA(WARNING, + "unsupported gso type %u.\n", hdr->gso_type); + break; + } + } +} + +static void +virtio_dev_extbuf_free(void 
*addr __rte_unused, void *opaque) +{ + rte_free(opaque); +} + +static int +virtio_dev_extbuf_alloc(struct rte_mbuf *pkt, uint32_t size) +{ + struct rte_mbuf_ext_shared_info *shinfo = NULL; + uint32_t total_len = RTE_PKTMBUF_HEADROOM + size; + uint16_t buf_len; + rte_iova_t iova; + void *buf; + + /* Try to use pkt buffer to store shinfo to reduce the amount of memory + * required, otherwise store shinfo in the new buffer. + */ + if (rte_pktmbuf_tailroom(pkt) >= sizeof(*shinfo)) + shinfo = rte_pktmbuf_mtod(pkt, + struct rte_mbuf_ext_shared_info *); + else { + total_len += sizeof(*shinfo) + sizeof(uintptr_t); + total_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t)); + } + + if (unlikely(total_len > UINT16_MAX)) + return -ENOSPC; + + buf_len = total_len; + buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); + if (unlikely(buf == NULL)) + return -ENOMEM; + + /* Initialize shinfo */ + if (shinfo) { + shinfo->free_cb = virtio_dev_extbuf_free; + shinfo->fcb_opaque = buf; + rte_mbuf_ext_refcnt_set(shinfo, 1); + } else { + shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, + virtio_dev_extbuf_free, buf); + if (unlikely(shinfo == NULL)) { + rte_free(buf); + VHOST_LOG_DATA(ERR, "Failed to init shinfo\n"); + return -1; + } + } + + iova = rte_malloc_virt2iova(buf); + rte_pktmbuf_attach_extbuf(pkt, buf, iova, buf_len, shinfo); + rte_pktmbuf_reset_headroom(pkt); + + return 0; +} + +/* + * Allocate a host supported pktmbuf. + */ +static __rte_always_inline struct rte_mbuf * +virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp, + uint32_t data_len) +{ + struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp); + + if (unlikely(pkt == NULL)) { + VHOST_LOG_DATA(ERR, + "Failed to allocate memory for mbuf.\n"); + return NULL; + } + + if (rte_pktmbuf_tailroom(pkt) >= data_len) + return pkt; + + /* attach an external buffer if supported */ + if (dev->extbuf && !virtio_dev_extbuf_alloc(pkt, data_len)) + return pkt; + + /* check if chained buffers are allowed */ + if (!dev->linearbuf) + return pkt; + + /* Data doesn't fit into the buffer and the host supports + * only linear buffers + */ + rte_pktmbuf_free(pkt); + + return NULL; +} + #endif /* _VHOST_NET_CDEV_H_ */ diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index bd9303c8a9..6107662685 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -32,12 +32,6 @@ rxvq_is_mergeable(struct virtio_net *dev) return dev->features & (1ULL << VIRTIO_NET_F_MRG_RXBUF); } -static __rte_always_inline bool -virtio_net_is_inorder(struct virtio_net *dev) -{ - return dev->features & (1ULL << VIRTIO_F_IN_ORDER); -} - static bool is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring) { @@ -1804,121 +1798,6 @@ rte_vhost_submit_enqueue_burst(int vid, uint16_t queue_id, return virtio_dev_rx_async_submit(dev, queue_id, pkts, count); } -static inline bool -virtio_net_with_host_offload(struct virtio_net *dev) -{ - if (dev->features & - ((1ULL << VIRTIO_NET_F_CSUM) | - (1ULL << VIRTIO_NET_F_HOST_ECN) | - (1ULL << VIRTIO_NET_F_HOST_TSO4) | - (1ULL << VIRTIO_NET_F_HOST_TSO6) | - (1ULL << VIRTIO_NET_F_HOST_UFO))) - return true; - - return false; -} - -static void -parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr) -{ - struct rte_ipv4_hdr *ipv4_hdr; - struct rte_ipv6_hdr *ipv6_hdr; - void *l3_hdr = NULL; - struct rte_ether_hdr *eth_hdr; - uint16_t ethertype; - - eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *); - - m->l2_len = sizeof(struct rte_ether_hdr); - ethertype = 
rte_be_to_cpu_16(eth_hdr->ether_type); - - if (ethertype == RTE_ETHER_TYPE_VLAN) { - struct rte_vlan_hdr *vlan_hdr = - (struct rte_vlan_hdr *)(eth_hdr + 1); - - m->l2_len += sizeof(struct rte_vlan_hdr); - ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto); - } - - l3_hdr = (char *)eth_hdr + m->l2_len; - - switch (ethertype) { - case RTE_ETHER_TYPE_IPV4: - ipv4_hdr = l3_hdr; - *l4_proto = ipv4_hdr->next_proto_id; - m->l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4; - *l4_hdr = (char *)l3_hdr + m->l3_len; - m->ol_flags |= PKT_TX_IPV4; - break; - case RTE_ETHER_TYPE_IPV6: - ipv6_hdr = l3_hdr; - *l4_proto = ipv6_hdr->proto; - m->l3_len = sizeof(struct rte_ipv6_hdr); - *l4_hdr = (char *)l3_hdr + m->l3_len; - m->ol_flags |= PKT_TX_IPV6; - break; - default: - m->l3_len = 0; - *l4_proto = 0; - *l4_hdr = NULL; - break; - } -} - -static __rte_always_inline void -vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) -{ - uint16_t l4_proto = 0; - void *l4_hdr = NULL; - struct rte_tcp_hdr *tcp_hdr = NULL; - - if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE) - return; - - parse_ethernet(m, &l4_proto, &l4_hdr); - if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) { - if (hdr->csum_start == (m->l2_len + m->l3_len)) { - switch (hdr->csum_offset) { - case (offsetof(struct rte_tcp_hdr, cksum)): - if (l4_proto == IPPROTO_TCP) - m->ol_flags |= PKT_TX_TCP_CKSUM; - break; - case (offsetof(struct rte_udp_hdr, dgram_cksum)): - if (l4_proto == IPPROTO_UDP) - m->ol_flags |= PKT_TX_UDP_CKSUM; - break; - case (offsetof(struct rte_sctp_hdr, cksum)): - if (l4_proto == IPPROTO_SCTP) - m->ol_flags |= PKT_TX_SCTP_CKSUM; - break; - default: - break; - } - } - } - - if (l4_hdr && hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) { - switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) { - case VIRTIO_NET_HDR_GSO_TCPV4: - case VIRTIO_NET_HDR_GSO_TCPV6: - tcp_hdr = l4_hdr; - m->ol_flags |= PKT_TX_TCP_SEG; - m->tso_segsz = hdr->gso_size; - m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2; - break; - case VIRTIO_NET_HDR_GSO_UDP: - m->ol_flags |= PKT_TX_UDP_SEG; - m->tso_segsz = hdr->gso_size; - m->l4_len = sizeof(struct rte_udp_hdr); - break; - default: - VHOST_LOG_DATA(WARNING, - "unsupported gso type %u.\n", hdr->gso_type); - break; - } - } -} - static __rte_noinline void copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr, struct buf_vector *buf_vec) @@ -2145,96 +2024,6 @@ get_zmbuf(struct vhost_virtqueue *vq) return NULL; } -static void -virtio_dev_extbuf_free(void *addr __rte_unused, void *opaque) -{ - rte_free(opaque); -} - -static int -virtio_dev_extbuf_alloc(struct rte_mbuf *pkt, uint32_t size) -{ - struct rte_mbuf_ext_shared_info *shinfo = NULL; - uint32_t total_len = RTE_PKTMBUF_HEADROOM + size; - uint16_t buf_len; - rte_iova_t iova; - void *buf; - - /* Try to use pkt buffer to store shinfo to reduce the amount of memory - * required, otherwise store shinfo in the new buffer. 
- */ - if (rte_pktmbuf_tailroom(pkt) >= sizeof(*shinfo)) - shinfo = rte_pktmbuf_mtod(pkt, - struct rte_mbuf_ext_shared_info *); - else { - total_len += sizeof(*shinfo) + sizeof(uintptr_t); - total_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t)); - } - - if (unlikely(total_len > UINT16_MAX)) - return -ENOSPC; - - buf_len = total_len; - buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); - if (unlikely(buf == NULL)) - return -ENOMEM; - - /* Initialize shinfo */ - if (shinfo) { - shinfo->free_cb = virtio_dev_extbuf_free; - shinfo->fcb_opaque = buf; - rte_mbuf_ext_refcnt_set(shinfo, 1); - } else { - shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, - virtio_dev_extbuf_free, buf); - if (unlikely(shinfo == NULL)) { - rte_free(buf); - VHOST_LOG_DATA(ERR, "Failed to init shinfo\n"); - return -1; - } - } - - iova = rte_malloc_virt2iova(buf); - rte_pktmbuf_attach_extbuf(pkt, buf, iova, buf_len, shinfo); - rte_pktmbuf_reset_headroom(pkt); - - return 0; -} - -/* - * Allocate a host supported pktmbuf. - */ -static __rte_always_inline struct rte_mbuf * -virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp, - uint32_t data_len) -{ - struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp); - - if (unlikely(pkt == NULL)) { - VHOST_LOG_DATA(ERR, - "Failed to allocate memory for mbuf.\n"); - return NULL; - } - - if (rte_pktmbuf_tailroom(pkt) >= data_len) - return pkt; - - /* attach an external buffer if supported */ - if (dev->extbuf && !virtio_dev_extbuf_alloc(pkt, data_len)) - return pkt; - - /* check if chained buffers are allowed */ - if (!dev->linearbuf) - return pkt; - - /* Data doesn't fit into the buffer and the host supports - * only linear buffers - */ - rte_pktmbuf_free(pkt); - - return NULL; -} - static __rte_noinline uint16_t virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) From patchwork Wed Aug 19 03:24:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marvin Liu X-Patchwork-Id: 75698 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id A2896A04AF; Wed, 19 Aug 2020 05:25:19 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 17F781C0B6; Wed, 19 Aug 2020 05:24:59 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id AEA191C0AD for ; Wed, 19 Aug 2020 05:24:56 +0200 (CEST) IronPort-SDR: +KMt5mXY9qQDcYX5IaEBHryOm6DRNcXHtwpl3vtn65D3Ji6PBrLBJiDYJOG+nwr1oGHMDFvEXZ LSlspt8al/aQ== X-IronPort-AV: E=McAfee;i="6000,8403,9717"; a="156113178" X-IronPort-AV: E=Sophos;i="5.76,329,1592895600"; d="scan'208";a="156113178" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Aug 2020 20:24:56 -0700 IronPort-SDR: 4GpTiJS9EqU3HPppayL6LzL0DgrquDxZxn2JbH8a9R0O+gIpWrfxBEeuxDEBesU8tncj62nNFm aGc8p4V2oJMQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,329,1592895600"; d="scan'208";a="441452729" Received: from npg-dpdk-virtual-marvin-dev.sh.intel.com ([10.67.119.56]) by orsmga004.jf.intel.com with ESMTP; 18 Aug 2020 20:24:55 -0700 From: Marvin Liu To: maxime.coquelin@redhat.com, chenbo.xia@intel.com, 
zhihong.wang@intel.com Cc: dev@dpdk.org, Marvin Liu Date: Wed, 19 Aug 2020 11:24:12 +0800 Message-Id: <20200819032414.51430-4-yong.liu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200819032414.51430-1-yong.liu@intel.com> References: <20200819032414.51430-1-yong.liu@intel.com> Subject: [dpdk-dev] [PATCH v1 3/5] vhost: prepare memory regions addresses X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Prepare memory regions guest physical addresses for vectorized data path. These information will be utilized by SIMD instructions to find matched region index. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 5a5c945551..4a81f18f01 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -52,6 +52,8 @@ #define ASYNC_MAX_POLL_SEG 255 +#define MAX_NREGIONS 8 + #define VHOST_MAX_ASYNC_IT (MAX_PKT_BURST * 2) #define VHOST_MAX_ASYNC_VEC (BUF_VECTOR_MAX * 2) @@ -375,6 +377,8 @@ struct inflight_mem_info { struct virtio_net { /* Frontend (QEMU) memory and memory region information */ struct rte_vhost_memory *mem; + uint64_t regions_low_addrs[MAX_NREGIONS]; + uint64_t regions_high_addrs[MAX_NREGIONS]; uint64_t features; uint64_t protocol_features; int vid; diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index c3c924faec..89e75e9e71 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -1291,6 +1291,17 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, } } + RTE_BUILD_BUG_ON(VHOST_MEMORY_MAX_NREGIONS != 8); + if (dev->vectorized) { + for (i = 0; i < memory->nregions; i++) { + dev->regions_low_addrs[i] = + memory->regions[i].guest_phys_addr; + dev->regions_high_addrs[i] = + memory->regions[i].guest_phys_addr + + memory->regions[i].memory_size; + } + } + for (i = 0; i < dev->nr_vring; i++) { struct vhost_virtqueue *vq = dev->virtqueue[i]; From patchwork Wed Aug 19 03:24:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marvin Liu X-Patchwork-Id: 75699 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id ED24FA04AF; Wed, 19 Aug 2020 05:25:29 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 58F791C0C0; Wed, 19 Aug 2020 05:25:00 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id A1F0A1C0B3 for ; Wed, 19 Aug 2020 05:24:58 +0200 (CEST) IronPort-SDR: A9v5QaaTvqj5z0NRWkySxh2PVoR47ijsagLnjD0YUCUIi+UORVMBreF2L0NXZyRf9M/I0m8SLR hnsj6NWbrsGA== X-IronPort-AV: E=McAfee;i="6000,8403,9717"; a="156113180" X-IronPort-AV: E=Sophos;i="5.76,329,1592895600"; d="scan'208";a="156113180" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Aug 2020 20:24:58 -0700 IronPort-SDR: 3GdbOvQ8wte2lWoqXWQr9CWgfZGQPkYeQt+t/gu+CrMkvfWnaM24+CQLbHwDU/h2CHz6MDpsCs kaOAVbjv7f2w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,329,1592895600"; d="scan'208";a="441452734" Received: from 
npg-dpdk-virtual-marvin-dev.sh.intel.com ([10.67.119.56]) by orsmga004.jf.intel.com with ESMTP; 18 Aug 2020 20:24:56 -0700 From: Marvin Liu To: maxime.coquelin@redhat.com, chenbo.xia@intel.com, zhihong.wang@intel.com Cc: dev@dpdk.org, Marvin Liu Date: Wed, 19 Aug 2020 11:24:13 +0800 Message-Id: <20200819032414.51430-5-yong.liu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200819032414.51430-1-yong.liu@intel.com> References: <20200819032414.51430-1-yong.liu@intel.com> Subject: [dpdk-dev] [PATCH v1 4/5] vhost: add packed ring vectorized dequeue X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Optimize vhost packed ring dequeue path with SIMD instructions. Four descriptors status check and writeback are batched handled with AVX512 instructions. Address translation operations are also accelerated by AVX512 instructions. If platform or compiler not support vectorization, will fallback to default path. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 4f2f3e47da..c0cd7d498f 100644 --- a/lib/librte_vhost/Makefile +++ b/lib/librte_vhost/Makefile @@ -31,6 +31,13 @@ CFLAGS += -DVHOST_ICC_UNROLL_PRAGMA endif endif +ifneq ($(FORCE_DISABLE_AVX512), y) + CC_AVX512_SUPPORT=\ + $(shell $(CC) -march=native -dM -E - &1 | \ + sed '/./{H;$$!d} ; x ; /AVX512F/!d; /AVX512BW/!d; /AVX512VL/!d' | \ + grep -q AVX512 && echo 1) +endif + ifeq ($(CONFIG_RTE_LIBRTE_VHOST_NUMA),y) LDLIBS += -lnuma endif @@ -40,6 +47,12 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev -lrte_net SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \ vhost_user.c virtio_net.c vdpa.c +ifeq ($(CC_AVX512_SUPPORT), 1) +CFLAGS += -DCC_AVX512_SUPPORT +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_vec_avx.c +CFLAGS_vhost_vec_avx.o += -mavx512f -mavx512bw -mavx512vl +endif + # install includes SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h \ rte_vdpa_dev.h rte_vhost_async.h diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build index cc9aa65c67..c1481802d7 100644 --- a/lib/librte_vhost/meson.build +++ b/lib/librte_vhost/meson.build @@ -8,6 +8,22 @@ endif if has_libnuma == 1 dpdk_conf.set10('RTE_LIBRTE_VHOST_NUMA', true) endif + +if arch_subdir == 'x86' + if not machine_args.contains('-mno-avx512f') + if cc.has_argument('-mavx512f') and cc.has_argument('-mavx512vl') and cc.has_argument('-mavx512bw') + cflags += ['-DCC_AVX512_SUPPORT'] + vhost_avx512_lib = static_library('vhost_avx512_lib', + 'vhost_vec_avx.c', + dependencies: [static_rte_eal, static_rte_mempool, + static_rte_mbuf, static_rte_ethdev, static_rte_net], + include_directories: includes, + c_args: [cflags, '-mavx512f', '-mavx512bw', '-mavx512vl']) + objs += vhost_avx512_lib.extract_objects('vhost_vec_avx.c') + endif + endif +endif + if (toolchain == 'gcc' and cc.version().version_compare('>=8.3.0')) cflags += '-DVHOST_GCC_UNROLL_PRAGMA' elif (toolchain == 'clang' and cc.version().version_compare('>=3.7.0')) diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 4a81f18f01..fc7daf2145 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -1124,4 +1124,12 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp, return NULL; } +int +vhost_reserve_avail_batch_packed_avx(struct virtio_net *dev, + struct vhost_virtqueue *vq, + struct 
rte_mempool *mbuf_pool, + struct rte_mbuf **pkts, + uint16_t avail_idx, + uintptr_t *desc_addrs, + uint16_t *ids); #endif /* _VHOST_NET_CDEV_H_ */ diff --git a/lib/librte_vhost/vhost_vec_avx.c b/lib/librte_vhost/vhost_vec_avx.c new file mode 100644 index 0000000000..e8361d18fa --- /dev/null +++ b/lib/librte_vhost/vhost_vec_avx.c @@ -0,0 +1,152 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2016 Intel Corporation + */ +#include + +#include "vhost.h" + +#define BYTE_SIZE 8 +/* reference count offset in mbuf rearm data */ +#define REFCNT_BITS_OFFSET ((offsetof(struct rte_mbuf, refcnt) - \ + offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE) +/* segment number offset in mbuf rearm data */ +#define SEG_NUM_BITS_OFFSET ((offsetof(struct rte_mbuf, nb_segs) - \ + offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE) + +/* default rearm data */ +#define DEFAULT_REARM_DATA (1ULL << SEG_NUM_BITS_OFFSET | \ + 1ULL << REFCNT_BITS_OFFSET) + +#define DESC_FLAGS_SHORT_OFFSET (offsetof(struct vring_packed_desc, flags) / \ + sizeof(uint16_t)) + +#define DESC_FLAGS_SHORT_SIZE (sizeof(struct vring_packed_desc) / \ + sizeof(uint16_t)) +#define BATCH_FLAGS_MASK (1 << DESC_FLAGS_SHORT_OFFSET | \ + 1 << (DESC_FLAGS_SHORT_OFFSET + DESC_FLAGS_SHORT_SIZE) | \ + 1 << (DESC_FLAGS_SHORT_OFFSET + DESC_FLAGS_SHORT_SIZE * 2) | \ + 1 << (DESC_FLAGS_SHORT_OFFSET + DESC_FLAGS_SHORT_SIZE * 3)) + +#define FLAGS_BITS_OFFSET ((offsetof(struct vring_packed_desc, flags) - \ + offsetof(struct vring_packed_desc, len)) * BYTE_SIZE) + +#define PACKED_FLAGS_MASK ((0ULL | VRING_DESC_F_AVAIL | VRING_DESC_F_USED) \ + << FLAGS_BITS_OFFSET) +#define PACKED_AVAIL_FLAG ((0ULL | VRING_DESC_F_AVAIL) << FLAGS_BITS_OFFSET) +#define PACKED_AVAIL_FLAG_WRAP ((0ULL | VRING_DESC_F_USED) << \ + FLAGS_BITS_OFFSET) + +#define DESC_FLAGS_POS 0xaa +#define MBUF_LENS_POS 0x6666 + +int +vhost_reserve_avail_batch_packed_avx(struct virtio_net *dev, + struct vhost_virtqueue *vq, + struct rte_mempool *mbuf_pool, + struct rte_mbuf **pkts, + uint16_t avail_idx, + uintptr_t *desc_addrs, + uint16_t *ids) +{ + struct vring_packed_desc *descs = vq->desc_packed; + uint32_t descs_status; + void *desc_addr; + uint16_t i; + uint8_t cmp_low, cmp_high, cmp_result; + uint64_t lens[PACKED_BATCH_SIZE]; + + if (unlikely(avail_idx & PACKED_BATCH_MASK)) + return -1; + + /* load 4 descs */ + desc_addr = &vq->desc_packed[avail_idx]; + __m512i desc_vec = _mm512_loadu_si512(desc_addr); + + /* burst check four status */ + __m512i avail_flag_vec; + if (vq->avail_wrap_counter) +#if defined(RTE_ARCH_I686) + avail_flag_vec = _mm512_set4_epi64(PACKED_AVAIL_FLAG, 0x0, + PACKED_FLAGS_MASK, 0x0); +#else + avail_flag_vec = _mm512_maskz_set1_epi64(DESC_FLAGS_POS, + PACKED_AVAIL_FLAG); + +#endif + else +#if defined(RTE_ARCH_I686) + avail_flag_vec = _mm512_set4_epi64(PACKED_AVAIL_FLAG_WRAP, + 0x0, PACKED_AVAIL_FLAG_WRAP, 0x0); +#else + avail_flag_vec = _mm512_maskz_set1_epi64(DESC_FLAGS_POS, + PACKED_AVAIL_FLAG_WRAP); +#endif + + descs_status = _mm512_cmp_epu16_mask(desc_vec, avail_flag_vec, + _MM_CMPINT_NE); + if (descs_status & BATCH_FLAGS_MASK) + return -1; + + /* check buffer fit into one region & translate address */ + __m512i regions_low_addrs = + _mm512_loadu_si512((void *)&dev->regions_low_addrs); + __m512i regions_high_addrs = + _mm512_loadu_si512((void *)&dev->regions_high_addrs); + vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { + uint64_t addr_low = descs[avail_idx + i].addr; + uint64_t addr_high = addr_low + descs[avail_idx + i].len; + __m512i low_addr_vec = 
_mm512_set1_epi64(addr_low); + __m512i high_addr_vec = _mm512_set1_epi64(addr_high); + + cmp_low = _mm512_cmp_epi64_mask(low_addr_vec, + regions_low_addrs, _MM_CMPINT_NLT); + cmp_high = _mm512_cmp_epi64_mask(high_addr_vec, + regions_high_addrs, _MM_CMPINT_LT); + cmp_result = cmp_low & cmp_high; + int index = __builtin_ctz(cmp_result); + if (unlikely((uint32_t)index >= dev->mem->nregions)) + goto free_buf; + + desc_addrs[i] = addr_low + + dev->mem->regions[index].host_user_addr - + dev->mem->regions[index].guest_phys_addr; + lens[i] = descs[avail_idx + i].len; + rte_prefetch0((void *)(uintptr_t)desc_addrs[i]); + + pkts[i] = virtio_dev_pktmbuf_alloc(dev, mbuf_pool, lens[i]); + if (!pkts[i]) + goto free_buf; + } + + if (unlikely(virtio_net_is_inorder(dev))) { + ids[PACKED_BATCH_SIZE - 1] = + descs[avail_idx + PACKED_BATCH_SIZE - 1].id; + } else { + vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) + ids[i] = descs[avail_idx + i].id; + } + + uint64_t addrs[PACKED_BATCH_SIZE << 1]; + /* store mbuf data_len, pkt_len */ + vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { + addrs[i << 1] = (uint64_t)pkts[i]->rx_descriptor_fields1; + addrs[(i << 1) + 1] = (uint64_t)pkts[i]->rx_descriptor_fields1 + + sizeof(uint64_t); + } + + /* save pkt_len and data_len into mbufs */ + __m512i value_vec = _mm512_maskz_shuffle_epi32(MBUF_LENS_POS, desc_vec, + 0xAA); + __m512i offsets_vec = _mm512_maskz_set1_epi32(MBUF_LENS_POS, + (uint32_t)-12); + value_vec = _mm512_add_epi32(value_vec, offsets_vec); + __m512i vindex = _mm512_loadu_si512((void *)addrs); + _mm512_i64scatter_epi64(0, vindex, value_vec, 1); + + return 0; +free_buf: + for (i = 0; i < PACKED_BATCH_SIZE; i++) + rte_pktmbuf_free(pkts[i]); + + return -1; +} diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 6107662685..e4d2e2e7d6 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -2249,6 +2249,28 @@ vhost_reserve_avail_batch_packed(struct virtio_net *dev, return -1; } +static __rte_always_inline int +vhost_handle_avail_batch_packed(struct virtio_net *dev, + struct vhost_virtqueue *vq, + struct rte_mempool *mbuf_pool, + struct rte_mbuf **pkts, + uint16_t avail_idx, + uintptr_t *desc_addrs, + uint16_t *ids) +{ + if (unlikely(dev->vectorized)) +#ifdef CC_AVX512_SUPPORT + return vhost_reserve_avail_batch_packed_avx(dev, vq, mbuf_pool, + pkts, avail_idx, desc_addrs, ids); +#else + return vhost_reserve_avail_batch_packed(dev, vq, mbuf_pool, + pkts, avail_idx, desc_addrs, ids); + +#endif + return vhost_reserve_avail_batch_packed(dev, vq, mbuf_pool, pkts, + avail_idx, desc_addrs, ids); +} + static __rte_always_inline int virtio_dev_tx_batch_packed(struct virtio_net *dev, struct vhost_virtqueue *vq, @@ -2261,8 +2283,9 @@ virtio_dev_tx_batch_packed(struct virtio_net *dev, uint16_t ids[PACKED_BATCH_SIZE]; uint16_t i; - if (vhost_reserve_avail_batch_packed(dev, vq, mbuf_pool, pkts, - avail_idx, desc_addrs, ids)) + + if (vhost_handle_avail_batch_packed(dev, vq, mbuf_pool, pkts, + avail_idx, desc_addrs, ids)) return -1; vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) From patchwork Wed Aug 19 03:24:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marvin Liu X-Patchwork-Id: 75700 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 5B783A04AF; Wed, 19 Aug 
2020 05:25:41 +0200 (CEST)
From: Marvin Liu
To: maxime.coquelin@redhat.com, chenbo.xia@intel.com, zhihong.wang@intel.com
Cc: dev@dpdk.org, Marvin Liu
Date: Wed, 19 Aug 2020 11:24:14 +0800
Message-Id: <20200819032414.51430-6-yong.liu@intel.com>
In-Reply-To: <20200819032414.51430-1-yong.liu@intel.com>
References: <20200819032414.51430-1-yong.liu@intel.com>
Subject: [dpdk-dev] [PATCH v1 5/5] vhost: add packed ring vectorized enqueue

Optimize the vhost packed ring enqueue path with SIMD instructions. The status and length of four descriptors are checked and handled as a batch with AVX512 instructions. Address translation operations are also accelerated by AVX512 instructions.
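
For readers unfamiliar with the approach, a minimal standalone sketch of the region-lookup idea behind the address translation (function name and argument layout are illustrative, not the patch's code):

#include <stdint.h>
#include <immintrin.h>

/* Sketch: find which of up to 8 guest memory regions fully contains
 * [addr, addr + len) using one pair of AVX512 compares. low[]/high[]
 * mirror the regions_low_addrs/regions_high_addrs arrays prepared in
 * patch 3. Returns the region index, or -1 if no single region fits.
 */
static inline int
find_region_avx512(const uint64_t low[8], const uint64_t high[8],
		   uint64_t addr, uint64_t len)
{
	__m512i lows  = _mm512_loadu_si512((const void *)low);
	__m512i highs = _mm512_loadu_si512((const void *)high);
	__m512i start = _mm512_set1_epi64((long long)addr);
	__m512i end   = _mm512_set1_epi64((long long)(addr + len));

	/* region i is a hit when low[i] <= addr and addr + len < high[i] */
	__mmask8 ge_low  = _mm512_cmp_epu64_mask(start, lows, _MM_CMPINT_NLT);
	__mmask8 lt_high = _mm512_cmp_epu64_mask(end, highs, _MM_CMPINT_LT);
	__mmask8 hit = ge_low & lt_high;

	if (hit == 0)
		return -1;

	return __builtin_ctz(hit);
}

The per-device regions_low_addrs/regions_high_addrs arrays precomputed in patch 3 are what make this single-shot lookup possible.
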
Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index fc7daf2145..b78b2c5c1b 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -1132,4 +1132,10 @@ vhost_reserve_avail_batch_packed_avx(struct virtio_net *dev, uint16_t avail_idx, uintptr_t *desc_addrs, uint16_t *ids); + +int +virtio_dev_rx_batch_packed_avx(struct virtio_net *dev, + struct vhost_virtqueue *vq, + struct rte_mbuf **pkts); + #endif /* _VHOST_NET_CDEV_H_ */ diff --git a/lib/librte_vhost/vhost_vec_avx.c b/lib/librte_vhost/vhost_vec_avx.c index e8361d18fa..12b902253a 100644 --- a/lib/librte_vhost/vhost_vec_avx.c +++ b/lib/librte_vhost/vhost_vec_avx.c @@ -35,9 +35,15 @@ #define PACKED_AVAIL_FLAG ((0ULL | VRING_DESC_F_AVAIL) << FLAGS_BITS_OFFSET) #define PACKED_AVAIL_FLAG_WRAP ((0ULL | VRING_DESC_F_USED) << \ FLAGS_BITS_OFFSET) +#define PACKED_WRITE_AVAIL_FLAG (PACKED_AVAIL_FLAG | \ + ((0ULL | VRING_DESC_F_WRITE) << FLAGS_BITS_OFFSET)) +#define PACKED_WRITE_AVAIL_FLAG_WRAP (PACKED_AVAIL_FLAG_WRAP | \ + ((0ULL | VRING_DESC_F_WRITE) << FLAGS_BITS_OFFSET)) #define DESC_FLAGS_POS 0xaa #define MBUF_LENS_POS 0x6666 +#define DESC_LENS_POS 0x4444 +#define DESC_LENS_FLAGS_POS 0xB0B0B0B0 int vhost_reserve_avail_batch_packed_avx(struct virtio_net *dev, @@ -150,3 +156,137 @@ vhost_reserve_avail_batch_packed_avx(struct virtio_net *dev, return -1; } + +int +virtio_dev_rx_batch_packed_avx(struct virtio_net *dev, + struct vhost_virtqueue *vq, + struct rte_mbuf **pkts) +{ + struct vring_packed_desc *descs = vq->desc_packed; + uint16_t avail_idx = vq->last_avail_idx; + uint64_t desc_addrs[PACKED_BATCH_SIZE]; + uint32_t buf_offset = dev->vhost_hlen; + uint32_t desc_status; + uint64_t lens[PACKED_BATCH_SIZE]; + uint16_t i; + void *desc_addr; + uint8_t cmp_low, cmp_high, cmp_result; + + if (unlikely(avail_idx & PACKED_BATCH_MASK)) + return -1; + + /* check refcnt and nb_segs */ + __m256i mbuf_ref = _mm256_set1_epi64x(DEFAULT_REARM_DATA); + + /* load four mbufs rearm data */ + __m256i mbufs = _mm256_set_epi64x( + *pkts[3]->rearm_data, + *pkts[2]->rearm_data, + *pkts[1]->rearm_data, + *pkts[0]->rearm_data); + + uint16_t cmp = _mm256_cmpneq_epu16_mask(mbufs, mbuf_ref); + if (cmp & MBUF_LENS_POS) + return -1; + + /* check desc status */ + desc_addr = &vq->desc_packed[avail_idx]; + __m512i desc_vec = _mm512_loadu_si512(desc_addr); + + __m512i avail_flag_vec; + __m512i used_flag_vec; + if (vq->avail_wrap_counter) { +#if defined(RTE_ARCH_I686) + avail_flag_vec = _mm512_set4_epi64(PACKED_WRITE_AVAIL_FLAG, + 0x0, PACKED_WRITE_AVAIL_FLAG, 0x0); + used_flag_vec = _mm512_set4_epi64(PACKED_FLAGS_MASK, 0x0, + PACKED_FLAGS_MASK, 0x0); +#else + avail_flag_vec = _mm512_maskz_set1_epi64(DESC_FLAGS_POS, + PACKED_WRITE_AVAIL_FLAG); + used_flag_vec = _mm512_maskz_set1_epi64(DESC_FLAGS_POS, + PACKED_FLAGS_MASK); +#endif + } else { +#if defined(RTE_ARCH_I686) + avail_flag_vec = _mm512_set4_epi64( + PACKED_WRITE_AVAIL_FLAG_WRAP, 0x0, + PACKED_WRITE_AVAIL_FLAG, 0x0); + used_flag_vec = _mm512_set4_epi64(0x0, 0x0, 0x0, 0x0); +#else + avail_flag_vec = _mm512_maskz_set1_epi64(DESC_FLAGS_POS, + PACKED_WRITE_AVAIL_FLAG_WRAP); + used_flag_vec = _mm512_setzero_epi32(); +#endif + } + + desc_status = _mm512_mask_cmp_epu16_mask(BATCH_FLAGS_MASK, desc_vec, + avail_flag_vec, _MM_CMPINT_NE); + if (desc_status) + return -1; + + /* check buffer fit into one region & translate address */ + __m512i regions_low_addrs = + _mm512_loadu_si512((void *)&dev->regions_low_addrs); + __m512i regions_high_addrs = + _mm512_loadu_si512((void 
*)&dev->regions_high_addrs); + vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { + uint64_t addr_low = descs[avail_idx + i].addr; + uint64_t addr_high = addr_low + descs[avail_idx + i].len; + __m512i low_addr_vec = _mm512_set1_epi64(addr_low); + __m512i high_addr_vec = _mm512_set1_epi64(addr_high); + + cmp_low = _mm512_cmp_epi64_mask(low_addr_vec, + regions_low_addrs, _MM_CMPINT_NLT); + cmp_high = _mm512_cmp_epi64_mask(high_addr_vec, + regions_high_addrs, _MM_CMPINT_LT); + cmp_result = cmp_low & cmp_high; + int index = __builtin_ctz(cmp_result); + if (unlikely((uint32_t)index >= dev->mem->nregions)) + return -1; + + desc_addrs[i] = addr_low + + dev->mem->regions[index].host_user_addr - + dev->mem->regions[index].guest_phys_addr; + rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[i], void *, 0)); + } + + /* check length is enough */ + __m512i pkt_lens = _mm512_set_epi32( + 0, pkts[3]->pkt_len, 0, 0, + 0, pkts[2]->pkt_len, 0, 0, + 0, pkts[1]->pkt_len, 0, 0, + 0, pkts[0]->pkt_len, 0, 0); + + __m512i mbuf_len_offset = _mm512_maskz_set1_epi32(DESC_LENS_POS, + dev->vhost_hlen); + __m512i buf_len_vec = _mm512_add_epi32(pkt_lens, mbuf_len_offset); + uint16_t lens_cmp = _mm512_mask_cmp_epu32_mask(DESC_LENS_POS, + desc_vec, buf_len_vec, _MM_CMPINT_LT); + if (lens_cmp) + return -1; + + vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { + rte_memcpy((void *)(uintptr_t)(desc_addrs[i] + buf_offset), + rte_pktmbuf_mtod_offset(pkts[i], void *, 0), + pkts[i]->pkt_len); + } + + if (unlikely((dev->features & (1ULL << VHOST_F_LOG_ALL)))) { + vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { + lens[i] = descs[avail_idx + i].len; + vhost_log_cache_write_iova(dev, vq, + descs[avail_idx + i].addr, lens[i]); + } + } + + vq_inc_last_avail_packed(vq, PACKED_BATCH_SIZE); + vq_inc_last_used_packed(vq, PACKED_BATCH_SIZE); + /* save len and flags, skip addr and id */ + __m512i desc_updated = _mm512_mask_add_epi16(desc_vec, + DESC_LENS_FLAGS_POS, buf_len_vec, + used_flag_vec); + _mm512_storeu_si512(desc_addr, desc_updated); + + return 0; +} diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index e4d2e2e7d6..5c56a8d6ff 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -1354,6 +1354,21 @@ virtio_dev_rx_single_packed(struct virtio_net *dev, return 0; } +static __rte_always_inline int +virtio_dev_rx_handle_batch_packed(struct virtio_net *dev, + struct vhost_virtqueue *vq, + struct rte_mbuf **pkts) + +{ + if (unlikely(dev->vectorized)) +#ifdef CC_AVX512_SUPPORT + return virtio_dev_rx_batch_packed_avx(dev, vq, pkts); +#else + return virtio_dev_rx_batch_packed(dev, vq, pkts); +#endif + return virtio_dev_rx_batch_packed(dev, vq, pkts); +} + static __rte_noinline uint32_t virtio_dev_rx_packed(struct virtio_net *dev, struct vhost_virtqueue *__rte_restrict vq, @@ -1367,8 +1382,8 @@ virtio_dev_rx_packed(struct virtio_net *dev, rte_prefetch0(&vq->desc_packed[vq->last_avail_idx]); if (remained >= PACKED_BATCH_SIZE) { - if (!virtio_dev_rx_batch_packed(dev, vq, - &pkts[pkt_idx])) { + if (!virtio_dev_rx_handle_batch_packed(dev, vq, + &pkts[pkt_idx])) { pkt_idx += PACKED_BATCH_SIZE; remained -= PACKED_BATCH_SIZE; continue;