From patchwork Wed Aug 14 17:49:53 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Christensen <drc@linux.vnet.ibm.com>
X-Patchwork-Id: 57683
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: David Christensen <drc@linux.vnet.ibm.com>
To: maxime.coquelin@redhat.com, tiwei.bie@intel.com
Cc: dev@dpdk.org, David Christensen <drc@linux.vnet.ibm.com>
Date: Wed, 14 Aug 2019 12:49:53 -0500
Message-Id: <1565804993-56187-1-git-send-email-drc@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.8.3.1
Subject: [dpdk-dev] [PATCH] net/virtio: Add support for vectorized functions on Power systems

Added the file virtio_rxtx_simple_altivec.c, which implements AltiVec code
for the virtio vectorized RX functions. Updated the various build files
accordingly.

Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/Makefile                     |   2 +
 drivers/net/virtio/meson.build                  |   2 +
 drivers/net/virtio/virtio_rxtx_simple_altivec.c | 202 ++++++++++++++++++++++++
 3 files changed, 206 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple_altivec.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 6c2c996..5144e7c 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -33,6 +33,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
 
 ifeq ($(CONFIG_RTE_ARCH_X86),y)
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_sse.c
+else ifeq ($(CONFIG_RTE_ARCH_PPC_64),y)
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_altivec.c
 else ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_neon.c
 endif
diff --git a/drivers/net/virtio/meson.build b/drivers/net/virtio/meson.build
index 7949054..04c7fdf 100644
--- a/drivers/net/virtio/meson.build
+++ b/drivers/net/virtio/meson.build
@@ -11,6 +11,8 @@ deps += ['kvargs', 'bus_pci']
 
 if arch_subdir == 'x86'
 	sources += files('virtio_rxtx_simple_sse.c')
+elif arch_subdir == 'ppc_64'
+	sources += files('virtio_rxtx_simple_altivec.c')
 elif arch_subdir == 'arm' and host_machine.cpu_family().startswith('aarch64')
 	sources += files('virtio_rxtx_simple_neon.c')
 endif
diff --git a/drivers/net/virtio/virtio_rxtx_simple_altivec.c b/drivers/net/virtio/virtio_rxtx_simple_altivec.c
new file mode 100644
index 0000000..f4eb4eb
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx_simple_altivec.c
@@ -0,0 +1,202 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2015 Intel Corporation
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <altivec.h>
+
+#include <rte_byteorder.h>
+#include <rte_branch_prediction.h>
+#include <rte_cycles.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_errno.h>
+#include <rte_memory.h>
+#include <rte_mempool.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_prefetch.h>
+#include <rte_string_fns.h>
+
+#include "virtio_rxtx_simple.h"
+
+#define RTE_VIRTIO_DESC_PER_LOOP 8
+
+/* virtio vPMD receive routine, only accepts nb_pkts >= RTE_VIRTIO_DESC_PER_LOOP
+ *
+ * This routine is for non-mergeable RX, one desc for each guest buffer.
+ * This routine is based on the RX ring layout optimization. Each entry in the
+ * avail ring points to the desc with the same index in the desc ring and this
+ * will never be changed in the driver.
+ *
+ * - nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, just return no packet
+ */
+uint16_t
+virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtnet_rx *rxvq = rx_queue;
+	struct virtqueue *vq = rxvq->vq;
+	struct virtio_hw *hw = vq->hw;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_used_elem *rused;
+	struct rte_mbuf **sw_ring;
+	struct rte_mbuf **sw_ring_end;
+	uint16_t nb_pkts_received = 0;
+	const vector unsigned char zero = {0};
+
+	const vector unsigned char shuf_msk1 = {
+		0xFF, 0xFF, 0xFF, 0xFF,	/* packet type */
+		4, 5, 0xFF, 0xFF,	/* pkt len */
+		4, 5,			/* dat len */
+		0xFF, 0xFF,		/* vlan tci */
+		0xFF, 0xFF, 0xFF, 0xFF
+	};
+
+	const vector unsigned char shuf_msk2 = {
+		0xFF, 0xFF, 0xFF, 0xFF,	/* packet type */
+		12, 13, 0xFF, 0xFF,	/* pkt len */
+		12, 13,			/* dat len */
+		0xFF, 0xFF,		/* vlan tci */
+		0xFF, 0xFF, 0xFF, 0xFF
+	};
+
+	/*
+	 * Subtract the header length.
+	 * In which case do we need the header length in used->len ?
+	 */
+	const vector unsigned short len_adjust = {
+		0, 0,
+		(uint16_t)-vq->hw->vtnet_hdr_size, 0,
+		(uint16_t)-vq->hw->vtnet_hdr_size, 0,
+		0, 0
+	};
+
+	if (unlikely(hw->started == 0))
+		return nb_pkts_received;
+
+	if (unlikely(nb_pkts < RTE_VIRTIO_DESC_PER_LOOP))
+		return 0;
+
+	nb_used = VIRTQUEUE_NUSED(vq);
+
+	rte_compiler_barrier();
+
+	if (unlikely(nb_used == 0))
+		return 0;
+
+	nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_VIRTIO_DESC_PER_LOOP);
+	nb_used = RTE_MIN(nb_used, nb_pkts);
+
+	desc_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
+	rused = &vq->vq_split.ring.used->ring[desc_idx];
+	sw_ring = &vq->sw_ring[desc_idx];
+	sw_ring_end = &vq->sw_ring[vq->vq_nentries];
+
+	rte_prefetch0(rused);
+
+	if (vq->vq_free_cnt >= RTE_VIRTIO_VPMD_RX_REARM_THRESH) {
+		virtio_rxq_rearm_vec(rxvq);
+		if (unlikely(virtqueue_kick_prepare(vq)))
+			virtqueue_notify(vq);
+	}
+
+	for (nb_pkts_received = 0;
+		nb_pkts_received < nb_used;) {
+		vector unsigned char desc[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		vector unsigned char mbp[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		vector unsigned char pkt_mb[RTE_VIRTIO_DESC_PER_LOOP];
+
+		mbp[0] = vec_vsx_ld(0, (unsigned char const *)(sw_ring + 0));
+		desc[0] = vec_vsx_ld(0, (unsigned char const *)(rused + 0));
+		*(vector unsigned char *)&rx_pkts[0] = mbp[0];
+
+		mbp[1] = vec_vsx_ld(0, (unsigned char const *)(sw_ring + 2));
+		desc[1] = vec_vsx_ld(0, (unsigned char const *)(rused + 2));
+		*(vector unsigned char *)&rx_pkts[2] = mbp[1];
+
+		mbp[2] = vec_vsx_ld(0, (unsigned char const *)(sw_ring + 4));
+		desc[2] = vec_vsx_ld(0, (unsigned char const *)(rused + 4));
+		*(vector unsigned char *)&rx_pkts[4] = mbp[2];
+
+		mbp[3] = vec_vsx_ld(0, (unsigned char const *)(sw_ring + 6));
+		desc[3] = vec_vsx_ld(0, (unsigned char const *)(rused + 6));
+		*(vector unsigned char *)&rx_pkts[6] = mbp[3];
+
+		pkt_mb[0] = vec_perm(desc[0], zero, shuf_msk1);
+		pkt_mb[1] = vec_perm(desc[0], zero, shuf_msk2);
+		pkt_mb[0] = (vector unsigned char)
+			((vector unsigned short)pkt_mb[0] + len_adjust);
+		pkt_mb[1] = (vector unsigned char)
+			((vector unsigned short)pkt_mb[1] + len_adjust);
+		*(vector unsigned char *)&rx_pkts[0]->rx_descriptor_fields1 =
+			pkt_mb[0];
+		*(vector unsigned char *)&rx_pkts[1]->rx_descriptor_fields1 =
+			pkt_mb[1];
+
+		pkt_mb[2] = vec_perm(desc[1], zero, shuf_msk1);
+		pkt_mb[3] = vec_perm(desc[1], zero, shuf_msk2);
+		pkt_mb[2] = (vector unsigned char)
+			((vector unsigned short)pkt_mb[2] + len_adjust);
+		pkt_mb[3] = (vector unsigned char)
+			((vector unsigned short)pkt_mb[3] + len_adjust);
+		*(vector unsigned char *)&rx_pkts[2]->rx_descriptor_fields1 =
+			pkt_mb[2];
+		*(vector unsigned char *)&rx_pkts[3]->rx_descriptor_fields1 =
+			pkt_mb[3];
+
+		pkt_mb[4] = vec_perm(desc[2], zero, shuf_msk1);
+		pkt_mb[5] = vec_perm(desc[2], zero, shuf_msk2);
+		pkt_mb[4] = (vector unsigned char)
+			((vector unsigned short)pkt_mb[4] + len_adjust);
+		pkt_mb[5] = (vector unsigned char)
+			((vector unsigned short)pkt_mb[5] + len_adjust);
+		*(vector unsigned char *)&rx_pkts[4]->rx_descriptor_fields1 =
+			pkt_mb[4];
+		*(vector unsigned char *)&rx_pkts[5]->rx_descriptor_fields1 =
+			pkt_mb[5];
+
+		pkt_mb[6] = vec_perm(desc[3], zero, shuf_msk1);
+		pkt_mb[7] = vec_perm(desc[3], zero, shuf_msk2);
+		pkt_mb[6] = (vector unsigned char)
+			((vector unsigned short)pkt_mb[6] + len_adjust);
+		pkt_mb[7] = (vector unsigned char)
+			((vector unsigned short)pkt_mb[7] + len_adjust);
+		*(vector unsigned char *)&rx_pkts[6]->rx_descriptor_fields1 =
+			pkt_mb[6];
+		*(vector unsigned char *)&rx_pkts[7]->rx_descriptor_fields1 =
+			pkt_mb[7];
+
+		if (unlikely(nb_used <= RTE_VIRTIO_DESC_PER_LOOP)) {
+			if (sw_ring + nb_used <= sw_ring_end)
+				nb_pkts_received += nb_used;
+			else
+				nb_pkts_received += sw_ring_end - sw_ring;
+			break;
+		} else {
+			if (unlikely(sw_ring + RTE_VIRTIO_DESC_PER_LOOP >=
+				sw_ring_end)) {
+				nb_pkts_received += sw_ring_end - sw_ring;
+				break;
+			} else {
+				nb_pkts_received += RTE_VIRTIO_DESC_PER_LOOP;
+
+				rx_pkts += RTE_VIRTIO_DESC_PER_LOOP;
+				sw_ring += RTE_VIRTIO_DESC_PER_LOOP;
+				rused += RTE_VIRTIO_DESC_PER_LOOP;
+				nb_used -= RTE_VIRTIO_DESC_PER_LOOP;
+			}
+		}
+	}
+
+	vq->vq_used_cons_idx += nb_pkts_received;
+	vq->vq_free_cnt += nb_pkts_received;
+	rxvq->stats.packets += nb_pkts_received;
+	return nb_pkts_received;
+}
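
For readers following the conversion loop: each 16-byte vec_vsx_ld() from the
used ring covers two struct vring_used_elem entries (a 4-byte id followed by a
4-byte len), and shuf_msk1/shuf_msk2 scatter the low 16 bits of each len into
the pkt_len and data_len slots of the mbuf's rx_descriptor_fields1 area, while
len_adjust subtracts the virtio-net header size. The scalar sketch below, which
is not part of the patch, illustrates that byte mapping; the used_elem and
rx_fields structs and the fill_rx_fields() helper are simplified stand-ins
rather than DPDK definitions, and the 12-byte header size is only an example.

#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for struct vring_used_elem: 4-byte id, 4-byte len. */
struct used_elem {
	uint32_t id;
	uint32_t len;	/* bytes written by the device, incl. virtio-net header */
};

/* Simplified stand-in for the 16 bytes at rte_mbuf's rx_descriptor_fields1. */
struct rx_fields {
	uint32_t packet_type;	/* bytes 0-3:  0xFF lanes select from the zero vector */
	uint32_t pkt_len;	/* bytes 4-7:  low 16 bits come from used->len */
	uint16_t data_len;	/* bytes 8-9:  same source as pkt_len */
	uint16_t vlan_tci;	/* bytes 10-11: zeroed */
	uint32_t rss_hash;	/* bytes 12-15: zeroed */
};

/* Scalar equivalent of one vec_perm(desc, zero, shuf_mskN) plus len_adjust. */
static void
fill_rx_fields(struct rx_fields *f, const struct used_elem *e, uint16_t hdr_size)
{
	uint16_t len = (uint16_t)e->len - hdr_size;	/* len_adjust subtracts the header */

	f->packet_type = 0;
	f->pkt_len = len;
	f->data_len = len;
	f->vlan_tci = 0;
	f->rss_hash = 0;
}

int
main(void)
{
	/* Two used-ring entries, as covered by a single 16-byte vector load. */
	struct used_elem pair[2] = { { 0, 74 }, { 1, 1514 } };
	struct rx_fields mb0, mb1;
	uint16_t vtnet_hdr_size = 12;	/* example only; depends on negotiated features */

	fill_rx_fields(&mb0, &pair[0], vtnet_hdr_size);	/* role of shuf_msk1 (bytes 4-5) */
	fill_rx_fields(&mb1, &pair[1], vtnet_hdr_size);	/* role of shuf_msk2 (bytes 12-13) */

	printf("pkt0: pkt_len=%u data_len=%u\n",
		(unsigned)mb0.pkt_len, (unsigned)mb0.data_len);
	printf("pkt1: pkt_len=%u data_len=%u\n",
		(unsigned)mb1.pkt_len, (unsigned)mb1.data_len);
	return 0;
}

In the vector code all of these fields are produced by a single permute per
mbuf: mask bytes of 0xFF index into the second vec_perm() operand (the zero
vector), which is how packet_type, vlan_tci, and the hash are cleared in the
same instruction that extracts the lengths.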