From patchwork Wed Sep 7 07:02:26 2016
X-Patchwork-Submitter: Nélio Laranjeiro
X-Patchwork-Id: 15640
X-Patchwork-Delegate: bruce.richardson@intel.com
From: Nelio Laranjeiro
To: dev@dpdk.org
Cc: Vasily Philipov
Date: Wed, 7 Sep 2016 09:02:26 +0200
Message-Id: <994cdb427701204461ea4f146f8dd240441648e7.1473230641.git.nelio.laranjeiro@6wind.com>
Subject: [dpdk-dev] [PATCH 8/8] net/mlx5: fix inline logic

To improve performance, the NIC expects large packets to be referenced through a pointer to a cache-aligned address. The old inline code could break this assumption, which hurts performance.
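
For readers of the archive, here is a minimal, self-contained sketch of the cache-line-bounded inlining scheme this patch adds to mlx5_wqe_write(): at most max_inline bytes are copied into the WQE, but the copy stops on a cache-line boundary of the source buffer, so the remainder handed to the data segment starts at a cache-aligned address. The helper name, buffer sizes and the fixed 64-byte cache line below are illustrative stand-ins, not driver code.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_LINE_SIZE 64u /* stand-in for RTE_CACHE_LINE_SIZE */

/*
 * Copy at most max_inline bytes of the packet into the inline area of a
 * WQE, stopping on a cache-line boundary of the source buffer so that the
 * leftover data referenced by the data segment starts cache aligned.
 * Returns the number of bytes actually inlined.
 */
static unsigned int
inline_to_boundary(uint8_t *wqe_inl, uintptr_t addr, uint32_t length,
		   uint32_t max_inline)
{
	/* Last cache-aligned source address within the inline budget. */
	uintptr_t addr_end = (addr + max_inline) &
			     ~(uintptr_t)(CACHE_LINE_SIZE - 1);
	uint32_t copy_b = ((addr_end - addr) > length) ?
			  length : (uint32_t)(addr_end - addr);

	memcpy(wqe_inl, (const void *)addr, copy_b);
	assert(addr + copy_b <= addr_end);
	return copy_b;
}

int
main(void)
{
	uint8_t pkt[1500] = { 0 };
	uint8_t wqe_inl[256];
	/* txq_inline is configured in bytes and rounded up to a whole
	 * number of cache lines, as txq_ctrl_setup() does in this patch. */
	uint32_t txq_inline = 200;
	uint32_t max_inline = ((txq_inline + CACHE_LINE_SIZE - 1) /
			       CACHE_LINE_SIZE) * CACHE_LINE_SIZE;
	unsigned int n = inline_to_boundary(wqe_inl, (uintptr_t)pkt,
					    sizeof(pkt), max_inline);

	/* The non-inlined remainder starts on a cache-line boundary. */
	printf("inlined %u bytes, remainder at %p\n",
	       n, (void *)((uintptr_t)pkt + n));
	return 0;
}

Inlining only up to a cache-line boundary, rather than exactly max_inline bytes, is what preserves the alignment property the commit message describes for the non-inlined part of large packets.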
Fixes: 2a66cf378954 ("net/mlx5: support inline send") Signed-off-by: Nelio Laranjeiro Signed-off-by: Vasily Philipov --- drivers/net/mlx5/mlx5_ethdev.c | 4 - drivers/net/mlx5/mlx5_rxtx.c | 424 ++++++++++------------------------------- drivers/net/mlx5/mlx5_rxtx.h | 3 +- drivers/net/mlx5/mlx5_txq.c | 9 +- 4 files changed, 104 insertions(+), 336 deletions(-) diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 47f323e..1ae80e5 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -1398,10 +1398,6 @@ priv_select_tx_function(struct priv *priv) } else if ((priv->sriov == 0) && priv->mps) { priv->dev->tx_pkt_burst = mlx5_tx_burst_mpw; DEBUG("selected MPW TX function"); - } else if (priv->txq_inline && (priv->txqs_n >= priv->txqs_inline)) { - priv->dev->tx_pkt_burst = mlx5_tx_burst_inline; - DEBUG("selected inline TX function (%u >= %u queues)", - priv->txqs_n, priv->txqs_inline); } } diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index c7e538f..67e0f37 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -297,179 +297,99 @@ txq_mp2mr(struct txq *txq, struct rte_mempool *mp) * Buffer. * @param length * Packet length. - * @param lkey - * Memory region lkey. + * + * @return ds + * Number of DS elements consumed. */ -static inline void +static inline unsigned int mlx5_wqe_write(struct txq *txq, volatile union mlx5_wqe *wqe, - struct rte_mbuf *buf, uint32_t length, uint32_t lkey) + struct rte_mbuf *buf, uint32_t length) { + uintptr_t raw = (uintptr_t)&wqe->wqe.eseg.inline_hdr_start; + uint16_t ds; + uint16_t pkt_inline_sz = 16; uintptr_t addr = rte_pktmbuf_mtod(buf, uintptr_t); + struct mlx5_wqe_data_seg *dseg = NULL; - rte_mov16((uint8_t *)&wqe->wqe.eseg.inline_hdr_start, - (uint8_t *)addr); - addr += 16; + assert(length >= 16); + /* Start the known and common part of the WQE structure. */ + wqe->wqe.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND); + wqe->wqe.ctrl.data[2] = 0; + wqe->wqe.ctrl.data[3] = 0; + wqe->wqe.eseg.rsvd0 = 0; + wqe->wqe.eseg.rsvd1 = 0; + wqe->wqe.eseg.mss = 0; + wqe->wqe.eseg.rsvd2 = 0; + /* Start by copying the Ethernet Header. */ + rte_mov16((uint8_t *)raw, (uint8_t *)addr); length -= 16; - /* Need to insert VLAN ? */ + addr += 16; + /* Replace the Ethernet type by the VLAN if necessary. */ if (buf->ol_flags & PKT_TX_VLAN_PKT) { uint32_t vlan = htonl(0x81000000 | buf->vlan_tci); - memcpy((uint8_t *)&wqe->wqe.eseg.inline_hdr_start + 12, + memcpy((uint8_t *)(raw + 16 - sizeof(vlan)), &vlan, sizeof(vlan)); addr -= sizeof(vlan); length += sizeof(vlan); } - /* Write the WQE. */ - wqe->wqe.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND); - wqe->wqe.ctrl.data[1] = htonl((txq->qp_num_8s) | 4); - wqe->wqe.ctrl.data[2] = 0; - wqe->wqe.ctrl.data[3] = 0; - wqe->inl.eseg.rsvd0 = 0; - wqe->inl.eseg.rsvd1 = 0; - wqe->inl.eseg.mss = 0; - wqe->inl.eseg.rsvd2 = 0; - wqe->wqe.eseg.inline_hdr_sz = htons(16); - /* Store remaining data in data segment. */ - wqe->wqe.dseg.byte_count = htonl(length); - wqe->wqe.dseg.lkey = lkey; - wqe->wqe.dseg.addr = htonll(addr); - /* Increment consumer index. */ - ++txq->wqe_ci; -} - -/** - * Write a inline WQE. - * - * @param txq - * Pointer to TX queue structure. - * @param wqe - * Pointer to the WQE to fill. - * @param addr - * Buffer data address. - * @param length - * Packet length. - * @param lkey - * Memory region lkey.
- */ -static inline void -mlx5_wqe_write_inline(struct txq *txq, volatile union mlx5_wqe *wqe, - uintptr_t addr, uint32_t length) -{ - uint32_t size; - uint16_t wqe_cnt = txq->wqe_n - 1; - uint16_t wqe_ci = txq->wqe_ci + 1; - - /* Copy the first 16 bytes into inline header. */ - rte_memcpy((void *)(uintptr_t)wqe->inl.eseg.inline_hdr_start, - (void *)(uintptr_t)addr, - MLX5_ETH_INLINE_HEADER_SIZE); - addr += MLX5_ETH_INLINE_HEADER_SIZE; - length -= MLX5_ETH_INLINE_HEADER_SIZE; - size = 3 + ((4 + length + 15) / 16); - wqe->inl.byte_cnt = htonl(length | MLX5_INLINE_SEG); - rte_memcpy((void *)(uintptr_t)&wqe->inl.data[0], - (void *)addr, MLX5_WQE64_INL_DATA); - addr += MLX5_WQE64_INL_DATA; - length -= MLX5_WQE64_INL_DATA; - while (length) { - volatile union mlx5_wqe *wqe_next = - &(*txq->wqes)[wqe_ci & wqe_cnt]; - uint32_t copy_bytes = (length > sizeof(*wqe)) ? - sizeof(*wqe) : - length; - - rte_mov64((uint8_t *)(uintptr_t)&wqe_next->data[0], - (uint8_t *)addr); - addr += copy_bytes; - length -= copy_bytes; - ++wqe_ci; - } - assert(size < 64); - wqe->inl.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND); - wqe->inl.ctrl.data[1] = htonl(txq->qp_num_8s | size); - wqe->inl.ctrl.data[2] = 0; - wqe->inl.ctrl.data[3] = 0; - wqe->inl.eseg.rsvd0 = 0; - wqe->inl.eseg.rsvd1 = 0; - wqe->inl.eseg.mss = 0; - wqe->inl.eseg.rsvd2 = 0; - wqe->inl.eseg.inline_hdr_sz = htons(MLX5_ETH_INLINE_HEADER_SIZE); - /* Increment consumer index. */ - txq->wqe_ci = wqe_ci; -} - -/** - * Write a inline WQE with VLAN. - * - * @param txq - * Pointer to TX queue structure. - * @param wqe - * Pointer to the WQE to fill. - * @param addr - * Buffer data address. - * @param length - * Packet length. - * @param lkey - * Memory region lkey. - * @param vlan_tci - * VLAN field to insert in packet. - */ -static inline void -mlx5_wqe_write_inline_vlan(struct txq *txq, volatile union mlx5_wqe *wqe, - uintptr_t addr, uint32_t length, uint16_t vlan_tci) -{ - uint32_t size; - uint32_t wqe_cnt = txq->wqe_n - 1; - uint16_t wqe_ci = txq->wqe_ci + 1; - uint32_t vlan = htonl(0x81000000 | vlan_tci); - - /* - * Copy 12 bytes of source & destination MAC address. - * Copy 4 bytes of VLAN. - * Copy 2 bytes of Ether type. - */ - rte_memcpy((uint8_t *)(uintptr_t)wqe->inl.eseg.inline_hdr_start, - (uint8_t *)addr, 12); - rte_memcpy((uint8_t *)(uintptr_t)wqe->inl.eseg.inline_hdr_start + 12, - &vlan, sizeof(vlan)); - rte_memcpy((uint8_t *)((uintptr_t)wqe->inl.eseg.inline_hdr_start + 16), - (uint8_t *)(addr + 12), 2); - addr += MLX5_ETH_VLAN_INLINE_HEADER_SIZE - sizeof(vlan); - length -= MLX5_ETH_VLAN_INLINE_HEADER_SIZE - sizeof(vlan); - size = (sizeof(wqe->inl.ctrl.ctrl) + - sizeof(wqe->inl.eseg) + - sizeof(wqe->inl.byte_cnt) + - length + 15) / 16; - wqe->inl.byte_cnt = htonl(length | MLX5_INLINE_SEG); - rte_memcpy((void *)(uintptr_t)&wqe->inl.data[0], - (void *)addr, MLX5_WQE64_INL_DATA); - addr += MLX5_WQE64_INL_DATA; - length -= MLX5_WQE64_INL_DATA; - while (length) { - volatile union mlx5_wqe *wqe_next = - &(*txq->wqes)[wqe_ci & wqe_cnt]; - uint32_t copy_bytes = (length > sizeof(*wqe)) ? - sizeof(*wqe) : - length; - - rte_mov64((uint8_t *)(uintptr_t)&wqe_next->data[0], - (uint8_t *)addr); - addr += copy_bytes; - length -= copy_bytes; - ++wqe_ci; + /* Inline if enough room. 
*/ + if (txq->max_inline != 0) { + uintptr_t end = (uintptr_t)&(*txq->wqes)[txq->wqe_n]; + uint16_t max_inline = txq->max_inline * RTE_CACHE_LINE_SIZE; + uint16_t room; + + raw += 16; + room = end - (uintptr_t)raw; + if (room > max_inline) { + uintptr_t addr_end = (addr + max_inline) & + ~(RTE_CACHE_LINE_SIZE - 1); + uint16_t copy_b = ((addr_end - addr) > length) ? + length : + (addr_end - addr); + + rte_memcpy((void *)raw, (void *)addr, copy_b); + addr += copy_b; + length -= copy_b; + pkt_inline_sz += copy_b; + /* Sanity check. */ + assert(addr <= addr_end); + } + /* Store the inlined packet size in the WQE. */ + wqe->wqe.eseg.inline_hdr_sz = htons(pkt_inline_sz); + /* + * 2 DWORDs consumed by the WQE header + 1 DSEG + + * the size of the inline part of the packet. + */ + ds = 2 + ((pkt_inline_sz - 2 + 15) / 16); + if (length > 0) { + dseg = (struct mlx5_wqe_data_seg *) + ((uintptr_t)wqe + (ds * 16)); + if ((uintptr_t)dseg >= end) + dseg = (struct mlx5_wqe_data_seg *) + ((uintptr_t)&(*txq->wqes)[0]); + goto use_dseg; + } + } else { + /* Add the remaining packet as a simple ds. */ + ds = 3; + /* + * No inline has been done in the packet, only the Ethernet + * Header has been stored. + */ + wqe->wqe.eseg.inline_hdr_sz = htons(16); + dseg = (struct mlx5_wqe_data_seg *) + ((uintptr_t)wqe + (ds * 16)); +use_dseg: + *dseg = (struct mlx5_wqe_data_seg) { + .addr = htonll(addr), + .byte_count = htonl(length), + .lkey = txq_mp2mr(txq, txq_mb2mp(buf)), + }; + ++ds; } - assert(size < 64); - wqe->inl.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND); - wqe->inl.ctrl.data[1] = htonl(txq->qp_num_8s | size); - wqe->inl.ctrl.data[2] = 0; - wqe->inl.ctrl.data[3] = 0; - wqe->inl.eseg.rsvd0 = 0; - wqe->inl.eseg.rsvd1 = 0; - wqe->inl.eseg.mss = 0; - wqe->inl.eseg.rsvd2 = 0; - wqe->inl.eseg.inline_hdr_sz = htons(MLX5_ETH_VLAN_INLINE_HEADER_SIZE); - /* Increment consumer index. */ - txq->wqe_ci = wqe_ci; + wqe->wqe.ctrl.data[1] = htonl(txq->qp_num_8s | ds); + return ds; } /** @@ -570,7 +490,6 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) struct rte_mbuf *buf = *(pkts++); unsigned int elts_head_next; uint32_t length; - uint32_t lkey; unsigned int segs_n = buf->nb_segs; volatile struct mlx5_wqe_data_seg *dseg; unsigned int ds = sizeof(*wqe) / 16; @@ -586,8 +505,8 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) --pkts_n; elts_head_next = (elts_head + 1) & (elts_n - 1); wqe = &(*txq->wqes)[txq->wqe_ci & (txq->wqe_n - 1)]; - dseg = &wqe->wqe.dseg; - rte_prefetch0(wqe); + tx_prefetch_wqe(txq, txq->wqe_ci); + tx_prefetch_wqe(txq, txq->wqe_ci + 1); if (pkts_n) rte_prefetch0(*pkts); length = DATA_LEN(buf); @@ -597,9 +516,6 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) if (pkts_n) rte_prefetch0(rte_pktmbuf_mtod(*pkts, volatile void *)); - /* Retrieve Memory Region key for this memory pool.
*/ - lkey = txq_mp2mr(txq, txq_mb2mp(buf)); - mlx5_wqe_write(txq, wqe, buf, length, lkey); /* Should we enable HW CKSUM offload */ if (buf->ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) { @@ -607,8 +523,13 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM; } else { - wqe->wqe.eseg.cs_flags = 0; + wqe->eseg.cs_flags = 0; } + ds = mlx5_wqe_write(txq, wqe, buf, length); + if (segs_n == 1) + goto skip_segs; + dseg = (volatile struct mlx5_wqe_data_seg *) + (((uintptr_t)wqe) + ds * 16); while (--segs_n) { /* * Spill on next WQE when the current one does not have @@ -639,11 +560,13 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) /* Update DS field in WQE. */ wqe->wqe.ctrl.data[1] &= htonl(0xffffffc0); wqe->wqe.ctrl.data[1] |= htonl(ds & 0x3f); - elts_head = elts_head_next; +skip_segs: #ifdef MLX5_PMD_SOFT_COUNTERS /* Increment sent bytes counter. */ txq->stats.obytes += length; #endif + /* Increment consumer index. */ + txq->wqe_ci += (ds + 3) / 4; elts_head = elts_head_next; ++i; } while (pkts_n); @@ -672,162 +595,6 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) } /** - * DPDK callback for TX with inline support. - * - * @param dpdk_txq - * Generic pointer to TX queue structure. - * @param[in] pkts - * Packets to transmit. - * @param pkts_n - * Number of packets in array. - * - * @return - * Number of packets successfully transmitted (<= pkts_n). - */ -uint16_t -mlx5_tx_burst_inline(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) -{ - struct txq *txq = (struct txq *)dpdk_txq; - uint16_t elts_head = txq->elts_head; - const unsigned int elts_n = txq->elts_n; - unsigned int i = 0; - unsigned int j = 0; - unsigned int max; - unsigned int comp; - volatile union mlx5_wqe *wqe = NULL; - unsigned int max_inline = txq->max_inline; - - if (unlikely(!pkts_n)) - return 0; - /* Prefetch first packet cacheline. */ - tx_prefetch_cqe(txq, txq->cq_ci); - tx_prefetch_cqe(txq, txq->cq_ci + 1); - rte_prefetch0(*pkts); - /* Start processing. */ - txq_complete(txq); - max = (elts_n - (elts_head - txq->elts_tail)); - if (max > elts_n) - max -= elts_n; - do { - struct rte_mbuf *buf = *(pkts++); - unsigned int elts_head_next; - uintptr_t addr; - uint32_t length; - uint32_t lkey; - unsigned int segs_n = buf->nb_segs; - volatile struct mlx5_wqe_data_seg *dseg; - unsigned int ds = sizeof(*wqe) / 16; - - /* - * Make sure there is enough room to store this packet and - * that one ring entry remains unused. - */ - assert(segs_n); - if (max < segs_n + 1) - break; - max -= segs_n; - --pkts_n; - elts_head_next = (elts_head + 1) & (elts_n - 1); - wqe = &(*txq->wqes)[txq->wqe_ci & (txq->wqe_n - 1)]; - dseg = &wqe->wqe.dseg; - tx_prefetch_wqe(txq, txq->wqe_ci); - tx_prefetch_wqe(txq, txq->wqe_ci + 1); - if (pkts_n) - rte_prefetch0(*pkts); - /* Should we enable HW CKSUM offload */ - if (buf->ol_flags & - (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) { - wqe->inl.eseg.cs_flags = - MLX5_ETH_WQE_L3_CSUM | - MLX5_ETH_WQE_L4_CSUM; - } else { - wqe->inl.eseg.cs_flags = 0; - } - /* Retrieve buffer information. */ - addr = rte_pktmbuf_mtod(buf, uintptr_t); - length = DATA_LEN(buf); - /* Update element. */ - (*txq->elts)[elts_head] = buf; - /* Prefetch next buffer data. 
*/ - if (pkts_n) - rte_prefetch0(rte_pktmbuf_mtod(*pkts, - volatile void *)); - if ((length <= max_inline) && (segs_n == 1)) { - if (buf->ol_flags & PKT_TX_VLAN_PKT) - mlx5_wqe_write_inline_vlan(txq, wqe, - addr, length, - buf->vlan_tci); - else - mlx5_wqe_write_inline(txq, wqe, addr, length); - goto skip_segs; - } else { - /* Retrieve Memory Region key for this memory pool. */ - lkey = txq_mp2mr(txq, txq_mb2mp(buf)); - mlx5_wqe_write(txq, wqe, buf, length, lkey); - } - while (--segs_n) { - /* - * Spill on next WQE when the current one does not have - * enough room left. Size of WQE must a be a multiple - * of data segment size. - */ - assert(!(sizeof(*wqe) % sizeof(*dseg))); - if (!(ds % (sizeof(*wqe) / 16))) - dseg = (volatile void *) - &(*txq->wqes)[txq->wqe_ci++ & - (txq->wqe_n - 1)]; - else - ++dseg; - ++ds; - buf = buf->next; - assert(buf); - /* Store segment information. */ - dseg->byte_count = htonl(DATA_LEN(buf)); - dseg->lkey = txq_mp2mr(txq, txq_mb2mp(buf)); - dseg->addr = htonll(rte_pktmbuf_mtod(buf, uintptr_t)); - (*txq->elts)[elts_head_next] = buf; - elts_head_next = (elts_head_next + 1) & (elts_n - 1); -#ifdef MLX5_PMD_SOFT_COUNTERS - length += DATA_LEN(buf); -#endif - ++j; - } - /* Update DS field in WQE. */ - wqe->inl.ctrl.data[1] &= htonl(0xffffffc0); - wqe->inl.ctrl.data[1] |= htonl(ds & 0x3f); -skip_segs: - elts_head = elts_head_next; -#ifdef MLX5_PMD_SOFT_COUNTERS - /* Increment sent bytes counter. */ - txq->stats.obytes += length; -#endif - ++i; - } while (pkts_n); - /* Take a shortcut if nothing must be sent. */ - if (unlikely(i == 0)) - return 0; - /* Check whether completion threshold has been reached. */ - comp = txq->elts_comp + i + j; - if (comp >= MLX5_TX_COMP_THRESH) { - /* Request completion on last WQE. */ - wqe->inl.ctrl.data[2] = htonl(8); - /* Save elts_head in unused "immediate" field of WQE. */ - wqe->inl.ctrl.data[3] = elts_head; - txq->elts_comp = 0; - } else { - txq->elts_comp = comp; - } -#ifdef MLX5_PMD_SOFT_COUNTERS - /* Increment sent packets counter. */ - txq->stats.opackets += i; -#endif - /* Ring QP doorbell. */ - mlx5_tx_dbrec(txq); - txq->elts_head = elts_head; - return i; -} - -/** * Open a MPW session. * * @param txq @@ -1117,7 +884,7 @@ mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts, unsigned int j = 0; unsigned int max; unsigned int comp; - unsigned int inline_room = txq->max_inline; + unsigned int inline_room = txq->max_inline * RTE_CACHE_LINE_SIZE; struct mlx5_mpw mpw = { .state = MLX5_MPW_STATE_CLOSED, }; @@ -1171,7 +938,8 @@ mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts, (length > inline_room) || (mpw.wqe->mpw_inl.eseg.cs_flags != cs_flags)) { mlx5_mpw_inline_close(txq, &mpw); - inline_room = txq->max_inline; + inline_room = + txq->max_inline * RTE_CACHE_LINE_SIZE; } } if (mpw.state == MLX5_MPW_STATE_CLOSED) { @@ -1187,7 +955,8 @@ mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts, /* Multi-segment packets must be alone in their MPW. 
*/ assert((segs_n == 1) || (mpw.pkts_n == 0)); if (mpw.state == MLX5_MPW_STATE_OPENED) { - assert(inline_room == txq->max_inline); + assert(inline_room == + txq->max_inline * RTE_CACHE_LINE_SIZE); #if defined(MLX5_PMD_SOFT_COUNTERS) || !defined(NDEBUG) length = 0; #endif @@ -1252,7 +1021,8 @@ mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts, ++j; if (mpw.pkts_n == MLX5_MPW_DSEG_MAX) { mlx5_mpw_inline_close(txq, &mpw); - inline_room = txq->max_inline; + inline_room = + txq->max_inline * RTE_CACHE_LINE_SIZE; } else { inline_room -= length; } diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h index c8a93c0..8c568ad 100644 --- a/drivers/net/mlx5/mlx5_rxtx.h +++ b/drivers/net/mlx5/mlx5_rxtx.h @@ -249,7 +249,7 @@ struct txq { uint16_t wqe_n; /* Number of WQ elements. */ uint16_t bf_offset; /* Blueflame offset. */ uint16_t bf_buf_size; /* Blueflame size. */ - uint16_t max_inline; /* Maximum size to inline in a WQE. */ + uint16_t max_inline; /* Multiple of RTE_CACHE_LINE_SIZE to inline. */ uint32_t qp_num_8s; /* QP number shifted by 8. */ volatile struct mlx5_cqe (*cqes)[]; /* Completion queue. */ volatile union mlx5_wqe (*wqes)[]; /* Work queue. */ @@ -314,7 +314,6 @@ uint16_t mlx5_tx_burst_secondary_setup(void *, struct rte_mbuf **, uint16_t); /* mlx5_rxtx.c */ uint16_t mlx5_tx_burst(void *, struct rte_mbuf **, uint16_t); -uint16_t mlx5_tx_burst_inline(void *, struct rte_mbuf **, uint16_t); uint16_t mlx5_tx_burst_mpw(void *, struct rte_mbuf **, uint16_t); uint16_t mlx5_tx_burst_mpw_inline(void *, struct rte_mbuf **, uint16_t); uint16_t mlx5_rx_burst(void *, struct rte_mbuf **, uint16_t); diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index 6fe61c4..5ddd2fb 100644 --- a/drivers/net/mlx5/mlx5_txq.c +++ b/drivers/net/mlx5/mlx5_txq.c @@ -338,9 +338,12 @@ txq_ctrl_setup(struct rte_eth_dev *dev, struct txq_ctrl *txq_ctrl, .comp_mask = (IBV_EXP_QP_INIT_ATTR_PD | IBV_EXP_QP_INIT_ATTR_RES_DOMAIN), }; - if (priv->txq_inline && priv->txqs_n >= priv->txqs_inline) { - tmpl.txq.max_inline = priv->txq_inline; - attr.init.cap.max_inline_data = tmpl.txq.max_inline; + if (priv->txq_inline && (priv->txqs_n >= priv->txqs_inline)) { + tmpl.txq.max_inline = + ((priv->txq_inline + (RTE_CACHE_LINE_SIZE - 1)) / + RTE_CACHE_LINE_SIZE); + attr.init.cap.max_inline_data = + tmpl.txq.max_inline * RTE_CACHE_LINE_SIZE; } tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init); if (tmpl.qp == NULL) {