From patchwork Thu Nov 24 16:03:31 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?N=C3=A9lio_Laranjeiro?= X-Patchwork-Id: 17249 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [IPv6:::1]) by dpdk.org (Postfix) with ESMTP id 327A9567A; Thu, 24 Nov 2016 17:04:15 +0100 (CET) Received: from mail-wm0-f53.google.com (mail-wm0-f53.google.com [74.125.82.53]) by dpdk.org (Postfix) with ESMTP id 8C43E2BEF for ; Thu, 24 Nov 2016 17:04:02 +0100 (CET) Received: by mail-wm0-f53.google.com with SMTP id t79so66078002wmt.0 for ; Thu, 24 Nov 2016 08:04:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :in-reply-to:references; bh=ia4umRQTxh4DCTLBU89AdGqvbzCpTk6vAglhyIY2T+o=; b=rcS8gA/kscwgFRkIoy36VNfrcos+2eXdegZ2fadYDGa5yFvfokCb+n1Sf/iYFATmYP DhhfVavFZhLjnWBha6hZECS0F0cCB+IMUCLx/XZQs26gSutPdk03YCsg8lALwHUHoPQw yN/BIu6WoSL5anywkpK6d5jJTNMSYlwENItvxIRprplLC6MrqCYrci1c6X0UO0zrj+RJ arxY3zdNUxGyTIvYSusgZ3Jjyd61BcTJzjgte+FzaDORHZh5mT6vEURXQP9x3FdlXexm 5svX3iH95/MPJoznrawdpzKASNjz0avLw9BaJwk0UU6/GIFMyzgGZKvuMW6wNtaGNFGy EAFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:in-reply-to:references; bh=ia4umRQTxh4DCTLBU89AdGqvbzCpTk6vAglhyIY2T+o=; b=VbRGVTZwEW4wLeVEtoLdj5trjjw6px8QJFRO6gC3uTjtH5wUY5QjE600foA8ETzomV ZmJsxTGH1USrCK2KPxHmBYJAy18KsH2H5U0rs2Ybt8z4gl58mhB6PrKY3JC7vUu4hJ8J HmA5IrRTYZqKrvmTC1XujH/ezL2sfC2OO3i1U28dxUJIb0qthO8lRpuCXvS8BMLRhpK/ TYV9hCvLIeOOaKQuwBXrVk//JYrUs/rb/wLDWxkRKkLP9XyMiS0J+vjzEgs94WVUANHj I8+YIcGfpBFD5IFQ7hzs9IQNDTALhu9q+WQYkepVWz2AFTU9S84O7H8pKof7OlpmeU5O evCw== X-Gm-Message-State: AKaTC02A3ML02fZYPiK4L5DrQ272+p5I8QPmqgVcDhQAiuqVMVwUZggSU0Gj/yPXU9dWXdfP X-Received: by 10.28.199.71 with SMTP id x68mr2923872wmf.34.1480003441834; Thu, 24 Nov 2016 08:04:01 -0800 (PST) Received: from ping.vm.6wind.com (guy78-3-82-239-227-177.fbx.proxad.net. [82.239.227.177]) by smtp.gmail.com with ESMTPSA id vr9sm42495142wjc.35.2016.11.24.08.04.00 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 24 Nov 2016 08:04:01 -0800 (PST) From: Nelio Laranjeiro To: dev@dpdk.org Cc: Thomas Monjalon , Adrien Mazarguil Date: Thu, 24 Nov 2016 17:03:31 +0100 Message-Id: X-Mailer: git-send-email 2.1.4 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH 2/7] net/mlx5: use work queue buffer as a raw buffer X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Define a single work queue element type that encompasses them all. It includes control, Ethernet segment and raw data all grouped in a single place. Signed-off-by: Nelio Laranjeiro Acked-by: Adrien Mazarguil --- drivers/net/mlx5/mlx5_prm.h | 13 ++++-- drivers/net/mlx5/mlx5_rxtx.c | 103 ++++++++++++++++++++++--------------------- drivers/net/mlx5/mlx5_rxtx.h | 2 +- drivers/net/mlx5/mlx5_txq.c | 8 ++-- 4 files changed, 68 insertions(+), 58 deletions(-) diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h index 7f31a2f..3dd4cbe 100644 --- a/drivers/net/mlx5/mlx5_prm.h +++ b/drivers/net/mlx5/mlx5_prm.h @@ -114,12 +114,19 @@ struct mlx5_wqe_eth_seg_small { uint32_t rsvd2; uint16_t inline_hdr_sz; uint8_t inline_hdr[2]; -}; +} __rte_aligned(MLX5_WQE_DWORD_SIZE); struct mlx5_wqe_inl_small { uint32_t byte_cnt; uint8_t raw; -}; +} __rte_aligned(MLX5_WQE_DWORD_SIZE); + +struct mlx5_wqe_ctrl { + uint32_t ctrl0; + uint32_t ctrl1; + uint32_t ctrl2; + uint32_t ctrl3; +} __rte_aligned(MLX5_WQE_DWORD_SIZE); /* Small common part of the WQE. */ struct mlx5_wqe { @@ -131,7 +138,7 @@ struct mlx5_wqe { struct mlx5_wqe64 { struct mlx5_wqe hdr; uint8_t raw[32]; -} __rte_aligned(64); +} __rte_aligned(MLX5_WQE_SIZE); /* MPW session status. */ enum mlx5_mpw_state { diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index 5dacd93..ada8e74 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -154,6 +154,24 @@ check_cqe(volatile struct mlx5_cqe *cqe, return 0; } +/** + * Return the address of the WQE. + * + * @param txq + * Pointer to TX queue structure. + * @param wqe_ci + * WQE consumer index. + * + * @return + * WQE address. + */ +static inline uintptr_t * +tx_mlx5_wqe(struct txq *txq, uint16_t ci) +{ + ci &= ((1 << txq->wqe_n) - 1); + return (uintptr_t *)((uintptr_t)txq->wqes + ci * MLX5_WQE_SIZE); +} + static inline void txq_complete(struct txq *txq) __attribute__((always_inline)); @@ -175,7 +193,7 @@ txq_complete(struct txq *txq) uint16_t elts_tail; uint16_t cq_ci = txq->cq_ci; volatile struct mlx5_cqe *cqe = NULL; - volatile struct mlx5_wqe *wqe; + volatile struct mlx5_wqe_ctrl *ctrl; do { volatile struct mlx5_cqe *tmp; @@ -201,9 +219,9 @@ txq_complete(struct txq *txq) } while (1); if (unlikely(cqe == NULL)) return; - wqe = &(*txq->wqes)[ntohs(cqe->wqe_counter) & - ((1 << txq->wqe_n) - 1)].hdr; - elts_tail = wqe->ctrl[3]; + ctrl = (volatile struct mlx5_wqe_ctrl *) + tx_mlx5_wqe(txq, ntohs(cqe->wqe_counter)); + elts_tail = ctrl->ctrl3; assert(elts_tail < (1 << txq->wqe_n)); /* Free buffers. */ while (elts_free != elts_tail) { @@ -331,23 +349,6 @@ tx_prefetch_cqe(struct txq *txq, uint16_t ci) } /** - * Prefetch a WQE. - * - * @param txq - * Pointer to TX queue structure. - * @param wqe_ci - * WQE consumer index. - */ -static inline void -tx_prefetch_wqe(struct txq *txq, uint16_t ci) -{ - volatile struct mlx5_wqe64 *wqe; - - wqe = &(*txq->wqes)[ci & ((1 << txq->wqe_n) - 1)]; - rte_prefetch0(wqe); -} - -/** * DPDK callback for TX. * * @param dpdk_txq @@ -411,9 +412,9 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) --segs_n; if (!segs_n) --pkts_n; - wqe = &(*txq->wqes)[txq->wqe_ci & - ((1 << txq->wqe_n) - 1)].hdr; - tx_prefetch_wqe(txq, txq->wqe_ci + 1); + wqe = (volatile struct mlx5_wqe *) + tx_mlx5_wqe(txq, txq->wqe_ci); + rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci + 1)); if (pkts_n > 1) rte_prefetch0(*pkts); addr = rte_pktmbuf_mtod(buf, uintptr_t); @@ -464,8 +465,9 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) } /* Inline if enough room. */ if (txq->max_inline != 0) { - uintptr_t end = - (uintptr_t)&(*txq->wqes)[1 << txq->wqe_n]; + uintptr_t end = (uintptr_t) + (((uintptr_t)txq->wqes) + + (1 << txq->wqe_n) * MLX5_WQE_SIZE); uint16_t max_inline = txq->max_inline * RTE_CACHE_LINE_SIZE; uint16_t room; @@ -496,12 +498,13 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) */ ds = 2 + MLX5_WQE_DS(pkt_inline_sz - 2); if (length > 0) { - dseg = (struct mlx5_wqe_data_seg *) + dseg = (volatile struct mlx5_wqe_data_seg *) ((uintptr_t)wqe + (ds * MLX5_WQE_DWORD_SIZE)); if ((uintptr_t)dseg >= end) - dseg = (struct mlx5_wqe_data_seg *) - ((uintptr_t)&(*txq->wqes)[0]); + dseg = (volatile struct + mlx5_wqe_data_seg *) + txq->wqes; goto use_dseg; } else if (!segs_n) { goto next_pkt; @@ -514,12 +517,12 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) * Ethernet Header as been stored. */ wqe->eseg.inline_hdr_sz = htons(MLX5_WQE_DWORD_SIZE); - dseg = (struct mlx5_wqe_data_seg *) + dseg = (volatile struct mlx5_wqe_data_seg *) ((uintptr_t)wqe + (3 * MLX5_WQE_DWORD_SIZE)); ds = 3; use_dseg: /* Add the remaining packet as a simple ds. */ - *dseg = (struct mlx5_wqe_data_seg) { + *dseg = (volatile struct mlx5_wqe_data_seg) { .addr = htonll(addr), .byte_count = htonl(length), .lkey = txq_mp2mr(txq, txq_mb2mp(buf)), @@ -542,9 +545,9 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) unsigned int n = (txq->wqe_ci + ((ds + 3) / 4)) & ((1 << txq->wqe_n) - 1); - dseg = (struct mlx5_wqe_data_seg *) - ((uintptr_t)&(*txq->wqes)[n]); - tx_prefetch_wqe(txq, n + 1); + dseg = (volatile struct mlx5_wqe_data_seg *) + tx_mlx5_wqe(txq, n); + rte_prefetch0(tx_mlx5_wqe(txq, n + 1)); } else { ++dseg; } @@ -556,7 +559,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) total_length += length; #endif /* Store segment information. */ - *dseg = (struct mlx5_wqe_data_seg) { + *dseg = (volatile struct mlx5_wqe_data_seg) { .addr = htonll(rte_pktmbuf_mtod(buf, uintptr_t)), .byte_count = htonl(length), .lkey = txq_mp2mr(txq, txq_mb2mp(buf)), @@ -629,13 +632,13 @@ mlx5_mpw_new(struct txq *txq, struct mlx5_mpw *mpw, uint32_t length) uint16_t idx = txq->wqe_ci & ((1 << txq->wqe_n) - 1); volatile struct mlx5_wqe_data_seg (*dseg)[MLX5_MPW_DSEG_MAX] = (volatile struct mlx5_wqe_data_seg (*)[]) - (uintptr_t)&(*txq->wqes)[(idx + 1) & ((1 << txq->wqe_n) - 1)]; + tx_mlx5_wqe(txq, idx + 1); mpw->state = MLX5_MPW_STATE_OPENED; mpw->pkts_n = 0; mpw->len = length; mpw->total_len = 0; - mpw->wqe = (volatile struct mlx5_wqe *)&(*txq->wqes)[idx].hdr; + mpw->wqe = (volatile struct mlx5_wqe *)tx_mlx5_wqe(txq, idx); mpw->wqe->eseg.mss = htons(length); mpw->wqe->eseg.inline_hdr_sz = 0; mpw->wqe->eseg.rsvd0 = 0; @@ -677,8 +680,8 @@ mlx5_mpw_close(struct txq *txq, struct mlx5_mpw *mpw) ++txq->wqe_ci; else txq->wqe_ci += 2; - tx_prefetch_wqe(txq, txq->wqe_ci); - tx_prefetch_wqe(txq, txq->wqe_ci + 1); + rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci)); + rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci + 1)); } /** @@ -712,8 +715,8 @@ mlx5_tx_burst_mpw(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) return 0; /* Prefetch first packet cacheline. */ tx_prefetch_cqe(txq, txq->cq_ci); - tx_prefetch_wqe(txq, txq->wqe_ci); - tx_prefetch_wqe(txq, txq->wqe_ci + 1); + rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci)); + rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci + 1)); /* Start processing. */ txq_complete(txq); max = (elts_n - (elts_head - txq->elts_tail)); @@ -841,7 +844,7 @@ mlx5_mpw_inline_new(struct txq *txq, struct mlx5_mpw *mpw, uint32_t length) mpw->pkts_n = 0; mpw->len = length; mpw->total_len = 0; - mpw->wqe = (volatile struct mlx5_wqe *)&(*txq->wqes)[idx].hdr; + mpw->wqe = (volatile struct mlx5_wqe *)tx_mlx5_wqe(txq, idx); mpw->wqe->ctrl[0] = htonl((MLX5_OPC_MOD_MPW << 24) | (txq->wqe_ci << 8) | MLX5_OPCODE_TSO); @@ -917,8 +920,8 @@ mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts, return 0; /* Prefetch first packet cacheline. */ tx_prefetch_cqe(txq, txq->cq_ci); - tx_prefetch_wqe(txq, txq->wqe_ci); - tx_prefetch_wqe(txq, txq->wqe_ci + 1); + rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci)); + rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci + 1)); /* Start processing. */ txq_complete(txq); max = (elts_n - (elts_head - txq->elts_tail)); @@ -1019,14 +1022,15 @@ mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts, addr = rte_pktmbuf_mtod(buf, uintptr_t); (*txq->elts)[elts_head] = buf; /* Maximum number of bytes before wrapping. */ - max = ((uintptr_t)&(*txq->wqes)[1 << txq->wqe_n] - + max = ((((uintptr_t)(txq->wqes)) + + (1 << txq->wqe_n) * + MLX5_WQE_SIZE) - (uintptr_t)mpw.data.raw); if (length > max) { rte_memcpy((void *)(uintptr_t)mpw.data.raw, (void *)addr, max); - mpw.data.raw = - (volatile void *)&(*txq->wqes)[0]; + mpw.data.raw = (volatile void *)txq->wqes; rte_memcpy((void *)(uintptr_t)mpw.data.raw, (void *)(addr + max), length - max); @@ -1038,9 +1042,8 @@ mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts, mpw.data.raw += length; } if ((uintptr_t)mpw.data.raw == - (uintptr_t)&(*txq->wqes)[1 << txq->wqe_n]) - mpw.data.raw = - (volatile void *)&(*txq->wqes)[0]; + (uintptr_t)tx_mlx5_wqe(txq, 1 << txq->wqe_n)) + mpw.data.raw = (volatile void *)txq->wqes; ++mpw.pkts_n; ++j; if (mpw.pkts_n == MLX5_MPW_DSEG_MAX) { diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h index 8f2cddb..b9b90a7 100644 --- a/drivers/net/mlx5/mlx5_rxtx.h +++ b/drivers/net/mlx5/mlx5_rxtx.h @@ -259,7 +259,7 @@ struct txq { uint16_t max_inline; /* Multiple of RTE_CACHE_LINE_SIZE to inline. */ uint32_t qp_num_8s; /* QP number shifted by 8. */ volatile struct mlx5_cqe (*cqes)[]; /* Completion queue. */ - volatile struct mlx5_wqe64 (*wqes)[]; /* Work queue. */ + volatile void *wqes; /* Work queue (use volatile to write into). */ volatile uint32_t *qp_db; /* Work queue doorbell. */ volatile uint32_t *cq_db; /* Completion queue doorbell. */ volatile void *bf_reg; /* Blueflame register. */ diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index 053665d..f4c6682 100644 --- a/drivers/net/mlx5/mlx5_txq.c +++ b/drivers/net/mlx5/mlx5_txq.c @@ -82,7 +82,9 @@ txq_alloc_elts(struct txq_ctrl *txq_ctrl, unsigned int elts_n) for (i = 0; (i != elts_n); ++i) (*txq_ctrl->txq.elts)[i] = NULL; for (i = 0; (i != (1u << txq_ctrl->txq.wqe_n)); ++i) { - volatile struct mlx5_wqe64 *wqe = &(*txq_ctrl->txq.wqes)[i]; + volatile struct mlx5_wqe64 *wqe = + (volatile struct mlx5_wqe64 *) + txq_ctrl->txq.wqes + i; memset((void *)(uintptr_t)wqe, 0x0, sizeof(*wqe)); } @@ -214,9 +216,7 @@ txq_setup(struct txq_ctrl *tmpl, struct txq_ctrl *txq_ctrl) } tmpl->txq.cqe_n = log2above(ibcq->cqe); tmpl->txq.qp_num_8s = qp->ctrl_seg.qp_num << 8; - tmpl->txq.wqes = - (volatile struct mlx5_wqe64 (*)[]) - (uintptr_t)qp->gen_data.sqstart; + tmpl->txq.wqes = qp->gen_data.sqstart; tmpl->txq.wqe_n = log2above(qp->sq.wqe_cnt); tmpl->txq.qp_db = &qp->gen_data.db[MLX5_SND_DBR]; tmpl->txq.bf_reg = qp->gen_data.bf->reg;