From patchwork Wed Mar 31 07:37:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Suanming Mou X-Patchwork-Id: 90185 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2DD69A034F; Wed, 31 Mar 2021 09:38:36 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 23EDE140E09; Wed, 31 Mar 2021 09:38:00 +0200 (CEST) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by mails.dpdk.org (Postfix) with ESMTP id 8EDBD140DFB for ; Wed, 31 Mar 2021 09:37:55 +0200 (CEST) Received: from Internal Mail-Server by MTLPINE1 (envelope-from suanmingm@nvidia.com) with SMTP; 31 Mar 2021 10:37:54 +0300 Received: from nvidia.com (mtbc-r640-03.mtbc.labs.mlnx [10.75.70.8]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 12V7boeF002108; Wed, 31 Mar 2021 10:37:53 +0300 From: Suanming Mou To: orika@nvidia.com Cc: dev@dpdk.org, viacheslavo@nvidia.com, matan@nvidia.com, rasland@nvidia.com Date: Wed, 31 Mar 2021 10:37:46 +0300 Message-Id: <20210331073749.1382377-2-suanmingm@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210331073749.1382377-1-suanmingm@nvidia.com> References: <20210309235732.3952418-1-suanmingm@nvidia.com> <20210331073749.1382377-1-suanmingm@nvidia.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v4 1/4] common/mlx5: add user memory registration bits X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This commit adds the UMR capability bits. Signed-off-by: Suanming Mou Acked-by: Ori Kam --- drivers/common/mlx5/linux/meson.build | 2 ++ drivers/common/mlx5/mlx5_devx_cmds.c | 5 +++++ drivers/common/mlx5/mlx5_devx_cmds.h | 3 +++ 3 files changed, 10 insertions(+) diff --git a/drivers/common/mlx5/linux/meson.build b/drivers/common/mlx5/linux/meson.build index 220de35420..5d6a861689 100644 --- a/drivers/common/mlx5/linux/meson.build +++ b/drivers/common/mlx5/linux/meson.build @@ -186,6 +186,8 @@ has_sym_args = [ 'mlx5dv_dr_action_create_aso' ], [ 'HAVE_INFINIBAND_VERBS_H', 'infiniband/verbs.h', 'INFINIBAND_VERBS_H' ], + [ 'HAVE_MLX5_UMR_IMKEY', 'infiniband/mlx5dv.h', + 'MLX5_WQE_UMR_CTRL_FLAG_INLINE' ], ] config = configuration_data() foreach arg:has_sym_args diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c index c90e020643..268bcd0d99 100644 --- a/drivers/common/mlx5/mlx5_devx_cmds.c +++ b/drivers/common/mlx5/mlx5_devx_cmds.c @@ -266,6 +266,7 @@ mlx5_devx_cmd_mkey_create(void *ctx, MLX5_SET(mkc, mkc, qpn, 0xffffff); MLX5_SET(mkc, mkc, pd, attr->pd); MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF); + MLX5_SET(mkc, mkc, umr_en, attr->umr_en); MLX5_SET(mkc, mkc, translations_octword_size, translation_size); MLX5_SET(mkc, mkc, relaxed_ordering_write, attr->relaxed_ordering_write); @@ -752,6 +753,10 @@ mlx5_devx_cmd_query_hca_attr(void *ctx, mini_cqe_resp_flow_tag); attr->mini_cqe_resp_l3_l4_tag = MLX5_GET(cmd_hca_cap, hcattr, mini_cqe_resp_l3_l4_tag); + attr->umr_indirect_mkey_disabled = + MLX5_GET(cmd_hca_cap, hcattr, umr_indirect_mkey_disabled); + attr->umr_modify_entity_size_disabled = + MLX5_GET(cmd_hca_cap, hcattr, umr_modify_entity_size_disabled); if (attr->qos.sup) { MLX5_SET(query_hca_cap_in, in, op_mod, MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP | diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h index 2826c0b2c6..67b5f771c6 100644 --- a/drivers/common/mlx5/mlx5_devx_cmds.h +++ b/drivers/common/mlx5/mlx5_devx_cmds.h @@ -31,6 +31,7 @@ struct mlx5_devx_mkey_attr { uint32_t pg_access:1; uint32_t relaxed_ordering_write:1; uint32_t relaxed_ordering_read:1; + uint32_t umr_en:1; struct mlx5_klm *klm_array; int klm_num; }; @@ -151,6 +152,8 @@ struct mlx5_hca_attr { uint32_t log_max_mmo_dma:5; uint32_t log_max_mmo_compress:5; uint32_t log_max_mmo_decompress:5; + uint32_t umr_modify_entity_size_disabled:1; + uint32_t umr_indirect_mkey_disabled:1; }; struct mlx5_devx_wq_attr { From patchwork Wed Mar 31 07:37:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Suanming Mou X-Patchwork-Id: 90187 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4E9BFA034F; Wed, 31 Mar 2021 09:38:49 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 651DF140E25; Wed, 31 Mar 2021 09:38:05 +0200 (CEST) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by mails.dpdk.org (Postfix) with ESMTP id 71096140E0D for ; Wed, 31 Mar 2021 09:38:00 +0200 (CEST) Received: from Internal Mail-Server by MTLPINE1 (envelope-from suanmingm@nvidia.com) with SMTP; 31 Mar 2021 10:37:55 +0300 Received: from nvidia.com (mtbc-r640-03.mtbc.labs.mlnx [10.75.70.8]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 12V7boeG002108; Wed, 31 Mar 2021 10:37:54 +0300 From: Suanming Mou To: orika@nvidia.com Cc: dev@dpdk.org, viacheslavo@nvidia.com, matan@nvidia.com, rasland@nvidia.com Date: Wed, 31 Mar 2021 10:37:47 +0300 Message-Id: <20210331073749.1382377-3-suanmingm@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210331073749.1382377-1-suanmingm@nvidia.com> References: <20210309235732.3952418-1-suanmingm@nvidia.com> <20210331073749.1382377-1-suanmingm@nvidia.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v4 2/4] regex/mlx5: add data path scattered mbuf process X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" UMR(User-Mode Memory Registration) WQE can present data buffers scattered within multiple mbufs with single indirect mkey. Take advantage of the UMR WQE, scattered mbuf in one operation can be presented to an indirect mkey. The RegEx which only accepts one mkey can now process the whole scattered mbuf in one operation. The maximum scattered mbuf can be supported in one UMR WQE is now defined as 64. The mbufs from multiple operations can be combined into one UMR WQE as well if there is enough space in the KLM array, since the operations can address their own mbuf's content by the mkey's address and length. However, one operation's scattered mbuf's can't be placed in two different UMR WQE's KLM array, if the UMR WQE's KLM does not has enough free space for one operation, the extra UMR WQE will be engaged. In case the UMR WQE's indirect mkey will be over wrapped by the SQ's WQE move, the mkey's index used by the UMR WQE should be the index of last the RegEX WQE in the operations. As one operation consumes one WQE set, build the RegEx WQE by reverse helps address the mkey more efficiently. Once the operations in one burst consumes multiple mkeys, when the mkey KLM array is full, the reverse WQE set index will always be the last of the new mkey's for the new UMR WQE. In GGA mode, the SQ WQE's memory layout becomes UMR/NOP and RegEx WQE by interleave. The UMR and RegEx WQE can be called as WQE set. The SQ's pi and ci will also be increased as WQE set not as WQE. For operations don't have scattered mbuf, uses the mbuf's mkey directly, the WQE set combination is NOP + RegEx. For operations have scattered mubf but share the UMR WQE with others, the WQE set combination is NOP + RegEx. For operations complete the UMR WQE, the WQE set combination is UMR + RegEx. Signed-off-by: Suanming Mou Acked-by: Ori Kam --- doc/guides/regexdevs/mlx5.rst | 5 + doc/guides/rel_notes/release_21_05.rst | 4 + drivers/regex/mlx5/mlx5_regex.c | 9 + drivers/regex/mlx5/mlx5_regex.h | 26 +- drivers/regex/mlx5/mlx5_regex_control.c | 43 ++- drivers/regex/mlx5/mlx5_regex_fastpath.c | 378 +++++++++++++++++++++-- 6 files changed, 407 insertions(+), 58 deletions(-) diff --git a/doc/guides/regexdevs/mlx5.rst b/doc/guides/regexdevs/mlx5.rst index faaa6ac11d..45a0b96980 100644 --- a/doc/guides/regexdevs/mlx5.rst +++ b/doc/guides/regexdevs/mlx5.rst @@ -35,6 +35,11 @@ be specified as device parameter. The RegEx device can be probed and used with other Mellanox devices, by adding more options in the class. For example: ``class=net:regex`` will probe both the net PMD and the RegEx PMD. +Features +-------- + +- Multi segments mbuf support. + Supported NICs -------------- diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst index 3c76148b11..c3d6b8e8ae 100644 --- a/doc/guides/rel_notes/release_21_05.rst +++ b/doc/guides/rel_notes/release_21_05.rst @@ -119,6 +119,10 @@ New Features * Added command to display Rx queue used descriptor count. ``show port (port_id) rxq (queue_id) desc used count`` +* **Updated Mellanox RegEx PMD.** + + * Added support for multi segments mbuf. + Removed Items ------------- diff --git a/drivers/regex/mlx5/mlx5_regex.c b/drivers/regex/mlx5/mlx5_regex.c index ac5b205fa9..82c485e50c 100644 --- a/drivers/regex/mlx5/mlx5_regex.c +++ b/drivers/regex/mlx5/mlx5_regex.c @@ -199,6 +199,13 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, } priv->regexdev->dev_ops = &mlx5_regexdev_ops; priv->regexdev->enqueue = mlx5_regexdev_enqueue; +#ifdef HAVE_MLX5_UMR_IMKEY + if (!attr.umr_indirect_mkey_disabled && + !attr.umr_modify_entity_size_disabled) + priv->has_umr = 1; + if (priv->has_umr) + priv->regexdev->enqueue = mlx5_regexdev_enqueue_gga; +#endif priv->regexdev->dequeue = mlx5_regexdev_dequeue; priv->regexdev->device = (struct rte_device *)pci_dev; priv->regexdev->data->dev_private = priv; @@ -213,6 +220,8 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, rte_errno = ENOMEM; goto error; } + DRV_LOG(INFO, "RegEx GGA is %s.", + priv->has_umr ? "supported" : "unsupported"); return 0; error: diff --git a/drivers/regex/mlx5/mlx5_regex.h b/drivers/regex/mlx5/mlx5_regex.h index a2b3f0d9f3..51a2101e53 100644 --- a/drivers/regex/mlx5/mlx5_regex.h +++ b/drivers/regex/mlx5/mlx5_regex.h @@ -15,6 +15,7 @@ #include #include "mlx5_rxp.h" +#include "mlx5_regex_utils.h" struct mlx5_regex_sq { uint16_t log_nb_desc; /* Log 2 number of desc for this object. */ @@ -40,6 +41,7 @@ struct mlx5_regex_qp { struct mlx5_regex_job *jobs; struct ibv_mr *metadata; struct ibv_mr *outputs; + struct ibv_mr *imkey_addr; /* Indirect mkey array region. */ size_t ci, pi; struct mlx5_mr_ctrl mr_ctrl; }; @@ -71,8 +73,29 @@ struct mlx5_regex_priv { struct mlx5_mr_share_cache mr_scache; /* Global shared MR cache. */ uint8_t is_bf2; /* The device is BF2 device. */ uint8_t sq_ts_format; /* Whether SQ supports timestamp formats. */ + uint8_t has_umr; /* The device supports UMR. */ }; +#ifdef HAVE_IBV_FLOW_DV_SUPPORT +static inline int +regex_get_pdn(void *pd, uint32_t *pdn) +{ + struct mlx5dv_obj obj; + struct mlx5dv_pd pd_info; + int ret = 0; + + obj.pd.in = pd; + obj.pd.out = &pd_info; + ret = mlx5_glue->dv_init_obj(&obj, MLX5DV_OBJ_PD); + if (ret) { + DRV_LOG(DEBUG, "Fail to get PD object info"); + return ret; + } + *pdn = pd_info.pdn; + return 0; +} +#endif + /* mlx5_regex.c */ int mlx5_regex_start(struct rte_regexdev *dev); int mlx5_regex_stop(struct rte_regexdev *dev); @@ -108,5 +131,6 @@ uint16_t mlx5_regexdev_enqueue(struct rte_regexdev *dev, uint16_t qp_id, struct rte_regex_ops **ops, uint16_t nb_ops); uint16_t mlx5_regexdev_dequeue(struct rte_regexdev *dev, uint16_t qp_id, struct rte_regex_ops **ops, uint16_t nb_ops); - +uint16_t mlx5_regexdev_enqueue_gga(struct rte_regexdev *dev, uint16_t qp_id, + struct rte_regex_ops **ops, uint16_t nb_ops); #endif /* MLX5_REGEX_H */ diff --git a/drivers/regex/mlx5/mlx5_regex_control.c b/drivers/regex/mlx5/mlx5_regex_control.c index 55fbb419ed..eef0fe579d 100644 --- a/drivers/regex/mlx5/mlx5_regex_control.c +++ b/drivers/regex/mlx5/mlx5_regex_control.c @@ -27,6 +27,9 @@ #define MLX5_REGEX_NUM_WQE_PER_PAGE (4096/64) +#define MLX5_REGEX_WQE_LOG_NUM(has_umr, log_desc) \ + ((has_umr) ? ((log_desc) + 2) : (log_desc)) + /** * Returns the number of qp obj to be created. * @@ -91,26 +94,6 @@ regex_ctrl_create_cq(struct mlx5_regex_priv *priv, struct mlx5_regex_cq *cq) return 0; } -#ifdef HAVE_IBV_FLOW_DV_SUPPORT -static int -regex_get_pdn(void *pd, uint32_t *pdn) -{ - struct mlx5dv_obj obj; - struct mlx5dv_pd pd_info; - int ret = 0; - - obj.pd.in = pd; - obj.pd.out = &pd_info; - ret = mlx5_glue->dv_init_obj(&obj, MLX5DV_OBJ_PD); - if (ret) { - DRV_LOG(DEBUG, "Fail to get PD object info"); - return ret; - } - *pdn = pd_info.pdn; - return 0; -} -#endif - /** * Destroy the SQ object. * @@ -168,14 +151,16 @@ regex_ctrl_create_sq(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp, int ret; sq->log_nb_desc = log_nb_desc; + sq->sqn = q_ind; sq->ci = 0; sq->pi = 0; ret = regex_get_pdn(priv->pd, &pd_num); if (ret) return ret; attr.wq_attr.pd = pd_num; - ret = mlx5_devx_sq_create(priv->ctx, &sq->sq_obj, log_nb_desc, &attr, - SOCKET_ID_ANY); + ret = mlx5_devx_sq_create(priv->ctx, &sq->sq_obj, + MLX5_REGEX_WQE_LOG_NUM(priv->has_umr, log_nb_desc), + &attr, SOCKET_ID_ANY); if (ret) { DRV_LOG(ERR, "Can't create SQ object."); rte_errno = ENOMEM; @@ -225,10 +210,18 @@ mlx5_regex_qp_setup(struct rte_regexdev *dev, uint16_t qp_ind, qp = &priv->qps[qp_ind]; qp->flags = cfg->qp_conf_flags; - qp->cq.log_nb_desc = rte_log2_u32(cfg->nb_desc); - qp->nb_desc = 1 << qp->cq.log_nb_desc; + log_desc = rte_log2_u32(cfg->nb_desc); + /* + * UMR mode requires two WQEs(UMR and RegEx WQE) for one descriptor. + * For CQ, expand the CQE number multiple with 2. + * For SQ, the UMR and RegEx WQE for one descriptor consumes 4 WQEBBS, + * expand the WQE number multiple with 4. + */ + qp->cq.log_nb_desc = log_desc + (!!priv->has_umr); + qp->nb_desc = 1 << log_desc; if (qp->flags & RTE_REGEX_QUEUE_PAIR_CFG_OOS_F) - qp->nb_obj = regex_ctrl_get_nb_obj(qp->nb_desc); + qp->nb_obj = regex_ctrl_get_nb_obj + (1 << MLX5_REGEX_WQE_LOG_NUM(priv->has_umr, log_desc)); else qp->nb_obj = 1; qp->sqs = rte_malloc(NULL, diff --git a/drivers/regex/mlx5/mlx5_regex_fastpath.c b/drivers/regex/mlx5/mlx5_regex_fastpath.c index beaea7b63f..4f9402c583 100644 --- a/drivers/regex/mlx5/mlx5_regex_fastpath.c +++ b/drivers/regex/mlx5/mlx5_regex_fastpath.c @@ -32,6 +32,15 @@ #define MLX5_REGEX_WQE_GATHER_OFFSET 32 #define MLX5_REGEX_WQE_SCATTER_OFFSET 48 #define MLX5_REGEX_METADATA_OFF 32 +#define MLX5_REGEX_UMR_WQE_SIZE 192 +/* The maximum KLMs can be added to one UMR indirect mkey. */ +#define MLX5_REGEX_MAX_KLM_NUM 128 +/* The KLM array size for one job. */ +#define MLX5_REGEX_KLMS_SIZE \ + ((MLX5_REGEX_MAX_KLM_NUM) * sizeof(struct mlx5_klm)) +/* In WQE set mode, the pi should be quarter of the MLX5_REGEX_MAX_WQE_INDEX. */ +#define MLX5_REGEX_UMR_SQ_PI_IDX(pi, ops) \ + (((pi) + (ops)) & (MLX5_REGEX_MAX_WQE_INDEX >> 2)) static inline uint32_t sq_size_get(struct mlx5_regex_sq *sq) @@ -49,6 +58,8 @@ struct mlx5_regex_job { uint64_t user_id; volatile uint8_t *output; volatile uint8_t *metadata; + struct mlx5_klm *imkey_array; /* Indirect mkey's KLM array. */ + struct mlx5_devx_obj *imkey; /* UMR WQE's indirect meky. */ } __rte_cached_aligned; static inline void @@ -99,12 +110,13 @@ set_wqe_ctrl_seg(struct mlx5_wqe_ctrl_seg *seg, uint16_t pi, uint8_t opcode, } static inline void -prep_one(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp, - struct mlx5_regex_sq *sq, struct rte_regex_ops *op, - struct mlx5_regex_job *job) +__prep_one(struct mlx5_regex_priv *priv, struct mlx5_regex_sq *sq, + struct rte_regex_ops *op, struct mlx5_regex_job *job, + size_t pi, struct mlx5_klm *klm) { - size_t wqe_offset = (sq->pi & (sq_size_get(sq) - 1)) * MLX5_SEND_WQE_BB; - uint32_t lkey; + size_t wqe_offset = (pi & (sq_size_get(sq) - 1)) * + (MLX5_SEND_WQE_BB << (priv->has_umr ? 2 : 0)) + + (priv->has_umr ? MLX5_REGEX_UMR_WQE_SIZE : 0); uint16_t group0 = op->req_flags & RTE_REGEX_OPS_REQ_GROUP_ID0_VALID_F ? op->group_id0 : 0; uint16_t group1 = op->req_flags & RTE_REGEX_OPS_REQ_GROUP_ID1_VALID_F ? @@ -122,14 +134,11 @@ prep_one(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp, RTE_REGEX_OPS_REQ_GROUP_ID2_VALID_F | RTE_REGEX_OPS_REQ_GROUP_ID3_VALID_F))) group0 = op->group_id0; - lkey = mlx5_mr_addr2mr_bh(priv->pd, 0, - &priv->mr_scache, &qp->mr_ctrl, - rte_pktmbuf_mtod(op->mbuf, uintptr_t), - !!(op->mbuf->ol_flags & EXT_ATTACHED_MBUF)); uint8_t *wqe = (uint8_t *)(uintptr_t)sq->sq_obj.wqes + wqe_offset; int ds = 4; /* ctrl + meta + input + output */ - set_wqe_ctrl_seg((struct mlx5_wqe_ctrl_seg *)wqe, sq->pi, + set_wqe_ctrl_seg((struct mlx5_wqe_ctrl_seg *)wqe, + (priv->has_umr ? (pi * 4 + 3) : pi), MLX5_OPCODE_MMO, MLX5_OPC_MOD_MMO_REGEX, sq->sq_obj.sq->id, 0, ds, 0, 0); set_regex_ctrl_seg(wqe + 12, 0, group0, group1, group2, group3, @@ -137,36 +146,54 @@ prep_one(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp, struct mlx5_wqe_data_seg *input_seg = (struct mlx5_wqe_data_seg *)(wqe + MLX5_REGEX_WQE_GATHER_OFFSET); - input_seg->byte_count = - rte_cpu_to_be_32(rte_pktmbuf_data_len(op->mbuf)); - input_seg->addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(op->mbuf, - uintptr_t)); - input_seg->lkey = lkey; + input_seg->byte_count = rte_cpu_to_be_32(klm->byte_count); + input_seg->addr = rte_cpu_to_be_64(klm->address); + input_seg->lkey = klm->mkey; job->user_id = op->user_id; +} + +static inline void +prep_one(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp, + struct mlx5_regex_sq *sq, struct rte_regex_ops *op, + struct mlx5_regex_job *job) +{ + struct mlx5_klm klm; + + klm.byte_count = rte_pktmbuf_data_len(op->mbuf); + klm.mkey = mlx5_mr_addr2mr_bh(priv->pd, 0, + &priv->mr_scache, &qp->mr_ctrl, + rte_pktmbuf_mtod(op->mbuf, uintptr_t), + !!(op->mbuf->ol_flags & EXT_ATTACHED_MBUF)); + klm.address = rte_pktmbuf_mtod(op->mbuf, uintptr_t); + __prep_one(priv, sq, op, job, sq->pi, &klm); sq->db_pi = sq->pi; sq->pi = (sq->pi + 1) & MLX5_REGEX_MAX_WQE_INDEX; } static inline void -send_doorbell(struct mlx5dv_devx_uar *uar, struct mlx5_regex_sq *sq) +send_doorbell(struct mlx5_regex_priv *priv, struct mlx5_regex_sq *sq) { + struct mlx5dv_devx_uar *uar = priv->uar; size_t wqe_offset = (sq->db_pi & (sq_size_get(sq) - 1)) * - MLX5_SEND_WQE_BB; + (MLX5_SEND_WQE_BB << (priv->has_umr ? 2 : 0)) + + (priv->has_umr ? MLX5_REGEX_UMR_WQE_SIZE : 0); uint8_t *wqe = (uint8_t *)(uintptr_t)sq->sq_obj.wqes + wqe_offset; - ((struct mlx5_wqe_ctrl_seg *)wqe)->fm_ce_se = MLX5_WQE_CTRL_CQ_UPDATE; + /* Or the fm_ce_se instead of set, avoid the fence be cleared. */ + ((struct mlx5_wqe_ctrl_seg *)wqe)->fm_ce_se |= MLX5_WQE_CTRL_CQ_UPDATE; uint64_t *doorbell_addr = (uint64_t *)((uint8_t *)uar->base_addr + 0x800); rte_io_wmb(); - sq->sq_obj.db_rec[MLX5_SND_DBR] = rte_cpu_to_be_32((sq->db_pi + 1) & - MLX5_REGEX_MAX_WQE_INDEX); + sq->sq_obj.db_rec[MLX5_SND_DBR] = rte_cpu_to_be_32((priv->has_umr ? + (sq->db_pi * 4 + 3) : sq->db_pi) & + MLX5_REGEX_MAX_WQE_INDEX); rte_wmb(); *doorbell_addr = *(volatile uint64_t *)wqe; rte_wmb(); } static inline int -can_send(struct mlx5_regex_sq *sq) { - return ((uint16_t)(sq->pi - sq->ci) < sq_size_get(sq)); +get_free(struct mlx5_regex_sq *sq) { + return (sq_size_get(sq) - (uint16_t)(sq->pi - sq->ci)); } static inline uint32_t @@ -174,6 +201,211 @@ job_id_get(uint32_t qid, size_t sq_size, size_t index) { return qid * sq_size + (index & (sq_size - 1)); } +#ifdef HAVE_MLX5_UMR_IMKEY +static inline int +mkey_klm_available(struct mlx5_klm *klm, uint32_t pos, uint32_t new) +{ + return (klm && ((pos + new) <= MLX5_REGEX_MAX_KLM_NUM)); +} + +static inline void +complete_umr_wqe(struct mlx5_regex_qp *qp, struct mlx5_regex_sq *sq, + struct mlx5_regex_job *mkey_job, + size_t umr_index, uint32_t klm_size, uint32_t total_len) +{ + size_t wqe_offset = (umr_index & (sq_size_get(sq) - 1)) * + (MLX5_SEND_WQE_BB * 4); + struct mlx5_wqe_ctrl_seg *wqe = (struct mlx5_wqe_ctrl_seg *)((uint8_t *) + (uintptr_t)sq->sq_obj.wqes + wqe_offset); + struct mlx5_wqe_umr_ctrl_seg *ucseg = + (struct mlx5_wqe_umr_ctrl_seg *)(wqe + 1); + struct mlx5_wqe_mkey_context_seg *mkc = + (struct mlx5_wqe_mkey_context_seg *)(ucseg + 1); + struct mlx5_klm *iklm = (struct mlx5_klm *)(mkc + 1); + uint16_t klm_align = RTE_ALIGN(klm_size, 4); + + memset(wqe, 0, MLX5_REGEX_UMR_WQE_SIZE); + /* Set WQE control seg. Non-inline KLM UMR WQE size must be 9 WQE_DS. */ + set_wqe_ctrl_seg(wqe, (umr_index * 4), MLX5_OPCODE_UMR, + 0, sq->sq_obj.sq->id, 0, 9, 0, + rte_cpu_to_be_32(mkey_job->imkey->id)); + /* Set UMR WQE control seg. */ + ucseg->mkey_mask |= rte_cpu_to_be_64(MLX5_WQE_UMR_CTRL_MKEY_MASK_LEN | + MLX5_WQE_UMR_CTRL_FLAG_TRNSLATION_OFFSET | + MLX5_WQE_UMR_CTRL_MKEY_MASK_ACCESS_LOCAL_WRITE); + ucseg->klm_octowords = rte_cpu_to_be_16(klm_align); + /* Set mkey context seg. */ + mkc->len = rte_cpu_to_be_64(total_len); + mkc->qpn_mkey = rte_cpu_to_be_32(0xffffff00 | + (mkey_job->imkey->id & 0xff)); + /* Set UMR pointer to data seg. */ + iklm->address = rte_cpu_to_be_64 + ((uintptr_t)((char *)mkey_job->imkey_array)); + iklm->mkey = rte_cpu_to_be_32(qp->imkey_addr->lkey); + iklm->byte_count = rte_cpu_to_be_32(klm_align); + /* Clear the padding memory. */ + memset((uint8_t *)&mkey_job->imkey_array[klm_size], 0, + sizeof(struct mlx5_klm) * (klm_align - klm_size)); + + /* Add the following RegEx WQE with fence. */ + wqe = (struct mlx5_wqe_ctrl_seg *) + (((uint8_t *)wqe) + MLX5_REGEX_UMR_WQE_SIZE); + wqe->fm_ce_se |= MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE; +} + +static inline void +prep_nop_regex_wqe_set(struct mlx5_regex_priv *priv, struct mlx5_regex_sq *sq, + struct rte_regex_ops *op, struct mlx5_regex_job *job, + size_t pi, struct mlx5_klm *klm) +{ + size_t wqe_offset = (pi & (sq_size_get(sq) - 1)) * + (MLX5_SEND_WQE_BB << 2); + struct mlx5_wqe_ctrl_seg *wqe = (struct mlx5_wqe_ctrl_seg *)((uint8_t *) + (uintptr_t)sq->sq_obj.wqes + wqe_offset); + + /* Clear the WQE memory used as UMR WQE previously. */ + if ((rte_be_to_cpu_32(wqe->opmod_idx_opcode) & 0xff) != MLX5_OPCODE_NOP) + memset(wqe, 0, MLX5_REGEX_UMR_WQE_SIZE); + /* UMR WQE size is 9 DS, align nop WQE to 3 WQEBBS(12 DS). */ + set_wqe_ctrl_seg(wqe, pi * 4, MLX5_OPCODE_NOP, 0, sq->sq_obj.sq->id, + 0, 12, 0, 0); + __prep_one(priv, sq, op, job, pi, klm); +} + +static inline void +prep_regex_umr_wqe_set(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp, + struct mlx5_regex_sq *sq, struct rte_regex_ops **op, size_t nb_ops) +{ + struct mlx5_regex_job *job = NULL; + size_t sqid = sq->sqn, mkey_job_id = 0; + size_t left_ops = nb_ops; + uint32_t klm_num = 0, len; + struct mlx5_klm *mkey_klm = NULL; + struct mlx5_klm klm; + + sqid = sq->sqn; + while (left_ops--) + rte_prefetch0(op[left_ops]); + left_ops = nb_ops; + /* + * Build the WQE set by reverse. In case the burst may consume + * multiple mkeys, build the WQE set as normal will hard to + * address the last mkey index, since we will only know the last + * RegEx WQE's index when finishes building. + */ + while (left_ops--) { + struct rte_mbuf *mbuf = op[left_ops]->mbuf; + size_t pi = MLX5_REGEX_UMR_SQ_PI_IDX(sq->pi, left_ops); + + if (mbuf->nb_segs > 1) { + size_t scatter_size = 0; + + if (!mkey_klm_available(mkey_klm, klm_num, + mbuf->nb_segs)) { + /* + * The mkey's KLM is full, create the UMR + * WQE in the next WQE set. + */ + if (mkey_klm) + complete_umr_wqe(qp, sq, + &qp->jobs[mkey_job_id], + MLX5_REGEX_UMR_SQ_PI_IDX(pi, 1), + klm_num, len); + /* + * Get the indircet mkey and KLM array index + * from the last WQE set. + */ + mkey_job_id = job_id_get(sqid, + sq_size_get(sq), pi); + mkey_klm = qp->jobs[mkey_job_id].imkey_array; + klm_num = 0; + len = 0; + } + /* Build RegEx WQE's data segment KLM. */ + klm.address = len; + klm.mkey = rte_cpu_to_be_32 + (qp->jobs[mkey_job_id].imkey->id); + while (mbuf) { + /* Build indirect mkey seg's KLM. */ + mkey_klm->mkey = mlx5_mr_addr2mr_bh(priv->pd, + NULL, &priv->mr_scache, &qp->mr_ctrl, + rte_pktmbuf_mtod(mbuf, uintptr_t), + !!(mbuf->ol_flags & EXT_ATTACHED_MBUF)); + mkey_klm->address = rte_cpu_to_be_64 + (rte_pktmbuf_mtod(mbuf, uintptr_t)); + mkey_klm->byte_count = rte_cpu_to_be_32 + (rte_pktmbuf_data_len(mbuf)); + /* + * Save the mbuf's total size for RegEx data + * segment. + */ + scatter_size += rte_pktmbuf_data_len(mbuf); + mkey_klm++; + klm_num++; + mbuf = mbuf->next; + } + len += scatter_size; + klm.byte_count = scatter_size; + } else { + /* The single mubf case. Build the KLM directly. */ + klm.mkey = mlx5_mr_addr2mr_bh(priv->pd, NULL, + &priv->mr_scache, &qp->mr_ctrl, + rte_pktmbuf_mtod(mbuf, uintptr_t), + !!(mbuf->ol_flags & EXT_ATTACHED_MBUF)); + klm.address = rte_pktmbuf_mtod(mbuf, uintptr_t); + klm.byte_count = rte_pktmbuf_data_len(mbuf); + } + job = &qp->jobs[job_id_get(sqid, sq_size_get(sq), pi)]; + /* + * Build the nop + RegEx WQE set by default. The fist nop WQE + * will be updated later as UMR WQE if scattered mubf exist. + */ + prep_nop_regex_wqe_set(priv, sq, op[left_ops], job, pi, &klm); + } + /* + * Scattered mbuf have been added to the KLM array. Complete the build + * of UMR WQE, update the first nop WQE as UMR WQE. + */ + if (mkey_klm) + complete_umr_wqe(qp, sq, &qp->jobs[mkey_job_id], sq->pi, + klm_num, len); + sq->db_pi = MLX5_REGEX_UMR_SQ_PI_IDX(sq->pi, nb_ops - 1); + sq->pi = MLX5_REGEX_UMR_SQ_PI_IDX(sq->pi, nb_ops); +} + +uint16_t +mlx5_regexdev_enqueue_gga(struct rte_regexdev *dev, uint16_t qp_id, + struct rte_regex_ops **ops, uint16_t nb_ops) +{ + struct mlx5_regex_priv *priv = dev->data->dev_private; + struct mlx5_regex_qp *queue = &priv->qps[qp_id]; + struct mlx5_regex_sq *sq; + size_t sqid, nb_left = nb_ops, nb_desc; + + while ((sqid = ffs(queue->free_sqs))) { + sqid--; /* ffs returns 1 for bit 0 */ + sq = &queue->sqs[sqid]; + nb_desc = get_free(sq); + if (nb_desc) { + /* The ops be handled can't exceed nb_ops. */ + if (nb_desc > nb_left) + nb_desc = nb_left; + else + queue->free_sqs &= ~(1 << sqid); + prep_regex_umr_wqe_set(priv, queue, sq, ops, nb_desc); + send_doorbell(priv, sq); + nb_left -= nb_desc; + } + if (!nb_left) + break; + ops += nb_desc; + } + nb_ops -= nb_left; + queue->pi += nb_ops; + return nb_ops; +} +#endif + uint16_t mlx5_regexdev_enqueue(struct rte_regexdev *dev, uint16_t qp_id, struct rte_regex_ops **ops, uint16_t nb_ops) @@ -186,17 +418,17 @@ mlx5_regexdev_enqueue(struct rte_regexdev *dev, uint16_t qp_id, while ((sqid = ffs(queue->free_sqs))) { sqid--; /* ffs returns 1 for bit 0 */ sq = &queue->sqs[sqid]; - while (can_send(sq)) { + while (get_free(sq)) { job_id = job_id_get(sqid, sq_size_get(sq), sq->pi); prep_one(priv, queue, sq, ops[i], &queue->jobs[job_id]); i++; if (unlikely(i == nb_ops)) { - send_doorbell(priv->uar, sq); + send_doorbell(priv, sq); goto out; } } queue->free_sqs &= ~(1 << sqid); - send_doorbell(priv->uar, sq); + send_doorbell(priv, sq); } out: @@ -308,6 +540,10 @@ mlx5_regexdev_dequeue(struct rte_regexdev *dev, uint16_t qp_id, MLX5_REGEX_MAX_WQE_INDEX; size_t sqid = cqe->rsvd3[2]; struct mlx5_regex_sq *sq = &queue->sqs[sqid]; + + /* UMR mode WQE counter move as WQE set(4 WQEBBS).*/ + if (priv->has_umr) + wq_counter >>= 2; while (sq->ci != wq_counter) { if (unlikely(i == nb_ops)) { /* Return without updating cq->ci */ @@ -316,7 +552,9 @@ mlx5_regexdev_dequeue(struct rte_regexdev *dev, uint16_t qp_id, uint32_t job_id = job_id_get(sqid, sq_size_get(sq), sq->ci); extract_result(ops[i], &queue->jobs[job_id]); - sq->ci = (sq->ci + 1) & MLX5_REGEX_MAX_WQE_INDEX; + sq->ci = (sq->ci + 1) & (priv->has_umr ? + (MLX5_REGEX_MAX_WQE_INDEX >> 2) : + MLX5_REGEX_MAX_WQE_INDEX); i++; } cq->ci = (cq->ci + 1) & 0xffffff; @@ -331,7 +569,7 @@ mlx5_regexdev_dequeue(struct rte_regexdev *dev, uint16_t qp_id, } static void -setup_sqs(struct mlx5_regex_qp *queue) +setup_sqs(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *queue) { size_t sqid, entry; uint32_t job_id; @@ -342,6 +580,14 @@ setup_sqs(struct mlx5_regex_qp *queue) job_id = sqid * sq_size_get(sq) + entry; struct mlx5_regex_job *job = &queue->jobs[job_id]; + /* Fill UMR WQE with NOP in advanced. */ + if (priv->has_umr) { + set_wqe_ctrl_seg + ((struct mlx5_wqe_ctrl_seg *)wqe, + entry * 2, MLX5_OPCODE_NOP, 0, + sq->sq_obj.sq->id, 0, 12, 0, 0); + wqe += MLX5_REGEX_UMR_WQE_SIZE; + } set_metadata_seg((struct mlx5_wqe_metadata_seg *) (wqe + MLX5_REGEX_WQE_METADATA_OFFSET), 0, queue->metadata->lkey, @@ -358,8 +604,9 @@ setup_sqs(struct mlx5_regex_qp *queue) } static int -setup_buffers(struct mlx5_regex_qp *qp, struct ibv_pd *pd) +setup_buffers(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp) { + struct ibv_pd *pd = priv->pd; uint32_t i; int err; @@ -395,6 +642,24 @@ setup_buffers(struct mlx5_regex_qp *qp, struct ibv_pd *pd) goto err_output; } + if (priv->has_umr) { + ptr = rte_calloc(__func__, qp->nb_desc, MLX5_REGEX_KLMS_SIZE, + MLX5_REGEX_KLMS_SIZE); + if (!ptr) { + err = -ENOMEM; + goto err_imkey; + } + qp->imkey_addr = mlx5_glue->reg_mr(pd, ptr, + MLX5_REGEX_KLMS_SIZE * qp->nb_desc, + IBV_ACCESS_LOCAL_WRITE); + if (!qp->imkey_addr) { + rte_free(ptr); + DRV_LOG(ERR, "Failed to register output"); + err = -EINVAL; + goto err_imkey; + } + } + /* distribute buffers to jobs */ for (i = 0; i < qp->nb_desc; i++) { qp->jobs[i].output = @@ -403,9 +668,18 @@ setup_buffers(struct mlx5_regex_qp *qp, struct ibv_pd *pd) qp->jobs[i].metadata = (uint8_t *)qp->metadata->addr + (i % qp->nb_desc) * MLX5_REGEX_METADATA_SIZE; + if (qp->imkey_addr) + qp->jobs[i].imkey_array = (struct mlx5_klm *) + qp->imkey_addr->addr + + (i % qp->nb_desc) * MLX5_REGEX_MAX_KLM_NUM; } + return 0; +err_imkey: + ptr = qp->outputs->addr; + rte_free(ptr); + mlx5_glue->dereg_mr(qp->outputs); err_output: ptr = qp->metadata->addr; rte_free(ptr); @@ -417,23 +691,57 @@ int mlx5_regexdev_setup_fastpath(struct mlx5_regex_priv *priv, uint32_t qp_id) { struct mlx5_regex_qp *qp = &priv->qps[qp_id]; - int err; + struct mlx5_klm klm = { 0 }; + struct mlx5_devx_mkey_attr attr = { + .klm_array = &klm, + .klm_num = 1, + .umr_en = 1, + }; + uint32_t i; + int err = 0; qp->jobs = rte_calloc(__func__, qp->nb_desc, sizeof(*qp->jobs), 64); if (!qp->jobs) return -ENOMEM; - err = setup_buffers(qp, priv->pd); + err = setup_buffers(priv, qp); if (err) { rte_free(qp->jobs); return err; } - setup_sqs(qp); - return 0; + + setup_sqs(priv, qp); + + if (priv->has_umr) { +#ifdef HAVE_IBV_FLOW_DV_SUPPORT + if (regex_get_pdn(priv->pd, &attr.pd)) { + err = -rte_errno; + DRV_LOG(ERR, "Failed to get pdn."); + mlx5_regexdev_teardown_fastpath(priv, qp_id); + return err; + } +#endif + for (i = 0; i < qp->nb_desc; i++) { + attr.klm_num = MLX5_REGEX_MAX_KLM_NUM; + attr.klm_array = qp->jobs[i].imkey_array; + qp->jobs[i].imkey = mlx5_devx_cmd_mkey_create(priv->ctx, + &attr); + if (!qp->jobs[i].imkey) { + err = -rte_errno; + DRV_LOG(ERR, "Failed to allocate imkey."); + mlx5_regexdev_teardown_fastpath(priv, qp_id); + } + } + } + return err; } static void free_buffers(struct mlx5_regex_qp *qp) { + if (qp->imkey_addr) { + mlx5_glue->dereg_mr(qp->imkey_addr); + rte_free(qp->imkey_addr->addr); + } if (qp->metadata) { mlx5_glue->dereg_mr(qp->metadata); rte_free(qp->metadata->addr); @@ -448,8 +756,14 @@ void mlx5_regexdev_teardown_fastpath(struct mlx5_regex_priv *priv, uint32_t qp_id) { struct mlx5_regex_qp *qp = &priv->qps[qp_id]; + uint32_t i; if (qp) { + for (i = 0; i < qp->nb_desc; i++) { + if (qp->jobs[i].imkey) + claim_zero(mlx5_devx_cmd_destroy + (qp->jobs[i].imkey)); + } free_buffers(qp); if (qp->jobs) rte_free(qp->jobs); From patchwork Wed Mar 31 07:37:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Suanming Mou X-Patchwork-Id: 90186 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 74420A034F; Wed, 31 Mar 2021 09:38:42 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 36496140E10; Wed, 31 Mar 2021 09:38:04 +0200 (CEST) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by mails.dpdk.org (Postfix) with ESMTP id 71AE2140E10 for ; Wed, 31 Mar 2021 09:38:00 +0200 (CEST) Received: from Internal Mail-Server by MTLPINE1 (envelope-from suanmingm@nvidia.com) with SMTP; 31 Mar 2021 10:37:57 +0300 Received: from nvidia.com (mtbc-r640-03.mtbc.labs.mlnx [10.75.70.8]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 12V7boeH002108; Wed, 31 Mar 2021 10:37:56 +0300 From: Suanming Mou To: orika@nvidia.com Cc: dev@dpdk.org, viacheslavo@nvidia.com, matan@nvidia.com, rasland@nvidia.com Date: Wed, 31 Mar 2021 10:37:48 +0300 Message-Id: <20210331073749.1382377-4-suanmingm@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210331073749.1382377-1-suanmingm@nvidia.com> References: <20210309235732.3952418-1-suanmingm@nvidia.com> <20210331073749.1382377-1-suanmingm@nvidia.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v4 3/4] app/test-regex: support scattered mbuf input X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This commits adds the scattered mbuf input support. Signed-off-by: Suanming Mou Acked-by: Ori Kam --- app/test-regex/main.c | 134 +++++++++++++++++++++++++++------ doc/guides/tools/testregex.rst | 3 + 2 files changed, 112 insertions(+), 25 deletions(-) diff --git a/app/test-regex/main.c b/app/test-regex/main.c index aea4fa6b88..82cffaacfa 100644 --- a/app/test-regex/main.c +++ b/app/test-regex/main.c @@ -35,6 +35,7 @@ enum app_args { ARG_NUM_OF_ITERATIONS, ARG_NUM_OF_QPS, ARG_NUM_OF_LCORES, + ARG_NUM_OF_MBUF_SEGS, }; struct job_ctx { @@ -70,6 +71,7 @@ struct regex_conf { char *data_buf; long data_len; long job_len; + uint32_t nb_segs; }; static void @@ -82,14 +84,15 @@ usage(const char *prog_name) " --perf N: only outputs the performance data\n" " --nb_iter N: number of iteration to run\n" " --nb_qps N: number of queues to use\n" - " --nb_lcores N: number of lcores to use\n", + " --nb_lcores N: number of lcores to use\n" + " --nb_segs N: number of mbuf segments\n", prog_name); } static void args_parse(int argc, char **argv, char *rules_file, char *data_file, uint32_t *nb_jobs, bool *perf_mode, uint32_t *nb_iterations, - uint32_t *nb_qps, uint32_t *nb_lcores) + uint32_t *nb_qps, uint32_t *nb_lcores, uint32_t *nb_segs) { char **argvopt; int opt; @@ -111,6 +114,8 @@ args_parse(int argc, char **argv, char *rules_file, char *data_file, { "nb_qps", 1, 0, ARG_NUM_OF_QPS}, /* Number of lcores. */ { "nb_lcores", 1, 0, ARG_NUM_OF_LCORES}, + /* Number of mbuf segments. */ + { "nb_segs", 1, 0, ARG_NUM_OF_MBUF_SEGS}, /* End of options */ { 0, 0, 0, 0 } }; @@ -150,6 +155,9 @@ args_parse(int argc, char **argv, char *rules_file, char *data_file, case ARG_NUM_OF_LCORES: *nb_lcores = atoi(optarg); break; + case ARG_NUM_OF_MBUF_SEGS: + *nb_segs = atoi(optarg); + break; case ARG_HELP: usage("RegEx test app"); break; @@ -302,11 +310,75 @@ extbuf_free_cb(void *addr __rte_unused, void *fcb_opaque __rte_unused) { } +static inline struct rte_mbuf * +regex_create_segmented_mbuf(struct rte_mempool *mbuf_pool, int pkt_len, + int nb_segs, void *buf) { + + struct rte_mbuf *m = NULL, *mbuf = NULL; + uint8_t *dst; + char *src = buf; + int data_len = 0; + int i, size; + int t_len; + + if (pkt_len < 1) { + printf("Packet size must be 1 or more (is %d)\n", pkt_len); + return NULL; + } + + if (nb_segs < 1) { + printf("Number of segments must be 1 or more (is %d)\n", + nb_segs); + return NULL; + } + + t_len = pkt_len >= nb_segs ? (pkt_len / nb_segs + + !!(pkt_len % nb_segs)) : 1; + size = pkt_len; + + /* Create chained mbuf_src and fill it with buf data */ + for (i = 0; size > 0; i++) { + + m = rte_pktmbuf_alloc(mbuf_pool); + if (i == 0) + mbuf = m; + + if (m == NULL) { + printf("Cannot create segment for source mbuf"); + goto fail; + } + + data_len = size > t_len ? t_len : size; + memset(rte_pktmbuf_mtod(m, uint8_t *), 0, + rte_pktmbuf_tailroom(m)); + memcpy(rte_pktmbuf_mtod(m, uint8_t *), src, data_len); + dst = (uint8_t *)rte_pktmbuf_append(m, data_len); + if (dst == NULL) { + printf("Cannot append %d bytes to the mbuf\n", + data_len); + goto fail; + } + + if (mbuf != m) + rte_pktmbuf_chain(mbuf, m); + src += data_len; + size -= data_len; + + } + return mbuf; + +fail: + if (mbuf) + rte_pktmbuf_free(mbuf); + return NULL; +} + static int run_regex(void *args) { struct regex_conf *rgxc = args; uint32_t nb_jobs = rgxc->nb_jobs; + uint32_t nb_segs = rgxc->nb_segs; uint32_t nb_iterations = rgxc->nb_iterations; uint8_t nb_max_matches = rgxc->nb_max_matches; uint32_t nb_qps = rgxc->nb_qps; @@ -338,8 +410,12 @@ run_regex(void *args) snprintf(mbuf_pool, sizeof(mbuf_pool), "mbuf_pool_%2u", qp_id_base); - mbuf_mp = rte_pktmbuf_pool_create(mbuf_pool, nb_jobs * nb_qps, 0, - 0, MBUF_SIZE, rte_socket_id()); + mbuf_mp = rte_pktmbuf_pool_create(mbuf_pool, + rte_align32pow2(nb_jobs * nb_qps * nb_segs), + 0, 0, (nb_segs == 1) ? MBUF_SIZE : + (rte_align32pow2(job_len) / nb_segs + + RTE_PKTMBUF_HEADROOM), + rte_socket_id()); if (mbuf_mp == NULL) { printf("Error, can't create memory pool\n"); return -ENOMEM; @@ -375,8 +451,19 @@ run_regex(void *args) goto end; } + if (clone_buf(data_buf, &buf, data_len)) { + printf("Error, can't clone buf.\n"); + res = -EXIT_FAILURE; + goto end; + } + + /* Assign each mbuf with the data to handle. */ + actual_jobs = 0; + pos = 0; /* Allocate the jobs and assign each job with an mbuf. */ - for (i = 0; i < nb_jobs; i++) { + for (i = 0; (pos < data_len) && (i < nb_jobs) ; i++) { + long act_job_len = RTE_MIN(job_len, data_len - pos); + ops[i] = rte_malloc(NULL, sizeof(*ops[0]) + nb_max_matches * sizeof(struct rte_regexdev_match), 0); @@ -386,30 +473,26 @@ run_regex(void *args) res = -ENOMEM; goto end; } - ops[i]->mbuf = rte_pktmbuf_alloc(mbuf_mp); + if (nb_segs > 1) { + ops[i]->mbuf = regex_create_segmented_mbuf + (mbuf_mp, act_job_len, + nb_segs, &buf[pos]); + } else { + ops[i]->mbuf = rte_pktmbuf_alloc(mbuf_mp); + if (ops[i]->mbuf) { + rte_pktmbuf_attach_extbuf(ops[i]->mbuf, + &buf[pos], 0, act_job_len, &shinfo); + ops[i]->mbuf->data_len = job_len; + ops[i]->mbuf->pkt_len = act_job_len; + } + } if (!ops[i]->mbuf) { - printf("Error, can't attach mbuf.\n"); + printf("Error, can't add mbuf.\n"); res = -ENOMEM; goto end; } - } - if (clone_buf(data_buf, &buf, data_len)) { - printf("Error, can't clone buf.\n"); - res = -EXIT_FAILURE; - goto end; - } - - /* Assign each mbuf with the data to handle. */ - actual_jobs = 0; - pos = 0; - for (i = 0; (pos < data_len) && (i < nb_jobs) ; i++) { - long act_job_len = RTE_MIN(job_len, data_len - pos); - rte_pktmbuf_attach_extbuf(ops[i]->mbuf, &buf[pos], 0, - act_job_len, &shinfo); jobs_ctx[i].mbuf = ops[i]->mbuf; - ops[i]->mbuf->data_len = job_len; - ops[i]->mbuf->pkt_len = act_job_len; ops[i]->user_id = i; ops[i]->group_id0 = 1; pos += act_job_len; @@ -612,7 +695,7 @@ main(int argc, char **argv) char *data_buf; long data_len; long job_len; - uint32_t nb_lcores = 1; + uint32_t nb_lcores = 1, nb_segs = 1; struct regex_conf *rgxc; uint32_t i; struct qps_per_lcore *qps_per_lcore; @@ -626,7 +709,7 @@ main(int argc, char **argv) if (argc > 1) args_parse(argc, argv, rules_file, data_file, &nb_jobs, &perf_mode, &nb_iterations, &nb_qps, - &nb_lcores); + &nb_lcores, &nb_segs); if (nb_qps == 0) rte_exit(EXIT_FAILURE, "Number of QPs must be greater than 0\n"); @@ -656,6 +739,7 @@ main(int argc, char **argv) for (i = 0; i < nb_lcores; i++) { rgxc[i] = (struct regex_conf){ .nb_jobs = nb_jobs, + .nb_segs = nb_segs, .perf_mode = perf_mode, .nb_iterations = nb_iterations, .nb_max_matches = nb_max_matches, diff --git a/doc/guides/tools/testregex.rst b/doc/guides/tools/testregex.rst index a59acd919f..cdb1ffd6ee 100644 --- a/doc/guides/tools/testregex.rst +++ b/doc/guides/tools/testregex.rst @@ -68,6 +68,9 @@ Application Options ``--nb_iter N`` number of iteration to run +``--nb_segs N`` + number of mbuf segment + ``--help`` print application options From patchwork Wed Mar 31 07:37:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Suanming Mou X-Patchwork-Id: 90188 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 9D12AA034F; Wed, 31 Mar 2021 09:38:56 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id C2383140E32; Wed, 31 Mar 2021 09:38:06 +0200 (CEST) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by mails.dpdk.org (Postfix) with ESMTP id 749D5140E12 for ; Wed, 31 Mar 2021 09:38:00 +0200 (CEST) Received: from Internal Mail-Server by MTLPINE1 (envelope-from suanmingm@nvidia.com) with SMTP; 31 Mar 2021 10:37:59 +0300 Received: from nvidia.com (mtbc-r640-03.mtbc.labs.mlnx [10.75.70.8]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 12V7boeI002108; Wed, 31 Mar 2021 10:37:57 +0300 From: Suanming Mou To: orika@nvidia.com Cc: dev@dpdk.org, viacheslavo@nvidia.com, matan@nvidia.com, rasland@nvidia.com, John Hurley Date: Wed, 31 Mar 2021 10:37:49 +0300 Message-Id: <20210331073749.1382377-5-suanmingm@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210331073749.1382377-1-suanmingm@nvidia.com> References: <20210309235732.3952418-1-suanmingm@nvidia.com> <20210331073749.1382377-1-suanmingm@nvidia.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v4 4/4] regex/mlx5: prevent wrong calculation of free sqs in umr mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: John Hurley A recent change adds support for scattered mbuf and UMR support for regex. Part of this commit makes the pi and ci counters of the regex_sq a quarter of the length in non umr mode, effectively moving them from 16 bits to 14. The new get_free method casts the difference in pi and ci to a 16 bit value when calculating the free send queues, accounting for any wrapping when pi has looped back to 0 but ci has not yet. However, the move to 14 bits while still casting to 16 can now lead to corrupted, large values returned. Modify the get_free function to take in the has_umr flag and, accordingly, account for wrapping on either 14 or 16 bit pi/ci difference. Fixes: 017f097021a6 ("regex/mlx5: add data path scattered mbuf process") Signed-off-by: John Hurley Acked-by: Ori Kam --- drivers/regex/mlx5/mlx5_regex_fastpath.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/regex/mlx5/mlx5_regex_fastpath.c b/drivers/regex/mlx5/mlx5_regex_fastpath.c index 4f9402c583..b57e7d7794 100644 --- a/drivers/regex/mlx5/mlx5_regex_fastpath.c +++ b/drivers/regex/mlx5/mlx5_regex_fastpath.c @@ -192,8 +192,10 @@ send_doorbell(struct mlx5_regex_priv *priv, struct mlx5_regex_sq *sq) } static inline int -get_free(struct mlx5_regex_sq *sq) { - return (sq_size_get(sq) - (uint16_t)(sq->pi - sq->ci)); +get_free(struct mlx5_regex_sq *sq, uint8_t has_umr) { + return (sq_size_get(sq) - ((sq->pi - sq->ci) & + (has_umr ? (MLX5_REGEX_MAX_WQE_INDEX >> 2) : + MLX5_REGEX_MAX_WQE_INDEX))); } static inline uint32_t @@ -385,7 +387,7 @@ mlx5_regexdev_enqueue_gga(struct rte_regexdev *dev, uint16_t qp_id, while ((sqid = ffs(queue->free_sqs))) { sqid--; /* ffs returns 1 for bit 0 */ sq = &queue->sqs[sqid]; - nb_desc = get_free(sq); + nb_desc = get_free(sq, priv->has_umr); if (nb_desc) { /* The ops be handled can't exceed nb_ops. */ if (nb_desc > nb_left) @@ -418,7 +420,7 @@ mlx5_regexdev_enqueue(struct rte_regexdev *dev, uint16_t qp_id, while ((sqid = ffs(queue->free_sqs))) { sqid--; /* ffs returns 1 for bit 0 */ sq = &queue->sqs[sqid]; - while (get_free(sq)) { + while (get_free(sq, priv->has_umr)) { job_id = job_id_get(sqid, sq_size_get(sq), sq->pi); prep_one(priv, queue, sq, ops[i], &queue->jobs[job_id]); i++;