From patchwork Mon Mar 25 19:22:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongseok Koh X-Patchwork-Id: 51662 X-Patchwork-Delegate: shahafs@mellanox.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 429E04C93; Mon, 25 Mar 2019 20:22:49 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id 199C437AF for ; Mon, 25 Mar 2019 20:22:46 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from yskoh@mellanox.com) with ESMTPS (AES256-SHA encrypted); 25 Mar 2019 21:22:45 +0200 Received: from scfae-sc-2.mti.labs.mlnx (scfae-sc-2.mti.labs.mlnx [10.101.0.96]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2PJMfrE027389; Mon, 25 Mar 2019 21:22:43 +0200 From: Yongseok Koh To: shahafs@mellanox.com Cc: dev@dpdk.org, stable@dpdk.org Date: Mon, 25 Mar 2019 12:22:33 -0700 Message-Id: <20190325192238.20940-2-yskoh@mellanox.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190325192238.20940-1-yskoh@mellanox.com> References: <20190307074151.18815-1-yskoh@mellanox.com> <20190325192238.20940-1-yskoh@mellanox.com> Subject: [dpdk-dev] [PATCH v2 1/6] net/mlx: remove debug messages on datapath X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Cc: stable@dpdk.org Signed-off-by: Yongseok Koh Acked-by: Shahaf Shuler --- drivers/net/mlx4/mlx4_mr.c | 4 ---- drivers/net/mlx5/mlx5_mr.c | 6 ------ 2 files changed, 10 deletions(-) diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c index 01894faecf..0ba55fda04 100644 --- a/drivers/net/mlx4/mlx4_mr.c +++ b/drivers/net/mlx4/mlx4_mr.c @@ -1039,8 +1039,6 @@ mlx4_rx_addr2mr_bh(struct rxq *rxq, uintptr_t addr) struct mlx4_mr_ctrl *mr_ctrl = &rxq->mr_ctrl; struct mlx4_priv *priv = rxq->priv; - DEBUG("Rx queue %u: miss on top-half, mru=%u, head=%u, addr=%p", - rxq->stats.idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr); return mlx4_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr); } @@ -1061,8 +1059,6 @@ mlx4_tx_addr2mr_bh(struct txq *txq, uintptr_t addr) struct mlx4_mr_ctrl *mr_ctrl = &txq->mr_ctrl; struct mlx4_priv *priv = txq->priv; - DEBUG("Tx queue %u: miss on top-half, mru=%u, head=%u, addr=%p", - txq->stats.idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr); return mlx4_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr); } diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index d336a77e40..8aaa87dd60 100644 --- a/drivers/net/mlx5/mlx5_mr.c +++ b/drivers/net/mlx5/mlx5_mr.c @@ -1030,9 +1030,6 @@ mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr) struct mlx5_mr_ctrl *mr_ctrl = &rxq->mr_ctrl; struct mlx5_priv *priv = rxq_ctrl->priv; - DRV_LOG(DEBUG, - "Rx queue %u: miss on top-half, mru=%u, head=%u, addr=%p", - rxq_ctrl->idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr); return mlx5_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr); } @@ -1055,9 +1052,6 @@ mlx5_tx_addr2mr_bh(struct mlx5_txq_data *txq, uintptr_t addr) struct mlx5_mr_ctrl *mr_ctrl = &txq->mr_ctrl; struct mlx5_priv *priv = txq_ctrl->priv; - DRV_LOG(DEBUG, - "Tx queue %u: miss on top-half, mru=%u, head=%u, addr=%p", - txq_ctrl->idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr); return mlx5_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr); 
} From patchwork Mon Mar 25 19:22:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongseok Koh X-Patchwork-Id: 51664 X-Patchwork-Delegate: shahafs@mellanox.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id DF8203195; Mon, 25 Mar 2019 20:22:58 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id 108B24C9C for ; Mon, 25 Mar 2019 20:22:51 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from yskoh@mellanox.com) with ESMTPS (AES256-SHA encrypted); 25 Mar 2019 21:22:46 +0200 Received: from scfae-sc-2.mti.labs.mlnx (scfae-sc-2.mti.labs.mlnx [10.101.0.96]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2PJMfrF027389; Mon, 25 Mar 2019 21:22:45 +0200 From: Yongseok Koh To: shahafs@mellanox.com Cc: dev@dpdk.org, stable@dpdk.org Date: Mon, 25 Mar 2019 12:22:34 -0700 Message-Id: <20190325192238.20940-3-yskoh@mellanox.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190325192238.20940-1-yskoh@mellanox.com> References: <20190307074151.18815-1-yskoh@mellanox.com> <20190325192238.20940-1-yskoh@mellanox.com> Subject: [dpdk-dev] [PATCH v2 2/6] net/mlx5: fix external memory registration X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Secondary process is not allowed to register MR due to a restriction of library and kernel driver. Fixes: 7e43a32ee060 ("net/mlx5: support externally allocated static memory") Cc: stable@dpdk.org Signed-off-by: Yongseok Koh Acked-by: Shahaf Shuler --- doc/guides/nics/mlx5.rst | 5 +++++ drivers/net/mlx5/mlx5_mr.c | 10 ++++++++++ 2 files changed, 15 insertions(+) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 0200373008..cbe3fb4c33 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -86,6 +86,11 @@ Limitations - Forked secondary process not supported. - All mempools must be initialized before rte_eth_dev_start(). + - External memory unregistered in EAL memseg list cannot be used for DMA + unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in + primary process and remapped to the same virtual address in secondary + process. If the external memory is registered by primary process but has + different virtual address in secondary process, unexpected error may happen. - Flow pattern without any specific vlan will match for vlan packets as well: diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index 8aaa87dd60..e255650add 100644 --- a/drivers/net/mlx5/mlx5_mr.c +++ b/drivers/net/mlx5/mlx5_mr.c @@ -1132,6 +1132,7 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque, struct mlx5_mr_cache entry; uint32_t lkey; + assert(rte_eal_process_type() == RTE_PROC_PRIMARY); /* If already registered, it should return. 
*/ rte_rwlock_read_lock(&priv->mr.rwlock); lkey = mr_lookup_dev(dev, &entry, addr); @@ -1233,6 +1234,15 @@ mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr, struct mlx5_mr_ctrl *mr_ctrl = &txq->mr_ctrl; struct mlx5_priv *priv = txq_ctrl->priv; + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + DRV_LOG(WARNING, + "port %u using address (%p) from unregistered mempool" + " having externally allocated memory" + " in secondary process, please create mempool" + " prior to rte_eth_dev_start()", + PORT_ID(priv), (void *)addr); + return UINT32_MAX; + } mlx5_mr_update_ext_mp(ETH_DEV(priv), mr_ctrl, mp); return mlx5_tx_addr2mr_bh(txq, addr); } From patchwork Mon Mar 25 19:22:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongseok Koh X-Patchwork-Id: 51665 X-Patchwork-Delegate: shahafs@mellanox.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 89FC94F90; Mon, 25 Mar 2019 20:23:01 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id 31A434CA6 for ; Mon, 25 Mar 2019 20:22:52 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from yskoh@mellanox.com) with ESMTPS (AES256-SHA encrypted); 25 Mar 2019 21:22:48 +0200 Received: from scfae-sc-2.mti.labs.mlnx (scfae-sc-2.mti.labs.mlnx [10.101.0.96]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2PJMfrG027389; Mon, 25 Mar 2019 21:22:47 +0200 From: Yongseok Koh To: shahafs@mellanox.com Cc: dev@dpdk.org Date: Mon, 25 Mar 2019 12:22:35 -0700 Message-Id: <20190325192238.20940-4-yskoh@mellanox.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190325192238.20940-1-yskoh@mellanox.com> References: <20190307074151.18815-1-yskoh@mellanox.com> <20190325192238.20940-1-yskoh@mellanox.com> Subject: [dpdk-dev] [PATCH v2 3/6] net/mlx5: add control of excessive memory pinning by kernel X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" A new PMD parameter (mr_ext_memseg_en) is added to control extension of memseg when creating a MR. It is enabled by default. If enabled, mlx5_mr_create() tries to maximize the range of MR registration so that the LKey lookup tables on datapath become smaller and get the best performance. However, it may worsen memory utilization because registered memory is pinned by kernel driver. Even if a page in the extended chunk is freed, that doesn't become reusable until the entire memory is freed and the MR is destroyed. To make freed pages available immediately, this parameter has to be turned off but it could drop performance. Signed-off-by: Yongseok Koh Acked-by: Shahaf Shuler --- doc/guides/nics/mlx5.rst | 11 +++++++++++ drivers/net/mlx5/mlx5.c | 7 +++++++ drivers/net/mlx5/mlx5.h | 2 ++ drivers/net/mlx5/mlx5_mr.c | 21 ++++++++++++++++----- 4 files changed, 36 insertions(+), 5 deletions(-) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index cbe3fb4c33..d9ae91dfc1 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -485,6 +485,17 @@ Run-time configuration Disabled by default. +- ``mr_ext_memseg_en`` parameter [int] + + A nonzero value enables extending memseg when registering DMA memory. 
If + enabled, the number of entries in MR (Memory Region) lookup table on datapath + is minimized and it benefits performance. On the other hand, it worsens memory + utilization because registered memory is pinned by kernel driver. Even if a + page in the extended chunk is freed, that doesn't become reusable until the + entire memory is freed. + + Enabled by default. + - ``representor`` parameter [list] This parameter can be used to instantiate DPDK Ethernet devices from diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 840cd3d307..93c0fc8c20 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -108,6 +108,9 @@ /* Activate Netlink support in VF mode. */ #define MLX5_VF_NL_EN "vf_nl_en" +/* Enable extending memsegs when creating a MR. */ +#define MLX5_MR_EXT_MEMSEG_EN "mr_ext_memseg_en" + /* Select port representors to instantiate. */ #define MLX5_REPRESENTOR "representor" @@ -569,6 +572,8 @@ mlx5_args_check(const char *key, const char *val, void *opaque) config->vf_nl_en = !!tmp; } else if (strcmp(MLX5_DV_FLOW_EN, key) == 0) { config->dv_flow_en = !!tmp; + } else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) { + config->mr_ext_memseg_en = !!tmp; } else { DRV_LOG(WARNING, "%s: unknown parameter", key); rte_errno = EINVAL; @@ -610,6 +615,7 @@ mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs) MLX5_L3_VXLAN_EN, MLX5_VF_NL_EN, MLX5_DV_FLOW_EN, + MLX5_MR_EXT_MEMSEG_EN, MLX5_REPRESENTOR, NULL, }; @@ -1588,6 +1594,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, .txqs_vec = MLX5_ARG_UNSET, .inline_max_packet_sz = MLX5_ARG_UNSET, .vf_nl_en = 1, + .mr_ext_memseg_en = 1, .mprq = { .enabled = 0, /* Disabled by default. */ .stride_num_n = MLX5_MPRQ_STRIDE_NUM_N, diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index d8a5162bdb..37c8cd1d34 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -167,6 +167,8 @@ struct mlx5_dev_config { unsigned int tx_vec_en:1; /* Tx vector is enabled. */ unsigned int rx_vec_en:1; /* Rx vector is enabled. */ unsigned int mpw_hdr_dseg:1; /* Enable DSEGs in the title WQEBB. */ + unsigned int mr_ext_memseg_en:1; + /* Whether memseg should be extended for MR creation. */ unsigned int l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */ unsigned int vf_nl_en:1; /* Enable Netlink requests in VF mode. */ unsigned int dv_flow_en:1; /* Enable DV flow. */ diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index e255650add..e9eda975ff 100644 --- a/drivers/net/mlx5/mlx5_mr.c +++ b/drivers/net/mlx5/mlx5_mr.c @@ -534,6 +534,7 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry, uintptr_t addr) { struct mlx5_priv *priv = dev->data->dev_private; + struct mlx5_dev_config *config = &priv->config; struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; const struct rte_memseg_list *msl; const struct rte_memseg *ms; @@ -569,14 +570,24 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry, */ mlx5_mr_garbage_collect(dev); /* - * Find out a contiguous virtual address chunk in use, to which the - * given address belongs, in order to register maximum range. In the - * best case where mempools are not dynamically recreated and + * If enabled, find out a contiguous virtual address chunk in use, to + * which the given address belongs, in order to register maximum range. 
+ * In the best case where mempools are not dynamically recreated and * '--socket-mem' is specified as an EAL option, it is very likely to * have only one MR(LKey) per a socket and per a hugepage-size even - * though the system memory is highly fragmented. + * though the system memory is highly fragmented. As the whole memory + * chunk will be pinned by kernel, it can't be reused unless entire + * chunk is freed from EAL. + * + * If disabled, just register one memseg (page). Then, memory + * consumption will be minimized but it may drop performance if there + * are many MRs to lookup on the datapath. */ - if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) { + if (!config->mr_ext_memseg_en) { + data.msl = rte_mem_virt2memseg_list((void *)addr); + data.start = RTE_ALIGN_FLOOR(addr, data.msl->page_sz); + data.end = data.start + data.msl->page_sz; + } else if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) { DRV_LOG(WARNING, "port %u unable to find virtually contiguous" " chunk for address (%p)." From patchwork Mon Mar 25 19:22:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongseok Koh X-Patchwork-Id: 51666 X-Patchwork-Delegate: shahafs@mellanox.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 76F5C5688; Mon, 25 Mar 2019 20:23:04 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id 449184CB5 for ; Mon, 25 Mar 2019 20:22:52 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from yskoh@mellanox.com) with ESMTPS (AES256-SHA encrypted); 25 Mar 2019 21:22:49 +0200 Received: from scfae-sc-2.mti.labs.mlnx (scfae-sc-2.mti.labs.mlnx [10.101.0.96]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2PJMfrH027389; Mon, 25 Mar 2019 21:22:48 +0200 From: Yongseok Koh To: shahafs@mellanox.com Cc: dev@dpdk.org Date: Mon, 25 Mar 2019 12:22:36 -0700 Message-Id: <20190325192238.20940-5-yskoh@mellanox.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190325192238.20940-1-yskoh@mellanox.com> References: <20190307074151.18815-1-yskoh@mellanox.com> <20190325192238.20940-1-yskoh@mellanox.com> Subject: [dpdk-dev] [PATCH v2 4/6] net/mlx5: enable secondary process to register DMA memory X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The Memory Region (MR) for DMA memory can't be created from secondary process due to lib/driver limitation. Whenever it is needed, secondary process can make a request to primary process through the EAL IPC channel (rte_mp_msg) which is established on initialization. Once a MR is created by primary process, it is immediately visible to secondary process because the MR list is global per a device. Thus, secondary process can look up the list after the request is successfully returned. 
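For illustration only (not part of this patch), a minimal application-level sketch of what this enables: a secondary process creates a mempool after the primary has already probed and started the port, and the MR is created transparently on the first LKey miss via the EAL IPC channel. Port number, pool name, sizes and the lcore/queue setup below are hypothetical assumptions.

#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

int
main(int argc, char **argv)
{
	struct rte_mempool *mp;
	struct rte_mbuf *m;
	uint16_t nb;

	/* Launched with --proc-type=secondary alongside the primary. */
	if (rte_eal_init(argc, argv) < 0)
		return -1;
	/* Mempool created in the secondary, after rte_eth_dev_start(). */
	mp = rte_pktmbuf_pool_create("sec_pool", 8192, 256, 0,
				     RTE_MBUF_DEFAULT_BUF_SIZE,
				     rte_socket_id());
	if (mp == NULL)
		return -1;
	m = rte_pktmbuf_alloc(mp);
	if (m == NULL)
		return -1;
	rte_pktmbuf_append(m, 64); /* dummy payload */
	/*
	 * The first transmit hits the LKey bottom-half; the PMD sends an
	 * MLX5_MP_REQ_CREATE_MR request to the primary process and then
	 * resolves the LKey from the shared global MR list.
	 */
	nb = rte_eth_tx_burst(0, 0, &m, 1);
	if (nb == 0)
		rte_pktmbuf_free(m);
	return 0;
}

Before this series, the same sequence would fail with EPERM because the secondary process could not register the MR itself.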
Signed-off-by: Yongseok Koh Acked-by: Shahaf Shuler --- doc/guides/nics/mlx5.rst | 1 - drivers/net/mlx5/mlx5.h | 6 +++ drivers/net/mlx5/mlx5_mp.c | 50 ++++++++++++++++++++++++ drivers/net/mlx5/mlx5_mr.c | 96 ++++++++++++++++++++++++++++++++++++++++------ drivers/net/mlx5/mlx5_mr.h | 2 + 5 files changed, 142 insertions(+), 13 deletions(-) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index d9ae91dfc1..d793068b51 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -85,7 +85,6 @@ Limitations - For secondary process: - Forked secondary process not supported. - - All mempools must be initialized before rte_eth_dev_start(). - External memory unregistered in EAL memseg list cannot be used for DMA unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in primary process and remapped to the same virtual address in secondary diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 37c8cd1d34..410f17ab53 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -59,6 +59,7 @@ enum { /* Request types for IPC. */ enum mlx5_mp_req_type { MLX5_MP_REQ_VERBS_CMD_FD = 1, + MLX5_MP_REQ_CREATE_MR, MLX5_MP_REQ_START_RXTX, MLX5_MP_REQ_STOP_RXTX, }; @@ -68,6 +69,10 @@ struct mlx5_mp_param { enum mlx5_mp_req_type type; int port_id; int result; + RTE_STD_C11 + union { + uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */ + } args; }; /** Request timeout for IPC. */ @@ -437,6 +442,7 @@ void mlx5_flow_delete_drop_queue(struct rte_eth_dev *dev); /* mlx5_mp.c */ void mlx5_mp_req_start_rxtx(struct rte_eth_dev *dev); void mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev); +int mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr); int mlx5_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev); void mlx5_mp_init_primary(void); void mlx5_mp_uninit_primary(void); diff --git a/drivers/net/mlx5/mlx5_mp.c b/drivers/net/mlx5/mlx5_mp.c index 657ab6872e..7274a33b50 100644 --- a/drivers/net/mlx5/mlx5_mp.c +++ b/drivers/net/mlx5/mlx5_mp.c @@ -58,6 +58,8 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer) (const struct mlx5_mp_param *)mp_msg->param; struct rte_eth_dev *dev; struct mlx5_priv *priv; + struct mlx5_mr_cache entry; + uint32_t lkey; int ret; assert(rte_eal_process_type() == RTE_PROC_PRIMARY); @@ -69,6 +71,13 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer) dev = &rte_eth_devices[param->port_id]; priv = dev->data->dev_private; switch (param->type) { + case MLX5_MP_REQ_CREATE_MR: + mp_init_msg(dev, &mp_res, param->type); + lkey = mlx5_mr_create_primary(dev, &entry, param->args.addr); + if (lkey == UINT32_MAX) + res->result = -rte_errno; + ret = rte_mp_reply(&mp_res, peer); + break; case MLX5_MP_REQ_VERBS_CMD_FD: mp_init_msg(dev, &mp_res, param->type); mp_res.num_fds = 1; @@ -221,6 +230,47 @@ mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev) } /** + * Request Memory Region creation to the primary process. + * + * @param[in] dev + * Pointer to Ethernet structure. + * @param addr + * Target virtual address to register. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. 
+ */ +int +mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr) +{ + struct rte_mp_msg mp_req; + struct rte_mp_msg *mp_res; + struct rte_mp_reply mp_rep; + struct mlx5_mp_param *req = (struct mlx5_mp_param *)mp_req.param; + struct mlx5_mp_param *res; + struct timespec ts = {.tv_sec = MLX5_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0}; + int ret; + + assert(rte_eal_process_type() == RTE_PROC_SECONDARY); + mp_init_msg(dev, &mp_req, MLX5_MP_REQ_CREATE_MR); + req->args.addr = addr; + ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts); + if (ret) { + DRV_LOG(ERR, "port %u request to primary process failed", + dev->data->port_id); + return -rte_errno; + } + assert(mp_rep.nb_received == 1); + mp_res = &mp_rep.msgs[0]; + res = (struct mlx5_mp_param *)mp_res->param; + ret = res->result; + if (ret) + rte_errno = -ret; + free(mp_rep.msgs); + return ret; +} + +/** * Request Verbs command file descriptor for mmap to the primary process. * * @param[in] dev diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index e9eda975ff..576a3c298b 100644 --- a/drivers/net/mlx5/mlx5_mr.c +++ b/drivers/net/mlx5/mlx5_mr.c @@ -516,7 +516,10 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl, /** * Create a new global Memroy Region (MR) for a missing virtual address. - * Register entire virtually contiguous memory chunk around the address. + * This API should be called on a secondary process, then a request is sent to + * the primary process in order to create a MR for the address. As the global MR + * list is on the shared memory, following LKey lookup should succeed unless the + * request fails. * * @param dev * Pointer to Ethernet device. @@ -530,8 +533,52 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl, * Searched LKey on success, UINT32_MAX on failure and rte_errno is set. */ static uint32_t -mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry, - uintptr_t addr) +mlx5_mr_create_secondary(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry, + uintptr_t addr) +{ + struct mlx5_priv *priv = dev->data->dev_private; + int ret; + + DEBUG("port %u requesting MR creation for address (%p)", + dev->data->port_id, (void *)addr); + ret = mlx5_mp_req_mr_create(dev, addr); + if (ret) { + DEBUG("port %u fail to request MR creation for address (%p)", + dev->data->port_id, (void *)addr); + return UINT32_MAX; + } + rte_rwlock_read_lock(&priv->mr.rwlock); + /* Fill in output data. */ + mr_lookup_dev(dev, entry, addr); + /* Lookup can't fail. */ + assert(entry->lkey != UINT32_MAX); + rte_rwlock_read_unlock(&priv->mr.rwlock); + DEBUG("port %u MR CREATED by primary process for %p:\n" + " [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x", + dev->data->port_id, (void *)addr, + entry->start, entry->end, entry->lkey); + return entry->lkey; +} + +/** + * Create a new global Memroy Region (MR) for a missing virtual address. + * Register entire virtually contiguous memory chunk around the address. + * This must be called from the primary process. + * + * @param dev + * Pointer to Ethernet device. + * @param[out] entry + * Pointer to returning MR cache entry, found in the global cache or newly + * created. If failed to create one, this will not be updated. + * @param addr + * Target virtual address to register. + * + * @return + * Searched LKey on success, UINT32_MAX on failure and rte_errno is set. 
+ */ +uint32_t +mlx5_mr_create_primary(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry, + uintptr_t addr) { struct mlx5_priv *priv = dev->data->dev_private; struct mlx5_dev_config *config = &priv->config; @@ -552,15 +599,6 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry, DRV_LOG(DEBUG, "port %u creating a MR using address (%p)", dev->data->port_id, (void *)addr); - if (rte_eal_process_type() != RTE_PROC_PRIMARY) { - DRV_LOG(WARNING, - "port %u using address (%p) of unregistered mempool" - " in secondary process, please create mempool" - " before rte_eth_dev_start()", - dev->data->port_id, (void *)addr); - rte_errno = EPERM; - goto err_nolock; - } /* * Release detached MRs if any. This can't be called with holding either * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have @@ -772,6 +810,40 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry, } /** + * Create a new global Memroy Region (MR) for a missing virtual address. + * This can be called from primary and secondary process. + * + * @param dev + * Pointer to Ethernet device. + * @param[out] entry + * Pointer to returning MR cache entry, found in the global cache or newly + * created. If failed to create one, this will not be updated. + * @param addr + * Target virtual address to register. + * + * @return + * Searched LKey on success, UINT32_MAX on failure and rte_errno is set. + */ +static uint32_t +mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry, + uintptr_t addr) +{ + uint32_t ret = 0; + + switch (rte_eal_process_type()) { + case RTE_PROC_PRIMARY: + ret = mlx5_mr_create_primary(dev, entry, addr); + break; + case RTE_PROC_SECONDARY: + ret = mlx5_mr_create_secondary(dev, entry, addr); + break; + default: + break; + } + return ret; +} + +/** * Rebuild the global B-tree cache of device from the original MR list. 
* * @param dev diff --git a/drivers/net/mlx5/mlx5_mr.h b/drivers/net/mlx5/mlx5_mr.h index a57003fe92..786f6a3148 100644 --- a/drivers/net/mlx5/mlx5_mr.h +++ b/drivers/net/mlx5/mlx5_mr.h @@ -70,6 +70,8 @@ extern rte_rwlock_t mlx5_mem_event_rwlock; int mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket); void mlx5_mr_btree_free(struct mlx5_mr_btree *bt); +uint32_t mlx5_mr_create_primary(struct rte_eth_dev *dev, + struct mlx5_mr_cache *entry, uintptr_t addr); void mlx5_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr, size_t len, void *arg); int mlx5_mr_update_mp(struct rte_eth_dev *dev, struct mlx5_mr_ctrl *mr_ctrl, From patchwork Mon Mar 25 19:22:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongseok Koh X-Patchwork-Id: 51667 X-Patchwork-Delegate: shahafs@mellanox.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id AE1C758FE; Mon, 25 Mar 2019 20:23:06 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id 567F54CBD for ; Mon, 25 Mar 2019 20:22:52 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from yskoh@mellanox.com) with ESMTPS (AES256-SHA encrypted); 25 Mar 2019 21:22:51 +0200 Received: from scfae-sc-2.mti.labs.mlnx (scfae-sc-2.mti.labs.mlnx [10.101.0.96]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2PJMfrI027389; Mon, 25 Mar 2019 21:22:50 +0200 From: Yongseok Koh To: shahafs@mellanox.com Cc: dev@dpdk.org Date: Mon, 25 Mar 2019 12:22:37 -0700 Message-Id: <20190325192238.20940-6-yskoh@mellanox.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190325192238.20940-1-yskoh@mellanox.com> References: <20190307074151.18815-1-yskoh@mellanox.com> <20190325192238.20940-1-yskoh@mellanox.com> Subject: [dpdk-dev] [PATCH v2 5/6] net/mlx4: add control of excessive memory pinning by kernel X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" A new PMD parameter (mr_ext_memseg_en) is added to control extension of memseg when creating a MR. It is enabled by default. If enabled, mlx4_mr_create() tries to maximize the range of MR registration so that the LKey lookup tables on datapath become smaller and get the best performance. However, it may worsen memory utilization because registered memory is pinned by kernel driver. Even if a page in the extended chunk is freed, that doesn't become reusable until the entire memory is freed and the MR is destroyed. To make freed pages available immediately, this parameter has to be turned off but it could drop performance. Signed-off-by: Yongseok Koh Acked-by: Shahaf Shuler --- doc/guides/nics/mlx4.rst | 11 +++++++++++ drivers/net/mlx4/mlx4.c | 11 +++++++++-- drivers/net/mlx4/mlx4.h | 5 +++++ drivers/net/mlx4/mlx4_mr.c | 20 +++++++++++++++----- 4 files changed, 40 insertions(+), 7 deletions(-) diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst index cd34838f41..c8a02be4dd 100644 --- a/doc/guides/nics/mlx4.rst +++ b/doc/guides/nics/mlx4.rst @@ -119,6 +119,17 @@ Run-time configuration times for additional ports. All ports are probed by default if left unspecified. 
+- ``mr_ext_memseg_en`` parameter [int] + + A nonzero value enables extending memseg when registering DMA memory. If + enabled, the number of entries in MR (Memory Region) lookup table on datapath + is minimized and it benefits performance. On the other hand, it worsens memory + utilization because registered memory is pinned by kernel driver. Even if a + page in the extended chunk is freed, that doesn't become reusable until the + entire memory is freed. + + Enabled by default. + Kernel module parameters ~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c index a5cfcdbee3..d913c2a47e 100644 --- a/drivers/net/mlx4/mlx4.c +++ b/drivers/net/mlx4/mlx4.c @@ -71,11 +71,14 @@ struct mlx4_conf { uint32_t present; /**< Bit-field for existing ports. */ uint32_t enabled; /**< Bit-field for user-enabled ports. */ } ports; + int mr_ext_memseg_en; + /** Whether memseg should be extended for MR creation. */ }; /* Available parameters list. */ const char *pmd_mlx4_init_params[] = { MLX4_PMD_PORT_KVARG, + MLX4_MR_EXT_MEMSEG_EN_KVARG, NULL, }; @@ -516,6 +519,8 @@ mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf) return -rte_errno; } conf->ports.enabled |= 1 << tmp; + } else if (strcmp(MLX4_MR_EXT_MEMSEG_EN_KVARG, key) == 0) { + conf->mr_ext_memseg_en = !!tmp; } else { rte_errno = EINVAL; WARN("%s: unknown parameter", key); @@ -551,10 +556,10 @@ mlx4_args(struct rte_devargs *devargs, struct mlx4_conf *conf) } /* Process parameters. */ for (i = 0; pmd_mlx4_init_params[i]; ++i) { - arg_count = rte_kvargs_count(kvlist, MLX4_PMD_PORT_KVARG); + arg_count = rte_kvargs_count(kvlist, pmd_mlx4_init_params[i]); while (arg_count-- > 0) { ret = rte_kvargs_process(kvlist, - MLX4_PMD_PORT_KVARG, + pmd_mlx4_init_params[i], (int (*)(const char *, const char *, void *)) @@ -883,6 +888,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) struct ibv_device_attr_ex device_attr_ex; struct mlx4_conf conf = { .ports.present = 0, + .mr_ext_memseg_en = 1, }; unsigned int vf; int i; @@ -1100,6 +1106,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) device_attr_ex.tso_caps.max_tso; DEBUG("TSO is %ssupported", priv->tso ? "" : "not "); + priv->mr_ext_memseg_en = conf.mr_ext_memseg_en; /* Configure the first MAC address by default. */ err = mlx4_get_mac(priv, &mac.addr_bytes); if (err) { diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 832edc962d..5316c51247 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -53,6 +53,9 @@ /** Port parameter. */ #define MLX4_PMD_PORT_KVARG "port" +/** Enable extending memsegs when creating a MR. */ +#define MLX4_MR_EXT_MEMSEG_EN_KVARG "mr_ext_memseg_en" + /* Reserved address space for UAR mapping. */ #define MLX4_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4)) @@ -164,6 +167,8 @@ struct mlx4_priv { uint32_t hw_csum_l2tun:1; /**< Checksum support for L2 tunnels. */ uint32_t hw_fcs_strip:1; /**< FCS stripping toggling is supported. */ uint32_t tso:1; /**< Transmit segmentation offload is supported. */ + uint32_t mr_ext_memseg_en:1; + /** Whether memseg should be extended for MR creation. */ uint32_t tso_max_payload_sz; /**< Max supported TSO payload size. */ uint32_t hw_rss_max_qps; /**< Max Rx Queues supported by RSS. */ uint64_t hw_rss_sup; /**< Supported RSS hash fields (Verbs format). 
*/ diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c index 0ba55fda04..6db917a092 100644 --- a/drivers/net/mlx4/mlx4_mr.c +++ b/drivers/net/mlx4/mlx4_mr.c @@ -580,14 +580,24 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry, */ mlx4_mr_garbage_collect(dev); /* - * Find out a contiguous virtual address chunk in use, to which the - * given address belongs, in order to register maximum range. In the - * best case where mempools are not dynamically recreated and + * If enabled, find out a contiguous virtual address chunk in use, to + * which the given address belongs, in order to register maximum range. + * In the best case where mempools are not dynamically recreated and * '--socket-mem' is specified as an EAL option, it is very likely to * have only one MR(LKey) per a socket and per a hugepage-size even - * though the system memory is highly fragmented. + * though the system memory is highly fragmented. As the whole memory + * chunk will be pinned by kernel, it can't be reused unless entire + * chunk is freed from EAL. + * + * If disabled, just register one memseg (page). Then, memory + * consumption will be minimized but it may drop performance if there + * are many MRs to lookup on the datapath. */ - if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) { + if (!priv->mr_ext_memseg_en) { + data.msl = rte_mem_virt2memseg_list((void *)addr); + data.start = RTE_ALIGN_FLOOR(addr, data.msl->page_sz); + data.end = data.start + data.msl->page_sz; + } else if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) { WARN("port %u unable to find virtually contiguous" " chunk for address (%p)." " rte_memseg_contig_walk() failed.", From patchwork Mon Mar 25 19:22:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongseok Koh X-Patchwork-Id: 51668 X-Patchwork-Delegate: shahafs@mellanox.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 8A94D5A44; Mon, 25 Mar 2019 20:23:09 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id 1F6753772 for ; Mon, 25 Mar 2019 20:22:57 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from yskoh@mellanox.com) with ESMTPS (AES256-SHA encrypted); 25 Mar 2019 21:22:52 +0200 Received: from scfae-sc-2.mti.labs.mlnx (scfae-sc-2.mti.labs.mlnx [10.101.0.96]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2PJMfrJ027389; Mon, 25 Mar 2019 21:22:51 +0200 From: Yongseok Koh To: shahafs@mellanox.com Cc: dev@dpdk.org Date: Mon, 25 Mar 2019 12:22:38 -0700 Message-Id: <20190325192238.20940-7-yskoh@mellanox.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190325192238.20940-1-yskoh@mellanox.com> References: <20190307074151.18815-1-yskoh@mellanox.com> <20190325192238.20940-1-yskoh@mellanox.com> Subject: [dpdk-dev] [PATCH v2 6/6] net/mlx4: enable secondary process to register DMA memory X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The Memory Region (MR) for DMA memory can't be created from secondary process due to lib/driver limitation. 
Whenever it is needed, secondary process can make a request to primary process through the EAL IPC channel (rte_mp_msg) which is established on initialization. Once a MR is created by primary process, it is immediately visible to secondary process because the MR list is global per a device. Thus, secondary process can look up the list after the request is successfully returned. Signed-off-by: Yongseok Koh Acked-by: Shahaf Shuler --- doc/guides/nics/mlx4.rst | 1 - drivers/net/mlx4/mlx4.h | 6 +++ drivers/net/mlx4/mlx4_mp.c | 50 ++++++++++++++++++++++++ drivers/net/mlx4/mlx4_mr.c | 95 ++++++++++++++++++++++++++++++++++++++++------ drivers/net/mlx4/mlx4_mr.h | 2 + 5 files changed, 142 insertions(+), 12 deletions(-) diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst index c8a02be4dd..aaf1907532 100644 --- a/doc/guides/nics/mlx4.rst +++ b/doc/guides/nics/mlx4.rst @@ -159,7 +159,6 @@ Limitations - For secondary process: - Forked secondary process not supported. - - All mempools must be initialized before rte_eth_dev_start(). - External memory unregistered in EAL memseg list cannot be used for DMA unless such memory has been registered by ``mlx4_mr_update_ext_mp()`` in primary process and remapped to the same virtual address in secondary diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 5316c51247..3881943ef0 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -79,6 +79,7 @@ enum { /* Request types for IPC. */ enum mlx4_mp_req_type { MLX4_MP_REQ_VERBS_CMD_FD = 1, + MLX4_MP_REQ_CREATE_MR, MLX4_MP_REQ_START_RXTX, MLX4_MP_REQ_STOP_RXTX, }; @@ -88,6 +89,10 @@ struct mlx4_mp_param { enum mlx4_mp_req_type type; int port_id; int result; + RTE_STD_C11 + union { + uintptr_t addr; /* MLX4_MP_REQ_CREATE_MR */ + } args; }; /** Request timeout for IPC. */ @@ -234,6 +239,7 @@ int mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx); /* mlx4_mp.c */ void mlx4_mp_req_start_rxtx(struct rte_eth_dev *dev); void mlx4_mp_req_stop_rxtx(struct rte_eth_dev *dev); +int mlx4_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr); int mlx4_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev); void mlx4_mp_init_primary(void); void mlx4_mp_uninit_primary(void); diff --git a/drivers/net/mlx4/mlx4_mp.c b/drivers/net/mlx4/mlx4_mp.c index eaeb257348..183622453c 100644 --- a/drivers/net/mlx4/mlx4_mp.c +++ b/drivers/net/mlx4/mlx4_mp.c @@ -58,6 +58,8 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer) (const struct mlx4_mp_param *)mp_msg->param; struct rte_eth_dev *dev; struct mlx4_priv *priv; + struct mlx4_mr_cache entry; + uint32_t lkey; int ret; assert(rte_eal_process_type() == RTE_PROC_PRIMARY); @@ -69,6 +71,13 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer) dev = &rte_eth_devices[param->port_id]; priv = dev->data->dev_private; switch (param->type) { + case MLX4_MP_REQ_CREATE_MR: + mp_init_msg(dev, &mp_res, param->type); + lkey = mlx4_mr_create_primary(dev, &entry, param->args.addr); + if (lkey == UINT32_MAX) + res->result = -rte_errno; + ret = rte_mp_reply(&mp_res, peer); + break; case MLX4_MP_REQ_VERBS_CMD_FD: mp_init_msg(dev, &mp_res, param->type); mp_res.num_fds = 1; @@ -218,6 +227,47 @@ mlx4_mp_req_stop_rxtx(struct rte_eth_dev *dev) } /** + * Request Memory Region creation to the primary process. + * + * @param[in] dev + * Pointer to Ethernet structure. + * @param addr + * Target virtual address to register. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. 
+ */ +int +mlx4_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr) +{ + struct rte_mp_msg mp_req; + struct rte_mp_msg *mp_res; + struct rte_mp_reply mp_rep; + struct mlx4_mp_param *req = (struct mlx4_mp_param *)mp_req.param; + struct mlx4_mp_param *res; + struct timespec ts = {.tv_sec = MLX4_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0}; + int ret; + + assert(rte_eal_process_type() == RTE_PROC_SECONDARY); + mp_init_msg(dev, &mp_req, MLX4_MP_REQ_CREATE_MR); + req->args.addr = addr; + ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts); + if (ret) { + ERROR("port %u request to primary process failed", + dev->data->port_id); + return -rte_errno; + } + assert(mp_rep.nb_received == 1); + mp_res = &mp_rep.msgs[0]; + res = (struct mlx4_mp_param *)mp_res->param; + ret = res->result; + if (ret) + rte_errno = -ret; + free(mp_rep.msgs); + return ret; +} + +/** * IPC message handler of primary process. * * @param[in] dev diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c index 6db917a092..ad7d4832f2 100644 --- a/drivers/net/mlx4/mlx4_mr.c +++ b/drivers/net/mlx4/mlx4_mr.c @@ -528,7 +528,10 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl, /** * Create a new global Memroy Region (MR) for a missing virtual address. - * Register entire virtually contiguous memory chunk around the address. + * This API should be called on a secondary process, then a request is sent to + * the primary process in order to create a MR for the address. As the global MR + * list is on the shared memory, following LKey lookup should succeed unless the + * request fails. * * @param dev * Pointer to Ethernet device. @@ -542,8 +545,52 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl, * Searched LKey on success, UINT32_MAX on failure and rte_errno is set. */ static uint32_t -mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry, - uintptr_t addr) +mlx4_mr_create_secondary(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry, + uintptr_t addr) +{ + struct mlx4_priv *priv = dev->data->dev_private; + int ret; + + DEBUG("port %u requesting MR creation for address (%p)", + dev->data->port_id, (void *)addr); + ret = mlx4_mp_req_mr_create(dev, addr); + if (ret) { + DEBUG("port %u fail to request MR creation for address (%p)", + dev->data->port_id, (void *)addr); + return UINT32_MAX; + } + rte_rwlock_read_lock(&priv->mr.rwlock); + /* Fill in output data. */ + mr_lookup_dev(dev, entry, addr); + /* Lookup can't fail. */ + assert(entry->lkey != UINT32_MAX); + rte_rwlock_read_unlock(&priv->mr.rwlock); + DEBUG("port %u MR CREATED by primary process for %p:\n" + " [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x", + dev->data->port_id, (void *)addr, + entry->start, entry->end, entry->lkey); + return entry->lkey; +} + +/** + * Create a new global Memroy Region (MR) for a missing virtual address. + * Register entire virtually contiguous memory chunk around the address. + * This must be called from the primary process. + * + * @param dev + * Pointer to Ethernet device. + * @param[out] entry + * Pointer to returning MR cache entry, found in the global cache or newly + * created. If failed to create one, this will not be updated. + * @param addr + * Target virtual address to register. + * + * @return + * Searched LKey on success, UINT32_MAX on failure and rte_errno is set. 
+ */ +uint32_t +mlx4_mr_create_primary(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry, + uintptr_t addr) { struct mlx4_priv *priv = dev->data->dev_private; struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; @@ -563,14 +610,6 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry, DEBUG("port %u creating a MR using address (%p)", dev->data->port_id, (void *)addr); - if (rte_eal_process_type() != RTE_PROC_PRIMARY) { - WARN("port %u using address (%p) of unregistered mempool" - " in secondary process, please create mempool" - " before rte_eth_dev_start()", - dev->data->port_id, (void *)addr); - rte_errno = EPERM; - goto err_nolock; - } /* * Release detached MRs if any. This can't be called with holding either * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have @@ -781,6 +820,40 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry, } /** + * Create a new global Memroy Region (MR) for a missing virtual address. + * This can be called from primary and secondary process. + * + * @param dev + * Pointer to Ethernet device. + * @param[out] entry + * Pointer to returning MR cache entry, found in the global cache or newly + * created. If failed to create one, this will not be updated. + * @param addr + * Target virtual address to register. + * + * @return + * Searched LKey on success, UINT32_MAX on failure and rte_errno is set. + */ +static uint32_t +mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry, + uintptr_t addr) +{ + uint32_t ret = 0; + + switch (rte_eal_process_type()) { + case RTE_PROC_PRIMARY: + ret = mlx4_mr_create_primary(dev, entry, addr); + break; + case RTE_PROC_SECONDARY: + ret = mlx4_mr_create_secondary(dev, entry, addr); + break; + default: + break; + } + return ret; +} + +/** * Rebuild the global B-tree cache of device from the original MR list. * * @param dev diff --git a/drivers/net/mlx4/mlx4_mr.h b/drivers/net/mlx4/mlx4_mr.h index 37a365a8b5..9d125e239d 100644 --- a/drivers/net/mlx4/mlx4_mr.h +++ b/drivers/net/mlx4/mlx4_mr.h @@ -75,6 +75,8 @@ extern rte_rwlock_t mlx4_mem_event_rwlock; int mlx4_mr_btree_init(struct mlx4_mr_btree *bt, int n, int socket); void mlx4_mr_btree_free(struct mlx4_mr_btree *bt); void mlx4_mr_btree_dump(struct mlx4_mr_btree *bt); +uint32_t mlx4_mr_create_primary(struct rte_eth_dev *dev, + struct mlx4_mr_cache *entry, uintptr_t addr); void mlx4_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr, size_t len, void *arg); int mlx4_mr_update_mp(struct rte_eth_dev *dev, struct mlx4_mr_ctrl *mr_ctrl,
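For illustration, a minimal sketch of how the mr_ext_memseg_en device argument introduced in patches 3/6 and 5/6 can be disabled so that only the page containing the missed address is registered per MR. The PCI address and the surrounding EAL arguments are hypothetical assumptions; the devarg name itself comes from this series.

#include <rte_eal.h>

int
main(void)
{
	/* Hypothetical BDF; devarg is appended after the comma. */
	char *eal_args[] = {
		"sketch",
		"-w", "0000:03:00.0,mr_ext_memseg_en=0",
	};
	int eal_argc = sizeof(eal_args) / sizeof(eal_args[0]);

	/* The PMD parses the devarg in mlx5_args()/mlx4_args() at probe time. */
	if (rte_eal_init(eal_argc, eal_args) < 0)
		return -1;
	return 0;
}

With mr_ext_memseg_en=0, freed pages become reusable immediately because only one memseg is pinned per MR, at the cost of more MR entries to look up on the datapath; with the default value of 1 the PMD registers the whole virtually contiguous chunk, as described in the commit messages above.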