From patchwork Sat May 25 09:26:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Slava Ovsiienko X-Patchwork-Id: 53697 X-Patchwork-Delegate: shahafs@mellanox.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id B7AC0343C; Sat, 25 May 2019 11:26:14 +0200 (CEST) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id AD8CAF11 for ; Sat, 25 May 2019 11:26:12 +0200 (CEST) Received: from Internal Mail-Server by MTLPINE2 (envelope-from viacheslavo@mellanox.com) with ESMTPS (AES256-SHA encrypted); 25 May 2019 12:26:11 +0300 Received: from pegasus12.mtr.labs.mlnx. (pegasus12.mtr.labs.mlnx [10.210.17.40]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x4P9QBBa005071; Sat, 25 May 2019 12:26:11 +0300 From: Viacheslav Ovsiienko To: dev@dpdk.org Cc: shahafs@mellanox.com, yskoh@mellanox.com Date: Sat, 25 May 2019 09:26:05 +0000 Message-Id: <1558776365-28511-1-git-send-email-viacheslavo@mellanox.com> X-Mailer: git-send-email 1.8.3.1 Subject: [dpdk-dev] [PATCH] net/mlx5: fix event handler uninstall X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" When device is being closed and tries to unregister interrupt callback, there is a chance the handler is still active (called in context of eal_intr_thread_main thread). If so the rte_intr_callback_unregister returns -EAGAIN and keeps the handler registered, causing crash when underlaying resourse is gone away. This race condition may happen if event handling in application takes a long time. We should check the return code of unregistering routine and try again to unregister the handler. The diagnostic messages are shown once a second, while trying to unregister. Fixes: 028b2a28c3cb ("net/mlx5: update event handler for multiport IB devices") Signed-off-by: Viacheslav Ovsiienko Acked-by: Yongseok Koh --- drivers/net/mlx5/mlx5.c | 2 +- drivers/net/mlx5/mlx5.h | 2 ++ drivers/net/mlx5/mlx5_ethdev.c | 79 +++++++++++++++++++++++++++++++++++++++--- 3 files changed, 78 insertions(+), 5 deletions(-) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 9f5ec97..2344cb4 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -313,7 +313,7 @@ struct mlx5_dev_spawn_data { **/ assert(!sh->intr_cnt); if (sh->intr_cnt) - rte_intr_callback_unregister + mlx5_intr_callback_unregister (&sh->intr_handle, mlx5_dev_interrupt_handler, sh); pthread_mutex_destroy(&sh->intr_mutex); if (sh->pd) diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 3eaaafd..5b5b93d 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -437,6 +437,8 @@ void mlx5_nl_check_switch_info(bool nun_vf_set, struct mlx5_switch_info *switch_info); void mlx5_translate_port_name(const char *port_name_in, struct mlx5_switch_info *port_info_out); +void mlx5_intr_callback_unregister(const struct rte_intr_handle *handle, + rte_intr_callback_fn cb_fn, void *cb_arg); /* mlx5_mac.c */ diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index a8a7ece..f47297c 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -1226,9 +1226,80 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size) } } +/* + * Unregister callback handler safely. The handler may be active + * while we are trying to unregister it, in this case code -EAGAIN + * is returned by rte_intr_callback_unregister(). This routine checks + * the return code and tries to unregister handler again. + * + * @param handle + * interrupt handle + * @param cb_fn + * pointer to callback routine + * @cb_arg + * opaque callback parameter + */ +void +mlx5_intr_callback_unregister(const struct rte_intr_handle *handle, + rte_intr_callback_fn cb_fn, void *cb_arg) +{ + /* + * Try to reduce timeout management overhead by not calling + * the timer related routines on the first iteration. If the + * unregistering succeeds on first call there will be no + * timer calls at all. + */ + uint64_t twait = 0; + uint64_t start = 0; + + do { + int ret; + + ret = rte_intr_callback_unregister(handle, cb_fn, cb_arg); + if (ret >= 0) + return; + if (ret != -EAGAIN) { + DRV_LOG(INFO, "failed to unregister interrupt" + " handler (error: %d)", ret); + assert(false); + return; + } + if (twait) { + struct timespec onems; + + /* Wait one millisecond and try again. */ + onems.tv_sec = 0; + onems.tv_nsec = NS_PER_S / MS_PER_S; + nanosleep(&onems, 0); + /* Check whether one second elapsed. */ + if ((rte_get_timer_cycles() - start) <= twait) + continue; + } else { + /* + * We get the amount of timer ticks for one second. + * If this amount elapsed it means we spent one + * second in waiting. This branch is executed once + * on first iteration. + */ + twait = rte_get_timer_hz(); + assert(twait); + } + /* + * Timeout elapsed, show message (once a second) and retry. + * We have no other acceptable option here, if we ignore + * the unregistering return code the handler will not + * be unregistered, fd will be closed and we may get the + * crush. Hanging and messaging in the loop seems not to be + * the worst choice. + */ + DRV_LOG(INFO, "Retrying to unregister interrupt handler"); + start = rte_get_timer_cycles(); + } while (true); +} + /** * Uninstall shared asynchronous device events handler. - * This function is implemeted to support event sharing + * This function is implemented to support event sharing * between multiple ports of single IB device. * * @param dev @@ -1254,7 +1325,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size) sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS; if (!sh->intr_cnt || --sh->intr_cnt) goto exit; - rte_intr_callback_unregister(&sh->intr_handle, + mlx5_intr_callback_unregister(&sh->intr_handle, mlx5_dev_interrupt_handler, sh); sh->intr_handle.fd = 0; sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN; @@ -1263,8 +1334,8 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size) } /** - * Install shared asyncronous device events handler. - * This function is implemeted to support event sharing + * Install shared asynchronous device events handler. + * This function is implemented to support event sharing * between multiple ports of single IB device. * * @param dev