From patchwork Thu Feb 25 14:32:38 2021
X-Patchwork-Submitter: Elad Nachman
X-Patchwork-Id: 88226
X-Patchwork-Delegate: ferruh.yigit@amd.com
From: Elad Nachman
To: ferruh.yigit@intel.com
Cc: iryzhov@nfware.com, stephen@networkplumber.org, dev@dpdk.org, eladv6@gmail.com
Date: Thu, 25 Feb 2021 16:32:38 +0200
Message-Id: <20210225143239.14220-1-eladv6@gmail.com>
In-Reply-To: <20201126144613.4986-1-eladv6@gmail.com>
References: <20201126144613.4986-1-eladv6@gmail.com>
Subject: [dpdk-dev] [PATCH 1/2] kni: fix kernel deadlock when using mlx devices
List-Id: DPDK patches and discussions

This first part of v4 of the patch re-introduces Stephen Hemminger's patch 64106. This part changes the parameter kni_net_process_request() gets and introduces the initial rtnl unlocking mechanism.

Signed-off-by: Elad Nachman
---
v4:
* for if down case, send asynchronously with rtnl locked and without wait, returning immediately to avoid both kernel race conditions and deadlock in user-space
v3:
* Include original patch and new patch as a series of patch, added a comment to the new patch
v2:
* rebuild the patch as increment from patch 64106
* fix comment and blank lines
---
 kernel/linux/kni/kni_net.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index 4b752083d..f0b6e9a8d 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -102,17 +103,15 @@ get_data_kva(struct kni_dev *kni, void *pkt_kva)
  * It can be called to process the request.
  */
 static int
-kni_net_process_request(struct kni_dev *kni, struct rte_kni_request *req)
+kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 {
+	struct kni_dev *kni = netdev_priv(dev);
 	int ret = -1;
 	void *resp_va;
 	uint32_t num;
 	int ret_val;
 
-	if (!kni || !req) {
-		pr_err("No kni instance or request\n");
-		return -EINVAL;
-	}
+	ASSERT_RTNL();
 
 	mutex_lock(&kni->sync_lock);
 
@@ -125,8 +124,17 @@ kni_net_process_request(struct kni_dev *kni, struct rte_kni_request *req)
 		goto fail;
 	}
 
+	/* Since we need to wait and RTNL mutex is held
+	 * drop the mutex and hold reference to keep device
+	 */
+	dev_hold(dev);
+	rtnl_unlock();
+
 	ret_val = wait_event_interruptible_timeout(kni->wq,
 			kni_fifo_count(kni->resp_q), 3 * HZ);
+	rtnl_lock();
+	dev_put(dev);
+
 	if (signal_pending(current) || ret_val <= 0) {
 		ret = -ETIME;
 		goto fail;
@@ -155,7 +163,6 @@ kni_net_open(struct net_device *dev)
 {
 	int ret;
 	struct rte_kni_request req;
-	struct kni_dev *kni = netdev_priv(dev);
 
 	netif_start_queue(dev);
 	if (kni_dflt_carrier == 1)
@@ -168,7 +175,7 @@ kni_net_open(struct net_device *dev)
 
 	/* Setting if_up to non-zero means up */
 	req.if_up = 1;
-	ret = kni_net_process_request(kni, &req);
+	ret = kni_net_process_request(dev, &req);
 
 	return (ret == 0) ? req.result : ret;
 }
@@ -178,7 +185,6 @@ kni_net_release(struct net_device *dev)
 {
 	int ret;
 	struct rte_kni_request req;
-	struct kni_dev *kni = netdev_priv(dev);
 
 	netif_stop_queue(dev); /* can't transmit any more */
 	netif_carrier_off(dev);
@@ -188,7 +194,7 @@ kni_net_release(struct net_device *dev)
 
 	/* Setting if_up to 0 means down */
 	req.if_up = 0;
-	ret = kni_net_process_request(kni, &req);
+	ret = kni_net_process_request(dev, &req);
 
 	return (ret == 0) ? req.result : ret;
 }
@@ -643,14 +649,13 @@ kni_net_change_mtu(struct net_device *dev, int new_mtu)
 {
 	int ret;
 	struct rte_kni_request req;
-	struct kni_dev *kni = netdev_priv(dev);
 
 	pr_debug("kni_net_change_mtu new mtu %d to be set\n", new_mtu);
 
 	memset(&req, 0, sizeof(req));
 	req.req_id = RTE_KNI_REQ_CHANGE_MTU;
 	req.new_mtu = new_mtu;
-	ret = kni_net_process_request(kni, &req);
+	ret = kni_net_process_request(dev, &req);
 	if (ret == 0 && req.result == 0)
 		dev->mtu = new_mtu;
 
@@ -661,7 +666,6 @@ static void
 kni_net_change_rx_flags(struct net_device *netdev, int flags)
 {
 	struct rte_kni_request req;
-	struct kni_dev *kni = netdev_priv(netdev);
 
 	memset(&req, 0, sizeof(req));
 
@@ -683,7 +687,7 @@ kni_net_change_rx_flags(struct net_device *netdev, int flags)
 			req.promiscusity = 0;
 	}
 
-	kni_net_process_request(kni, &req);
+	kni_net_process_request(netdev, &req);
 }
 
 /*
@@ -742,7 +746,6 @@ kni_net_set_mac(struct net_device *netdev, void *p)
 {
 	int ret;
 	struct rte_kni_request req;
-	struct kni_dev *kni;
 	struct sockaddr *addr = p;
 
 	memset(&req, 0, sizeof(req));
@@ -754,8 +757,7 @@ kni_net_set_mac(struct net_device *netdev, void *p)
 	memcpy(req.mac_addr, addr->sa_data, netdev->addr_len);
 	memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len);
 
-	kni = netdev_priv(netdev);
-	ret = kni_net_process_request(kni, &req);
+	ret = kni_net_process_request(netdev, &req);
 
 	return (ret == 0 ? req.result : ret);
 }

From patchwork Thu Feb 25 14:32:39 2021
X-Patchwork-Submitter: Elad Nachman
X-Patchwork-Id: 88227
X-Patchwork-Delegate: ferruh.yigit@amd.com
From: Elad Nachman
To: ferruh.yigit@intel.com
Cc: iryzhov@nfware.com, stephen@networkplumber.org, dev@dpdk.org, eladv6@gmail.com
Date: Thu, 25 Feb 2021 16:32:39 +0200
Message-Id: <20210225143239.14220-2-eladv6@gmail.com>
In-Reply-To: <20210225143239.14220-1-eladv6@gmail.com>
References: <20201126144613.4986-1-eladv6@gmail.com> <20210225143239.14220-1-eladv6@gmail.com>
Subject: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v4
List-Id: DPDK patches and discussions

This part of the series includes my fixes for the issues reported by Ferruh and Igor (and Igor's comments on v3 of the patch) on top of part 1 of the patch series:

A. The KNI sync lock is taken while rtnl is held. If two threads call kni_net_process_request(), the first one takes the sync lock, releases the rtnl lock and then sleeps. The second thread tries to take the sync lock while holding rtnl. The first thread then wakes up and tries to take rtnl, resulting in a deadlock. The remedy is to release rtnl before locking the KNI sync lock.
Since nothing accesses the Linux network stack in between, no rtnl locking is needed there.

B. There is a race condition in __dev_close_many() while it processes the close_list as the application terminates. It appears that when two vEth devices are being closed and one releases the rtnl lock, the other takes it and updates the close_list while the list is in an inconsistent state. The close_list then becomes a circular linked list, so list_for_each_entry() loops endlessly inside __dev_close_many().

Since the description of the original patch indicates that its motivation was bringing the device up, I have changed kni_net_process_request() to keep holding the rtnl mutex when the device is brought down, since this is the path called from __dev_close_many() that corrupts the close_list. To prevent the deadlock with Mellanox devices in this case, the code no longer waits for user-space while holding the rtnl lock. Instead, once the request has been sent, the locks taken by the function are relinquished and the function returns immediately with a return code of zero (success).

To summarize:

request != interface down: unlock rtnl, send the request to user-space, wait for the response, return the response error code to the caller in user-space.

request == interface down: send the request to user-space, return immediately with an error code of 0 (success) to user-space.

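The two paths summarized above can be modeled in user space as a rough sketch. This is not the kernel code: struct lock_model, struct toy_request, wait_for_user_space() and process_request() are hypothetical names made up for this illustration, and plain integer flags stand in for rtnl and the KNI sync lock so that the example runs single-threaded. It only illustrates the rule that the function never sleeps while rtnl is held.

```c
#include <assert.h>

/* Hypothetical model: integer flags stand in for the real kernel locks. */
struct lock_model {
	int rtnl_held;       /* models the rtnl mutex */
	int sync_held;       /* models kni->sync_lock */
	int slept_with_rtnl; /* counts sleeps taken while rtnl was held */
};

/* Reduced request: only the fields this sketch needs. */
struct toy_request {
	int is_if_down; /* 1: request brings the interface down */
	int skip_resp;  /* mirrors skip_post_resp_to_q from the patch */
	int result;
};

/* Hypothetical stand-in for sleeping until user space replies. */
static void wait_for_user_space(struct lock_model *l, struct toy_request *req)
{
	if (l->rtnl_held)
		l->slept_with_rtnl++; /* this is what the patch must avoid */
	req->result = 0;              /* assumption: user space reported success */
}

/* Caller holds rtnl on entry; rtnl is held again on return. */
static int process_request(struct lock_model *l, struct toy_request *req)
{
	assert(l->rtnl_held); /* plays the role of ASSERT_RTNL() */

	if (!req->is_if_down)
		l->rtnl_held = 0; /* drop rtnl before taking the sync lock */

	l->sync_held = 1;
	if (req->is_if_down) {
		req->skip_resp = 1; /* nobody will wait for a reply */
		req->result = 0;    /* report success immediately */
	} else {
		wait_for_user_space(l, req); /* rtnl is not held here */
	}
	l->sync_held = 0;

	if (!req->is_if_down)
		l->rtnl_held = 1; /* restore the caller's locking state */
	return 0;
}
```

Calling process_request() with rtnl_held set to 1, first for an up request and then for a down request, leaves slept_with_rtnl at 0, and only the down request comes back with skip_resp set.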
Signed-off-by: Elad Nachman
---
v4:
* for if down case, send asynchronously with rtnl locked and without wait, returning immediately to avoid both kernel race conditions and deadlock in user-space
v3:
* Include original patch and new patch as a series of patch, added a comment to the new patch
v2:
* rebuild the patch as increment from patch 64106
* fix comment and blank lines
---
 kernel/linux/kni/kni_net.c      | 41 +++++++++++++++++++++++++++------
 lib/librte_kni/rte_kni.c        |  7 ++++--
 lib/librte_kni/rte_kni_common.h |  1 +
 3 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index f0b6e9a8d..ba991802b 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -110,12 +110,34 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 	void *resp_va;
 	uint32_t num;
 	int ret_val;
+	int req_is_dev_stop = 0;
+
+	/* For configuring the interface to down,
+	 * rtnl must be held all the way to prevent race condition
+	 * inside __dev_close_many() between two netdev instances of KNI
+	 */
+	if (req->req_id == RTE_KNI_REQ_CFG_NETWORK_IF &&
+			req->if_up == 0)
+		req_is_dev_stop = 1;
 
 	ASSERT_RTNL();
 
+	/* Since we need to wait and RTNL mutex is held
+	 * drop the mutex and hold reference to keep device
+	 */
+	if (!req_is_dev_stop) {
+		dev_hold(dev);
+		rtnl_unlock();
+	}
+
 	mutex_lock(&kni->sync_lock);
 
-	/* Construct data */
+	/* Construct data, for dev stop send asynchronously
+	 * so instruct user-space not to sent response as no
+	 * one will be waiting for it.
+	 */
+	if (req_is_dev_stop)
+		req->skip_post_resp_to_q = 1;
 	memcpy(kni->sync_kva, req, sizeof(struct rte_kni_request));
 	num = kni_fifo_put(kni->req_q, &kni->sync_va, 1);
 	if (num < 1) {
@@ -124,16 +146,16 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 		goto fail;
 	}
 
-	/* Since we need to wait and RTNL mutex is held
-	 * drop the mutex and hold refernce to keep device
+	/* No result available since request is handled
+	 * asynchronously. set response to success.
 	 */
-	dev_hold(dev);
-	rtnl_unlock();
+	if (req_is_dev_stop) {
+		req->result = 0;
+		goto async;
+	}
 
 	ret_val = wait_event_interruptible_timeout(kni->wq,
 			kni_fifo_count(kni->resp_q), 3 * HZ);
-	rtnl_lock();
-	dev_put(dev);
 
 	if (signal_pending(current) || ret_val <= 0) {
 		ret = -ETIME;
@@ -148,10 +170,15 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 	}
 
 	memcpy(req, kni->sync_kva, sizeof(struct rte_kni_request));
+async:
 	ret = 0;
 
 fail:
 	mutex_unlock(&kni->sync_lock);
+	if (!req_is_dev_stop) {
+		rtnl_lock();
+		dev_put(dev);
+	}
 	return ret;
 }
 
diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index 837d0217d..6d777266d 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -591,8 +591,11 @@ rte_kni_handle_request(struct rte_kni *kni)
 		break;
 	}
 
-	/* Construct response mbuf and put it back to resp_q */
-	ret = kni_fifo_put(kni->resp_q, (void **)&req, 1);
+	/* if needed, construct response mbuf and put it back to resp_q */
+	if (!req->skip_post_resp_to_q)
+		ret = kni_fifo_put(kni->resp_q, (void **)&req, 1);
+	else
+		ret = 1;
 	if (ret != 1) {
 		RTE_LOG(ERR, KNI, "Fail to put the muf back to resp_q\n");
 		return -1; /* It is an error of can't putting the mbuf back */
diff --git a/lib/librte_kni/rte_kni_common.h b/lib/librte_kni/rte_kni_common.h
index ffb318273..3b5d06850 100644
--- a/lib/librte_kni/rte_kni_common.h
+++ b/lib/librte_kni/rte_kni_common.h
@@ -48,6 +48,7 @@ struct rte_kni_request {
 		uint8_t promiscusity;/**< 1: promisc mode enable, 0: disable */
 		uint8_t allmulti; /**< 1: all-multicast mode enable, 0: disable */
 	};
+	int32_t skip_post_resp_to_q; /**< 1: skip queue response 0: disable */
 	int32_t result; /**< Result for processing request */
 } __attribute__((__packed__));
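
On the user-space side, the change to rte_kni_handle_request() amounts to the pattern sketched below with a toy FIFO. struct toy_fifo, toy_fifo_put(), struct toy_request and handle_request() are hypothetical names made up for this illustration and are not the librte_kni API; only the skip_post_resp_to_q and result field names are taken from the patch. The sketch also assumes the request callback itself succeeded.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FIFO_SLOTS 4 /* hypothetical capacity, for illustration only */

/* Hypothetical stand-in for the response queue (resp_q). */
struct toy_fifo {
	void *slot[TOY_FIFO_SLOTS];
	unsigned int count;
};

/* Stores one item; returns the number of items stored (1, or 0 if full). */
static unsigned int toy_fifo_put(struct toy_fifo *f, void *item)
{
	if (f->count == TOY_FIFO_SLOTS)
		return 0;
	f->slot[f->count++] = item;
	return 1;
}

/* Reduced request: only the two fields this sketch needs. */
struct toy_request {
	int32_t skip_post_resp_to_q; /* 1: the kernel is not waiting for a reply */
	int32_t result;
};

/* Handles one request; returns 0 on success, -1 if the reply cannot be queued. */
static int handle_request(struct toy_fifo *resp_q, struct toy_request *req)
{
	unsigned int ret;

	assert(resp_q && req);
	req->result = 0; /* assumption: the request callback succeeded */

	if (!req->skip_post_resp_to_q)
		ret = toy_fifo_put(resp_q, req); /* synchronous request: post the reply */
	else
		ret = 1; /* asynchronous request: nothing to post, treat as success */

	return ret == 1 ? 0 : -1;
}
```

With an empty queue, a request whose skip_post_resp_to_q is 0 adds one entry to the queue, while a request with the flag set returns 0 and leaves the queue untouched.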