Message ID: 20220802004938.23670-1-cfontana@suse.de (mailing list archive)
Headers:
  From: Claudio Fontana <cfontana@suse.de>
  To: Maxime Coquelin <maxime.coquelin@redhat.com>, Chenbo Xia <chenbo.xia@intel.com>
  Cc: dev@dpdk.org, Claudio Fontana <cfontana@suse.de>
  Subject: [PATCH v3 0/2] vhost fixes for OVS SIGSEGV in PMD
  Date: Tue, 2 Aug 2022 02:49:36 +0200
  Message-Id: <20220802004938.23670-1-cfontana@suse.de>
  List-Id: DPDK patches and discussions <dev.dpdk.org>
Series: vhost fixes for OVS SIGSEGV in PMD
Message
Claudio Fontana
Aug. 2, 2022, 12:49 a.m. UTC
This is an alternative, more general fix compared with PATCH v1,
and fixes style issues in v2.

The series fixes a segmentation fault in the OVS PMD thread when
resynchronizing with QEMU after the guest application has been killed
with SIGKILL (patch 1/2).

The segmentation fault can be caused by the guest DPDK application,
which is able this way to crash the OVS process on the host;
see the backtrace in patch 1/2.

Patch 2/2 is an additional improvement in the current error handling.

---
Changes from v2: fix warnings from checkpatch.
---

Changes from v1:

* patch 1/2: instead of only fixing virtio_dev_tx_split, put the check
  for nr_vec == 0 inside desc_to_mbuf and mbuf_to_desc, so that in no
  case they attempt to read and dereference addresses from the buf_vec[]
  array when it does not contain any valid elements.

---

For your review and comments,

Claudio

Claudio Fontana (2):
  vhost: check for nr_vec == 0 in desc_to_mbuf, mbuf_to_desc
  vhost: improve error handling in desc_to_mbuf

 lib/vhost/virtio_net.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)
Comments
On Tue, 2 Aug 2022 02:49:36 +0200
Claudio Fontana <cfontana@suse.de> wrote:

> This is an alternative, more general fix compared with PATCH v1,
> and fixes style issues in v2.
>
> The series fixes a segmentation fault in the OVS PMD thread when
> resynchronizing with QEMU after the guest application has been killed
> with SIGKILL (patch 1/2),
>
> The segmentation fault can be caused by the guest DPDK application,
> which is able this way to crash the OVS process on the host,
> see the backtrace in patch 1/2.
>
> Patch 2/2 is an additional improvement in the current error handling.

Checking for NULL and 0 is good on host side.
But guest should probably not be sending such a useless request?
On 8/2/22 03:40, Stephen Hemminger wrote:
> On Tue, 2 Aug 2022 02:49:36 +0200
> Claudio Fontana <cfontana@suse.de> wrote:
>
>> This is an alternative, more general fix compared with PATCH v1,
>> and fixes style issues in v2.
>>
>> The series fixes a segmentation fault in the OVS PMD thread when
>> resynchronizing with QEMU after the guest application has been killed
>> with SIGKILL (patch 1/2),
>>
>> The segmentation fault can be caused by the guest DPDK application,
>> which is able this way to crash the OVS process on the host,
>> see the backtrace in patch 1/2.
>>
>> Patch 2/2 is an additional improvement in the current error handling.
>
> Checking for NULL and 0 is good on host side.
> But guest should probably not be sending such a useless request?

Right, I focused on hardening the host side, as that is what the customer required.

This happens specifically when the guest application goes away abruptly and has
no chance to signal anything (SIGKILL); at restart it issues a virtio reset on
the device, which in QEMU also causes a (actually two) virtio_net set_status,
which attempt to stop the queues (twice).

DPDK seems to think at that point that it needs to drain the queue, and tries
to process MAX_PKT_BURST buffers ("about to dequeue 32 buffers"), then calls
fill_vec_buf_split and gets absolutely nothing.

I think this should also address the reports in this thread:

https://inbox.dpdk.org/dev/SA1PR08MB713373B0D19329C38C7527BB839A9@SA1PR08MB7133.namprd08.prod.outlook.com/

in addition to my specific customer request.

Thanks,

Claudio
On 8/2/22 19:20, Claudio Fontana wrote:
> On 8/2/22 03:40, Stephen Hemminger wrote:
>> On Tue, 2 Aug 2022 02:49:36 +0200
>> Claudio Fontana <cfontana@suse.de> wrote:
>>
>>> This is an alternative, more general fix compared with PATCH v1,
>>> and fixes style issues in v2.
>>>
>>> The series fixes a segmentation fault in the OVS PMD thread when
>>> resynchronizing with QEMU after the guest application has been killed
>>> with SIGKILL (patch 1/2),
>>>
>>> The segmentation fault can be caused by the guest DPDK application,
>>> which is able this way to crash the OVS process on the host,
>>> see the backtrace in patch 1/2.
>>>
>>> Patch 2/2 is an additional improvement in the current error handling.
>>
>> Checking for NULL and 0 is good on host side.
>> But guest should probably not be sending such a useless request?
>
> Right, I focused on hardening the host side, as that is what the customer
> required.
>
> This happens specifically when the guest application goes away abruptly and
> has no chance to signal anything (SIGKILL), and at restart issues a virtio
> reset on the device, which in qemu causes also a (actually two) virtio_net
> set_status, which attempt to stop the queues (twice).
>
> DPDK seems to think at that point that it needs to drain the queue, and
> tries to process MAX_PKT_BURST buffers ("about to dequeue 32 buffers"),
> then calls fill_vec_buf_split and gets absolutely nothing.
>
> I think this should also address the reports in this thread:
>
> https://inbox.dpdk.org/dev/SA1PR08MB713373B0D19329C38C7527BB839A9@SA1PR08MB7133.namprd08.prod.outlook.com/
>
> in addition to my specific customer request,
>
> Thanks,
>
> Claudio

Anything more required from my side?

Do you need a respin without the "Tested-by" tag?

Thanks,

Claudio
A weekly ping on this one: any chance to get this fix for a guest-triggered
host crash included?

Thanks,

Claudio

On 8/2/22 02:49, Claudio Fontana wrote:
> This is an alternative, more general fix compared with PATCH v1,
> and fixes style issues in v2.
>
> The series fixes a segmentation fault in the OVS PMD thread when
> resynchronizing with QEMU after the guest application has been killed
> with SIGKILL (patch 1/2),
>
> The segmentation fault can be caused by the guest DPDK application,
> which is able this way to crash the OVS process on the host,
> see the backtrace in patch 1/2.
>
> Patch 2/2 is an additional improvement in the current error handling.
>
> ---
> Changes from v2: fix warnings from checkpatch.
> ---
>
> Changes from v1:
>
> * patch 1/2: instead of only fixing virtio_dev_tx_split, put the check
>   for nr_vec == 0 inside desc_to_mbuf and mbuf_to_desc, so that in no
>   case they attempt to read and dereference addresses from the buf_vec[]
>   array when it does not contain any valid elements.
>
> ---
>
> For your review and comments,
>
> Claudio
>
> Claudio Fontana (2):
>   vhost: check for nr_vec == 0 in desc_to_mbuf, mbuf_to_desc
>   vhost: improve error handling in desc_to_mbuf
>
>  lib/vhost/virtio_net.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>