Message ID | 1435938006-22254-3-git-send-email-bruce.richardson@intel.com (mailing list archive) |
---|---|
State | Accepted, archived |
Headers |
Return-Path: <dev-bounces@dpdk.org> X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [IPv6:::1]) by dpdk.org (Postfix) with ESMTP id A6F865A95; Fri, 3 Jul 2015 17:40:10 +0200 (CEST) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id E1C5D5A43 for <dev@dpdk.org>; Fri, 3 Jul 2015 17:40:08 +0200 (CEST) Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga102.fm.intel.com with ESMTP; 03 Jul 2015 08:40:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.15,400,1432623600"; d="scan'208";a="755548404" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga002.fm.intel.com with ESMTP; 03 Jul 2015 08:40:06 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id t63Fe6ld013856; Fri, 3 Jul 2015 16:40:06 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id t63Fe6cb022307; Fri, 3 Jul 2015 16:40:06 +0100 Received: (from bricha3@localhost) by sivswdev01.ir.intel.com with id t63Fe6ug022303; Fri, 3 Jul 2015 16:40:06 +0100 From: Bruce Richardson <bruce.richardson@intel.com> To: dev@dpdk.org Date: Fri, 3 Jul 2015 16:40:06 +0100 Message-Id: <1435938006-22254-3-git-send-email-bruce.richardson@intel.com> X-Mailer: git-send-email 1.7.4.1 In-Reply-To: <1435938006-22254-1-git-send-email-bruce.richardson@intel.com> References: <1435938006-22254-1-git-send-email-bruce.richardson@intel.com> Subject: [dpdk-dev] [PATCH 2/2] ixgbe: check mbuf refcnt when clearing RX/TX ring X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK <dev.dpdk.org> List-Unsubscribe: <http://dpdk.org/ml/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://dpdk.org/ml/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <http://dpdk.org/ml/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> |
Commit Message
Bruce Richardson
July 3, 2015, 3:40 p.m. UTC
The function to clear the TX ring when a port was being closed, e.g. on
exit in testpmd, was not checking the mbuf refcnt before freeing it.
Since the function in the vector driver to clear the ring after TX does
not set the pointer to NULL post-free, this caused crashes if mbuf
debugging was turned on.
To reproduce the issue, ensure the follow config variables are set:
RTE_IXGBE_INC_VECTOR
RTE_LIBRTE_MBUF_DEBUG
Then compile up and run testpmd using 10G ports with the vector driver.
Start traffic and let some flow through, then type "stop" and "quit" at
the testpmd prompt, and crash will occur. Output below:
testpmd> quit
Stopping port 0...done
Stopping port 1...PANIC in rte_mbuf_sanity_check():
bad ref cnt
[New Thread 0x7fffabfff700 (LWP 145312)]
[New Thread 0x7fffb47fe700 (LWP 145311)]
[New Thread 0x7fffb4fff700 (LWP 145310)]
[New Thread 0x7ffff6cd5700 (LWP 145309)]
18: [/home/bruce/dpdk.org/x86_64-native-linuxapp-gcc/app/testpmd(_start+0x29)
<....snip for brevity...>
Program received signal SIGABRT, Aborted.
0x00007ffff7120a98 in raise () from /lib64/libc.so.6
A similar error occurs when clearing the RX ring, which is also fixed by
this patch.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
drivers/net/ixgbe/ixgbe_rxtx.c | 3 ++-
drivers/net/ixgbe/ixgbe_rxtx_vec.c | 8 +++++++-
2 files changed, 9 insertions(+), 2 deletions(-)
Comments
2015-07-03 16:40, Bruce Richardson: > The function to clear the TX ring when a port was being closed, e.g. on > exit in testpmd, was not checking the mbuf refcnt before freeing it. > Since the function in the vector driver to clear the ring after TX does > not set the pointer to NULL post-free, this caused crashes if mbuf > debugging was turned on. Please, could you add a Fixes: line to reference the origin of the bug? Thanks
On Fri, Jul 03, 2015 at 05:46:55PM +0200, Thomas Monjalon wrote: > 2015-07-03 16:40, Bruce Richardson: > > The function to clear the TX ring when a port was being closed, e.g. on > > exit in testpmd, was not checking the mbuf refcnt before freeing it. > > Since the function in the vector driver to clear the ring after TX does > > not set the pointer to NULL post-free, this caused crashes if mbuf > > debugging was turned on. > > Please, could you add a Fixes: line to reference the origin of the bug? > Thanks What benefit does that information provide? Given that this bug occurs in two places in the code, and has been there for some time, I'm not sure that a straight lookup of the commit that last changed the line(s) would identify the true source of the bug. Because of that, I'd like to know the information is really going to be useful before making an effort to track it down. :-) /Bruce
2015-07-03 17:04, Bruce Richardson: > On Fri, Jul 03, 2015 at 05:46:55PM +0200, Thomas Monjalon wrote: > > 2015-07-03 16:40, Bruce Richardson: > > > The function to clear the TX ring when a port was being closed, e.g. on > > > exit in testpmd, was not checking the mbuf refcnt before freeing it. > > > Since the function in the vector driver to clear the ring after TX does > > > not set the pointer to NULL post-free, this caused crashes if mbuf > > > debugging was turned on. > > > > Please, could you add a Fixes: line to reference the origin of the bug? > > Thanks > > What benefit does that information provide? Given that this bug occurs in > two places in the code, and has been there for some time, I'm not sure that a > straight lookup of the commit that last changed the line(s) would identify the > true source of the bug. Because of that, I'd like to know the information is > really going to be useful before making an effort to track it down. :-) If it fixes a bug which was in previous release, it may deserve to be in the release notes. It also helps people wanting to fix the old version they are stuck with. Given it fixes only a crash in debug mode, it's not so important here.
Hi Bruce, > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson > Sent: Friday, July 03, 2015 4:40 PM > To: dev@dpdk.org > Subject: [dpdk-dev] [PATCH 2/2] ixgbe: check mbuf refcnt when clearing RX/TX ring > > The function to clear the TX ring when a port was being closed, e.g. on > exit in testpmd, was not checking the mbuf refcnt before freeing it. > Since the function in the vector driver to clear the ring after TX does > not set the pointer to NULL post-free, this caused crashes if mbuf > debugging was turned on. > > To reproduce the issue, ensure the follow config variables are set: > RTE_IXGBE_INC_VECTOR > RTE_LIBRTE_MBUF_DEBUG > Then compile up and run testpmd using 10G ports with the vector driver. > Start traffic and let some flow through, then type "stop" and "quit" at > the testpmd prompt, and crash will occur. Output below: > > testpmd> quit > Stopping port 0...done > Stopping port 1...PANIC in rte_mbuf_sanity_check(): > bad ref cnt > [New Thread 0x7fffabfff700 (LWP 145312)] > [New Thread 0x7fffb47fe700 (LWP 145311)] > [New Thread 0x7fffb4fff700 (LWP 145310)] > [New Thread 0x7ffff6cd5700 (LWP 145309)] > 18: [/home/bruce/dpdk.org/x86_64-native-linuxapp-gcc/app/testpmd(_start+0x29) > <....snip for brevity...> > Program received signal SIGABRT, Aborted. > 0x00007ffff7120a98 in raise () from /lib64/libc.so.6 > > A similar error occurs when clearing the RX ring, which is also fixed by > this patch. > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> > --- > drivers/net/ixgbe/ixgbe_rxtx.c | 3 ++- > drivers/net/ixgbe/ixgbe_rxtx_vec.c | 8 +++++++- > 2 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c > index 41a062e..12e25b7 100644 > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > @@ -2108,7 +2108,8 @@ ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq) > > if (rxq->sw_ring != NULL) { > for (i = 0; i < rxq->nb_rx_desc; i++) { > - if (rxq->sw_ring[i].mbuf != NULL) { > + if (rxq->sw_ring[i].mbuf != NULL && > + rte_mbuf_refcnt_read(rxq->sw_ring[i].mbuf)) { > rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf); > rxq->sw_ring[i].mbuf = NULL; > } Sorry for late review, but I am afraid your changes don't fix the problem. After sw_ring[].mbuf was freed by RX path, entity that manages that RX queue shouldn't touch that mbuf. (unless it was allocated by the same RX queue again). As same mbuf could be already allocated by something else. As an example by another RX/TX queue and is in active use. Same story for TX below. As I can see the proper fix could be one of 2: 1. Make RX/TX vector functions to reset sw_ring[].mbuf to 0. 2. At queue_release_mbufs(), don't go through all sw_ring[] entries, but only though ones which might contain valid mbufs. For RX: entries between rx_tail and rxrearm_start only (which implies a special queue_release_mbufs() for vector rx). For TX: from tx_next_dd - (tx_rs_thresh - 1) and no more then nb_tx_desc - nb_tx_free Konstantin > diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > index 0edac82..7e633d3 100644 > --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c > +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > @@ -665,7 +665,13 @@ ixgbe_tx_queue_release_mbufs(struct ixgbe_tx_queue *txq) > nb_free < max_desc && i != txq->tx_tail; > i = (i + 1) & max_desc) { > txe = (struct ixgbe_tx_entry_v *)&txq->sw_ring[i]; > - if (txe->mbuf != NULL) > + /* > + *check for already freed packets. > + * Note: ixgbe_tx_free_bufs does not NULL after free, > + * so we actually have to check the reference count. > + */ > + if (txe->mbuf != NULL && > + rte_mbuf_refcnt_read(txe->mbuf) != 0) > rte_pktmbuf_free_seg(txe->mbuf); > } > /* reset tx_entry */ > -- > 2.4.3
> -----Original Message----- > From: Ananyev, Konstantin > Sent: Monday, July 20, 2015 10:37 AM > To: Richardson, Bruce; dev@dpdk.org > Subject: RE: [dpdk-dev] [PATCH 2/2] ixgbe: check mbuf refcnt when clearing > RX/TX ring > > Hi Bruce, > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson > > Sent: Friday, July 03, 2015 4:40 PM > > To: dev@dpdk.org > > Subject: [dpdk-dev] [PATCH 2/2] ixgbe: check mbuf refcnt when clearing > > RX/TX ring > > > > The function to clear the TX ring when a port was being closed, e.g. > > on exit in testpmd, was not checking the mbuf refcnt before freeing it. > > Since the function in the vector driver to clear the ring after TX > > does not set the pointer to NULL post-free, this caused crashes if > > mbuf debugging was turned on. > > > > To reproduce the issue, ensure the follow config variables are set: > > RTE_IXGBE_INC_VECTOR > > RTE_LIBRTE_MBUF_DEBUG > > Then compile up and run testpmd using 10G ports with the vector driver. > > Start traffic and let some flow through, then type "stop" and "quit" > > at the testpmd prompt, and crash will occur. Output below: > > > > testpmd> quit > > Stopping port 0...done > > Stopping port 1...PANIC in rte_mbuf_sanity_check(): > > bad ref cnt > > [New Thread 0x7fffabfff700 (LWP 145312)] > > [New Thread 0x7fffb47fe700 (LWP 145311)] > > [New Thread 0x7fffb4fff700 (LWP 145310)] > > [New Thread 0x7ffff6cd5700 (LWP 145309)] > > 18: [/home/bruce/dpdk.org/x86_64-native-linuxapp- > gcc/app/testpmd(_start+0x29) > > <....snip for brevity...> > > Program received signal SIGABRT, Aborted. > > 0x00007ffff7120a98 in raise () from /lib64/libc.so.6 > > > > A similar error occurs when clearing the RX ring, which is also fixed > > by this patch. > > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> > > --- > > drivers/net/ixgbe/ixgbe_rxtx.c | 3 ++- > > drivers/net/ixgbe/ixgbe_rxtx_vec.c | 8 +++++++- > > 2 files changed, 9 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c > > b/drivers/net/ixgbe/ixgbe_rxtx.c index 41a062e..12e25b7 100644 > > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > @@ -2108,7 +2108,8 @@ ixgbe_rx_queue_release_mbufs(struct > > ixgbe_rx_queue *rxq) > > > > if (rxq->sw_ring != NULL) { > > for (i = 0; i < rxq->nb_rx_desc; i++) { > > - if (rxq->sw_ring[i].mbuf != NULL) { > > + if (rxq->sw_ring[i].mbuf != NULL && > > + rte_mbuf_refcnt_read(rxq->sw_ring[i].mbuf)) > { > > rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf); > > rxq->sw_ring[i].mbuf = NULL; > > } > > > Sorry for late review, but I am afraid your changes don't fix the problem. > After sw_ring[].mbuf was freed by RX path, entity that manages that RX > queue shouldn't touch that mbuf. > (unless it was allocated by the same RX queue again). > As same mbuf could be already allocated by something else. > As an example by another RX/TX queue and is in active use. > Same story for TX below. Good point, I'd forgotten about that scenario. > > As I can see the proper fix could be one of 2: > 1. Make RX/TX vector functions to reset sw_ring[].mbuf to 0. > 2. At queue_release_mbufs(), don't go through all sw_ring[] entries, but > only though ones which might contain valid mbufs. > For RX: entries between rx_tail and rxrearm_start only (which implies a > special queue_release_mbufs() for vector rx). > For TX: from tx_next_dd - (tx_rs_thresh - 1) and no more then nb_tx_desc - > nb_tx_free Option 2 seems a better choice for a fix. I'll look at it when I get a chance. /Bruce > > Konstantin > > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c > > b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > > index 0edac82..7e633d3 100644 > > --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c > > +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > > @@ -665,7 +665,13 @@ ixgbe_tx_queue_release_mbufs(struct ixgbe_tx_queue > *txq) > > nb_free < max_desc && i != txq->tx_tail; > > i = (i + 1) & max_desc) { > > txe = (struct ixgbe_tx_entry_v *)&txq->sw_ring[i]; > > - if (txe->mbuf != NULL) > > + /* > > + *check for already freed packets. > > + * Note: ixgbe_tx_free_bufs does not NULL after free, > > + * so we actually have to check the reference count. > > + */ > > + if (txe->mbuf != NULL && > > + rte_mbuf_refcnt_read(txe->mbuf) != 0) > > rte_pktmbuf_free_seg(txe->mbuf); > > } > > /* reset tx_entry */ > > -- > > 2.4.3
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index 41a062e..12e25b7 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -2108,7 +2108,8 @@ ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq) if (rxq->sw_ring != NULL) { for (i = 0; i < rxq->nb_rx_desc; i++) { - if (rxq->sw_ring[i].mbuf != NULL) { + if (rxq->sw_ring[i].mbuf != NULL && + rte_mbuf_refcnt_read(rxq->sw_ring[i].mbuf)) { rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf); rxq->sw_ring[i].mbuf = NULL; } diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c index 0edac82..7e633d3 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c @@ -665,7 +665,13 @@ ixgbe_tx_queue_release_mbufs(struct ixgbe_tx_queue *txq) nb_free < max_desc && i != txq->tx_tail; i = (i + 1) & max_desc) { txe = (struct ixgbe_tx_entry_v *)&txq->sw_ring[i]; - if (txe->mbuf != NULL) + /* + *check for already freed packets. + * Note: ixgbe_tx_free_bufs does not NULL after free, + * so we actually have to check the reference count. + */ + if (txe->mbuf != NULL && + rte_mbuf_refcnt_read(txe->mbuf) != 0) rte_pktmbuf_free_seg(txe->mbuf); } /* reset tx_entry */