[v3] net/netvsc: fix number Tx queues > Rx queues

Message ID PA4PR83MB0526053A870E8358B7CB3643972C2@PA4PR83MB0526.EURPRD83.prod.outlook.com (mailing list archive)
State Superseded
Delegated to: Ferruh Yigit
Series [v3] net/netvsc: fix number Tx queues > Rx queues |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS

Commit Message

Alan Elder March 19, 2024, 2:16 p.m. UTC
The previous code allowed the number of Tx queues to be set higher than
the number of Rx queues.  If a packet was sent on a Tx queue with index
>= the number of Rx queues, there was a segfault.

This commit fixes the issue by creating an Rx queue for every Tx queue
meaning that an event buffer is allocated to handle receiving Tx
completion messages.

The mbuf pool and Rx ring are not allocated for these additional Rx queues,
and the RSS configuration ensures that no packets are received on them.

Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
Cc: sthemmin@microsoft.com
Cc: stable@dpdk.org

Signed-off-by: Alan Elder <alan.elder@microsoft.com>
---
v3:
* Handle case of Rx queue creation failure in hn_dev_tx_queue_setup.
* Re-use rx queue if it has already been allocated.
* Don't allocate an mbuf if pool is NULL.  This avoids segfault if RSS
  configuration is incorrect.

v2:
* Remove function declaration for static non-member function

---
 drivers/net/netvsc/hn_ethdev.c |  9 +++++
 drivers/net/netvsc/hn_rxtx.c   | 70 +++++++++++++++++++++++++++++-----
 2 files changed, 70 insertions(+), 9 deletions(-)
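
For illustration, a minimal application-side sketch of the configuration this
patch is concerned with (more Tx queues than Rx queues), using the standard
ethdev API. The helper name, queue counts and ring sizes are illustrative and
not taken from the patch.

#include <rte_ethdev.h>
#include <rte_lcore.h>

/* Illustrative sketch only: configure a port with more Tx queues than
 * Rx queues, the situation the fix above has to handle.
 */
static int
setup_port(uint16_t port_id, struct rte_mempool *pool)
{
	struct rte_eth_conf conf = {0};
	const uint16_t nb_rxq = 1, nb_txq = 2;	/* Tx queues > Rx queues */
	uint16_t q;
	int ret;

	ret = rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);
	if (ret < 0)
		return ret;

	ret = rte_eth_rx_queue_setup(port_id, 0, 512, rte_socket_id(),
				     NULL, pool);
	if (ret < 0)
		return ret;

	for (q = 0; q < nb_txq; q++) {
		ret = rte_eth_tx_queue_setup(port_id, q, 512,
					     rte_socket_id(), NULL);
		if (ret < 0)
			return ret;
	}

	/* Before this fix, a burst sent on Tx queue 1 (index >= nb_rxq)
	 * could dereference an Rx queue that was never set up.
	 */
	return rte_eth_dev_start(port_id);
}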
  

Comments

Long Li March 19, 2024, 6:40 p.m. UTC | #1
> Subject: [PATCH v3] net/netvsc: fix number Tx queues > Rx queues
> 
> The previous code allowed the number of Tx queues to be set higher than the
> number of Rx queues.  If a packet was sent on a Tx queue with index
> >= number Rx queues there was a segfault.
> 
> This commit fixes the issue by creating an Rx queue for every Tx queue meaning
> that an event buffer is allocated to handle receiving Tx completion messages.
> 
> mbuf pool and Rx ring are not allocated for these additional Rx queues and RSS
> configuration ensures that no packets are received on them.
> 
> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> Cc: sthemmin@microsoft.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Alan Elder <alan.elder@microsoft.com>

Reviewed-by: Long Li <longli@microsoft.com>
  
Ferruh Yigit April 11, 2024, 11:38 a.m. UTC | #2
On 3/19/2024 2:16 PM, Alan Elder wrote:
> The previous code allowed the number of Tx queues to be set higher than
> the number of Rx queues.  If a packet was sent on a Tx queue with index
> >= number Rx queues there was a segfault.
> This commit fixes the issue by creating an Rx queue for every Tx queue
> meaning that an event buffer is allocated to handle receiving Tx
> completion messages.
> 
> mbuf pool and Rx ring are not allocated for these additional Rx queues
> and RSS configuration ensures that no packets are received on them.
> 
> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> Cc: sthemmin@microsoft.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Alan Elder <alan.elder@microsoft.com>
>

Hi Alan,

What is the root cause of the crash, is it in driver scope or application?
  
Alan Elder April 11, 2024, 8:45 p.m. UTC | #3
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, April 11, 2024 7:38 AM
> To: Alan Elder <alan.elder@microsoft.com>; Long Li <longli@microsoft.com>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx
> queues
> 
> On 3/19/2024 2:16 PM, Alan Elder wrote:
> > The previous code allowed the number of Tx queues to be set higher
> > than the number of Rx queues.  If a packet was sent on a Tx queue with
> > index
> > >= number Rx queues there was a segfault.
> > This commit fixes the issue by creating an Rx queue for every Tx queue
> > meaning that an event buffer is allocated to handle receiving Tx
> > completion messages.
> >
> > mbuf pool and Rx ring are not allocated for these additional Rx queues
> > and RSS configuration ensures that no packets are received on them.
> >
> > Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> > Cc: sthemmin@microsoft.com
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Alan Elder <alan.elder@microsoft.com>
> >
> 
> Hi Alan,
> 
> What is the root cause of the crash, is it in driver scope or application?

Hi Ferruh,

The root cause of the crash was in the driver - a packet received on a Tx queue that had no corresponding Rx queue would cause the dev->data->rx_queues[] array to be accessed past the length of the array.

https://github.com/DPDK/dpdk/blob/main/drivers/net/netvsc/hn_rxtx.c#L1071

Thanks,
Alan
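
To make the out-of-bounds access concrete, a rough sketch of the pattern
described above (an approximation based on the cited source line, not the
literal driver code):

/* On transmit, the driver polls for completions using its own queue_id.
 * With e.g. nb_rx_queues == 1 and a send on Tx queue 2, this indexes past
 * the populated part of rx_queues[], so rxq is NULL or an arbitrary
 * pointer and any later dereference faults.
 */
struct hn_rx_queue *rxq = dev->data->rx_queues[queue_id];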
  
Ferruh Yigit April 12, 2024, 10:23 a.m. UTC | #4
On 4/11/2024 9:45 PM, Alan Elder wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Thursday, April 11, 2024 7:38 AM
>> To: Alan Elder <alan.elder@microsoft.com>; Long Li <longli@microsoft.com>;
>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
>> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx
>> queues
>>
>> On 3/19/2024 2:16 PM, Alan Elder wrote:
>>> The previous code allowed the number of Tx queues to be set higher
>>> than the number of Rx queues.  If a packet was sent on a Tx queue with
>>> index
>>> >= number Rx queues there was a segfault.
>>> This commit fixes the issue by creating an Rx queue for every Tx queue
>>> meaning that an event buffer is allocated to handle receiving Tx
>>> completion messages.
>>>
>>> mbuf pool and Rx ring are not allocated for these additional Rx queues
>>> and RSS configuration ensures that no packets are received on them.
>>>
>>> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
>>> Cc: sthemmin@microsoft.com
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Alan Elder <alan.elder@microsoft.com>
>>>
>>
>> Hi Alan,
>>
>> What is the root cause of the crash, is it in driver scope or application?
> 
> Hi Ferruh,
> 
> The root cause of the crash was in the driver - a packet received on a Tx queue that had no corresponding Rx queue would cause the dev->data->rx_queues[] array to be accessed past the length of the array.
> 
> https://github.com/DPDK/dpdk/blob/main/drivers/net/netvsc/hn_rxtx.c#L1071
> 
> 

Why there is an access to Rx queue when processing Tx queue?

A backtrace of the crash can help to understand the issue, can you
please include this in commit log, plus some explanation why crash happens?

Thanks,
ferruh
  
Alan Elder April 12, 2024, 4:50 p.m. UTC | #5
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, April 12, 2024 6:23 AM
> To: Alan Elder <alan.elder@microsoft.com>; Long Li <longli@microsoft.com>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
> Subject: Re: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx
> queues
> 
> On 4/11/2024 9:45 PM, Alan Elder wrote:
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Thursday, April 11, 2024 7:38 AM
> >> To: Alan Elder <alan.elder@microsoft.com>; Long Li
> >> <longli@microsoft.com>; Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>
> >> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
> >> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues >
> >> Rx queues
> >>
> >> On 3/19/2024 2:16 PM, Alan Elder wrote:
> >>> The previous code allowed the number of Tx queues to be set higher
> >>> than the number of Rx queues.  If a packet was sent on a Tx queue
> >>> with index
> >>> >= number Rx queues there was a segfault.
> >>> This commit fixes the issue by creating an Rx queue for every Tx
> >>> queue meaning that an event buffer is allocated to handle receiving
> >>> Tx completion messages.
> >>>
> >>> mbuf pool and Rx ring are not allocated for these additional Rx
> >>> queues and RSS configuration ensures that no packets are received on
> them.
> >>>
> >>> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> >>> Cc: sthemmin@microsoft.com
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Alan Elder <alan.elder@microsoft.com>
> >>>
> >>
> >> Hi Alan,
> >>
> >> What is the root cause of the crash, is it in driver scope or application?
> >
> > Hi Ferruh,
> >
> > The root cause of the crash was in the driver - a packet received on a Tx
> > queue that had no corresponding Rx queue would cause the
> > dev->data->rx_queues[] array to be accessed past the length of the array.
> >
> > https://github.com/DPDK/dpdk/blob/main/drivers/net/netvsc/hn_rxtx.c#L1071
> >
> >
> 
> Why there is an access to Rx queue when processing Tx queue?
> 
> A backtrace of the crash can help to understand the issue, can you please
> include this in commit log, plus some explanation why crash happens?
> 
> Thanks,
> Ferruh

Hi Ferruh,

Netvsc slow path needs to handle Tx completion messages (to know when it can reclaim Tx buffers).  Tx completion messages are received on Rx queue, which is why the Rx queue is accessed as part of transmit processing.

An example call stack is:

#6 rte_spinlock_trylock (sl=0x20) at /include/rte_spinlock.h
#7  hn_process_events (hv=, queue_id=2, tx_limit=) at /drivers/net/netvsc/hn_rxtx.c
#8  hn_xmit_pkts (ptxq=, tx_pkts=, nb_pkts=1) at /drivers/net/netvsc/hn_rxtx.c

Which leads to the SEGV as 0x20 is not a valid address.

I'll update the commit messages and resubmit the patch.

Thanks,
Alan
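
To make the backtrace concrete, a paraphrased sketch of the path from
hn_xmit_pkts() into hn_process_events(). Field names such as ring_lock are
taken from this thread; the rest is an approximation, not the literal driver
code, and reading sl == 0x20 as a member offset off an invalid rxq pointer is
an inference.

/* Paraphrased sketch, not the literal driver code. */
static uint32_t
process_events_sketch(struct rte_eth_dev *dev, uint16_t queue_id)
{
	struct hn_rx_queue *rxq = dev->data->rx_queues[queue_id];

	/* If queue_id >= nb_rx_queues this slot was never populated, so
	 * rxq is NULL or garbage; &rxq->ring_lock then yields a small
	 * bogus address (such as the 0x20 in frame #6) and the trylock
	 * below faults.
	 */
	if (!rte_spinlock_trylock(&rxq->ring_lock))
		return 0;	/* another context is already polling this queue */

	/* ... drain the event buffer: Rx packets and Tx completions ... */

	rte_spinlock_unlock(&rxq->ring_lock);
	return 0;
}

With the patch, every Tx queue index has at least a minimal Rx queue behind
it, so this lookup and lock always hit valid memory.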
  
Ferruh Yigit April 15, 2024, 5:54 p.m. UTC | #6
On 4/12/2024 5:50 PM, Alan Elder wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, April 12, 2024 6:23 AM
>> To: Alan Elder <alan.elder@microsoft.com>; Long Li <longli@microsoft.com>;
>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
>> Subject: Re: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx
>> queues
>>
>> On 4/11/2024 9:45 PM, Alan Elder wrote:
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>>> Sent: Thursday, April 11, 2024 7:38 AM
>>>> To: Alan Elder <alan.elder@microsoft.com>; Long Li
>>>> <longli@microsoft.com>; Andrew Rybchenko
>>>> <andrew.rybchenko@oktetlabs.ru>
>>>> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
>>>> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues >
>>>> Rx queues
>>>>
>>>> On 3/19/2024 2:16 PM, Alan Elder wrote:
>>>>> The previous code allowed the number of Tx queues to be set higher
>>>>> than the number of Rx queues.  If a packet was sent on a Tx queue
>>>>> with index
>>>>> >= number Rx queues there was a segfault.
>>>>> This commit fixes the issue by creating an Rx queue for every Tx
>>>>> queue meaning that an event buffer is allocated to handle receiving
>>>>> Tx completion messages.
>>>>>
>>>>> mbuf pool and Rx ring are not allocated for these additional Rx
>>>>> queues and RSS configuration ensures that no packets are received on
>> them.
>>>>>
>>>>> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
>>>>> Cc: sthemmin@microsoft.com
>>>>> Cc: stable@dpdk.org
>>>>>
>>>>> Signed-off-by: Alan Elder <alan.elder@microsoft.com>
>>>>>
>>>>
>>>> Hi Alan,
>>>>
>>>> What is the root cause of the crash, is it in driver scope or application?
>>>
>>> Hi Ferruh,
>>>
>>> The root cause of the crash was in the driver - a packet received on a Tx
>>> queue that had no corresponding Rx queue would cause the
>>> dev->data->rx_queues[] array to be accessed past the length of the array.
>>>
>>> https://github.com/DPDK/dpdk/blob/main/drivers/net/netvsc/hn_rxtx.c#L1071
>>>
>>>
>>
>> Why there is an access to Rx queue when processing Tx queue?
>>
>> A backtrace of the crash can help to understand the issue, can you please
>> include this in commit log, plus some explanation why crash happens?
>>
>> Thanks,
>> Ferruh
> 
> Hi Ferruh,
> 
> Netvsc slow path needs to handle Tx completion messages (to know when it can reclaim Tx buffers).  Tx completion messages are received on Rx queue, which is why the Rx queue is accessed as part of transmit processing.
> 
> An example call stack is:
> 
> #6 rte_spinlock_trylock (sl=0x20) at /include/rte_spinlock.h
> #7  hn_process_events (hv=, queue_id=2, tx_limit=) at /drivers/net/netvsc/hn_rxtx.c
> #8  hn_xmit_pkts (ptxq=, tx_pkts=, nb_pkts=1) at /drivers/net/netvsc/hn_rxtx.c
> 
> Which leads to the SEGV as 0x20 is not a valid address.
> 
> I'll update the commit messages and resubmit the patch.
> 
> 

Hi Alan,

Thanks for the detail.

'hn_xmit_pkts()' calls 'hn_process_events()' with the Tx queue_id, but
'hn_process_events()' seems designed for Rx event processing since it
uses 'queue_id' to get rxq.
Does it help to pass queue type to 'hn_process_events()'?


And the patch creates Rx queues for the excess Tx queues. Do the Tx
completion packets need to be delivered to the Rx queue with the exact
same Tx queue_id by design?
Or are the new Rx queues created just to prevent the crash, by providing
'rxq->ring_lock' etc.?

Please also check comments on v4, thanks.
  

Patch

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index b8a32832d7..d7e3f12346 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -313,6 +313,15 @@  static int hn_rss_reta_update(struct rte_eth_dev *dev,
 
 		if (reta_conf[idx].mask & mask)
 			hv->rss_ind[i] = reta_conf[idx].reta[shift];
+
+		/*
+		 * Ensure we don't allow config that directs traffic to an Rx
+		 * queue that we aren't going to poll
+		 */
+		if (hv->rss_ind[i] >=  dev->data->nb_rx_queues) {
+			PMD_DRV_LOG(ERR, "RSS distributing traffic to invalid Rx queue");
+			return -EINVAL;
+		}
 	}
 
 	err = hn_rndis_conf_rss(hv, NDIS_RSS_FLAG_DISABLE);
diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
index 9bf1ec5509..e23880c176 100644
--- a/drivers/net/netvsc/hn_rxtx.c
+++ b/drivers/net/netvsc/hn_rxtx.c
@@ -234,6 +234,17 @@  static void hn_reset_txagg(struct hn_tx_queue *txq)
 	txq->agg_prevpkt = NULL;
 }
 
+static void
+hn_rx_queue_free_common(struct hn_rx_queue *rxq)
+{
+	if (!rxq)
+		return;
+
+	rte_free(rxq->rxbuf_info);
+	rte_free(rxq->event_buf);
+	rte_free(rxq);
+}
+
 int
 hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		      uint16_t queue_idx, uint16_t nb_desc,
@@ -243,6 +254,7 @@  hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 {
 	struct hn_data *hv = dev->data->dev_private;
 	struct hn_tx_queue *txq;
+	struct hn_rx_queue *rxq = NULL;
 	char name[RTE_MEMPOOL_NAMESIZE];
 	uint32_t tx_free_thresh;
 	int err = -ENOMEM;
@@ -301,6 +313,27 @@  hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		goto error;
 	}
 
+	/*
+	 * If there are more Tx queues than Rx queues, allocate rx_queues
+	 * with event buffer so that Tx completion messages can still be
+	 * received
+	 */
+	if (queue_idx >= dev->data->nb_rx_queues) {
+		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+
+		if (!rxq) {
+			err = -ENOMEM;
+			goto error;
+		}
+
+		/*
+		 * Don't allocate mbuf pool or rx ring.  RSS is always configured
+		 * to ensure packets aren't received by this Rx queue.
+		 */
+		rxq->mb_pool = NULL;
+		rxq->rx_ring = NULL;
+	}
+
 	txq->agg_szmax  = RTE_MIN(hv->chim_szmax, hv->rndis_agg_size);
 	txq->agg_pktmax = hv->rndis_agg_pkts;
 	txq->agg_align  = hv->rndis_agg_align;
@@ -311,12 +344,15 @@  hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 				     socket_id, tx_conf);
 	if (err == 0) {
 		dev->data->tx_queues[queue_idx] = txq;
+		if (rxq != NULL)
+			dev->data->rx_queues[queue_idx] = rxq;
 		return 0;
 	}
 
 error:
 	rte_mempool_free(txq->txdesc_pool);
 	rte_memzone_free(txq->tx_rndis_mz);
+	hn_rx_queue_free_common(rxq);
 	rte_free(txq);
 	return err;
 }
@@ -364,6 +400,13 @@  hn_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 	if (!txq)
 		return;
 
+	/*
+	 * Free any Rx queues allocated for a Tx queue without a corresponding
+	 * Rx queue
+	 */
+	if (qid >= dev->data->nb_rx_queues)
+		hn_rx_queue_free_common(dev->data->rx_queues[qid]);
+
 	rte_mempool_free(txq->txdesc_pool);
 
 	rte_memzone_free(txq->tx_rndis_mz);
@@ -552,10 +595,12 @@  static void hn_rxpkt(struct hn_rx_queue *rxq, struct hn_rx_bufinfo *rxb,
 		     const struct hn_rxinfo *info)
 {
 	struct hn_data *hv = rxq->hv;
-	struct rte_mbuf *m;
+	struct rte_mbuf *m = NULL;
 	bool use_extbuf = false;
 
-	m = rte_pktmbuf_alloc(rxq->mb_pool);
+	if (likely(rxq->mb_pool != NULL))
+		m = rte_pktmbuf_alloc(rxq->mb_pool);
+
 	if (unlikely(!m)) {
 		struct rte_eth_dev *dev =
 			&rte_eth_devices[rxq->port_id];
@@ -942,7 +987,15 @@  hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	if (queue_idx == 0) {
 		rxq = hv->primary;
 	} else {
-		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+		/*
+		 * If the number of Tx queues was previously greater than the
+		 * number of Rx queues, we may already have allocated an rxq.
+		 */
+		if (!dev->data->rx_queues[queue_idx])
+			rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+		else
+			rxq = dev->data->rx_queues[queue_idx];
+
 		if (!rxq)
 			return -ENOMEM;
 	}
@@ -975,9 +1028,10 @@  hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
 
 fail:
 	rte_ring_free(rxq->rx_ring);
-	rte_free(rxq->rxbuf_info);
-	rte_free(rxq->event_buf);
-	rte_free(rxq);
+	/* Only free rxq if it was created in this function. */
+	if (!dev->data->rx_queues[queue_idx])
+		hn_rx_queue_free_common(rxq);
+
 	return error;
 }
 
@@ -998,9 +1052,7 @@  hn_rx_queue_free(struct hn_rx_queue *rxq, bool keep_primary)
 	if (keep_primary && rxq == rxq->hv->primary)
 		return;
 
-	rte_free(rxq->rxbuf_info);
-	rte_free(rxq->event_buf);
-	rte_free(rxq);
+	hn_rx_queue_free_common(rxq);
 }
 
 void
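
As a usage note on the hn_ethdev.c hunk above: with the added bounds check, an
RSS redirection table entry pointing at a queue index the PMD will never poll
should now be rejected. A minimal sketch of such a call through the generic
ethdev RSS RETA API (the helper name and array sizing are illustrative, and
-EINVAL is the outcome expected from the check above):

#include <errno.h>
#include <string.h>
#include <rte_common.h>
#include <rte_ethdev.h>

/* Illustrative sketch: try to point RETA entry 0 at Rx queue index
 * nb_rx_queues (one past the last valid queue).  With the check added in
 * hn_rss_reta_update() above, the update is expected to fail with -EINVAL
 * instead of silently directing traffic to a queue the PMD never polls.
 */
static int
try_out_of_range_reta(uint16_t port_id, uint16_t nb_rx_queues)
{
	struct rte_eth_dev_info dev_info;
	struct rte_eth_rss_reta_entry64 reta_conf[8];
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;
	if (dev_info.reta_size > RTE_DIM(reta_conf) * RTE_ETH_RETA_GROUP_SIZE)
		return -ENOSPC;	/* sketch only handles small tables */

	memset(reta_conf, 0, sizeof(reta_conf));
	reta_conf[0].mask = 1ULL;		/* update entry 0 only */
	reta_conf[0].reta[0] = nb_rx_queues;	/* out-of-range Rx queue */

	return rte_eth_dev_rss_reta_update(port_id, reta_conf,
					   dev_info.reta_size);
}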