[2/2] net/bonding: avoid the next active slave going out of bound

Message ID 20190110102235.1238-3-hyonkim@cisco.com
State Superseded
Delegated to: Ferruh Yigit
Headers show
Series
  • net/bonding: a couple minor fixes
Related show

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK

Commit Message

Hyong Youb Kim Jan. 10, 2019, 10:22 a.m.
For bonding modes like broadcast that use bond_ethdev_rx_burst(), it
is fairly easy to produce a crash simply by bringing a slave port's
link down. When slave links go down, the driver on one thread reduces
active_slave_count via the LSC callback and deactivate_slave(). At the
same time, bond_ethdev_rx_burst() running on a forwarding thread may
increment active_slave (next active slave) beyond
active_slave_count. Here is a typical sequence of events.

At time 0:
active_slave_count = 3
active_slave = 2

At time 1:
A slave link goes down.
Thread 0 (main) reduces active_slave_count to 2.

At time 2:
Thread 1 (forwarding) executes bond_ethdev_rx_burst().
- Reads active_slave_count = 2.
- Increments active_slave at the end to 3.

From this point on, everytime bond_ethdev_rx_burst() runs,
active_slave increments by one, eventually going well out of bound of
the active_slaves array and causing a crash.

Make the rx burst function to first check that active_slave is within
bound. If not, reset it to 0 to avoid out-of-range array access.

Fixes: e1110e977648 ("net/bonding: fix Rx slave fairness")
Cc: stable@dpdk.org

Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Chas Williams Feb. 9, 2019, 1:17 p.m. | #1
On 1/10/19 5:22 AM, Hyong Youb Kim wrote:
> For bonding modes like broadcast that use bond_ethdev_rx_burst(), it
> is fairly easy to produce a crash simply by bringing a slave port's
> link down. When slave links go down, the driver on one thread reduces
> active_slave_count via the LSC callback and deactivate_slave(). At the
> same time, bond_ethdev_rx_burst() running on a forwarding thread may
> increment active_slave (next active slave) beyond
> active_slave_count. Here is a typical sequence of events.
> 
> At time 0:
> active_slave_count = 3
> active_slave = 2
> 
> At time 1:
> A slave link goes down.
> Thread 0 (main) reduces active_slave_count to 2.
> 
> At time 2:
> Thread 1 (forwarding) executes bond_ethdev_rx_burst().
> - Reads active_slave_count = 2.
> - Increments active_slave at the end to 3.
> 
>  From this point on, everytime bond_ethdev_rx_burst() runs,
> active_slave increments by one, eventually going well out of bound of
> the active_slaves array and causing a crash.
> 
> Make the rx burst function to first check that active_slave is within
> bound. If not, reset it to 0 to avoid out-of-range array access.
> 
> Fixes: e1110e977648 ("net/bonding: fix Rx slave fairness")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>

Acked-by: Chas Williams <chas3@att.com>

> ---
>   drivers/net/bonding/rte_eth_bond_pmd.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index daf2440cd..bc2405e54 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -68,6 +68,15 @@ bond_ethdev_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>   	internals = bd_rx_q->dev_private;
>   	slave_count = internals->active_slave_count;
>   	active_slave = internals->active_slave;
> +	/*
> +	 * Reset the active slave index, in case active_slave goes out
> +	 * of bound.  It can hapen when slave links go down, and
> +	 * another thread (LSC callback) shrinks the slave count.
> +	 */
> +	if (active_slave >= slave_count) {
> +		internals->active_slave = 0;
> +		active_slave = 0;
> +	}
>   
>   	for (i = 0; i < slave_count && nb_pkts; i++) {
>   		uint16_t num_rx_slave;
> @@ -273,6 +282,11 @@ bond_ethdev_rx_burst_8023ad_fast_queue(void *queue, struct rte_mbuf **bufs,
>   	active_slave = internals->active_slave;
>   	memcpy(slaves, internals->active_slaves,
>   			sizeof(internals->active_slaves[0]) * slave_count);
> +	/* active_slave may go out of bound. See bond_ethdev_rx_burst() */
> +	if (active_slave >= slave_count) {
> +		internals->active_slave = 0;
> +		active_slave = 0;
> +	}
>   
>   	for (i = 0; i < slave_count && nb_pkts; i++) {
>   		uint16_t num_rx_slave;
>
Ferruh Yigit Feb. 18, 2019, 3:25 p.m. | #2
On 2/11/2019 10:25 AM, Parthasarathy, JananeeX M wrote:
> Hi
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Chas Williams
>> Sent: Saturday, February 09, 2019 6:47 PM
>> To: Hyong Youb Kim <hyonkim@cisco.com>; Yigit, Ferruh
>> <ferruh.yigit@intel.com>; Doherty, Declan <declan.doherty@intel.com>; Chas
>> Williams <chas3@att.com>
>> Cc: dev@dpdk.org; stable@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 2/2] net/bonding: avoid the next active slave
>> going out of bound
>>
>>
>>
>> On 1/10/19 5:22 AM, Hyong Youb Kim wrote:
>>> For bonding modes like broadcast that use bond_ethdev_rx_burst(), it
>>> is fairly easy to produce a crash simply by bringing a slave port's
>>> link down. When slave links go down, the driver on one thread reduces
>>> active_slave_count via the LSC callback and deactivate_slave(). At the
>>> same time, bond_ethdev_rx_burst() running on a forwarding thread may
>>> increment active_slave (next active slave) beyond active_slave_count.
>>> Here is a typical sequence of events.
>>>
>>> At time 0:
>>> active_slave_count = 3
>>> active_slave = 2
>>>
>>> At time 1:
>>> A slave link goes down.
>>> Thread 0 (main) reduces active_slave_count to 2.
>>>
>>> At time 2:
>>> Thread 1 (forwarding) executes bond_ethdev_rx_burst().
>>> - Reads active_slave_count = 2.
>>> - Increments active_slave at the end to 3.
>>>
>>>  From this point on, everytime bond_ethdev_rx_burst() runs,
>>> active_slave increments by one, eventually going well out of bound of
>>> the active_slaves array and causing a crash.
>>>
>>> Make the rx burst function to first check that active_slave is within
>>> bound. If not, reset it to 0 to avoid out-of-range array access.
>>>
>>> Fixes: e1110e977648 ("net/bonding: fix Rx slave fairness")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
>>
>> Acked-by: Chas Williams <chas3@att.com>
>>
>>> ---
>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 14 ++++++++++++++
>>>   1 file changed, 14 insertions(+)
>>>
>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> index daf2440cd..bc2405e54 100644
>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> @@ -68,6 +68,15 @@ bond_ethdev_rx_burst(void *queue, struct rte_mbuf
>> **bufs, uint16_t nb_pkts)
>>>   	internals = bd_rx_q->dev_private;
>>>   	slave_count = internals->active_slave_count;
>>>   	active_slave = internals->active_slave;
>>> +	/*
>>> +	 * Reset the active slave index, in case active_slave goes out
>>> +	 * of bound.  It can hapen when slave links go down, and
>>> +	 * another thread (LSC callback) shrinks the slave count.
>>> +	 */
>>> +	if (active_slave >= slave_count) {
>>> +		internals->active_slave = 0;
>>> +		active_slave = 0;
>>> +	}
> 
> Instead of introducing new conditions again at the top of functions, it would be better to check greater than, equal to >= instead of the equal to  in below condition.
> if (++internals->active_slave == slave_count)
>                 internals->active_slave = 0;
> 
> Thereby we can reduce the multiple if conditions and still ensure internals->active_slave points to correct index always.
> 
>>>
>>>   	for (i = 0; i < slave_count && nb_pkts; i++) {
>>>   		uint16_t num_rx_slave;
>>> @@ -273,6 +282,11 @@ bond_ethdev_rx_burst_8023ad_fast_queue(void
>> *queue, struct rte_mbuf **bufs,
>>>   	active_slave = internals->active_slave;
>>>   	memcpy(slaves, internals->active_slaves,
>>>   			sizeof(internals->active_slaves[0]) * slave_count);
>>> +	/* active_slave may go out of bound. See bond_ethdev_rx_burst() */
>>> +	if (active_slave >= slave_count) {
>>> +		internals->active_slave = 0;
>>> +		active_slave = 0;
>>> +	}
> 
> Same as above comment would be better.
>>>
>>>   	for (i = 0; i < slave_count && nb_pkts; i++) {
>>>   		uint16_t num_rx_slave;
>>>
> 
> It would be better to check the internals->active_slave during deactivate_slave() as well in rte_eth_bond_api.c.
> Since slave counts would be decremented during de-activation and resetting here appropriately would be better.
> 
> Regards
> M.P.Jananee


I don't see this comment on the patchwork, can you double check if your comment
hit the mailing list?
Ferruh Yigit Feb. 20, 2019, 4:28 p.m. | #3
On 2/18/2019 3:25 PM, Ferruh Yigit wrote:
> On 2/11/2019 10:25 AM, Parthasarathy, JananeeX M wrote:
>> Hi
>>
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Chas Williams
>>> Sent: Saturday, February 09, 2019 6:47 PM
>>> To: Hyong Youb Kim <hyonkim@cisco.com>; Yigit, Ferruh
>>> <ferruh.yigit@intel.com>; Doherty, Declan <declan.doherty@intel.com>; Chas
>>> Williams <chas3@att.com>
>>> Cc: dev@dpdk.org; stable@dpdk.org
>>> Subject: Re: [dpdk-dev] [PATCH 2/2] net/bonding: avoid the next active slave
>>> going out of bound
>>>
>>>
>>>
>>> On 1/10/19 5:22 AM, Hyong Youb Kim wrote:
>>>> For bonding modes like broadcast that use bond_ethdev_rx_burst(), it
>>>> is fairly easy to produce a crash simply by bringing a slave port's
>>>> link down. When slave links go down, the driver on one thread reduces
>>>> active_slave_count via the LSC callback and deactivate_slave(). At the
>>>> same time, bond_ethdev_rx_burst() running on a forwarding thread may
>>>> increment active_slave (next active slave) beyond active_slave_count.
>>>> Here is a typical sequence of events.
>>>>
>>>> At time 0:
>>>> active_slave_count = 3
>>>> active_slave = 2
>>>>
>>>> At time 1:
>>>> A slave link goes down.
>>>> Thread 0 (main) reduces active_slave_count to 2.
>>>>
>>>> At time 2:
>>>> Thread 1 (forwarding) executes bond_ethdev_rx_burst().
>>>> - Reads active_slave_count = 2.
>>>> - Increments active_slave at the end to 3.
>>>>
>>>>  From this point on, everytime bond_ethdev_rx_burst() runs,
>>>> active_slave increments by one, eventually going well out of bound of
>>>> the active_slaves array and causing a crash.
>>>>
>>>> Make the rx burst function to first check that active_slave is within
>>>> bound. If not, reset it to 0 to avoid out-of-range array access.
>>>>
>>>> Fixes: e1110e977648 ("net/bonding: fix Rx slave fairness")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
>>>
>>> Acked-by: Chas Williams <chas3@att.com>
>>>
>>>> ---
>>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 14 ++++++++++++++
>>>>   1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> index daf2440cd..bc2405e54 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> @@ -68,6 +68,15 @@ bond_ethdev_rx_burst(void *queue, struct rte_mbuf
>>> **bufs, uint16_t nb_pkts)
>>>>   	internals = bd_rx_q->dev_private;
>>>>   	slave_count = internals->active_slave_count;
>>>>   	active_slave = internals->active_slave;
>>>> +	/*
>>>> +	 * Reset the active slave index, in case active_slave goes out
>>>> +	 * of bound.  It can hapen when slave links go down, and
>>>> +	 * another thread (LSC callback) shrinks the slave count.
>>>> +	 */
>>>> +	if (active_slave >= slave_count) {
>>>> +		internals->active_slave = 0;
>>>> +		active_slave = 0;
>>>> +	}
>>
>> Instead of introducing new conditions again at the top of functions, it would be better to check greater than, equal to >= instead of the equal to  in below condition.
>> if (++internals->active_slave == slave_count)
>>                 internals->active_slave = 0;
>>
>> Thereby we can reduce the multiple if conditions and still ensure internals->active_slave points to correct index always.
>>
>>>>
>>>>   	for (i = 0; i < slave_count && nb_pkts; i++) {
>>>>   		uint16_t num_rx_slave;
>>>> @@ -273,6 +282,11 @@ bond_ethdev_rx_burst_8023ad_fast_queue(void
>>> *queue, struct rte_mbuf **bufs,
>>>>   	active_slave = internals->active_slave;
>>>>   	memcpy(slaves, internals->active_slaves,
>>>>   			sizeof(internals->active_slaves[0]) * slave_count);
>>>> +	/* active_slave may go out of bound. See bond_ethdev_rx_burst() */
>>>> +	if (active_slave >= slave_count) {
>>>> +		internals->active_slave = 0;
>>>> +		active_slave = 0;
>>>> +	}
>>
>> Same as above comment would be better.
>>>>
>>>>   	for (i = 0; i < slave_count && nb_pkts; i++) {
>>>>   		uint16_t num_rx_slave;
>>>>
>>
>> It would be better to check the internals->active_slave during deactivate_slave() as well in rte_eth_bond_api.c.
>> Since slave counts would be decremented during de-activation and resetting here appropriately would be better.
>>
>> Regards
>> M.P.Jananee
> 
> 
> I don't see this comment on the patchwork, can you double check if your comment
> hit the mailing list?

For record, this patch superseded by:
https://patches.dpdk.org/patch/50346/

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index daf2440cd..bc2405e54 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -68,6 +68,15 @@  bond_ethdev_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	internals = bd_rx_q->dev_private;
 	slave_count = internals->active_slave_count;
 	active_slave = internals->active_slave;
+	/*
+	 * Reset the active slave index, in case active_slave goes out
+	 * of bound.  It can hapen when slave links go down, and
+	 * another thread (LSC callback) shrinks the slave count.
+	 */
+	if (active_slave >= slave_count) {
+		internals->active_slave = 0;
+		active_slave = 0;
+	}
 
 	for (i = 0; i < slave_count && nb_pkts; i++) {
 		uint16_t num_rx_slave;
@@ -273,6 +282,11 @@  bond_ethdev_rx_burst_8023ad_fast_queue(void *queue, struct rte_mbuf **bufs,
 	active_slave = internals->active_slave;
 	memcpy(slaves, internals->active_slaves,
 			sizeof(internals->active_slaves[0]) * slave_count);
+	/* active_slave may go out of bound. See bond_ethdev_rx_burst() */
+	if (active_slave >= slave_count) {
+		internals->active_slave = 0;
+		active_slave = 0;
+	}
 
 	for (i = 0; i < slave_count && nb_pkts; i++) {
 		uint16_t num_rx_slave;