common/mlx5: fix bogus assert

Message ID: 20200331060247.10954-1-stephen@networkplumber.org
State: Superseded, archived
Delegated to: Raslan Darawsheh
Series: common/mlx5: fix bogus assert

Checks

Context                      Check    Description
ci/checkpatch                warning  coding style issues
ci/iol-intel-Performance     success  Performance Testing PASS
ci/Intel-compilation         success  Compilation OK
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/iol-testing               success  Testing PASS
ci/travis-robot              success  Travis build: passed

Commit Message

Stephen Hemminger March 31, 2020, 6:02 a.m. UTC
The MLX5 device supports up to 256 MAC addresses,
and the code flushes all of them.

If DPDK is compiled with MLX5_DEBUG, this triggers an assert:
PANIC in mlx5_nl_mac_addr_flush():
line 775	assert "(size_t)(i) < sizeof(mac_own) * 8" failed

The root cause is that mac_own is a pointer that is being used as
a bitmap array. sizeof(mac_own) * 8 is therefore only 64 (the size of
a 64-bit pointer in bits), but the number of entries to be flushed is 256.
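A minimal standalone reproduction of the faulty bound, assuming a 256-entry table as stated above. MAX_MAC_ADDRESSES, mac_own_bitmap and both helper functions are hypothetical stand-ins, not the driver's code; the loop only mirrors the descending walk in mlx5_nl_mac_addr_flush().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's table size (256 per the text). */
#define MAX_MAC_ADDRESSES 256

/* Ownership bitmap with one bit per MAC table entry. */
static_assert(MAX_MAC_ADDRESSES % 64 == 0, "bitmap uses whole 64-bit words");
typedef uint64_t mac_own_bitmap[MAX_MAC_ADDRESSES / 64];

/* Walks n-1..0 like the flush loop and returns the first index the bogus
 * condition rejects, or -1. Here sizeof(mac_own) is the size of the pointer
 * (8 bytes, i.e. 64 bits, on LP64), not the size of the bitmap. */
static int first_bogus_failure(const uint64_t *mac_own, int n)
{
	for (int i = n - 1; i >= 0; --i)
		if (!((size_t)i < sizeof(mac_own) * CHAR_BIT))
			return i;
	return -1;
}

/* Same walk, but bounded by the table-size constant instead. */
static int first_correct_failure(int n)
{
	for (int i = n - 1; i >= 0; --i)
		if (!((size_t)i < MAX_MAC_ADDRESSES))
			return i;
	return -1;
}
```

On an LP64 target, sizeof(mac_own) * CHAR_BIT is 64, so with n = 256 the very first index checked (255) already fails the bogus condition, which matches the PANIC reported above.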

There is a whole set of asserts in the MLX5 netlink code with
the same bug; they should just be changed into proper error checks.

Fixes: 8e46d4e18f09 ("common/mlx5: improve assert control")
Cc: akozyrev@mellanox.com

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/common/mlx5/mlx5_nl.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
  

Comments

Slava Ovsiienko March 31, 2020, 7:31 a.m. UTC | #1
Hi, Stephen

Thank you for the fix.

The exposed API to set MAC addresses:
- mlx5_mac_addr_set (invoked by rte_mac_addr_set ())
- mlx5_set_mc_addr_list (invoked by rte_eth_dev_set_mc_addr_list())

Both routines call mlx5_internal_mac_addr_add(), which in turn calls
mlx5_nl_mac_addr_add() (the subject of this patch).

mlx5_nl_mac_addr_add() is an internal function, not an exposed external API,
so a wrong parameter means a critical internal bug, and an assert seems relevant here.
I would not remove MLX5_ASSERT entirely, just fix it.
Adding a parameter check that returns an error is nice too.
What do you think?

With best regards, Slava

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, March 31, 2020 9:03
> To: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org>;
> Alexander Kozyrev <akozyrev@mellanox.com>
> Subject: [PATCH] common/mlx5: fix bogus assert
  
Stephen Hemminger March 31, 2020, 2:55 p.m. UTC | #2
On Tue, 31 Mar 2020 07:31:48 +0000
Slava Ovsiienko <viacheslavo@mellanox.com> wrote:

> Hi, Stephen
> 
> Thank you for the fix.
> 
> The exposed API to set MAC addresses:
> - mlx5_mac_addr_set (invoked by rte_mac_addr_set ())
> - mlx5_set_mc_addr_list (invoked by rte_eth_dev_set_mc_addr_list())
> 
> Both routines call mlx5_internal_mac_addr_add(), it in its turn calls
> mlx5_nl_mac_addr_add() (that is subject of the patch).
> 
> mlx5_nl_mac_addr_add is internal function, not exposed external API,
> the wrong parameter means the critical internal bug, so assert looks to be relevant here.
> I would not remove MLX5_ASSERT at all but fix just it. 
> Adding the parameter check and return an error is nice.
> What do you think?
> 
> With best regards, Slava

The real root cause is that sizeof(mac_own) is the wrong thing
to do. The error handling is up to you.

Since asserts are compiled out, they are never tested and actually
make the code less safe.
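To illustrate the point about compiled-out asserts, here is a sketch with a hypothetical DEBUG_ASSERT macro (not MLX5_ASSERT itself) that is active only when a made-up DEMO_DEBUG symbol is defined. Without that define, the bogus sizeof(bits) condition is never evaluated, so the code appears to work; built with -DDEMO_DEBUG, setting index 200 would abort. set_bit_checked() is likewise an illustrative helper.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef DEMO_DEBUG
/* Debug build: evaluate the condition and abort on failure. */
#define DEBUG_ASSERT(exp)                                               \
	do {                                                            \
		if (!(exp)) {                                           \
			fprintf(stderr, "assert \"%s\" failed\n", #exp); \
			abort();                                        \
		}                                                       \
	} while (0)
#else
/* Release build: the condition is only type-checked by sizeof and never
 * evaluated, so a wrong condition goes unnoticed. */
#define DEBUG_ASSERT(exp) ((void)sizeof(exp))
#endif

/* Sets bit `index` in a bitmap of `nbits` bits; returns -1 if out of range.
 * The range check runs in every build; the assert only in debug builds. */
static int set_bit_checked(uint64_t *bits, size_t nbits, size_t index)
{
	/* Same bogus pattern as the report: sizeof of a pointer parameter. */
	DEBUG_ASSERT(index < sizeof(bits) * CHAR_BIT);
	if (index >= nbits)
		return -1;
	bits[index / 64] |= UINT64_C(1) << (index % 64);
	return 0;
}
```

In a default build, set_bit_checked(bits, 256, 200) succeeds and an index of 256 is rejected by the always-on check; only the debug build exposes the wrong assert condition.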
  
Slava Ovsiienko March 31, 2020, 3:09 p.m. UTC | #3
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, March 31, 2020 17:55
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; dev@dpdk.org; Alexander Kozyrev
> <akozyrev@mellanox.com>
> Subject: Re: [PATCH] common/mlx5: fix bogus assert
> 
> The real root cause is that sizeof(mac_own) is the wrong thing to do. The
> error handling is up to you.
> 
> Since ASSERT's are compiled out they are never tested and are actually
> making code less safe.

Generally speaking, an assert is not subject to testing; I consider it part of the debugging tools.
Yes, this assert had the wrong condition and was not tested, but once it was enabled and many MACs
came into play, we hit the issue, and your patch is here 😊.

>> making code less safe.
The debug version of the code is usually less safe and is not built for performance.
Adding the check and error return is OK: it always works and improves the code, though we do not expect it to trigger here.
Removing the assert (instead of fixing it) reduces our debugging capabilities, so it is not OK, as far as I am concerned.

With best regards, Slava
  
Stephen Hemminger April 10, 2020, 5:14 p.m. UTC | #4
On Tue, 31 Mar 2020 15:09:43 +0000
Slava Ovsiienko <viacheslavo@mellanox.com> wrote:

> Generally speaking assert is not subject to test - I would consider it as a part of debug means.
> Yes, this assert was with wrong condition and was not tested, but once enabled and a lot of MACs
> came into game - we got an issue and your patch is here 😊. 
> 
> >> making code less safe.  
> The debug version of code is usually less safe and has no performance.
> Adding the check and error return is OK, it works  always and improves the code, we do not expect engaging of it here, though.
>

I am done being diplomatic.
You have repeatedly ignored the fact that taking sizeof of a pointer is not
correct here. mac_own is a pointer, so sizeof(mac_own) will not give what
you want. You probably thought mac_own was an array, or that the compiler would
know that the pointer was an array.
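As a sketch of that distinction, assuming a 4 x 64-bit bitmap (MAC_OWN_WORDS and both functions are hypothetical): a plain pointer parameter carries no length, while a pointer to the whole array keeps the array type, so sizeof(*mac_own) is the full bitmap size. This is a general C idiom shown for illustration, not what the driver does.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word count: 4 x 64 = 256 ownership bits. */
#define MAC_OWN_WORDS 4

/* A plain pointer parameter carries no length information: sizeof(mac_own)
 * is the size of the pointer itself. */
static size_t bits_via_pointer(const uint64_t *mac_own)
{
	return sizeof(mac_own) * CHAR_BIT;
}

/* A pointer to the whole array keeps the array type, so sizeof(*mac_own)
 * is MAC_OWN_WORDS * sizeof(uint64_t) bytes. */
static size_t bits_via_array_pointer(uint64_t (*mac_own)[MAC_OWN_WORDS])
{
	return sizeof(*mac_own) * CHAR_BIT;
}
```

The caller passes &bitmap to the second form; the trade-off is that the size becomes part of the function's type, so the more common alternative is to pass the bit count explicitly.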

Any visible config option should work correctly. The code should not break.
Any not visible config option #ifdefs should be expunged from the upstream
code.

Either take the patch or fix your code, please.
  
Slava Ovsiienko April 13, 2020, 9:51 a.m. UTC | #5
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, April 10, 2020 20:15
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; dev@dpdk.org; Alexander Kozyrev
> <akozyrev@mellanox.com>
> Subject: Re: [PATCH] common/mlx5: fix bogus assert
> 
> I am done being diplomatic.
> You have repeatedly ignored the fact that doing sizeof a pointer is not correct
> here.
You are quite right. It is an obvious bug and must be fixed; thank you for the patch.
And let me assure you that I did not object to the fix in any way.
My only proposal was to fix the ASSERT as well instead of dropping it;
sorry if I did not express that clearly.
Something like this:
MLX5_ASSERT(index < MLX5_MAX_MAC_ADDRESSES);
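A minimal sketch of that combined approach, with standard assert() standing in for MLX5_ASSERT and a hypothetical MAX_MAC_ADDRESSES standing in for MLX5_MAX_MAC_ADDRESSES (value assumed to be 256): debug builds trap on an out-of-range index, while NDEBUG builds still reject it with -EINVAL. mac_own_set() and mac_own_clear() are illustrative names, not driver functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for MLX5_MAX_MAC_ADDRESSES (value assumed). */
#define MAX_MAC_ADDRESSES 256

/* Marks entry `index` as owned. The assert flags the internal-bug case in
 * debug builds; the explicit check still returns -EINVAL under NDEBUG. */
static int mac_own_set(uint64_t *mac_own, uint32_t index)
{
	assert(index < MAX_MAC_ADDRESSES);
	if (index >= MAX_MAC_ADDRESSES)
		return -EINVAL;
	mac_own[index / 64] |= UINT64_C(1) << (index % 64);
	return 0;
}

/* Clears ownership of entry `index`, with the same two-level guard. */
static int mac_own_clear(uint64_t *mac_own, uint32_t index)
{
	assert(index < MAX_MAC_ADDRESSES);
	if (index >= MAX_MAC_ADDRESSES)
		return -EINVAL;
	mac_own[index / 64] &= ~(UINT64_C(1) << (index % 64));
	return 0;
}
```

Both checks use the table-size constant rather than sizeof, so the assert and the runtime check agree on the valid range.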

> mac_own is a pointer so doing sizeof(mac_own) will not give what you
> want.  You probably thought mac_own was an array, or that compiler would
> know that the pointer was an array.
> 
> Any visible config option should work correctly. The code should not break.
> Any not visible config option #ifdefs should be expunged from the upstream
> code.
> 
> Either take the patch, or fix your code please
Whichever you prefer: please fix the ASSERT, or let me know if I should.

Thanks in advance,
Slava
  

Patch

diff --git a/drivers/common/mlx5/mlx5_nl.c b/drivers/common/mlx5/mlx5_nl.c
index 549e787b04bf..69f5efa50aa8 100644
--- a/drivers/common/mlx5/mlx5_nl.c
+++ b/drivers/common/mlx5/mlx5_nl.c
@@ -671,7 +671,9 @@  mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
 
 	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
 	if (!ret) {
-		MLX5_ASSERT((size_t)(index) < sizeof(mac_own) * CHAR_BIT);
+		if (index >= MLX5_MAX_MAC_ADDRESSES)
+			return -EINVAL;
+
 		BITFIELD_SET(mac_own, index);
 	}
 	if (ret == -EEXIST)
@@ -700,7 +702,9 @@  int
 mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
 			struct rte_ether_addr *mac, uint32_t index)
 {
-	MLX5_ASSERT((size_t)(index) < sizeof(mac_own) * CHAR_BIT);
+	if (index >= MLX5_MAX_MAC_ADDRESSES)
+		return -EINVAL;
+
 	BITFIELD_RESET(mac_own, index);
 	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
 }
@@ -769,10 +773,12 @@  mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
 {
 	int i;
 
+	if (n <= 0 || n >= MLX5_MAX_MAC_ADDRESSES)
+		return;
+
 	for (i = n - 1; i >= 0; --i) {
 		struct rte_ether_addr *m = &mac_addrs[i];
 
-		MLX5_ASSERT((size_t)(i) < sizeof(mac_own) * CHAR_BIT);
 		if (BITFIELD_ISSET(mac_own, i))
 			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
 						i);
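One detail worth checking in the flush hunk: the guard `n >= MLX5_MAX_MAC_ADDRESSES` also rejects n equal to the table size, so if the caller passes the full table size (the 256-entry flush described in the commit message), the function would appear to return without removing anything. A sketch of the loop with an inclusive upper bound, assuming a hypothetical MAX_MAC_ADDRESSES of 256 and replacing the per-entry netlink request with a counter; mac_own_flush() is an illustrative name, not the driver function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MLX5_MAX_MAC_ADDRESSES (value assumed). */
#define MAX_MAC_ADDRESSES 256

/* Clears every owned entry among the first n and returns how many were
 * cleared. The driver would send a netlink delete per entry instead; a
 * counter replaces that here so the bounds logic can run on its own. */
static int mac_own_flush(uint64_t *mac_own, int n)
{
	int removed = 0;

	/* Inclusive upper bound: n == MAX_MAC_ADDRESSES is the whole table. */
	if (n <= 0 || n > MAX_MAC_ADDRESSES)
		return 0;
	for (int i = n - 1; i >= 0; --i) {
		uint64_t bit = UINT64_C(1) << (i % 64);

		if (mac_own[i / 64] & bit) {
			mac_own[i / 64] &= ~bit;
			removed++;
		}
	}
	return removed;
}
```

With entries 0 and 255 owned, flushing the full 256-entry table clears both, while n = 257 is rejected without touching the bitmap.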