[2/2] kni: fix rtnl deadlocks and race conditions v3

Message ID 20210223134504.699-1-eladv6@gmail.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series kni: fix rtnl deadlocks and race conditions

Checks

Context         Check     Description
ci/checkpatch   success   coding style OK

Commit Message

Elad Nachman Feb. 23, 2021, 1:45 p.m. UTC
  This part of the series includes my fixes for the issues reported
by Ferruh and Igor on top of part 1 of the patch series:

A. The KNI sync lock is taken while rtnl is held.
If two threads call kni_net_process_request(),
the first one takes the sync lock, releases the rtnl lock, then sleeps.
The second thread tries to take the sync lock while holding rtnl.
The first thread wakes and tries to take rtnl, resulting in a deadlock.
The remedy is to release rtnl before taking the KNI sync lock.
Since nothing in between touches the Linux network stack,
no rtnl locking is needed.

B. There is a race condition in __dev_close_many() processing the
close_list while the application terminates.
It looks like when two vEth devices are terminating
and one releases the rtnl lock, the other takes it and
updates the close_list while it is in an unstable state,
turning the close_list into a circular linked list,
so list_for_each_entry() loops endlessly inside
__dev_close_many().
Since the description of the original patch indicates that the
original motivation was bringing the device up,
I have changed kni_net_process_request() to keep holding the rtnl mutex
when bringing the device down, since this is the path called
from __dev_close_many() that corrupts the close_list.

Signed-off-by: Elad Nachman <eladv6@gmail.com>
---
v3:
* Include the original patch and the new patch as a patch series, and add a
  comment to the new patch
v2:
* rebuild the patch as an increment on top of patch 64106
* fix comment and blank lines
---
 kernel/linux/kni/kni_net.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)
  

Comments

Igor Ryzhov Feb. 24, 2021, 12:49 p.m. UTC | #1
This looks more like a hack than an actual fix to me.

After this commit:
"ip link set up" is sent to the userspace with unlocked rtnl_lock
"ip link set down" is sent to the userspace with locked rtnl_lock

How is this really fixing anything? IMHO it only complicates the code.
If talking with userspace under rtnl_lock is a problem, then we should fix
all such requests, not only part of them.
If it is not a problem, then I don't see any point in merging this.
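
To make the asymmetry above concrete, here is a minimal user-space sketch in C of the decision the patch introduces. It is not the kernel code: the enum, struct, and function names are hypothetical stand-ins, and only the condition itself (a network-interface configuration request with if_up == 0) is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the KNI request IDs; not the real values. */
enum demo_req_id { DEMO_REQ_CHANGE_MTU, DEMO_REQ_CFG_NETWORK_IF };

/* Hypothetical, reduced view of a KNI request. */
struct demo_request {
	enum demo_req_id req_id;
	int if_up; /* 1 = bring the interface up, 0 = bring it down */
};

/*
 * Mirrors the req_is_dev_stop test from the diff: only an
 * "interface down" request keeps rtnl held while waiting for user space.
 * Every other request drops rtnl before taking the sync lock.
 */
static int keeps_rtnl_while_waiting(const struct demo_request *req)
{
	assert(req != NULL);
	return req->req_id == DEMO_REQ_CFG_NETWORK_IF && req->if_up == 0;
}
```

With this predicate, "ip link set up" maps to a request that waits without rtnl, while "ip link set down" maps to one that waits with rtnl held, which is the split being questioned here.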

On Tue, Feb 23, 2021 at 4:45 PM Elad Nachman <eladv6@gmail.com> wrote:

> diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
> index f0b6e9a8d..017e44812 100644
> --- a/kernel/linux/kni/kni_net.c
> +++ b/kernel/linux/kni/kni_net.c
> @@ -110,9 +110,26 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
>         void *resp_va;
>         uint32_t num;
>         int ret_val;
> +       int req_is_dev_stop = 0;
> +
> +       /* For configuring the interface to down,
> +        * rtnl must be held all the way to prevent race condition
> +        * inside __dev_close_many() between two netdev instances of KNI
> +        */
> +       if (req->req_id == RTE_KNI_REQ_CFG_NETWORK_IF &&
> +                       req->if_up == 0)
> +               req_is_dev_stop = 1;
>
>         ASSERT_RTNL();
>
> +       /* Since we need to wait and RTNL mutex is held
> +        * drop the mutex and hold reference to keep device
> +        */
> +       if (!req_is_dev_stop) {
> +               dev_hold(dev);
> +               rtnl_unlock();
> +       }
> +
>         mutex_lock(&kni->sync_lock);
>
>         /* Construct data */
> @@ -124,16 +141,8 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
>                 goto fail;
>         }
>
> -       /* Since we need to wait and RTNL mutex is held
> -        * drop the mutex and hold refernce to keep device
> -        */
> -       dev_hold(dev);
> -       rtnl_unlock();
> -
>         ret_val = wait_event_interruptible_timeout(kni->wq,
>                         kni_fifo_count(kni->resp_q), 3 * HZ);
> -       rtnl_lock();
> -       dev_put(dev);
>
>         if (signal_pending(current) || ret_val <= 0) {
>                 ret = -ETIME;
> @@ -152,6 +161,10 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
>
>  fail:
>         mutex_unlock(&kni->sync_lock);
> +       if (!req_is_dev_stop) {
> +               rtnl_lock();
> +               dev_put(dev);
> +       }
>         return ret;
>  }
>
> --
> 2.17.1
>
>
  
Elad Nachman Feb. 24, 2021, 1:33 p.m. UTC | #2
Currently KNI has a lot of deadlock issues that lock up the code.
After this commit they are gone, and the code runs properly without
crashing.
This was tested with over 100 restarts of the application, which
previously required a hard reset of the board.

I think this benefit outweighs the added complexity of the code.

The function is called with rtnl locked because this is how the Linux
kernel is designed to work; it is not designed to defer to user space
mid-function.

To fix all such requests you need to reach an agreement with Linux
netdev, which is unlikely.

Calling user space can be done asynchronously, as Ferruh asked, but
then you will always have to return success, even on failure, as the Linux
kernel does not have a mechanism to asynchronously report failure
for such system calls.

IMHO, weighing the non-reporting of failures against how the code
looks (as it functions perfectly OK), I decided to go with
functionality.

FYI,

Elad.
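
The trade-off between a synchronous and an asynchronous call to user space can be sketched in plain C. This is only an illustration under stated assumptions: the handler type, the single-slot "pending" queue, and every demo_* name are hypothetical and do not exist in KNI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space handler: returns 0 or a negative errno. */
typedef int (*demo_handler)(void);

static demo_handler pending; /* at most one deferred request in this sketch */
static int lost_result;      /* outcome the original caller never sees */

/* Example handler that always fails. */
static int demo_failing_handler(void)
{
	return -EIO;
}

/* Synchronous: the caller waits and gets the handler's real result. */
static int demo_request_sync(demo_handler h)
{
	assert(h != NULL);
	return h();
}

/* Asynchronous: the request is only queued, so the caller can be told nothing but 0. */
static int demo_request_async(demo_handler h)
{
	assert(h != NULL);
	pending = h;
	return 0;
}

/* Runs later, after the original caller has already returned. */
static void demo_run_pending(void)
{
	if (pending != NULL) {
		lost_result = pending();
		pending = NULL;
	}
}
```

In the synchronous variant the failure reaches the caller; in the asynchronous one it is only recorded after the fact, which is the non-reporting of failure described above.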

  
Igor Ryzhov Feb. 24, 2021, 2:04 p.m. UTC | #3
Elad,

I understand your point.
But the fact that this fix works for you doesn't mean that it will work
for all DPDK users.

For example, I provided two simple commands: "ip link set up" and
"ip link set down".
Your fix works for only one of them. For me, this is not a proper fix.
It may work for you because you don't disable interfaces, but it will
fail for users who do.

  
Elad Nachman Feb. 24, 2021, 2:06 p.m. UTC | #4
I tested both link up and link down many times without any problems on
100 restarts of the application.

Having KNI deadlock frequently in real-life applications is far worse, IMHO.

FYI

Elad.

  
Igor Ryzhov Feb. 24, 2021, 2:41 p.m. UTC | #5
Both link up and link down also work for me without this patch.
So what's the point in merging it?

Just to clarify - I am not against the idea of this patch.
Talking to userspace under rtnl_lock is a bad idea.
I just think that any patch should fix some specified problem.

If this patch is trying to solve the overall "userspace request under
rtnl_lock" problem, then it doesn't solve it correctly, because we still
send link down requests under the lock.

If this patch is trying to solve some other issue, for example, some
"KNI deadlocks" you're talking about, then you should explain what these
deadlocks are, how to reproduce them and why this patch solves the issue.

  
Elad Nachman Feb. 24, 2021, 2:56 p.m. UTC | #6
The deadlock scenarios are explained below.

The first is described in Stephen Hemminger's original patch:

"

This fixes a deadlock when using KNI with bifurcated drivers.
Bringing kni device up always times out when using Mellanox
devices.

The kernel KNI driver sends message to userspace to complete
the request. For the case of bifurcated driver, this may involve
an additional request to kernel to change state. This request
would deadlock because KNI was holding the RTNL mutex.

"

And also in my patch:

"
KNI sync lock is being locked while rtnl is held.
If two threads are calling kni_net_process_request() ,
then the first one will take the sync lock, release rtnl lock then sleep.
The second thread will try to lock sync lock while holding rtnl.
The first thread will wake, and try to lock rtnl, resulting in a deadlock.
The remedy is to release rtnl before locking the KNI sync lock.
Since in between nothing is accessing Linux network-wise,
no rtnl locking is needed.
"
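
The ordering problem quoted above can be sketched as a toy model. This is not kernel code: the two "locks" are plain owner slots, the two "threads" are stepped round-robin, and every name below (both_finish, sync_under_rtnl, rtnl_dropped_first) is made up for the illustration. It covers only the path where rtnl is dropped (not the link down case) and walks a single interleaving, so treat it as a picture of the argument, not a proof.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RTNL mutex and kni->sync_lock. */
enum { RTNL, SYNC, NLOCKS };
enum op_kind { LOCK, UNLOCK, WAIT, END };

struct op { enum op_kind kind; int lock; };

/* Ordering after part 1 of the series: sync_lock is taken under rtnl. */
static const struct op sync_under_rtnl[] = {
	{LOCK, RTNL},	/* netdev core calls in with rtnl held */
	{LOCK, SYNC},
	{UNLOCK, RTNL},	/* dropped only for the wait */
	{WAIT, 0},	/* sleeping until userspace responds */
	{LOCK, RTNL},
	{UNLOCK, SYNC},
	{UNLOCK, RTNL},	/* caller releases rtnl */
	{END, 0},
};

/* Ordering in this patch: rtnl is released before sync_lock is taken. */
static const struct op rtnl_dropped_first[] = {
	{LOCK, RTNL},
	{UNLOCK, RTNL},
	{LOCK, SYNC},
	{WAIT, 0},
	{UNLOCK, SYNC},
	{LOCK, RTNL},
	{UNLOCK, RTNL},
	{END, 0},
};

/*
 * Step two threads through the same script, one op each per round.
 * Returns true if both reach END, false if a round passes in which
 * no thread can make progress.
 */
static bool both_finish(const struct op *script)
{
	int owner[NLOCKS] = { -1, -1 };
	size_t pc[2] = { 0, 0 };

	for (;;) {
		bool progressed = false;

		for (int t = 0; t < 2; t++) {
			const struct op *o = &script[pc[t]];

			if (o->kind == END)
				continue;
			if (o->kind == LOCK) {
				if (owner[o->lock] != -1)
					continue;	/* blocked on this lock */
				owner[o->lock] = t;
			} else if (o->kind == UNLOCK) {
				assert(owner[o->lock] == t);
				owner[o->lock] = -1;
			}
			pc[t]++;
			progressed = true;
		}
		if (script[pc[0]].kind == END && script[pc[1]].kind == END)
			return true;
		if (!progressed)
			return false;
	}
}
```

Calling both_finish(sync_under_rtnl) ends with one thread holding rtnl and waiting for sync_lock while the other holds sync_lock and waits for rtnl. With rtnl_dropped_first no thread ever holds both locks at once, so the cycle cannot form.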

FYI,

Elad.

On Wed, Feb 24, 2021 at 4:41 PM Igor Ryzhov <iryzhov@nfware.com> wrote:
>
> Both link up and link down also work for me without this patch.
> So what's the point in merging it?
>
> Just to clarify - I am not against the idea of this patch.
> Talking to userspace under rtnl_lock is a bad idea.
> I just think that any patch should fix some specified problem.
>
> If this patch is trying to solve the overall "userspace request under rtnl_lock" problem,
> then it doesn't solve it correctly, because we still send link down requests under the lock.
>
> If this patch is trying to solve some other issue, for example, some "KNI deadlocks"
> you're talking about, then you should explain what these deadlocks are, how to reproduce
> them and why this patch solves the issue.
>
> On Wed, Feb 24, 2021 at 5:07 PM Elad Nachman <eladv6@gmail.com> wrote:
>>
>> I tested both link up and link down many times without any problems on
>> 100 restarts of the application.
>>
>> Having KNI deadlock frequently for real life applications is far worse, IMHO.
>>
>> FYI
>>
>> Elad.
>>
>> On Wed, Feb 24, 2021 at 4:04 PM Igor Ryzhov <iryzhov@nfware.com> wrote:
>> >
>> > Elad,
>> >
>> > I understand your point.
>> > But the fact that this fix works for you doesn't mean that it will work for all DPDK users.
>> >
>> > For example, I provided two simple commands: "ip link set up" and "ip link set down".
>> > Your fix works for only one of them. For me, this is not a proper fix.
>> > It may work for you because you don't disable interfaces, but it will fail for users who do.
>> >
>> > On Wed, Feb 24, 2021 at 4:33 PM Elad Nachman <eladv6@gmail.com> wrote:
>> >>
>> >> Currently KNI has a lot of issues with deadlocks locking up the code.
>> >> After this commit they are gone, and the code runs properly without
>> >> crashing.
>> >> That was tested with over 100 restarts of the application, which
>> >> previously required a hard reset of the board.
>> >>
>> >> I think this benefit outweighs the complication of the code.
>> >>
>> >> The function is called with rtnl locked because this is how the Linux
>> >> kernel is designed to work - it is not designed to work with deferral
>> >> to user-space mid-function.
>> >>
>> >> To fix all such requests you need to reach an agreement with Linux
>> >> netdev, which is unlikely.
>> >>
>> >> Calling user-space can be done asynchronously, as Ferruh asked, but
>> >> then you will always have to return success, even on failure, as Linux
>> >> kernel does not have a mechanism to asynchronously report on failure
>> >> for such system calls.
>> >>
>> >> IMHO - weighing the non-reporting of failure versus how the code
>> >> looks (as it functions perfectly OK), I decided to go with
>> >> functionality.
>> >>
>> >> FYI,
>> >>
>> >> Elad.
>> >>
>> >> On Wed, Feb 24, 2021 at 2:50 PM Igor Ryzhov <iryzhov@nfware.com> wrote:
>> >> >
>> >> > This looks more like a hack than an actual fix to me.
>> >> >
>> >> > After this commit:
>> >> > "ip link set up" is sent to the userspace with unlocked rtnl_lock
>> >> > "ip link set down" is sent to the userspace with locked rtnl_lock
>> >> >
>> >> > How is this really fixing anything? IMHO it only complicates the code.
>> >> > If talking with userspace under rtnl_lock is a problem, then we should fix all such requests, not only part of them.
>> >> > If it is not a problem, then I don't see any point in merging this.
>> >> >
>> >> > On Tue, Feb 23, 2021 at 4:45 PM Elad Nachman <eladv6@gmail.com> wrote:
>> >> >>
>> >> >> This part of the series includes my fixes for the issues reported
>> >> >> by Ferruh and Igor on top of part 1 of the patch series:
>> >> >>
>> >> >> A. KNI sync lock is being locked while rtnl is held.
>> >> >> If two threads are calling kni_net_process_request() ,
>> >> >> then the first one will take the sync lock, release rtnl lock then sleep.
>> >> >> The second thread will try to lock sync lock while holding rtnl.
>> >> >> The first thread will wake, and try to lock rtnl, resulting in a deadlock.
>> >> >> The remedy is to release rtnl before locking the KNI sync lock.
>> >> >> Since in between nothing is accessing Linux network-wise,
>> >> >> no rtnl locking is needed.
>> >> >>
>> >> >> B. There is a race condition in __dev_close_many() processing the
>> >> >> close_list while the application terminates.
>> >> >> It looks like if two vEth devices are terminating,
>> >> >> and one releases the rtnl lock, the other takes it,
>> >> >> updating the close_list in an unstable state,
>> >> >> causing the close_list to become a circular linked list,
>> >> >> hence list_for_each_entry() will endlessly loop inside
>> >> >> __dev_close_many() .
>> >> >> Since the description for the original patch indicate the
>> >> >> original motivation was bringing the device up,
>> >> >> I have changed kni_net_process_request() to hold the rtnl mutex
>> >> >> in case of bringing the device down since this is the path called
>> >> >> from __dev_close_many() , causing the corruption of the close_list.
>> >> >>
>> >> >> Signed-off-by: Elad Nachman <eladv6@gmail.com>
>> >> >> ---
>> >> >> v3:
>> >> >> * Include original patch and new patch as a series of patch, added a
>> >> >>   comment to the new patch
>> >> >> v2:
>> >> >> * rebuild the patch as increment from patch 64106
>> >> >> * fix comment and blank lines
>> >> >> ---
>> >> >>  kernel/linux/kni/kni_net.c | 29 +++++++++++++++++++++--------
>> >> >>  1 file changed, 21 insertions(+), 8 deletions(-)
>> >> >>
>> >> >> diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
>> >> >> index f0b6e9a8d..017e44812 100644
>> >> >> --- a/kernel/linux/kni/kni_net.c
>> >> >> +++ b/kernel/linux/kni/kni_net.c
>> >> >> @@ -110,9 +110,26 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
>> >> >>         void *resp_va;
>> >> >>         uint32_t num;
>> >> >>         int ret_val;
>> >> >> +       int req_is_dev_stop = 0;
>> >> >> +
>> >> >> +       /* For configuring the interface to down,
>> >> >> +        * rtnl must be held all the way to prevent race condition
>> >> >> +        * inside __dev_close_many() between two netdev instances of KNI
>> >> >> +        */
>> >> >> +       if (req->req_id == RTE_KNI_REQ_CFG_NETWORK_IF &&
>> >> >> +                       req->if_up == 0)
>> >> >> +               req_is_dev_stop = 1;
>> >> >>
>> >> >>         ASSERT_RTNL();
>> >> >>
>> >> >> +       /* Since we need to wait and RTNL mutex is held
>> >> >> +        * drop the mutex and hold reference to keep device
>> >> >> +        */
>> >> >> +       if (!req_is_dev_stop) {
>> >> >> +               dev_hold(dev);
>> >> >> +               rtnl_unlock();
>> >> >> +       }
>> >> >> +
>> >> >>         mutex_lock(&kni->sync_lock);
>> >> >>
>> >> >>         /* Construct data */
>> >> >> @@ -124,16 +141,8 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
>> >> >>                 goto fail;
>> >> >>         }
>> >> >>
>> >> >> -       /* Since we need to wait and RTNL mutex is held
>> >> >> -        * drop the mutex and hold refernce to keep device
>> >> >> -        */
>> >> >> -       dev_hold(dev);
>> >> >> -       rtnl_unlock();
>> >> >> -
>> >> >>         ret_val = wait_event_interruptible_timeout(kni->wq,
>> >> >>                         kni_fifo_count(kni->resp_q), 3 * HZ);
>> >> >> -       rtnl_lock();
>> >> >> -       dev_put(dev);
>> >> >>
>> >> >>         if (signal_pending(current) || ret_val <= 0) {
>> >> >>                 ret = -ETIME;
>> >> >> @@ -152,6 +161,10 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
>> >> >>
>> >> >>  fail:
>> >> >>         mutex_unlock(&kni->sync_lock);
>> >> >> +       if (!req_is_dev_stop) {
>> >> >> +               rtnl_lock();
>> >> >> +               dev_put(dev);
>> >> >> +       }
>> >> >>         return ret;
>> >> >>  }
>> >> >>
>> >> >> --
>> >> >> 2.17.1
>> >> >>
  
Igor Ryzhov Feb. 24, 2021, 3:18 p.m. UTC | #7
Stephen's idea was to fix the deadlock when working with the bifurcated
driver.
Your rework breaks this because you still send link down requests under
rtnl_lock.
Did you test your patch with Mellanox devices?

  
Stephen Hemminger Feb. 24, 2021, 3:54 p.m. UTC | #8
On Wed, 24 Feb 2021 15:49:49 +0300
Igor Ryzhov <iryzhov@nfware.com> wrote:

> This looks more like a hack than an actual fix to me.
> 
> After this commit:
> "ip link set up" is sent to the userspace with unlocked rtnl_lock
> "ip link set down" is sent to the userspace with locked rtnl_lock

Calling userspace with rtnl held is a recipe for disaster
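
To make the asymmetry Igor points out concrete, here is a minimal single-threaded sketch of the control flow in the patched kni_net_process_request(). It is only a model under stated assumptions: struct model_dev and model_process_request are hypothetical names, rtnl is reduced to a flag and the device reference to a counter, and the request type check is simplified to a single if_up argument. The return value reports whether rtnl was still held at the point where the request is handed to userspace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the real function touches. */
struct model_dev {
	int refcnt;	/* models dev_hold() / dev_put() */
	bool rtnl_held;	/* models whether the caller holds RTNL */
};

/*
 * Models the lock handling of the patched function.
 * Returns true if RTNL was held while waiting on userspace.
 */
static bool model_process_request(struct model_dev *dev, bool if_up)
{
	bool req_is_dev_stop = !if_up;	/* simplified request check */
	bool held_during_wait;

	assert(dev->rtnl_held);		/* mirrors ASSERT_RTNL() */

	if (!req_is_dev_stop) {
		dev->refcnt++;		/* dev_hold() */
		dev->rtnl_held = false;	/* rtnl_unlock() */
	}

	/* sync_lock taken, request queued, wait for the response here */
	held_during_wait = dev->rtnl_held;

	if (!req_is_dev_stop) {
		dev->rtnl_held = true;	/* rtnl_lock() */
		dev->refcnt--;		/* dev_put() */
	}
	return held_during_wait;
}
```

In this model a link up request returns false (userspace is called with rtnl dropped) and a link down request returns true (userspace is called with rtnl held). Both paths return with rtnl held and the reference count unchanged.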
  
Igor Ryzhov Feb. 24, 2021, 4:31 p.m. UTC | #9
Elad,

To make it work on Mellanox NIC, you need to find a way to send
ALL requests to userspace without rtnl_lock held, including link down.
As I understand, the race condition in __dev_close_many must be
solved somehow.

I can't provide remote access, but I am happy to test on Mellanox NICs,
if you find a way to fix link down requests.

On Wed, Feb 24, 2021 at 7:11 PM Elad Nachman <eladv6@gmail.com> wrote:

> Sorry, don't have Mellanox NIC currently. Will have one in 8-12 weeks.
> Will be happy to test it remotely if anyone can provide remote HW or
> VM (Azure, for example).
>
> Elad.
>
> On Wed, Feb 24, 2021 at 5:18 PM Igor Ryzhov <iryzhov@nfware.com> wrote:
> >
> > Stephen's idea was to fix the deadlock when working with the bifurcated
> driver.
> > Your rework breaks this because you still send link down requests under
> rtnl_lock.
> > Did you test your patch with Mellanox devices?
  

Patch

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index f0b6e9a8d..017e44812 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -110,9 +110,26 @@  kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 	void *resp_va;
 	uint32_t num;
 	int ret_val;
+	int req_is_dev_stop = 0;
+
+	/* For configuring the interface to down,
+	 * rtnl must be held all the way to prevent race condition
+	 * inside __dev_close_many() between two netdev instances of KNI
+	 */
+	if (req->req_id == RTE_KNI_REQ_CFG_NETWORK_IF &&
+			req->if_up == 0)
+		req_is_dev_stop = 1;
 
 	ASSERT_RTNL();
 
+	/* Since we need to wait and RTNL mutex is held
+	 * drop the mutex and hold reference to keep device
+	 */
+	if (!req_is_dev_stop) {
+		dev_hold(dev);
+		rtnl_unlock();
+	}
+
 	mutex_lock(&kni->sync_lock);
 
 	/* Construct data */
@@ -124,16 +141,8 @@  kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 		goto fail;
 	}
 
-	/* Since we need to wait and RTNL mutex is held
-	 * drop the mutex and hold refernce to keep device
-	 */
-	dev_hold(dev);
-	rtnl_unlock();
-
 	ret_val = wait_event_interruptible_timeout(kni->wq,
 			kni_fifo_count(kni->resp_q), 3 * HZ);
-	rtnl_lock();
-	dev_put(dev);
 
 	if (signal_pending(current) || ret_val <= 0) {
 		ret = -ETIME;
@@ -152,6 +161,10 @@  kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 
 fail:
 	mutex_unlock(&kni->sync_lock);
+	if (!req_is_dev_stop) {
+		rtnl_lock();
+		dev_put(dev);
+	}
 	return ret;
 }