vdpa/mlx5: fix configuration mutex cleanup

Message ID: 1609915409-272126-1-git-send-email-matan@nvidia.com
State: Accepted, archived
Delegated to: Maxime Coquelin
Series: vdpa/mlx5: fix configuration mutex cleanup

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/iol-broadcom-Performance  success  Performance Testing PASS
ci/iol-broadcom-Functional   success  Functional Testing PASS
ci/iol-intel-Functional      success  Functional Testing PASS
ci/iol-intel-Performance     success  Performance Testing PASS
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/iol-testing               success  Testing PASS
ci/Intel-compilation         success  Compilation OK

Commit Message

Matan Azrad Jan. 6, 2021, 6:43 a.m. UTC
When the vDPA device is closed, the driver polling thread is canceled.
The polling thread locks the configuration mutex while it polls the CQs.

The cancellation may terminate the thread inside the critical section,
which leaves the configuration mutex locked.

After the device is closed, the driver may be configured again. In that
case, for example when the first queue state is updated, the driver
tries to lock the mutex again and a deadlock occurs.

Reinitialize the mutex after the polling thread cancellation.

Fixes: 99abbd62c272 ("vdpa/mlx5: fix queue update synchronization")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Maxime Coquelin Jan. 7, 2021, 6:09 p.m. UTC | #1
On 1/6/21 7:43 AM, Matan Azrad wrote:
> When the vDPA device is closed, the driver polling thread is canceled.
> The polling thread locks the configuration mutex while it polls the CQs.
> 
> When the cancellation happens, it may terminate the thread inside the
> critical section what remains the configuration mutex locked.
> 
> After device close, the driver may be configured again, in this case,
> for example, when the first queue state is updated, the driver tries to
> lock the mutex again and deadlock appears.
> 
> Initialize the mutex after the polling thread cancellation.
> 
> Fixes: 99abbd62c272 ("vdpa/mlx5: fix queue update synchronization")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Matan Azrad <matan@nvidia.com>
> Acked-by: Xueming Li <xuemingl@nvidia.com>
> ---
>  drivers/vdpa/mlx5/mlx5_vdpa.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> index b64f364..0b2f1ab 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> @@ -295,6 +295,8 @@
>  	}
>  	priv->configured = 0;
>  	priv->vid = 0;
> +	/* The mutex may stay locked after event thread cancel - initiate it. */
> +	pthread_mutex_init(&priv->vq_config_lock, NULL);
>  	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
>  	return ret;
>  }
> 

I wonder if it would be possible and cleaner to disable cancellation on
the thread while the mutex is held?

Regards,
Maxime
  
David Marchand Jan. 8, 2021, 8:48 a.m. UTC | #2
On Thu, Jan 7, 2021 at 7:09 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> On 1/6/21 7:43 AM, Matan Azrad wrote:
> > When the vDPA device is closed, the driver polling thread is canceled.
> > The polling thread locks the configuration mutex while it polls the CQs.
> >
> > When the cancellation happens, it may terminate the thread inside the
> > critical section what remains the configuration mutex locked.
> >
> > After device close, the driver may be configured again, in this case,
> > for example, when the first queue state is updated, the driver tries to
> > lock the mutex again and deadlock appears.
> >
> > Initialize the mutex after the polling thread cancellation.
> >
> > Fixes: 99abbd62c272 ("vdpa/mlx5: fix queue update synchronization")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Matan Azrad <matan@nvidia.com>
> > Acked-by: Xueming Li <xuemingl@nvidia.com>
> > ---
> >  drivers/vdpa/mlx5/mlx5_vdpa.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> > index b64f364..0b2f1ab 100644
> > --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> > @@ -295,6 +295,8 @@
> >       }
> >       priv->configured = 0;
> >       priv->vid = 0;
> > +     /* The mutex may stay locked after event thread cancel - initiate it. */
> > +     pthread_mutex_init(&priv->vq_config_lock, NULL);
> >       DRV_LOG(INFO, "vDPA device %d was closed.", vid);
> >       return ret;
> >  }
> >
>
> I wonder if it would be possible and cleaner to disable cancellation on
> the thread while the mutex is held?

+1
  
David Marchand Jan. 14, 2021, 8:34 a.m. UTC | #3
On Fri, Jan 8, 2021 at 9:48 AM David Marchand <david.marchand@redhat.com> wrote:
> > I wonder if it would be possible and cleaner to disable cancellation on
> > the thread while the mutex is held?
>
> +1

IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/26 is applied, adding
pthread_t to the list of types that are not required to be arithmetic
types, thus allowing pthread_t to be defined as a structure.

It would be better to leave pthread_t alone and not interpret it:

if (priv->timer_tid) {
    pthread_cancel(priv->timer_tid);
    pthread_join(priv->timer_tid, &status);
}
priv->timer_tid = 0;
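
One possible reading of this remark, sketched under assumptions: since pthread_t is not guaranteed to be an arithmetic type, testing it against 0 or assigning 0 to it is not portable, and a separate flag can record whether the thread exists. The names `struct poller`, `poller_start`, `poller_stop` and `idle_loop` below are hypothetical and do not come from the driver.

```c
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical handle: validity is tracked separately from pthread_t. */
struct poller {
	pthread_t tid;
	bool started;
};

/* Placeholder thread body that only blocks in a cancellation point. */
static void *idle_loop(void *arg)
{
	(void)arg;
	for (;;)
		sleep(1);
	return NULL;
}

/* Create the thread and remember whether creation succeeded. */
int poller_start(struct poller *p, void *(*fn)(void *), void *arg)
{
	int rc = pthread_create(&p->tid, NULL, fn, arg);

	p->started = (rc == 0);
	return rc;
}

/* Cancel and join only if a thread was started; pthread_t is never tested. */
void poller_stop(struct poller *p)
{
	if (!p->started)
		return;
	pthread_cancel(p->tid);
	pthread_join(p->tid, NULL);
	p->started = false;
}
```

With this layout, calling `poller_stop()` twice is harmless, because the second call sees `started == false` and returns without touching `tid`.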
  
Matan Azrad Jan. 14, 2021, 11:49 a.m. UTC | #4
Hi Maxime and David

Thank you for the review.

From: David Marchand
> On Fri, Jan 8, 2021 at 9:48 AM David Marchand
> <david.marchand@redhat.com> wrote:
> > > I wonder if it would be possible and cleaner to disable cancellation
> > > on the thread while the mutex is held?

Yes, we can make the thread return by synchronizing on some global variable.
It is the same logic.
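
The flag-based exit mentioned here could look roughly like the sketch below. It is an illustration only, assuming C11 atomics are available; `cfg_lock`, `stop_requested`, `poller` and `stop_poller_and_check_lock` are hypothetical names, not driver symbols. The thread checks the flag only outside the critical section, so it never exits while holding the mutex, and the closing side has to wait in pthread_join() for the current iteration to finish.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the configuration mutex. */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
/* Set by the closing side to ask the poller to return on its own. */
static atomic_bool stop_requested;

static void *poller(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_requested)) {
		pthread_mutex_lock(&cfg_lock);
		/* ... poll completion queues here ... */
		pthread_mutex_unlock(&cfg_lock);
		sched_yield();
	}
	return NULL;
}

/* Start the poller, request a stop, wait for it, then probe the mutex. */
int stop_poller_and_check_lock(void)
{
	pthread_t tid;
	int rc;

	atomic_store(&stop_requested, false);
	if (pthread_create(&tid, NULL, poller, NULL) != 0)
		return -1;
	atomic_store(&stop_requested, true);
	pthread_join(tid, NULL);
	/* The thread left through the loop condition, so the lock is free. */
	rc = pthread_mutex_trylock(&cfg_lock);
	if (rc == 0)
		pthread_mutex_unlock(&cfg_lock);
	return rc;
}
```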

> > +1
> 
> IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/26 is applied, adding
> pthread_t to the list of types that are not required to be arithmetic types, thus
> allowing pthread_t to be defined as a structure.
> 
> It would be better to leave pthread_t alone and not interpret it:
> 
> if (priv->timer_tid) {
>     pthread_cancel(priv->timer_tid);
>     pthread_join(priv->timer_tid, &status); }
> priv->timer_tid = 0;


I'm not sure why you think it is better in this specific case.
Cancellation closes the thread faster; there is no need to wait for the thread to close itself.


  
Maxime Coquelin Jan. 14, 2021, 12:38 p.m. UTC | #5
Hi Matan,

On 1/14/21 12:49 PM, Matan Azrad wrote:
> Hi Maxime and David
> 
> Thank you for Review.
> 
> From: David Marchand
>> On Fri, Jan 8, 2021 at 9:48 AM David Marchand
>> <david.marchand@redhat.com> wrote:
>>>> I wonder if it would be possible and cleaner to disable cancellation
>>>> on the thread while the mutex is held?
> 
> Yes, we can cause thread to return by some global variable sync.
> It is the same logic.

No, that was not my suggestion. My suggestion is to block the thread
cancellation while in the critical section, using
pthread_setcancelstate().
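
A minimal sketch of this suggestion, assuming a polling loop shaped like the one described in the commit message: cancellation is disabled before the mutex is taken and restored after it is released, so a pending pthread_cancel() can only take effect while the lock is free. The names `cfg_lock`, `poll_once`, `poller` and `cancel_and_check_lock` are hypothetical and stand in for the driver's own symbols.

```c
#include <pthread.h>

/* Hypothetical stand-in for the configuration mutex. */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long polled;

/* One polling iteration with cancellation blocked around the lock. */
void poll_once(void)
{
	int old_state;
	int ignored;

	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
	pthread_mutex_lock(&cfg_lock);
	polled++; /* ... poll completion queues here ... */
	pthread_mutex_unlock(&cfg_lock);
	pthread_setcancelstate(old_state, &ignored);
	/* Explicit cancellation point, reached only with the lock released. */
	pthread_testcancel();
}

static void *poller(void *arg)
{
	(void)arg;
	for (;;)
		poll_once();
	return NULL;
}

/* Cancel the poller at an arbitrary moment, then probe the mutex. */
int cancel_and_check_lock(void)
{
	pthread_t tid;
	int rc;

	if (pthread_create(&tid, NULL, poller, NULL) != 0)
		return -1;
	pthread_cancel(tid);
	pthread_join(tid, NULL);
	rc = pthread_mutex_trylock(&cfg_lock);
	if (rc == 0)
		pthread_mutex_unlock(&cfg_lock);
	return rc;
}
```

The cost of this approach is that the closing side may wait in pthread_join() until the current critical section completes.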


  
Matan Azrad Jan. 14, 2021, 1:09 p.m. UTC | #6
From: Maxime Coquelin
> Hi Matan,
> 
> On 1/14/21 12:49 PM, Matan Azrad wrote:
> > Hi Maxime and David
> >
> > Thank you for Review.
> >
> > From: David Marchand
> >> On Fri, Jan 8, 2021 at 9:48 AM David Marchand
> >> <david.marchand@redhat.com> wrote:
> >>>> I wonder if it would be possible and cleaner to disable
> >>>> cancellation on the thread while the mutex is held?
> >
> > Yes, we can cause thread to return by some global variable sync.
> > It is the same logic.
> 
> No, that was not my suggestion. My suggestion is to block the thread
> cancellation while in the critical section, using pthread_setcancelstate().

Yes, generally it is better to let the thread control its own cancellation, either by cancelling itself or by enabling/disabling cancellation.

I don't see a reason to wait for the thread in the current logic; completing the critical section is not important here.

We just want to close the thread and clean up the mutex.
 
  
Maxime Coquelin Jan. 14, 2021, 2:27 p.m. UTC | #7
On 1/14/21 2:09 PM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin
>> Hi Matan,
>>
>> On 1/14/21 12:49 PM, Matan Azrad wrote:
>>> Hi Maxime and David
>>>
>>> Thank you for Review.
>>>
>>> From: David Marchand
>>>> On Fri, Jan 8, 2021 at 9:48 AM David Marchand
>>>> <david.marchand@redhat.com> wrote:
>>>>>> I wonder if it would be possible and cleaner to disable
>>>>>> cancellation on the thread while the mutex is held?
>>>
>>> Yes, we can cause thread to return by some global variable sync.
>>> It is the same logic.
>>
>> No, that was not my suggestion. My suggestion is to block the thread
>> cancellation while in the critical section, using pthread_setcancelstate().
> 
> Yes, Generally it is better to let the thread control his cancellation, either cancel itself or enabling\disabling cancellations. 
> 
> I don't see a reason to wait for the thread in current logic - the critical section is not important to be completed here.

The reason I see is that quite a few things are done in this critical
section. And if tomorrow someone adds new things to it, they may not know
the thread can be cancelled at any time, which could cause hard-to-debug
issues.

  
Matan Azrad Jan. 14, 2021, 3:23 p.m. UTC | #8
From: Maxime Coquelin
> On 1/14/21 2:09 PM, Matan Azrad wrote:
> >
> >
> > From: Maxime Coquelin
> >> Hi Matan,
> >>
> >> On 1/14/21 12:49 PM, Matan Azrad wrote:
> >>> Hi Maxime and David
> >>>
> >>> Thank you for Review.
> >>>
> >>> From: David Marchand
> >>>> On Fri, Jan 8, 2021 at 9:48 AM David Marchand
> >>>> <david.marchand@redhat.com> wrote:
> >>>>>> I wonder if it would be possible and cleaner to disable
> >>>>>> cancellation on the thread while the mutex is held?
> >>>
> >>> Yes, we can cause thread to return by some global variable sync.
> >>> It is the same logic.
> >>
> >> No, that was not my suggestion. My suggestion is to block the thread
> >> cancellation while in the critical section, using pthread_setcancelstate().
> >
> > Yes, Generally it is better to let the thread control his cancellation, either
> cancel itself or enabling\disabling cancellations.
> >
> > I don't see a reason to wait for the thread in current logic - the critical section
> is not important to be completed here.
> 
> The reason I see is there are quite a few things done in this critical section. And
> if tomorrow someone add new things in it, he may not know the thread can be
> cancelled at any time, which could cause hard to debug issues.

As I said, it is not needed here; this thread is designed only to trigger guest notifications.

A future developer could make the same mistake outside the critical section, or anywhere else; we cannot protect against that.

The design choice is to close the thread fast.

  
Maxime Coquelin Jan. 21, 2021, 10:46 a.m. UTC | #9
On 1/14/21 4:23 PM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin
>> On 1/14/21 2:09 PM, Matan Azrad wrote:
>>>
>>>
>>> From: Maxime Coquelin
>>>> Hi Matan,
>>>>
>>>> On 1/14/21 12:49 PM, Matan Azrad wrote:
>>>>> Hi Maxime and David
>>>>>
>>>>> Thank you for Review.
>>>>>
>>>>> From: David Marchand
>>>>>> On Fri, Jan 8, 2021 at 9:48 AM David Marchand
>>>>>> <david.marchand@redhat.com> wrote:
>>>>>>>> I wonder if it would be possible and cleaner to disable
>>>>>>>> cancellation on the thread while the mutex is held?
>>>>>
>>>>> Yes, we can cause thread to return by some global variable sync.
>>>>> It is the same logic.
>>>>
>>>> No, that was not my suggestion. My suggestion is to block the thread
>>>> cancellation while in the critical section, using pthread_setcancelstate().
>>>
>>> Yes, Generally it is better to let the thread control his cancellation, either
>> cancel itself or enabling\disabling cancellations.
>>>
>>> I don't see a reason to wait for the thread in current logic - the critical section
>> is not important to be completed here.
>>
>> The reason I see is there are quite a few things done in this critical section. And
>> if tomorrow someone add new things in it, he may not know the thread can be
>> cancelled at any time, which could cause hard to debug issues.
> 
> As I said, here it is not needed, this thread designed just to cause guest notifications.
> 
> The optional future developer mistake can be done also outside the critical section in in any other place - we cannot protect it.
> 
> The design choice is to close the thread fast.

But why is it so urgent that it cannot be stopped cleanly?
I don't think doing it in a clean way would add seconds of delay.

Thanks,
Maxime

  
Matan Azrad Jan. 21, 2021, 8:13 p.m. UTC | #10
From: Maxime Coquelin
> On 1/14/21 4:23 PM, Matan Azrad wrote:
> >
> >
> > From: Maxime Coquelin
> >> On 1/14/21 2:09 PM, Matan Azrad wrote:
> >>>
> >>>
> >>> From: Maxime Coquelin
> >>>> Hi Matan,
> >>>>
> >>>> On 1/14/21 12:49 PM, Matan Azrad wrote:
> >>>>> Hi Maxime and David
> >>>>>
> >>>>> Thank you for Review.
> >>>>>
> >>>>> From: David Marchand
> >>>>>> On Fri, Jan 8, 2021 at 9:48 AM David Marchand
> >>>>>> <david.marchand@redhat.com> wrote:
> >>>>>>>> I wonder if it would be possible and cleaner to disable
> >>>>>>>> cancellation on the thread while the mutex is held?
> >>>>>
> >>>>> Yes, we can cause thread to return by some global variable sync.
> >>>>> It is the same logic.
> >>>>
> >>>> No, that was not my suggestion. My suggestion is to block the
> >>>> thread cancellation while in the critical section, using
> pthread_setcancelstate().
> >>>
> >>> Yes, Generally it is better to let the thread control his
> >>> cancellation, either
> >> cancel itself or enabling\disabling cancellations.
> >>>
> >>> I don't see a reason to wait for the thread in current logic - the
> >>> critical section
> >> is not important to be completed here.
> >>
> >> The reason I see is there are quite a few things done in this
> >> critical section. And if tomorrow someone add new things in it, he
> >> may not know the thread can be cancelled at any time, which could cause
> hard to debug issues.
> >
> > As I said, here it is not needed, this thread designed just to cause guest
> notifications.
> >
> > The optional future developer mistake can be done also outside the critical
> section in in any other place - we cannot protect it.
> >
> > The design choice is to close the thread fast.
> 
> But why is it so urgent that it cannot been stopped cleanly?
> I don't think it would add seconds delay by doing it in a clean way.

There are system calls there for each queue.
There is no need for this potential delay just for the sake of mutex cleanup.


 
  
Maxime Coquelin Jan. 26, 2021, 10:22 a.m. UTC | #11
On 1/21/21 9:13 PM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin
>> On 1/14/21 4:23 PM, Matan Azrad wrote:
>>>
>>>
>>> From: Maxime Coquelin
>>>> On 1/14/21 2:09 PM, Matan Azrad wrote:
>>>>>
>>>>>
>>>>> From: Maxime Coquelin
>>>>>> Hi Matan,
>>>>>>
>>>>>> On 1/14/21 12:49 PM, Matan Azrad wrote:
>>>>>>> Hi Maxime and David
>>>>>>>
>>>>>>> Thank you for Review.
>>>>>>>
>>>>>>> From: David Marchand
>>>>>>>> On Fri, Jan 8, 2021 at 9:48 AM David Marchand
>>>>>>>> <david.marchand@redhat.com> wrote:
>>>>>>>>>> I wonder if it would be possible and cleaner to disable
>>>>>>>>>> cancellation on the thread while the mutex is held?
>>>>>>>
>>>>>>> Yes, we can cause thread to return by some global variable sync.
>>>>>>> It is the same logic.
>>>>>>
>>>>>> No, that was not my suggestion. My suggestion is to block the
>>>>>> thread cancellation while in the critical section, using
>> pthread_setcancelstate().
>>>>>
>>>>> Yes, Generally it is better to let the thread control his
>>>>> cancellation, either
>>>> cancel itself or enabling\disabling cancellations.
>>>>>
>>>>> I don't see a reason to wait for the thread in current logic - the
>>>>> critical section
>>>> is not important to be completed here.
>>>>
>>>> The reason I see is there are quite a few things done in this
>>>> critical section. And if tomorrow someone add new things in it, he
>>>> may not know the thread can be cancelled at any time, which could cause
>> hard to debug issues.
>>>
>>> As I said, here it is not needed, this thread designed just to cause guest
>> notifications.
>>>
>>> The optional future developer mistake can be done also outside the critical
>> section in in any other place - we cannot protect it.
>>>
>>> The design choice is to close the thread fast.
>>
>> But why is it so urgent that it cannot been stopped cleanly?
>> I don't think it would add seconds delay by doing it in a clean way.
> 
> We have system calls there per queue.
> No need this optional delay just because of mutex cleaning. 

OK, up to you...

And what about the timer lock?

  
Matan Azrad Jan. 26, 2021, 10:45 a.m. UTC | #12
From: Maxime Coquelin
> > From: Maxime Coquelin
> >> On 1/14/21 4:23 PM, Matan Azrad wrote:
> >>>
> >>>
> >>> From: Maxime Coquelin
> >>>> On 1/14/21 2:09 PM, Matan Azrad wrote:
> >>>>>
> >>>>>
> >>>>> From: Maxime Coquelin
> >>>>>> Hi Matan,
> >>>>>>
> >>>>>> On 1/14/21 12:49 PM, Matan Azrad wrote:
> >>>>>>> Hi Maxime and David
> >>>>>>>
> >>>>>>> Thank you for Review.
> >>>>>>>
> >>>>>>> From: David Marchand
> >>>>>>>> On Fri, Jan 8, 2021 at 9:48 AM David Marchand
> >>>>>>>> <david.marchand@redhat.com> wrote:
> >>>>>>>>>> I wonder if it would be possible and cleaner to disable
> >>>>>>>>>> cancellation on the thread while the mutex is held?
> >>>>>>>
> >>>>>>> Yes, we can cause thread to return by some global variable sync.
> >>>>>>> It is the same logic.
> >>>>>>
> >>>>>> No, that was not my suggestion. My suggestion is to block the
> >>>>>> thread cancellation while in the critical section, using
> >> pthread_setcancelstate().
> >>>>>
> >>>>> Yes, Generally it is better to let the thread control his
> >>>>> cancellation, either
> >>>> cancel itself or enabling\disabling cancellations.
> >>>>>
> >>>>> I don't see a reason to wait for the thread in current logic - the
> >>>>> critical section
> >>>> is not important to be completed here.
> >>>>
> >>>> The reason I see is there are quite a few things done in this
> >>>> critical section. And if tomorrow someone add new things in it, he
> >>>> may not know the thread can be cancelled at any time, which could
> >>>> cause
> >> hard to debug issues.
> >>>
> >>> As I said, here it is not needed, this thread designed just to cause
> >>> guest
> >> notifications.
> >>>
> >>> The optional future developer mistake can be done also outside the
> >>> critical
> >> section in in any other place - we cannot protect it.
> >>>
> >>> The design choice is to close the thread fast.
> >>
> >> But why is it so urgent that it cannot been stopped cleanly?
> >> I don't think it would add seconds delay by doing it in a clean way.
> >
> > We have system calls there per queue.
> > No need this optional delay just because of mutex cleaning.
> 
> OK, up to you...
> 
> And what about the timer lock?

The existing code initializes it before reuse...

Thanks.

  
Maxime Coquelin Jan. 26, 2021, 1 p.m. UTC | #13
On 1/26/21 11:45 AM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin
>>> From: Maxime Coquelin
>>>> On 1/14/21 4:23 PM, Matan Azrad wrote:
>>>>>
>>>>>
>>>>> From: Maxime Coquelin
>>>>>> On 1/14/21 2:09 PM, Matan Azrad wrote:
>>>>>>>
>>>>>>>
>>>>>>> From: Maxime Coquelin
>>>>>>>> Hi Matan,
>>>>>>>>
>>>>>>>> On 1/14/21 12:49 PM, Matan Azrad wrote:
>>>>>>>>> Hi Maxime and David
>>>>>>>>>
>>>>>>>>> Thank you for Review.
>>>>>>>>>
>>>>>>>>> From: David Marchand
>>>>>>>>>> On Fri, Jan 8, 2021 at 9:48 AM David Marchand
>>>>>>>>>> <david.marchand@redhat.com> wrote:
>>>>>>>>>>>> I wonder if it would be possible and cleaner to disable
>>>>>>>>>>>> cancellation on the thread while the mutex is held?
>>>>>>>>>
>>>>>>>>> Yes, we can cause thread to return by some global variable sync.
>>>>>>>>> It is the same logic.
>>>>>>>>
>>>>>>>> No, that was not my suggestion. My suggestion is to block the
>>>>>>>> thread cancellation while in the critical section, using
>>>> pthread_setcancelstate().
>>>>>>>
>>>>>>> Yes, Generally it is better to let the thread control his
>>>>>>> cancellation, either
>>>>>> cancel itself or enabling\disabling cancellations.
>>>>>>>
>>>>>>> I don't see a reason to wait for the thread in current logic - the
>>>>>>> critical section
>>>>>> is not important to be completed here.
>>>>>>
>>>>>> The reason I see is there are quite a few things done in this
>>>>>> critical section. And if tomorrow someone add new things in it, he
>>>>>> may not know the thread can be cancelled at any time, which could
>>>>>> cause
>>>> hard to debug issues.
>>>>>
>>>>> As I said, here it is not needed, this thread designed just to cause
>>>>> guest
>>>> notifications.
>>>>>
>>>>> The optional future developer mistake can be done also outside the
>>>>> critical
>>>> section in in any other place - we cannot protect it.
>>>>>
>>>>> The design choice is to close the thread fast.
>>>>
>>>> But why is it so urgent that it cannot been stopped cleanly?
>>>> I don't think it would add seconds delay by doing it in a clean way.
>>>
>>> We have system calls there per queue.
>>> No need this optional delay just because of mutex cleaning.
>>
>> OK, up to you...
>>
>> And what about the timer lock?
> 
> Existing code initiates it before reusing...

OK, so why not apply the same logic to both mutexes?

  
Matan Azrad Jan. 26, 2021, 6:23 p.m. UTC | #14
From: Maxime Coquelin
> On 1/26/21 11:45 AM, Matan Azrad wrote:
> >
> >
> > From: Maxime Coquelin
> >>> From: Maxime Coquelin
> >>>> On 1/14/21 4:23 PM, Matan Azrad wrote:
> >>>>>
> >>>>>
> >>>>> From: Maxime Coquelin
> >>>>>> On 1/14/21 2:09 PM, Matan Azrad wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> From: Maxime Coquelin
> >>>>>>>> Hi Matan,
> >>>>>>>>
> >>>>>>>> On 1/14/21 12:49 PM, Matan Azrad wrote:
> >>>>>>>>> Hi Maxime and David
> >>>>>>>>>
> >>>>>>>>> Thank you for Review.
> >>>>>>>>>
> >>>>>>>>> From: David Marchand
> >>>>>>>>>> On Fri, Jan 8, 2021 at 9:48 AM David Marchand
> >>>>>>>>>> <david.marchand@redhat.com> wrote:
> >>>>>>>>>>>> I wonder if it would be possible and cleaner to disable
> >>>>>>>>>>>> cancellation on the thread while the mutex is held?
> >>>>>>>>>
> >>>>>>>>> Yes, we can cause thread to return by some global variable sync.
> >>>>>>>>> It is the same logic.
> >>>>>>>>
> >>>>>>>> No, that was not my suggestion. My suggestion is to block the
> >>>>>>>> thread cancellation while in the critical section, using
> >>>> pthread_setcancelstate().
> >>>>>>>
> >>>>>>> Yes, Generally it is better to let the thread control his
> >>>>>>> cancellation, either
> >>>>>> cancel itself or enabling\disabling cancellations.
> >>>>>>>
> >>>>>>> I don't see a reason to wait for the thread in current logic -
> >>>>>>> the critical section
> >>>>>> is not important to be completed here.
> >>>>>>
> >>>>>> The reason I see is there are quite a few things done in this
> >>>>>> critical section. And if tomorrow someone add new things in it,
> >>>>>> he may not know the thread can be cancelled at any time, which
> >>>>>> could cause
> >>>> hard to debug issues.
> >>>>>
> >>>>> As I said, here it is not needed, this thread designed just to
> >>>>> cause guest
> >>>> notifications.
> >>>>>
> >>>>> The optional future developer mistake can be done also outside the
> >>>>> critical
> >>>> section in in any other place - we cannot protect it.
> >>>>>
> >>>>> The design choice is to close the thread fast.
> >>>>
> >>>> But why is it so urgent that it cannot been stopped cleanly?
> >>>> I don't think it would add seconds delay by doing it in a clean way.
> >>>
> >>> We have system calls there per queue.
> >>> No need this optional delay just because of mutex cleaning.
> >>
> >> OK, up to you...
> >>
> >> And what about the timer lock?
> >
> > Existing code initiates it before reusing...
> 
> Ok, so why not applying same logic for both mutexes?

Different dependencies, different usage.

The timer lock is tied more closely to the polling thread's usage; the mutex in this patch has wider usage, not only polling thread management.

  
Maxime Coquelin Jan. 27, 2021, 10:45 a.m. UTC | #15
On 1/6/21 7:43 AM, Matan Azrad wrote:
> When the vDPA device is closed, the driver polling thread is canceled.
> The polling thread locks the configuration mutex while it polls the CQs.
> 
> When the cancellation happens, it may terminate the thread inside the
> critical section what remains the configuration mutex locked.
> 
> After device close, the driver may be configured again, in this case,
> for example, when the first queue state is updated, the driver tries to
> lock the mutex again and deadlock appears.
> 
> Initialize the mutex after the polling thread cancellation.
> 
> Fixes: 99abbd62c272 ("vdpa/mlx5: fix queue update synchronization")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Matan Azrad <matan@nvidia.com>
> Acked-by: Xueming Li <xuemingl@nvidia.com>
> ---
>  drivers/vdpa/mlx5/mlx5_vdpa.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> index b64f364..0b2f1ab 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> @@ -295,6 +295,8 @@
>  	}
>  	priv->configured = 0;
>  	priv->vid = 0;
> +	/* The mutex may stay locked after event thread cancel - initiate it. */
> +	pthread_mutex_init(&priv->vq_config_lock, NULL);
>  	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
>  	return ret;
>  }
> 

Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
  
Maxime Coquelin Jan. 27, 2021, 12:01 p.m. UTC | #16
On 1/6/21 7:43 AM, Matan Azrad wrote:
> When the vDPA device is closed, the driver polling thread is canceled.
> The polling thread locks the configuration mutex while it polls the CQs.
> 
> When the cancellation happens, it may terminate the thread inside the
> critical section what remains the configuration mutex locked.
> 
> After device close, the driver may be configured again, in this case,
> for example, when the first queue state is updated, the driver tries to
> lock the mutex again and deadlock appears.
> 
> Initialize the mutex after the polling thread cancellation.
> 
> Fixes: 99abbd62c272 ("vdpa/mlx5: fix queue update synchronization")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Matan Azrad <matan@nvidia.com>
> Acked-by: Xueming Li <xuemingl@nvidia.com>
> ---
>  drivers/vdpa/mlx5/mlx5_vdpa.c | 2 ++
>  1 file changed, 2 insertions(+)
> 

Applied to dpdk-next-virtio/main.

Thanks,
Maxime
  

Patch

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index b64f364..0b2f1ab 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -295,6 +295,8 @@ 
 	}
 	priv->configured = 0;
 	priv->vid = 0;
+	/* The mutex may stay locked after event thread cancel - initiate it. */
+	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
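
To make the failure mode concrete, the following is a minimal standalone sketch, not code from the driver. It assumes a Linux/glibc pthreads environment; `cfg_lock`, `poller`, `cancel_inside_critical_section` and `reinit_and_lock` are hypothetical names standing in for `priv->vq_config_lock` and the event thread. A thread is cancelled while it holds the mutex, the mutex is then probed and found still locked, and running pthread_mutex_init() again makes it usable. Note that POSIX leaves re-initializing an already initialized mutex undefined, so the second step relies on implementation behaviour.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Hypothetical stand-in for priv->vq_config_lock. */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
/* Posted once the poller is inside its critical section. */
static sem_t in_section;

/* Stand-in for the polling thread: takes the lock, then blocks. */
static void *poller(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&cfg_lock);
	sem_post(&in_section);
	for (;;)
		sleep(1); /* sleep() is a cancellation point */
	return NULL;
}

/* Cancel the poller while it holds the lock; return the trylock result. */
int cancel_inside_critical_section(void)
{
	pthread_t tid;

	sem_init(&in_section, 0, 0);
	if (pthread_create(&tid, NULL, poller, NULL) != 0)
		return -1;
	sem_wait(&in_section);
	pthread_cancel(tid);
	pthread_join(tid, NULL);
	sem_destroy(&in_section);
	/* Nobody unlocked the mutex, so this is expected to fail with EBUSY. */
	return pthread_mutex_trylock(&cfg_lock);
}

/* What the patch does on close: reinitialize, after which locking works. */
int reinit_and_lock(void)
{
	int rc;

	pthread_mutex_init(&cfg_lock, NULL);
	rc = pthread_mutex_trylock(&cfg_lock);
	if (rc == 0)
		pthread_mutex_unlock(&cfg_lock);
	return rc;
}
```

A blocking pthread_mutex_lock() in place of the first trylock would hang, which corresponds to the deadlock described in the commit message.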