[v2,1/1] librte_eal: rte_intr_callback_unregister_sync() - wrapper around rte_intr_callback_unregister().

Message ID 20200817140828.9769-2-Renata.Saiakhova@ekinops.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series pci_vfio_disable_notifier(): avoid race with unregister

Checks

Context                      Check    Description
ci/checkpatch                warning  coding style issues
ci/Intel-compilation         success  Compilation OK
ci/iol-intel-Performance     success  Performance Testing PASS
ci/iol-testing               success  Testing PASS
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/iol-intel-Functional      success  Functional Testing PASS

Commit Message

Renata Saiakhova Aug. 17, 2020, 2:08 p.m. UTC
Avoid a race when unregistering the interrupt handler: if the interrupt
source has active callbacks at that moment,
rte_intr_callback_unregister() returns -EAGAIN. Add a wrapper around
rte_intr_callback_unregister() that checks for the -EAGAIN return value
and retries until the call no longer returns -EAGAIN.

Signed-off-by: Renata Saiakhova <Renata.Saiakhova@ekinops.com>
---
 drivers/bus/pci/linux/pci_vfio.c        |  2 +-
 lib/librte_eal/freebsd/eal_interrupts.c | 12 ++++++++++++
 lib/librte_eal/include/rte_interrupts.h | 25 +++++++++++++++++++++++++
 lib/librte_eal/linux/eal_interrupts.c   | 12 ++++++++++++
 lib/librte_eal/rte_eal_version.map      |  1 +
 5 files changed, 51 insertions(+), 1 deletion(-)
  

Comments

David Marchand Oct. 8, 2020, 7:47 a.m. UTC | #1
On Mon, Aug 17, 2020 at 4:09 PM Renata Saiakhova
<Renata.Saiakhova@ekinops.com> wrote:
>
> Avoid race with unregister interrupt handler if interrupt
> source has some active callbacks at the moment, use wrapper
> around rte_intr_callback_unregister() to check for -EAGAIN
> return value and to loop until rte_intr_callback_unregister()
> succeeds.
>
> Signed-off-by: Renata Saiakhova <Renata.Saiakhova@ekinops.com>

Review please.
  
David Marchand Oct. 20, 2020, 1:40 p.m. UTC | #2
On Thu, Oct 8, 2020 at 9:47 AM David Marchand <david.marchand@redhat.com> wrote:
>
> On Mon, Aug 17, 2020 at 4:09 PM Renata Saiakhova
> <Renata.Saiakhova@ekinops.com> wrote:
> >
> > Avoid race with unregister interrupt handler if interrupt
> > source has some active callbacks at the moment, use wrapper
> > around rte_intr_callback_unregister() to check for -EAGAIN
> > return value and to loop until rte_intr_callback_unregister()
> > succeeds.
> >
> > Signed-off-by: Renata Saiakhova <Renata.Saiakhova@ekinops.com>

Anatoly, Harman, this patch has been waiting for a long time.
Can you review it?


Thanks.
  
Harman Kalra Oct. 28, 2020, 8:36 p.m. UTC | #3
On Mon, Aug 17, 2020 at 04:08:27PM +0200, Renata Saiakhova wrote:
> Avoid race with unregister interrupt handler if interrupt
> source has some active callbacks at the moment, use wrapper
> around rte_intr_callback_unregister() to check for -EAGAIN
> return value and to loop until rte_intr_callback_unregister()
> succeeds.
> 

Hi Renata,

   Just trying to understand the scenario. You mentioned "while
   removing the device by rte_dev_remove()": are you calling
   rte_eal_hotplug_remove(), or has the kernel sent an event to remove
   the device? As far as I know, the vfio notifier mechanism is used by
   the kernel vfio driver to notify the user to release the resources,
   and if you are observing EAGAIN, it means the same callback is
   executing.
   Regarding the tight polling loop in the patch, I think it's good to
   have fixed retry logic to avoid any unidentified corner case that
   might lead to infinite looping.

Thanks
Harman
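
One way to read the "fixed retry logic" suggestion is a wrapper that still retries on -EAGAIN but gives up after a bounded number of attempts. The sketch below is only an illustration under assumptions, not the code from the patch: unregister_bounded(), unregister_attempt_fn, struct fake_source and fake_unregister() are hypothetical names, and the EAL call is replaced by a stand-in so the example compiles without DPDK. In real code the attempt would wrap rte_intr_callback_unregister(), and each retry would call rte_pause() or sleep briefly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature for one unregister attempt. In DPDK this would
 * wrap rte_intr_callback_unregister(intr_handle, cb_fn, cb_arg). */
typedef int (*unregister_attempt_fn)(void *ctx);

/* Retry while the attempt reports -EAGAIN, but allow at most max_retries
 * retries after the first attempt, so a callback that never finishes
 * cannot hang the caller forever. */
int
unregister_bounded(unregister_attempt_fn attempt, void *ctx,
		unsigned int max_retries)
{
	unsigned int retries = 0;
	int ret;

	assert(attempt != NULL);

	while ((ret = attempt(ctx)) == -EAGAIN) {
		if (retries++ >= max_retries)
			break;	/* still busy: hand -EAGAIN back to the caller */
		/* a real implementation would call rte_pause() or sleep here */
	}

	return ret;
}

/* Stand-in for the EAL (assumption, for illustration only): reports
 * "busy" a fixed number of times, then reports one callback removed. */
struct fake_source {
	unsigned int busy_polls;
};

int
fake_unregister(void *ctx)
{
	struct fake_source *src = ctx;

	if (src->busy_polls > 0) {
		src->busy_polls--;
		return -EAGAIN;
	}

	return 1;
}
```

With a source that stays busy for 3 polls and a budget of 10 retries, unregister_bounded() returns 1. With a source that stays busy for 100 polls and a budget of 5 retries, it returns -EAGAIN after six attempts, and the caller decides whether to log, retry later, or fail the removal.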
   

  
Renata Saiakhova Nov. 30, 2020, 5:20 p.m. UTC | #4
Hi Harman,

Sorry for the late reply.

Yes, indeed, this is a race between an application that calls rte_dev_remove() and a kernel event that is sent as a result of unbinding the device from the vfio_pci driver
(dpdk-devbind.py -u 0000:05:00.0).

rte_intr_callback_unregister() may fail and return -EAGAIN if the interrupt source (kernel) has active callbacks at that moment. As a result, the callback (req notifier) is never unregistered,
and vfio_req_intr_handle.fd is never closed.
The kernel keeps trying to notify user space through the req notifier, but because the device has already been removed, the handler cannot even find a bus for that device. The log below illustrates this:
EAL: fail to unregister req notifier handler.
EAL: fail to disable req notifier.
dpdk_disconnect 1545: Device '0000:05:00.0' has been removed and detached
dpdk_disconnect 1557: All devices shared with device '0000:05:00.0' have been detached
EAL: Cannot find bus for device (05:00.0)
EAL: Cannot find bus for device (05:00.0)
EAL: Cannot find bus for device (05:00.0)
EAL: Cannot find bus for device (05:00.0)
EAL: Cannot find bus for device (05:00.0)
etc.

This continues indefinitely, and the application stops working properly.
So, at the very least, retry logic should be added somewhere to avoid this kind of race. Alternatively, bus->hot_unplug_handler(dev), called from pci_vfio_req_handler(), should do some work to release the above resources.

Regarding the tight polling loop in the patch and fixed retry logic to avoid infinite looping: what could be an option? The loop continues only in the -EAGAIN case, which means a kernel event is being processed. Doesn't that guarantee that it won't last forever?

Kind regards,
Renata
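
The failure sequence described above can be modelled in a few lines. This is a simplified sketch under assumptions rather than the EAL code: struct model_source, model_unregister() and model_disable_notifier() are hypothetical names, the "active" flag stands for "the interrupt source is running callbacks right now", and the -ENOENT return for an unknown callback is an assumption of the model. It shows why a single unregister attempt that hits -EAGAIN leaves both the callback registered and the fd open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one interrupt source with a single callback. */
struct model_source {
	bool active;     /* callbacks are executing at this moment */
	bool registered; /* the req notifier callback is still registered */
	bool fd_open;    /* stands for vfio_req_intr_handle.fd */
};

/* One unregister attempt: refuse with -EAGAIN while callbacks run. */
int
model_unregister(struct model_source *s)
{
	assert(s != NULL);

	if (!s->registered)
		return -ENOENT;	/* assumption of this model */
	if (s->active)
		return -EAGAIN;

	s->registered = false;
	return 1;	/* number of callbacks removed */
}

/* Teardown without retry: the fd is closed only after a successful
 * unregister, so an -EAGAIN result leaves everything in place. */
int
model_disable_notifier(struct model_source *s)
{
	int ret = model_unregister(s);

	if (ret < 0)
		return ret;	/* callback still registered, fd still open */

	s->fd_open = false;
	return 0;
}
```

If model_disable_notifier() is called once while active is set, it returns -EAGAIN and nothing is released. Calling it again after the callback has finished (active cleared) unregisters the callback and closes the fd, which is the effect a retrying wrapper is meant to achieve.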
  
Burakov, Anatoly Feb. 11, 2021, 10:48 a.m. UTC | #5
On 17-Aug-20 3:08 PM, Renata Saiakhova wrote:
> Avoid race with unregister interrupt handler if interrupt
> source has some active callbacks at the moment, use wrapper
> around rte_intr_callback_unregister() to check for -EAGAIN
> return value and to loop until rte_intr_callback_unregister()
> succeeds.
> 
> Signed-off-by: Renata Saiakhova <Renata.Saiakhova@ekinops.com>
> ---

The subject line is too long, suggested rewording:

eal/interrupts: add synchronous wrapper around unregister

Otherwise, LGTM

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
  

Patch

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 07e072e13..a4bfdf553 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -415,7 +415,7 @@  pci_vfio_disable_notifier(struct rte_pci_device *dev)
 		return -1;
 	}
 
-	ret = rte_intr_callback_unregister(&dev->vfio_req_intr_handle,
+	ret = rte_intr_callback_unregister_sync(&dev->vfio_req_intr_handle,
 					   pci_vfio_req_handler,
 					   (void *)&dev->device);
 	if (ret < 0) {
diff --git a/lib/librte_eal/freebsd/eal_interrupts.c b/lib/librte_eal/freebsd/eal_interrupts.c
index 6d53d33c8..7d99bdaff 100644
--- a/lib/librte_eal/freebsd/eal_interrupts.c
+++ b/lib/librte_eal/freebsd/eal_interrupts.c
@@ -345,6 +345,18 @@  rte_intr_callback_unregister(const struct rte_intr_handle *intr_handle,
 	return ret;
 }
 
+int
+rte_intr_callback_unregister_sync(const struct rte_intr_handle *intr_handle,
+		rte_intr_callback_fn cb_fn, void *cb_arg)
+{
+	int ret = 0;
+
+	while ((ret = rte_intr_callback_unregister(intr_handle, cb_fn, cb_arg)) == -EAGAIN)
+		rte_pause();
+
+	return ret;
+}
+
 int
 rte_intr_enable(const struct rte_intr_handle *intr_handle)
 {
diff --git a/lib/librte_eal/include/rte_interrupts.h b/lib/librte_eal/include/rte_interrupts.h
index e3b406abc..cc3bf45d8 100644
--- a/lib/librte_eal/include/rte_interrupts.h
+++ b/lib/librte_eal/include/rte_interrupts.h
@@ -94,6 +94,31 @@  rte_intr_callback_unregister_pending(const struct rte_intr_handle *intr_handle,
 				rte_intr_callback_fn cb_fn, void *cb_arg,
 				rte_intr_unregister_callback_fn ucb_fn);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Loop until rte_intr_callback_unregister() succeeds.
+ * After a call to this function,
+ * the callback provided by the specified interrupt handle is unregistered.
+ *
+ * @param intr_handle
+ *  pointer to the interrupt handle.
+ * @param cb
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered which has the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+__rte_experimental
+int
+rte_intr_callback_unregister_sync(const struct rte_intr_handle *intr_handle,
+				rte_intr_callback_fn cb, void *cb_arg);
+
 /**
  * It enables the interrupt for the specified handle.
  *
diff --git a/lib/librte_eal/linux/eal_interrupts.c b/lib/librte_eal/linux/eal_interrupts.c
index 13db5c4e8..c99d5dbd4 100644
--- a/lib/librte_eal/linux/eal_interrupts.c
+++ b/lib/librte_eal/linux/eal_interrupts.c
@@ -662,6 +662,18 @@  rte_intr_callback_unregister(const struct rte_intr_handle *intr_handle,
 	return ret;
 }
 
+int
+rte_intr_callback_unregister_sync(const struct rte_intr_handle *intr_handle,
+			rte_intr_callback_fn cb_fn, void *cb_arg)
+{
+	int ret = 0;
+
+	while ((ret = rte_intr_callback_unregister(intr_handle, cb_fn, cb_arg)) == -EAGAIN)
+		rte_pause();
+
+	return ret;
+}
+
 int
 rte_intr_enable(const struct rte_intr_handle *intr_handle)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bf0c17c23..b1d824f59 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -325,6 +325,7 @@  EXPERIMENTAL {
 	rte_fbarray_find_rev_biggest_free;
 	rte_fbarray_find_rev_biggest_used;
 	rte_intr_callback_unregister_pending;
+	rte_intr_callback_unregister_sync;
 	rte_realloc_socket;
 
 	# added in 19.08