[v2] eal: fix positive error codes from probe/remove

Message ID 20190606100228.19959-1-i.maximets@samsung.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Series [v2] eal: fix positive error codes from probe/remove

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/mellanox-Performance-Testing  success  Performance Testing PASS
ci/intel-Performance-Testing     success  Performance Testing PASS
ci/Intel-compilation             fail     Compilation issues

Commit Message

Ilya Maximets June 6, 2019, 10:02 a.m. UTC
According to the API, 'rte_dev_probe()' and 'rte_dev_remove()' must
return 0 or a negative error code. Bus code returns positive values
if the device wasn't recognized by any driver, so the result of
'bus->plug/unplug()' must be converted. 'local_dev_probe()' and
'local_dev_remove()' also have their own internal API, so the conversion
should be done there.

A positive value on remove means that the device was not found by the driver.
A positive value on probe means that there are no suitable buses/drivers,
i.e. the device is not supported.
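The conversion described above can be sketched in plain C. This is a minimal sketch, not the EAL code: `normalize_plug_ret()` and `normalize_unplug_ret()` are hypothetical helper names, and the error codes chosen (`-ENOTSUP` for probe, `-ENOENT` for remove) are the ones the patch below uses in `local_dev_probe()` and `local_dev_remove()`.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helpers mirroring the conversion added in eal_common_dev.c.
 * Bus plug/unplug callbacks may return a positive value, but the public
 * API must only return 0 or a negative errno.
 */

/* Probe path: positive means no suitable bus/driver, i.e. unsupported. */
static int normalize_plug_ret(int ret)
{
	if (ret > 0)
		ret = -ENOTSUP;
	assert(ret <= 0); /* API contract: 0 or negative errno */
	return ret;
}

/* Remove path: positive means the device was not found by the driver. */
static int normalize_unplug_ret(int ret)
{
	if (ret > 0)
		ret = -ENOENT;
	assert(ret <= 0); /* API contract: 0 or negative errno */
	return ret;
}
```

Negative values pass through unchanged, so a real driver error is never masked by the conversion.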

Users of these APIs are fixed to provide a good example by respecting
the DPDK API. This will also allow such issues to be caught in the future.
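On the caller side, a sketch of the testpmd-style check the patch switches to. `attach_port_sketch()`, `probe_fn` and the two stub probe functions are hypothetical stand-ins for `rte_dev_probe()`; the `assert()` is only an illustration of how a caller could flag a positive return as an API violation.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a probe call such as rte_dev_probe(). */
typedef int (*probe_fn)(const char *identifier);

/* Returns 0 if the port was attached, -1 otherwise (testpmd-style). */
static int attach_port_sketch(probe_fn probe, const char *identifier)
{
	int ret = probe(identifier);

	/* The API returns 0 or a negative errno, so only < 0 is failure. */
	if (ret < 0) {
		fprintf(stderr, "Failed to attach port %s (err %d)\n",
			identifier, ret);
		return -1;
	}
	assert(ret == 0); /* a positive value would violate the API */
	return 0;
}

/* Hypothetical stubs used to exercise the check. */
static int probe_ok(const char *id) { (void)id; return 0; }
static int probe_unsupported(const char *id) { (void)id; return -ENOTSUP; }
```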

CC: stable@dpdk.org
Fixes: a3ee360f4440 ("eal: add hotplug add/remove device")
Fixes: 244d5130719c ("eal: enable hotplug on multi-process")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---

Version 2:

    * Fixed API callers.
    * Check for probe moved from 'rte_dev_probe' to 'local_dev_probe'.

 app/test-pmd/testpmd.c                 | 4 ++--
 drivers/net/failsafe/failsafe.c        | 2 +-
 drivers/net/failsafe/failsafe_eal.c    | 4 ++--
 drivers/net/failsafe/failsafe_ether.c  | 2 +-
 drivers/net/vdev_netvsc/vdev_netvsc.c  | 2 +-
 lib/librte_eal/common/eal_common_dev.c | 5 ++++-
 6 files changed, 11 insertions(+), 8 deletions(-)
  

Comments

David Marchand June 7, 2019, 8:32 a.m. UTC | #1
On Thu, Jun 6, 2019 at 12:03 PM Ilya Maximets <i.maximets@samsung.com>
wrote:

> According to API, 'rte_dev_probe()' and 'rte_dev_remove()' must
> return 0 or negative error code. Bus code returns positive values
> if device wasn't recognized by any driver, so the result of
> 'bus->plug/unplug()' must be converted. 'local_dev_probe()' and
> 'local_dev_remove()' also has their internal API, so the conversion
> should be done there.
>
> Positive on remove means that device not found by driver.
>

For backports, it is safer to add the check on > 0.
The patch looks good to me.

Reviewed-by: David Marchand <david.marchand@redhat.com>


But I have some comments on the current state of the code.
After inspecting the EAL and the buses, I see that this problem is not
supposed to happen on the rte_dev_remove() path:
rte_dev_remove() only calls local_dev_remove() after checking
that the device is attached to a driver (see the check on
!rte_dev_is_probed()).
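The guard described above can be modeled as follows. This is a simplified sketch under stated assumptions: `struct mock_device`, `mock_bus_unplug()` and `mock_dev_remove()` are hypothetical, the `driver_attached` flag stands in for `rte_dev_is_probed()`, and returning `-ENOENT` for an unprobed device is an assumption rather than a quote of the EAL code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down device; not DPDK's struct rte_device. */
struct mock_device {
	const char *name;
	bool driver_attached; /* stands in for rte_dev_is_probed() */
};

static int unplug_calls; /* counts calls into the bus unplug stub */

/* Stub bus unplug: the guard guarantees it only sees attached devices. */
static int mock_bus_unplug(struct mock_device *dev)
{
	assert(dev->driver_attached);
	unplug_calls++;
	dev->driver_attached = false;
	return 0;
}

/* Sketch of the check done before local removal: the bus is never asked
 * to unplug a device without a driver, so it cannot answer "not found". */
static int mock_dev_remove(struct mock_device *dev)
{
	if (dev == NULL)
		return -EINVAL;
	if (!dev->driver_attached)
		return -ENOENT; /* assumed error code */
	return mock_bus_unplug(dev);
}
```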


Anatoly,

- When handling a detach operation in the primary process
https://git.dpdk.org/dpdk/tree/lib/librte_eal/common/hotplug_mp.c#n124, we
signal all other secondary processes to detach right away.
Then we do a bus/device lookup.
Then we call the bus unplug.

Wouldn't it be better to check that the device exists _and_ that it is
attached to a driver in the primary process before asking the secondary
processes to detach?
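The ordering suggested above (validate in the primary, then notify the secondaries, then unplug) might look like this. Everything here is hypothetical: the device table, `find_device()`, `notify_secondaries_detach()`, `primary_detach()` and the error codes are illustrative only and do not reflect the actual hotplug_mp.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the primary's detach handler; not DPDK APIs. */
struct mock_device {
	const char *name;
	bool driver_attached;
};

static struct mock_device devices[] = {
	{ "0000:00:04.0", true },
	{ "net_null0", false },
};
static int secondaries_notified; /* counts detach broadcasts */

static struct mock_device *find_device(const char *name)
{
	for (size_t i = 0; i < sizeof(devices) / sizeof(devices[0]); i++)
		if (strcmp(devices[i].name, name) == 0)
			return &devices[i];
	return NULL;
}

/* Stub for the multi-process message asking secondaries to detach. */
static void notify_secondaries_detach(const char *name)
{
	assert(name != NULL);
	secondaries_notified++;
}

/* Validate first, broadcast second, then unplug locally. */
static int primary_detach(const char *name)
{
	struct mock_device *dev = find_device(name);

	if (dev == NULL)
		return -ENODEV; /* unknown device: nobody is notified */
	if (!dev->driver_attached)
		return -ENOENT; /* no driver: nothing to detach */
	notify_secondaries_detach(name);
	dev->driver_attached = false; /* stands in for the bus unplug */
	return 0;
}
```

With this ordering a bogus request fails in the primary without any secondary being touched.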


Thomas,

- Calling unplug on a device that is not attached seems a bit weird to me,
all the more so since we have rte_dev_is_probed().
But there might be users calling the bus unplug API directly rather than
the official API...
Does this fall within the ABI stability perimeter?
If not, I would be in favor of changing the unplug API so that we only deal
with 0 or < 0 on the remove path.

On the plug side, is there a reason why we do not check
rte_dev_is_probed() and instead let the bus reply that the device is
already probed?
Does it have something to do with representors?
Only guessing.

- On the plug side again, can't we have an indication from the buses that
they have a driver that can handle the device rather than this odd (and
historical) > 0 return code?
This should not change the current behavior, just make the code a bit
easier to understand.
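One possible shape for such an indication, sketched as an explicit result type instead of an overloaded positive integer. `enum plug_status`, `struct plug_result` and `plug_result_to_api()` are hypothetical and not part of DPDK; mapping "no driver" to `-ENOTSUP` follows the conversion this patch adds on the probe path.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result type for a bus plug callback; not a DPDK type. */
enum plug_status {
	PLUG_ATTACHED,  /* a driver took the device */
	PLUG_NO_DRIVER, /* no driver matched (today's "> 0") */
	PLUG_FAILED,    /* a driver matched but failed; see err */
};

struct plug_result {
	enum plug_status status;
	int err; /* negative errno when status == PLUG_FAILED, else 0 */
};

/* Map the explicit result onto the 0-or-negative public API. */
static int plug_result_to_api(struct plug_result r)
{
	switch (r.status) {
	case PLUG_ATTACHED:
		return 0;
	case PLUG_NO_DRIVER:
		return -ENOTSUP;
	case PLUG_FAILED:
		assert(r.err < 0);
		return r.err;
	}
	return -EINVAL; /* unreachable for valid enum values */
}
```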

I know you are travelling, so this can wait anyway.
  
Ilya Maximets June 17, 2019, 10:54 a.m. UTC | #2
On 06.06.2019 13:02, Ilya Maximets wrote:
> According to API, 'rte_dev_probe()' and 'rte_dev_remove()' must
> return 0 or negative error code. Bus code returns positive values
> if device wasn't recognized by any driver, so the result of
> 'bus->plug/unplug()' must be converted. 'local_dev_probe()' and
> 'local_dev_remove()' also has their internal API, so the conversion
> should be done there.
> 
> Positive on remove means that device not found by driver.
> Positive on probe means that there are no suitable buses/drivers,
> i.e. device is not supported.
> 
> Users of these API fixed to provide a good example by respecting
> DPDK API. This also will allow to catch such issues in the future.
> 
> CC: stable@dpdk.org
> Fixes: a3ee360f4440 ("eal: add hotplug add/remove device")
> Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
> 
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
> 
> Version 2:
> 
>     * Fixed API callers.
>     * Check for probe moved from 'rte_dev_probe' to 'local_dev_probe'.
> 
>  app/test-pmd/testpmd.c                 | 4 ++--
>  drivers/net/failsafe/failsafe.c        | 2 +-
>  drivers/net/failsafe/failsafe_eal.c    | 4 ++--
>  drivers/net/failsafe/failsafe_ether.c  | 2 +-
>  drivers/net/vdev_netvsc/vdev_netvsc.c  | 2 +-
>  lib/librte_eal/common/eal_common_dev.c | 5 ++++-
>  6 files changed, 11 insertions(+), 8 deletions(-)


Any more thoughts on this patch? Or can it be merged?

Best regards, Ilya Maximets.
  
Thomas Monjalon June 26, 2019, 9:03 p.m. UTC | #3
07/06/2019 10:32, David Marchand:
> On Thu, Jun 6, 2019 at 12:03 PM Ilya Maximets <i.maximets@samsung.com>
> wrote:
> 
> > According to API, 'rte_dev_probe()' and 'rte_dev_remove()' must
> > return 0 or negative error code. Bus code returns positive values
> > if device wasn't recognized by any driver, so the result of
> > 'bus->plug/unplug()' must be converted. 'local_dev_probe()' and
> > 'local_dev_remove()' also has their internal API, so the conversion
> > should be done there.
> >
> > Positive on remove means that device not found by driver.
> >
> 
> For backports, it is safer to add the check on > 0.
> The patch looks good to me.
> 
> Reviewed-by: David Marchand <david.marchand@redhat.com>

I did not understand your comment. Is this v2 OK to merge?
What do you mean about backports?
  
Thomas Monjalon June 26, 2019, 9:03 p.m. UTC | #4
07/06/2019 10:32, David Marchand:
> Thomas,
> 
> - Calling unplug on a device that is not attached is a bit weird to me, all
> the more so that we have rte_dev_probed().
> But there might be users calling directly the bus unplug api and not the
> official api...
> Does this enter the ABI stability perimeter?
> If not, I would be for changing unplug api so that we only deal with 0 or <
> 0 on remove path.

Where is the positive value documented?
If it is only an undocumented usage, I tend to think it can be changed.

> On the plug side, is there a reason why we do not check for
> rte_dev_probed() and let the bus replies that the device is already probed?

A device can be probed again in order to discover new ports.

> Does it have something to do with representors ?
> Only guessing.

Yes, representors are a case of ports which can appear on a new probe.
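A simplified model of why a failed re-probe is not treated as fatal, mirroring the `if (ret && !rte_dev_is_probed(dev))` check in the patch. `struct mock_device`, `mock_plug()` and `probe_is_fatal()` are hypothetical; the sketch only decides whether an error should be reported and makes no claim about what the real `local_dev_probe()` returns otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model, not DPDK code: a device can be probed again to
 * discover new ports such as representors. */
struct mock_device {
	bool probed; /* stands in for rte_dev_is_probed() */
	int nb_ports;
};

/* Stub bus plug: on success, record the ports the driver found. */
static int mock_plug(struct mock_device *dev, int bus_ret, int ports_found)
{
	assert(ports_found >= 0);
	if (bus_ret == 0) {
		dev->probed = true;
		dev->nb_ports += ports_found;
	}
	return bus_ret;
}

/* Same shape as the check in local_dev_probe(): a plug error is only
 * fatal if the device has never been probed successfully. */
static bool probe_is_fatal(struct mock_device *dev, int bus_ret,
			   int ports_found)
{
	int ret = mock_plug(dev, bus_ret, ports_found);

	if (ret > 0)
		ret = -ENOTSUP;
	return ret != 0 && !dev->probed;
}
```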

> - On the plug side again, can't we have an indication from the buses that
> they have a driver that can handle the device rather than this odd (and
> historical) > 0 return code?
> This should not change the current behavior, just make the code a bit
> easier to understand.

The positive code is also used for the white/blacklist.
And I think that, in the general case, we may need to try probing in order
to give a final answer.
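A rough illustration of how a bus can use the positive code both for "no matching driver" and for "device excluded by the white/blacklist", which is why the value carries more than one meaning. The driver table, `probe_one_driver()` and `probe_device()` are hypothetical and do not reproduce a real bus driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver descriptor matching a single device ID. */
struct mock_driver {
	const char *name;
	const char *supported_id;
};

/* Returns 0 if probed, 1 if the driver does not match or the device is
 * excluded by the user's list, negative on a real driver error. */
static int probe_one_driver(const struct mock_driver *drv,
			    const char *dev_id, bool blocked)
{
	assert(drv != NULL && dev_id != NULL);
	if (strcmp(drv->supported_id, dev_id) != 0)
		return 1; /* not this driver's device */
	if (blocked)
		return 1; /* excluded by the blacklist: also "skipped" */
	return 0;
}

/* Try every driver; a positive result means nobody took the device. */
static int probe_device(const struct mock_driver *drvs, size_t n,
			const char *dev_id, bool blocked)
{
	for (size_t i = 0; i < n; i++) {
		int ret = probe_one_driver(&drvs[i], dev_id, blocked);

		if (ret <= 0)
			return ret; /* probed, or a real error */
	}
	return 1;
}
```

The caller cannot tell "unsupported" from "excluded" by the value alone, which is the ambiguity discussed above.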
  
David Marchand June 27, 2019, 7:37 a.m. UTC | #5
On Wed, Jun 26, 2019 at 11:03 PM Thomas Monjalon <thomas@monjalon.net>
wrote:

> 07/06/2019 10:32, David Marchand:
> > On Thu, Jun 6, 2019 at 12:03 PM Ilya Maximets <i.maximets@samsung.com>
> > wrote:
> >
> > > According to API, 'rte_dev_probe()' and 'rte_dev_remove()' must
> > > return 0 or negative error code. Bus code returns positive values
> > > if device wasn't recognized by any driver, so the result of
> > > 'bus->plug/unplug()' must be converted. 'local_dev_probe()' and
> > > 'local_dev_remove()' also has their internal API, so the conversion
> > > should be done there.
> > >
> > > Positive on remove means that device not found by driver.
> > >
> >
> > For backports, it is safer to add the check on > 0.
> > The patch looks good to me.
> >
> > Reviewed-by: David Marchand <david.marchand@redhat.com>
>
> I did not get your comment. Is it OK to get this v2?
> What do you mean about backports?
>
>
Yes, this v2 is OK.
I wanted to keep it separate from my other comments, which would not be
part of the fix for stable.
  
Thomas Monjalon June 29, 2019, 7:30 p.m. UTC | #6
07/06/2019 10:32, David Marchand:
> On Thu, Jun 6, 2019 at 12:03 PM Ilya Maximets <i.maximets@samsung.com>
> wrote:
> 
> > According to API, 'rte_dev_probe()' and 'rte_dev_remove()' must
> > return 0 or negative error code. Bus code returns positive values
> > if device wasn't recognized by any driver, so the result of
> > 'bus->plug/unplug()' must be converted. 'local_dev_probe()' and
> > 'local_dev_remove()' also has their internal API, so the conversion
> > should be done there.
> >
> > Positive on remove means that device not found by driver.
> >
> 
> For backports, it is safer to add the check on > 0.
> The patch looks good to me.
> 
> Reviewed-by: David Marchand <david.marchand@redhat.com>

Applied, thanks
  

Patch

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4f2a431e4..52244b442 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2361,7 +2361,7 @@  attach_port(char *identifier)
 		return;
 	}
 
-	if (rte_dev_probe(identifier) != 0) {
+	if (rte_dev_probe(identifier) < 0) {
 		TESTPMD_LOG(ERR, "Failed to attach port %s\n", identifier);
 		return;
 	}
@@ -2431,7 +2431,7 @@  detach_port_device(portid_t port_id)
 			port_flow_flush(port_id);
 	}
 
-	if (rte_dev_remove(dev) != 0) {
+	if (rte_dev_remove(dev) < 0) {
 		TESTPMD_LOG(ERR, "Failed to detach device %s\n", dev->name);
 		return;
 	}
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
index e91c274d8..19dd71d4e 100644
--- a/drivers/net/failsafe/failsafe.c
+++ b/drivers/net/failsafe/failsafe.c
@@ -374,7 +374,7 @@  rte_pmd_failsafe_probe(struct rte_vdev_device *vdev)
 			}
 			if (!devargs_already_listed(&devargs)) {
 				ret = rte_dev_probe(devargs.name);
-				if (ret != 0) {
+				if (ret < 0) {
 					ERROR("Failed to probe devargs %s",
 					      devargs.name);
 					continue;
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
index 820a915f7..b9fc50867 100644
--- a/drivers/net/failsafe/failsafe_eal.c
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -48,7 +48,7 @@  fs_bus_init(struct rte_eth_dev *dev)
 			ret = rte_eal_hotplug_add(da->bus->name,
 						  da->name,
 						  da->args);
-			if (ret) {
+			if (ret < 0) {
 				ERROR("sub_device %d probe failed %s%s%s", i,
 				      rte_errno ? "(" : "",
 				      rte_errno ? strerror(rte_errno) : "",
@@ -147,7 +147,7 @@  fs_bus_uninit(struct rte_eth_dev *dev)
 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_PROBED) {
 		sdev_ret = rte_dev_remove(sdev->dev);
-		if (sdev_ret) {
+		if (sdev_ret < 0) {
 			ERROR("Failed to remove requested device %s (err: %d)",
 			      sdev->dev->name, sdev_ret);
 			continue;
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 4746fad36..504c76edb 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -284,7 +284,7 @@  fs_dev_remove(struct sub_device *sdev)
 		/* fallthrough */
 	case DEV_PROBED:
 		ret = rte_dev_remove(sdev->dev);
-		if (ret) {
+		if (ret < 0) {
 			ERROR("Bus detach failed for sub_device %u",
 			      SUB_ID(sdev));
 		} else {
diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c
index edab63e3a..1fcf90d7b 100644
--- a/drivers/net/vdev_netvsc/vdev_netvsc.c
+++ b/drivers/net/vdev_netvsc/vdev_netvsc.c
@@ -633,7 +633,7 @@  vdev_netvsc_netvsc_probe(const struct if_nameindex *iface,
 		ctx->devname, ctx->devargs);
 	vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, 0, ctx);
 	ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs);
-	if (ret)
+	if (ret < 0)
 		goto error;
 	LIST_INSERT_HEAD(&vdev_netvsc_ctx_list, ctx, entry);
 	++vdev_netvsc_ctx_count;
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 824b8f926..f8f2a94b3 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -172,6 +172,9 @@  local_dev_probe(const char *devargs, struct rte_device **new_dev)
 	 */
 
 	ret = dev->bus->plug(dev);
+	if (ret > 0)
+		ret = -ENOTSUP;
+
 	if (ret && !rte_dev_is_probed(dev)) { /* if hasn't ever succeeded */
 		RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
 			dev->name);
@@ -319,7 +322,7 @@  local_dev_remove(struct rte_device *dev)
 	if (ret) {
 		RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n",
 			dev->name);
-		return ret;
+		return (ret < 0) ? ret : -ENOENT;
 	}
 
 	return 0;