[dpdk-dev] eal: bus scan and probe never fail

Message ID 20170812102220.27773-1-shreyansh.jain@nxp.com (mailing list archive)
State Accepted, archived
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Shreyansh Jain Aug. 12, 2017, 10:22 a.m. UTC
  Bus scan is responsible for finding devices over *all* buses.
Some of these buses might not be able to scan but that should
not prevent other buses to be scanned.

Same is the case for probing. It is possible that some devices which
were scanned didn't have a specific driver. That should not prevent
other buses from being probed.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

---
Until now, this decision was left onto author of bus specific scan and
probe function. But, that is incorrect.
---
 lib/librte_eal/common/eal_common_bus.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)
  

Comments

Hemant Agrawal Sept. 18, 2017, 11:36 a.m. UTC | #1
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>

On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
> Bus scan is responsible for finding devices over *all* buses.
> Some of these buses might not be able to scan but that should
> not prevent other buses to be scanned.
>
> Same is the case for probing. It is possible that some devices which
> were scanned didn't have a specific driver. That should not prevent
> other buses from being probed.
>
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>
> ---
> Until now, this decision was left onto author of bus specific scan and
> probe function. But, that is incorrect.
> ---
>  lib/librte_eal/common/eal_common_bus.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> index 08bec2d..58e1084 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -73,11 +73,9 @@ rte_bus_scan(void)
>
>  	TAILQ_FOREACH(bus, &rte_bus_list, next) {
>  		ret = bus->scan();
> -		if (ret) {
> +		if (ret)
>  			RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
>  				bus->name);
> -			return ret;
> -		}
>  	}
>
>  	return 0;
> @@ -97,20 +95,16 @@ rte_bus_probe(void)
>  		}
>
>  		ret = bus->probe();
> -		if (ret) {
> +		if (ret)
>  			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>  				bus->name);
> -			return ret;
> -		}
>  	}
>
>  	if (vbus) {
>  		ret = vbus->probe();
> -		if (ret) {
> +		if (ret)
>  			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>  				vbus->name);
> -			return ret;
> -		}
>  	}
>
>  	return 0;
>
  
Jan Blunck Sept. 19, 2017, 6:51 p.m. UTC | #2
On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>
>
> On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
>>
>> Bus scan is responsible for finding devices over *all* buses.
>> Some of these buses might not be able to scan but that should
>> not prevent other buses to be scanned.
>>

If scanning the bus fails this is signaling an error. In that case we
might even want to unregister the bus.

>> Same is the case for probing. It is possible that some devices which
>> were scanned didn't have a specific driver. That should not prevent
>> other buses from being probed.

Absolutely correct.

>>
>> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>
>> ---
>> Until now, this decision was left onto author of bus specific scan and
>> probe function. But, that is incorrect.
>> ---
>>  lib/librte_eal/common/eal_common_bus.c | 12 +++---------
>>  1 file changed, 3 insertions(+), 9 deletions(-)
>>
>> diff --git a/lib/librte_eal/common/eal_common_bus.c
>> b/lib/librte_eal/common/eal_common_bus.c
>> index 08bec2d..58e1084 100644
>> --- a/lib/librte_eal/common/eal_common_bus.c
>> +++ b/lib/librte_eal/common/eal_common_bus.c
>> @@ -73,11 +73,9 @@ rte_bus_scan(void)
>>
>>         TAILQ_FOREACH(bus, &rte_bus_list, next) {
>>                 ret = bus->scan();
>> -               if (ret) {
>> +               if (ret)
>>                         RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
>>                                 bus->name);
>> -                       return ret;
>> -               }
>>         }
>>
>>         return 0;
>> @@ -97,20 +95,16 @@ rte_bus_probe(void)
>>                 }
>>
>>                 ret = bus->probe();
>> -               if (ret) {
>> +               if (ret)
>>                         RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>>                                 bus->name);
>> -                       return ret;
>> -               }
>>         }
>>
>>         if (vbus) {
>>                 ret = vbus->probe();
>> -               if (ret) {
>> +               if (ret)
>>                         RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>>                                 vbus->name);
>> -                       return ret;
>> -               }
>>         }
>>
>>         return 0;
>>
>
  
Thomas Monjalon Oct. 5, 2017, 11:21 p.m. UTC | #3
19/09/2017 20:51, Jan Blunck:
> On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
> > Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> >
> >
> > On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
> >>
> >> Bus scan is responsible for finding devices over *all* buses.
> >> Some of these buses might not be able to scan but that should
> >> not prevent other buses to be scanned.
> >>
> 
> If scanning the bus fails this is signaling an error. In that case we
> might even want to unregister the bus.

A scan error seems important enough to be reported to the caller.
OK to continue scanning other buses, but an error code should be returned.

> >> Same is the case for probing. It is possible that some devices which
> >> were scanned didn't have a specific driver. That should not prevent
> >> other buses from being probed.
> 
> Absolutely correct.

Yes
When we will have a probe notification, we will be able
to notify the upper layer that a device probing has failed.
  
Shreyansh Jain Oct. 6, 2017, 1:12 p.m. UTC | #4
On Friday 06 October 2017 04:51 AM, Thomas Monjalon wrote:
> 19/09/2017 20:51, Jan Blunck:
>> On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
>>> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>>>
>>>
>>> On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
>>>>
>>>> Bus scan is responsible for finding devices over *all* buses.
>>>> Some of these buses might not be able to scan but that should
>>>> not prevent other buses to be scanned.
>>>>
>>
>> If scanning the bus fails this is signaling an error. In that case we
>> might even want to unregister the bus.
> 
> A scan error seems important enough to be reported to the caller.
> OK to continue scanning other buses, but an error code should be returned.

Isn't that counter intuitive if the scanning continues after error and 
an error is expected to be returned from it?
What if there are more than one error? Which one is reported.

As for cleanup, bus un-registration is not correct. Scan has failed, 
which might mean some assumption that bus took for scanning for devices 
doesn't exist for time being or present platform. Either way, I think 
whatever rollback needs to be done for scan failure, would be done by 
the bus->scan() implementation.

Let me know what you think - I will make changes to the patch and push 
again.

> 
>>>> Same is the case for probing. It is possible that some devices which
>>>> were scanned didn't have a specific driver. That should not prevent
>>>> other buses from being probed.
>>
>> Absolutely correct.
> 
> Yes
> When we will have a probe notification, we will be able
> to notify the upper layer that a device probing has failed.
  
Thomas Monjalon Oct. 6, 2017, 1:37 p.m. UTC | #5
06/10/2017 15:12, Shreyansh Jain:
> On Friday 06 October 2017 04:51 AM, Thomas Monjalon wrote:
> > 19/09/2017 20:51, Jan Blunck:
> >> On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
> >>> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> >>>
> >>>
> >>> On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
> >>>>
> >>>> Bus scan is responsible for finding devices over *all* buses.
> >>>> Some of these buses might not be able to scan but that should
> >>>> not prevent other buses to be scanned.
> >>>>
> >>
> >> If scanning the bus fails this is signaling an error. In that case we
> >> might even want to unregister the bus.
> > 
> > A scan error seems important enough to be reported to the caller.
> > OK to continue scanning other buses, but an error code should be returned.
> 
> Isn't that counter intuitive if the scanning continues after error and 
> an error is expected to be returned from it?
> What if there are more than one error? Which one is reported.

Both are reported with the same code.
Anyway, there is no way to know which bus is failing,
except from log.

> As for cleanup, bus un-registration is not correct. Scan has failed, 
> which might mean some assumption that bus took for scanning for devices 
> doesn't exist for time being or present platform. Either way, I think 
> whatever rollback needs to be done for scan failure, would be done by 
> the bus->scan() implementation.
> 
> Let me know what you think - I will make changes to the patch and push 
> again.

We may need more opinion here.

Mine is that we should not hide a scan failure.
I would return an error code if any of the scan has failed,
but would process every scans.
  
Jan Blunck Oct. 6, 2017, 5:34 p.m. UTC | #6
On Fri, Oct 6, 2017 at 3:37 PM, Thomas Monjalon <thomas@monjalon.net> wrote:
> 06/10/2017 15:12, Shreyansh Jain:
>> On Friday 06 October 2017 04:51 AM, Thomas Monjalon wrote:
>> > 19/09/2017 20:51, Jan Blunck:
>> >> On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
>> >>> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>> >>>
>> >>>
>> >>> On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
>> >>>>
>> >>>> Bus scan is responsible for finding devices over *all* buses.
>> >>>> Some of these buses might not be able to scan but that should
>> >>>> not prevent other buses to be scanned.
>> >>>>
>> >>
>> >> If scanning the bus fails this is signaling an error. In that case we
>> >> might even want to unregister the bus.
>> >
>> > A scan error seems important enough to be reported to the caller.
>> > OK to continue scanning other buses, but an error code should be returned.
>>
>> Isn't that counter intuitive if the scanning continues after error and
>> an error is expected to be returned from it?
>> What if there are more than one error? Which one is reported.
>
> Both are reported with the same code.
> Anyway, there is no way to know which bus is failing,
> except from log.
>

Correct. Also there is no way to handle that failure except for
reporting it to the log in all detail.


>> As for cleanup, bus un-registration is not correct. Scan has failed,
>> which might mean some assumption that bus took for scanning for devices
>> doesn't exist for time being or present platform. Either way, I think
>> whatever rollback needs to be done for scan failure, would be done by
>> the bus->scan() implementation.
>>
>> Let me know what you think - I will make changes to the patch and push
>> again.
>
> We may need more opinion here.
>
> Mine is that we should not hide a scan failure.

Hide scan failures? Do you mean hiding it from the log? I wouldn't do that.

> I would return an error code if any of the scan has failed,
> but would process every scans.

FWIW I agree.
  
Shreyansh Jain Oct. 9, 2017, 11:10 a.m. UTC | #7
On Friday 06 October 2017 11:04 PM, Jan Blunck wrote:
> On Fri, Oct 6, 2017 at 3:37 PM, Thomas Monjalon <thomas@monjalon.net> wrote:
>> 06/10/2017 15:12, Shreyansh Jain:
>>> On Friday 06 October 2017 04:51 AM, Thomas Monjalon wrote:
>>>> 19/09/2017 20:51, Jan Blunck:
>>>>> On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
>>>>>> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>>>>>>
>>>>>>
>>>>>> On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
>>>>>>>
>>>>>>> Bus scan is responsible for finding devices over *all* buses.
>>>>>>> Some of these buses might not be able to scan but that should
>>>>>>> not prevent other buses to be scanned.
>>>>>>>
>>>>>
>>>>> If scanning the bus fails this is signaling an error. In that case we
>>>>> might even want to unregister the bus.
>>>>
>>>> A scan error seems important enough to be reported to the caller.
>>>> OK to continue scanning other buses, but an error code should be returned.
>>>
>>> Isn't that counter intuitive if the scanning continues after error and
>>> an error is expected to be returned from it?
>>> What if there are more than one error? Which one is reported.
>>
>> Both are reported with the same code.
>> Anyway, there is no way to know which bus is failing,
>> except from log.
>>
> 
> Correct. Also there is no way to handle that failure except for
> reporting it to the log in all detail.

Even now both, scan and probe, are reporting error to EAL if scan or 
probe fail. This is what you are suggesting, isn't it?

> 
> 
>>> As for cleanup, bus un-registration is not correct. Scan has failed,
>>> which might mean some assumption that bus took for scanning for devices
>>> doesn't exist for time being or present platform. Either way, I think
>>> whatever rollback needs to be done for scan failure, would be done by
>>> the bus->scan() implementation.
>>>
>>> Let me know what you think - I will make changes to the patch and push
>>> again.
>>
>> We may need more opinion here.
>>
>> Mine is that we should not hide a scan failure.
> 
> Hide scan failures? Do you mean hiding it from the log? I wouldn't do that.

I think Thomas was of the opinion to *not* hide scan failure.
Reporting through logs works fine here, I guess.

> 
>> I would return an error code if any of the scan has failed,
>> but would process every scans.
> 
> FWIW I agree.
> 

This is where I have disagreement/doubt.
Reporting error code from rte_bus_scan would do two things:

1. rte_eal_init is not designed to ignore/log-only these errors - it 
would quit initialization. (But, this can be changed)
2. What should rte_eal_init do with this error? rte_bus_scan would have 
already printed the problematic bus->scan() failure.

Also, does it make sense to report error from rte_bus_scan() to 
rte_eal_init() when no buses are identified? Currently that is not 
happening.

-
Shreyansh
  
don provan Oct. 9, 2017, 6:21 p.m. UTC | #8
> -----Original Message-----

> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]

> Sent: Monday, October 09, 2017 4:10 AM

> To: Jan Blunck <jblunck@infradead.org>; Thomas Monjalon

> <thomas@monjalon.net>

> Cc: dev <dev@dpdk.org>; Hemant Agrawal <hemant.agrawal@nxp.com>

> Subject: Re: [dpdk-dev] [PATCH] eal: bus scan and probe never fail

> 

>...

> This is where I have disagreement/doubt.

> Reporting error code from rte_bus_scan would do two things:

> 

> 1. rte_eal_init is not designed to ignore/log-only these errors - it

> would quit initialization. (But, this can be changed)

> 2. What should rte_eal_init do with this error? rte_bus_scan would have

> already printed the problematic bus->scan() failure.


These practical problems confirm to me that the failure of a bus
scan is more of a strategic issue: when asking "which devices can
I use?", "none" is a perfectly valid answer that does not seem
like an error to me even when a failed bus scan is the reason for
that answer.

From the application's point of view, the potential error here
is that the device it wants to use isn't available. I don't see that
either the init function or the probe function will have enough
information to understand that application-level problem, so
they should leave it to the application to detect it.

-don provan
dprovan@bivio.net
  
Thomas Monjalon Oct. 9, 2017, 7:34 p.m. UTC | #9
09/10/2017 20:21, Don Provan:
> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]
> >...
> > This is where I have disagreement/doubt.
> > Reporting error code from rte_bus_scan would do two things:
> > 
> > 1. rte_eal_init is not designed to ignore/log-only these errors - it
> > would quit initialization. (But, this can be changed)
> > 2. What should rte_eal_init do with this error? rte_bus_scan would have
> > already printed the problematic bus->scan() failure.
> 
> These practical problems confirm to me that the failure of a bus
> scan is more of a strategic issue: when asking "which devices can
> I use?", "none" is a perfectly valid answer that does not seem
> like an error to me even when a failed bus scan is the reason for
> that answer.
> 
> From the application's point of view, the potential error here
> is that the device it wants to use isn't available. I don't see that
> either the init function or the probe function will have enough
> information to understand that application-level problem, so
> they should leave it to the application to detect it.

Thank you Don. I think you convinced me.
  
Shreyansh Jain Oct. 10, 2017, 5 a.m. UTC | #10
Hello Don,

On Monday 09 October 2017 11:51 PM, Don Provan wrote:
>> -----Original Message-----
>> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]
>> Sent: Monday, October 09, 2017 4:10 AM
>> To: Jan Blunck <jblunck@infradead.org>; Thomas Monjalon
>> <thomas@monjalon.net>
>> Cc: dev <dev@dpdk.org>; Hemant Agrawal <hemant.agrawal@nxp.com>
>> Subject: Re: [dpdk-dev] [PATCH] eal: bus scan and probe never fail
>>
>> ...
>> This is where I have disagreement/doubt.
>> Reporting error code from rte_bus_scan would do two things:
>>
>> 1. rte_eal_init is not designed to ignore/log-only these errors - it
>> would quit initialization. (But, this can be changed)
>> 2. What should rte_eal_init do with this error? rte_bus_scan would have
>> already printed the problematic bus->scan() failure.
> 
> These practical problems confirm to me that the failure of a bus
> scan is more of a strategic issue: when asking "which devices can
> I use?", "none" is a perfectly valid answer that does not seem
> like an error to me even when a failed bus scan is the reason for
> that answer.

I agree with this.

> 
>  From the application's point of view, the potential error here
> is that the device it wants to use isn't available. I don't see that
> either the init function or the probe function will have enough
> information to understand that application-level problem, so
> they should leave it to the application to detect it.

I think I understand you comment but just want to cross check again:
Scan or probe error should simply be ignored by EAL layer and let the 
application take stance when it detects that the device it was looking 
for is missing. Is my understanding correct?

I am trying to come a conclusion so that this patch can either be 
modified or pushed as it is. If the above understanding is correct, I 
don't see any changes required in the patch.

> 
> -don provan
> dprovan@bivio.net
>
  
don provan Oct. 11, 2017, 12:03 a.m. UTC | #11
> -----Original Message-----

> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]

> Sent: Monday, October 09, 2017 10:01 PM

> To: Don Provan <dprovan@bivio.net>; Jan Blunck <jblunck@infradead.org>;

> Thomas Monjalon <thomas@monjalon.net>

> Cc: dev <dev@dpdk.org>; Hemant Agrawal <hemant.agrawal@nxp.com>

> Subject: Re: [dpdk-dev] [PATCH] eal: bus scan and probe never fail

> 

> ...

>

> >  From the application's point of view, the potential error here

> > is that the device it wants to use isn't available. I don't see that

> > either the init function or the probe function will have enough

> > information to understand that application-level problem, so

> > they should leave it to the application to detect it.

> 

> I think I understand you comment but just want to cross check again:

> Scan or probe error should simply be ignored by EAL layer and let the

> application take stance when it detects that the device it was looking

> for is missing. Is my understanding correct?

> 

> I am trying to come a conclusion so that this patch can either be

> modified or pushed as it is. If the above understanding is correct, I

> don't see any changes required in the patch.


Yes, I agree my comments support the patch as is.
-don
  
Thomas Monjalon Oct. 11, 2017, 10:32 p.m. UTC | #12
11/10/2017 02:03, Don Provan:
> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]
> > 
> > ...
> >
> > >  From the application's point of view, the potential error here
> > > is that the device it wants to use isn't available. I don't see that
> > > either the init function or the probe function will have enough
> > > information to understand that application-level problem, so
> > > they should leave it to the application to detect it.
> > 
> > I think I understand you comment but just want to cross check again:
> > Scan or probe error should simply be ignored by EAL layer and let the
> > application take stance when it detects that the device it was looking
> > for is missing. Is my understanding correct?
> > 
> > I am trying to come a conclusion so that this patch can either be
> > modified or pushed as it is. If the above understanding is correct, I
> > don't see any changes required in the patch.
> 
> Yes, I agree my comments support the patch as is.
> -don

Applied, thanks for the discussion
  

Patch

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 08bec2d..58e1084 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -73,11 +73,9 @@  rte_bus_scan(void)
 
 	TAILQ_FOREACH(bus, &rte_bus_list, next) {
 		ret = bus->scan();
-		if (ret) {
+		if (ret)
 			RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
 				bus->name);
-			return ret;
-		}
 	}
 
 	return 0;
@@ -97,20 +95,16 @@  rte_bus_probe(void)
 		}
 
 		ret = bus->probe();
-		if (ret) {
+		if (ret)
 			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
 				bus->name);
-			return ret;
-		}
 	}
 
 	if (vbus) {
 		ret = vbus->probe();
-		if (ret) {
+		if (ret)
 			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
 				vbus->name);
-			return ret;
-		}
 	}
 
 	return 0;