[v8,04/19] ethdev: introduce device lock

Message ID 20180702054450.29269-5-qi.z.zhang@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series enable hotplug on multi-process |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Qi Zhang July 2, 2018, 5:44 a.m. UTC
  Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let
application lock or unlock on specific ethdev, a locked device
can't be detached, this help applicaiton to prevent unexpected
device detaching, especially in multi-process envrionment.

Aslo introduce the new API rte_eth_dev_lock_with_callback and
rte_eth_dev_unlock_with callback to let application to register
a callback function which will be invoked before a device is going
to be detached, the return value of the function will decide if
device will continue be detached or not, this support application
to do condition check at runtime.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_ethdev/Makefile               |   1 +
 lib/librte_ethdev/ethdev_lock.c          | 140 +++++++++++++++++++++++++++++++
 lib/librte_ethdev/ethdev_lock.h          |  31 +++++++
 lib/librte_ethdev/ethdev_mp.c            |   3 +-
 lib/librte_ethdev/meson.build            |   1 +
 lib/librte_ethdev/rte_ethdev.c           |  60 ++++++++++++-
 lib/librte_ethdev/rte_ethdev.h           | 124 +++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev_version.map |   2 +
 8 files changed, 360 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_ethdev/ethdev_lock.c
 create mode 100644 lib/librte_ethdev/ethdev_lock.h
  

Comments

Thomas Monjalon July 3, 2018, 9:56 a.m. UTC | #1
02/07/2018 07:44, Qi Zhang:
> Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let
> application lock or unlock on specific ethdev, a locked device
> can't be detached, this help applicaiton to prevent unexpected
> device detaching, especially in multi-process envrionment.

Trying to understand: a process of an application could try to detach
a port while another process is against this decision.
Why an application needs to be protected against itself?
I guess it is only an application inter-process management.
If we really want to provide such helper in DPDK, it should not be
limited to ethdev.
(for info, see class implementation: https://patches.dpdk.org/patch/41605/)

What about hardware unplug?
Can we detach the locked ports associated to the unplugged device?

> Aslo introduce the new API rte_eth_dev_lock_with_callback and
> rte_eth_dev_unlock_with callback to let application to register
> a callback function which will be invoked before a device is going
> to be detached, the return value of the function will decide if
> device will continue be detached or not, this support application
> to do condition check at runtime.

You don't need 2 flavors for the lock.
We can have only the "_with_callback" flavour and provide a default
callback which says always no.
  
Qi Zhang July 3, 2018, 3:08 p.m. UTC | #2
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Tuesday, July 3, 2018 5:56 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Shelton,
> Benjamin H <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock
> 
> 02/07/2018 07:44, Qi Zhang:
> > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let
> > application lock or unlock on specific ethdev, a locked device can't
> > be detached, this help applicaiton to prevent unexpected device
> > detaching, especially in multi-process envrionment.
> 
> Trying to understand: a process of an application could try to detach a port
> while another process is against this decision.
> Why an application needs to be protected against itself?

I think we can regard this as a help function, it help application to simplified the situation when one process want to detach a device while another one is still using it.
Application can register a callback which can do to necessary clean up (like stop traffic, release memory ...) before device be detached.

> I guess it is only an application inter-process management.
> If we really want to provide such helper in DPDK, it should not be limited to
> ethdev.

Once we move to eal layer, we will have rte_eal_dev_lock/unlock(devname, busname).
But its also better we keep rte_eth_dev_lock/unlock to make ethdev API more completed (any port be locked means underline rte_device also be locked?)
and this is same for other device family.


> (for info, see class implementation: https://patches.dpdk.org/patch/41605/)
> 
> What about hardware unplug?
> Can we detach the locked ports associated to the unplugged device?

NO, we can't.
But do you think, we need to introduce a "force detach" version, which will ignore all locks.

> 
> > Aslo introduce the new API rte_eth_dev_lock_with_callback and
> > rte_eth_dev_unlock_with callback to let application to register a
> > callback function which will be invoked before a device is going to be
> > detached, the return value of the function will decide if device will
> > continue be detached or not, this support application to do condition
> > check at runtime.
> 
> You don't need 2 flavors for the lock.
> We can have only the "_with_callback" flavour and provide a default callback
> which says always no.

OK, I can rollback this.

Regards
Qi

>
  
Thomas Monjalon July 3, 2018, 10:13 p.m. UTC | #3
03/07/2018 17:08, Zhang, Qi Z:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 02/07/2018 07:44, Qi Zhang:
> > > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let
> > > application lock or unlock on specific ethdev, a locked device can't
> > > be detached, this help applicaiton to prevent unexpected device
> > > detaching, especially in multi-process envrionment.
> > 
> > Trying to understand: a process of an application could try to detach a port
> > while another process is against this decision.
> > Why an application needs to be protected against itself?
> 
> I think we can regard this as a help function, it help application to simplified the situation when one process want to detach a device while another one is still using it.
> Application can register a callback which can do to necessary clean up (like stop traffic, release memory ...) before device be detached.

Yes I agree such hook can be a good idea.


> > I guess it is only an application inter-process management.
> > If we really want to provide such helper in DPDK, it should not be limited to
> > ethdev.
> 
> Once we move to eal layer, we will have rte_eal_dev_lock/unlock(devname, busname).
> But its also better we keep rte_eth_dev_lock/unlock to make ethdev API more completed (any port be locked means underline rte_device also be locked?)
> and this is same for other device family.

No. Again, a port is not exactly a device.
There can be several ports per device.

I think the right place for this hook is in port-level API
(ethdev, crypto, etc). And as we improve only ethdev currently,
without any common genericity for other device classes,
it is probably fine in ethdev for now.
> 
> > (for info, see class implementation: https://patches.dpdk.org/patch/41605/)
> > 
> > What about hardware unplug?
> > Can we detach the locked ports associated to the unplugged device?
> 
> NO, we can't.
> But do you think, we need to introduce a "force detach" version, which will ignore all locks.

No, I don't think so.
I am just trying to show that you cannot really "lock" a port.
Maybe you should just rename those functions.
After all, it is just a pre-detach hook.
Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
Perhaps we just need to improve the handling of the DESTROY event?
  
Qi Zhang July 4, 2018, 1:47 a.m. UTC | #4
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Wednesday, July 4, 2018 6:14 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Shelton,
> Benjamin H <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock
> 
> 03/07/2018 17:08, Zhang, Qi Z:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > 02/07/2018 07:44, Qi Zhang:
> > > > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let
> > > > application lock or unlock on specific ethdev, a locked device
> > > > can't be detached, this help applicaiton to prevent unexpected
> > > > device detaching, especially in multi-process envrionment.
> > >
> > > Trying to understand: a process of an application could try to
> > > detach a port while another process is against this decision.
> > > Why an application needs to be protected against itself?
> >
> > I think we can regard this as a help function, it help application to simplified
> the situation when one process want to detach a device while another one is
> still using it.
> > Application can register a callback which can do to necessary clean up (like
> stop traffic, release memory ...) before device be detached.
> 
> Yes I agree such hook can be a good idea.
> 
> 
> > > I guess it is only an application inter-process management.
> > > If we really want to provide such helper in DPDK, it should not be
> > > limited to ethdev.
> >
> > Once we move to eal layer, we will have rte_eal_dev_lock/unlock(devname,
> busname).
> > But its also better we keep rte_eth_dev_lock/unlock to make ethdev API
> > more completed (any port be locked means underline rte_device also be
> locked?) and this is same for other device family.
> 
> No. Again, a port is not exactly a device.
> There can be several ports per device.

Yes, I know that.
what I mean is, we should assume lock any port of that rte_device will prevent the device be detached.

> 
> I think the right place for this hook is in port-level API (ethdev, crypto, etc). And
> as we improve only ethdev currently, without any common genericity for other
> device classes, it is probably fine in ethdev for now.
> >
> > > (for info, see class implementation:
> > > https://patches.dpdk.org/patch/41605/)
> > >
> > > What about hardware unplug?
> > > Can we detach the locked ports associated to the unplugged device?
> >
> > NO, we can't.
> > But do you think, we need to introduce a "force detach" version, which will
> ignore all locks.
> 
> No, I don't think so.
> I am just trying to show that you cannot really "lock" a port.
> Maybe you should just rename those functions.
> After all, it is just a pre-detach hook.

> Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
> Perhaps we just need to improve the handling of the DESTROY event?

I have thought about this before.
Not like RTE_ETH_EVENT_DESTROY and other event hook, the hook here need to give feedback, pass or fail will impact the following behavior, this make it special, so I separate it from all exist rte_eth_event_type handle mechanism. 

The alternative solution is 
we just introduce a new event type like RTE_ETH_EVENT_PRE_DETACH and reuse all exist API 
rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.

But in _rte_eth_dev_callback_process we need to add a code branch for PRE_DETACH handle.

If (event = RTE_ETH_EVENT_PRE_DETACH)
	<...>.
else {
	<....>
}

And we may also need to keep rte_eth_dev_lock/unlock which will register a default callback for PRE_DETACH.

What do you think about?








 



> 
>
  
Yuanhan Liu July 4, 2018, 2:19 a.m. UTC | #5
On Mon, Jul 2, 2018, at 1:44 PM, Qi Zhang wrote:
> Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let
> application lock or unlock on specific ethdev, a locked device
> can't be detached, this help applicaiton to prevent unexpected
> device detaching, especially in multi-process envrionment.

I'm just wondering why the need of introducing such 2 (very specific)
APIs, judging that it  looks like a ref count could solve this kind of issues
perfectly.

Also, the API name (_dev_lock/unlock) looks like an over killing, judging
that you just want to pin a device (if I understand this commit log correctly).
What firstly comes to my mind when I see "lock" was "no one can *access*
a device after another has grabbed the lock", which doesn't quite match
what you described here.

That said, if I were you, I would go with introducing few generic refcount
APIs.

        --yliu


> 
> Aslo introduce the new API rte_eth_dev_lock_with_callback and
> rte_eth_dev_unlock_with callback to let application to register
> a callback function which will be invoked before a device is going
> to be detached, the return value of the function will decide if
> device will continue be detached or not, this support application
> to do condition check at runtime.
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
  
Qi Zhang July 4, 2018, 3:24 a.m. UTC | #6
Hi Yuanhan:

> -----Original Message-----
> From: Yuanhan Liu [mailto:yliu@fridaylinux.org]
> Sent: Wednesday, July 4, 2018 10:20 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net; Burakov,
> Anatoly <anatoly.burakov@intel.com>
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Shelton, Benjamin H
> <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock
> 
> 
> 
> On Mon, Jul 2, 2018, at 1:44 PM, Qi Zhang wrote:
> > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let
> > application lock or unlock on specific ethdev, a locked device can't
> > be detached, this help applicaiton to prevent unexpected device
> > detaching, especially in multi-process envrionment.
> 
> I'm just wondering why the need of introducing such 2 (very specific) APIs,
> judging that it  looks like a ref count could solve this kind of issues perfectly.

Yes, if we separate the callback stuff, ref_count can solve the issue.
Actually this is my original implementation, but since we also need a callback for runtime check, 
so I just reuse it by registering a default callback.

But if we decide to replace lock_with_callback by adding a new PRE_DETACH event in rte_eth_dev_register_callback, 
We can take the ref_count way for simple device pin.

> 
> Also, the API name (_dev_lock/unlock) looks like an over killing, judging that
> you just want to pin a device (if I understand this commit log correctly).
> What firstly comes to my mind when I see "lock" was "no one can *access* a
> device after another has grabbed the lock", which doesn't quite match what
> you described here.

Yes, device is shared, any processes can access it, we are not going to introduce exclusive access.
But probably we should rename the API's name as rte_eth_dev_get/put? or rte_eth_dev_pin/unpin?

Thanks!
Qi

> 
> That said, if I were you, I would go with introducing few generic refcount APIs.
> 
>         --yliu
> 
> 
> >
> > Aslo introduce the new API rte_eth_dev_lock_with_callback and
> > rte_eth_dev_unlock_with callback to let application to register a
> > callback function which will be invoked before a device is going to be
> > detached, the return value of the function will decide if device will
> > continue be detached or not, this support application to do condition
> > check at runtime.
> >
> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
  
Thomas Monjalon July 4, 2018, 7:27 a.m. UTC | #7
04/07/2018 03:47, Zhang, Qi Z:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 03/07/2018 17:08, Zhang, Qi Z:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > 02/07/2018 07:44, Qi Zhang:
> > > > > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let
> > > > > application lock or unlock on specific ethdev, a locked device
> > > > > can't be detached, this help applicaiton to prevent unexpected
> > > > > device detaching, especially in multi-process envrionment.
> > > >
> > > > Trying to understand: a process of an application could try to
> > > > detach a port while another process is against this decision.
> > > > Why an application needs to be protected against itself?
> > >
> > > I think we can regard this as a help function, it help application to simplified
> > the situation when one process want to detach a device while another one is
> > still using it.
> > > Application can register a callback which can do to necessary clean up (like
> > stop traffic, release memory ...) before device be detached.
> > 
> > Yes I agree such hook can be a good idea.
> > 
> > 
> > > > I guess it is only an application inter-process management.
> > > > If we really want to provide such helper in DPDK, it should not be
> > > > limited to ethdev.
> > >
> > > Once we move to eal layer, we will have rte_eal_dev_lock/unlock(devname,
> > busname).
> > > But its also better we keep rte_eth_dev_lock/unlock to make ethdev API
> > > more completed (any port be locked means underline rte_device also be
> > locked?) and this is same for other device family.
> > 
> > No. Again, a port is not exactly a device.
> > There can be several ports per device.
> 
> Yes, I know that.
> what I mean is, we should assume lock any port of that rte_device will prevent the device be detached.
> 
> > 
> > I think the right place for this hook is in port-level API (ethdev, crypto, etc). And
> > as we improve only ethdev currently, without any common genericity for other
> > device classes, it is probably fine in ethdev for now.
> > >
> > > > (for info, see class implementation:
> > > > https://patches.dpdk.org/patch/41605/)
> > > >
> > > > What about hardware unplug?
> > > > Can we detach the locked ports associated to the unplugged device?
> > >
> > > NO, we can't.
> > > But do you think, we need to introduce a "force detach" version, which will
> > ignore all locks.
> > 
> > No, I don't think so.
> > I am just trying to show that you cannot really "lock" a port.
> > Maybe you should just rename those functions.
> > After all, it is just a pre-detach hook.
> 
> > Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
> > Perhaps we just need to improve the handling of the DESTROY event?
> 
> I have thought about this before.
> Not like RTE_ETH_EVENT_DESTROY and other event hook, the hook here need to give feedback, pass or fail will impact the following behavior, this make it special, so I separate it from all exist rte_eth_event_type handle mechanism. 

Look at _rte_eth_dev_callback_process, there is a "ret_param".

> The alternative solution is 
> we just introduce a new event type like RTE_ETH_EVENT_PRE_DETACH and reuse all exist API 
> rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.

I don't think we need a new event.
Let's try to use RTE_ETH_EVENT_DESTROY.

> But in _rte_eth_dev_callback_process we need to add a code branch for PRE_DETACH handle.
> 
> If (event = RTE_ETH_EVENT_PRE_DETACH)
> 	<...>.
> else {
> 	<....>
> }
> 
> And we may also need to keep rte_eth_dev_lock/unlock which will register a default callback for PRE_DETACH.

The default callback can be registered by the application.

> What do you think about?
  
Qi Zhang July 4, 2018, 10:49 a.m. UTC | #8
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Wednesday, July 4, 2018 3:27 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Shelton,
> Benjamin H <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock
> 
> 04/07/2018 03:47, Zhang, Qi Z:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > 03/07/2018 17:08, Zhang, Qi Z:
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > 02/07/2018 07:44, Qi Zhang:
> > > > > > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let
> > > > > > application lock or unlock on specific ethdev, a locked device
> > > > > > can't be detached, this help applicaiton to prevent unexpected
> > > > > > device detaching, especially in multi-process envrionment.
> > > > >
> > > > > Trying to understand: a process of an application could try to
> > > > > detach a port while another process is against this decision.
> > > > > Why an application needs to be protected against itself?
> > > >
> > > > I think we can regard this as a help function, it help application
> > > > to simplified
> > > the situation when one process want to detach a device while another
> > > one is still using it.
> > > > Application can register a callback which can do to necessary
> > > > clean up (like
> > > stop traffic, release memory ...) before device be detached.
> > >
> > > Yes I agree such hook can be a good idea.
> > >
> > >
> > > > > I guess it is only an application inter-process management.
> > > > > If we really want to provide such helper in DPDK, it should not
> > > > > be limited to ethdev.
> > > >
> > > > Once we move to eal layer, we will have
> > > > rte_eal_dev_lock/unlock(devname,
> > > busname).
> > > > But its also better we keep rte_eth_dev_lock/unlock to make ethdev
> > > > API more completed (any port be locked means underline rte_device
> > > > also be
> > > locked?) and this is same for other device family.
> > >
> > > No. Again, a port is not exactly a device.
> > > There can be several ports per device.
> >
> > Yes, I know that.
> > what I mean is, we should assume lock any port of that rte_device will
> prevent the device be detached.
> >
> > >
> > > I think the right place for this hook is in port-level API (ethdev,
> > > crypto, etc). And as we improve only ethdev currently, without any
> > > common genericity for other device classes, it is probably fine in ethdev for
> now.
> > > >
> > > > > (for info, see class implementation:
> > > > > https://patches.dpdk.org/patch/41605/)
> > > > >
> > > > > What about hardware unplug?
> > > > > Can we detach the locked ports associated to the unplugged device?
> > > >
> > > > NO, we can't.
> > > > But do you think, we need to introduce a "force detach" version,
> > > > which will
> > > ignore all locks.
> > >
> > > No, I don't think so.
> > > I am just trying to show that you cannot really "lock" a port.
> > > Maybe you should just rename those functions.
> > > After all, it is just a pre-detach hook.
> >
> > > Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
> > > Perhaps we just need to improve the handling of the DESTROY event?
> >
> > I have thought about this before.
> > Not like RTE_ETH_EVENT_DESTROY and other event hook, the hook here
> need to give feedback, pass or fail will impact the following behavior, this
> make it special, so I separate it from all exist rte_eth_event_type handle
> mechanism.
> 
> Look at _rte_eth_dev_callback_process, there is a "ret_param".

OK, that should work.
> 
> > The alternative solution is
> > we just introduce a new event type like RTE_ETH_EVENT_PRE_DETACH and
> > reuse all exist API
> rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.
> 
> I don't think we need a new event.
> Let's try to use RTE_ETH_EVENT_DESTROY.

The problem is RTE_ETH_EVENT_DESTROY is used in rte_eth_dev_release_port already.
And in PMD, rte_eth_dev_release_port is called after dev_uninit, that mean its too late to reject a detach
So , do you mean we can remove _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROPY) in rte_eth_dev_release_port 

And where is right place to call _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROY)?
If can't be called in rte_eth_dev_detach, because if device is removed by rte_eal_hotplug_remove, it will be skipped.
probably we need to call this at the beginning of each PMD driver->remove?, that means, we need to change all PMD drivers?




> 
> > But in _rte_eth_dev_callback_process we need to add a code branch for
> PRE_DETACH handle.
> >
> > If (event = RTE_ETH_EVENT_PRE_DETACH)
> > 	<...>.
> > else {
> > 	<....>
> > }
> >
> > And we may also need to keep rte_eth_dev_lock/unlock which will register a
> default callback for PRE_DETACH.
> 
> The default callback can be registered by the application.

OK.
> 
> > What do you think about?
> 
>
  
Thomas Monjalon July 4, 2018, 9:41 p.m. UTC | #9
04/07/2018 12:49, Zhang, Qi Z:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 04/07/2018 03:47, Zhang, Qi Z:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > 03/07/2018 17:08, Zhang, Qi Z:
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > 02/07/2018 07:44, Qi Zhang:
> > > > > > > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to let
> > > > > > > application lock or unlock on specific ethdev, a locked device
> > > > > > > can't be detached, this help applicaiton to prevent unexpected
> > > > > > > device detaching, especially in multi-process envrionment.
> > > > > >
> > > > > > Trying to understand: a process of an application could try to
> > > > > > detach a port while another process is against this decision.
> > > > > > Why an application needs to be protected against itself?
> > > > >
> > > > > I think we can regard this as a help function, it help application
> > > > > to simplified
> > > > the situation when one process want to detach a device while another
> > > > one is still using it.
> > > > > Application can register a callback which can do to necessary
> > > > > clean up (like
> > > > stop traffic, release memory ...) before device be detached.
> > > >
> > > > Yes I agree such hook can be a good idea.
[...]
> > > > After all, it is just a pre-detach hook.
> > >
> > > > Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
> > > > Perhaps we just need to improve the handling of the DESTROY event?
> > >
> > > I have thought about this before.
> > > Not like RTE_ETH_EVENT_DESTROY and other event hook, the hook here
> > need to give feedback, pass or fail will impact the following behavior, this
> > make it special, so I separate it from all exist rte_eth_event_type handle
> > mechanism.
> > 
> > Look at _rte_eth_dev_callback_process, there is a "ret_param".
> 
> OK, that should work.
> > 
> > > The alternative solution is
> > > we just introduce a new event type like RTE_ETH_EVENT_PRE_DETACH and
> > > reuse all exist API
> > rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.
> > 
> > I don't think we need a new event.
> > Let's try to use RTE_ETH_EVENT_DESTROY.
> 
> The problem is RTE_ETH_EVENT_DESTROY is used in rte_eth_dev_release_port already.
> And in PMD, rte_eth_dev_release_port is called after dev_uninit, that mean its too late to reject a detach

You're right.

It's a real mess currently.
The right order should be to remove ethdev ports before
removing the underlying EAL device. But it's strangely not the case.

We need to separate things.
The function rte_eth_dev_close can be used to remove an ethdev port
if we add a call to rte_eth_dev_release_port.
So we could call rte_eth_dev_close in PMD remove functions.
Is "close" a good time to ask confirmation to the application?
Or should we ask confirmation a step before, on "stop"?

> So , do you mean we can remove _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROPY) in rte_eth_dev_release_port 

I would say we need RTE_ETH_EVENT_DESTROY to notify that the port
is really destroyed.
Maybe the right thing to do is to add a new event
RTE_ETH_EVENT_CLOSE_REQUEST or something else.
Note that we already have 2 removal events in ethdev:
	- RTE_ETH_EVENT_INTR_RMV when the port cannot be used anymore
	- RTE_ETH_EVENT_DESTROY when the port is going to be deleted

> And where is right place to call _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROY)?
> If can't be called in rte_eth_dev_detach, because if device is removed by rte_eal_hotplug_remove, it will be skipped.

No, rte_eth_dev_detach and rte_eal_hotplug_remove are 2 different things.
One is a mix of ethdev and EAL (and should be deprecated),
the other one is for the underlying device at EAL level.

> probably we need to call this at the beginning of each PMD driver->remove?, that means, we need to change all PMD drivers?

Yes, we can call rte_eth_dev_stop and rte_eth_dev_close
at the beginning of PMD remove.
Note that there is already a helper rte_eth_dev_destroy called in
some PMD to achieve the removal, but curiously, it doesn't call
stop and close functions.
  
Qi Zhang July 5, 2018, 1:38 a.m. UTC | #10
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, July 5, 2018 5:42 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Shelton,
> Benjamin H <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>; arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock
> 
> 04/07/2018 12:49, Zhang, Qi Z:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > 04/07/2018 03:47, Zhang, Qi Z:
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > 03/07/2018 17:08, Zhang, Qi Z:
> > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > 02/07/2018 07:44, Qi Zhang:
> > > > > > > > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to
> > > > > > > > let application lock or unlock on specific ethdev, a
> > > > > > > > locked device can't be detached, this help applicaiton to
> > > > > > > > prevent unexpected device detaching, especially in multi-process
> envrionment.
> > > > > > >
> > > > > > > Trying to understand: a process of an application could try
> > > > > > > to detach a port while another process is against this decision.
> > > > > > > Why an application needs to be protected against itself?
> > > > > >
> > > > > > I think we can regard this as a help function, it help
> > > > > > application to simplified
> > > > > the situation when one process want to detach a device while
> > > > > another one is still using it.
> > > > > > Application can register a callback which can do to necessary
> > > > > > clean up (like
> > > > > stop traffic, release memory ...) before device be detached.
> > > > >
> > > > > Yes I agree such hook can be a good idea.
> [...]
> > > > > After all, it is just a pre-detach hook.
> > > >
> > > > > Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
> > > > > Perhaps we just need to improve the handling of the DESTROY event?
> > > >
> > > > I have thought about this before.
> > > > Not like RTE_ETH_EVENT_DESTROY and other event hook, the hook here
> > > need to give feedback, pass or fail will impact the following
> > > behavior, this make it special, so I separate it from all exist
> > > rte_eth_event_type handle mechanism.
> > >
> > > Look at _rte_eth_dev_callback_process, there is a "ret_param".
> >
> > OK, that should work.
> > >
> > > > The alternative solution is
> > > > we just introduce a new event type like RTE_ETH_EVENT_PRE_DETACH
> > > > and reuse all exist API
> > > rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.
> > >
> > > I don't think we need a new event.
> > > Let's try to use RTE_ETH_EVENT_DESTROY.
> >
> > The problem is RTE_ETH_EVENT_DESTROY is used in
> rte_eth_dev_release_port already.
> > And in PMD, rte_eth_dev_release_port is called after dev_uninit, that
> > mean its too late to reject a detach
> 
> You're right.
> 
> It's a real mess currently.
> The right order should be to remove ethdev ports before removing the
> underlying EAL device. But it's strangely not the case.
> 
> We need to separate things.
> The function rte_eth_dev_close can be used to remove an ethdev port if we
> add a call to rte_eth_dev_release_port.
> So we could call rte_eth_dev_close in PMD remove functions.
> Is "close" a good time to ask confirmation to the application?
> Or should we ask confirmation a step before, on "stop"?

I think the confirmation should before any cleanup stage, it should at the beginning of driver->remove.
Also we should not put it into rte_eth_dev_stop, because, rte_eth_dev_stop can invoked by application directly, in that case, we don't what any callback be invoked.

> 
> > So , do you mean we can remove
> > _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROPY) in
> > rte_eth_dev_release_port
> 
> I would say we need RTE_ETH_EVENT_DESTROY to notify that the port is really
> destroyed.
> Maybe the right thing to do is to add a new event
> RTE_ETH_EVENT_CLOSE_REQUEST or something else.
> Note that we already have 2 removal events in ethdev:
> 	- RTE_ETH_EVENT_INTR_RMV when the port cannot be used anymore
> 	- RTE_ETH_EVENT_DESTROY when the port is going to be deleted
> 
> > And where is right place to call
> _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROY)?
> > If can't be called in rte_eth_dev_detach, because if device is removed by
> rte_eal_hotplug_remove, it will be skipped.
> 
> No, rte_eth_dev_detach and rte_eal_hotplug_remove are 2 different things.
> One is a mix of ethdev and EAL (and should be deprecated), the other one is
> for the underlying device at EAL level.
> 
> > probably we need to call this at the beginning of each PMD driver->remove?,
> that means, we need to change all PMD drivers?
> 
> Yes, we can call rte_eth_dev_stop and rte_eth_dev_close at the beginning of
> PMD remove.
> Note that there is already a helper rte_eth_dev_destroy called in some PMD to
> achieve the removal, but curiously, it doesn't call stop and close functions.

Currently PMD implement driver->remove with different way, rte_eth_dev_stop / rte_eth_dev_close / rte_eth_dev_destroy is not always be invoked.
So Before we standardize what ethdev API and what sequence should be called in driver->remove (I think this is a separate task)
I will suggest 
1. Create another help function like _rte_eth_dev_allow_to_remove, 
2. the help function will call _rte_eth_dev_callback_process(RTE_ETH_EVENT_PRE_REMOVE) and update ret_param which contain a reject count.
3. the help function should to invoked at beginning at driver->remove and driver->remove will abort if the help function failed.

But once we standardized that , we can do cleanup to merge it into another rte_eth_xxx API in next step.

What do you think?

> 
>
  
Thomas Monjalon July 5, 2018, 1:55 a.m. UTC | #11
05/07/2018 03:38, Zhang, Qi Z:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 04/07/2018 12:49, Zhang, Qi Z:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > 04/07/2018 03:47, Zhang, Qi Z:
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > 03/07/2018 17:08, Zhang, Qi Z:
> > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > 02/07/2018 07:44, Qi Zhang:
> > > > > > > > > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock to
> > > > > > > > > let application lock or unlock on specific ethdev, a
> > > > > > > > > locked device can't be detached, this help applicaiton to
> > > > > > > > > prevent unexpected device detaching, especially in multi-process
> > envrionment.
> > > > > > > >
> > > > > > > > Trying to understand: a process of an application could try
> > > > > > > > to detach a port while another process is against this decision.
> > > > > > > > Why an application needs to be protected against itself?
> > > > > > >
> > > > > > > I think we can regard this as a help function, it help
> > > > > > > application to simplified
> > > > > > the situation when one process want to detach a device while
> > > > > > another one is still using it.
> > > > > > > Application can register a callback which can do to necessary
> > > > > > > clean up (like
> > > > > > stop traffic, release memory ...) before device be detached.
> > > > > >
> > > > > > Yes I agree such hook can be a good idea.
> > [...]
> > > > > > After all, it is just a pre-detach hook.
> > > > >
> > > > > > Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
> > > > > > Perhaps we just need to improve the handling of the DESTROY event?
> > > > >
> > > > > I have thought about this before.
> > > > > Not like RTE_ETH_EVENT_DESTROY and other event hook, the hook here
> > > > need to give feedback, pass or fail will impact the following
> > > > behavior, this make it special, so I separate it from all exist
> > > > rte_eth_event_type handle mechanism.
> > > >
> > > > Look at _rte_eth_dev_callback_process, there is a "ret_param".
> > >
> > > OK, that should work.
> > > >
> > > > > The alternative solution is
> > > > > we just introduce a new event type like RTE_ETH_EVENT_PRE_DETACH
> > > > > and reuse all exist API
> > > > rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.
> > > >
> > > > I don't think we need a new event.
> > > > Let's try to use RTE_ETH_EVENT_DESTROY.
> > >
> > > The problem is RTE_ETH_EVENT_DESTROY is used in
> > rte_eth_dev_release_port already.
> > > And in PMD, rte_eth_dev_release_port is called after dev_uninit, that
> > > mean its too late to reject a detach
> > 
> > You're right.
> > 
> > It's a real mess currently.
> > The right order should be to remove ethdev ports before removing the
> > underlying EAL device. But it's strangely not the case.
> > 
> > We need to separate things.
> > The function rte_eth_dev_close can be used to remove an ethdev port if we
> > add a call to rte_eth_dev_release_port.
> > So we could call rte_eth_dev_close in PMD remove functions.
> > Is "close" a good time to ask confirmation to the application?
> > Or should we ask confirmation a step before, on "stop"?
> 
> I think the confirmation should before any cleanup stage, it should at the beginning of driver->remove.

So you stop a port, even if the app policy is against detaching it?

> Also we should not put it into rte_eth_dev_stop, because, rte_eth_dev_stop can invoked by application directly, in that case, we don't what any callback be invoked.

It it the same to detach a port: it is invoked directly by application.
I thought you wanted a callback as helper for inter-process management?

> > > So , do you mean we can remove
> > > _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROPY) in
> > > rte_eth_dev_release_port
> > 
> > I would say we need RTE_ETH_EVENT_DESTROY to notify that the port is really
> > destroyed.
> > Maybe the right thing to do is to add a new event
> > RTE_ETH_EVENT_CLOSE_REQUEST or something else.
> > Note that we already have 2 removal events in ethdev:
> > 	- RTE_ETH_EVENT_INTR_RMV when the port cannot be used anymore
> > 	- RTE_ETH_EVENT_DESTROY when the port is going to be deleted
> > 
> > > And where is right place to call
> > _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROY)?
> > > If can't be called in rte_eth_dev_detach, because if device is removed by
> > rte_eal_hotplug_remove, it will be skipped.
> > 
> > No, rte_eth_dev_detach and rte_eal_hotplug_remove are 2 different things.
> > One is a mix of ethdev and EAL (and should be deprecated), the other one is
> > for the underlying device at EAL level.
> > 
> > > probably we need to call this at the beginning of each PMD driver->remove?,
> > that means, we need to change all PMD drivers?
> > 
> > Yes, we can call rte_eth_dev_stop and rte_eth_dev_close at the beginning of
> > PMD remove.
> > Note that there is already a helper rte_eth_dev_destroy called in some PMD to
> > achieve the removal, but curiously, it doesn't call stop and close functions.
> 
> Currently PMD implement driver->remove with different way, rte_eth_dev_stop / rte_eth_dev_close / rte_eth_dev_destroy is not always be invoked.
> So Before we standardize what ethdev API and what sequence should be called in driver->remove (I think this is a separate task)
> I will suggest 
> 1. Create another help function like _rte_eth_dev_allow_to_remove, 
> 2. the help function will call _rte_eth_dev_callback_process(RTE_ETH_EVENT_PRE_REMOVE) and update ret_param which contain a reject count.
> 3. the help function should to invoked at beginning at driver->remove and driver->remove will abort if the help function failed.
> 
> But once we standardized that , we can do cleanup to merge it into another rte_eth_xxx API in next step.
> 
> What do you think?

No
All the problems we have today are because we preferred add more and more
functions instead of fixing the basic stuff. And it is especially the case
for all the detach crap.
So no.
Let's fix stuff first.
  
Qi Zhang July 5, 2018, 3:37 a.m. UTC | #12
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, July 5, 2018 9:55 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Shelton,
> Benjamin H <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>; arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock
> 
> 05/07/2018 03:38, Zhang, Qi Z:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > 04/07/2018 12:49, Zhang, Qi Z:
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > 04/07/2018 03:47, Zhang, Qi Z:
> > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > 03/07/2018 17:08, Zhang, Qi Z:
> > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > 02/07/2018 07:44, Qi Zhang:
> > > > > > > > > > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock
> > > > > > > > > > to let application lock or unlock on specific ethdev,
> > > > > > > > > > a locked device can't be detached, this help
> > > > > > > > > > applicaiton to prevent unexpected device detaching,
> > > > > > > > > > especially in multi-process
> > > envrionment.
> > > > > > > > >
> > > > > > > > > Trying to understand: a process of an application could
> > > > > > > > > try to detach a port while another process is against this
> decision.
> > > > > > > > > Why an application needs to be protected against itself?
> > > > > > > >
> > > > > > > > I think we can regard this as a help function, it help
> > > > > > > > application to simplified
> > > > > > > the situation when one process want to detach a device while
> > > > > > > another one is still using it.
> > > > > > > > Application can register a callback which can do to
> > > > > > > > necessary clean up (like
> > > > > > > stop traffic, release memory ...) before device be detached.
> > > > > > >
> > > > > > > Yes I agree such hook can be a good idea.
> > > [...]
> > > > > > > After all, it is just a pre-detach hook.
> > > > > >
> > > > > > > Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
> > > > > > > Perhaps we just need to improve the handling of the DESTROY
> event?
> > > > > >
> > > > > > I have thought about this before.
> > > > > > Not like RTE_ETH_EVENT_DESTROY and other event hook, the hook
> > > > > > here
> > > > > need to give feedback, pass or fail will impact the following
> > > > > behavior, this make it special, so I separate it from all exist
> > > > > rte_eth_event_type handle mechanism.
> > > > >
> > > > > Look at _rte_eth_dev_callback_process, there is a "ret_param".
> > > >
> > > > OK, that should work.
> > > > >
> > > > > > The alternative solution is
> > > > > > we just introduce a new event type like
> > > > > > RTE_ETH_EVENT_PRE_DETACH and reuse all exist API
> > > > > rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.
> > > > >
> > > > > I don't think we need a new event.
> > > > > Let's try to use RTE_ETH_EVENT_DESTROY.
> > > >
> > > > The problem is RTE_ETH_EVENT_DESTROY is used in
> > > rte_eth_dev_release_port already.
> > > > And in PMD, rte_eth_dev_release_port is called after dev_uninit,
> > > > that mean its too late to reject a detach
> > >
> > > You're right.
> > >
> > > It's a real mess currently.
> > > The right order should be to remove ethdev ports before removing the
> > > underlying EAL device. But it's strangely not the case.
> > >
> > > We need to separate things.
> > > The function rte_eth_dev_close can be used to remove an ethdev port
> > > if we add a call to rte_eth_dev_release_port.
> > > So we could call rte_eth_dev_close in PMD remove functions.
> > > Is "close" a good time to ask confirmation to the application?
> > > Or should we ask confirmation a step before, on "stop"?
> >
> > I think the confirmation should before any cleanup stage, it should at the
> beginning of driver->remove.
> 
> So you stop a port, even if the app policy is against detaching it?

My understanding is, stop and detach is different, we may stop a device and reconfigure it then restart it.
but for detach, properly we will not use it, unless it be probed again.
For dev_close , it should be called after dev_stop.
so we have to like below.

If (dev->started) {
	dev_stop /* but still problem here, if traffic is ongoing */
	if (dev_close()) {
		dev_start()
		return -EBUSY.
	}
} else {
	If (dev_close())
		Return _EBUSY
}

So for me, neither rte_eth_dev_stop and rte_eth_dev_close is the right place to check this.
But rte_eth_dev_destroy looks like a good one. We can put all the ethdev general logic into it, 
and PMD specific dev_unit will be called at last

> 
> > Also we should not put it into rte_eth_dev_stop, because, rte_eth_dev_stop
> can invoked by application directly, in that case, we don't what any callback be
> invoked.
> 
> It it the same to detach a port: it is invoked directly by application.
> I thought you wanted a callback as helper for inter-process management?
> 
> > > > So , do you mean we can remove
> > > > _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROPY) in
> > > > rte_eth_dev_release_port
> > >
> > > I would say we need RTE_ETH_EVENT_DESTROY to notify that the port is
> > > really destroyed.
> > > Maybe the right thing to do is to add a new event
> > > RTE_ETH_EVENT_CLOSE_REQUEST or something else.
> > > Note that we already have 2 removal events in ethdev:
> > > 	- RTE_ETH_EVENT_INTR_RMV when the port cannot be used anymore
> > > 	- RTE_ETH_EVENT_DESTROY when the port is going to be deleted
> > >
> > > > And where is right place to call
> > > _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROY)?
> > > > If can't be called in rte_eth_dev_detach, because if device is
> > > > removed by
> > > rte_eal_hotplug_remove, it will be skipped.
> > >
> > > No, rte_eth_dev_detach and rte_eal_hotplug_remove are 2 different
> things.
> > > One is a mix of ethdev and EAL (and should be deprecated), the other
> > > one is for the underlying device at EAL level.
> > >
> > > > probably we need to call this at the beginning of each PMD
> > > > driver->remove?,
> > > that means, we need to change all PMD drivers?
> > >
> > > Yes, we can call rte_eth_dev_stop and rte_eth_dev_close at the
> > > beginning of PMD remove.
> > > Note that there is already a helper rte_eth_dev_destroy called in
> > > some PMD to achieve the removal, but curiously, it doesn't call stop and
> close functions.
> >
> > Currently PMD implement driver->remove with different way,
> rte_eth_dev_stop / rte_eth_dev_close / rte_eth_dev_destroy is not always be
> invoked.
> > So Before we standardize what ethdev API and what sequence should be
> > called in driver->remove (I think this is a separate task) I will
> > suggest 1. Create another help function like
> > _rte_eth_dev_allow_to_remove, 2. the help function will call
> _rte_eth_dev_callback_process(RTE_ETH_EVENT_PRE_REMOVE) and update
> ret_param which contain a reject count.
> > 3. the help function should to invoked at beginning at driver->remove and
> driver->remove will abort if the help function failed.
> >
> > But once we standardized that , we can do cleanup to merge it into another
> rte_eth_xxx API in next step.
> >
> > What do you think?
> 
> No
> All the problems we have today are because we preferred add more and more
> functions instead of fixing the basic stuff. And it is especially the case for all the
> detach crap.
> So no.
> Let's fix stuff first.
> 
>
  
Thomas Monjalon July 5, 2018, 7:22 a.m. UTC | #13
05/07/2018 05:37, Zhang, Qi Z:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 05/07/2018 03:38, Zhang, Qi Z:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > 04/07/2018 12:49, Zhang, Qi Z:
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > 04/07/2018 03:47, Zhang, Qi Z:
> > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > 03/07/2018 17:08, Zhang, Qi Z:
> > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > > 02/07/2018 07:44, Qi Zhang:
> > > > > > > > > > > Introduce API rte_eth_dev_lock and rte_eth_dev_unlock
> > > > > > > > > > > to let application lock or unlock on specific ethdev,
> > > > > > > > > > > a locked device can't be detached, this help
> > > > > > > > > > > applicaiton to prevent unexpected device detaching,
> > > > > > > > > > > especially in multi-process
> > > > envrionment.
> > > > > > > > > >
> > > > > > > > > > Trying to understand: a process of an application could
> > > > > > > > > > try to detach a port while another process is against this
> > decision.
> > > > > > > > > > Why an application needs to be protected against itself?
> > > > > > > > >
> > > > > > > > > I think we can regard this as a help function, it help
> > > > > > > > > application to simplified
> > > > > > > > the situation when one process want to detach a device while
> > > > > > > > another one is still using it.
> > > > > > > > > Application can register a callback which can do to
> > > > > > > > > necessary clean up (like
> > > > > > > > stop traffic, release memory ...) before device be detached.
> > > > > > > >
> > > > > > > > Yes I agree such hook can be a good idea.
> > > > [...]
> > > > > > > > After all, it is just a pre-detach hook.
> > > > > > >
> > > > > > > > Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
> > > > > > > > Perhaps we just need to improve the handling of the DESTROY
> > event?
> > > > > > >
> > > > > > > I have thought about this before.
> > > > > > > Not like RTE_ETH_EVENT_DESTROY and other event hook, the hook
> > > > > > > here
> > > > > > need to give feedback, pass or fail will impact the following
> > > > > > behavior, this make it special, so I separate it from all exist
> > > > > > rte_eth_event_type handle mechanism.
> > > > > >
> > > > > > Look at _rte_eth_dev_callback_process, there is a "ret_param".
> > > > >
> > > > > OK, that should work.
> > > > > >
> > > > > > > The alternative solution is
> > > > > > > we just introduce a new event type like
> > > > > > > RTE_ETH_EVENT_PRE_DETACH and reuse all exist API
> > > > > > rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.
> > > > > >
> > > > > > I don't think we need a new event.
> > > > > > Let's try to use RTE_ETH_EVENT_DESTROY.
> > > > >
> > > > > The problem is RTE_ETH_EVENT_DESTROY is used in
> > > > rte_eth_dev_release_port already.
> > > > > And in PMD, rte_eth_dev_release_port is called after dev_uninit,
> > > > > that mean its too late to reject a detach
> > > >
> > > > You're right.
> > > >
> > > > It's a real mess currently.
> > > > The right order should be to remove ethdev ports before removing the
> > > > underlying EAL device. But it's strangely not the case.
> > > >
> > > > We need to separate things.
> > > > The function rte_eth_dev_close can be used to remove an ethdev port
> > > > if we add a call to rte_eth_dev_release_port.
> > > > So we could call rte_eth_dev_close in PMD remove functions.
> > > > Is "close" a good time to ask confirmation to the application?
> > > > Or should we ask confirmation a step before, on "stop"?
> > >
> > > I think the confirmation should before any cleanup stage, it should at the
> > beginning of driver->remove.
> > 
> > So you stop a port, even if the app policy is against detaching it?
> 
> My understanding is, stop and detach is different, we may stop a device and reconfigure it then restart it.
> but for detach, properly we will not use it, unless it be probed again.
> For dev_close , it should be called after dev_stop.
> so we have to like below.
> 
> If (dev->started) {
> 	dev_stop /* but still problem here, if traffic is ongoing */
> 	if (dev_close()) {
> 		dev_start()
> 		return -EBUSY.
> 	}
> } else {
> 	If (dev_close())
> 		Return _EBUSY
> }
> 
> So for me, neither rte_eth_dev_stop and rte_eth_dev_close is the right place to check this.
> But rte_eth_dev_destroy looks like a good one. We can put all the ethdev general logic into it, 
> and PMD specific dev_unit will be called at last

If you want to detach a port, you need to stop it.
If one process try to detach a port, but another process decides
(via callback) that the port should not be detached,
you will have stopped a port for no good reason.
To me it is a real design issue.


> > > Also we should not put it into rte_eth_dev_stop, because, rte_eth_dev_stop
> > can invoked by application directly, in that case, we don't what any callback be
> > invoked.
> > 
> > It it the same to detach a port: it is invoked directly by application.
> > I thought you wanted a callback as helper for inter-process management?
> > 
> > > > > So , do you mean we can remove
> > > > > _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROPY) in
> > > > > rte_eth_dev_release_port
> > > >
> > > > I would say we need RTE_ETH_EVENT_DESTROY to notify that the port is
> > > > really destroyed.
> > > > Maybe the right thing to do is to add a new event
> > > > RTE_ETH_EVENT_CLOSE_REQUEST or something else.
> > > > Note that we already have 2 removal events in ethdev:
> > > > 	- RTE_ETH_EVENT_INTR_RMV when the port cannot be used anymore
> > > > 	- RTE_ETH_EVENT_DESTROY when the port is going to be deleted
> > > >
> > > > > And where is right place to call
> > > > _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROY)?
> > > > > If can't be called in rte_eth_dev_detach, because if device is
> > > > > removed by
> > > > rte_eal_hotplug_remove, it will be skipped.
> > > >
> > > > No, rte_eth_dev_detach and rte_eal_hotplug_remove are 2 different
> > things.
> > > > One is a mix of ethdev and EAL (and should be deprecated), the other
> > > > one is for the underlying device at EAL level.
> > > >
> > > > > probably we need to call this at the beginning of each PMD
> > > > > driver->remove?,
> > > > that means, we need to change all PMD drivers?
> > > >
> > > > Yes, we can call rte_eth_dev_stop and rte_eth_dev_close at the
> > > > beginning of PMD remove.
> > > > Note that there is already a helper rte_eth_dev_destroy called in
> > > > some PMD to achieve the removal, but curiously, it doesn't call stop and
> > close functions.
> > >
> > > Currently PMD implement driver->remove with different way,
> > rte_eth_dev_stop / rte_eth_dev_close / rte_eth_dev_destroy is not always be
> > invoked.
> > > So Before we standardize what ethdev API and what sequence should be
> > > called in driver->remove (I think this is a separate task) I will
> > > suggest 1. Create another help function like
> > > _rte_eth_dev_allow_to_remove, 2. the help function will call
> > _rte_eth_dev_callback_process(RTE_ETH_EVENT_PRE_REMOVE) and update
> > ret_param which contain a reject count.
> > > 3. the help function should to invoked at beginning at driver->remove and
> > driver->remove will abort if the help function failed.
> > >
> > > But once we standardized that , we can do cleanup to merge it into another
> > rte_eth_xxx API in next step.
> > >
> > > What do you think?
> > 
> > No
> > All the problems we have today are because we preferred add more and more
> > functions instead of fixing the basic stuff. And it is especially the case for all the
> > detach crap.
> > So no.
> > Let's fix stuff first.
> > 
> > 
>
  
Qi Zhang July 5, 2018, 9:54 a.m. UTC | #14
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, July 5, 2018 3:23 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Shelton,
> Benjamin H <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>; arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock
> 
> 05/07/2018 05:37, Zhang, Qi Z:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > 05/07/2018 03:38, Zhang, Qi Z:
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > 04/07/2018 12:49, Zhang, Qi Z:
> > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > 04/07/2018 03:47, Zhang, Qi Z:
> > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > 03/07/2018 17:08, Zhang, Qi Z:
> > > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > > > 02/07/2018 07:44, Qi Zhang:
> > > > > > > > > > > > Introduce API rte_eth_dev_lock and
> > > > > > > > > > > > rte_eth_dev_unlock to let application lock or
> > > > > > > > > > > > unlock on specific ethdev, a locked device can't
> > > > > > > > > > > > be detached, this help applicaiton to prevent
> > > > > > > > > > > > unexpected device detaching, especially in
> > > > > > > > > > > > multi-process
> > > > > envrionment.
> > > > > > > > > > >
> > > > > > > > > > > Trying to understand: a process of an application
> > > > > > > > > > > could try to detach a port while another process is
> > > > > > > > > > > against this
> > > decision.
> > > > > > > > > > > Why an application needs to be protected against itself?
> > > > > > > > > >
> > > > > > > > > > I think we can regard this as a help function, it help
> > > > > > > > > > application to simplified
> > > > > > > > > the situation when one process want to detach a device
> > > > > > > > > while another one is still using it.
> > > > > > > > > > Application can register a callback which can do to
> > > > > > > > > > necessary clean up (like
> > > > > > > > > stop traffic, release memory ...) before device be detached.
> > > > > > > > >
> > > > > > > > > Yes I agree such hook can be a good idea.
> > > > > [...]
> > > > > > > > > After all, it is just a pre-detach hook.
> > > > > > > >
> > > > > > > > > Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
> > > > > > > > > Perhaps we just need to improve the handling of the
> > > > > > > > > DESTROY
> > > event?
> > > > > > > >
> > > > > > > > I have thought about this before.
> > > > > > > > Not like RTE_ETH_EVENT_DESTROY and other event hook, the
> > > > > > > > hook here
> > > > > > > need to give feedback, pass or fail will impact the
> > > > > > > following behavior, this make it special, so I separate it
> > > > > > > from all exist rte_eth_event_type handle mechanism.
> > > > > > >
> > > > > > > Look at _rte_eth_dev_callback_process, there is a "ret_param".
> > > > > >
> > > > > > OK, that should work.
> > > > > > >
> > > > > > > > The alternative solution is we just introduce a new event
> > > > > > > > type like RTE_ETH_EVENT_PRE_DETACH and reuse all exist API
> > > > > > > rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.
> > > > > > >
> > > > > > > I don't think we need a new event.
> > > > > > > Let's try to use RTE_ETH_EVENT_DESTROY.
> > > > > >
> > > > > > The problem is RTE_ETH_EVENT_DESTROY is used in
> > > > > rte_eth_dev_release_port already.
> > > > > > And in PMD, rte_eth_dev_release_port is called after
> > > > > > dev_uninit, that mean its too late to reject a detach
> > > > >
> > > > > You're right.
> > > > >
> > > > > It's a real mess currently.
> > > > > The right order should be to remove ethdev ports before removing
> > > > > the underlying EAL device. But it's strangely not the case.
> > > > >
> > > > > We need to separate things.
> > > > > The function rte_eth_dev_close can be used to remove an ethdev
> > > > > port if we add a call to rte_eth_dev_release_port.
> > > > > So we could call rte_eth_dev_close in PMD remove functions.
> > > > > Is "close" a good time to ask confirmation to the application?
> > > > > Or should we ask confirmation a step before, on "stop"?
> > > >
> > > > I think the confirmation should before any cleanup stage, it
> > > > should at the
> > > beginning of driver->remove.
> > >
> > > So you stop a port, even if the app policy is against detaching it?
> >
> > My understanding is, stop and detach is different, we may stop a device and
> reconfigure it then restart it.
> > but for detach, properly we will not use it, unless it be probed again.
> > For dev_close , it should be called after dev_stop.
> > so we have to like below.
> >
> > If (dev->started) {
> > 	dev_stop /* but still problem here, if traffic is ongoing */
> > 	if (dev_close()) {
> > 		dev_start()
> > 		return -EBUSY.
> > 	}
> > } else {
> > 	If (dev_close())
> > 		Return _EBUSY
> > }
> >
> > So for me, neither rte_eth_dev_stop and rte_eth_dev_close is the right place
> to check this.
> > But rte_eth_dev_destroy looks like a good one. We can put all the
> > ethdev general logic into it, and PMD specific dev_unit will be called
> > at last
> 
> If you want to detach a port, you need to stop it.
> If one process try to detach a port, but another process decides (via callback)
> that the port should not be detached, you will have stopped a port for no good
> reason.
> To me it is a real design issue.

Yes, so I think we still need two iterates for detach.
First iterate to get agreement on all processes.
Secondary iterate to do the detach.

But how?
> 
> 
> > > > Also we should not put it into rte_eth_dev_stop, because,
> > > > rte_eth_dev_stop
> > > can invoked by application directly, in that case, we don't what any
> > > callback be invoked.
> > >
> > > It it the same to detach a port: it is invoked directly by application.
> > > I thought you wanted a callback as helper for inter-process management?
> > >
> > > > > > So , do you mean we can remove
> > > > > > _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROPY) in
> > > > > > rte_eth_dev_release_port
> > > > >
> > > > > I would say we need RTE_ETH_EVENT_DESTROY to notify that the
> > > > > port is really destroyed.
> > > > > Maybe the right thing to do is to add a new event
> > > > > RTE_ETH_EVENT_CLOSE_REQUEST or something else.
> > > > > Note that we already have 2 removal events in ethdev:
> > > > > 	- RTE_ETH_EVENT_INTR_RMV when the port cannot be used
> anymore
> > > > > 	- RTE_ETH_EVENT_DESTROY when the port is going to be deleted
> > > > >
> > > > > > And where is right place to call
> > > > > _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROY)?
> > > > > > If can't be called in rte_eth_dev_detach, because if device is
> > > > > > removed by
> > > > > rte_eal_hotplug_remove, it will be skipped.
> > > > >
> > > > > No, rte_eth_dev_detach and rte_eal_hotplug_remove are 2
> > > > > different
> > > things.
> > > > > One is a mix of ethdev and EAL (and should be deprecated), the
> > > > > other one is for the underlying device at EAL level.
> > > > >
> > > > > > probably we need to call this at the beginning of each PMD
> > > > > > driver->remove?,
> > > > > that means, we need to change all PMD drivers?
> > > > >
> > > > > Yes, we can call rte_eth_dev_stop and rte_eth_dev_close at the
> > > > > beginning of PMD remove.
> > > > > Note that there is already a helper rte_eth_dev_destroy called
> > > > > in some PMD to achieve the removal, but curiously, it doesn't
> > > > > call stop and
> > > close functions.
> > > >
> > > > Currently PMD implement driver->remove with different way,
> > > rte_eth_dev_stop / rte_eth_dev_close / rte_eth_dev_destroy is not
> > > always be invoked.
> > > > So Before we standardize what ethdev API and what sequence should
> > > > be called in driver->remove (I think this is a separate task) I
> > > > will suggest 1. Create another help function like
> > > > _rte_eth_dev_allow_to_remove, 2. the help function will call
> > > _rte_eth_dev_callback_process(RTE_ETH_EVENT_PRE_REMOVE) and
> update
> > > ret_param which contain a reject count.
> > > > 3. the help function should to invoked at beginning at
> > > > driver->remove and
> > > driver->remove will abort if the help function failed.
> > > >
> > > > But once we standardized that , we can do cleanup to merge it into
> > > > another
> > > rte_eth_xxx API in next step.
> > > >
> > > > What do you think?
> > >
> > > No
> > > All the problems we have today are because we preferred add more and
> > > more functions instead of fixing the basic stuff. And it is
> > > especially the case for all the detach crap.
> > > So no.
> > > Let's fix stuff first.
> > >
> > >
> >
> 
> 
> 
>
  
Thomas Monjalon July 5, 2018, 10:54 a.m. UTC | #15
05/07/2018 11:54, Zhang, Qi Z:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 05/07/2018 05:37, Zhang, Qi Z:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > 05/07/2018 03:38, Zhang, Qi Z:
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > 04/07/2018 12:49, Zhang, Qi Z:
> > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > 04/07/2018 03:47, Zhang, Qi Z:
> > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > > 03/07/2018 17:08, Zhang, Qi Z:
> > > > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > > > > 02/07/2018 07:44, Qi Zhang:
> > > > > > > > > > > > > Introduce API rte_eth_dev_lock and
> > > > > > > > > > > > > rte_eth_dev_unlock to let application lock or
> > > > > > > > > > > > > unlock on specific ethdev, a locked device can't
> > > > > > > > > > > > > be detached, this help applicaiton to prevent
> > > > > > > > > > > > > unexpected device detaching, especially in
> > > > > > > > > > > > > multi-process
> > > > > > envrionment.
> > > > > > > > > > > >
> > > > > > > > > > > > Trying to understand: a process of an application
> > > > > > > > > > > > could try to detach a port while another process is
> > > > > > > > > > > > against this
> > > > decision.
> > > > > > > > > > > > Why an application needs to be protected against itself?
> > > > > > > > > > >
> > > > > > > > > > > I think we can regard this as a help function, it help
> > > > > > > > > > > application to simplified
> > > > > > > > > > the situation when one process want to detach a device
> > > > > > > > > > while another one is still using it.
> > > > > > > > > > > Application can register a callback which can do to
> > > > > > > > > > > necessary clean up (like
> > > > > > > > > > stop traffic, release memory ...) before device be detached.
> > > > > > > > > >
> > > > > > > > > > Yes I agree such hook can be a good idea.
> > > > > > [...]
> > > > > > > > > > After all, it is just a pre-detach hook.
> > > > > > > > >
> > > > > > > > > > Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
> > > > > > > > > > Perhaps we just need to improve the handling of the
> > > > > > > > > > DESTROY
> > > > event?
> > > > > > > > >
> > > > > > > > > I have thought about this before.
> > > > > > > > > Not like RTE_ETH_EVENT_DESTROY and other event hook, the
> > > > > > > > > hook here
> > > > > > > > need to give feedback, pass or fail will impact the
> > > > > > > > following behavior, this make it special, so I separate it
> > > > > > > > from all exist rte_eth_event_type handle mechanism.
> > > > > > > >
> > > > > > > > Look at _rte_eth_dev_callback_process, there is a "ret_param".
> > > > > > >
> > > > > > > OK, that should work.
> > > > > > > >
> > > > > > > > > The alternative solution is we just introduce a new event
> > > > > > > > > type like RTE_ETH_EVENT_PRE_DETACH and reuse all exist API
> > > > > > > > rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.
> > > > > > > >
> > > > > > > > I don't think we need a new event.
> > > > > > > > Let's try to use RTE_ETH_EVENT_DESTROY.
> > > > > > >
> > > > > > > The problem is RTE_ETH_EVENT_DESTROY is used in
> > > > > > rte_eth_dev_release_port already.
> > > > > > > And in PMD, rte_eth_dev_release_port is called after
> > > > > > > dev_uninit, that mean its too late to reject a detach
> > > > > >
> > > > > > You're right.
> > > > > >
> > > > > > It's a real mess currently.
> > > > > > The right order should be to remove ethdev ports before removing
> > > > > > the underlying EAL device. But it's strangely not the case.
> > > > > >
> > > > > > We need to separate things.
> > > > > > The function rte_eth_dev_close can be used to remove an ethdev
> > > > > > port if we add a call to rte_eth_dev_release_port.
> > > > > > So we could call rte_eth_dev_close in PMD remove functions.
> > > > > > Is "close" a good time to ask confirmation to the application?
> > > > > > Or should we ask confirmation a step before, on "stop"?
> > > > >
> > > > > I think the confirmation should before any cleanup stage, it
> > > > > should at the
> > > > beginning of driver->remove.
> > > >
> > > > So you stop a port, even if the app policy is against detaching it?
> > >
> > > My understanding is, stop and detach is different, we may stop a device and
> > reconfigure it then restart it.
> > > but for detach, properly we will not use it, unless it be probed again.
> > > For dev_close , it should be called after dev_stop.
> > > so we have to like below.
> > >
> > > If (dev->started) {
> > > 	dev_stop /* but still problem here, if traffic is ongoing */
> > > 	if (dev_close()) {
> > > 		dev_start()
> > > 		return -EBUSY.
> > > 	}
> > > } else {
> > > 	If (dev_close())
> > > 		Return _EBUSY
> > > }
> > >
> > > So for me, neither rte_eth_dev_stop and rte_eth_dev_close is the right place
> > to check this.
> > > But rte_eth_dev_destroy looks like a good one. We can put all the
> > > ethdev general logic into it, and PMD specific dev_unit will be called
> > > at last
> > 
> > If you want to detach a port, you need to stop it.
> > If one process try to detach a port, but another process decides (via callback)
> > that the port should not be detached, you will have stopped a port for no good
> > reason.
> > To me it is a real design issue.
> 
> Yes, so I think we still need two iterates for detach.
> First iterate to get agreement on all processes.
> Secondary iterate to do the detach.
> 
> But how?

An option is to let the application manages itself
its process synchronization and authorization for detaching devices.
  
Qi Zhang July 5, 2018, 12:16 p.m. UTC | #16
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, July 5, 2018 6:55 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Shelton,
> Benjamin H <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>; arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock
> 
> 05/07/2018 11:54, Zhang, Qi Z:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > 05/07/2018 05:37, Zhang, Qi Z:
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > 05/07/2018 03:38, Zhang, Qi Z:
> > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > 04/07/2018 12:49, Zhang, Qi Z:
> > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > 04/07/2018 03:47, Zhang, Qi Z:
> > > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > > > 03/07/2018 17:08, Zhang, Qi Z:
> > > > > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > > > > > 02/07/2018 07:44, Qi Zhang:
> > > > > > > > > > > > > > Introduce API rte_eth_dev_lock and
> > > > > > > > > > > > > > rte_eth_dev_unlock to let application lock or
> > > > > > > > > > > > > > unlock on specific ethdev, a locked device
> > > > > > > > > > > > > > can't be detached, this help applicaiton to
> > > > > > > > > > > > > > prevent unexpected device detaching,
> > > > > > > > > > > > > > especially in multi-process
> > > > > > > envrionment.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Trying to understand: a process of an
> > > > > > > > > > > > > application could try to detach a port while
> > > > > > > > > > > > > another process is against this
> > > > > decision.
> > > > > > > > > > > > > Why an application needs to be protected against itself?
> > > > > > > > > > > >
> > > > > > > > > > > > I think we can regard this as a help function, it
> > > > > > > > > > > > help application to simplified
> > > > > > > > > > > the situation when one process want to detach a
> > > > > > > > > > > device while another one is still using it.
> > > > > > > > > > > > Application can register a callback which can do
> > > > > > > > > > > > to necessary clean up (like
> > > > > > > > > > > stop traffic, release memory ...) before device be detached.
> > > > > > > > > > >
> > > > > > > > > > > Yes I agree such hook can be a good idea.
> > > > > > > [...]
> > > > > > > > > > > After all, it is just a pre-detach hook.
> > > > > > > > > >
> > > > > > > > > > > Wait, how is it different of RTE_ETH_EVENT_DESTROY
> callback?
> > > > > > > > > > > Perhaps we just need to improve the handling of the
> > > > > > > > > > > DESTROY
> > > > > event?
> > > > > > > > > >
> > > > > > > > > > I have thought about this before.
> > > > > > > > > > Not like RTE_ETH_EVENT_DESTROY and other event hook,
> > > > > > > > > > the hook here
> > > > > > > > > need to give feedback, pass or fail will impact the
> > > > > > > > > following behavior, this make it special, so I separate
> > > > > > > > > it from all exist rte_eth_event_type handle mechanism.
> > > > > > > > >
> > > > > > > > > Look at _rte_eth_dev_callback_process, there is a "ret_param".
> > > > > > > >
> > > > > > > > OK, that should work.
> > > > > > > > >
> > > > > > > > > > The alternative solution is we just introduce a new
> > > > > > > > > > event type like RTE_ETH_EVENT_PRE_DETACH and reuse all
> > > > > > > > > > exist API
> > > > > > > > > rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.
> > > > > > > > >
> > > > > > > > > I don't think we need a new event.
> > > > > > > > > Let's try to use RTE_ETH_EVENT_DESTROY.
> > > > > > > >
> > > > > > > > The problem is RTE_ETH_EVENT_DESTROY is used in
> > > > > > > rte_eth_dev_release_port already.
> > > > > > > > And in PMD, rte_eth_dev_release_port is called after
> > > > > > > > dev_uninit, that mean its too late to reject a detach
> > > > > > >
> > > > > > > You're right.
> > > > > > >
> > > > > > > It's a real mess currently.
> > > > > > > The right order should be to remove ethdev ports before
> > > > > > > removing the underlying EAL device. But it's strangely not the case.
> > > > > > >
> > > > > > > We need to separate things.
> > > > > > > The function rte_eth_dev_close can be used to remove an
> > > > > > > ethdev port if we add a call to rte_eth_dev_release_port.
> > > > > > > So we could call rte_eth_dev_close in PMD remove functions.
> > > > > > > Is "close" a good time to ask confirmation to the application?
> > > > > > > Or should we ask confirmation a step before, on "stop"?
> > > > > >
> > > > > > I think the confirmation should before any cleanup stage, it
> > > > > > should at the
> > > > > beginning of driver->remove.
> > > > >
> > > > > So you stop a port, even if the app policy is against detaching it?
> > > >
> > > > My understanding is, stop and detach is different, we may stop a
> > > > device and
> > > reconfigure it then restart it.
> > > > but for detach, properly we will not use it, unless it be probed again.
> > > > For dev_close , it should be called after dev_stop.
> > > > so we have to like below.
> > > >
> > > > If (dev->started) {
> > > > 	dev_stop /* but still problem here, if traffic is ongoing */
> > > > 	if (dev_close()) {
> > > > 		dev_start()
> > > > 		return -EBUSY.
> > > > 	}
> > > > } else {
> > > > 	If (dev_close())
> > > > 		Return _EBUSY
> > > > }
> > > >
> > > > So for me, neither rte_eth_dev_stop and rte_eth_dev_close is the
> > > > right place
> > > to check this.
> > > > But rte_eth_dev_destroy looks like a good one. We can put all the
> > > > ethdev general logic into it, and PMD specific dev_unit will be
> > > > called at last
> > >
> > > If you want to detach a port, you need to stop it.
> > > If one process try to detach a port, but another process decides
> > > (via callback) that the port should not be detached, you will have
> > > stopped a port for no good reason.
> > > To me it is a real design issue.
> >
> > Yes, so I think we still need two iterates for detach.
> > First iterate to get agreement on all processes.
> > Secondary iterate to do the detach.
> >
> > But how?
> 
> An option is to let the application manages itself its process synchronization
> and authorization for detaching devices.

Yes, I agree
Before we figure out a comprehensive solution, leave this to application's handle so far.
I will remove this one from patchset.
>
  

Patch

diff --git a/lib/librte_ethdev/Makefile b/lib/librte_ethdev/Makefile
index d0a059b83..62bef03fc 100644
--- a/lib/librte_ethdev/Makefile
+++ b/lib/librte_ethdev/Makefile
@@ -20,6 +20,7 @@  LIBABIVER := 9
 
 SRCS-y += rte_ethdev.c
 SRCS-y += ethdev_mp.c
+SRCS-y += ethdev_lock.c
 SRCS-y += rte_flow.c
 SRCS-y += rte_tm.c
 SRCS-y += rte_mtr.c
diff --git a/lib/librte_ethdev/ethdev_lock.c b/lib/librte_ethdev/ethdev_lock.c
new file mode 100644
index 000000000..6379519e3
--- /dev/null
+++ b/lib/librte_ethdev/ethdev_lock.c
@@ -0,0 +1,140 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+#include "ethdev_lock.h"
+
+struct lock_entry {
+	TAILQ_ENTRY(lock_entry) next;
+	rte_eth_dev_lock_callback_t callback;
+	uint16_t port_id;
+	void *user_args;
+	int ref_count;
+};
+
+TAILQ_HEAD(lock_entry_list, lock_entry);
+static struct lock_entry_list lock_entry_list =
+	TAILQ_HEAD_INITIALIZER(lock_entry_list);
+static rte_spinlock_t lock_entry_lock = RTE_SPINLOCK_INITIALIZER;
+
+int
+register_lock_callback(uint16_t port_id,
+			rte_eth_dev_lock_callback_t callback,
+			void *user_args)
+{
+	struct lock_entry *le;
+
+	rte_spinlock_lock(&lock_entry_lock);
+
+	TAILQ_FOREACH(le, &lock_entry_list, next) {
+		if (le->port_id == port_id &&
+		    le->callback == callback &&
+		    le->user_args == user_args)
+			break;
+	}
+
+	if (le == NULL) {
+		le = calloc(1, sizeof(struct lock_entry));
+		if (le == NULL) {
+			rte_spinlock_unlock(&lock_entry_lock);
+			return -ENOMEM;
+		}
+		le->callback = callback;
+		le->port_id = port_id;
+		le->user_args = user_args;
+		TAILQ_INSERT_TAIL(&lock_entry_list, le, next);
+	}
+	le->ref_count++;
+
+	rte_spinlock_unlock(&lock_entry_lock);
+	return 0;
+}
+
+int
+unregister_lock_callback(uint16_t port_id,
+			rte_eth_dev_lock_callback_t callback,
+			void *user_args)
+{
+	struct lock_entry *le;
+	int ret = 0;
+
+	rte_spinlock_lock(&lock_entry_lock);
+
+	TAILQ_FOREACH(le, &lock_entry_list, next) {
+		if (le->port_id == port_id &&
+		    le->callback == callback &&
+		    le->user_args == user_args)
+			break;
+	}
+
+	if (le != NULL) {
+		le->ref_count--;
+		if (le->ref_count == 0) {
+			TAILQ_REMOVE(&lock_entry_list, le, next);
+			free(le);
+		}
+	} else {
+		ret = -ENOENT;
+	}
+
+	rte_spinlock_unlock(&lock_entry_lock);
+	return ret;
+}
+
+static int clean_lock_callback_one(uint16_t port_id)
+{
+	struct lock_entry *le;
+	int ret = 0;
+
+	TAILQ_FOREACH(le, &lock_entry_list, next) {
+		if (le->port_id == port_id)
+			break;
+	}
+
+	if (le != NULL) {
+		le->ref_count--;
+		if (le->ref_count == 0) {
+			TAILQ_REMOVE(&lock_entry_list, le, next);
+			free(le);
+		}
+	} else {
+		ret = -ENOENT;
+	}
+
+	return ret;
+
+}
+
+void clean_lock_callback(uint16_t port_id)
+{
+	int ret;
+
+	rte_spinlock_lock(&lock_entry_lock);
+
+	for (;;) {
+		ret = clean_lock_callback_one(port_id);
+		if (ret == -ENOENT)
+			break;
+	}
+
+	rte_spinlock_unlock(&lock_entry_lock);
+}
+
+int process_lock_callbacks(uint16_t port_id)
+{
+	struct lock_entry *le;
+
+	rte_spinlock_lock(&lock_entry_lock);
+
+	TAILQ_FOREACH(le, &lock_entry_list, next) {
+		if (le->port_id != port_id)
+			continue;
+
+		if (le->callback(port_id, le->user_args)) {
+			rte_spinlock_unlock(&lock_entry_lock);
+			return -EBUSY;
+		}
+	}
+
+	rte_spinlock_unlock(&lock_entry_lock);
+	return 0;
+}
diff --git a/lib/librte_ethdev/ethdev_lock.h b/lib/librte_ethdev/ethdev_lock.h
new file mode 100644
index 000000000..82132eb0c
--- /dev/null
+++ b/lib/librte_ethdev/ethdev_lock.h
@@ -0,0 +1,31 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_ETHDEV_LOCK_H_
+#define _RTE_ETHDEV_LOCK_H_
+
+#include "rte_ethdev.h"
+
+/* Register lock callback function on specific port */
+int
+register_lock_callback(uint16_t port_id,
+			rte_eth_dev_lock_callback_t callback,
+			void *user_args);
+
+/* Unregister lock callback function on specific port */
+int
+unregister_lock_callback(uint16_t port_id,
+			rte_eth_dev_lock_callback_t callback,
+			void *user_args);
+
+/**
+ * Unregister all callback function on specific port.
+ * This will be called when a device is detached.
+ */
+void clean_lock_callback(uint16_t port_id);
+
+/* Run each callback one by one. */
+int process_lock_callbacks(uint16_t port_id);
+
+#endif
diff --git a/lib/librte_ethdev/ethdev_mp.c b/lib/librte_ethdev/ethdev_mp.c
index 0f9d8990d..1d148cd5e 100644
--- a/lib/librte_ethdev/ethdev_mp.c
+++ b/lib/librte_ethdev/ethdev_mp.c
@@ -6,6 +6,7 @@ 
 
 #include "rte_ethdev_driver.h"
 #include "ethdev_mp.h"
+#include "ethdev_lock.h"
 
 #define MP_TIMEOUT_S 5 /**< 5 seconds timeouts */
 
@@ -108,7 +109,7 @@  static void __handle_primary_request(void *param)
 		ret = attach_on_secondary(req->devargs, req->port_id);
 		break;
 	case REQ_TYPE_PRE_DETACH:
-		ret = 0;
+		ret = process_lock_callbacks(req->port_id);
 		break;
 	case REQ_TYPE_DETACH:
 	case REQ_TYPE_ATTACH_ROLLBACK:
diff --git a/lib/librte_ethdev/meson.build b/lib/librte_ethdev/meson.build
index b60256855..9bb0aec7f 100644
--- a/lib/librte_ethdev/meson.build
+++ b/lib/librte_ethdev/meson.build
@@ -6,6 +6,7 @@  version = 9
 allow_experimental_apis = true
 sources = files('ethdev_profile.c',
 	'ethdev_mp.c'
+	'ethdev_lock.c'
 	'rte_ethdev.c',
 	'rte_flow.c',
 	'rte_mtr.c',
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 91d9f9a37..7d89d9f95 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -43,6 +43,7 @@ 
 #include "ethdev_profile.h"
 #include "ethdev_mp.h"
 #include "ethdev_private.h"
+#include "ethdev_lock.h"
 
 int ethdev_logtype;
 
@@ -735,6 +736,7 @@  do_eth_dev_detach(uint16_t port_id)
 	if (ret < 0)
 		return ret;
 
+	clean_lock_callback(port_id);
 	if (solid_release)
 		return rte_eth_dev_release_port(&rte_eth_devices[port_id]);
 	else
@@ -803,7 +805,6 @@  rte_eth_dev_attach(const char *devargs, uint16_t *port_id)
 int
 rte_eth_dev_attach_private(const char *devargs, uint16_t *port_id)
 {
-
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
 		return -ENOTSUP;
 
@@ -845,6 +846,10 @@  rte_eth_dev_detach(uint16_t port_id, char *name __rte_unused)
 		return req.result;
 	}
 
+	ret = process_lock_callbacks(port_id);
+	if (ret)
+		return ret;
+
 	/* check pre_detach */
 	req.t = REQ_TYPE_PRE_DETACH;
 	req.port_id = port_id;
@@ -891,6 +896,7 @@  int
 rte_eth_dev_detach_private(uint16_t port_id, char *name __rte_unused)
 {
 	uint32_t dev_flags;
+	int ret;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
 		return -ENOTSUP;
@@ -904,6 +910,10 @@  rte_eth_dev_detach_private(uint16_t port_id, char *name __rte_unused)
 		return -ENOTSUP;
 	}
 
+	ret = process_lock_callbacks(port_id);
+	if (ret)
+		return ret;
+
 	return do_eth_dev_detach(port_id);
 }
 
@@ -4706,6 +4716,54 @@  rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
 	return result;
 }
 
+static int
+dev_is_busy(uint16_t port_id __rte_unused, void *user_args __rte_unused)
+{
+	return -EBUSY;
+}
+
+int
+rte_eth_dev_lock(uint16_t port_id)
+{
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	return register_lock_callback(port_id, dev_is_busy, NULL);
+}
+
+int
+rte_eth_dev_lock_with_callback(uint16_t port_id,
+				rte_eth_dev_lock_callback_t callback,
+				void *user_args)
+{
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	if (callback == NULL)
+		return -EINVAL;
+
+	return register_lock_callback(port_id, callback, user_args);
+}
+
+int
+rte_eth_dev_unlock(uint16_t port_id)
+{
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	return unregister_lock_callback(port_id, dev_is_busy, NULL);
+}
+
+int
+rte_eth_dev_unlock_with_callback(uint16_t port_id,
+				rte_eth_dev_lock_callback_t callback,
+				void *user_args)
+{
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	if (callback == NULL)
+		return -EINVAL;
+
+	return unregister_lock_callback(port_id, callback, user_args);
+}
+
 RTE_INIT(ethdev_init_log);
 static void
 ethdev_init_log(void)
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 4d207e2fe..b2c2de600 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -4364,6 +4364,130 @@  rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * Callback function before device is detached.
+ *
+ * This type of function will be added into a function list, and will be
+ * invoked before device be detached. Application can register a callback
+ * function so it can be notified and do some cleanup before detach happen.
+ * Also, any callback function return !0 value will prevent device be
+ * detached (ref. rte_eth_dev_lock_with_callback and
+ * rte_eth_dev_unlock_with_callback).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param user_args
+ *   This is parameter "user_args" be saved when callback function is
+ *   registered(rte_dev_eth_lock).
+ *
+ * @return
+ *   0  device is allowed be detached.
+ *   !0 device is not allowed be detached.
+ */
+typedef int (*rte_eth_dev_lock_callback_t)(uint16_t port_id, void *user_args);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Lock an Ethernet Device, this help application to prevent a device
+ * be detached unexpectedly.
+ *
+ * @note
+ *   In multi-process situation, any process lock a share device will
+ *   prevent it be detached from all process. Also this is per-process
+ *   lock, which means unlock a device from process A take no effect
+ *   if the device is locked from process B.
+ *
+ * @note
+ *   Lock a device multiple times will increase a ref_count, and
+ *   corresponding unlock decrease the ref_count, the device will be
+ *   unlocked when ref_count reach 0.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *
+ * @return
+ *   0 on success, negative on error.
+ */
+int __rte_experimental rte_eth_dev_lock(uint16_t port_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Lock an Ethernet device base on a callback function which can performs
+ * condition check at the moment before device be detached. if the
+ * condition check not pass, the device will not be detached, else,
+ * continue to detach or not rely on return value of other callbacks
+ * on the same port.
+ *
+ * @note
+ *   Same as rte_eth_dev_lock, it is per-process lock.
+ *
+ * @note
+ *   Lock a device with different callback or user_args will add different
+ *   lock entries (<callback, user_args> pair) in a list. Lock a device
+ *   multiple times with same callback and args will only increase a
+ *   ref_count of specific lock entry, and corresponding unlock decrease
+ *   the ref_count, an entry will be removed if its ref_count reach 0.
+ *
+ * @note
+ *   All callbacks be attached to specific port will be removed
+ *   automatically if the device is detached.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param callback
+ *   the callback function will be added into a pre-detach list,
+ *   it will be invoked when a device is going to be detached. The
+ *   return value will decide if continue detach the device or not.
+ * @param user_args
+ *   parameter will be parsed to callback function.
+ *
+ * @return
+ *   0 on success, negative on error.
+ */
+int __rte_experimental
+rte_eth_dev_lock_with_callback(uint16_t port_id,
+				rte_eth_dev_lock_callback_t callback,
+				void *user_args);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Reverse operation of rte_eth_dev_lock.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *
+ * @return
+ *   0 on success, negative on error.
+ */
+int __rte_experimental rte_eth_dev_unlock(uint16_t port_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Reverse operation of rte_eth_dev_lock_with_callback.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param callback
+ *   parameter to match a lock entry.
+ * @param user_args
+ *   parameter to match a lock entry.
+ *
+ * @return
+ *   0 on success, negative on error.
+ */
+int __rte_experimental
+rte_eth_dev_unlock_with_callback(uint16_t port_id,
+				rte_eth_dev_lock_callback_t callback,
+				void *user_args);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index fbcc03925..1e270a387 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -225,6 +225,7 @@  EXPERIMENTAL {
 	rte_eth_dev_get_module_eeprom;
 	rte_eth_dev_get_module_info;
 	rte_eth_dev_is_removed;
+	rte_eth_dev_lock;
 	rte_eth_dev_owner_delete;
 	rte_eth_dev_owner_get;
 	rte_eth_dev_owner_new;
@@ -232,6 +233,7 @@  EXPERIMENTAL {
 	rte_eth_dev_owner_unset;
 	rte_eth_dev_rx_offload_name;
 	rte_eth_dev_tx_offload_name;
+	rte_eth_dev_unlock;
 	rte_eth_switch_domain_alloc;
 	rte_eth_switch_domain_free;
 	rte_mtr_capabilities_get;