[v4,06/24] ethdev: enable hotplug on multi-process

Message ID 20180626070832.3055-7-qi.z.zhang@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series enable hotplug on multi-process |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Qi Zhang June 26, 2018, 7:08 a.m. UTC
  We are going to introduce the solution to handle different hotplug
cases in multi-process situation, it include below scenario:

1. Attach a share device from primary
2. Detach a share device from primary
3. Attach a share device from secondary
4. Detach a share device from secondary
5. Attach a private device from secondary
6. Detach a private device from secondary
7. Detach a share device from secondary privately
8. Attach a share device from secondary privately

In primary-secondary process model, we assume device is shared by default.
that means attach or detach a device on any process will broadcast to
all other processes through mp channel then device information will be
synchronized on all processes.

Any failure during attaching process will cause inconsistent status
between processes, so proper rollback action should be considered.
Also it is not safe to detach a share device when other process still use
it, so a handshake mechanism is introduced.

This patch covers the implementation of case 1,2,5,6,7,8.
Case 3,4 will be implemented on separate patch as well as handshake
mechanism.

Scenario for Case 1, 2:

attach device
a) primary attach the new device if failed goto h).
b) primary send attach sync request to all secondary.
c) secondary receive request and attach device and send reply.
d) primary check the reply if all success go to i).
e) primary send attach rollback sync request to all secondary.
f) secondary receive the request and detach device and send reply.
g) primary receive the reply and detach device as rollback action.
h) attach fail
i) attach success

detach device
a) primary perform pre-detach check, if device is locked, goto i).
b) primary send pre-detach sync request to all secondary.
c) secondary perform pre-detach check and send reply.
d) primary check the reply if any fail goto i).
e) primary send detach sync request to all secondary
f) secondary detach the device and send reply (assume no fail)
g) primary detach the device.
h) detach success
i) detach failed

Case 5, 6:
Secondary process can attach private device which only visible to itself,
in this case no IPC is involved, primary process is not allowed to have
private device so far.

Case 7, 8:
Secondary process can also temporally to detach a share device "privately"
then attach it back later, this action also not impact other processes.

APIs changes:

rte_eth_dev_attach and rte_eth_dev_attach are extended to support
share device attach/detach in primary-secondary process model, it will
be called in case 1,2,3,4.

New API rte_eth_dev_attach_private and rte_eth_dev_detach_private are
introduced to cover case 5,6,7,8, this API can only be invoked in secondary
process.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 lib/librte_ethdev/Makefile          |   1 +
 lib/librte_ethdev/ethdev_mp.c       | 248 ++++++++++++++++++++++++++++++++++++
 lib/librte_ethdev/ethdev_mp.h       |  41 ++++++
 lib/librte_ethdev/ethdev_private.h  |  39 ++++++
 lib/librte_ethdev/meson.build       |   1 +
 lib/librte_ethdev/rte_ethdev.c      | 190 ++++++++++++++++++++++++---
 lib/librte_ethdev/rte_ethdev.h      |  45 +++++++
 lib/librte_ethdev/rte_ethdev_core.h |   5 +
 8 files changed, 553 insertions(+), 17 deletions(-)
 create mode 100644 lib/librte_ethdev/ethdev_mp.c
 create mode 100644 lib/librte_ethdev/ethdev_mp.h
 create mode 100644 lib/librte_ethdev/ethdev_private.h
  

Comments

Burakov, Anatoly June 26, 2018, 12:09 p.m. UTC | #1
On 26-Jun-18 8:08 AM, Qi Zhang wrote:
> We are going to introduce the solution to handle different hotplug
> cases in multi-process situation, it include below scenario:
> 
> 1. Attach a share device from primary
> 2. Detach a share device from primary
> 3. Attach a share device from secondary
> 4. Detach a share device from secondary
> 5. Attach a private device from secondary
> 6. Detach a private device from secondary
> 7. Detach a share device from secondary privately
> 8. Attach a share device from secondary privately
> 
> In primary-secondary process model, we assume device is shared by default.
> that means attach or detach a device on any process will broadcast to
> all other processes through mp channel then device information will be
> synchronized on all processes.
> 
> Any failure during attaching process will cause inconsistent status
> between processes, so proper rollback action should be considered.
> Also it is not safe to detach a share device when other process still use
> it, so a handshake mechanism is introduced.
> 
> This patch covers the implementation of case 1,2,5,6,7,8.
> Case 3,4 will be implemented on separate patch as well as handshake
> mechanism.
> 
> Scenario for Case 1, 2:
> 
> attach device
> a) primary attach the new device if failed goto h).
> b) primary send attach sync request to all secondary.
> c) secondary receive request and attach device and send reply.
> d) primary check the reply if all success go to i).
> e) primary send attach rollback sync request to all secondary.
> f) secondary receive the request and detach device and send reply.
> g) primary receive the reply and detach device as rollback action.
> h) attach fail
> i) attach success
> 
> detach device
> a) primary perform pre-detach check, if device is locked, goto i).
> b) primary send pre-detach sync request to all secondary.
> c) secondary perform pre-detach check and send reply.
> d) primary check the reply if any fail goto i).
> e) primary send detach sync request to all secondary
> f) secondary detach the device and send reply (assume no fail)
> g) primary detach the device.
> h) detach success
> i) detach failed
> 
> Case 5, 6:
> Secondary process can attach private device which only visible to itself,
> in this case no IPC is involved, primary process is not allowed to have
> private device so far.
> 
> Case 7, 8:
> Secondary process can also temporally to detach a share device "privately"
> then attach it back later, this action also not impact other processes.
> 
> APIs changes:
> 
> rte_eth_dev_attach and rte_eth_dev_attach are extended to support
> share device attach/detach in primary-secondary process model, it will
> be called in case 1,2,3,4.
> 
> New API rte_eth_dev_attach_private and rte_eth_dev_detach_private are
> introduced to cover case 5,6,7,8, this API can only be invoked in secondary
> process.
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> ---

<snip>

> +static int
> +handle_primary_request(const struct rte_mp_msg *msg, const void *peer)
> +{
> +
> +	struct rte_mp_msg mp_resp;
> +	const struct eth_dev_mp_req *req =
> +		(const struct eth_dev_mp_req *)msg->param;
> +	struct eth_dev_mp_req *resp =
> +		(struct eth_dev_mp_req *)mp_resp.param;
> +	struct mp_reply_bundle *bundle;
> +	int ret = 0;
> +
> +	memset(&mp_resp, 0, sizeof(mp_resp));
> +	strlcpy(mp_resp.name, ETH_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
> +	mp_resp.len_param = sizeof(*req);
> +	memcpy(resp, req, sizeof(*resp));
> +
> +	bundle = calloc(1, sizeof(*bundle));
> +	if (bundle == NULL) {
> +		resp->result = -ENOMEM;
> +		ret = rte_mp_reply(&mp_resp, peer);
> +		if (ret) {
> +			ethdev_log(ERR, "failed to send reply to primary request\n");
> +			return ret;
> +		}
> +	}
> +
> +	bundle->msg = *msg;
> +	bundle->peer = peer;
> +
> +	ret = rte_eal_mp_task_add(__handle_primary_request, bundle);
> +	if (ret) {
> +		resp->result = ret;
> +		ret = rte_mp_reply(&mp_resp, peer);
> +		if (ret) {
> +			ethdev_log(ERR, "failed to send reply to primary request\n");
> +			return ret;
> +		}
> +	}

What you're doing here is quite dangerous. The parameter "const void 
*peer" is only guaranteed to be valid at the time of the callback - not 
necessarily afterwards. So, if you're handing off sending replies to a 
separate thread, things might blow up because the pointer may no longer 
be valid.
  
Qi Zhang June 26, 2018, 12:19 p.m. UTC | #2
> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, June 26, 2018 8:09 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Shelton, Benjamin H
> <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
> 
> On 26-Jun-18 8:08 AM, Qi Zhang wrote:
> > We are going to introduce the solution to handle different hotplug
> > cases in multi-process situation, it include below scenario:
> >
> > 1. Attach a share device from primary
> > 2. Detach a share device from primary
> > 3. Attach a share device from secondary 4. Detach a share device from
> > secondary 5. Attach a private device from secondary 6. Detach a
> > private device from secondary 7. Detach a share device from secondary
> > privately 8. Attach a share device from secondary privately
> >
> > In primary-secondary process model, we assume device is shared by default.
> > that means attach or detach a device on any process will broadcast to
> > all other processes through mp channel then device information will be
> > synchronized on all processes.
> >
> > Any failure during attaching process will cause inconsistent status
> > between processes, so proper rollback action should be considered.
> > Also it is not safe to detach a share device when other process still
> > use it, so a handshake mechanism is introduced.
> >
> > This patch covers the implementation of case 1,2,5,6,7,8.
> > Case 3,4 will be implemented on separate patch as well as handshake
> > mechanism.
> >
> > Scenario for Case 1, 2:
> >
> > attach device
> > a) primary attach the new device if failed goto h).
> > b) primary send attach sync request to all secondary.
> > c) secondary receive request and attach device and send reply.
> > d) primary check the reply if all success go to i).
> > e) primary send attach rollback sync request to all secondary.
> > f) secondary receive the request and detach device and send reply.
> > g) primary receive the reply and detach device as rollback action.
> > h) attach fail
> > i) attach success
> >
> > detach device
> > a) primary perform pre-detach check, if device is locked, goto i).
> > b) primary send pre-detach sync request to all secondary.
> > c) secondary perform pre-detach check and send reply.
> > d) primary check the reply if any fail goto i).
> > e) primary send detach sync request to all secondary
> > f) secondary detach the device and send reply (assume no fail)
> > g) primary detach the device.
> > h) detach success
> > i) detach failed
> >
> > Case 5, 6:
> > Secondary process can attach private device which only visible to
> > itself, in this case no IPC is involved, primary process is not
> > allowed to have private device so far.
> >
> > Case 7, 8:
> > Secondary process can also temporally to detach a share device "privately"
> > then attach it back later, this action also not impact other processes.
> >
> > APIs changes:
> >
> > rte_eth_dev_attach and rte_eth_dev_attach are extended to support
> > share device attach/detach in primary-secondary process model, it will
> > be called in case 1,2,3,4.
> >
> > New API rte_eth_dev_attach_private and rte_eth_dev_detach_private are
> > introduced to cover case 5,6,7,8, this API can only be invoked in
> > secondary process.
> >
> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > ---
> 
> <snip>
> 
> > +static int
> > +handle_primary_request(const struct rte_mp_msg *msg, const void
> > +*peer) {
> > +
> > +	struct rte_mp_msg mp_resp;
> > +	const struct eth_dev_mp_req *req =
> > +		(const struct eth_dev_mp_req *)msg->param;
> > +	struct eth_dev_mp_req *resp =
> > +		(struct eth_dev_mp_req *)mp_resp.param;
> > +	struct mp_reply_bundle *bundle;
> > +	int ret = 0;
> > +
> > +	memset(&mp_resp, 0, sizeof(mp_resp));
> > +	strlcpy(mp_resp.name, ETH_DEV_MP_ACTION_REQUEST,
> sizeof(mp_resp.name));
> > +	mp_resp.len_param = sizeof(*req);
> > +	memcpy(resp, req, sizeof(*resp));
> > +
> > +	bundle = calloc(1, sizeof(*bundle));
> > +	if (bundle == NULL) {
> > +		resp->result = -ENOMEM;
> > +		ret = rte_mp_reply(&mp_resp, peer);
> > +		if (ret) {
> > +			ethdev_log(ERR, "failed to send reply to primary request\n");
> > +			return ret;
> > +		}
> > +	}
> > +
> > +	bundle->msg = *msg;
> > +	bundle->peer = peer;
> > +
> > +	ret = rte_eal_mp_task_add(__handle_primary_request, bundle);
> > +	if (ret) {
> > +		resp->result = ret;
> > +		ret = rte_mp_reply(&mp_resp, peer);
> > +		if (ret) {
> > +			ethdev_log(ERR, "failed to send reply to primary request\n");
> > +			return ret;
> > +		}
> > +	}
> 
> What you're doing here is quite dangerous. The parameter "const void *peer"
> is only guaranteed to be valid at the time of the callback - not necessarily
> afterwards. So, if you're handing off sending replies to a separate thread,
> things might blow up because the pointer may no longer be valid.

OK, so what about clone the content a buffer, I think the content should be valid before reply is sent, right?

Thanks
Qi
> 
> --
> Thanks,
> Anatoly
  
Burakov, Anatoly June 26, 2018, 12:49 p.m. UTC | #3
On 26-Jun-18 1:19 PM, Zhang, Qi Z wrote:
> 
> 
>> -----Original Message-----
>> From: Burakov, Anatoly
>> Sent: Tuesday, June 26, 2018 8:09 PM
>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
>> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
>> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
>> <ferruh.yigit@intel.com>; Shelton, Benjamin H
>> <benjamin.h.shelton@intel.com>; Vangati, Narender
>> <narender.vangati@intel.com>
>> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
>>
>> On 26-Jun-18 8:08 AM, Qi Zhang wrote:
>>> We are going to introduce the solution to handle different hotplug
>>> cases in multi-process situation, it include below scenario:
>>>
>>> 1. Attach a share device from primary
>>> 2. Detach a share device from primary
>>> 3. Attach a share device from secondary 4. Detach a share device from
>>> secondary 5. Attach a private device from secondary 6. Detach a
>>> private device from secondary 7. Detach a share device from secondary
>>> privately 8. Attach a share device from secondary privately
>>>
>>> In primary-secondary process model, we assume device is shared by default.
>>> that means attach or detach a device on any process will broadcast to
>>> all other processes through mp channel then device information will be
>>> synchronized on all processes.
>>>
>>> Any failure during attaching process will cause inconsistent status
>>> between processes, so proper rollback action should be considered.
>>> Also it is not safe to detach a share device when other process still
>>> use it, so a handshake mechanism is introduced.
>>>
>>> This patch covers the implementation of case 1,2,5,6,7,8.
>>> Case 3,4 will be implemented on separate patch as well as handshake
>>> mechanism.
>>>
>>> Scenario for Case 1, 2:
>>>
>>> attach device
>>> a) primary attach the new device if failed goto h).
>>> b) primary send attach sync request to all secondary.
>>> c) secondary receive request and attach device and send reply.
>>> d) primary check the reply if all success go to i).
>>> e) primary send attach rollback sync request to all secondary.
>>> f) secondary receive the request and detach device and send reply.
>>> g) primary receive the reply and detach device as rollback action.
>>> h) attach fail
>>> i) attach success
>>>
>>> detach device
>>> a) primary perform pre-detach check, if device is locked, goto i).
>>> b) primary send pre-detach sync request to all secondary.
>>> c) secondary perform pre-detach check and send reply.
>>> d) primary check the reply if any fail goto i).
>>> e) primary send detach sync request to all secondary
>>> f) secondary detach the device and send reply (assume no fail)
>>> g) primary detach the device.
>>> h) detach success
>>> i) detach failed
>>>
>>> Case 5, 6:
>>> Secondary process can attach private device which only visible to
>>> itself, in this case no IPC is involved, primary process is not
>>> allowed to have private device so far.
>>>
>>> Case 7, 8:
>>> Secondary process can also temporally to detach a share device "privately"
>>> then attach it back later, this action also not impact other processes.
>>>
>>> APIs changes:
>>>
>>> rte_eth_dev_attach and rte_eth_dev_attach are extended to support
>>> share device attach/detach in primary-secondary process model, it will
>>> be called in case 1,2,3,4.
>>>
>>> New API rte_eth_dev_attach_private and rte_eth_dev_detach_private are
>>> introduced to cover case 5,6,7,8, this API can only be invoked in
>>> secondary process.
>>>
>>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>>> ---
>>
>> <snip>
>>
>>> +static int
>>> +handle_primary_request(const struct rte_mp_msg *msg, const void
>>> +*peer) {
>>> +
>>> +	struct rte_mp_msg mp_resp;
>>> +	const struct eth_dev_mp_req *req =
>>> +		(const struct eth_dev_mp_req *)msg->param;
>>> +	struct eth_dev_mp_req *resp =
>>> +		(struct eth_dev_mp_req *)mp_resp.param;
>>> +	struct mp_reply_bundle *bundle;
>>> +	int ret = 0;
>>> +
>>> +	memset(&mp_resp, 0, sizeof(mp_resp));
>>> +	strlcpy(mp_resp.name, ETH_DEV_MP_ACTION_REQUEST,
>> sizeof(mp_resp.name));
>>> +	mp_resp.len_param = sizeof(*req);
>>> +	memcpy(resp, req, sizeof(*resp));
>>> +
>>> +	bundle = calloc(1, sizeof(*bundle));
>>> +	if (bundle == NULL) {
>>> +		resp->result = -ENOMEM;
>>> +		ret = rte_mp_reply(&mp_resp, peer);
>>> +		if (ret) {
>>> +			ethdev_log(ERR, "failed to send reply to primary request\n");
>>> +			return ret;
>>> +		}
>>> +	}
>>> +
>>> +	bundle->msg = *msg;
>>> +	bundle->peer = peer;
>>> +
>>> +	ret = rte_eal_mp_task_add(__handle_primary_request, bundle);
>>> +	if (ret) {
>>> +		resp->result = ret;
>>> +		ret = rte_mp_reply(&mp_resp, peer);
>>> +		if (ret) {
>>> +			ethdev_log(ERR, "failed to send reply to primary request\n");
>>> +			return ret;
>>> +		}
>>> +	}
>>
>> What you're doing here is quite dangerous. The parameter "const void *peer"
>> is only guaranteed to be valid at the time of the callback - not necessarily
>> afterwards. So, if you're handing off sending replies to a separate thread,
>> things might blow up because the pointer may no longer be valid.
> 
> OK, so what about clone the content a buffer, I think the content should be valid before reply is sent, right?

Yes, but even if you clone the content of the buffer, where would you 
send it *to*? You'll need the peer parameter to know where to send your 
response.

> 
> Thanks
> Qi
>>
>> --
>> Thanks,
>> Anatoly
  
Qi Zhang June 26, 2018, 12:58 p.m. UTC | #4
> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, June 26, 2018 8:50 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Shelton, Benjamin H
> <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
> 
> On 26-Jun-18 1:19 PM, Zhang, Qi Z wrote:
> >
> >
> >> -----Original Message-----
> >> From: Burakov, Anatoly
> >> Sent: Tuesday, June 26, 2018 8:09 PM
> >> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> >> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> >> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
> >> <ferruh.yigit@intel.com>; Shelton, Benjamin H
> >> <benjamin.h.shelton@intel.com>; Vangati, Narender
> >> <narender.vangati@intel.com>
> >> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
> >>
> >> On 26-Jun-18 8:08 AM, Qi Zhang wrote:
> >>> We are going to introduce the solution to handle different hotplug
> >>> cases in multi-process situation, it include below scenario:
> >>>
> >>> 1. Attach a share device from primary 2. Detach a share device from
> >>> primary 3. Attach a share device from secondary 4. Detach a share
> >>> device from secondary 5. Attach a private device from secondary 6.
> >>> Detach a private device from secondary 7. Detach a share device from
> >>> secondary privately 8. Attach a share device from secondary
> >>> privately
> >>>
> >>> In primary-secondary process model, we assume device is shared by
> default.
> >>> that means attach or detach a device on any process will broadcast
> >>> to all other processes through mp channel then device information
> >>> will be synchronized on all processes.
> >>>
> >>> Any failure during attaching process will cause inconsistent status
> >>> between processes, so proper rollback action should be considered.
> >>> Also it is not safe to detach a share device when other process
> >>> still use it, so a handshake mechanism is introduced.
> >>>
> >>> This patch covers the implementation of case 1,2,5,6,7,8.
> >>> Case 3,4 will be implemented on separate patch as well as handshake
> >>> mechanism.
> >>>
> >>> Scenario for Case 1, 2:
> >>>
> >>> attach device
> >>> a) primary attach the new device if failed goto h).
> >>> b) primary send attach sync request to all secondary.
> >>> c) secondary receive request and attach device and send reply.
> >>> d) primary check the reply if all success go to i).
> >>> e) primary send attach rollback sync request to all secondary.
> >>> f) secondary receive the request and detach device and send reply.
> >>> g) primary receive the reply and detach device as rollback action.
> >>> h) attach fail
> >>> i) attach success
> >>>
> >>> detach device
> >>> a) primary perform pre-detach check, if device is locked, goto i).
> >>> b) primary send pre-detach sync request to all secondary.
> >>> c) secondary perform pre-detach check and send reply.
> >>> d) primary check the reply if any fail goto i).
> >>> e) primary send detach sync request to all secondary
> >>> f) secondary detach the device and send reply (assume no fail)
> >>> g) primary detach the device.
> >>> h) detach success
> >>> i) detach failed
> >>>
> >>> Case 5, 6:
> >>> Secondary process can attach private device which only visible to
> >>> itself, in this case no IPC is involved, primary process is not
> >>> allowed to have private device so far.
> >>>
> >>> Case 7, 8:
> >>> Secondary process can also temporally to detach a share device "privately"
> >>> then attach it back later, this action also not impact other processes.
> >>>
> >>> APIs changes:
> >>>
> >>> rte_eth_dev_attach and rte_eth_dev_attach are extended to support
> >>> share device attach/detach in primary-secondary process model, it
> >>> will be called in case 1,2,3,4.
> >>>
> >>> New API rte_eth_dev_attach_private and rte_eth_dev_detach_private
> >>> are introduced to cover case 5,6,7,8, this API can only be invoked
> >>> in secondary process.
> >>>
> >>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> >>> ---
> >>
> >> <snip>
> >>
> >>> +static int
> >>> +handle_primary_request(const struct rte_mp_msg *msg, const void
> >>> +*peer) {
> >>> +
> >>> +	struct rte_mp_msg mp_resp;
> >>> +	const struct eth_dev_mp_req *req =
> >>> +		(const struct eth_dev_mp_req *)msg->param;
> >>> +	struct eth_dev_mp_req *resp =
> >>> +		(struct eth_dev_mp_req *)mp_resp.param;
> >>> +	struct mp_reply_bundle *bundle;
> >>> +	int ret = 0;
> >>> +
> >>> +	memset(&mp_resp, 0, sizeof(mp_resp));
> >>> +	strlcpy(mp_resp.name, ETH_DEV_MP_ACTION_REQUEST,
> >> sizeof(mp_resp.name));
> >>> +	mp_resp.len_param = sizeof(*req);
> >>> +	memcpy(resp, req, sizeof(*resp));
> >>> +
> >>> +	bundle = calloc(1, sizeof(*bundle));
> >>> +	if (bundle == NULL) {
> >>> +		resp->result = -ENOMEM;
> >>> +		ret = rte_mp_reply(&mp_resp, peer);
> >>> +		if (ret) {
> >>> +			ethdev_log(ERR, "failed to send reply to primary
> request\n");
> >>> +			return ret;
> >>> +		}
> >>> +	}
> >>> +
> >>> +	bundle->msg = *msg;
> >>> +	bundle->peer = peer;
> >>> +
> >>> +	ret = rte_eal_mp_task_add(__handle_primary_request, bundle);
> >>> +	if (ret) {
> >>> +		resp->result = ret;
> >>> +		ret = rte_mp_reply(&mp_resp, peer);
> >>> +		if (ret) {
> >>> +			ethdev_log(ERR, "failed to send reply to primary
> request\n");
> >>> +			return ret;
> >>> +		}
> >>> +	}
> >>
> >> What you're doing here is quite dangerous. The parameter "const void
> *peer"
> >> is only guaranteed to be valid at the time of the callback - not
> >> necessarily afterwards. So, if you're handing off sending replies to
> >> a separate thread, things might blow up because the pointer may no longer
> be valid.
> >
> > OK, so what about clone the content a buffer, I think the content should be
> valid before reply is sent, right?
> 
> Yes, but even if you clone the content of the buffer, where would you send it
> *to*? You'll need the peer parameter to know where to send your response.

my understand is peer is identified by a string (or filename)
what I mean is clone the content of the buffer that peer point to , 
So I don't need to worry if the original peer be used to point to some other data



> 
> >
> > Thanks
> > Qi
> >>
> >> --
> >> Thanks,
> >> Anatoly
> 
> 
> --
> Thanks,
> Anatoly
  
Burakov, Anatoly June 26, 2018, 1:20 p.m. UTC | #5
On 26-Jun-18 1:58 PM, Zhang, Qi Z wrote:
> 
> my understand is peer is identified by a string (or filename)
> what I mean is clone the content of the buffer that peer point to ,
> So I don't need to worry if the original peer be used to point to some other data
> 

As far as the application is concerned, peer is an opaque pointer, and 
should be treated as such. Peeking behind a void pointer that is not 
designed for this purpose is not a good idea, even if technically you 
know what's in there.
  
Qi Zhang June 26, 2018, 1:25 p.m. UTC | #6
> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, June 26, 2018 9:21 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Shelton, Benjamin H
> <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
> 
> On 26-Jun-18 1:58 PM, Zhang, Qi Z wrote:
> >
> > my understand is peer is identified by a string (or filename) what I
> > mean is clone the content of the buffer that peer point to , So I
> > don't need to worry if the original peer be used to point to some
> > other data
> >
> 
> As far as the application is concerned, peer is an opaque pointer, and should
> be treated as such. Peeking behind a void pointer that is not designed for this
> purpose is not a good idea, even if technically you know what's in there.

We can expose a clone interface, like MP_PEER_CLONE, so we don't need to know what's inside, just need to know that it can be used on another thread?


> 
> --
> Thanks,
> Anatoly
  
Burakov, Anatoly June 26, 2018, 1:45 p.m. UTC | #7
On 26-Jun-18 2:25 PM, Zhang, Qi Z wrote:
> 
> 
>> -----Original Message-----
>> From: Burakov, Anatoly
>> Sent: Tuesday, June 26, 2018 9:21 PM
>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
>> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
>> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
>> <ferruh.yigit@intel.com>; Shelton, Benjamin H
>> <benjamin.h.shelton@intel.com>; Vangati, Narender
>> <narender.vangati@intel.com>
>> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
>>
>> On 26-Jun-18 1:58 PM, Zhang, Qi Z wrote:
>>>
>>> my understand is peer is identified by a string (or filename) what I
>>> mean is clone the content of the buffer that peer point to , So I
>>> don't need to worry if the original peer be used to point to some
>>> other data
>>>
>>
>> As far as the application is concerned, peer is an opaque pointer, and should
>> be treated as such. Peeking behind a void pointer that is not designed for this
>> purpose is not a good idea, even if technically you know what's in there.
> 
> We can expose a clone interface, like MP_PEER_CLONE, so we don't need to know what's inside, just need to know that it can be used on another thread?
> 

Well, that can probably work. Feels like a hacky workaround though.

Another way to do the same thing would be to store peer information 
right in the message, as opposed to providing it separately. Still a 
hack though, and will require far more changes, but it could be a better 
solution as (if done right) it would allow identifying which reply came 
from which peer.

Of course, an even better approach would be to devise some kind of 
addressing scheme (uuid?), so that peer addresses are no longer opaque 
pointers but rather are valid data types.

Thoughts?
  
Qi Zhang June 26, 2018, 2:24 p.m. UTC | #8
> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, June 26, 2018 9:46 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Shelton, Benjamin H
> <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
> 
> On 26-Jun-18 2:25 PM, Zhang, Qi Z wrote:
> >
> >
> >> -----Original Message-----
> >> From: Burakov, Anatoly
> >> Sent: Tuesday, June 26, 2018 9:21 PM
> >> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> >> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> >> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
> >> <ferruh.yigit@intel.com>; Shelton, Benjamin H
> >> <benjamin.h.shelton@intel.com>; Vangati, Narender
> >> <narender.vangati@intel.com>
> >> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
> >>
> >> On 26-Jun-18 1:58 PM, Zhang, Qi Z wrote:
> >>>
> >>> my understand is peer is identified by a string (or filename) what I
> >>> mean is clone the content of the buffer that peer point to , So I
> >>> don't need to worry if the original peer be used to point to some
> >>> other data
> >>>
> >>
> >> As far as the application is concerned, peer is an opaque pointer,
> >> and should be treated as such. Peeking behind a void pointer that is
> >> not designed for this purpose is not a good idea, even if technically you
> know what's in there.
> >
> > We can expose a clone interface, like MP_PEER_CLONE, so we don't need to
> know what's inside, just need to know that it can be used on another thread?
> >
> 
> Well, that can probably work. Feels like a hacky workaround though.
> 
> Another way to do the same thing would be to store peer information right in
> the message, as opposed to providing it separately. Still a hack though, and will
> require far more changes, but it could be a better solution as (if done right) it
> would allow identifying which reply came from which peer.
> 
> Of course, an even better approach would be to devise some kind of
> addressing scheme (uuid?), so that peer addresses are no longer opaque
> pointers but rather are valid data types.
> 
> Thoughts?

I may not give insight comment from the IPC implementation, but from user's view, what required is a unique token,
 it can be used for reply at anywhere and at any time, to me, currently implementation looks like missing some mapping management between an abstract token to its real data.  

Thanks
Qi.

> 
> --
> Thanks,
> Anatoly
  
Burakov, Anatoly June 26, 2018, 3:12 p.m. UTC | #9
On 26-Jun-18 3:24 PM, Zhang, Qi Z wrote:
> 
> 
>> -----Original Message-----
>> From: Burakov, Anatoly
>> Sent: Tuesday, June 26, 2018 9:46 PM
>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
>> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
>> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
>> <ferruh.yigit@intel.com>; Shelton, Benjamin H
>> <benjamin.h.shelton@intel.com>; Vangati, Narender
>> <narender.vangati@intel.com>
>> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
>>
>> On 26-Jun-18 2:25 PM, Zhang, Qi Z wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Burakov, Anatoly
>>>> Sent: Tuesday, June 26, 2018 9:21 PM
>>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
>>>> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
>>>> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
>>>> <ferruh.yigit@intel.com>; Shelton, Benjamin H
>>>> <benjamin.h.shelton@intel.com>; Vangati, Narender
>>>> <narender.vangati@intel.com>
>>>> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
>>>>
>>>> On 26-Jun-18 1:58 PM, Zhang, Qi Z wrote:
>>>>>
>>>>> my understand is peer is identified by a string (or filename) what I
>>>>> mean is clone the content of the buffer that peer point to , So I
>>>>> don't need to worry if the original peer be used to point to some
>>>>> other data
>>>>>
>>>>
>>>> As far as the application is concerned, peer is an opaque pointer,
>>>> and should be treated as such. Peeking behind a void pointer that is
>>>> not designed for this purpose is not a good idea, even if technically you
>> know what's in there.
>>>
>>> We can expose a clone interface, like MP_PEER_CLONE, so we don't need to
>> know what's inside, just need to know that it can be used on another thread?
>>>
>>
>> Well, that can probably work. Feels like a hacky workaround though.
>>
>> Another way to do the same thing would be to store peer information right in
>> the message, as opposed to providing it separately. Still a hack though, and will
>> require far more changes, but it could be a better solution as (if done right) it
>> would allow identifying which reply came from which peer.
>>
>> Of course, an even better approach would be to devise some kind of
>> addressing scheme (uuid?), so that peer addresses are no longer opaque
>> pointers but rather are valid data types.
>>
>> Thoughts?
> 
> I may not give insight comment from the IPC implementation, but from user's view, what required is a unique token,
>   it can be used for reply at anywhere and at any time, to me, currently implementation looks like missing some mapping management between an abstract token to its real data.
> 

Well, the idea was to not provide that :) We wanted to keep it simple 
and actively discourage any attempts to know which peer you're 
communicating with, as allowing that would imply some kind of addressing 
scheme, peer discovery and things like that, which we don't want to 
bother with. If you want replies, you use callbacks, which provide you 
with a peer address.

On the one hand, i understand the frustration of being forced to deal 
with deliberate simplicity of IPC API and to create workarounds like 
doing sendmsg() and storing requests that you're waiting on inside the 
application and launching interrupt threads, all for things that should 
Just Work in an IPC API. Believe me, i know about its deficiencies more 
than most people :)

On the other hand, we don't want to introduce hacks to solve a specific 
problem. If we are to solve this in IPC, we should do it properly. I 
don't think adding a workaround with "cloning" peer address is the 
solution. A proper solution would've been to implement addressing 
scheme, so that any response could be submitted at any time.

The hard route would be to add peer discovery, etc. The easy route would 
be to just change the peer address to something typed, something that we 
can use to reconstruct the original peer id. A great candidate for this 
is the uuid, but it's an external dependency. A workaround for this 
would be to just use a uint64 value obtained through some magic that is 
part of socket path. For example, replace 
"/var/run/dpdk/rte/mp_socket_pid_rdtsc" with something just as (likely) 
unique, but something that can be put into a uint64.

Without significantly changing the API and the internals, i think this 
is the best we will be able to do.

> Thanks
> Qi.
> 
>>
>> --
>> Thanks,
>> Anatoly
  
Qi Zhang June 27, 2018, 1:31 a.m. UTC | #10
> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, June 26, 2018 11:12 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Shelton, Benjamin H
> <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
> 
> On 26-Jun-18 3:24 PM, Zhang, Qi Z wrote:
> >
> >
> >> -----Original Message-----
> >> From: Burakov, Anatoly
> >> Sent: Tuesday, June 26, 2018 9:46 PM
> >> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> >> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> >> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
> >> <ferruh.yigit@intel.com>; Shelton, Benjamin H
> >> <benjamin.h.shelton@intel.com>; Vangati, Narender
> >> <narender.vangati@intel.com>
> >> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on multi-process
> >>
> >> On 26-Jun-18 2:25 PM, Zhang, Qi Z wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Burakov, Anatoly
> >>>> Sent: Tuesday, June 26, 2018 9:21 PM
> >>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> >>>> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> >>>> dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>;
> >>>> Yigit, Ferruh <ferruh.yigit@intel.com>; Shelton, Benjamin H
> >>>> <benjamin.h.shelton@intel.com>; Vangati, Narender
> >>>> <narender.vangati@intel.com>
> >>>> Subject: Re: [PATCH v4 06/24] ethdev: enable hotplug on
> >>>> multi-process
> >>>>
> >>>> On 26-Jun-18 1:58 PM, Zhang, Qi Z wrote:
> >>>>>
> >>>>> my understand is peer is identified by a string (or filename) what
> >>>>> I mean is clone the content of the buffer that peer point to , So
> >>>>> I don't need to worry if the original peer be used to point to
> >>>>> some other data
> >>>>>
> >>>>
> >>>> As far as the application is concerned, peer is an opaque pointer,
> >>>> and should be treated as such. Peeking behind a void pointer that
> >>>> is not designed for this purpose is not a good idea, even if
> >>>> technically you
> >> know what's in there.
> >>>
> >>> We can expose a clone interface, like MP_PEER_CLONE, so we don't
> >>> need to
> >> know what's inside, just need to know that it can be used on another
> thread?
> >>>
> >>
> >> Well, that can probably work. Feels like a hacky workaround though.
> >>
> >> Another way to do the same thing would be to store peer information
> >> right in the message, as opposed to providing it separately. Still a
> >> hack though, and will require far more changes, but it could be a
> >> better solution as (if done right) it would allow identifying which reply came
> from which peer.
> >>
> >> Of course, an even better approach would be to devise some kind of
> >> addressing scheme (uuid?), so that peer addresses are no longer
> >> opaque pointers but rather are valid data types.
> >>
> >> Thoughts?
> >
> > I may not give insight comment from the IPC implementation, but from
> user's view, what required is a unique token,
> >   it can be used for reply at anywhere and at any time, to me, currently
> implementation looks like missing some mapping management between an
> abstract token to its real data.
> >
> 
> Well, the idea was to not provide that :) We wanted to keep it simple and
> actively discourage any attempts to know which peer you're communicating
> with, as allowing that would imply some kind of addressing scheme, peer
> discovery and things like that, which we don't want to bother with. If you want
> replies, you use callbacks, which provide you with a peer address.
> 
> On the one hand, i understand the frustration of being forced to deal with
> deliberate simplicity of IPC API and to create workarounds like doing sendmsg()
> and storing requests that you're waiting on inside the application and
> launching interrupt threads, all for things that should Just Work in an IPC API.
> Believe me, i know about its deficiencies more than most people :)
> 
> On the other hand, we don't want to introduce hacks to solve a specific
> problem. If we are to solve this in IPC, we should do it properly. I don't think
> adding a workaround with "cloning" peer address is the solution. A proper
> solution would've been to implement addressing scheme, so that any
> response could be submitted at any time.
> 
> The hard route would be to add peer discovery, etc. The easy route would be
> to just change the peer address to something typed, something that we can
> use to reconstruct the original peer id. A great candidate for this is the uuid,
> but it's an external dependency. A workaround for this would be to just use a
> uint64 value obtained through some magic that is part of socket path. For
> example, replace "/var/run/dpdk/rte/mp_socket_pid_rdtsc" with something
> just as (likely) unique, but something that can be put into a uint64.
> 
> Without significantly changing the API and the internals, i think this is the best
> we will be able to do.

Ok, let me summarize this.

Current sync IPC does not support reply to a sync request outside the callback function, 
Of cause it will not support reply on separate thread which required by mp hotplug implementation.

So, the question is do we agree that IPC should support a more relax sync reply? 
For my option, I think the requirement is reasonable, because it greatly simplified the implementation of the case when the receiver need to do some task on a separate thread and reply on that thread.(note, this is typical case since the IPC deadlock limitation)

If the answer is yes:

	The question is will we able to enable this in this release?
	For me I think I may not able to solve this quickly, I need more time to review the implementation detail of IPC , not sure if you can cover this on your new patchset for IPC. 
	if yes, I assume I don't need to change any code (or a little bit) of current mp implementation, 	
	because "peer"(or whatever) as an identity of reply target can be parsed to other thread

	If not, we can take workaround for mp hotplug, 
		since it is workaround I prefer the simplest way with less impact and enough comments, since we will fix this anyway.
		Yes, clone is the simplest way :), I can just implement it inside ethdev_mp.c and add comment say "Fix me, this is hack"., so it will not impact other IPC users.

If the answer is NO
	I have to give up all sync IPC, and use sendmsg to replace sync request and to match request / reply in ethdev_mp.c , though it looks like should be a part of IPC and this looks like we are going to enable another set of sync IPC which support reply on another thread.

Thanks
Qi
> 
> > Thanks
> > Qi.
> >
> >>
> >> --
> >> Thanks,
> >> Anatoly
> 
> 
> --
> Thanks,
> Anatoly
  

Patch

diff --git a/lib/librte_ethdev/Makefile b/lib/librte_ethdev/Makefile
index c2f2f7d82..d0a059b83 100644
--- a/lib/librte_ethdev/Makefile
+++ b/lib/librte_ethdev/Makefile
@@ -19,6 +19,7 @@  EXPORT_MAP := rte_ethdev_version.map
 LIBABIVER := 9
 
 SRCS-y += rte_ethdev.c
+SRCS-y += ethdev_mp.c
 SRCS-y += rte_flow.c
 SRCS-y += rte_tm.c
 SRCS-y += rte_mtr.c
diff --git a/lib/librte_ethdev/ethdev_mp.c b/lib/librte_ethdev/ethdev_mp.c
new file mode 100644
index 000000000..87fc430bf
--- /dev/null
+++ b/lib/librte_ethdev/ethdev_mp.c
@@ -0,0 +1,248 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2018 Intel Corporation
+ */
+
+#include <rte_string_fns.h>
+#include "rte_ethdev_driver.h"
+#include "ethdev_mp.h"
+
+#define MP_TIMEOUT_S 5 /**< 5 seconds timeouts */
+
+struct mp_reply_bundle {
+	struct rte_mp_msg msg;
+	const void *peer;
+};
+
+static int detach_on_secondary(uint16_t port_id)
+{
+	struct rte_device *dev;
+	struct rte_bus *bus;
+	int ret = 0;
+
+	if (rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED) {
+		ethdev_log(ERR, "detach on secondary: invalid port %d\n",
+			   port_id);
+		return -ENODEV;
+	}
+
+	dev = rte_eth_devices[port_id].device;
+	if (dev == NULL)
+		return -EINVAL;
+
+	bus = rte_bus_find_by_device(dev);
+	if (bus == NULL)
+		return -ENOENT;
+
+	ret = rte_eal_hotplug_remove(bus->name, dev->name);
+	if (ret) {
+		ethdev_log(ERR, "failed to hot unplug bus: %s, device:%s\n",
+			   bus->name, dev->name);
+		return ret;
+	}
+
+	rte_eth_dev_release_port_private(&rte_eth_devices[port_id]);
+	return ret;
+}
+
+static int attach_on_secondary(const char *devargs, uint16_t port_id)
+{
+	struct rte_devargs da;
+	int ret;
+
+	if (rte_eth_devices[port_id].state != RTE_ETH_DEV_UNUSED) {
+		ethdev_log(ERR, "port %d already in used, failed to attach\n",
+			   port_id);
+		return -EINVAL;
+	}
+
+	memset(&da, 0, sizeof(da));
+
+	if (rte_devargs_parse(&da, "%s", devargs)) {
+		ethdev_log(ERR, "failed to parse devargs %s\n", devargs);
+		return -EINVAL;
+	}
+
+	ret = rte_eal_hotplug_add(da.bus->name, da.name, "");
+	if (ret) {
+		ethdev_log(ERR, "failed to hotplug bus:%s, device:%s\n",
+			   da.bus->name, da.name);
+		free(da.args);
+		return ret;
+	}
+
+	if (rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED) {
+		ethdev_log(ERR,
+			"failed to attach to port %d, this is a pmd issue\n",
+			   port_id);
+		free(da.args);
+		return -ENODEV;
+	}
+	free(da.args);
+	return 0;
+}
+
+static int
+handle_secondary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+	RTE_SET_USED(msg);
+	RTE_SET_USED(peer);
+	return -ENOTSUP;
+}
+
+static void __handle_primary_request(void *param)
+{
+	struct mp_reply_bundle *bundle = param;
+	struct rte_mp_msg *msg = &bundle->msg;
+	const struct eth_dev_mp_req *req =
+		(const struct eth_dev_mp_req *)msg->param;
+	struct rte_mp_msg mp_resp;
+	struct eth_dev_mp_req *resp =
+		(struct eth_dev_mp_req *)mp_resp.param;
+	int ret = 0;
+
+	memset(&mp_resp, 0, sizeof(mp_resp));
+
+	switch (req->t) {
+	case REQ_TYPE_ATTACH:
+		ret = attach_on_secondary(req->devargs, req->port_id);
+		break;
+	case REQ_TYPE_PRE_DETACH:
+		ret = 0;
+		break;
+	case REQ_TYPE_DETACH:
+	case REQ_TYPE_ATTACH_ROLLBACK:
+		ret = detach_on_secondary(req->port_id);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	strlcpy(mp_resp.name, ETH_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+	mp_resp.len_param = sizeof(*req);
+	memcpy(resp, req, sizeof(*resp));
+	resp->result = ret;
+	if (rte_mp_reply(&mp_resp, bundle->peer) < 0)
+		ethdev_log(ERR, "failed to send reply to primary request\n");
+}
+
+static int
+handle_primary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+
+	struct rte_mp_msg mp_resp;
+	const struct eth_dev_mp_req *req =
+		(const struct eth_dev_mp_req *)msg->param;
+	struct eth_dev_mp_req *resp =
+		(struct eth_dev_mp_req *)mp_resp.param;
+	struct mp_reply_bundle *bundle;
+	int ret = 0;
+
+	memset(&mp_resp, 0, sizeof(mp_resp));
+	strlcpy(mp_resp.name, ETH_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+	mp_resp.len_param = sizeof(*req);
+	memcpy(resp, req, sizeof(*resp));
+
+	bundle = calloc(1, sizeof(*bundle));
+	if (bundle == NULL) {
+		resp->result = -ENOMEM;
+		ret = rte_mp_reply(&mp_resp, peer);
+		if (ret) {
+			ethdev_log(ERR, "failed to send reply to primary request\n");
+			return ret;
+		}
+	}
+
+	bundle->msg = *msg;
+	bundle->peer = peer;
+
+	ret = rte_eal_mp_task_add(__handle_primary_request, bundle);
+	if (ret) {
+		resp->result = ret;
+		ret = rte_mp_reply(&mp_resp, peer);
+		if (ret) {
+			ethdev_log(ERR, "failed to send reply to primary request\n");
+			return ret;
+		}
+	}
+	return 0;
+}
+
+int eth_dev_request_to_primary(struct eth_dev_mp_req *req)
+{
+	RTE_SET_USED(req);
+	return -ENOTSUP;
+}
+
+/**
+ * Request from primary to secondary.
+ *
+ * Be invoked when try to attach or detach a share device
+ * from primary process.
+ */
+int eth_dev_request_to_secondary(struct eth_dev_mp_req *req)
+{
+	struct rte_mp_msg mp_req;
+	struct rte_mp_reply mp_reply;
+	struct timespec ts = {.tv_sec = MP_TIMEOUT_S, .tv_nsec = 0};
+	int ret;
+	int i;
+
+	memset(&mp_req, 0, sizeof(mp_req));
+	memcpy(mp_req.param, req, sizeof(*req));
+	mp_req.len_param = sizeof(*req);
+	strlcpy(mp_req.name, ETH_DEV_MP_ACTION_REQUEST, sizeof(mp_req.name));
+
+	ret = rte_mp_request_sync(&mp_req, &mp_reply, &ts);
+	if (ret) {
+		ethdev_log(ERR, "rte_mp_request_sync failed\n");
+		return ret;
+	}
+
+	if (mp_reply.nb_sent != mp_reply.nb_received) {
+		ethdev_log(ERR, "not all secondary reply\n");
+		return -1;
+	}
+
+	req->result = 0;
+	for (i = 0; i < mp_reply.nb_received; i++) {
+		struct eth_dev_mp_req *resp =
+			(struct eth_dev_mp_req *)mp_reply.msgs[i].param;
+		if (resp->result) {
+			req->result = resp->result;
+			break;
+		}
+	}
+
+	return 0;
+}
+
+static int on_mp_init(void)
+{
+	int ret;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		ret = rte_mp_action_register(ETH_DEV_MP_ACTION_REQUEST,
+					   handle_secondary_request);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+				ETH_DEV_MP_ACTION_REQUEST);
+			return ret;
+		}
+	} else {
+		ret = rte_mp_action_register(ETH_DEV_MP_ACTION_REQUEST,
+					   handle_primary_request);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+				ETH_DEV_MP_ACTION_REQUEST);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+RTE_INIT(ethdev_mp_init)
+{
+	if (rte_eal_register_mp_init(on_mp_init))
+		RTE_LOG(ERR, EAL, "ethdev mp channel init failed\n");
+}
diff --git a/lib/librte_ethdev/ethdev_mp.h b/lib/librte_ethdev/ethdev_mp.h
new file mode 100644
index 000000000..40be46c89
--- /dev/null
+++ b/lib/librte_ethdev/ethdev_mp.h
@@ -0,0 +1,41 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_ETHDEV_MP_H_
+#define _RTE_ETHDEV_MP_H_
+
+#define MAX_DEV_ARGS_LEN 0x80
+
+#define ETH_DEV_MP_ACTION_REQUEST	"eth_dev_mp_request"
+#define ETH_DEV_MP_ACTION_RESPONSE	"eth_dev_mp_response"
+
+enum eth_dev_req_type {
+	REQ_TYPE_ATTACH,
+	REQ_TYPE_PRE_DETACH,
+	REQ_TYPE_DETACH,
+	REQ_TYPE_ATTACH_ROLLBACK,
+};
+
+struct eth_dev_mp_req {
+	enum eth_dev_req_type t;
+	char devargs[MAX_DEV_ARGS_LEN];
+	uint16_t port_id;
+	int result;
+};
+
+/**
+ * this is a synchronous wrapper for secondary process send
+ * request to primary process, this is invoked when an attach
+ * or detach request issued from primary.
+ */
+int eth_dev_request_to_primary(struct eth_dev_mp_req *req);
+
+/**
+ * this is a synchronous wrapper for primary process send
+ * request to secondary process, this is invoked when an attach
+ * or detach request issued from secondary process.
+ */
+int eth_dev_request_to_secondary(struct eth_dev_mp_req *req);
+
+#endif
diff --git a/lib/librte_ethdev/ethdev_private.h b/lib/librte_ethdev/ethdev_private.h
new file mode 100644
index 000000000..981e7de8a
--- /dev/null
+++ b/lib/librte_ethdev/ethdev_private.h
@@ -0,0 +1,39 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2018 Intel Corporation
+ */
+
+#ifndef _ETHDEV_PRIVATE_H_
+#define _ETHDEV_PRIVATE_H_
+
+/**
+ * Attach a new Ethernet device in current process.
+ *
+ * @param devargs
+ *  A pointer to a strings array describing the new device
+ *  to be attached. The strings should be a pci address like
+ *  '0000:01:00.0' or virtual device name like 'net_pcap0'.
+ *
+ * @param port_id
+ *  A pointer to a port identifier actually attached.
+ *
+ * @return
+ *  0 on success and port_id is filled, negative on error
+ */
+int do_eth_dev_attach(const char *devargs, uint16_t *port_id);
+
+/**
+ * Detach a Ethernet device in current process.
+ *
+ * @param port_id
+ *   The port identifier of the device to detach.
+ *
+ * @param devname
+ *   A pointer to a buffer that will be filled with the device name.
+ *   This buffer must be at least RTE_DEV_NAME_MAX_LEN long.
+ *
+ * @return
+ *  0 on success and devname is filled, negative on error
+ */
+int do_eth_dev_detach(uint16_t port_id);
+
+#endif
diff --git a/lib/librte_ethdev/meson.build b/lib/librte_ethdev/meson.build
index aed5d2265..b60256855 100644
--- a/lib/librte_ethdev/meson.build
+++ b/lib/librte_ethdev/meson.build
@@ -5,6 +5,7 @@  name = 'ethdev'
 version = 9
 allow_experimental_apis = true
 sources = files('ethdev_profile.c',
+	'ethdev_mp.c'
 	'rte_ethdev.c',
 	'rte_flow.c',
 	'rte_mtr.c',
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 205b2ee33..1a5861f30 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -41,11 +41,13 @@ 
 #include "rte_ethdev.h"
 #include "rte_ethdev_driver.h"
 #include "ethdev_profile.h"
+#include "ethdev_mp.h"
+#include "ethdev_private.h"
 
-static int ethdev_logtype;
+int ethdev_logtype;
 
-#define ethdev_log(level, fmt, ...) \
-	rte_log(RTE_LOG_ ## level, ethdev_logtype, fmt "\n", ## __VA_ARGS__)
+#define RTE_ETH_MP_ACTION_REQUEST	"rte_eth_mp_request"
+#define RTE_ETH_MP_ACTION_RESPONSE	"rte_eth_mp_response"
 
 static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
 struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
@@ -656,9 +658,8 @@  eth_err(uint16_t port_id, int ret)
 	return ret;
 }
 
-/* attach the new device, then store port_id of the device */
 int
-rte_eth_dev_attach(const char *devargs, uint16_t *port_id)
+do_eth_dev_attach(const char *devargs, uint16_t *port_id)
 {
 	int current = rte_eth_dev_count_total();
 	struct rte_devargs da;
@@ -703,14 +704,105 @@  rte_eth_dev_attach(const char *devargs, uint16_t *port_id)
 	return ret;
 }
 
-/* detach the device, then store the name of the device */
 int
-rte_eth_dev_detach(uint16_t port_id, char *name __rte_unused)
+do_eth_dev_detach(uint16_t port_id)
 {
 	struct rte_device *dev;
 	struct rte_bus *bus;
+	int ret = 0;
+
+	dev = rte_eth_devices[port_id].device;
+	if (dev == NULL)
+		return -EINVAL;
+
+	bus = rte_bus_find_by_device(dev);
+	if (bus == NULL)
+		return -ENOENT;
+
+	ret = rte_eal_hotplug_remove(bus->name, dev->name);
+	if (ret < 0)
+		return ret;
+
+	rte_eth_dev_release_port(&rte_eth_devices[port_id]);
+	return ret;
+
+}
+
+/* attach the new device, then store port_id of the device */
+int
+rte_eth_dev_attach(const char *devargs, uint16_t *port_id)
+{
+	struct eth_dev_mp_req req;
+	int ret;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+
+		/**
+		 * If secondary process, we just send request to primary
+		 * to start the process.
+		 */
+		req.t = REQ_TYPE_ATTACH;
+		strlcpy(req.devargs, devargs, MAX_DEV_ARGS_LEN);
+
+		ret = eth_dev_request_to_primary(&req);
+		if (ret) {
+			ethdev_log(ERR,
+				"Failed to send device attach request to primary\n");
+			return ret;
+		}
+
+		*port_id = req.port_id;
+		return req.result;
+	}
+
+	ret = do_eth_dev_attach(devargs, port_id);
+	if (ret)
+		return ret;
+
+	/* send attach request to seoncary */
+	req.t = REQ_TYPE_ATTACH;
+	strlcpy(req.devargs, devargs, MAX_DEV_ARGS_LEN);
+	req.port_id = *port_id;
+	ret = eth_dev_request_to_secondary(&req);
+	if (ret) {
+		ethdev_log(ERR, "Failed to send device attach request to secondary\n");
+		goto rollback;
+	}
+
+	if (req.result)
+		goto rollback;
+
+	return 0;
+
+rollback:
+	/* send rollback request to secondary since some one fail to attach */
+	req.t = REQ_TYPE_ATTACH_ROLLBACK;
+	req.port_id = *port_id;
+	eth_dev_request_to_secondary(&req);
+
+	do_eth_dev_detach(*port_id);
+
+	return -ENODEV;
+}
+
+/* attach the new device, then store port_id of the device */
+int
+rte_eth_dev_attach_private(const char *devargs, uint16_t *port_id)
+{
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		return -ENOTSUP;
+
+	return do_eth_dev_attach(devargs, port_id);
+}
+
+/* detach the device, then store the name of the device */
+int
+rte_eth_dev_detach(uint16_t port_id, char *name __rte_unused)
+{
+	struct eth_dev_mp_req req = {0};
+	int ret;
 	uint32_t dev_flags;
-	int ret = -1;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
@@ -721,22 +813,86 @@  rte_eth_dev_detach(uint16_t port_id, char *name __rte_unused)
 		return -ENOTSUP;
 	}
 
-	dev = rte_eth_devices[port_id].device;
-	if (dev == NULL)
-		return -EINVAL;
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		req.t = REQ_TYPE_DETACH;
+		req.port_id = port_id;
 
-	bus = rte_bus_find_by_device(dev);
-	if (bus == NULL)
-		return -ENOENT;
+		/**
+		 * If secondary process, we just send request to primary
+		 * to start the process.
+		 */
+		ret = eth_dev_request_to_primary(&req);
+		if (ret) {
+			ethdev_log(ERR,
+				"Failed to send device detach request to primary\n");
+			return ret;
+		}
 
-	ret = rte_eal_hotplug_remove(bus->name, dev->name);
-	if (ret < 0)
+		return req.result;
+	}
+
+	/* check pre_detach */
+	req.t = REQ_TYPE_PRE_DETACH;
+	req.port_id = port_id;
+	ret = eth_dev_request_to_secondary(&req);
+	if (ret) {
+		ethdev_log(ERR,
+			"Failed to send device pre-detach request to secondary\n");
+		return ret;
+	}
+
+	if (req.result) {
+		ethdev_log(ERR,
+			"Device is busy on secondary, can't be detached\n");
+		return req.result;
+	}
+
+	/* detach on seconary first */
+	req.t = REQ_TYPE_DETACH;
+	ret = eth_dev_request_to_secondary(&req);
+	if (ret) {
+		ethdev_log(ERR,
+			"Failed to send device detach request to secondary\n");
+		return ret;
+	}
+
+	if (req.result)
+		/**
+		 * this should rarely happen, something wrong in secondary
+		 * process, will not block primary detach.
+		 */
+		ethdev_log(ERR,
+			"Failed to detach device on secondary process\n");
+
+	/* detach on primary */
+	ret =  do_eth_dev_detach(port_id);
+	if (ret)
 		return ret;
 
-	rte_eth_dev_release_port(&rte_eth_devices[port_id]);
 	return 0;
 }
 
+/* detach the device, then store the name of the device */
+int
+rte_eth_dev_detach_private(uint16_t port_id, char *name __rte_unused)
+{
+	uint32_t dev_flags;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		return -ENOTSUP;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev_flags = rte_eth_devices[port_id].data->dev_flags;
+	if (dev_flags & RTE_ETH_DEV_BONDED_SLAVE) {
+		ethdev_log(ERR,
+			"Port %" PRIu16 " is bonded, cannot detach", port_id);
+		return -ENOTSUP;
+	}
+
+	return do_eth_dev_detach(port_id);
+}
+
 static int
 rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 36e3984ea..813806e3c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1462,6 +1462,9 @@  uint16_t __rte_experimental rte_eth_dev_count_total(void);
 
 /**
  * Attach a new Ethernet device specified by arguments.
+ * In multi-process mode, it will sync with other process
+ * to make sure all processes attach the device, any
+ * failure on other process will rollback the action.
  *
  * @param devargs
  *  A pointer to a strings array describing the new device
@@ -1475,9 +1478,31 @@  uint16_t __rte_experimental rte_eth_dev_count_total(void);
 int rte_eth_dev_attach(const char *devargs, uint16_t *port_id);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Attach a private Ethernet device specified by arguments.
+ * A private device is invisible to other process.
+ * Can only be invoked in secondary process.
+ *
+ * @param devargs
+ *  A pointer to a strings array describing the new device
+ *  to be attached. The strings should be a pci address like
+ *  '0000:01:00.0' or virtual device name like 'net_pcap0'.
+ * @param port_id
+ *  A pointer to a port identifier actually attached.
+ * @return
+ *  0 on success and port_id is filled, negative on error
+ */
+int __rte_experimental
+rte_eth_dev_attach_private(const char *devargs, uint16_t *port_id);
+
+/**
  * Detach a Ethernet device specified by port identifier.
  * This function must be called when the device is in the
  * closed state.
+ * In multi-process mode, it will sync with other process
+ * to detach the device.
  *
  * @param port_id
  *   The port identifier of the device to detach.
@@ -1490,6 +1515,26 @@  int rte_eth_dev_attach(const char *devargs, uint16_t *port_id);
 int rte_eth_dev_detach(uint16_t port_id, char *devname);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Detach a private Ethernet device specified by port identifier
+ * This function must be called when the device is in the
+ * closed state.
+ * Can only be invoked in secondary process.
+ *
+ * @param port_id
+ *   The port identifier of the device to detach.
+ * @param devname
+ *   A pointer to a buffer that will be filled with the device name.
+ *   This buffer must be at least RTE_DEV_NAME_MAX_LEN long.
+ * @return
+ *  0 on success and devname is filled, negative on error
+ */
+int __rte_experimental
+rte_eth_dev_detach_private(uint16_t port_id, char *devname);
+
+/**
  * Convert a numerical speed in Mbps to a bitmap flag that can be used in
  * the bitmap link_speeds of the struct rte_eth_conf
  *
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 33d12b3a2..2cb6de745 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -622,4 +622,9 @@  struct rte_eth_dev_data {
  */
 extern struct rte_eth_dev rte_eth_devices[];
 
+extern int ethdev_logtype;
+#define ethdev_log(level, fmt, ...) \
+	rte_log(RTE_LOG_ ## level, ethdev_logtype, fmt "\n", ## __VA_ARGS__)
+
+
 #endif /* _RTE_ETHDEV_CORE_H_ */