[dpdk-dev,v2,12/12] net/vhost: support to run in the secondary process

Message ID 1506606959-76230-13-git-send-email-jianfeng.tan@intel.com (mailing list archive)
State Superseded, archived
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Jianfeng Tan Sept. 28, 2017, 1:55 p.m. UTC
  Support to run vhost-pmd vdev in the secondary process. We obtain
information, like memory regions, kickfd, callfd, through
primary/secondary communication channel.

And by invoking rte_vhost_set_vring_effective_fd, we can set the
kickfd which can be recognized by the secondary process.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/vhost/rte_eth_vhost.c | 200 +++++++++++++++++++++++++++++++++++---
 1 file changed, 187 insertions(+), 13 deletions(-)
  

Comments

Yuanhan Liu Sept. 29, 2017, 8:28 a.m. UTC | #1
On Thu, Sep 28, 2017 at 01:55:59PM +0000, Jianfeng Tan wrote:
>  static int
> +share_device(int vid)
> +{
> +	uint32_t i, vring_num;
> +	int len;
> +	int fds[8];
> +	struct rte_vhost_memory *mem;
> +	struct vhost_params *params;
> +	struct rte_vhost_vring vring;
> +
> +	/* share mem table */
> +	if (rte_vhost_get_mem_table(vid, &mem) < 0) {
> +		RTE_LOG(ERR, PMD, "Failed to get mem table\n");
> +		return 0;
> +	}
> +	for (i = 0; i < mem->nregions; ++i)
> +		fds[i] = mem->regions[i].fd;
> +
> +	len = sizeof(struct rte_vhost_mem_region) * mem->nregions;
> +	params = malloc(sizeof(*params) + len);
> +	if (params == NULL) {
> +		RTE_LOG(ERR, PMD, "Failed to allocate memory\n");
> +		return -1;
> +	}
> +
> +	params->type = VHOST_MSG_TYPE_REGIONS;
> +	params->vid = vid;
> +	memcpy(params->regions, mem->regions, len);
> +
> +	if (rte_eal_mp_sendmsg("vhost pmd", params, sizeof(*params) + len,

To me, it's not a good idea to identify an object by a string. The common
practice is to use a handler, which could either be a struct or a nubmer.

> +			       fds, mem->nregions) < 0) {
> +		RTE_LOG(ERR, PMD, "Failed to share mem table\n");
> +		free(params);
> +		return -1;
> +	}
> +
> +	/* share callfd and kickfd */
> +	params->type = VHOST_MSG_TYPE_SET_FDS;
> +	vring_num = rte_vhost_get_vring_num(vid);
> +	for (i = 0; i < vring_num; i++) {
> +		if (rte_vhost_get_vhost_vring(vid, i, &vring) < 0) {

If you save the fds here, you don't have to get it every time when there
is a new secondary process attached. Then as I have suggested firstly,
you don't have to introduce callfd_pri in the vhost lib.

	--yliu
  
Jianfeng Tan Sept. 30, 2017, 4:03 a.m. UTC | #2
Hi Yuanhan,


On 9/29/2017 4:28 PM, Yuanhan Liu wrote:
> On Thu, Sep 28, 2017 at 01:55:59PM +0000, Jianfeng Tan wrote:
>>   static int
>> +share_device(int vid)
>> +{
>> +	uint32_t i, vring_num;
>> +	int len;
>> +	int fds[8];
>> +	struct rte_vhost_memory *mem;
>> +	struct vhost_params *params;
>> +	struct rte_vhost_vring vring;
>> +
>> +	/* share mem table */
>> +	if (rte_vhost_get_mem_table(vid, &mem) < 0) {
>> +		RTE_LOG(ERR, PMD, "Failed to get mem table\n");
>> +		return 0;
>> +	}
>> +	for (i = 0; i < mem->nregions; ++i)
>> +		fds[i] = mem->regions[i].fd;
>> +
>> +	len = sizeof(struct rte_vhost_mem_region) * mem->nregions;
>> +	params = malloc(sizeof(*params) + len);
>> +	if (params == NULL) {
>> +		RTE_LOG(ERR, PMD, "Failed to allocate memory\n");
>> +		return -1;
>> +	}
>> +
>> +	params->type = VHOST_MSG_TYPE_REGIONS;
>> +	params->vid = vid;
>> +	memcpy(params->regions, mem->regions, len);
>> +
>> +	if (rte_eal_mp_sendmsg("vhost pmd", params, sizeof(*params) + len,
> To me, it's not a good idea to identify an object by a string. The common
> practice is to use a handler, which could either be a struct or a nubmer.

My point is that if we choose the way you suggested, we need somewhere 
to maintain them. For example, every time we need register a new action, 
we need to update that file to add a new entry (number).

In current way, the duplicated register with the same name will fail, 
developers is responsible to check that.

>
>> +			       fds, mem->nregions) < 0) {
>> +		RTE_LOG(ERR, PMD, "Failed to share mem table\n");
>> +		free(params);
>> +		return -1;
>> +	}
>> +
>> +	/* share callfd and kickfd */
>> +	params->type = VHOST_MSG_TYPE_SET_FDS;
>> +	vring_num = rte_vhost_get_vring_num(vid);
>> +	for (i = 0; i < vring_num; i++) {
>> +		if (rte_vhost_get_vhost_vring(vid, i, &vring) < 0) {
> If you save the fds here, you don't have to get it every time when there
> is a new secondary process attached. Then as I have suggested firstly,
> you don't have to introduce callfd_pri in the vhost lib.

If we don't introduce callfd_pri, when we do virtqueue cleanup (or 
similar operations) in vhost lib, it will wrongly close fds belonging to 
secondary process.

You remind me that, instead of introduce callfd_pri, we can introduce 
callfd_effective, to reduce the modification lines.

Thanks,
Jianfeng
  
Yuanhan Liu Sept. 30, 2017, 8:16 a.m. UTC | #3
On Sat, Sep 30, 2017 at 12:03:33PM +0800, Tan, Jianfeng wrote:
> >>+	if (rte_eal_mp_sendmsg("vhost pmd", params, sizeof(*params) + len,
> >To me, it's not a good idea to identify an object by a string. The common
> >practice is to use a handler, which could either be a struct or a nubmer.
> 
> My point is that if we choose the way you suggested, we need somewhere to
> maintain them. For example, every time we need register a new action, we
> need to update that file to add a new entry (number).

Not really. You could let the register function to return a struct, in
which there is a name field.

Anyway, I think it's not a big deal to turn it to struct, judging that
it may only contain one field only: the name. But at least, you should
not type the same string everywhere. Use macro instead.

> In current way, the duplicated register with the same name will fail,
> developers is responsible to check that.
> 
> >
> >>+			       fds, mem->nregions) < 0) {
> >>+		RTE_LOG(ERR, PMD, "Failed to share mem table\n");
> >>+		free(params);
> >>+		return -1;
> >>+	}
> >>+
> >>+	/* share callfd and kickfd */
> >>+	params->type = VHOST_MSG_TYPE_SET_FDS;
> >>+	vring_num = rte_vhost_get_vring_num(vid);
> >>+	for (i = 0; i < vring_num; i++) {
> >>+		if (rte_vhost_get_vhost_vring(vid, i, &vring) < 0) {
> >If you save the fds here, you don't have to get it every time when there
> >is a new secondary process attached. Then as I have suggested firstly,
> >you don't have to introduce callfd_pri in the vhost lib.
> 
> If we don't introduce callfd_pri, when we do virtqueue cleanup (or similar
> operations) in vhost lib, it will wrongly close fds belonging to secondary
> process.
> 
> You remind me that, instead of introduce callfd_pri, we can introduce
> callfd_effective, to reduce the modification lines.

It's not about how many lines are modified. You were adding "effective"
fds, which is a semantic change. It makes the logic a bit more complex.
What's worse, it even doesn't resolve the issue completely.

	--yliu
  
Yuanhan Liu Sept. 30, 2017, 8:23 a.m. UTC | #4
On Thu, Sep 28, 2017 at 01:55:59PM +0000, Jianfeng Tan wrote:
> +static int
>  new_device(int vid)
>  {
>  	struct rte_eth_dev *eth_dev;
> @@ -610,6 +685,8 @@ new_device(int vid)
>  	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC,
>  				      NULL, NULL);
>  
> +	share_device(vid);
> +

Another question is, have you considered/tested the case when the VM
changes the qeueue number later?

	--yliu
  
Jianfeng Tan Sept. 30, 2017, 10:06 a.m. UTC | #5
On 9/30/2017 4:16 PM, Yuanhan Liu wrote:
> On Sat, Sep 30, 2017 at 12:03:33PM +0800, Tan, Jianfeng wrote:
>>>> +	if (rte_eal_mp_sendmsg("vhost pmd", params, sizeof(*params) + len,
>>> To me, it's not a good idea to identify an object by a string. The common
>>> practice is to use a handler, which could either be a struct or a nubmer.
>> My point is that if we choose the way you suggested, we need somewhere to
>> maintain them. For example, every time we need register a new action, we
>> need to update that file to add a new entry (number).
> Not really. You could let the register function to return a struct, in
> which there is a name field.
>
> Anyway, I think it's not a big deal to turn it to struct, judging that
> it may only contain one field only: the name. But at least, you should
> not type the same string everywhere. Use macro instead.

Agreed. Will fix that.

>
>> In current way, the duplicated register with the same name will fail,
>> developers is responsible to check that.
>>
>>>> +			       fds, mem->nregions) < 0) {
>>>> +		RTE_LOG(ERR, PMD, "Failed to share mem table\n");
>>>> +		free(params);
>>>> +		return -1;
>>>> +	}
>>>> +
>>>> +	/* share callfd and kickfd */
>>>> +	params->type = VHOST_MSG_TYPE_SET_FDS;
>>>> +	vring_num = rte_vhost_get_vring_num(vid);
>>>> +	for (i = 0; i < vring_num; i++) {
>>>> +		if (rte_vhost_get_vhost_vring(vid, i, &vring) < 0) {
>>> If you save the fds here, you don't have to get it every time when there
>>> is a new secondary process attached. Then as I have suggested firstly,
>>> you don't have to introduce callfd_pri in the vhost lib.
>> If we don't introduce callfd_pri, when we do virtqueue cleanup (or similar
>> operations) in vhost lib, it will wrongly close fds belonging to secondary
>> process.
>>
>> You remind me that, instead of introduce callfd_pri, we can introduce
>> callfd_effective, to reduce the modification lines.
> It's not about how many lines are modified. You were adding "effective"
> fds, which is a semantic change. It makes the logic a bit more complex.
> What's worse, it even doesn't resolve the issue completely.

Yes, it still has limitation. Just wonder if possible to move event fd 
write out of vhost lib.

Thanks,
Jianfeng

>
> 	--yliu
  
Jianfeng Tan Sept. 30, 2017, 10:53 a.m. UTC | #6
On 9/30/2017 4:23 PM, Yuanhan Liu wrote:
> On Thu, Sep 28, 2017 at 01:55:59PM +0000, Jianfeng Tan wrote:
>> +static int
>>   new_device(int vid)
>>   {
>>   	struct rte_eth_dev *eth_dev;
>> @@ -610,6 +685,8 @@ new_device(int vid)
>>   	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC,
>>   				      NULL, NULL);
>>   
>> +	share_device(vid);
>> +
> Another question is, have you considered/tested the case when the VM
> changes the qeueue number later?

Yes, that is a covered test case, we use ethtool to increase the 
combined queue number; see cover letter for detail.

Thanks,
Jianfeng

> 	--yliu
  
Yuanhan Liu Sept. 30, 2017, 11:34 a.m. UTC | #7
On Thu, Sep 30, 2017 at 12:53:00PM +0000, Jianfeng Tan wrote:
>On 9/30/2017 4:23 PM, Yuanhan Liu wrote:
> > On Thu, Sep 28, 2017 at 01:55:59PM +0000, Jianfeng Tan wrote:
> >> +static int
> >>   new_device(int vid)
> >>   {
> >>   	struct rte_eth_dev *eth_dev;
> >> @@ -610,6 +685,8 @@ new_device(int vid)
> >>   	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC,
> >>   				      NULL, NULL);
> >>   
> >> +	share_device(vid);
> >> +
> > Another question is, have you considered/tested the case when the VM
> > changes the qeueue number later?
> 
> Yes, that is a covered test case, we use ethtool to increase the 
> combined queue number; see cover letter for detail.

Sorry I missed that!

However, I'm not quite sure I understood you:

    Step 5: enable multi queue in VM1 and VM2.
      $ ethtool -L ethX combined 2
    
    Note in this test case, only queue 1, i.e., secondary process can process
    packets. To use queue 1, basically, we can run command like:
      $ taskset -c 1 <commands>

Do you mean the secondary can't rx/tx pkts from/to the 2nd queue?
And you are asking the user to add "taskset -c 1" each time he
wants to run a command inside the VM?

	--yliu
  
Yuanhan Liu Sept. 30, 2017, 11:49 a.m. UTC | #8
On Sat, Sep 30, 2017 at 12:06:44PM 0000, Jianfeng Tan wrote:
> >>>> +	/* share callfd and kickfd */
> >>>> +	params->type = VHOST_MSG_TYPE_SET_FDS;
> >>>> +	vring_num = rte_vhost_get_vring_num(vid);
> >>>> +	for (i = 0; i < vring_num; i++) {
> >>>> +		if (rte_vhost_get_vhost_vring(vid, i, &vring) < 0) {
> >>> If you save the fds here, you don't have to get it every time when there
> >>> is a new secondary process attached. Then as I have suggested firstly,
> >>> you don't have to introduce callfd_pri in the vhost lib.
> >> If we don't introduce callfd_pri, when we do virtqueue cleanup (or similar
> >> operations) in vhost lib, it will wrongly close fds belonging to secondary
> >> process.
> >>
> >> You remind me that, instead of introduce callfd_pri, we can introduce
> >> callfd_effective, to reduce the modification lines.
> > It's not about how many lines are modified. You were adding "effective"
> > fds, which is a semantic change. It makes the logic a bit more complex.
> > What's worse, it even doesn't resolve the issue completely.
> 
> Yes, it still has limitation. Just wonder if possible to move event fd 
> write out of vhost lib.

It could be another solution: we have the valid fd and vring_avail
outside the vhost lib. It's just that the callfd write is inside
the vhost lib. If you just take it out (remove it), it breaks some
users who rely on it: it's an API behaviour change.

	--yliu
  
Jianfeng Tan Oct. 1, 2017, 11:46 p.m. UTC | #9
On 9/30/2017 7:34 PM, Yuanhan Liu wrote:
> On Thu, Sep 30, 2017 at 12:53:00PM +0000, Jianfeng Tan wrote:
>> On 9/30/2017 4:23 PM, Yuanhan Liu wrote:
>>> On Thu, Sep 28, 2017 at 01:55:59PM +0000, Jianfeng Tan wrote:
>>>> +static int
>>>>    new_device(int vid)
>>>>    {
>>>>    	struct rte_eth_dev *eth_dev;
>>>> @@ -610,6 +685,8 @@ new_device(int vid)
>>>>    	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC,
>>>>    				      NULL, NULL);
>>>>    
>>>> +	share_device(vid);
>>>> +
>>> Another question is, have you considered/tested the case when the VM
>>> changes the qeueue number later?
>> Yes, that is a covered test case, we use ethtool to increase the
>> combined queue number; see cover letter for detail.
> Sorry I missed that!
>
> However, I'm not quite sure I understood you:
>
>      Step 5: enable multi queue in VM1 and VM2.
>        $ ethtool -L ethX combined 2
>      
>      Note in this test case, only queue 1, i.e., secondary process can process
>      packets. To use queue 1, basically, we can run command like:
>        $ taskset -c 1 <commands>
>
> Do you mean the secondary can't rx/tx pkts from/to the 2nd queue?
> And you are asking the user to add "taskset -c 1" each time he
> wants to run a command inside the VM?

No, the result is because the logic of this example, symmetric_mp, is 
that primary process works on the queue pair 0; and secondary process 
works on queue pair 1. And because of the limitation we mentioned 
earlier, primary process can process the virtqueue but cannot kick the 
queue pair 0 (wrong callfd). But secondary process can work on both 
queue pair 0 and 1 in theory, unfortunately, cannot find an existing 
example to test that.

Thanks,
Jianfeng

>
> 	--yliu
  
Jianfeng Tan Oct. 1, 2017, 11:48 p.m. UTC | #10
On 9/30/2017 7:49 PM, Yuanhan Liu wrote:
> On Sat, Sep 30, 2017 at 12:06:44PM 0000, Jianfeng Tan wrote:
>>>>>> +	/* share callfd and kickfd */
>>>>>> +	params->type = VHOST_MSG_TYPE_SET_FDS;
>>>>>> +	vring_num = rte_vhost_get_vring_num(vid);
>>>>>> +	for (i = 0; i < vring_num; i++) {
>>>>>> +		if (rte_vhost_get_vhost_vring(vid, i, &vring) < 0) {
>>>>> If you save the fds here, you don't have to get it every time when there
>>>>> is a new secondary process attached. Then as I have suggested firstly,
>>>>> you don't have to introduce callfd_pri in the vhost lib.
>>>> If we don't introduce callfd_pri, when we do virtqueue cleanup (or similar
>>>> operations) in vhost lib, it will wrongly close fds belonging to secondary
>>>> process.
>>>>
>>>> You remind me that, instead of introduce callfd_pri, we can introduce
>>>> callfd_effective, to reduce the modification lines.
>>> It's not about how many lines are modified. You were adding "effective"
>>> fds, which is a semantic change. It makes the logic a bit more complex.
>>> What's worse, it even doesn't resolve the issue completely.
>> Yes, it still has limitation. Just wonder if possible to move event fd
>> write out of vhost lib.
> It could be another solution: we have the valid fd and vring_avail
> outside the vhost lib. It's just that the callfd write is inside
> the vhost lib. If you just take it out (remove it), it breaks some
> users who rely on it: it's an API behaviour change.

Make sense.

Thanks,
Jianfeng
  

Patch

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 0dac5e6..6a685a3 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -33,6 +33,7 @@ 
 #include <unistd.h>
 #include <pthread.h>
 #include <stdbool.h>
+#include <sys/mman.h>
 
 #include <rte_mbuf.h>
 #include <rte_ethdev.h>
@@ -46,6 +47,20 @@ 
 
 #include "rte_eth_vhost.h"
 
+#define VHOST_MSG_TYPE_REGIONS	1
+#define VHOST_MSG_TYPE_SET_FDS	2
+#define VHOST_MSG_TYPE_INIT	3
+
+struct vhost_params {
+	int type;
+	union {
+		int vid;
+		int portid;
+	};
+	int vring_idx;
+	struct rte_vhost_mem_region regions[0];
+};
+
 enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 
 #define ETH_VHOST_IFACE_ARG		"iface"
@@ -550,6 +565,66 @@  update_queuing_status(struct rte_eth_dev *dev)
 }
 
 static int
+share_device(int vid)
+{
+	uint32_t i, vring_num;
+	int len;
+	int fds[8];
+	struct rte_vhost_memory *mem;
+	struct vhost_params *params;
+	struct rte_vhost_vring vring;
+
+	/* share mem table */
+	if (rte_vhost_get_mem_table(vid, &mem) < 0) {
+		RTE_LOG(ERR, PMD, "Failed to get mem table\n");
+		return 0;
+	}
+	for (i = 0; i < mem->nregions; ++i)
+		fds[i] = mem->regions[i].fd;
+
+	len = sizeof(struct rte_vhost_mem_region) * mem->nregions;
+	params = malloc(sizeof(*params) + len);
+	if (params == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory\n");
+		return -1;
+	}
+
+	params->type = VHOST_MSG_TYPE_REGIONS;
+	params->vid = vid;
+	memcpy(params->regions, mem->regions, len);
+
+	if (rte_eal_mp_sendmsg("vhost pmd", params, sizeof(*params) + len,
+			       fds, mem->nregions) < 0) {
+		RTE_LOG(ERR, PMD, "Failed to share mem table\n");
+		free(params);
+		return -1;
+	}
+
+	/* share callfd and kickfd */
+	params->type = VHOST_MSG_TYPE_SET_FDS;
+	vring_num = rte_vhost_get_vring_num(vid);
+	for (i = 0; i < vring_num; i++) {
+		if (rte_vhost_get_vhost_vring(vid, i, &vring) < 0) {
+			RTE_LOG(ERR, PMD, "Failed to get vring, idx = %d\n", i);
+			free(params);
+			return -1;
+		}
+
+		params->vring_idx = i;
+		fds[0] = vring.callfd;
+		fds[1] = vring.kickfd;
+		if (rte_eal_mp_sendmsg("vhost pmd", params,
+				       sizeof(*params), fds, 2) < 0) {
+			RTE_LOG(ERR, PMD, "Failed to set fds\n");
+			return -1;
+		}
+	}
+
+	free(params);
+	return 0;
+}
+
+static int
 new_device(int vid)
 {
 	struct rte_eth_dev *eth_dev;
@@ -610,6 +685,8 @@  new_device(int vid)
 	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC,
 				      NULL, NULL);
 
+	share_device(vid);
+
 	return 0;
 }
 
@@ -1025,13 +1102,6 @@  eth_dev_vhost_create(struct rte_vdev_device *dev, char *iface_name,
 	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
 		numa_node);
 
-	/* now do all data allocation - for eth_dev structure and internal
-	 * (private) data
-	 */
-	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
-	if (data == NULL)
-		goto error;
-
 	list = rte_zmalloc_socket(name, sizeof(*list), 0, numa_node);
 	if (list == NULL)
 		goto error;
@@ -1073,11 +1143,7 @@  eth_dev_vhost_create(struct rte_vdev_device *dev, char *iface_name,
 	rte_spinlock_init(&vring_state->lock);
 	vring_states[eth_dev->data->port_id] = vring_state;
 
-	/* We'll replace the 'data' originally allocated by eth_dev. So the
-	 * vhost PMD resources won't be shared between multi processes.
-	 */
-	rte_memcpy(data, eth_dev->data, sizeof(*data));
-	eth_dev->data = data;
+	data = eth_dev->data;
 
 	data->nb_rx_queues = queues;
 	data->nb_tx_queues = queues;
@@ -1125,6 +1191,30 @@  eth_dev_vhost_create(struct rte_vdev_device *dev, char *iface_name,
 	return -1;
 }
 
+static int
+eth_dev_vhost_attach(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct rte_eth_dev_data *data = NULL;
+
+	RTE_LOG(INFO, PMD, "Attach vhost user port\n");
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_vdev_allocate(dev, sizeof(struct pmd_internal));
+	if (eth_dev == NULL)
+		return -1;
+
+	eth_dev->dev_ops = &ops;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	data = eth_dev->data;
+
+	return data->port_id;
+}
+
 static inline int
 open_iface(const char *key __rte_unused, const char *value, void *extra_args)
 {
@@ -1154,10 +1244,84 @@  open_int(const char *key __rte_unused, const char *value, void *extra_args)
 }
 
 static int
+vhost_pmd_action(const void *params, int len, int fds[], int fds_num)
+{
+	int i;
+	int vid;
+	void *base_addr;
+	const struct vhost_params *p = params;
+	const struct rte_vhost_mem_region *regions;
+
+	if (len < (int)sizeof(*p)) {
+		RTE_LOG(ERR, PMD, "message if too short\n");
+		return -1;
+	}
+
+	switch (p->type) {
+	case VHOST_MSG_TYPE_REGIONS:
+		regions = p->regions;
+		for (i = 0; i < fds_num; ++i) {
+			base_addr = mmap(regions[i].mmap_addr,
+					 regions[i].mmap_size,
+					 PROT_READ | PROT_WRITE,
+					 MAP_FIXED | MAP_SHARED, fds[i], 0);
+			if (base_addr != regions[i].mmap_addr) {
+				RTE_LOG(ERR, PMD,
+					"vhost in secondary mmap error: %s\n",
+					strerror(errno));
+				break;
+			}
+		}
+		break;
+	case VHOST_MSG_TYPE_SET_FDS:
+		rte_vhost_set_vring_effective_fd(p->vid,
+						 p->vring_idx,
+						 fds[0], fds[1]);
+		break;
+	case VHOST_MSG_TYPE_INIT:
+		vid = rte_eth_vhost_get_vid_from_port_id(p->portid);
+		share_device(vid);
+		break;
+	}
+
+	return 0;
+}
+
+static int
+probe_secondary(struct rte_vdev_device *dev)
+{
+	int portid = eth_dev_vhost_attach(dev);
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_params p;
+
+	if (portid < 0)
+		return -1;
+
+	eth_dev = &rte_eth_devices[portid];
+	internal = eth_dev->data->dev_private;
+
+	if (!internal ||
+	    rte_atomic32_read(&internal->dev_attached) == 0) {
+		RTE_LOG(INFO, PMD, "%s is not ready\n", dev->device.name);
+		return 0;
+	}
+
+	p.type = VHOST_MSG_TYPE_INIT;
+	p.portid = portid;
+	if (rte_eal_mp_sendmsg("vhost pmd", &p, sizeof(p), NULL, 0) < 0) {
+		RTE_LOG(ERR, PMD, "Failed to send request for init\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
 rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 {
 	struct rte_kvargs *kvlist = NULL;
-	int ret = 0;
+	int ret;
 	char *iface_name;
 	uint16_t queues;
 	uint64_t flags = 0;
@@ -1167,6 +1331,15 @@  rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n",
 		rte_vdev_device_name(dev));
 
+	ret = rte_eal_mp_action_register("vhost pmd", vhost_pmd_action);
+	if (ret < 0 && ret != -EEXIST) {
+		RTE_LOG(ERR, PMD, "vhost fails to add action\n");
+		return -1;
+	}
+
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
+		return probe_secondary(dev);
+
 	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
 	if (kvlist == NULL)
 		return -1;
@@ -1216,6 +1389,7 @@  rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 	eth_dev_vhost_create(dev, iface_name, queues, dev->device.numa_node,
 		flags);
 
+	ret = 0;
 out_free:
 	rte_kvargs_free(kvlist);
 	return ret;