[v7,3/3] net/tap: allow secondary process to access primary device queues

Message ID 1539766564-9433-3-git-send-email-rasland@mellanox.com (mailing list archive)
State Changes Requested, archived
Delegated to: Ferruh Yigit
Series [v7,1/3] net/tap: add queue and port ids in Rx/Tx queues structures

Checks

Context                 Check     Description
ci/Intel-compilation    success   Compilation OK
ci/checkpatch           success   coding style OK

Commit Message

Raslan Darawsheh Oct. 17, 2018, 8:56 a.m. UTC
When the device is created by the primary process,
the secondary process must request the queue file descriptors
in order to attach the queues.
The file descriptors are shared via the IPC Unix socket.

Thanks to the IPC synchronization, the secondary process
is now able to do Rx/Tx on a TAP created by the primary process.
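
For readers unfamiliar with sharing file descriptors over a Unix socket, the
following is a minimal sketch of the general POSIX mechanism (SCM_RIGHTS
ancillary data), not code from this patch. The helper names send_fd, recv_fd
and fd_passing_demo are hypothetical, and the demo runs inside a single
process over a socketpair, whereas the driver goes through the DPDK IPC API
between a primary and a secondary process.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one descriptor over a connected Unix socket. */
static int
send_fd(int sock, int fd)
{
	char byte = 'F';	/* one data byte accompanies the descriptor */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;	/* ask the kernel to translate the fd */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor, or return -1 on error. */
static int
recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Pass the write end of a pipe through a socketpair; return 0 on success. */
static int
fd_passing_demo(void)
{
	int sv[2], pfd[2], received;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(pfd) < 0)
		return -1;
	if (send_fd(sv[0], pfd[1]) < 0)
		return -1;
	received = recv_fd(sv[1]);
	if (received < 0)
		return -1;
	/* The received number differs, but it refers to the same pipe. */
	if (write(received, "x", 1) != 1 || read(pfd[0], &c, 1) != 1)
		return -1;
	close(received);
	close(pfd[0]);
	close(pfd[1]);
	close(sv[0]);
	close(sv[1]);
	return c == 'x' ? 0 : -1;
}
```

Calling fd_passing_demo() returns 0 when a byte written through the received
descriptor can be read back from the original pipe, which shows that the
kernel installed a working duplicate on the receiving side.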

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

---
    v2:
       - translate file descriptors via IPC API
       - add documentation
    v3:
       - rebase the commit
       - use private static array for fd's to be local for each process

    v4:
       - removed TODO and FIXME tags
       - used strlcpy instead of strcpy

    v5: rebase the commit on top of Alejandro Lucero's patch for the
        secondary process private pointer.
        http://patches.dpdk.org/patch/46185/

    v6: reword the commit log

    v7: rely on tap_device_count for registration
---
---
 doc/guides/nics/tap.rst                |  16 ++++
 doc/guides/rel_notes/release_18_11.rst |   5 ++
 drivers/net/tap/Makefile               |   1 +
 drivers/net/tap/meson.build            |   1 +
 drivers/net/tap/rte_eth_tap.c          | 150 ++++++++++++++++++++++++++++++++-
 5 files changed, 172 insertions(+), 1 deletion(-)
  

Comments

Ferruh Yigit Oct. 17, 2018, 12:06 p.m. UTC | #1
On 10/17/2018 9:56 AM, Raslan Darawsheh wrote:
> @@ -2082,6 +2214,16 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev)
>  	TAP_LOG(NOTICE, "Initializing pmd_tap for %s as %s",
>  		name, tap_name);
>  
> +	/* Register IPC feed callback */
> +	if (!tap_devices_count) {
> +		ret = rte_mp_action_register(TAP_MP_KEY, tap_mp_sync_queues);
> +		if (ret < 0) {
> +			TAP_LOG(ERR, "%s: Failed to register IPC callback: %s",
> +				tuntap_name, strerror(rte_errno));
> +			goto leave;
> +		}
> +	}
> +	tap_devices_count++;
>  	ret = eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac,
>  		ETH_TUNTAP_TYPE_TAP);
>  
> @@ -2089,6 +2231,9 @@ rte_pmd_tap_probe(struct rte_vdev_device *dev)
>  	if (ret == -1) {
>  		TAP_LOG(ERR, "Failed to create pmd for %s as %s",
>  			name, tap_name);
> +		if (!tap_devices_count)
> +			rte_mp_action_unregister(TAP_MP_KEY);
> +		tap_devices_count--;
>  		tap_unit--;		/* Restore the unit number */
>  	}
>  	rte_kvargs_free(kvlist);
The failure recovery path looks broken. It can be fixed as in [1] or [2], but
both require a new variable.
I double-checked the logic in the previous version of the patch, which relied on
EEXIST return values; that is also broken. The underlying problem is that in the
error recovery path we don't know whether we got there before or after
increasing dev_count, which is why a local variable is required.

If you can fix the error recovery path using EEXIST without needing a new
variable, I think that is better. If not, I suggest following [2], since
increasing dev_count only after the device has been created successfully makes
sense to me, but both work.

Thanks,
ferruh


[1]
         /* Register IPC feed callback */
         if (!tap_devices_count) {
                 ret = rte_mp_action_register(TAP_MP_KEY, tap_mp_sync_queues);
                 if (ret < 0) {
                         TAP_LOG(ERR, "%s: Failed to register IPC callback: %s",
                                 tuntap_name, strerror(rte_errno));
                         goto leave;
                 }
         }
         tap_devices_count++;
         tap_devices_count_increased = 1;
         ret = eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac,
                 ETH_TUNTAP_TYPE_TAP);

 leave:
         if (ret == -1) {
                 TAP_LOG(ERR, "Failed to create pmd for %s as %s",
                         name, tap_name);
                 if (tap_devices_count_increased == 1) {
                         if (tap_devices_count == 1)
                                 rte_mp_action_unregister(TAP_MP_KEY);
                         tap_devices_count--;
                 }
                 tap_unit--;             /* Restore the unit number */
         }
         rte_kvargs_free(kvlist);



[2]

         /* Register IPC feed callback */
         if (!tap_devices_count) {
                 ret = rte_mp_action_register(TAP_MP_KEY, tap_mp_sync_queues);
                 if (ret < 0) {
                         TAP_LOG(ERR, "%s: Failed to register IPC callback: %s",
                                 tuntap_name, strerror(rte_errno));
                         goto leave;
                 }
                 mp_action_registered = 1;
         }
         ret = eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac,
                 ETH_TUNTAP_TYPE_TAP);


 leave:
         if (ret == -1) {
                 TAP_LOG(ERR, "Failed to create pmd for %s as %s",
                         name, tap_name);
                 if (mp_action_registered == 1)
                         rte_mp_action_unregister(TAP_MP_KEY);
                 tap_unit--;             /* Restore the unit number */
         } else {
                 tap_devices_count++;
         }
         rte_kvargs_free(kvlist);
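
The bookkeeping in option [2] can be exercised outside DPDK with a small
stand-alone model. This is only a sketch under stated assumptions:
fake_register and fake_unregister are hypothetical stubs standing in for
rte_mp_action_register and rte_mp_action_unregister, create_ok stands in for
the result of eth_dev_tap_create, and remove_one mirrors the remove path of
the patch.

```c
#include <assert.h>
#include <stdbool.h>

static int devices_count;		/* models tap_devices_count */
static bool callback_registered;	/* models the IPC action registry */

/* Hypothetical stub: fails if the action is already registered. */
static int
fake_register(void)
{
	if (callback_registered)
		return -1;
	callback_registered = true;
	return 0;
}

/* Hypothetical stub: drops the registration. */
static void
fake_unregister(void)
{
	callback_registered = false;
}

/* Probe one device; create_ok simulates the device creation result. */
static int
probe_one(bool create_ok)
{
	bool registered_here = false;	/* the new local variable from [2] */
	int ret;

	if (devices_count == 0) {
		ret = fake_register();
		if (ret < 0)
			goto leave;
		registered_here = true;
	}
	ret = create_ok ? 0 : -1;

leave:
	if (ret < 0) {
		/* Undo only what this call did itself. */
		if (registered_here)
			fake_unregister();
	} else {
		/* Count the device only once it really exists. */
		devices_count++;
	}
	return ret;
}

/* Remove one device; the last one also drops the registration. */
static void
remove_one(void)
{
	assert(devices_count > 0);
	if (devices_count == 1)
		fake_unregister();
	devices_count--;
}
```

With this ordering the counter never goes negative, and a failed probe of a
second device leaves the registration made for the first device in place.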
  
Raslan Darawsheh Oct. 17, 2018, 2:46 p.m. UTC | #2
You are right about that; it is fixed in the new version.

Kindest regards,
Raslan Darawsheh

  

Patch

diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
index 2714868..9a3d7b3 100644
--- a/doc/guides/nics/tap.rst
+++ b/doc/guides/nics/tap.rst
@@ -152,6 +152,22 @@  Distribute IPv4 TCP packets using RSS to a given MAC address over queues 0-3::
    testpmd> flow create 0 priority 4 ingress pattern eth dst is 0a:0b:0c:0d:0e:0f \
             / ipv4 / tcp / end actions rss queues 0 1 2 3 end / end
 
+Multi-process sharing
+---------------------
+
+It is possible to attach an existing TAP device in a secondary process,
+by declaring it as a vdev with the same name as in the primary process,
+and without any parameter.
+
+The port attached in a secondary process will give access to the
+statistics and the queues.
+Therefore it can be used for monitoring or Rx/Tx processing.
+
+The IPC synchronization of Rx/Tx queues is currently limited:
+
+  - Maximum 8 queues shared
+  - Synchronized on probing, but not on later port update
+
 Example
 -------
 
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 2133a5b..3240b52 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -104,6 +104,11 @@  New Features
   the specified port. The port must be stopped before the command call in order
   to reconfigure queues.
 
+* **Added TAP Rx/Tx queues sharing with a secondary process.**
+
+  A secondary process can attach a TAP device created in the primary process,
+  probe the queues, and process Rx/Tx in a secondary process.
+
 
 API Changes
 -----------
diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
index 3243365..7748283 100644
--- a/drivers/net/tap/Makefile
+++ b/drivers/net/tap/Makefile
@@ -22,6 +22,7 @@  CFLAGS += -O3
 CFLAGS += -I$(SRCDIR)
 CFLAGS += -I.
 CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_hash
 LDLIBS += -lrte_bus_vdev -lrte_gso
diff --git a/drivers/net/tap/meson.build b/drivers/net/tap/meson.build
index 37f65b7..f7e8852 100644
--- a/drivers/net/tap/meson.build
+++ b/drivers/net/tap/meson.build
@@ -35,6 +35,7 @@  args = [
 	  'TCA_ACT_BPF_FD' ],
 ]
 config = configuration_data()
+allow_experimental_apis = true
 foreach arg:args
 	config.set(arg[0], cc.has_header_symbol(arg[1], arg[2]))
 endforeach
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 3372d54..cfb2648 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -16,6 +16,8 @@ 
 #include <rte_debug.h>
 #include <rte_ip.h>
 #include <rte_string_fns.h>
+#include <rte_ethdev.h>
+#include <rte_errno.h>
 
 #include <assert.h>
 #include <sys/types.h>
@@ -62,6 +64,10 @@ 
 #define TAP_GSO_MBUFS_NUM \
 	(TAP_GSO_MBUFS_PER_CORE * TAP_GSO_MBUF_CACHE_SIZE)
 
+/* IPC key for queue fds sync */
+#define TAP_MP_KEY "tap_mp_sync_queues"
+
+static int tap_devices_count;
 static struct rte_vdev_driver pmd_tap_drv;
 static struct rte_vdev_driver pmd_tun_drv;
 
@@ -100,6 +106,17 @@  enum ioctl_mode {
 	REMOTE_ONLY,
 };
 
+/* Message header to synchronize queues via IPC */
+struct ipc_queues {
+	char port_name[RTE_DEV_NAME_MAX_LEN];
+	int rxq_count;
+	int txq_count;
+	/*
+	 * The file descriptors are in the dedicated part
+	 * of the Unix message to be translated by the kernel.
+	 */
+};
+
 static int tap_intr_handle_set(struct rte_eth_dev *dev, int set);
 
 /**
@@ -2006,6 +2023,102 @@  rte_pmd_tun_probe(struct rte_vdev_device *dev)
 	return ret;
 }
 
+/* Request queue file descriptors from secondary to primary. */
+static int
+tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
+{
+	int ret;
+	struct timespec timeout = {.tv_sec = 1, .tv_nsec = 0};
+	struct rte_mp_msg request, *reply;
+	struct rte_mp_reply replies;
+	struct ipc_queues *request_param = (struct ipc_queues *)request.param;
+	struct ipc_queues *reply_param;
+	struct pmd_process_private *process_private = dev->process_private;
+	int queue, fd_iterator;
+
+	/* Prepare the request */
+	strlcpy(request.name, TAP_MP_KEY, sizeof(request.name));
+	strlcpy(request_param->port_name, port_name,
+		sizeof(request_param->port_name));
+	request.len_param = sizeof(*request_param);
+	/* Send request and receive reply */
+	ret = rte_mp_request_sync(&request, &replies, &timeout);
+	if (ret < 0) {
+		TAP_LOG(ERR, "Failed to request queues from primary: %d",
+			rte_errno);
+		return -1;
+	}
+	reply = &replies.msgs[0];
+	reply_param = (struct ipc_queues *)reply->param;
+	TAP_LOG(DEBUG, "Received IPC reply for %s", reply_param->port_name);
+
+	/* Attach the queues from received file descriptors */
+	dev->data->nb_rx_queues = reply_param->rxq_count;
+	dev->data->nb_tx_queues = reply_param->txq_count;
+	fd_iterator = 0;
+	for (queue = 0; queue < reply_param->rxq_count; queue++)
+		process_private->rxq_fds[queue] = reply->fds[fd_iterator++];
+	for (queue = 0; queue < reply_param->txq_count; queue++)
+		process_private->txq_fds[queue] = reply->fds[fd_iterator++];
+
+	return 0;
+}
+
+/* Send the queue file descriptors from the primary process to secondary. */
+static int
+tap_mp_sync_queues(const struct rte_mp_msg *request, const void *peer)
+{
+	struct rte_eth_dev *dev;
+	struct pmd_process_private *process_private;
+	struct rte_mp_msg reply;
+	const struct ipc_queues *request_param =
+		(const struct ipc_queues *)request->param;
+	struct ipc_queues *reply_param =
+		(struct ipc_queues *)reply.param;
+	uint16_t port_id;
+	int queue;
+	int ret;
+
+	/* Get requested port */
+	TAP_LOG(DEBUG, "Received IPC request for %s", request_param->port_name);
+	ret = rte_eth_dev_get_port_by_name(request_param->port_name, &port_id);
+	if (ret) {
+		TAP_LOG(ERR, "Failed to get port id for %s",
+			request_param->port_name);
+		return -1;
+	}
+	dev = &rte_eth_devices[port_id];
+	process_private = dev->process_private;
+
+	/* Fill file descriptors for all queues */
+	reply.num_fds = 0;
+	reply_param->rxq_count = 0;
+	for (queue = 0; queue < dev->data->nb_rx_queues; queue++) {
+		reply.fds[reply.num_fds++] = process_private->rxq_fds[queue];
+		reply_param->rxq_count++;
+	}
+	RTE_ASSERT(reply_param->rxq_count == dev->data->nb_rx_queues);
+	RTE_ASSERT(reply_param->txq_count == dev->data->nb_tx_queues);
+	RTE_ASSERT(reply.num_fds <= RTE_MP_MAX_FD_NUM);
+
+	reply_param->txq_count = 0;
+	for (queue = 0; queue < dev->data->nb_tx_queues; queue++) {
+		reply.fds[reply.num_fds++] = process_private->txq_fds[queue];
+		reply_param->txq_count++;
+	}
+
+	/* Send reply */
+	strlcpy(reply.name, request->name, sizeof(reply.name));
+	strlcpy(reply_param->port_name, request_param->port_name,
+		sizeof(reply_param->port_name));
+	reply.len_param = sizeof(*reply_param);
+	if (rte_mp_reply(&reply, peer) < 0) {
+		TAP_LOG(ERR, "Failed to reply an IPC request to sync queues");
+		return -1;
+	}
+	return 0;
+}
+
 /* Open a TAP interface device.
  */
 static int
@@ -2032,9 +2145,28 @@  rte_pmd_tap_probe(struct rte_vdev_device *dev)
 			TAP_LOG(ERR, "Failed to probe %s", name);
 			return -1;
 		}
-		/* TODO: request info from primary to set up Rx and Tx */
 		eth_dev->dev_ops = &ops;
 		eth_dev->device = &dev->device;
+		eth_dev->rx_pkt_burst = pmd_rx_burst;
+		eth_dev->tx_pkt_burst = pmd_tx_burst;
+		if (!rte_eal_primary_proc_alive(NULL)) {
+			TAP_LOG(ERR, "Primary process is missing");
+			return -1;
+		}
+		eth_dev->process_private = (struct pmd_process_private *)
+			rte_zmalloc_socket(name,
+				sizeof(struct pmd_process_private),
+				RTE_CACHE_LINE_SIZE,
+				eth_dev->device->numa_node);
+		if (eth_dev->process_private == NULL) {
+			TAP_LOG(ERR,
+				"Failed to alloc memory for process private");
+			return -1;
+		}
+
+		ret = tap_mp_attach_queues(name, eth_dev);
+		if (ret != 0)
+			return -1;
 		rte_eth_dev_probing_finish(eth_dev);
 		return 0;
 	}
@@ -2082,6 +2214,16 @@  rte_pmd_tap_probe(struct rte_vdev_device *dev)
 	TAP_LOG(NOTICE, "Initializing pmd_tap for %s as %s",
 		name, tap_name);
 
+	/* Register IPC feed callback */
+	if (!tap_devices_count) {
+		ret = rte_mp_action_register(TAP_MP_KEY, tap_mp_sync_queues);
+		if (ret < 0) {
+			TAP_LOG(ERR, "%s: Failed to register IPC callback: %s",
+				tuntap_name, strerror(rte_errno));
+			goto leave;
+		}
+	}
+	tap_devices_count++;
 	ret = eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac,
 		ETH_TUNTAP_TYPE_TAP);
 
@@ -2089,6 +2231,9 @@  rte_pmd_tap_probe(struct rte_vdev_device *dev)
 	if (ret == -1) {
 		TAP_LOG(ERR, "Failed to create pmd for %s as %s",
 			name, tap_name);
+		if (!tap_devices_count)
+			rte_mp_action_unregister(TAP_MP_KEY);
+		tap_devices_count--;
 		tap_unit--;		/* Restore the unit number */
 	}
 	rte_kvargs_free(kvlist);
@@ -2137,6 +2282,9 @@  rte_pmd_tap_remove(struct rte_vdev_device *dev)
 	close(internals->ioctl_sock);
 	rte_free(eth_dev->data->dev_private);
 	rte_free(eth_dev->process_private);
+	if (tap_devices_count == 1)
+		rte_mp_action_unregister(TAP_MP_KEY);
+	tap_devices_count--;
 	rte_eth_dev_release_port(eth_dev);
 
 	if (internals->ka_fd != -1) {
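
The reply layout used by tap_mp_sync_queues and tap_mp_attach_queues in the
patch above (Rx descriptors first, then Tx descriptors, in one flat array) can
be sketched in isolation as follows. This is an illustration, not the driver
code: struct fd_reply, pack_queue_fds, unpack_queue_fds and SKETCH_MAX_FDS are
hypothetical names, and the value 8 is an assumption taken from the "Maximum 8
queues shared" limit in the documentation hunk. Unlike the patch, where the
assertions sit between the two loops, the sketch checks the bound before
writing anything.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors the documented limit of 8 shared descriptors. */
#define SKETCH_MAX_FDS 8

/* Hypothetical stand-in for the fd part of an IPC reply. */
struct fd_reply {
	int fds[SKETCH_MAX_FDS];
	int num_fds;
	int rxq_count;
	int txq_count;
};

/* Pack Rx descriptors first, then Tx descriptors; -1 if they do not fit. */
static int
pack_queue_fds(struct fd_reply *reply, const int *rxq_fds, int nb_rx,
	       const int *txq_fds, int nb_tx)
{
	int q;

	if (nb_rx < 0 || nb_tx < 0 || nb_rx + nb_tx > SKETCH_MAX_FDS)
		return -1;
	reply->num_fds = 0;
	for (q = 0; q < nb_rx; q++)
		reply->fds[reply->num_fds++] = rxq_fds[q];
	for (q = 0; q < nb_tx; q++)
		reply->fds[reply->num_fds++] = txq_fds[q];
	reply->rxq_count = nb_rx;
	reply->txq_count = nb_tx;
	return 0;
}

/* Reverse of pack_queue_fds; -1 if the counts are inconsistent. */
static int
unpack_queue_fds(const struct fd_reply *reply, int *rxq_fds, int *txq_fds)
{
	int q, it = 0;

	if (reply->rxq_count < 0 || reply->txq_count < 0 ||
	    reply->rxq_count + reply->txq_count != reply->num_fds ||
	    reply->num_fds > SKETCH_MAX_FDS)
		return -1;
	for (q = 0; q < reply->rxq_count; q++)
		rxq_fds[q] = reply->fds[it++];
	for (q = 0; q < reply->txq_count; q++)
		txq_fds[q] = reply->fds[it++];
	return 0;
}
```

Because both sides walk the array in the same order, the two counts are enough
for the receiver to split the descriptors back into Rx and Tx queues.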