[dpdk-dev,RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver.

Message ID 1495986280-26207-1-git-send-email-jia.guo@intel.com (mailing list archive)
State Superseded, archived
Headers

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation fail Compilation issues

Commit Message

Guo, Jia May 28, 2017, 3:44 p.m. UTC
For HW hotplug feature, we had already got upstream that removal event adding from 6wind as bellow.

dependency of “add device removal event” patch set:
http://dpdk.org/dev/patchwork/patch/23693/
[dpdk-dev,v2,1/5] ethdev: introduce device removal event
http://dpdk.org/dev/patchwork/patch/23694/
[dpdk-dev,v2,2/5] net/mlx4: device removal event support
http://dpdk.org/dev/patchwork/patch/23695/
[dpdk-dev,v2,3/5] app/testpmd: generic event handler
http://dpdk.org/dev/patchwork/patch/23696/
[dpdk-dev,v2,4/5] app/testpmd: request link status interrupt
http://dpdk.org/dev/patchwork/patch/23697/
[dpdk-dev,v2,5/5] app/testpmd: request device removal interrupt

From the patches, we can see a new event type “RTE_ETH_DEV_INTR_RMV” has been added into the ethdev, and the event has been implemented in mlx4 driver, and Testpmd be use for testing purposes and as a practical usage example for how to use these event. The progress is use the mlx4 driver to register interrupt callback function to rte eal interrupt source, and when rte epolling detect the IBV_EVENT_DEVICE_FATAL , which is identify the device remove behavior, it will callback into the driver’s interrupt handle to handle it, and then callback to the user app, such as testpmd, to detach the pci device.

So far, except the mlx4 driver, other driver like i40, that not have the remove interrupt from hw, will not be able to monitoring the remove behavior, so in order to expand the hot plug feature for all driver cases, something must be done ot detect the remove event at the kernel level and offer a new line of interrupt to the userland. The idea is coming as below.

Use I40e as example, we know that the rmv interrupt is not added in hw, but we could monitor the uio file descriptor to catch the device remove event as bellow.

The info of uevent form FD monitoring:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

Firstly, in order to monitor the uio file descriptor, we need to create socket to monitor the uio fd, that is defined as “hotplug_fd”, and then add the uio fd into the epoll fd list, rte eal could epoll all of the interrupt event from hw interrupt and also include the uevent from hotplug_fd.

Secondly, in order to read out the uevent that monitoring, we need to add uevent API in rte layer. We plan add 2 , rte_uevent_connect and  rte_get_uevent. All driver interrupt handler could use these API to enable the uevent monitoring, and read out the uevent type , then corresponding to handle these uevent, such as detach the device when get the remove type.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c                     |  15 +++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 146 ++++++++++++++++++++-
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  32 +++++
 3 files changed, 191 insertions(+), 2 deletions(-)
  

Comments

Gaëtan Rivet May 30, 2017, 7:14 a.m. UTC | #1
Hi Jeff,

A few comments below:

On Sun, May 28, 2017 at 11:44:40PM +0800, Jeff Guo wrote:
>For HW hotplug feature, we had already got upstream that removal event adding from 6wind as bellow.
>
>dependency of “add device removal event” patch set:
>http://dpdk.org/dev/patchwork/patch/23693/
>[dpdk-dev,v2,1/5] ethdev: introduce device removal event
>http://dpdk.org/dev/patchwork/patch/23694/
>[dpdk-dev,v2,2/5] net/mlx4: device removal event support
>http://dpdk.org/dev/patchwork/patch/23695/
>[dpdk-dev,v2,3/5] app/testpmd: generic event handler
>http://dpdk.org/dev/patchwork/patch/23696/
>[dpdk-dev,v2,4/5] app/testpmd: request link status interrupt
>http://dpdk.org/dev/patchwork/patch/23697/
>[dpdk-dev,v2,5/5] app/testpmd: request device removal interrupt
>
>From the patches, we can see a new event type “RTE_ETH_DEV_INTR_RMV” has been added into the ethdev, and the event has been implemented in mlx4 driver, and Testpmd be use for testing purposes and as a practical usage example for how to use these event. The progress is use the mlx4 driver to register interrupt callback function to rte eal interrupt source, and when rte epolling detect the IBV_EVENT_DEVICE_FATAL , which is identify the device remove behavior, it will callback into the driver’s interrupt handle to handle it, and then callback to the user app, such as testpmd, to detach the pci device.
>
>So far, except the mlx4 driver, other driver like i40, that not have the remove interrupt from hw, will not be able to monitoring the remove behavior, so in order to expand the hot plug feature for all driver cases, something must be done ot detect the remove event at the kernel level and offer a new line of interrupt to the userland. The idea is coming as below.
>

It's nice that this event is extended to other drivers.

>Use I40e as example, we know that the rmv interrupt is not added in hw, but we could monitor the uio file descriptor to catch the device remove event as bellow.
>
>The info of uevent form FD monitoring:
>remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
>ACTION=remove
>DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
>SUBSYSTEM=uio
>MAJOR=243
>MINOR=2
>DEVNAME=uio2
>SEQNUM=11366
>
>Firstly, in order to monitor the uio file descriptor, we need to create socket to monitor the uio fd, that is defined as “hotplug_fd”, and then add the uio fd into the epoll fd list, rte eal could epoll all of the interrupt event from hw interrupt and also include the uevent from hotplug_fd.
>
>Secondly, in order to read out the uevent that monitoring, we need to add uevent API in rte layer. We plan add 2 , rte_uevent_connect and  rte_get_uevent. All driver interrupt handler could use these API to enable the uevent monitoring, and read out the uevent type , then corresponding to handle these uevent, such as detach the device when get the remove type.
>

I find having a generic uevent API interesting.

However, all specifics pertaining to UIO use (hotplug_fd, subsystem
enum) should stay in UIO specific code (eal_pci_uio.c?).

I am currently moving the PCI bus out of the EAL. EAL subsystems should
not rely on PCI specifics, as they won't be available afterward.

It should also allow you to clean up your API. Exposing hotplug_fd and
requiring PMDs to link it can be avoided and should result in a simpler
API.

>Signed-off-by: Jeff Guo <jia.guo@intel.com>
>---
> drivers/net/i40e/i40e_ethdev.c                     |  15 +++
> lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 146 ++++++++++++++++++++-
> .../linuxapp/eal/include/exec-env/rte_interrupts.h |  32 +++++
> 3 files changed, 191 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
>index 4c49673..0336f82 100644
>--- a/drivers/net/i40e/i40e_ethdev.c
>+++ b/drivers/net/i40e/i40e_ethdev.c
>@@ -66,6 +66,8 @@
> #include "i40e_pf.h"
> #include "i40e_regs.h"
>
>+extern int hotplug_fd;
>+
> #define ETH_I40E_FLOATING_VEB_ARG	"enable_floating_veb"
> #define ETH_I40E_FLOATING_VEB_LIST_ARG	"floating_veb_list"
>
>@@ -5808,8 +5810,21 @@ i40e_dev_interrupt_handler(void *param)
> {
> 	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
> 	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>+	struct rte_uevent event;
> 	uint32_t icr0;
>
>+	/* check device  uevent */
>+	while (rte_get_uevent(hotplug_fd, &event) > 0) {
>+		if (event.subsystem == 1) {
>+			if (event.action == RTE_UEVENT_ADD) {
>+				//do nothing here
>+			} else if (event.action == RTE_UEVENT_REMOVE) {
>+				_rte_eth_dev_callback_process(dev,
>+					RTE_ETH_EVENT_INTR_RMV, NULL);
>+			}
>+		}
>+	}
>+
> 	/* Disable interrupt */
> 	i40e_pf_disable_irq0(hw);
>
>diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
>index 2e3bd12..873ab5f 100644
>--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
>+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
>@@ -65,6 +65,10 @@
> #include <rte_errno.h>
> #include <rte_spinlock.h>
>
>+#include <sys/socket.h>
>+#include <linux/netlink.h>
>+#include <sys/epoll.h>
>+
> #include "eal_private.h"
> #include "eal_vfio.h"
> #include "eal_thread.h"
>@@ -74,6 +78,11 @@
>
> static RTE_DEFINE_PER_LCORE(int, _epfd) = -1; /**< epoll fd per thread */
>
>+#define RTE_UEVENT_MSG_LEN 4096
>+#define RTE_UEVENT_SUBSYSTEM_UIO 1
>+
>+int hotplug_fd = -1;
>+
> /**
>  * union for pipe fds.
>  */
>@@ -669,10 +678,13 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
> 			RTE_SET_USED(r);
> 			return -1;
> 		}
>+
> 		rte_spinlock_lock(&intr_lock);
> 		TAILQ_FOREACH(src, &intr_sources, next)
>-			if (src->intr_handle.fd ==
>-					events[n].data.fd)
>+			if ((src->intr_handle.fd ==
>+					events[n].data.fd) ||
>+				(hotplug_fd ==
>+					events[n].data.fd))
> 				break;
> 		if (src == NULL){
> 			rte_spinlock_unlock(&intr_lock);
>@@ -858,7 +870,24 @@ eal_intr_thread_main(__rte_unused void *arg)
> 			}
> 			else
> 				numfds++;
>+
>+			/**
>+			 * add device uevent file descriptor
>+			 * into wait list for hot plug.
>+			 */
>+			ev.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
>+			ev.data.fd = hotplug_fd;
>+			if (epoll_ctl(pfd, EPOLL_CTL_ADD,
>+					hotplug_fd, &ev) < 0){
>+				rte_panic("Error adding hotplug_fd %d epoll_ctl, %s\n",
>+					hotplug_fd, strerror(errno));
>+			}
>+			else
>+				numfds++;
>+
> 		}
>+
>+
> 		rte_spinlock_unlock(&intr_lock);
> 		/* serve the interrupt */
> 		eal_intr_handle_interrupts(pfd, numfds);
>@@ -877,6 +906,9 @@ rte_eal_intr_init(void)
> 	int ret = 0, ret_1 = 0;
> 	char thread_name[RTE_MAX_THREAD_NAME_LEN];
>
>+	/* connect to monitor device uevent  */
>+	rte_uevent_connect();
>+
> 	/* init the global interrupt source head */
> 	TAILQ_INIT(&intr_sources);
>
>@@ -1255,3 +1287,113 @@ rte_intr_cap_multiple(struct rte_intr_handle *intr_handle)
>
> 	return 0;
> }
>+
>+int
>+rte_uevent_connect(void)
>+{
>+	struct sockaddr_nl addr;
>+	int ret;
>+	int netlink_fd;
>+	int size = 64 * 1024;
>+	int nonblock = 1;
>+	memset(&addr, 0, sizeof(addr));
>+	addr.nl_family = AF_NETLINK;
>+	addr.nl_pid = getpid();
>+	addr.nl_groups = 0xffffffff;
>+
>+	netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
>+	if (netlink_fd < 0)
>+		return -1;
>+
>+	setsockopt(netlink_fd, SOL_SOCKET, SO_RCVBUFFORCE, &size, sizeof(size));
>+
>+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
>+	if (ret != 0) {
>+		RTE_LOG(ERR, EAL,
>+		"ioctl(FIONBIO) failed\n");
>+		close(netlink_fd);
>+		return -1;
>+	}
>+
>+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
>+		close(netlink_fd);
>+		return -1;
>+	}
>+
>+	hotplug_fd = netlink_fd;
>+
>+	return netlink_fd;
>+}
>+
>+static int
>+parse_event(const char *buf, struct rte_uevent *event)
>+{
>+	char action[RTE_UEVENT_MSG_LEN];
>+	char subsystem[RTE_UEVENT_MSG_LEN];
>+	char dev_path[RTE_UEVENT_MSG_LEN];
>+
>+	memset(action, 0, RTE_UEVENT_MSG_LEN);
>+	memset(subsystem, 0, RTE_UEVENT_MSG_LEN);
>+	memset(dev_path, 0, RTE_UEVENT_MSG_LEN);
>+
>+	while (*buf) {
>+		if (!strncmp(buf, "ACTION=", 7)) {
>+			buf += 7;
>+			snprintf(action, sizeof(action), "%s", buf);
>+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
>+			buf += 8;
>+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
>+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
>+			buf += 10;
>+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
>+		}
>+		while (*buf++)
>+			;
>+	}
>+
>+	if (!strncmp(subsystem, "uio", 3)) {
>+
>+		event->subsystem = RTE_UEVENT_SUBSYSTEM_UIO;
>+		if (!strncmp(action, "add", 3)) {
>+			event->action = RTE_UEVENT_ADD;
>+		}
>+		if (!strncmp(action, "remove", 6)) {
>+			event->action = RTE_UEVENT_REMOVE;
>+		}
>+		return 1;
>+	}
>+
>+	return -1;
>+}
>+
>+int
>+rte_get_uevent(int fd, struct rte_uevent *uevent)
>+{
>+	int ret;
>+	char buf[RTE_UEVENT_MSG_LEN];
>+
>+	memset(uevent, 0, sizeof(struct rte_uevent));
>+	memset(buf, 0, RTE_UEVENT_MSG_LEN);
>+
>+	ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
>+	if (ret > 0) {
>+		return parse_event(buf, uevent);
>+	}
>+
>+	if (ret < 0) {
>+		if (errno == EAGAIN || errno == EWOULDBLOCK) {
>+			return 0;
>+		} else {
>+			RTE_LOG(ERR, EAL,
>+			"Socket read error(%d): %s\n",
>+			errno, strerror(errno));
>+		}
>+	}
>+
>+	/* connection closed */
>+	if (ret == 0) {
>+		return -1;
>+	}
>+
>+	return 0;
>+}
>diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
>index 6daffeb..d32ba01 100644
>--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
>+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
>@@ -99,6 +99,16 @@ struct rte_intr_handle {
> 	int *intr_vec;                 /**< intr vector number array */
> };
>
>+enum rte_uevent_action {
>+	RTE_UEVENT_ADD = 0,		/**< uevent type of device add */
>+	RTE_UEVENT_REMOVE = 1,	/**< uevent type of device remove*/
>+};
>+
>+struct rte_uevent {
>+	enum rte_uevent_action action;	/**< uevent action type */
>+	int subsystem;				/**< subsystem id */
>+};
>+
> #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
>
> /**
>@@ -236,4 +246,26 @@ rte_intr_allow_others(struct rte_intr_handle *intr_handle);
> int
> rte_intr_cap_multiple(struct rte_intr_handle *intr_handle);
>
>+/**
>+ * It read out the uevent from the specific file descriptor.
>+ *
>+ * @param fd
>+ *   The fd which the uevent  associated to
>+ * @param uevent
>+ *   Pointer to the uevent which read from the monitoring fd.
>+ * @return
>+ *   - On success, one.
>+ *   - On failure, zeor or a negative value.
>+ */
>+int
>+rte_get_uevent(int fd, struct rte_uevent *uevent);
>+
>+/**
>+ * Connect to the device uevent file descriptor.
>+ * @return
>+ *   return the connected uevent fd.
>+ */
>+int
>+rte_uevent_connect(void);
>+
> #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
>-- 
>2.7.4
>
  
Jingjing Wu June 7, 2017, 7:27 a.m. UTC | #2
> -----Original Message-----

> From: Guo, Jia

> Sent: Sunday, May 28, 2017 11:45 PM

> To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Richardson,

> Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>;

> Liu, Yuanhan <yuanhan.liu@intel.com>; gaetan.rivet@6wind.com

> Cc: dev@dpdk.org; Guo, Jia <jia.guo@intel.com>

> Subject: [dpdk-dev] [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e

> driver.

> 

> For HW hotplug feature, we had already got upstream that removal event adding from 6wind

> as bellow.

> 

> dependency of “add device removal event” patch set:

> http://dpdk.org/dev/patchwork/patch/23693/

> [dpdk-dev,v2,1/5] ethdev: introduce device removal event

> http://dpdk.org/dev/patchwork/patch/23694/

> [dpdk-dev,v2,2/5] net/mlx4: device removal event support

> http://dpdk.org/dev/patchwork/patch/23695/

> [dpdk-dev,v2,3/5] app/testpmd: generic event handler

> http://dpdk.org/dev/patchwork/patch/23696/

> [dpdk-dev,v2,4/5] app/testpmd: request link status interrupt

> http://dpdk.org/dev/patchwork/patch/23697/

> [dpdk-dev,v2,5/5] app/testpmd: request device removal interrupt

> 

> From the patches, we can see a new event type “RTE_ETH_DEV_INTR_RMV” has been added

> into the ethdev, and the event has been implemented in mlx4 driver, and Testpmd be use for

> testing purposes and as a practical usage example for how to use these event. The progress is

> use the mlx4 driver to register interrupt callback function to rte eal interrupt source, and

> when rte epolling detect the IBV_EVENT_DEVICE_FATAL , which is identify the device remove

> behavior, it will callback into the driver’s interrupt handle to handle it, and then callback to

> the user app, such as testpmd, to detach the pci device.

> 

> So far, except the mlx4 driver, other driver like i40, that not have the remove interrupt from

> hw, will not be able to monitoring the remove behavior, so in order to expand the hot plug

> feature for all driver cases, something must be done ot detect the remove event at the kernel

> level and offer a new line of interrupt to the userland. The idea is coming as below.

> 

> Use I40e as example, we know that the rmv interrupt is not added in hw, but we could

> monitor the uio file descriptor to catch the device remove event as bellow.

> 

> The info of uevent form FD monitoring:

> remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio

> /uio2

> ACTION=remove

> DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/ui

> o/uio2

> SUBSYSTEM=uio

> MAJOR=243

> MINOR=2

> DEVNAME=uio2

> SEQNUM=11366

> 

> Firstly, in order to monitor the uio file descriptor, we need to create socket to monitor the uio

> fd, that is defined as “hotplug_fd”, and then add the uio fd into the epoll fd list, rte eal could

> epoll all of the interrupt event from hw interrupt and also include the uevent from

> hotplug_fd.

> 

> Secondly, in order to read out the uevent that monitoring, we need to add uevent API in rte

> layer. We plan add 2 , rte_uevent_connect and  rte_get_uevent. All driver interrupt handler

> could use these API to enable the uevent monitoring, and read out the uevent type , then

> corresponding to handle these uevent, such as detach the device when get the remove type.

> 

> Signed-off-by: Jeff Guo <jia.guo@intel.com>

> ---

>  drivers/net/i40e/i40e_ethdev.c                     |  15 +++

>  lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 146 ++++++++++++++++++++-

>  .../linuxapp/eal/include/exec-env/rte_interrupts.h |  32 +++++

>  3 files changed, 191 insertions(+), 2 deletions(-)

>

It will be better to split the patch to two sub patches, one is for eal change, the other is for driver enabling.

> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c

> index 4c49673..0336f82 100644

> --- a/drivers/net/i40e/i40e_ethdev.c

> +++ b/drivers/net/i40e/i40e_ethdev.c

> @@ -66,6 +66,8 @@

>  #include "i40e_pf.h"

>  #include "i40e_regs.h"

> 

> +extern int hotplug_fd;

> +

>  #define ETH_I40E_FLOATING_VEB_ARG	"enable_floating_veb"

>  #define ETH_I40E_FLOATING_VEB_LIST_ARG	"floating_veb_list"

> 

> @@ -5808,8 +5810,21 @@ i40e_dev_interrupt_handler(void *param)

>  {

>  	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;

>  	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);

> +	struct rte_uevent event;

>  	uint32_t icr0;

> 

> +	/* check device  uevent */

> +	while (rte_get_uevent(hotplug_fd, &event) > 0) {

The hotplug_fd is used as uevent fd for all devices, right? but in i40e driver, how distinguish the event is to this dev?
I saw you check the src in eal_intr_process_interrupts, but you didn't assign the fd in i40e device's intr_handle.
Is that to say all driver's callback will be triggered?

> +		if (event.subsystem == 1) {

What is the 1 meaning? 
> +			if (event.action == RTE_UEVENT_ADD) {

> +				//do nothing here

Will you plan do anything later? Such as RTE_ETH_EVENT_INTR_NEW? If no, please remove it.
And {} can be omit.

> +			} else if (event.action == RTE_UEVENT_REMOVE) {

> +				_rte_eth_dev_callback_process(dev,

> +					RTE_ETH_EVENT_INTR_RMV, NULL);

> +			}

> +		}



> +

> +	/* connection closed */

> +	if (ret == 0) {

> +		return -1;

> +	}

{} can be omit. Serval in this patch. Please check.


> +/**

> + * It read out the uevent from the specific file descriptor.

> + *

> + * @param fd

> + *   The fd which the uevent  associated to

> + * @param uevent

> + *   Pointer to the uevent which read from the monitoring fd.

> + * @return

> + *   - On success, one.

> + *   - On failure, zeor or a negative value.

Zeor -> zero
Generally speaking, we are using negative value to indicate failure, and 0 indicate success. Expect the result has more than two options (success, failure).

> +int

> +rte_get_uevent(int fd, struct rte_uevent *uevent);

> +

> +/**

> + * Connect to the device uevent file descriptor.

> + * @return

> + *   return the connected uevent fd.

> + */

Any return code for failure?

Thanks
Jingjing
  
Jingjing Wu June 7, 2017, 7:40 a.m. UTC | #3
> >

> >Secondly, in order to read out the uevent that monitoring, we need to add uevent API in rte

> layer. We plan add 2 , rte_uevent_connect and  rte_get_uevent. All driver interrupt handler

> could use these API to enable the uevent monitoring, and read out the uevent type , then

> corresponding to handle these uevent, such as detach the device when get the remove type.

> >

> 

> I find having a generic uevent API interesting.

> 

> However, all specifics pertaining to UIO use (hotplug_fd, subsystem

> enum) should stay in UIO specific code (eal_pci_uio.c?).

>

Yes, but it can be also considered as interrupt mechanism, right?

> I am currently moving the PCI bus out of the EAL. EAL subsystems should

> not rely on PCI specifics, as they won't be available afterward.


Will the interrupt handling be kept in EAL, right?

> It should also allow you to clean up your API. Exposing hotplug_fd and

> requiring PMDs to link it can be avoided and should result in a simpler

> API.

Didn't get the idea. Why it will result in a simpler API? Is there any patch help
Me to understand?

Thanks
Jingjing
  
Gaëtan Rivet June 15, 2017, 9:22 p.m. UTC | #4
Hi Jingjing,

On Wed, Jun 07, 2017 at 07:40:37AM +0000, Wu, Jingjing wrote:
> > >
> > >Secondly, in order to read out the uevent that monitoring, we need to add uevent API in rte
> > layer. We plan add 2 , rte_uevent_connect and  rte_get_uevent. All driver interrupt handler
> > could use these API to enable the uevent monitoring, and read out the uevent type , then
> > corresponding to handle these uevent, such as detach the device when get the remove type.
> > >
> > 
> > I find having a generic uevent API interesting.
> > 
> > However, all specifics pertaining to UIO use (hotplug_fd, subsystem
> > enum) should stay in UIO specific code (eal_pci_uio.c?).
> >
> Yes, but it can be also considered as interrupt mechanism, right?
> 

Sure.

> > I am currently moving the PCI bus out of the EAL. EAL subsystems should
> > not rely on PCI specifics, as they won't be available afterward.
> 
> Will the interrupt handling be kept in EAL, right?
> 

Ah yes, I was actually mistaken and thought more UIO parts would be
moving.

> > It should also allow you to clean up your API. Exposing hotplug_fd and
> > requiring PMDs to link it can be avoided and should result in a simpler
> > API.
> Didn't get the idea. Why it will result in a simpler API? Is there any patch help
> Me to understand?
> 

How do you demux the hotplug_fd for several drivers / device?
  
Guo, Jia June 21, 2017, 2:50 a.m. UTC | #5
hi,gaetan


On 6/16/2017 5:22 AM, Gaëtan Rivet wrote:
> Hi Jingjing,
>
> On Wed, Jun 07, 2017 at 07:40:37AM +0000, Wu, Jingjing wrote:
>>>> Secondly, in order to read out the uevent that monitoring, we need to add uevent API in rte
>>> layer. We plan add 2 , rte_uevent_connect and  rte_get_uevent. All driver interrupt handler
>>> could use these API to enable the uevent monitoring, and read out the uevent type , then
>>> corresponding to handle these uevent, such as detach the device when get the remove type.
>>> I find having a generic uevent API interesting.
>>>
>>> However, all specifics pertaining to UIO use (hotplug_fd, subsystem
>>> enum) should stay in UIO specific code (eal_pci_uio.c?).
>>>
>> Yes, but it can be also considered as interrupt mechanism, right?
>>
> Sure.
>
>>> I am currently moving the PCI bus out of the EAL. EAL subsystems should
>>> not rely on PCI specifics, as they won't be available afterward.
>> Will the interrupt handling be kept in EAL, right?
>>
> Ah yes, I was actually mistaken and thought more UIO parts would be
> moving.
so , i assumption that the interrupt handling still be kept in EAL, so 
that would not affect my adding the uevent in the eal interrupt part, 
right? so if it have any other dependency, please shout to let me know.
>>> It should also allow you to clean up your API. Exposing hotplug_fd and
>>> requiring PMDs to link it can be avoided and should result in a simpler
>>> API.
>> Didn't get the idea. Why it will result in a simpler API? Is there any patch help
>> Me to understand?
>>
> How do you demux the hotplug_fd for several drivers / device?
it is related with the dual port/device/driver problem, i think what 
should be some mapping of the dev_path there,  i will refine the part to 
handle it  in v2.  Thanks.
  
Stephen Hemminger June 29, 2017, 5:27 p.m. UTC | #6
On Wed, 7 Jun 2017 07:40:37 +0000
"Wu, Jingjing" <jingjing.wu@intel.com> wrote:

> > >
> > >Secondly, in order to read out the uevent that monitoring, we need to add uevent API in rte  
> > layer. We plan add 2 , rte_uevent_connect and  rte_get_uevent. All driver interrupt handler
> > could use these API to enable the uevent monitoring, and read out the uevent type , then
> > corresponding to handle these uevent, such as detach the device when get the remove type.  
> > >  
> > 
> > I find having a generic uevent API interesting.
> > 
> > However, all specifics pertaining to UIO use (hotplug_fd, subsystem
> > enum) should stay in UIO specific code (eal_pci_uio.c?).
> >  
> Yes, but it can be also considered as interrupt mechanism, right?
> 
> > I am currently moving the PCI bus out of the EAL. EAL subsystems should
> > not rely on PCI specifics, as they won't be available afterward.  
> 
> Will the interrupt handling be kept in EAL, right?
> 
> > It should also allow you to clean up your API. Exposing hotplug_fd and
> > requiring PMDs to link it can be avoided and should result in a simpler
> > API.  

You were right given the current model this is the correct way to do it.
It would be good if the interrupt model stuff could be moved back into EAL
so that if device is removed, no code in driver needs to be added.

All the bus -> device -> interrupt state is visible, and EAL should
be able to unwind from there.  Thinking more of the Linux model where
there is no need  (in general) for hot plug specific code in each driver.
  
Jingjing Wu June 30, 2017, 3:36 a.m. UTC | #7
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, June 30, 2017 1:28 AM
> To: Wu, Jingjing <jingjing.wu@intel.com>
> Cc: Gaëtan Rivet <gaetan.rivet@6wind.com>; Guo, Jia <jia.guo@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Liu, Yuanhan <yuanhan.liu@intel.com>;
> dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC] Add hot plug event in rte eal interrupt and
> inplement it in i40e driver.
> 
> On Wed, 7 Jun 2017 07:40:37 +0000
> "Wu, Jingjing" <jingjing.wu@intel.com> wrote:
> 
> > > >
> > > >Secondly, in order to read out the uevent that monitoring, we need
> > > >to add uevent API in rte
> > > layer. We plan add 2 , rte_uevent_connect and  rte_get_uevent. All
> > > driver interrupt handler could use these API to enable the uevent
> > > monitoring, and read out the uevent type , then corresponding to handle
> these uevent, such as detach the device when get the remove type.
> > > >
> > >
> > > I find having a generic uevent API interesting.
> > >
> > > However, all specifics pertaining to UIO use (hotplug_fd, subsystem
> > > enum) should stay in UIO specific code (eal_pci_uio.c?).
> > >
> > Yes, but it can be also considered as interrupt mechanism, right?
> >
> > > I am currently moving the PCI bus out of the EAL. EAL subsystems
> > > should not rely on PCI specifics, as they won't be available afterward.
> >
> > Will the interrupt handling be kept in EAL, right?
> >
> > > It should also allow you to clean up your API. Exposing hotplug_fd
> > > and requiring PMDs to link it can be avoided and should result in a
> > > simpler API.
> 
> You were right given the current model this is the correct way to do it.
Does it mean this way in this RFC is reasonable given the current model?

> It would be good if the interrupt model stuff could be moved back into EAL so
> that if device is removed, no code in driver needs to be added.
> 
At list we still need expose some interrupt/event by drivers. Such as LSC, Rx interrupt.

> All the bus -> device -> interrupt state is visible, and EAL should be able to
> unwind from there.  Thinking more of the Linux model where there is no need
> (in general) for hot plug specific code in each driver.

Yes, such simpler API is what I like to see too. But now, the remove event report
by this way is more economical.


Thanks
Jingjing
  

Patch

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 4c49673..0336f82 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -66,6 +66,8 @@ 
 #include "i40e_pf.h"
 #include "i40e_regs.h"
 
+extern int hotplug_fd;
+
 #define ETH_I40E_FLOATING_VEB_ARG	"enable_floating_veb"
 #define ETH_I40E_FLOATING_VEB_LIST_ARG	"floating_veb_list"
 
@@ -5808,8 +5810,21 @@  i40e_dev_interrupt_handler(void *param)
 {
 	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
 	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct rte_uevent event;
 	uint32_t icr0;
 
+	/* check device  uevent */
+	while (rte_get_uevent(hotplug_fd, &event) > 0) {
+		if (event.subsystem == 1) {
+			if (event.action == RTE_UEVENT_ADD) {
+				//do nothing here
+			} else if (event.action == RTE_UEVENT_REMOVE) {
+				_rte_eth_dev_callback_process(dev,
+					RTE_ETH_EVENT_INTR_RMV, NULL);
+			}
+		}
+	}
+
 	/* Disable interrupt */
 	i40e_pf_disable_irq0(hw);
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 2e3bd12..873ab5f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -65,6 +65,10 @@ 
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+
 #include "eal_private.h"
 #include "eal_vfio.h"
 #include "eal_thread.h"
@@ -74,6 +78,11 @@ 
 
 static RTE_DEFINE_PER_LCORE(int, _epfd) = -1; /**< epoll fd per thread */
 
+#define RTE_UEVENT_MSG_LEN 4096
+#define RTE_UEVENT_SUBSYSTEM_UIO 1
+
+int hotplug_fd = -1;
+
 /**
  * union for pipe fds.
  */
@@ -669,10 +678,13 @@  eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			RTE_SET_USED(r);
 			return -1;
 		}
+
 		rte_spinlock_lock(&intr_lock);
 		TAILQ_FOREACH(src, &intr_sources, next)
-			if (src->intr_handle.fd ==
-					events[n].data.fd)
+			if ((src->intr_handle.fd ==
+					events[n].data.fd) ||
+				(hotplug_fd ==
+					events[n].data.fd))
 				break;
 		if (src == NULL){
 			rte_spinlock_unlock(&intr_lock);
@@ -858,7 +870,24 @@  eal_intr_thread_main(__rte_unused void *arg)
 			}
 			else
 				numfds++;
+
+			/**
+			 * add device uevent file descriptor
+			 * into wait list for hot plug.
+			 */
+			ev.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+			ev.data.fd = hotplug_fd;
+			if (epoll_ctl(pfd, EPOLL_CTL_ADD,
+					hotplug_fd, &ev) < 0){
+				rte_panic("Error adding hotplug_fd %d epoll_ctl, %s\n",
+					hotplug_fd, strerror(errno));
+			}
+			else
+				numfds++;
+
 		}
+
+
 		rte_spinlock_unlock(&intr_lock);
 		/* serve the interrupt */
 		eal_intr_handle_interrupts(pfd, numfds);
@@ -877,6 +906,9 @@  rte_eal_intr_init(void)
 	int ret = 0, ret_1 = 0;
 	char thread_name[RTE_MAX_THREAD_NAME_LEN];
 
+	/* connect to monitor device uevent  */
+	rte_uevent_connect();
+
 	/* init the global interrupt source head */
 	TAILQ_INIT(&intr_sources);
 
@@ -1255,3 +1287,113 @@  rte_intr_cap_multiple(struct rte_intr_handle *intr_handle)
 
 	return 0;
 }
+
+int
+rte_uevent_connect(void)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int netlink_fd;
+	int size = 64 * 1024;
+	int nonblock = 1;
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = getpid();
+	addr.nl_groups = 0xffffffff;
+
+	netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
+	if (netlink_fd < 0)
+		return -1;
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_RCVBUFFORCE, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL,
+		"ioctl(FIONBIO) failed\n");
+		close(netlink_fd);
+		return -1;
+	}
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		close(netlink_fd);
+		return -1;
+	}
+
+	hotplug_fd = netlink_fd;
+
+	return netlink_fd;
+}
+
+static int
+parse_event(const char *buf, struct rte_uevent *event)
+{
+	char action[RTE_UEVENT_MSG_LEN];
+	char subsystem[RTE_UEVENT_MSG_LEN];
+	char dev_path[RTE_UEVENT_MSG_LEN];
+
+	memset(action, 0, RTE_UEVENT_MSG_LEN);
+	memset(subsystem, 0, RTE_UEVENT_MSG_LEN);
+	memset(dev_path, 0, RTE_UEVENT_MSG_LEN);
+
+	while (*buf) {
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		}
+		while (*buf++)
+			;
+	}
+
+	if (!strncmp(subsystem, "uio", 3)) {
+
+		event->subsystem = RTE_UEVENT_SUBSYSTEM_UIO;
+		if (!strncmp(action, "add", 3)) {
+			event->action = RTE_UEVENT_ADD;
+		}
+		if (!strncmp(action, "remove", 6)) {
+			event->action = RTE_UEVENT_REMOVE;
+		}
+		return 1;
+	}
+
+	return -1;
+}
+
+int
+rte_get_uevent(int fd, struct rte_uevent *uevent)
+{
+	int ret;
+	char buf[RTE_UEVENT_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_uevent));
+	memset(buf, 0, RTE_UEVENT_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret > 0) {
+		return parse_event(buf, uevent);
+	}
+
+	if (ret < 0) {
+		if (errno == EAGAIN || errno == EWOULDBLOCK) {
+			return 0;
+		} else {
+			RTE_LOG(ERR, EAL,
+			"Socket read error(%d): %s\n",
+			errno, strerror(errno));
+		}
+	}
+
+	/* connection closed */
+	if (ret == 0) {
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 6daffeb..d32ba01 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -99,6 +99,16 @@  struct rte_intr_handle {
 	int *intr_vec;                 /**< intr vector number array */
 };
 
+enum rte_uevent_action {
+	RTE_UEVENT_ADD = 0,		/**< uevent type of device add */
+	RTE_UEVENT_REMOVE = 1,	/**< uevent type of device remove*/
+};
+
+struct rte_uevent {
+	enum rte_uevent_action action;	/**< uevent action type */
+	int subsystem;				/**< subsystem id */
+};
+
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
 
 /**
@@ -236,4 +246,26 @@  rte_intr_allow_others(struct rte_intr_handle *intr_handle);
 int
 rte_intr_cap_multiple(struct rte_intr_handle *intr_handle);
 
+/**
+ * It read out the uevent from the specific file descriptor.
+ *
+ * @param fd
+ *   The fd which the uevent  associated to
+ * @param uevent
+ *   Pointer to the uevent which read from the monitoring fd.
+ * @return
+ *   - On success, one.
+ *   - On failure, zeor or a negative value.
+ */
+int
+rte_get_uevent(int fd, struct rte_uevent *uevent);
+
+/**
+ * Connect to the device uevent file descriptor.
+ * @return
+ *   return the connected uevent fd.
+ */
+int
+rte_uevent_connect(void);
+
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */