[dpdk-dev,v2,2/2] net/i40e: add hot plug monitor in i40e

Message ID 1498648044-57541-2-git-send-email-jia.guo@intel.com (mailing list archive)
State Superseded, archived
Checks

Context               Check    Description
ci/Intel-compilation  fail     Compilation issues
ci/checkpatch         success  coding style OK

Commit Message

Guo, Jia June 28, 2017, 11:07 a.m. UTC
From: "Guo, Jia" <jia.guo@intel.com>

This patch enables the hot plug feature in i40e by monitoring the
hot plug uevents of the device. When a remove event is received, the
app callback function is called to handle the detach process.

Signed-off-by: Guo, Jia <jia.guo@intel.com>
---
v2->v1: remove unused part for current stage.
---
 drivers/net/i40e/i40e_ethdev.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
  

Comments

Jingjing Wu June 29, 2017, 1:41 a.m. UTC | #1
> -----Original Message-----
> From: Guo, Jia
> Sent: Wednesday, June 28, 2017 7:07 PM
> To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Guo, Jia <jia.guo@intel.com>
> Subject: [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e
> 
> From: "Guo, Jia" <jia.guo@intel.com>
> 
> This patch enable the hot plug feature in i40e, by monitoring the hot plug
> uevent of the device. When remove event got, call the app callback function to
> handle the detach process.
> 
> Signed-off-by: Guo, Jia <jia.guo@intel.com>
> ---
> v2->v1: remove unused part for current stage.
> ---
>  drivers/net/i40e/i40e_ethdev.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index 4ee1113..122187e 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -1283,6 +1283,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw
> *hw)
> 
>  	/* enable uio intr after callback register */
>  	rte_intr_enable(intr_handle);
> +
>  	/*
>  	 * Add an ethertype filter to drop all flow control frames transmitted
>  	 * from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
> @@ -5832,11 +5833,28 @@ struct i40e_vsi *  {
>  	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
>  	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
> +	struct rte_uevent event;
>  	uint32_t icr0;
> +	struct rte_pci_device *pci_dev;
> +	struct rte_intr_handle *intr_handle;
> +
> +	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> +	intr_handle = &pci_dev->intr_handle;
> 
>  	/* Disable interrupt */
>  	i40e_pf_disable_irq0(hw);
> 
> +	/* check device uevent */
> +	if (rte_uevent_get(intr_handle->uevent_fd, &event) > 0) {

You declare rte_uevent_get like this:

+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_uevent_get(int fd, struct rte_uevent *uevent);


But here you check if it > 0?

> +		if (event.subsystem == RTE_UEVENT_SUBSYSTEM_UIO) {
> +			if (event.action == RTE_UEVENT_REMOVE) {
> +				_rte_eth_dev_callback_process(dev,
> +					RTE_ETH_EVENT_INTR_RMV, NULL);
> +			}
> +		}
> +		goto done;

I think that when a remove happens there is no need to goto done; you can just return.
> +	}
> +
>  	/* read out interrupt causes */
>  	icr0 = I40E_READ_REG(hw, I40E_PFINT_ICR0);
> 
> --
> 1.8.3.1
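
To illustrate the first review comment: with a "zero on success, negative on failure" contract, a `> 0` test can never be true, so the event branch would be dead code. The following is a minimal sketch only. `fake_uevent_get()`, `struct fake_uevent` and the `FAKE_*` constants are hypothetical stand-ins for the API proposed in patch 1/2, not the patch's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the event structure proposed in patch 1/2. */
struct fake_uevent {
	int subsystem;
	int action;
};

enum { FAKE_SUBSYSTEM_UIO = 1 };
enum { FAKE_ACTION_ADD = 1, FAKE_ACTION_REMOVE = 2 };

/*
 * Hypothetical stub that follows the documented contract:
 * zero on success, a negative value on failure.
 */
static int
fake_uevent_get(int fd, struct fake_uevent *ev)
{
	assert(ev != NULL);
	if (fd < 0)
		return -1;
	ev->subsystem = FAKE_SUBSYSTEM_UIO;
	ev->action = FAKE_ACTION_REMOVE;
	return 0;
}

/* Returns 1 if a UIO remove event was read from fd, 0 otherwise. */
static int
saw_uio_remove(int fd)
{
	struct fake_uevent ev;

	/* Compare against 0: "> 0" never holds under this contract. */
	if (fake_uevent_get(fd, &ev) != 0)
		return 0;
	return ev.subsystem == FAKE_SUBSYSTEM_UIO &&
	       ev.action == FAKE_ACTION_REMOVE;
}
```

If the function were instead meant to return the number of bytes read, the doc comment rather than the caller would need to change; either way the two have to agree.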
  
Stephen Hemminger June 29, 2017, 3:34 a.m. UTC | #2
On Wed, 28 Jun 2017 19:07:24 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> From: "Guo, Jia" <jia.guo@intel.com>
> 
> This patch enable the hot plug feature in i40e, by monitoring the
> hot plug uevent of the device. When remove event got, call the app
> callback function to handle the detach process.
> 
> Signed-off-by: Guo, Jia <jia.guo@intel.com>
> ---

Hot plug is good and needed.

But it needs to be done in a generic fashion in the bus layer.
There is nothing about uevents that is unique to i40e or even Intel
devices. Plus the way hotplug is handled is OS specific, so this isn't going
to work well on BSD.

Sorry if I sound like a broken record but there has been a repeated pattern
of Intel developers putting their head down (or in the sand) and creating
functionality inside device driver.
  
Guo, Jia June 29, 2017, 4:31 a.m. UTC | #3
Yes, on a remove uevent the handler can return directly to avoid invalid I/O. But for other uevents such as add and change, it must go to done to keep interrupt processing working on the device. I will refine this part, thanks.

Best regards,
Jeff Guo
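
The control flow agreed on in this reply can be sketched as follows. This is an illustration under assumptions, not the revised patch: `handle_uevent()`, `struct handler_trace` and the `FAKE_*` actions are hypothetical names, and the two flags merely record which path was taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fake_action { FAKE_ADD, FAKE_REMOVE, FAKE_CHANGE };

/* Records what the handler did, so the flow can be inspected. */
struct handler_trace {
	bool app_notified;   /* removal callback was invoked */
	bool irq_reenabled;  /* the "done:" path touched the device */
};

static void
handle_uevent(enum fake_action action, struct handler_trace *t)
{
	assert(t != NULL);
	t->app_notified = false;
	t->irq_reenabled = false;

	if (action == FAKE_REMOVE) {
		/*
		 * The device is gone: notify the application and return
		 * without any further register access.
		 */
		t->app_notified = true;
		return;
	}

	/*
	 * add/change: fall through to the equivalent of "done:", which
	 * re-enables the interrupt on the still-present device.
	 */
	t->irq_reenabled = true;
}
```

The point of the early return is that every register access after a removal is an access to a device that may no longer be mapped.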


  
Guo, Jia June 29, 2017, 4:37 a.m. UTC | #4
From: "Guo, Jia" <jia.guo@intel.com>

This patch set aims to add a variable "uevent_fd" to the structure
"rte_intr_handle" to enable kernel object uevent monitoring,
and adds uevent APIs to the rte eal interrupt code, namely
"rte_uevent_connect" and "rte_uevent_get". The patch uses i40e
as an example: a driver can use these APIs to monitor and read
out uevents, then handle them accordingly,
for example by detaching or attaching the device.

Guo, Jia (2):
  eal: add uevent api for hot plug
  net/i40e: add hot plug monitor in i40e

 drivers/net/i40e/i40e_ethdev.c                     |  19 +++
 lib/librte_eal/common/eal_common_pci_uio.c         |   6 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 136 ++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |   6 +
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  37 ++++++
 5 files changed, 201 insertions(+), 3 deletions(-)
  
Jingjing Wu June 29, 2017, 4:48 a.m. UTC | #5
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, June 29, 2017 11:35 AM
> To: Guo, Jia <jia.guo@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e
> 
> On Wed, 28 Jun 2017 19:07:24 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
> 
> > From: "Guo, Jia" <jia.guo@intel.com>
> >
> > This patch enable the hot plug feature in i40e, by monitoring the hot
> > plug uevent of the device. When remove event got, call the app
> > callback function to handle the detach process.
> >
> > Signed-off-by: Guo, Jia <jia.guo@intel.com>
> > ---
> 
> Hot plug is good and needed.
> 
> But it needs to be done in a generic fashion in the bus layer.
> There is nothing about uevents that are unique to i40e or even Intel devices.
> Plus the way hotplug is handled is OS specific, so this isn't going to work well on
> BSD.
> 
This patch is not a way to fully support hot plug, and we know that hot plug handling is OS specific.
This patch just provides a way to tell the DPDK user that a removal happened on this device (DPDK dev).

And the Mlx driver already supports that with patch
http://dpdk.org/dev/patchwork/patch/23695/

What GuoJia did is just make the EVENT processable by the application through the interrupt
callback mechanism.

> Sorry if I sound like a broken record but there has been a repeated pattern of
> Intel developers  putting their head down (or in the sand) and creating
> functionality inside device driver.
Sorry, I cannot agree.

Thanks
Jingjing
  
Guo, Jia June 29, 2017, 5:01 a.m. UTC | #6
From: "Guo, Jia" <jia.guo@intel.com>

This patch set aims to add a variable "uevent_fd" to the structure
"rte_intr_handle" to enable kernel object uevent monitoring,
and adds uevent APIs to the rte eal interrupt code, namely
"rte_uevent_connect" and "rte_uevent_get". The patch uses i40e
as an example: a driver can use these APIs to monitor and read
out uevents, then handle them accordingly,
for example by detaching or attaching the device.

Guo, Jia (2):
  eal: add uevent api for hot plug
  net/i40e: add hot plug monitor in i40e

 drivers/net/i40e/i40e_ethdev.c                     |  19 +++
 lib/librte_eal/common/eal_common_pci_uio.c         |   6 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 136 ++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |   6 +
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  37 ++++++
 5 files changed, 201 insertions(+), 3 deletions(-)
  
Guo, Jia June 29, 2017, 7:47 a.m. UTC | #7
Agree with jingjing.

That patch is definitely not a generic hot plug solution. The uevent just gives an additional approach to monitor the remove event even if the driver does not expose it as an interrupt. We know the mlx driver has already implemented the remove interrupt event in its InfiniBand framework driver, but other drivers may not have done so yet.
So uevents are not unique to i40e or other Intel NICs; the aim is just to let more drivers that use the pci-uio framework use the common hot plug feature in DPDK.

Best regards,
Jeff Guo


  
Guo, Jia April 13, 2018, 8:30 a.m. UTC | #8
About hot plug in DPDK: we already have a proactive way to add/remove devices
through APIs (rte_eal_hotplug_add/remove), and we also have the fail-safe driver
to offload the fail-safe work from the app user. But we still lack
a general mechanism to monitor hotplug events for all drivers; today the
hotplug interrupt event differs between devices and drivers, such
as mlx4, pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose the
remove interrupt, so to make the hot plug feature easy to use
for pci drivers, something must be done to detect the remove event
at the kernel level and offer a new line of interrupt to user land.

Based on the kernel's kobject uevent mechanism, we can
monitor the hot plug status of devices, not only
uio/vfio devices on the pci bus but also others, such as cpu/usb/pci-express bus devices.

The idea is as below.

a. The uevent message from FD monitoring looks like below.
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b. Add device event monitor framework:
add several general APIs to enable uevent monitoring.

c. Show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd to show device event monitor mechanism usage.

TODO: failure handler mechanism for hot plug and driver auto bind for hot insertion.
The next hot plug patch set will cover those.
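
For readers unfamiliar with the message shown in item a. above: on Linux, kernel uevents arrive on a NETLINK_KOBJECT_UEVENT netlink socket as a sequence of NUL-terminated strings, "action@devpath" first, then "KEY=value" pairs. The sketch below shows one way to pull ACTION, DEVPATH and SUBSYSTEM out of such a buffer. `struct uevent_info` and `uevent_parse()` are hypothetical names chosen for illustration; they are not the functions added by this patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the fields of interest. */
struct uevent_info {
	const char *action;
	const char *devpath;
	const char *subsystem;
};

/*
 * Walk a raw uevent buffer made of NUL-terminated strings. The pointers
 * stored in info refer into buf. Returns 0 if both ACTION and SUBSYSTEM
 * were found, -1 otherwise.
 */
static int
uevent_parse(const char *buf, size_t len, struct uevent_info *info)
{
	size_t i = 0;

	assert(buf != NULL && info != NULL);
	memset(info, 0, sizeof(*info));

	while (i < len) {
		const char *s = buf + i;
		const char *end = memchr(s, '\0', len - i);
		size_t n;

		if (end == NULL)
			break; /* unterminated trailing fragment: ignore it */
		n = (size_t)(end - s);

		if (n > 7 && memcmp(s, "ACTION=", 7) == 0)
			info->action = s + 7;
		else if (n > 8 && memcmp(s, "DEVPATH=", 8) == 0)
			info->devpath = s + 8;
		else if (n > 10 && memcmp(s, "SUBSYSTEM=", 10) == 0)
			info->subsystem = s + 10;

		i += n + 1; /* skip the string and its terminator */
	}

	return (info->action != NULL && info->subsystem != NULL) ? 0 : -1;
}
```

A caller would typically compare `info.action` with "remove" and `info.subsystem` with "uio" before notifying the application, which matches the checks discussed earlier in this thread.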

patchset history:
v22->v21:
fix clang compile issue and doc style

v21->v20:
refine release note and some code cleaning.

v20->v19:
add more detail note and socket error handler.

v19->18:
fix some typo and misunderstanding part

v18->v17:
1.add feature announcement in release document, fix bsp compile issue.
2.refine socket configuration.
3.remove hotplug policy and detach/attach process from testpmd, let it
focus on the device event monitoring which the patch set introduced.

v17->v16:
1.add related part of the interrupt handle type adding.
2.add new API into map, fix typo issue, add (void*)-1 value for unregister all callback
3.add new file into meson.build, modify coding style and add print info, delete unused part.
4.unregister all user's callback when stop event monitor

v16->v15:
1.remove some linux related code out of eal common layer
2.fix some readability issues.

v15->v14:
1.use the existing eal interrupt epoll instead of rte service for the monitor thread,
2.add new device event handle type in eal interrupt.
3.remove the uevent type check and any policy from eal,
let it check and management in user's callback.
4.add "--hot-plug" configure parameter in testpmd to switch the hotplug feature.

v14->v13:
1.add __rte_experimental on function definitions and fix bsd build issue

v13->v12:
1.fix some logic issue and null check issue
2.fix monitor stop func issue

v12->v11:
1.identify null param in callback for monitor all devices uevent

v11->v10:
1:modify some typo and add experimental tag in new file.
2:modify callback register calling.

v10->v9:
1.fix prefix issue.
2.use a common callback lists for all device and all type to replace
add callback parameter into device struct.
3.delete some unused parts.

v9->v8:
split the patch set into small and explicit patch

v8->v7:
1.use rte_service to replace pthread management.
2.fix definition issue and copyright issue
3.fix some lock issue

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some incorrect issue.
4.fix the system hang after sending packets.

v6->v5:
1.add hot plug policy in eal: by default, prepare hot plug work for
all pci devices, then let the app decide which devices need
hot plug.
2.modify to manage event callback in each device.
3.fix some system hang issue on igb_uio release.
4.modify the pci part to the bus-pci base on the bus rework.
5.add hot plug policy in app, show example of using a hotplug list to
decide which devices need hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API for common, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer.Add function of find device by name.
4.Replace the individual fd bound to each device with a common fd that polls all devices.
5.Add hot insertion monitoring and processing, add function to auto bind driver before user adds device
6.Refine some coding style and typos issue
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove the global variable hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix the dual device fd issue.
2.fix some typos.

Jeff Guo (4):
  eal: add device event handle in interrupt thread
  eal: add device event monitor framework
  eal/linux: uevent parse and process
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c                          |   5 +-
 app/test-pmd/testpmd.c                             | 101 +++++++++-
 app/test-pmd/testpmd.h                             |   2 +
 doc/guides/rel_notes/release_18_05.rst             |  12 ++
 doc/guides/testpmd_app_ug/run_app.rst              |   4 +
 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  21 ++
 lib/librte_eal/bsdapp/eal/meson.build              |   1 +
 lib/librte_eal/common/eal_common_dev.c             | 161 +++++++++++++++
 lib/librte_eal/common/eal_private.h                |  15 ++
 lib/librte_eal/common/include/rte_dev.h            |  94 +++++++++
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 223 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  11 +-
 lib/librte_eal/linuxapp/eal/meson.build            |   1 +
 lib/librte_eal/rte_eal_version.map                 |   4 +
 test/test/test_interrupts.c                        |  39 +++-
 18 files changed, 692 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
  
Thomas Monjalon April 13, 2018, 10:03 a.m. UTC | #9
13/04/2018 10:30, Jeff Guo:
> About hot plug in dpdk, We already have proactive way to add/remove devices
> through APIs (rte_eal_hotplug_add/remove), and also have fail-safe driver
> to offload the fail-safe work from the app user. But there are still lack
> of a general mechanism to monitor hotplug event for all driver, now the
> hotplug interrupt event is diversity between each device and driver, such
> as mlx4, pci driver and others.
> 
> Use the hot removal event for example, pci drivers not all exposure the
> remove interrupt, so in order to make user to easy use the hot plug
> feature for pci driver, something must be done to detect the remove event
> at the kernel level and offer a new line of interrupt to the user land.
> 
> Base on the uevent of kobject mechanism in kernel, we could use it to
> benefit for monitoring the hot plug status of the device which not only
> uio/vfio of pci bus devices, but also other, such as cpu/usb/pci-express bus devices.
[...]
> Jeff Guo (4):
>   eal: add device event handle in interrupt thread
>   eal: add device event monitor framework
>   eal/linux: uevent parse and process
>   app/testpmd: enable device hotplug monitoring

Applied, thanks
  
Guo, Jia April 18, 2018, 1:38 p.m. UTC | #10
Previously, the device event monitor framework was introduced;
its typical usage is device hot plug. If we want the application
not to break down when a device is hot plugged in or out, we still need some
recovery measures to prepare for device detach, so that we do
not encounter any memory fault after the device is hot unplugged and the
application can keep working.

This patch set introduces an API implementing the recovery mechanism to
handle hot plug, and also uses testpmd to show an example of how to
use the API to process hot plug events, so that the process can go
smoothly as below:

plug out->failure handle->stop forward->stop port->close port->detach port

With this mechanism, users such as the fail-safe driver or testpmd will be able to
develop their own hot plug application.

patchset history:
v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding  

v19->18:
note the limitation of multiple hotplug, fix some typos, squash patches.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set
"add device event monitor framework"

Jeff Guo (4):
  bus/pci: introduce device hot unplug handle
  eal: add failure handler mechanism for hot plug
  igb_uio: fix uio release issue when hot unplug
  app/testpmd: show example to handler hot unplug

 app/test-pmd/testpmd.c                  |  29 ++++++--
 doc/guides/rel_notes/release_18_05.rst  |   6 ++
 drivers/bus/pci/pci_common.c            |  67 +++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  32 +++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |   4 ++
 lib/librte_eal/common/include/rte_bus.h |  16 +++++
 lib/librte_eal/common/include/rte_dev.h |  11 +++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 124 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   1 +
 10 files changed, 297 insertions(+), 5 deletions(-)
  
Guo, Jia May 3, 2018, 8:57 a.m. UTC | #11
Previously, the device event monitor framework was introduced;
its typical usage is device hot plug. If we want the application
not to break down when a device is hot plugged in or out, we still need
some measures to help the app handle that, such as device recovery on
device detach, so that the app can keep running smoothly without being
disturbed by any hotplug behavior.

This patch set introduces a recovery mechanism to handle hot unplug,
and also uses testpmd to show an example of how to use this mechanism to process
hot plug events. The process is shown below:

plug out->failure handle->stop forward->stop port->close port->detach port

With this mechanism, users such as the fail-safe driver or testpmd will be
able to develop their own hot plug application.

patchset history:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding  

v19->18:
note the limitation of multiple hotplug, fix some typos, squash patches.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework"

Jeff Guo (4):
  bus/pci: handle device hot unplug
  eal: add failure handle mechanism for hot plug
  igb_uio: fix uio release issue when hot unplug
  app/testpmd: show example to handle hot unplug

 app/test-pmd/testpmd.c                  |  28 ++++--
 drivers/bus/pci/pci_common.c            |  65 ++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 kernel/linux/igb_uio/igb_uio.c          |   4 +
 lib/librte_eal/common/include/rte_bus.h |  16 ++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 154 +++++++++++++++++++++++++++++++-
 7 files changed, 306 insertions(+), 6 deletions(-)
  
Guo, Jia May 3, 2018, 10:48 a.m. UTC | #12
Previously, the device event monitor framework was introduced;
its typical usage is device hot plug. If we want the application
not to break down when a device is hot plugged in or out, we still need
some measures to help the app handle that, such as device recovery on
device detach, so that the app can keep running smoothly without being
disturbed by any hotplug behavior.

This patch set introduces a recovery mechanism to handle hot unplug,
and also uses testpmd to show an example of how to use this mechanism to process
hot plug events. The process is shown below:

plug out->failure handle->stop forward->stop port->close port->detach port

With this mechanism, users such as the fail-safe driver or testpmd will be
able to develop their own hot plug application.

patchset history:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding

v19->18:
note the limitation of multiple hotplug, fix some typos, squash patches.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework"

Jeff Guo (4):
  bus/pci: handle device hot unplug
  eal: add failure handle mechanism for hot plug
  igb_uio: fix uio release issue when hot unplug
  app/testpmd: show example to handle hot unplug

 app/test-pmd/testpmd.c                  |  27 ++++--
 drivers/bus/pci/pci_common.c            |  65 ++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 kernel/linux/igb_uio/igb_uio.c          |   4 +
 lib/librte_eal/common/include/rte_bus.h |  16 ++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 154 +++++++++++++++++++++++++++++++-
 7 files changed, 301 insertions(+), 10 deletions(-)
  
Guo, Jia June 29, 2018, 10:30 a.m. UTC | #13
As we know, hot plug is an important feature, used either for fail-safe
handling of datacenter devices or for SRIOV live migration in SDN/NFV. It can
bring higher flexibility and continuity to networking services in multiple
industry use cases. So let us see how DPDK, as an important networking
framework, can help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe driver,
the bonding driver and hot plug/unplug APIs in the framework; an app can use
these to develop its hot plug solution.

Take the case of hot unplug: it can happen when a hardware device is
removed physically, or when software disables it. The app needs to call the
ether dev API to detach the device, to unplug the device at the bus level and
make access to the device invalid. But the problem is that the removal of the
device from the software lists is not instantaneous; if the data (fast) path
still reads/writes the device during this time, it will cause an MMIO error
and crash the app.

It seems that we have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle it, but we still do not have a failsafe
driver (or app) + RTE_DEV_EVENT_REMOVE + PCIe pmd driver failure handling
solution. So there is a gap in the DPDK hot plug solution right now.

Also, we know that the kernel only guarantees hot plug on the kernel side, not
on the user mode side. Firstly, we can hardly have a gatekeeper for every MMIO
access across multiple PMD drivers. Secondly, no specific 3rd party tools such
as udev/driverctl cover this hot plug failure processing. Thirdly, the
feasibility of an app implementing this for multiple user mode PMD drivers is
still a problem. Here, a general hot plug failure handling mechanism in the
DPDK framework is proposed. It aims to guarantee that, when hot unplug occurs,
the system will not crash and the app will not break, and user space can
normally stop and release any relevant resources, then cleanly unplug the
device at the bus level.

The mechanism works as below:

Firstly, the app enables the device event monitor and registers the hot plug
event callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and then does the failure handling
accordingly. To do that, the following functionality is brought in.
 - Add a new bus ops "handle_hot_unplug" to handle bus read/write errors; it is
   bus-specific and each kind of bus can implement its own logic.
 - Implement the pci bus specific ops "pci_handle_hot_unplug". Based on the
   failure address, it remaps memory for the corresponding unplugged device.

For the data path, or other unexpected control from the control path, when hot
unplug occurs:
 - Implement a new sigbus handler, registered when device event monitoring
   starts. The handler is per process. Because of how signals are delivered,
   either a control path thread or a data path thread may receive the sigbus
   error, but both go to the common sigbus handler. Once an MMIO sigbus error
   is raised, it triggers the above hot unplug operation. The sigbus is
   checked to see whether it was caused by the hot unplug: if not, the
   exception is reported as by the original sigbus handler; if yes, the
   memory is remapped.
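
The sigbus idea above can be tried out in user space without any hardware. The following is a minimal sketch under stated assumptions, not the handler from this patch set: a file mapping whose backing store is truncated stands in for an unplugged device BAR, and all names (`win_base`, `sigbus_guard_install()`, `demo_unplug_read()`) are hypothetical. It relies on Linux behavior; `mmap()` is not on the POSIX async-signal-safe list.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one mapped device window. */
static void *win_base;
static size_t win_len;
static volatile sig_atomic_t win_remapped;
static struct sigaction prev_sigbus;

static void
sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
	uintptr_t addr = (uintptr_t)si->si_addr;
	uintptr_t base = (uintptr_t)win_base;

	(void)ctx;
	if (win_base != NULL && addr >= base && addr - base < win_len) {
		/*
		 * The fault is inside the device window: replace it with
		 * anonymous memory so the faulting access can be retried.
		 */
		if (mmap(win_base, win_len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
			 -1, 0) != MAP_FAILED) {
			win_remapped = 1;
			return;
		}
	}
	/* Not a hot unplug fault: restore the previous handler, re-raise. */
	sigaction(sig, &prev_sigbus, NULL);
	raise(sig);
}

static int
sigbus_guard_install(void *base, size_t len)
{
	struct sigaction sa;

	assert(base != NULL && len > 0);
	win_base = base;
	win_len = len;
	win_remapped = 0;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, &prev_sigbus);
}

/*
 * Simulate an unplug by truncating the file behind a shared mapping.
 * Returns 0 if the read after "removal" was recovered by the handler.
 */
static int
demo_unplug_read(void)
{
	char path[] = "/tmp/sigbus_demo_XXXXXX";
	long pg = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	volatile uint8_t *p;
	uint8_t v;

	if (fd < 0 || pg <= 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, pg) != 0)
		return -1;
	p = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = 0x5a;
	if (sigbus_guard_install((void *)p, (size_t)pg) != 0)
		return -1;
	if (ftruncate(fd, 0) != 0) /* backing pages vanish */
		return -1;
	v = p[0]; /* raises SIGBUS, handler remaps, the read is retried */
	close(fd);
	return (win_remapped == 1 && v == 0) ? 0 : -1;
}
```

The design choice worth noting is the address-range check: only faults inside a known device window are absorbed, and anything else is handed back to the previously installed disposition, which mirrors the "generic sigbus versus hotplug sigbus" distinction described in the cover letter.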

For the control path and the igb uio release:
 - When a device is hot unplugged, the kernel releases the device resources on
   the kernel side: for example, the fd sys file disappears and the irq is
   released. At this time, if the igb uio driver still tries to release these
   resources, it will cause a kernel crash.
   On the other hand, something like disabling the interrupt is not done
   automatically on the kernel side. If it is not handled, this leftover state
   will affect the interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status, and the
   corresponding processing should be done in the igb uio driver.
   This patch proposes to add a structure rte_udev_state to rte_uio_pci_dev
   of the igb_uio kernel driver, which records the state of the uio device,
   such as probed/opened/released/removed/unplug. When it detects an
   unexpected removal caused by hot unplug, it disables the interrupt resource
   accordingly, while the release steps that the kernel has already handled
   are skipped to avoid a double free or null pointer kernel crash.
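
The state tracking described above can be pictured with a small user-space model. This is only a guess at the shape of the logic, written to illustrate the "skip what the kernel already released" rule: the state names follow the cover letter, but `struct fake_uio_dev`, `fake_uio_remove()` and `fake_uio_release()` are hypothetical and do not reproduce the igb_uio code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State names follow the cover letter; the real enum may differ. */
enum fake_udev_state {
	FAKE_UDEV_PROBED,
	FAKE_UDEV_OPENED,
	FAKE_UDEV_RELEASED,
	FAKE_UDEV_REMOVED,
	FAKE_UDEV_UNPLUG,
};

struct fake_uio_dev {
	enum fake_udev_state state;
	bool irq_enabled;
	bool resources_held; /* things the kernel frees itself on removal */
};

/* Called when the device goes away. */
static void
fake_uio_remove(struct fake_uio_dev *d)
{
	assert(d != NULL);
	if (d->state == FAKE_UDEV_OPENED) {
		/*
		 * Removal while user space still holds the device open:
		 * treat it as a hot unplug. The interrupt is not disabled
		 * automatically, so do it here; the other resources are
		 * already gone.
		 */
		d->irq_enabled = false;
		d->resources_held = false;
		d->state = FAKE_UDEV_UNPLUG;
	} else {
		d->state = FAKE_UDEV_REMOVED;
	}
}

/* Called on close. Returns true if it actually released anything. */
static bool
fake_uio_release(struct fake_uio_dev *d)
{
	assert(d != NULL);
	if (d->state == FAKE_UDEV_UNPLUG) {
		/* Already cleaned up at unplug: skip to avoid a double free. */
		return false;
	}
	d->irq_enabled = false;
	d->resources_held = false;
	d->state = FAKE_UDEV_RELEASED;
	return true;
}
```

In this model a normal open followed by close releases once, while open, unplug, close releases nothing the second time, which is the behavior the paragraph above asks for.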

The mechanism can be used by the fail-safe driver and by apps that want a hot
plug solution. At this stage, only testpmd is used as a reference to show how
to use the mechanism.
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process will not break the running app/fail-safe, and will not affect
other, unrelated devices. The app can then plug the device back in and restart
the data path as follows:
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.

patchset history:
v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed several times in public, the
scope has changed and the patch set has been split many times. The recent RFC
and feature design focus only on the hot unplug failure handler in this patch
set, so to keep the topic clear and focused, the previous patch sets are
summarized in the history as “v1(v21)”, and the v2 here continues from there
for further tracking.

"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code
refine the remap logic for multiple devices.
remove the auto binding

v19->v18:
note the limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be clearer.

For the prior patch history, please check the patch set "add device event monitor framework"

Jeff Guo (9):
  bus: introduce hotplug failure handler
  bus/pci: implement hotplug handler operation
  bus: introduce sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot plug
  igb_uio: fix uio release issue when hot unplug
  app/testpmd: show example to handle hot unplug
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c               | 20 ++++++--
 app/test-pmd/testpmd.c                  | 31 +++++++-----
 app/test-pmd/testpmd.h                  |  8 ++-
 doc/guides/testpmd_app_ug/run_app.rst   | 10 +++-
 drivers/bus/pci/pci_common.c            | 78 +++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        | 33 +++++++++++++
 drivers/bus/pci/private.h               | 12 +++++
 kernel/linux/igb_uio/igb_uio.c          | 50 +++++++++++++++++--
 lib/librte_eal/common/eal_common_bus.c  | 34 ++++++++++++-
 lib/librte_eal/common/eal_private.h     | 11 +++++
 lib/librte_eal/common/include/rte_bus.h | 31 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 88 ++++++++++++++++++++++++++++++++-
 12 files changed, 381 insertions(+), 25 deletions(-)
  
Guo, Jia July 5, 2018, 8:21 a.m. UTC | #14
As we know, hot plug is an important feature, whether used for fail-safe of
datacenter devices or for SRIOV Live Migration in SDN/NFV. It can bring higher
flexibility and continuity to networking services in multiple industry use
cases. So let us see what DPDK, as an important networking framework, can do
to help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and the hot plug/unplug API in the framework; apps
can use these to develop their hot plug solution.

Consider the case of hot unplug. It can happen when a hardware device is
removed physically, or when software disables it. The app needs to call the
ethdev API to detach the device, to unplug the device at the bus level and
make access to the device invalid. The problem is that the removal of the
device from the software lists is not instantaneous; if the data (fast) path
still reads/writes the device during this time, it causes an MMIO error and
the app crashes.

We already have the fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but we still lack a failsafe driver
(or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD failure handling solution. So
there is a gap in the DPDK hot plug solution right now.

Also, the kernel only guarantees hot plug on the kernel side, not on the user
mode side. First, we can hardly have a gatekeeper for every MMIO access across
multiple PMDs. Second, no specific third-party tool such as udev/driverctl
covers this hot plug failure processing. Third, it is still questionable
whether each app can feasibly implement this for multiple user mode PMDs.
Here, a general hot plug failure handling mechanism in the DPDK framework is
proposed. It aims to guarantee that, when a hot unplug occurs, the system will
not crash and the app will not break, and that user space can stop and release
the relevant resources normally, then unplug the device at the bus level
cleanly.

The mechanism should work as below:

First, the app enables the device event monitor and registers the hot plug
event’s callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and does the failure handling accordingly.
To do that, the following functionality is introduced:
 - Add a new bus ops “handle_hot_unplug” to handle bus read/write errors. It is
   bus-specific and each kind of bus can implement its own logic.
 - Implement the PCI bus-specific ops “pci_handle_hot_unplug”. It uses the
   failure address to remap memory for the device that was unplugged.
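
As a rough illustration of the PCI-specific step (find the device that owns
the failure address, then remap its memory), the sketch below uses made-up
types; struct demo_pci_dev and the demo_* functions are hypothetical and are
not the DPDK structures or the functions added by the patch. It assumes
Linux, where an existing mapping can be replaced in place with MAP_FIXED.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical stand-in for a PCI device and its mapped region. */
struct demo_pci_dev {
	const char *name;
	void *map_addr;
	size_t map_len;
};

/* Return the device whose mapping contains failure_addr, or NULL. */
struct demo_pci_dev *
demo_find_dev_by_addr(struct demo_pci_dev *devs, size_t n,
		      const void *failure_addr)
{
	uintptr_t a = (uintptr_t)failure_addr;
	size_t i;

	for (i = 0; i < n; i++) {
		uintptr_t base = (uintptr_t)devs[i].map_addr;

		if (a >= base && a - base < devs[i].map_len)
			return &devs[i];
	}
	return NULL;
}

/* Replace the device mapping in place with anonymous (zero-filled) memory,
 * so later reads and writes hit plain RAM instead of the missing device.
 * Returns 0 on success, -1 on failure. */
int demo_remap_dev(struct demo_pci_dev *dev)
{
	void *p;

	assert(dev != NULL && dev->map_addr != NULL && dev->map_len > 0);
	p = mmap(dev->map_addr, dev->map_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return p == dev->map_addr ? 0 : -1;
}

/* Bus-level entry point: -1 if no device owns the address. */
int demo_pci_handle_hot_unplug(struct demo_pci_dev *devs, size_t n,
			       const void *failure_addr)
{
	struct demo_pci_dev *dev = demo_find_dev_by_addr(devs, n, failure_addr);

	if (dev == NULL)
		return -1;
	return demo_remap_dev(dev);
}
```

Keeping the virtual address unchanged is the point of the remap: the data
path can keep dereferencing the same pointers until the port is stopped.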

For the data path, or other unexpected accesses from the control path, when a
hot unplug occurs:
 - Implement a new sigbus handler, registered when device event monitoring
   starts. The handler is per process. Because of how signals are delivered,
   either a control path thread or a data path thread may receive the sigbus
   error, but both end up in the common sigbus handler. Once an MMIO sigbus
   error is raised, it triggers the hot unplug operation above. The handler
   checks whether the sigbus was caused by the hot unplug; if not, it reports
   the exception as the original sigbus handler would. If yes, it remaps the
   memory.

For the control path and the igb_uio release:
 - When a device is hot unplugged, the kernel releases the device resources on
   the kernel side: for example, the fd sys file disappears and the irq is
   released. If the igb_uio driver then tries to release these resources again,
   it causes a kernel crash.
   On the other hand, some things, such as disabling interrupts, are not
   handled automatically on the kernel side. If they are not handled, the stale
   state will affect interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status and do the
   corresponding processing itself.
   This patch proposes adding an rte_udev_state structure to rte_uio_pci_dev
   in the igb_uio kernel driver, to record the state of the uio device, such as
   probed/opened/released/removed/unplug. When it detects an unexpected removal
   caused by a hot unplug, it disables the interrupt resources accordingly, and
   skips the release steps the kernel has already handled, to avoid a double
   free or a null pointer kernel crash.

The mechanism can be used by the fail-safe driver and by apps that want a hot
plug solution. Take testpmd for example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process will not break the running app/fail-safe, and will not affect
other, unrelated devices. The app can then plug the device back in and restart
the data path as follows:
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.

patchset history:
v5->v4:
split patches to focus on the failure handle, remove the event usage by testpmd
to another patch.
change the hotplug failure handler name
refine the sigbus handle logic
add lock for udev state in igb uio driver 

v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed several times in public, the
scope has changed and the patch set has been split many times. The recent RFC
and feature design focus only on the hot unplug failure handler in this patch
set, so to keep the topic clear and focused, the previous patch sets are
summarized in the history as “v1(v21)”, and the v2 here continues from there
for further tracking.

"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code
refine the remap logic for multiple devices.
remove the auto binding

v19->v18:
note the limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be clearer.

For the prior patch history, please check the patch set "add device event monitor framework"

Jeff Guo (7):
  bus: add hotplug failure handler
  bus/pci: implement hotplug failure handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hotplug
  igb_uio: fix uio release issue when hot unplug

 drivers/bus/pci/pci_common.c            |  77 ++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 ++++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |  51 ++++++++++++++-
 lib/librte_eal/common/eal_common_bus.c  |  36 ++++++++++-
 lib/librte_eal/common/eal_private.h     |  12 ++++
 lib/librte_eal/common/include/rte_bus.h |  31 +++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 111 +++++++++++++++++++++++++++++++-
 8 files changed, 358 insertions(+), 5 deletions(-)
  
Guo, Jia July 9, 2018, 6:51 a.m. UTC | #15
As we know, hot plug is an important feature, whether used for fail-safe of
datacenter devices or for SRIOV Live Migration in SDN/NFV. It can bring higher
flexibility and continuity to networking services in multiple industry use
cases. So let us see what DPDK, as an important networking framework, can do
to help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and the hot plug/unplug API in the framework; apps
can use these to develop their hot plug solution.

Consider the case of hot unplug. It can happen when a hardware device is
removed physically, or when software disables it. The app needs to call the
ethdev API to detach the device, to unplug the device at the bus level and
make access to the device invalid. The problem is that the removal of the
device from the software lists is not instantaneous; if the data (fast) path
still reads/writes the device during this time, it causes an MMIO error and
the app crashes.

We already have the fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but we still lack a failsafe driver
(or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD failure handling solution. So
there is a gap in the DPDK hot plug solution right now.

Also, the kernel only guarantees hot plug on the kernel side, not on the user
mode side. First, we can hardly have a gatekeeper for every MMIO access across
multiple PMDs. Second, no specific third-party tool such as udev/driverctl
covers this hot plug failure processing. Third, it is still questionable
whether each app can feasibly implement this for multiple user mode PMDs.
Here, a general hot plug failure handling mechanism in the DPDK framework is
proposed. It aims to guarantee that, when a hot unplug occurs, the system will
not crash and the app will not break, and that user space can stop and release
the relevant resources normally, then unplug the device at the bus level
cleanly.

The mechanism should work as below:

First, the app enables the device event monitor and registers the hot plug
event’s callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and does the failure handling accordingly.
To do that, the following functionality is introduced:
 - Add a new bus ops “handle_hot_unplug” to handle bus read/write errors. It is
   bus-specific and each kind of bus can implement its own logic.
 - Implement the PCI bus-specific ops “pci_handle_hot_unplug”. It uses the
   failure address to remap memory for the device that was unplugged.

For the data path, or other unexpected accesses from the control path, when a
hot unplug occurs:
 - Implement a new sigbus handler, registered when device event monitoring
   starts. The handler is per process. Because of how signals are delivered,
   either a control path thread or a data path thread may receive the sigbus
   error, but both end up in the common sigbus handler. Once an MMIO sigbus
   error is raised, it triggers the hot unplug operation above. The handler
   checks whether the sigbus was caused by the hot unplug; if not, it reports
   the exception as the original sigbus handler would. If yes, it remaps the
   memory.

For the control path and the igb_uio release:
 - When a device is hot unplugged, the kernel releases the device resources on
   the kernel side: for example, the fd sys file disappears and the irq is
   released. If the igb_uio driver then tries to release these resources again,
   it causes a kernel crash.
   On the other hand, some things, such as disabling interrupts, are not
   handled automatically on the kernel side. If they are not handled, the stale
   state will affect interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status and do the
   corresponding processing itself.
   This patch proposes adding an rte_udev_state structure to rte_uio_pci_dev
   in the igb_uio kernel driver, to record the state of the uio device, such as
   probed/opened/released/removed/unplug. When it detects an unexpected removal
   caused by a hot unplug, it disables the interrupt resources accordingly, and
   skips the release steps the kernel has already handled, to avoid a double
   free or a null pointer kernel crash.
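
The state tracking described above can be modelled in plain user-space C.
This is only a toy model under stated assumptions: the enum values mirror
the probed/opened/released/removed/unplug states named in the text, but the
demo_* names, the fields and the transitions are hypothetical, and the lock
that the real driver places around the state is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* States named in the cover letter; the enum itself is illustrative. */
enum demo_udev_state {
	DEMO_UDEV_PROBED,
	DEMO_UDEV_OPENED,
	DEMO_UDEV_RELEASED,
	DEMO_UDEV_REMOVED,
	DEMO_UDEV_UNPLUG,
};

struct demo_uio_dev {
	enum demo_udev_state state;
	bool intr_enabled;	/* interrupts still enabled for the device */
	int irq_free_count;	/* how many times the irq has been freed */
};

void demo_open(struct demo_uio_dev *d)
{
	d->state = DEMO_UDEV_OPENED;
	d->intr_enabled = true;
}

/* Removal callback. If the device is still open, this is an unexpected
 * removal: the kernel frees the irq itself, so the driver only disables
 * interrupts and records that the rest is already gone. */
void demo_remove(struct demo_uio_dev *d)
{
	if (d->state == DEMO_UDEV_OPENED) {
		d->intr_enabled = false;
		d->irq_free_count++;	/* models the kernel-side free */
		d->state = DEMO_UDEV_UNPLUG;
	} else {
		d->state = DEMO_UDEV_REMOVED;
	}
}

/* Release callback. After a hot unplug the resources are already freed,
 * so the release work is skipped to avoid a double free. */
void demo_release(struct demo_uio_dev *d)
{
	if (d->state == DEMO_UDEV_UNPLUG) {
		d->state = DEMO_UDEV_RELEASED;
		return;
	}
	if (d->state != DEMO_UDEV_OPENED)
		return;
	d->intr_enabled = false;
	d->irq_free_count++;
	d->state = DEMO_UDEV_RELEASED;
}
```

In both the normal close and the unplug-then-close sequence the irq ends up
freed exactly once, which is the property the state field is meant to give.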

The mechanism can be used by the fail-safe driver and by apps that want a hot
plug solution. Take testpmd for example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process will not break the running app/fail-safe, and will not affect
other, unrelated devices. The app can then plug the device back in and restart
the data path as follows:
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.

patchset history:
v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage by testpmd
to another patch.
change the hotplug failure handler name
refine the sigbus handle logic
add lock for udev state in igb uio driver

v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed several times in public, the
scope has changed and the patch set has been split many times. The recent RFC
and feature design focus only on the hot unplug failure handler in this patch
set, so to keep the topic clear and focused, the previous patch sets are
summarized in the history as “v1(v21)”, and the v2 here continues from there
for further tracking.

"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code
refine the remap logic for multiple devices.
remove the auto binding

v19->v18:
note the limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be clearer.

For the prior patch history, please check the patch set "add device event monitor framework"

Jeff Guo (7):
  bus: add hotplug failure handler
  bus/pci: implement hotplug failure handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hotplug
  igb_uio: fix uio release issue when hot unplug

 drivers/bus/pci/pci_common.c            |  77 +++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |  51 +++++++++++++-
 lib/librte_eal/common/eal_common_bus.c  |  42 ++++++++++++
 lib/librte_eal/common/eal_private.h     |  12 ++++
 lib/librte_eal/common/include/rte_bus.h |  33 +++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 114 +++++++++++++++++++++++++++++++-
 8 files changed, 370 insertions(+), 4 deletions(-)
  
Guo, Jia July 9, 2018, noon UTC | #16
As we know, hot plug is an important feature, whether used for fail-safe of
datacenter devices or for SRIOV Live Migration in SDN/NFV. It can bring higher
flexibility and continuity to networking services in multiple industry use
cases. So let us see what DPDK, as an important networking framework, can do
to help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and the hot plug/unplug API in the framework; apps
can use these to develop their hot plug solution.

Consider the case of hot unplug. It can happen when a hardware device is
removed physically, or when software disables it. The app needs to call the
ethdev API to detach the device, to unplug the device at the bus level and
make access to the device invalid. The problem is that the removal of the
device from the software lists is not instantaneous; if the data (fast) path
still reads/writes the device during this time, it causes an MMIO error and
the app crashes.

We already have the fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but we still lack a failsafe driver
(or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD failure handling solution. So
there is a gap in the DPDK hot plug solution right now.

Also, the kernel only guarantees hot plug on the kernel side, not on the user
mode side. First, we can hardly have a gatekeeper for every MMIO access across
multiple PMDs. Second, no specific third-party tool such as udev/driverctl
covers this hot plug failure processing. Third, it is still questionable
whether each app can feasibly implement this for multiple user mode PMDs.
Here, a general hot plug failure handling mechanism in the DPDK framework is
proposed. It aims to guarantee that, when a hot unplug occurs, the system will
not crash and the app will not break, and that user space can stop and release
the relevant resources normally, then unplug the device at the bus level
cleanly.

The mechanism should work as below:

First, the app enables the device event monitor and registers the hot plug
event’s callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and does the failure handling accordingly.
To do that, the following functionality is introduced:
 - Add a new bus ops “handle_hot_unplug” to handle bus read/write errors. It is
   bus-specific and each kind of bus can implement its own logic.
 - Implement the PCI bus-specific ops “pci_handle_hot_unplug”. It uses the
   failure address to remap memory for the device that was unplugged.

For the data path, or other unexpected accesses from the control path, when a
hot unplug occurs:
 - Implement a new sigbus handler, registered when device event monitoring
   starts. The handler is per process. Because of how signals are delivered,
   either a control path thread or a data path thread may receive the sigbus
   error, but both end up in the common sigbus handler. Once an MMIO sigbus
   error is raised, it triggers the hot unplug operation above. The handler
   checks whether the sigbus was caused by the hot unplug; if not, it reports
   the exception as the original sigbus handler would. If yes, it remaps the
   memory.

For the control path and the igb_uio release:
 - When a device is hot unplugged, the kernel releases the device resources on
   the kernel side: for example, the fd sys file disappears and the irq is
   released. If the igb_uio driver then tries to release these resources again,
   it causes a kernel crash.
   On the other hand, some things, such as disabling interrupts, are not
   handled automatically on the kernel side. If they are not handled, the stale
   state will affect interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status and do the
   corresponding processing itself.
   This patch proposes adding an rte_udev_state structure to rte_uio_pci_dev
   in the igb_uio kernel driver, to record the state of the uio device, such as
   probed/opened/released/removed/unplug. When it detects an unexpected removal
   caused by a hot unplug, it disables the interrupt resources accordingly, and
   skips the release steps the kernel has already handled, to avoid a double
   free or a null pointer kernel crash.

The mechanism can be used by the fail-safe driver and by apps that want a hot
plug solution. Take testpmd for example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process will not break the running app/fail-safe, and will not affect
other, unrelated devices. The app can then plug the device back in and restart
the data path as follows:
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.

patchset history:
v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage by testpmd
to another patch.
change the hotplug failure handler name
refine the sigbus handle logic
add lock for udev state in igb uio driver

v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed several times in public, the
scope has changed and the patch set has been split many times. The recent RFC
and feature design focus only on the hot unplug failure handler in this patch
set, so to keep the topic clear and focused, the previous patch sets are
summarized in the history as “v1(v21)”, and the v2 here continues from there
for further tracking.

"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code
refine the remap logic for multiple devices.
remove the auto binding

v19->v18:
note the limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be clearer.

For the prior patch history, please check the patch set "add device event monitor framework"

Jeff Guo (7):
  bus: add hotplug failure handler
  bus/pci: implement hotplug failure handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hotplug
  igb_uio: fix uio release issue when hot unplug

 drivers/bus/pci/pci_common.c            |  77 ++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 ++++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |  51 ++++++++++++++-
 lib/librte_eal/common/eal_common_bus.c  |  42 ++++++++++++
 lib/librte_eal/common/eal_private.h     |  12 ++++
 lib/librte_eal/common/include/rte_bus.h |  33 ++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 112 +++++++++++++++++++++++++++++++-
 8 files changed, 368 insertions(+), 4 deletions(-)
  
Guo, Jia Sept. 30, 2018, 11:29 a.m. UTC | #17
Hotplug is an important feature for use-cases like the datacenter device's
fail-safe and for SRIOV Live Migration in SDN/NFV. It could bring higher
flexibility and continuity to networking services in multiple use-cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, failsafe driver,
and hot plug/unplug API in DPDK. We already have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we do not yet
have an “eal event + hotplug handler for pci PMD + failsafe” implementation,
and we need to consider 2 different solutions for uio pci and vfio pci.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified,
detach the device from the bus, and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read/write to the device
while removal is still in progress, it will cause an MMIO error and the
application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once the hot-unplug occurs, the mechanism will detect the
removal event and then accordingly do the failure handling. In order to
do that, the below functionality will be required:
 - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
 - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
   it will remap memory for the unplugged device based on the failure
   address. For vfio pci, it could be implemented separately, case by case.

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, that is responsible for handling
   the sigbus error which is either an original memory error, or a specific
   memory error that is caused by a hot unplug. When a sigbus error is
   captured, it will call this function to handle sigbus error.
 - Implement PCI bus specific ops “pci_sigbus_handler”. It will iterate over
   all devices on the PCI bus to find which device encountered the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate all buses to find a bus
   to handle the failure.
 - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
   It will monitor the sigbus error by a handler which is per-process.
   Based on the signal event principle, the control path thread and the
   data path thread will randomly receive the sigbus error, but will call the
   common sigbus process. When a sigbus is captured, it will call the above
   API to find a bus to handle it.
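
The bus iteration described in the list above can be sketched as follows.
This is a simplified stand-alone model, not the DPDK implementation: struct
demo_bus, the demo_* functions and the return convention (0 = handled,
1 = not a hot-unplug sigbus for this bus, -1 = error) are assumptions made
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed convention: 0 handled, 1 address not owned by this bus, -1 error. */
typedef int (*demo_sigbus_handler_t)(const void *failure_addr);

/* Hypothetical, cut-down bus descriptor. */
struct demo_bus {
	const char *name;
	demo_sigbus_handler_t sigbus_handler;	/* may be NULL */
};

/* Stands in for a device region owned by the demo "pci" bus. */
static char demo_pci_window[64];

/* Demo PCI handler: claims only addresses inside demo_pci_window. */
int demo_pci_sigbus(const void *failure_addr)
{
	uintptr_t a = (uintptr_t)failure_addr;
	uintptr_t base = (uintptr_t)demo_pci_window;

	return (a >= base && a - base < sizeof(demo_pci_window)) ? 0 : 1;
}

/* Demo handler for a bus that never owns a failing address. */
int demo_vdev_sigbus(const void *failure_addr)
{
	(void)failure_addr;
	return 1;
}

/* Ask each bus in turn. Stop at the first bus that either handles the
 * address (0) or fails while handling it (-1). Return 1 if no bus claims
 * it, meaning the caller should treat it as a generic sigbus. */
int demo_bus_sigbus_handler(const struct demo_bus *buses, size_t n,
			    const void *failure_addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue;
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret <= 0)
			return ret;
	}
	return 1;
}
```

The three-way result is what lets the per-process handler tell a hot-unplug
sigbus apart from a generic one and fall back to the original handler.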

The mechanism could be used by apps or PMDs. For example, the whole process
of hotplug in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insert and binding, and it only
implements the igb_uio failure handler; the vfio hotplug failure handler
will come in a following patch set.

patchset history:
v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
add experimental tag.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
move the igb_uio fix out, since it is a random issue and should be considered
a kernel driver defect rather than part of this failure handler mechanism.

v10->v9:
modify the API name and expose it for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler.
distinguish handling of generic sigbus and hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path failure.
rebase testpmd code.

Since the hot plug solution has been discussed several times in public,
the scope has changed and the patch set has been split many times. The
recent RFC and feature design focus only on the hot unplug failure handler
in this patch set, so to keep the topic clear and focused, the previous
patch sets are summarized in the history as “v1(v21)”, and the v2 here
continues from there for further tracking.

"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops.
sync failure handle to fix multiple process issue.
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code.
refine the remap logic for multiple devices.
remove the auto binding.

v19->v18:
note the limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be clearer.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hot-unplug handler
  bus/pci: implement hot-unplug handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler ops
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot-unplug
  testpmd: use hot-unplug failure handle mechanism

 app/test-pmd/testpmd.c                  |  39 ++++++--
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
 lib/librte_eal/common/eal_private.h     |  39 ++++++++
 lib/librte_eal/common/include/rte_bus.h |  34 +++++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 12 files changed, 481 insertions(+), 9 deletions(-)
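
The sigbus dispatch introduced by the "bus: add sigbus handler" and
"bus: add helper to handle sigbus" patches can be illustrated with a minimal,
self-contained sketch. It is not the code from this series: the demo_* types
and functions, the single BAR range per device, and the return convention
(0 = handled, 1 = address not owned by any bus, -1 = error) are assumptions
made only for illustration. The real handler would remap the device memory;
here a flag is set instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real rte_bus / rte_pci_device types. */
struct demo_dev {
	const char *name;
	uintptr_t bar_start;	/* start of the mapped BAR region */
	size_t bar_len;		/* length of the mapped BAR region */
	int remapped;		/* set once the failure has been handled */
};

struct demo_bus {
	const char *name;
	struct demo_dev *devs;
	size_t nb_devs;
	/* 0 = handled, 1 = address not owned by this bus, -1 = error */
	int (*sigbus_handler)(struct demo_bus *bus, uintptr_t failure_addr);
};

/* Per-bus op: find the device whose mapped region contains the address. */
static int
demo_pci_sigbus_handler(struct demo_bus *bus, uintptr_t failure_addr)
{
	size_t i;

	assert(bus != NULL);
	for (i = 0; i < bus->nb_devs; i++) {
		struct demo_dev *dev = &bus->devs[i];

		if (failure_addr >= dev->bar_start &&
		    failure_addr - dev->bar_start < dev->bar_len) {
			/* A real handler would remap the region here. */
			dev->remapped = 1;
			return 0;
		}
	}
	return 1;
}

/* Helper: iterate all buses until one handles the failing address. */
static int
demo_bus_sigbus_handler(struct demo_bus *buses, size_t nb_buses,
			uintptr_t failure_addr)
{
	size_t i;

	for (i = 0; i < nb_buses; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue;	/* bus does not implement the op */
		ret = buses[i].sigbus_handler(&buses[i], failure_addr);
		if (ret <= 0)
			return ret;	/* handled, or a hard error */
	}
	return 1;	/* generic sigbus, not caused by a hot-unplug */
}
```

A process-wide SIGBUS handler would pass the faulting address (si_addr) to
the helper and fall back to the previously installed handler when the helper
returns 1, so that ordinary memory errors are still reported as before.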
  
Stephen Hemminger Oct. 1, 2018, 9 a.m. UTC | #18
On Sun, 30 Sep 2018 19:29:56 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> Hotplug is an important feature for use-cases like the datacenter device's
> fail-safe and for SRIOV Live Migration in SDN/NFV. It could bring higher
> flexibility and continuity to networking services in multiple use-cases
> in the industry. So let's see how DPDK can help users implement hotplug
> solutions.
> 
> We already have a general device-event monitor mechanism, failsafe driver,
> and hot plug/unplug API in DPDK. We already have the solution of
> “ethdev event + kernel PMD hotplug handler + failsafe”, but we do not yet
> have an “eal event + hotplug handler for pci PMD + failsafe” implementation,
> and we need to consider 2 different solutions for uio pci and vfio pci.
> 
> In the case of hotplug for igb_uio, when a hardware device is removed
> physically or disabled in software, the application needs to be notified
> and detach the device from the bus, and then invalidate the device.
> The problem is that the removal of the device is not instantaneous in
> software. If the application data path tries to read/write to the device
> while removal is still in progress, it will cause an MMIO error and the
> application will crash.
> 
> In this patch set, we propose a PCIe bus failure handler mechanism for
> hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
> the application will not crash.
> 
> The mechanism should work as below:
> 
> First, the application enables the device event monitor, registers the
> hotplug event’s callback and enables hotplug handling before running the
> data path. Once the hot-unplug occurs, the mechanism will detect the
> removal event and then do the failure handling accordingly. In order to
> do that, the functionality below will be required:
>  - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
>  - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
>    it will remap memory for the corresponding unplugged device based on
>    the failure address. For vfio pci, it could be implemented separately,
>    case by case.
> 
> For the data path or other unexpected behaviors from the control path
> when a hot unplug occurs:
>  - Add a new bus ops “sigbus_handler”, that is responsible for handling
>    the sigbus error which is either an original memory error, or a specific
>    memory error that is caused by a hot unplug. When a sigbus error is
>    captured, it will call this function to handle sigbus error.
>  - Implement PCI bus specific ops “pci_sigbus_handler”. It will iterate over
>    all devices on the PCI bus to find which device encountered the failure.
>  - Implement a "rte_bus_sigbus_handler" to iterate all buses to find a bus
>    to handle the failure.
>  - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
>    “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
>    It will monitor the sigbus error by a handler which is per-process.
>    Based on the signal event principle, the control path thread and the
>    data path thread will randomly receive the sigbus error, but will call the
>    common sigbus process. When a sigbus is captured, it will call the above API
>    to find a bus to handle it.
> 
> The mechanism could be used by app or PMDs. For example, the whole process
> of hotplug in testpmd is:
>  - Enable device event monitor->Enable hotplug handle->Register event callback
>    ->attach port->start port->start forwarding->Device unplug->failure handle
>    ->stop forwarding->stop port->close port->detach port.  
> 
> This patch set does not cover hotplug insert and binding, and it only
> implements the igb_uio failure handler; the vfio hotplug failure handler
> will come in the next patch set.
> 
> patchset history:
> v11->v10:
> change the ops name, since both uio and vfio will use the hot-unplug ops.
> add experimental tag.
> since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
> RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
> move out the igb_uio fixing part, since it is a random issue and should be
> considered a kernel driver defect rather than part of this failure handler mechanism.
> 
> v10->v9:
> modify the API name and expose it for public use.
> add hotplug handle enable/disable APIs
> refine commit log
> 
> v9->v8:
> refine commit log to be more readable.
> 
> v8->v7:
> refine errno process in sigbus handler.
> refine igb uio release process
> 
> v7->v6:
> delete some unused part
> 
> v6->v5:
> refine some description about bus ops
> refine commit log
> add some entry check.
> 
> v5->v4:
> split patches to focus on the failure handle, remove the event usage
> by testpmd to another patch.
> change the hotplug failure handler name.
> refine the sigbus handle logic.
> add lock for udev state in igb uio driver.
> 
> v4->v3:
> split patches to be small and clear.
> change to use new parameter "--hotplug-mode" in testpmd to identify
> the eal hotplug and ethdev hotplug.
> 
> v3->v2:
> change bus ops name to bus_hotplug_handler.
> add new API and bus ops of bus_signal_handler to distinguish handling of
> generic sigbus and hotplug sigbus.
> 
> v2->v1(v21):
> refine some doc and commit log.
> fix igb uio kernel issue for control path failure.
> rebase testpmd code.
> 
> Since the hot plug solution has been discussed several times in public,
> the scope has changed and the patch set has been split many times. Following
> the recent RFC and feature design, this patch set focuses only on the hot
> unplug failure handler. To keep the topic clear and focused, the previous
> patch set history is summarized as “v1(v21)”, and the v2 here continues
> from it for further tracking.
> 
> "v1(21)" == v21 as below:
> v21->v20:
> split function in hot unplug ops.
> sync failure handle to fix multiple process issue.
> fix attach port issue for multiple devices case.
> combine rmv callback functions into only one.
> 
> v20->v19:
> clean the code.
> refine the remap logic for multiple device.
> remove the auto binding.
> 
> v19->v18:
> note for limitation of multiple hotplug, fix some typos, squeeze patch.
> 
> v18->v15:
> add document, add signal bus handler, refine the code to be more clear.
> 
> For the prior patch history, please check the patch set "add device event monitor framework".
> 
> Jeff Guo (7):
>   bus: add hot-unplug handler
>   bus/pci: implement hot-unplug handler ops
>   bus: add sigbus handler
>   bus/pci: implement sigbus handler ops
>   bus: add helper to handle sigbus
>   eal: add failure handle mechanism for hot-unplug
>   testpmd: use hot-unplug failure handle mechanism
> 
>  app/test-pmd/testpmd.c                  |  39 ++++++--
>  doc/guides/rel_notes/release_18_08.rst  |   5 +
>  drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
>  drivers/bus/pci/pci_common_uio.c        |  33 +++++++
>  drivers/bus/pci/private.h               |  12 +++
>  lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
>  lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
>  lib/librte_eal/common/eal_private.h     |  39 ++++++++
>  lib/librte_eal/common/include/rte_bus.h |  34 +++++++
>  lib/librte_eal/common/include/rte_dev.h |  26 +++++
>  lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
>  lib/librte_eal/rte_eal_version.map      |   2 +
>  12 files changed, 481 insertions(+), 9 deletions(-)
> 

I am glad to see this; hotplug is needed. But I have a somewhat controversial
point of view. The DPDK project needs to do more to force users to go to
more modern kernels and APIs; there has been too much effort already to
support new DPDK on older kernels and distributions. This leads to higher
testing burden, technical debt and multiple APIs.

To take the extreme point of view:
	* igb_uio should be deprecated and all new work should use only vfio and vfio-noiommu
	* kni should be deprecated and replaced by virtio

When there are N ways of doing things against X kernel versions,
and Y distributions, and multiple device vendors, the combinatorial explosion of cases means
that interfaces don't get the depth of testing they deserve.

So why not support hotplug on VFIO only?
  
Jerin Jacob Oct. 1, 2018, 9:55 a.m. UTC | #19
-----Original Message-----
> Date: Mon, 1 Oct 2018 11:00:12 +0200
> From: Stephen Hemminger <stephen@networkplumber.org>
> To: Jeff Guo <jia.guo@intel.com>
> Cc: bruce.richardson@intel.com, ferruh.yigit@intel.com,
>  konstantin.ananyev@intel.com, gaetan.rivet@6wind.com,
>  jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com,
>  matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com,
>  shaopeng.he@intel.com, bernard.iremonger@intel.com,
>  arybchenko@solarflare.com, wenzhuo.lu@intel.com,
>  anatoly.burakov@intel.com, jblunck@infradead.org, shreyansh.jain@nxp.com,
>  dev@dpdk.org, helin.zhang@intel.com
> Subject: Re: [dpdk-dev] [PATCH v11 0/7] hot-unplug failure handle mechanism
> 
> 
> I am glad to see this; hotplug is needed. But I have a somewhat controversial
> point of view. The DPDK project needs to do more to force users to go to
> more modern kernels and APIs; there has been too much effort already to
> support new DPDK on older kernels and distributions. This leads to higher
> testing burden, technical debt and multiple APIs.
> 
> To take the extreme point of view:
>         * igb_uio should be deprecated and all new work should use only vfio and vfio-noiommu
>         * kni should be deprecated and replaced by virtio

+1

I think the only feature missing in the upstream kernel for DPDK may be
enabling SRIOV on PF VFIO devices controlled by a DPDK PMD.
I think binding a PF device to DPDK along with its VFs (a VF can be bound to
netdev or DPDK PMDs, though binding a VF to netdev is considered a security
breach) will be useful for:
a) rte_flow actions like redirecting the traffic to PF or VF on the given pattern
b) Some NICs can support promiscuous mode only on PF
c) Enabling switch representation devices
https://doc.dpdk.org/guides/prog_guide/switch_representation.html

I think igb_uio is mainly used as the backdoor for this use case.

I think there was some work in this area, but it was not upstreamed for
various reasons.
https://lkml.org/lkml/2018/3/8/1122

> 
> When there are N ways of doing things against X kernel versions,
> and Y distributions, and multiple device vendors, the combinatorial explosion of cases means
> that interfaces don't get the depth of testing they deserve.
> 
> So why not support hotplug on VFIO only?
>
  
Guo, Jia Oct. 2, 2018, 9:57 a.m. UTC | #20
Hi Stephen,

Thanks for your review; my answers are below.

On 10/1/2018 5:00 PM, Stephen Hemminger wrote:
> I am glad to see this; hotplug is needed. But I have a somewhat controversial
> point of view. The DPDK project needs to do more to force users to go to
> more modern kernels and APIs; there has been too much effort already to
> support new DPDK on older kernels and distributions. This leads to higher
> testing burden, technical debt and multiple APIs.
>
> To take the extreme point of view:
> 	* igb_uio should be deprecated and all new work should use only vfio and vfio-noiommu
> 	* kni should be deprecated and replaced by virtio
>
> When there are N ways of doing things against X kernel versions,
> and Y distributions, and multiple device vendors, the combinatorial explosion of cases means
> that interfaces don't get the depth of testing they deserve.
>
> So why not support hotplug on VFIO only?


I think this is a very constructive suggestion, but I am not 100% sure
about it; at least a few things need to be considered and discussed here.

1) Has it been announced to all DPDK users that igb_uio will be deprecated,
and is there a planned time frame for it?

2) Next, we should consider the cost of the transition. During the
transition, which is the better way: to mark the feature experimental, then
implement and test it so that igb_uio users and customers benefit, or to
leave this part vacant and avoid any unnecessary combinatorial cost of
implementation and testing?

I think we can deliver hotplug to our DPDK users and customers if we
can find a good way to handle 1) and 2).
  
Guo, Jia Oct. 2, 2018, 10:08 a.m. UTC | #21
Hi Jerin,

Thanks for your comment; my reply is below.

On 10/1/2018 5:55 PM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Mon, 1 Oct 2018 11:00:12 +0200
>> From: Stephen Hemminger <stephen@networkplumber.org>
>> To: Jeff Guo <jia.guo@intel.com>
>> Cc: bruce.richardson@intel.com, ferruh.yigit@intel.com,
>>   konstantin.ananyev@intel.com, gaetan.rivet@6wind.com,
>>   jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com,
>>   matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com,
>>   shaopeng.he@intel.com, bernard.iremonger@intel.com,
>>   arybchenko@solarflare.com, wenzhuo.lu@intel.com,
>>   anatoly.burakov@intel.com, jblunck@infradead.org, shreyansh.jain@nxp.com,
>>   dev@dpdk.org, helin.zhang@intel.com
>> Subject: Re: [dpdk-dev] [PATCH v11 0/7] hot-unplug failure handle mechanism
>>
>>
>> On Sun, 30 Sep 2018 19:29:56 +0800
>> Jeff Guo <jia.guo@intel.com> wrote:
>>
>>> Hotplug is an important feature for use-cases like the datacenter device's
>>> fail-safe and for SRIOV Live Migration in SDN/NFV. It could bring higher
>>> flexibility and continuality to networking services in multiple use-cases
>>> in the industry. So let's see how DPDK can help users implement hotplug
>>> solutions.
>>>
>>> We already have a general device-event monitor mechanism, failsafe driver,
>>> and hot plug/unplug API in DPDK. We have already got the solution of
>>> “ethdev event + kernel PMD hotplug handler + failsafe”, but we still not
>>> got “eal event + hotplug handler for pci PMD + failsafe” implement, and we
>>> need to considerate 2 different solutions between uio pci and vfio pci.
>>>
>>> In the case of hotplug for igb_uio, when a hardware device be removed
>>> physically or disabled in software, the application needs to be notified
>>> and detach the device out of the bus, and then make the device invalidate.
>>> The problem is that, the removal of the device is not instantaneous in
>>> software. If the application data path tries to read/write to the device
>>> when removal is still in process, it will cause an MMIO error and
>>> application will crash.
>>>
>>> In this patch set, we propose a PCIe bus failure handler mechanism for
>>> hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
>>> the application will not crash.
>>>
>>> The mechanism should work as below:
>>>
>>> First, the application enables the device event monitor, registers the
>>> hotplug event’s callback and enable hotplug handling before running the
>>> data path. Once the hot-unplug occurs, the mechanism will detect the
>>> removal event and then accordingly do the failure handling. In order to
>>> do that, the below functionality will be required:
>>>   - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
>>>   - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
>>>     it will be based on the failure address to remap memory for the corresponding
>>>     device that unplugged. For vfio pci, could seperate implement case by case.
>>>
>>> For the data path, or other unexpected behavior from the control path,
>>> when a hot unplug occurs:
>>>   - Add a new bus ops “sigbus_handler”, which is responsible for handling
>>>     the sigbus error, which is either a generic memory error or a specific
>>>     memory error caused by a hot unplug. When a sigbus error is
>>>     captured, this function is called to handle it.
>>>   - Implement the PCI bus specific ops “pci_sigbus_handler”. It iterates over
>>>     all devices on the PCI bus to find which device encountered the failure.
>>>   - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a
>>>     bus to handle the failure.
>>>   - Add a pair of APIs, “rte_dev_hotplug_handle_enable” and
>>>     “rte_dev_hotplug_handle_disable”, to enable/disable hotplug handling.
>>>     They monitor the sigbus error through a per-process handler.
>>>     Because of how signals are delivered, either the control path thread or
>>>     the data path thread may receive the sigbus error, but both will call the
>>>     common sigbus handling. When a sigbus is captured, it calls the above API
>>>     to find a bus to handle it.
>>>
>>> The mechanism could be used by app or PMDs. For example, the whole process
>>> of hotplug in testpmd is:
>>>   - Enable device event monitor->Enable hotplug handle->Register event callback
>>>     ->attach port->start port->start forwarding->Device unplug->failure handle
>>>     ->stop forwarding->stop port->close port->detach port.
>>>
>>> This patch set does not cover hotplug insert and binding, and it only
>>> implements the igb_uio failure handler; the vfio hotplug failure handler
>>> will come in a following patch set.
>>>
>>> patchset history:
>>> v11->v10:
>>> change the ops name, since both uio and vfio will use the hot-unplug ops.
>>> add experimental tag.
>>> since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
>>> RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
>>> move the igb_uio fix out of this patch set, since it is a random issue and
>>> should be considered a kernel driver defect rather than part of this failure
>>> handler mechanism.
>>>
>>> v10->v9:
>>> modify the api name and exposure out for public use.
>>> add hotplug handle enable/disable APIs
>>> refine commit log
>>>
>>> v9->v8:
>>> refine commit log to be more readable.
>>>
>>> v8->v7:
>>> refine errno process in sigbus handler.
>>> refine igb uio release process
>>>
>>> v7->v6:
>>> delete some unused part
>>>
>>> v6->v5:
>>> refine some description about bus ops
>>> refine commit log
>>> add some entry check.
>>>
>>> v5->v4:
>>> split patches to focus on the failure handle, remove the event usage
>>> by testpmd to another patch.
>>> change the hotplug failure handler name.
>>> refine the sigbus handle logic.
>>> add lock for udev state in igb uio driver.
>>>
>>> v4->v3:
>>> split patches to be small and clear.
>>> change to use new parameter "--hotplug-mode" in testpmd to identify
>>> the eal hotplug and ethdev hotplug.
>>>
>>> v3->v2:
>>> change bus ops name to bus_hotplug_handler.
>>> add new API and bus ops of bus_signal_handler to distinguish handling of generic
>>> sigbus and hotplug sigbus.
>>>
>>> v2->v1(v21):
>>> refine some doc and commit log.
>>> fix igb uio kernel issue for control path failure rebase testpmd code.
>>>
>>> Since the hot plug solution has been discussed for several rounds in public,
>>> the scope has changed and the patch set has been split many times. Following
>>> the recent RFC and feature design, this patch set focuses only on the hot
>>> unplug failure handler. To make this topic clearer and more focused, the
>>> previous patch set history is summarized as “v1(v21)”, and the v2 here
>>> continues from it for further tracking.
>>>
>>> "v1(v21)" == v21 as below:
>>> v21->v20:
>>> split function in hot unplug ops.
>>> sync failure handle to fix multiple process issue.
>>> fix attach port issue for multiple devices case.
>>> combine rmv callback functions into one.
>>>
>>> v20->v19:
>>> clean the code.
>>> refine the remap logic for multiple device.
>>> remove the auto binding.
>>>
>>> v19->v18:
>>> note for limitation of multiple hotplug, fix some typos, squeeze patch.
>>>
>>> v18->v15:
>>> add document, add signal bus handler, refine the code to be more clear.
>>>
>>> the prior patch history please check the patch set "add device event monitor framework".
>>>
>>> Jeff Guo (7):
>>>    bus: add hot-unplug handler
>>>    bus/pci: implement hot-unplug handler ops
>>>    bus: add sigbus handler
>>>    bus/pci: implement sigbus handler ops
>>>    bus: add helper to handle sigbus
>>>    eal: add failure handle mechanism for hot-unplug
>>>    testpmd: use hot-unplug failure handle mechanism
>>>
>>>   app/test-pmd/testpmd.c                  |  39 ++++++--
>>>   doc/guides/rel_notes/release_18_08.rst  |   5 +
>>>   drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
>>>   drivers/bus/pci/pci_common_uio.c        |  33 +++++++
>>>   drivers/bus/pci/private.h               |  12 +++
>>>   lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
>>>   lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
>>>   lib/librte_eal/common/eal_private.h     |  39 ++++++++
>>>   lib/librte_eal/common/include/rte_bus.h |  34 +++++++
>>>   lib/librte_eal/common/include/rte_dev.h |  26 +++++
>>>   lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
>>>   lib/librte_eal/rte_eal_version.map      |   2 +
>>>   12 files changed, 481 insertions(+), 9 deletions(-)
>>>
>> I am glad to see this, hotplug is needed. But I have a somewhat controversial
>> point of view. The DPDK project needs to do more to force users to go to
>> more modern kernels and APIs; there has been too much effort already to
>> support new DPDK on older kernels and distributions. This leads to higher
>> testing burden, technical debt and multiple APIs.
>>
>> To take the extreme point of view.
>>          * igb_uio should be deprecated and all new work only use vfio and vfio-noiommu only
>>          * kni should be deprecated and replaced by virtio
> +1
>
> I think the only feature missing in the upstream kernel for DPDK may be
> enabling SRIOV on PF VFIO devices controlled by a DPDK PMD.
> I think binding a PF device to DPDK along with VFs (a VF can be bound to netdev
> or DPDK PMDs, though binding a VF to netdev is considered a security breach)
> will be useful for
> a) rte_flow actions like redirecting the traffic to PF or VF on the given pattern
> b) Some NICs can support promiscuous mode only on PF
> c) Enable Switch representation devices
> https://doc.dpdk.org/guides/prog_guide/switch_representation.html
>
> I think igb_uio is mainly used as the backdoor for this use case.
>
> I think, there was some work in this area but it is not upstreamed due
> to various reasons.
> https://lkml.org/lkml/2018/3/8/1122


I think you definitely raise a meaningful point about hotplug for SRIOV.
igb_uio is still in use today, so its usage needs to be highlighted and
discussed.


>> When there are N ways of doing things against X kernel versions,
>> and Y distributions, and multiple device vendors; the combinatorial explosion of cases means
>> that interfaces don't get the depth of testing they deserve.
>>
>> That means: why not support hotplug on VFIO only?
>>
  
Guo, Jia Oct. 2, 2018, 12:35 p.m. UTC | #22
Hotplug is an important feature for use cases such as device fail-safe in
the datacenter and SRIOV live migration in SDN/NFV. It can bring greater
flexibility and continuity to networking services across many use cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, a failsafe driver,
and hot plug/unplug APIs in DPDK. We already have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we do not yet
have an “eal event + hotplug handler for pci PMD + failsafe” implementation,
and we need to consider two different solutions, one for uio pci and one for
vfio pci.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified,
detach the device from the bus, and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read from or write to the
device while removal is still in progress, it will cause an MMIO error and
the application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once the hot-unplug occurs, the mechanism will detect the
removal event and then do the failure handling accordingly. In order to
do that, the following functionality is required:
 - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
 - Implement the pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
   it uses the failure address to remap memory for the corresponding
   device that was unplugged. For vfio pci, it could be implemented
   separately, case by case.

For the data path, or other unexpected behavior from the control path,
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, which is responsible for handling
   the sigbus error, which is either a generic memory error or a specific
   memory error caused by a hot unplug. When a sigbus error is
   captured, this function is called to handle it.
 - Implement the PCI bus specific ops “pci_sigbus_handler”. It iterates over
   all devices on the PCI bus to find which device encountered the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a
   bus to handle the failure.
 - Add a pair of APIs, “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable”, to enable/disable hotplug handling.
   They monitor the sigbus error through a per-process handler.
   Because of how signals are delivered, either the control path thread or
   the data path thread may receive the sigbus error, but both will call the
   common sigbus handling. When a sigbus is captured, it calls the above API
   to find a bus to handle it.
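
The dispatch described above can be sketched roughly as follows. This is a
simplified, self-contained model and not the code in this patch set: the
demo_* types and functions are hypothetical stand-ins for the bus ops and for
rte_bus_sigbus_handler, and a "bus" is reduced to a list of address ranges.
Only sigaction(), raise() and the siginfo_t si_addr field are real interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a device BAR mapping and a bus; not DPDK types. */
struct demo_region { uintptr_t start; size_t len; };

struct demo_bus {
	const char *name;
	const struct demo_region *regions;
	size_t nb_regions;
	int handled;		/* set once this bus has claimed a failure */
};

static struct demo_bus *demo_buses;
static size_t demo_nb_buses;
static struct sigaction demo_old_sa;

/* Per-bus handler: 0 = handled, 1 = address does not belong to this bus. */
static int
demo_bus_sigbus_handler(struct demo_bus *bus, uintptr_t addr)
{
	for (size_t i = 0; i < bus->nb_regions; i++) {
		const struct demo_region *r = &bus->regions[i];

		if (addr >= r->start && addr - r->start < r->len) {
			bus->handled = 1; /* a real handler would remap here */
			return 0;
		}
	}
	return 1;
}

/* Iterate all buses: 0 = some bus handled it, 1 = generic sigbus. */
static int
demo_sigbus_dispatch(uintptr_t addr)
{
	for (size_t i = 0; i < demo_nb_buses; i++)
		if (demo_bus_sigbus_handler(&demo_buses[i], addr) == 0)
			return 0;
	return 1;
}

/* Per-process handler, run in whichever thread took the fault. */
static void
demo_sigbus_action(int sig, siginfo_t *info, void *ctx)
{
	(void)ctx;
	if (demo_sigbus_dispatch((uintptr_t)info->si_addr) == 0)
		return;	/* hot-unplug failure handled; retry the access */
	/* Generic sigbus: restore the previous disposition and re-raise. */
	sigaction(sig, &demo_old_sa, NULL);
	raise(sig);
}

/* Register the buses and install the handler. Returns 0 on success. */
static int
demo_hotplug_handle_enable(struct demo_bus *buses, size_t n)
{
	struct sigaction sa;

	demo_buses = buses;
	demo_nb_buses = n;
	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = demo_sigbus_action;
	sa.sa_flags = SA_SIGINFO;
	return sigaction(SIGBUS, &sa, &demo_old_sa);
}
```

Returning from the handler only makes sense if the per-bus handler has
actually repaired the mapping; otherwise the faulting instruction would raise
the same signal again. That is why the generic case falls back to the
previously installed disposition instead of returning.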

The mechanism can be used by applications or PMDs. For example, the whole
hotplug process in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insert and binding, and it only
implements the igb_uio failure handler; the vfio hotplug failure handler
will come in a following patch set.

patchset history:
v12->v11:
add and delete some checks on sigbus recovery.

v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
move the igb_uio fix out of this patch set, since it is a random issue and
should be considered a kernel driver defect rather than part of this failure
handler mechanism.

v10->v9:
modify the API name and expose it for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, move the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler to distinguish handling of generic
sigbus and hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path failure.
rebase testpmd code.

Since the hot plug solution has been discussed for several rounds in public,
the scope has changed and the patch set has been split many times. Following
the recent RFC and feature design, this patch set focuses only on the hot
unplug failure handler. To make this topic clearer and more focused, the
previous patch set history is summarized as “v1(v21)”, and the v2 here
continues from it for further tracking.

"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops.
sync failure handle to fix multiple process issue.
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code.
refine the remap logic for multiple device.
remove the auto binding.

v19->v18:
note for limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

for the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hot-unplug handler
  bus/pci: implement hot-unplug handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler ops
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot-unplug
  testpmd: use hot-unplug failure handle mechanism

 app/test-pmd/testpmd.c                  |  39 ++++++--
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
 lib/librte_eal/common/eal_private.h     |  39 ++++++++
 lib/librte_eal/common/include/rte_bus.h |  34 +++++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 164 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 12 files changed, 483 insertions(+), 9 deletions(-)
  

Patch

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 4ee1113..122187e 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1283,6 +1283,7 @@  static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
 
 	/* enable uio intr after callback register */
 	rte_intr_enable(intr_handle);
+
 	/*
 	 * Add an ethertype filter to drop all flow control frames transmitted
 	 * from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
@@ -5832,11 +5833,28 @@  struct i40e_vsi *
 {
 	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
 	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct rte_uevent event;
 	uint32_t icr0;
+	struct rte_pci_device *pci_dev;
+	struct rte_intr_handle *intr_handle;
+
+	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	intr_handle = &pci_dev->intr_handle;
 
 	/* Disable interrupt */
 	i40e_pf_disable_irq0(hw);
 
+	/* check device uevent */
+	if (rte_uevent_get(intr_handle->uevent_fd, &event) > 0) {
+		if (event.subsystem == RTE_UEVENT_SUBSYSTEM_UIO) {
+			if (event.action == RTE_UEVENT_REMOVE) {
+				_rte_eth_dev_callback_process(dev,
+					RTE_ETH_EVENT_INTR_RMV, NULL);
+			}
+		}
+		goto done;
+	}
+
 	/* read out interrupt causes */
 	icr0 = I40E_READ_REG(hw, I40E_PFINT_ICR0);