[RFC,2/2] ethdev: add API to set process to primary or secondary

Message ID 20221201082005.732252-3-rongweil@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series add API to set process to primary or secondary |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS

Commit Message

Rongwei Liu Dec. 1, 2022, 8:20 a.m. UTC
  Users may want to change the DPDK process to different versions
such as hot upgrade.
There is a strong requirement to simplify the logic and shorten the
traffic downtime as much as possible.

This update introduces new rte_eth process role definitions: primary
or secondary.

The primary role means rules are programmed to HW immediately, and no
behavior changed. This is the default state.
The secondary role means rules are queued in the HW. If no primary roles
alive or back to primary, the rules are effective immediately.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
---
 doc/guides/nics/mlx5.rst   | 10 ++++++
 lib/ethdev/ethdev_driver.h | 63 ++++++++++++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.c    | 20 ++++++++++++
 lib/ethdev/version.map     |  3 ++
 4 files changed, 96 insertions(+)
  

Comments

Stephen Hemminger Dec. 1, 2022, 3:10 p.m. UTC | #1
On Thu, 1 Dec 2022 10:20:05 +0200
Rongwei Liu <rongweil@nvidia.com> wrote:

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Set the rte_eth process to the primary or secondary role which affects
> + * the flow rules offloading. It doesn't allow multiple processes to be the
> + * same role unless no offload rules are set.
> + * The primary process's flow rules are effective immediately while the secondary
> + * process's rules will be queued in hardware until it becomes primary or no
> + * primary process is alive.
> + * The primary application will always receive traffic while the secondary
> + * application will receive traffic when no matching rules are present from
> + * the primary application.
> + *
> + * The application is primary by default if this API is not called.
> + *
> + * When a process transforms from a secondary to a primary role, all preceding
> + * flow rules which are queued by hardware will be effective immediately.
> + * Before role transition, all the rules set by the primary process should be
> + * flushed first.
> + *
> + * When role flag "RTE_ETH_PROCESS_NIC_DUP_WITH_SECONDARY" is set, NIC domain
> + * flow rules are effective immediately even if a process is secondary.
> + *
> + * @param active
> + *   Process primary (role) or not (secondary).
> + * @param flag
> + *   The role flag.
> + * @return
> + *   - (>=0) Number of rte devices which have been switched successfully.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_process_set_primary(bool primary, uint32_t flag);

The state of the devices and the system is really unstable if
this fails. There is no rollback here.

I think this should have a PMD capability flag so that application
can check that device supports doing this. And it would have to
be opt-in so that existing devices would always fail.
  
Rongwei Liu Dec. 2, 2022, 3:27 a.m. UTC | #2
BR
Rongwei

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, December 1, 2022 23:10
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> <rasland@nvidia.com>
> Subject: Re: [RFC 2/2] ethdev: add API to set process to primary or secondary
> 
> External email: Use caution opening links or attachments
> 
> 
> On Thu, 1 Dec 2022 10:20:05 +0200
> Rongwei Liu <rongweil@nvidia.com> wrote:
> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Set the rte_eth process to the primary or secondary role which
> > +affects
> > + * the flow rules offloading. It doesn't allow multiple processes to
> > +be the
> > + * same role unless no offload rules are set.
> > + * The primary process's flow rules are effective immediately while
> > +the secondary
> > + * process's rules will be queued in hardware until it becomes
> > +primary or no
> > + * primary process is alive.
> > + * The primary application will always receive traffic while the
> > +secondary
> > + * application will receive traffic when no matching rules are
> > +present from
> > + * the primary application.
> > + *
> > + * The application is primary by default if this API is not called.
> > + *
> > + * When a process transforms from a secondary to a primary role, all
> > +preceding
> > + * flow rules which are queued by hardware will be effective immediately.
> > + * Before role transition, all the rules set by the primary process
> > +should be
> > + * flushed first.
> > + *
> > + * When role flag "RTE_ETH_PROCESS_NIC_DUP_WITH_SECONDARY" is set,
> > +NIC domain
> > + * flow rules are effective immediately even if a process is secondary.
> > + *
> > + * @param active
> > + *   Process primary (role) or not (secondary).
> > + * @param flag
> > + *   The role flag.
> > + * @return
> > + *   - (>=0) Number of rte devices which have been switched successfully.
> > + *   - (-EINVAL) if bad parameter.
> > + */
> > +__rte_experimental
> > +int rte_eth_process_set_primary(bool primary, uint32_t flag);
> 
> The state of the devices and the system is really unstable if this fails. There is
> no rollback here.
> 
Assume application is calling rte_eth_process_set_primary(false);
Once failed, call all preceding successful ports as rte_eth_process_set_primary(true);
What do you think?
> I think this should have a PMD capability flag so that application can check
> that device supports doing this. And it would have to be opt-in so that existing
> devices would always fail.
If device doesn't support it, it can set the ethdev callback to NULL or return failure for all devices.
Then the devices' state will be consistent.
  
Stephen Hemminger Dec. 5, 2022, 4:08 p.m. UTC | #3
On Fri, 2 Dec 2022 03:27:38 +0000
Rongwei Liu <rongweil@nvidia.com> wrote:

> > 
> > The state of the devices and the system is really unstable if this fails. There is
> > no rollback here.
> >   
> Assume application is calling rte_eth_process_set_primary(false);
> Once failed, call all preceding successful ports as rte_eth_process_set_primary(true);
> What do you think?
> > I think this should have a PMD capability flag so that application can check
> > that device supports doing this. And it would have to be opt-in so that existing
> > devices would always fail.  
> If device doesn't support it, it can set the ethdev callback to NULL or return failure for all devices.
> Then the devices' state will be consistent.


Assume there are two DPDK ports.
If the application tries to change roles and one of the devices does not
support the change over, then that error is fatal. The first device has
changed state already, and the second doesn't allow it.

This needs to be a capability flag for the device, and would need an additional
flag in the device documentation as well.

I bet many devices do regular malloc or mmap in the primary process and that
is not going to work with this change.
  
Rongwei Liu Dec. 6, 2022, 3:47 a.m. UTC | #4
Hi

BR
Rongwei

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, December 6, 2022 00:08
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> <rasland@nvidia.com>
> Subject: Re: [RFC 2/2] ethdev: add API to set process to primary or secondary
> 
> External email: Use caution opening links or attachments
> 
> 
> On Fri, 2 Dec 2022 03:27:38 +0000
> Rongwei Liu <rongweil@nvidia.com> wrote:
> 
> > >
> > > The state of the devices and the system is really unstable if this
> > > fails. There is no rollback here.
> > >
> > Assume application is calling rte_eth_process_set_primary(false);
> > Once failed, call all preceding successful ports as
> > rte_eth_process_set_primary(true);
> > What do you think?
> > > I think this should have a PMD capability flag so that application
> > > can check that device supports doing this. And it would have to be
> > > opt-in so that existing devices would always fail.
> > If device doesn't support it, it can set the ethdev callback to NULL or return
> failure for all devices.
> > Then the devices' state will be consistent.
> 
> 
> Assume there are two DPDK ports.
> If the application tries to change roles and one of the devices does not support
> the change over, then that error is fatal. The first device has changed state
> already, and the second doesn't allow it.
> 
If my understanding is correct, you are saying one application to probe two PMD vendors. This is difficult to handle. 
> This needs to be a capability flag for the device, and would need an additional
> flag in the device documentation as well.
> 
For multiple vendors simultaneously probing. Capability flag is a must. But no need for single vendor, right?
> I bet many devices do regular malloc or mmap in the primary process and that
> is not going to work with this change.
Sorry, looks like I mis-lead you. The words "Primary/Secondary" in this update have no relationship with current PRIMARY/SECDONARY definition.
Doesn't focus on the resource ownership.
In this update: Primary application' offload rules are effective immediately. Secondary' rules are in queue which will be effective if primary application exits or primary application doesn't insert any rule.
Maybe we can call it as "active/standby" or "main/standby" or "active/backup"?  Do you have naming suggestion?
  
Stephen Hemminger Dec. 6, 2022, 5:54 a.m. UTC | #5
On Tue, 6 Dec 2022 03:47:42 +0000
Rongwei Liu <rongweil@nvidia.com> wrote:

> Hi
> 
> BR
> Rongwei
> 
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Tuesday, December 6, 2022 00:08
> > To: Rongwei Liu <rongweil@nvidia.com>
> > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > <rasland@nvidia.com>
> > Subject: Re: [RFC 2/2] ethdev: add API to set process to primary or secondary
> > 
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Fri, 2 Dec 2022 03:27:38 +0000
> > Rongwei Liu <rongweil@nvidia.com> wrote:
> >   
> > > >
> > > > The state of the devices and the system is really unstable if this
> > > > fails. There is no rollback here.
> > > >  
> > > Assume application is calling rte_eth_process_set_primary(false);
> > > Once failed, call all preceding successful ports as
> > > rte_eth_process_set_primary(true);
> > > What do you think?  
> > > > I think this should have a PMD capability flag so that application
> > > > can check that device supports doing this. And it would have to be
> > > > opt-in so that existing devices would always fail.  
> > > If device doesn't support it, it can set the ethdev callback to NULL or return  
> > failure for all devices.  
> > > Then the devices' state will be consistent.  
> > 
> > 
> > Assume there are two DPDK ports.
> > If the application tries to change roles and one of the devices does not support
> > the change over, then that error is fatal. The first device has changed state
> > already, and the second doesn't allow it.
> >   
> If my understanding is correct, you are saying one application to probe two PMD vendors. This is difficult to handle. 
> > This needs to be a capability flag for the device, and would need an additional
> > flag in the device documentation as well.
> >   
> For multiple vendors simultaneously probing. Capability flag is a must. But no need for single vendor, right?
> > I bet many devices do regular malloc or mmap in the primary process and that
> > is not going to work with this change.  
> Sorry, looks like I mis-lead you. The words "Primary/Secondary" in this update have no relationship with current PRIMARY/SECDONARY definition.
> Doesn't focus on the resource ownership.
> In this update: Primary application' offload rules are effective immediately. Secondary' rules are in queue which will be effective if primary application exits or primary application doesn't insert any rule.
> Maybe we can call it as "active/standby" or "main/standby" or "active/backup"?  Do you have naming suggestion?  


DPDK supports any combination of PCI and virtual devices.
Any patch that restricts that is a bad idea.

There already is a capability mechanism in ethdev API.
A well written application would look at those flags for all ethdev's before
attempting transition. Layered PMD's like bonding, failsafe, and netvsc would also
need to handle the nesting.

The problem is hard, what you did so far is a start but there are lots more issues
that need to be considered.
  

Patch

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 51f51259e3..ea78e14b80 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2001,3 +2001,13 @@  where:
 * ``sw_queue_id``: queue index in range [64536, 65535].
   This range is the highest 1000 numbers.
 * ``hw_queue_id``: queue index given by HW in queue creation.
+
+ethdev set process primary or secondary
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+User should only program group 0 (fdb_def_rule_en=0) when ``rte_eth_process_set_primary``
+has been called and set to a secondary role.
+Group 0 is shared across different DPDK processes while the other groups are limited
+to the current process scope.
+The process can't move from primary to secondary role if preceding primary application's
+rules are still present and vice versa.
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 6a550cfc83..67b53840c8 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -179,6 +179,16 @@  struct rte_eth_dev_data {
 	pthread_mutex_t flow_ops_mutex; /**< rte_flow ops mutex */
 } __rte_cache_aligned;
 
+/**@{@name Different rte_eth role flag definitions which will be used
+ *  when miagrating DPDK to a different version.
+ */
+/*
+ * Traffic coming from NIC domain rules will reach
+ * both primary and secondary processes.
+ */
+#define RTE_ETH_PROCESS_NIC_DUP_WITH_SECONDARY RTE_BIT32(0),
+/**@}*/
+
 /**
  * @internal
  * The pool of *rte_eth_dev* structures. The size of the pool
@@ -1087,6 +1097,22 @@  typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rt
  */
 typedef int (*eth_dev_priv_dump_t)(struct rte_eth_dev *dev, FILE *file);
 
+/**
+ * @internal
+ * Set rte_eth process to primary or secondary role.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param active
+ *   Device (role) primary or not (secondary).
+ * @param flag
+ *   Role specific flag.
+ *
+ * @return
+ *   Negative value on error, 0 on success.
+ */
+typedef int (*eth_process_set_primary_t)(struct rte_eth_dev *dev, bool primary, uint32_t flag);
+
 /**
  * @internal Set Rx queue available descriptors threshold.
  * @see rte_eth_rx_avail_thresh_set()
@@ -1403,6 +1429,8 @@  struct eth_dev_ops {
 	eth_cman_config_set_t cman_config_set;
 	/** Retrieve congestion management configuration */
 	eth_cman_config_get_t cman_config_get;
+	/** Set the whole rte_eth process to primary or secondary role. */
+	eth_process_set_primary_t eth_process_set_primary;
 };
 
 /**
@@ -2046,6 +2074,41 @@  struct rte_eth_fdir_conf {
 	struct rte_eth_fdir_flex_conf flex_conf;
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Set the rte_eth process to the primary or secondary role which affects
+ * the flow rules offloading. It doesn't allow multiple processes to be the
+ * same role unless no offload rules are set.
+ * The primary process's flow rules are effective immediately while the secondary
+ * process's rules will be queued in hardware until it becomes primary or no
+ * primary process is alive.
+ * The primary application will always receive traffic while the secondary
+ * application will receive traffic when no matching rules are present from
+ * the primary application.
+ *
+ * The application is primary by default if this API is not called.
+ *
+ * When a process transforms from a secondary to a primary role, all preceding
+ * flow rules which are queued by hardware will be effective immediately.
+ * Before role transition, all the rules set by the primary process should be
+ * flushed first.
+ *
+ * When role flag "RTE_ETH_PROCESS_NIC_DUP_WITH_SECONDARY" is set, NIC domain
+ * flow rules are effective immediately even if a process is secondary.
+ *
+ * @param active
+ *   Process primary (role) or not (secondary).
+ * @param flag
+ *   The role flag.
+ * @return
+ *   - (>=0) Number of rte devices which have been switched successfully.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_process_set_primary(bool primary, uint32_t flag);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 5d5e18db1e..251908b4e3 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6318,6 +6318,26 @@  rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes
 	return j;
 }
 
+int rte_eth_process_set_primary(bool primary, uint32_t flag)
+{
+	struct rte_eth_dev *dev;
+	uint16_t port_id;
+	int ret, val;
+
+	ret = 0;
+	RTE_ETH_FOREACH_DEV(port_id) {
+		dev = &rte_eth_devices[port_id];
+		if (*dev->dev_ops->eth_process_set_primary == NULL)
+			return -ENOTSUP;
+		val = (*dev->dev_ops->eth_process_set_primary)(dev, primary, flag);
+
+		if (val < 0)
+			return -EINVAL;
+		ret++;
+	}
+	return ret;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 17201fbe0f..1823869e73 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -298,6 +298,9 @@  EXPERIMENTAL {
 	rte_flow_get_q_aged_flows;
 	rte_mtr_meter_policy_get;
 	rte_mtr_meter_profile_get;
+
+	# added in 23.03
+	rte_eth_process_set_primary;
 };
 
 INTERNAL {