[v7] eal: add manual probing option
diff mbox series

Message ID 3825f1afeebd62b3e50535574ddedadb31435616.1579772895.git.grive@u256.net
State New
Delegated to: Thomas Monjalon
Headers show
Series
  • [v7] eal: add manual probing option
Related show

Checks

Context Check Description
ci/travis-robot success Travis build: passed
ci/Intel-compilation success Compilation OK
ci/iol-nxp-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/iol-intel-Performance fail Performance Testing issues
ci/checkpatch success coding style OK

Commit Message

Gaëtan Rivet Jan. 23, 2020, 9:58 a.m. UTC
Add a new EAL option enabling manual probing in the EAL.
This command line option will configure the EAL so that buses
will not trigger their probe step on their own.

Applications are then expected to hotplug devices as they see fit.

Devices declared on the command line by the user (using -w and --vdev),
will be probed using the hotplug API, in the order they are declared.

This has the effect of offering a way for users to control probe order
of their devices, for drivers requiring it.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Acked-by : Vamsi Attunuru <vattunuru@marvell.com>
Tested-by: Vamsi Attunuru <vattunuru@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
---

 haven't heard many opinions on the matter, please shout if you see an issue
with this approach.

@Slava: I have tested rather quickly that it does not break anything,
        and that it works as intended for basic cases.
        Can you test it further for your use-case and tell me if it works fine?

Beyond the obvious difference between both probe mode, something to keep in mind:
while using -w on invalid devices would not block (PCI) bus probing, it will stop manual
probing in its track. All devices need to exist and be valid device IDs.

v2: fixed a few typos, map file (and used Travis to validate).

    Slava, are you able to test this patch?

v3: properly fixed the map file (inherited 19.08 instead of 19.05).

    Added a function to set the probe manual from the application,
    without having the user do it from the command line.

    Stopped spamming Slava about it, Vamsi was actually the one interested in it!

Standing issue worth chiming in:

  Currently manual-probe will cut off probing from all buses.
  It could be interesting to be able to only cut buses supporting hotplug,
  given that they are the one able to probe devices afterward.

  No real use-case for this right now, so leaving as-is. Might be worth
  considering in the future.

v4: Rebased on master,
    Moved implementation in common EAL,
    Used public API within the EAL to set the option,
    Made the API experimental

v5: added note in the Getting Started Guide.

v6: Rebased on master

    see http://mails.dpdk.org/archives/dev/2020-January/154178.html
    for reference to this version, linking v7 to v5 thread.

v7: Updated author and SoB.

 doc/guides/linux_gsg/eal_args.include.rst  | 13 ++++++
 doc/guides/rel_notes/release_20_02.rst     |  9 ++++
 lib/librte_eal/common/eal_common_bus.c     |  6 +++
 lib/librte_eal/common/eal_common_dev.c     | 54 ++++++++++++++++++++++
 lib/librte_eal/common/eal_common_options.c |  8 ++++
 lib/librte_eal/common/eal_internal_cfg.h   |  1 +
 lib/librte_eal/common/eal_options.h        |  2 +
 lib/librte_eal/common/eal_private.h        |  9 ++++
 lib/librte_eal/common/include/rte_eal.h    | 36 +++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  4 ++
 10 files changed, 142 insertions(+)

Comments

Pavan Nikhilesh Bhagavatula Feb. 3, 2020, 5:16 a.m. UTC | #1
@David Marchand @thomas@monjalon.net

Ping?

Are there any more changes required for this patch? It's been in queue since last October.

>-----Original Message-----
>From: dev <dev-bounces@dpdk.org> On Behalf Of Gaetan Rivet
>Sent: Thursday, January 23, 2020 3:28 PM
>To: dev@dpdk.org
>Cc: Vamsi Krishna Attunuru <vattunuru@marvell.com>; Jerin Jacob
>Kollanukkaran <jerinj@marvell.com>
>Subject: [dpdk-dev] [PATCH v7] eal: add manual probing option
>
>Add a new EAL option enabling manual probing in the EAL.
>This command line option will configure the EAL so that buses
>will not trigger their probe step on their own.
>
>Applications are then expected to hotplug devices as they see fit.
>
>Devices declared on the command line by the user (using -w and --
>vdev),
>will be probed using the hotplug API, in the order they are declared.
>
>This has the effect of offering a way for users to control probe order
>of their devices, for drivers requiring it.
>
>Signed-off-by: Gaetan Rivet <grive@u256.net>
>Acked-by : Vamsi Attunuru <vattunuru@marvell.com>
>Tested-by: Vamsi Attunuru <vattunuru@marvell.com>
>Reviewed-by: Jerin Jacob <jerinj@marvell.com>
>---
>
> haven't heard many opinions on the matter, please shout if you see an
>issue
>with this approach.
>
>@Slava: I have tested rather quickly that it does not break anything,
>        and that it works as intended for basic cases.
>        Can you test it further for your use-case and tell me if it works fine?
>
>Beyond the obvious difference between both probe mode, something
>to keep in mind:
>while using -w on invalid devices would not block (PCI) bus probing, it
>will stop manual
>probing in its track. All devices need to exist and be valid device IDs.
>
>v2: fixed a few typos, map file (and used Travis to validate).
>
>    Slava, are you able to test this patch?
>
>v3: properly fixed the map file (inherited 19.08 instead of 19.05).
>
>    Added a function to set the probe manual from the application,
>    without having the user do it from the command line.
>
>    Stopped spamming Slava about it, Vamsi was actually the one
>interested in it!
>
>Standing issue worth chiming in:
>
>  Currently manual-probe will cut off probing from all buses.
>  It could be interesting to be able to only cut buses supporting hotplug,
>  given that they are the one able to probe devices afterward.
>
>  No real use-case for this right now, so leaving as-is. Might be worth
>  considering in the future.
>
>v4: Rebased on master,
>    Moved implementation in common EAL,
>    Used public API within the EAL to set the option,
>    Made the API experimental
>
>v5: added note in the Getting Started Guide.
>
>v6: Rebased on master
>
>    see https://urldefense.proofpoint.com/v2/url?u=http-
>3A__mails.dpdk.org_archives_dev_2020-
>2DJanuary_154178.html&d=DwIDAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=
>1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=BsQe7kO_de-
>Kb6YFvVZPpZgIhGnWWpW8Pou2KkH_Cjk&s=oFGrj6beplYlYJWodLDGY
>GCUExSbvb1iJEERsA18QnA&e=
>    for reference to this version, linking v7 to v5 thread.
>
>v7: Updated author and SoB.
>
> doc/guides/linux_gsg/eal_args.include.rst  | 13 ++++++
> doc/guides/rel_notes/release_20_02.rst     |  9 ++++
> lib/librte_eal/common/eal_common_bus.c     |  6 +++
> lib/librte_eal/common/eal_common_dev.c     | 54
>++++++++++++++++++++++
> lib/librte_eal/common/eal_common_options.c |  8 ++++
> lib/librte_eal/common/eal_internal_cfg.h   |  1 +
> lib/librte_eal/common/eal_options.h        |  2 +
> lib/librte_eal/common/eal_private.h        |  9 ++++
> lib/librte_eal/common/include/rte_eal.h    | 36 +++++++++++++++
> lib/librte_eal/rte_eal_version.map         |  4 ++
> 10 files changed, 142 insertions(+)
>
>diff --git a/doc/guides/linux_gsg/eal_args.include.rst
>b/doc/guides/linux_gsg/eal_args.include.rst
>index ed8b0e35b..d0717d4a0 100644
>--- a/doc/guides/linux_gsg/eal_args.include.rst
>+++ b/doc/guides/linux_gsg/eal_args.include.rst
>@@ -69,6 +69,19 @@ Device-related options
>
>        --vdev 'net_pcap0,rx_pcap=input.pcap,tx_pcap=output.pcap'
>
>+*   ``--manual-probe``
>+
>+    Switch the ``EAL`` probe mode to manual. The main bus probe step
>+    is disabled and applications are expected to manually probe
>+    devices using ``rte_dev_probe()``.
>+
>+    Devices declared on the command-line using ``-w`` and ``-vdev``
>+    are interpreted as hotplug commands. They are thus probed in the
>+    order they are declared.
>+
>+    This makes this option useful to enforce a specific device probe
>+    order, instead of relying on each bus scan implementation details.
>+
> *   ``-d <path to shared object or directory>``
>
>     Load external drivers. An argument can be a single shared object file,
>or a
>diff --git a/doc/guides/rel_notes/release_20_02.rst
>b/doc/guides/rel_notes/release_20_02.rst
>index 50e2c1484..f6b3b3def 100644
>--- a/doc/guides/rel_notes/release_20_02.rst
>+++ b/doc/guides/rel_notes/release_20_02.rst
>@@ -56,6 +56,15 @@ New Features
>      Also, make sure to start the actual text at the margin.
>
>======================================================
>===
>
>+* **EAL will now allow probing devices manually.**
>+
>+  Previously, a user could not force an order when probing declared
>devices.
>+  This could cause issues for drivers depending on another device being
>present.
>+  A new option ``--manual-probe`` is now available to do just that.
>+  This new option relies on the device bus supporting hotplug. It can
>+  also be used to disable automatic probing from the ``PCI`` bus without
>+  having to disable the whole bus.
>+
> * **Added Wait Until Equal API.**
>
>   A new API has been added to wait for a memory location to be
>updated with a
>diff --git a/lib/librte_eal/common/eal_common_bus.c
>b/lib/librte_eal/common/eal_common_bus.c
>index baa5b532a..145a96812 100644
>--- a/lib/librte_eal/common/eal_common_bus.c
>+++ b/lib/librte_eal/common/eal_common_bus.c
>@@ -6,6 +6,7 @@
> #include <string.h>
> #include <sys/queue.h>
>
>+#include <rte_eal.h>
> #include <rte_bus.h>
> #include <rte_debug.h>
> #include <rte_string_fns.h>
>@@ -63,6 +64,11 @@ rte_bus_probe(void)
> 	int ret;
> 	struct rte_bus *bus, *vbus = NULL;
>
>+	if (rte_eal_manual_probe()) {
>+		RTE_LOG(DEBUG, EAL, "Manual probing enabled.\n");
>+		return rte_dev_probe_devargs_list();
>+	}
>+
> 	TAILQ_FOREACH(bus, &rte_bus_list, next) {
> 		if (!strcmp(bus->name, "vdev")) {
> 			vbus = bus;
>diff --git a/lib/librte_eal/common/eal_common_dev.c
>b/lib/librte_eal/common/eal_common_dev.c
>index 9e4f09d83..368afa273 100644
>--- a/lib/librte_eal/common/eal_common_dev.c
>+++ b/lib/librte_eal/common/eal_common_dev.c
>@@ -21,6 +21,7 @@
> #include <rte_malloc.h>
> #include <rte_string_fns.h>
>
>+#include "eal_internal_cfg.h"
> #include "eal_private.h"
> #include "hotplug_mp.h"
>
>@@ -83,6 +84,59 @@ rte_dev_is_probed(const struct rte_device *dev)
> 	return dev->driver != NULL;
> }
>
>+int
>+rte_eal_manual_probe(void)
>+{
>+	return internal_config.manual_probe;
>+}
>+
>+void
>+rte_eal_manual_probe_set(int enabled)
>+{
>+	internal_config.manual_probe = !!enabled;
>+}
>+
>+int
>+rte_dev_probe_devargs_list(void)
>+{
>+	struct rte_device *dev;
>+	struct rte_devargs *da;
>+	int ret;
>+
>+	RTE_EAL_DEVARGS_FOREACH(NULL, da) {
>+		dev = da->bus->find_device(NULL, cmp_dev_name,
>da->name);
>+		if (dev == NULL) {
>+			RTE_LOG(ERR, EAL, "Unable to find device %s
>on bus %s\n",
>+				da->name, da->bus->name);
>+			continue;
>+		}
>+
>+		if (rte_dev_is_probed(dev))
>+			continue;
>+
>+		if (dev->bus->plug == NULL) {
>+			RTE_LOG(ERR, EAL, "Manual probing (hotplug)
>not supported by bus %s, "
>+					  "required by device %s\n",
>+				dev->bus->name, dev->name);
>+			continue;
>+		}
>+
>+		ret = dev->bus->plug(dev);
>+		/* Ignore positive return values, they are possibly
>+		 * triggered by blacklisted devices on the PCI bus.
>Probing
>+		 * should then continue.
>+		 */
>+		if (ret < 0) {
>+			RTE_LOG(ERR, EAL, "Driver cannot attach device
>%s\n",
>+				dev->name);
>+			/* Fail on first real probe error. */
>+			return ret;
>+		}
>+	}
>+
>+	return 0;
>+}
>+
> /* helper function to build devargs, caller should free the memory */
> static int
> build_devargs(const char *busname, const char *devname,
>diff --git a/lib/librte_eal/common/eal_common_options.c
>b/lib/librte_eal/common/eal_common_options.c
>index 5920233bc..f899eea4d 100644
>--- a/lib/librte_eal/common/eal_common_options.c
>+++ b/lib/librte_eal/common/eal_common_options.c
>@@ -82,6 +82,7 @@ eal_long_options[] = {
> 	{OPT_LEGACY_MEM,        0, NULL, OPT_LEGACY_MEM_NUM
>},
> 	{OPT_SINGLE_FILE_SEGMENTS, 0, NULL,
>OPT_SINGLE_FILE_SEGMENTS_NUM},
> 	{OPT_MATCH_ALLOCATIONS, 0, NULL,
>OPT_MATCH_ALLOCATIONS_NUM},
>+	{OPT_MANUAL_PROBE,      0, NULL,
>OPT_MANUAL_PROBE_NUM     },
> 	{0,                     0, NULL, 0                        }
> };
>
>@@ -1443,6 +1444,9 @@ eal_parse_common_option(int opt, const
>char *optarg,
> 			return -1;
> 		}
> 		break;
>+	case OPT_MANUAL_PROBE_NUM:
>+		rte_eal_manual_probe_set(1);
>+		break;
>
> 	/* don't know what to do, leave this to caller */
> 	default:
>@@ -1669,6 +1673,10 @@ eal_common_usage(void)
> 	       "  --"OPT_VDEV"              Add a virtual device.\n"
> 	       "                      The argument format is
><driver><id>[,key=val,...]\n"
> 	       "                      (ex: --vdev=net_pcap0,iface=eth2).\n"
>+	       "  --"OPT_MANUAL_PROBE"      Enable manual probing.\n"
>+	       "                      Disable probe step for all buses.\n"
>+	       "                      Devices will need to be probed using the hotplug
>API.\n"
>+	       "                      PCI and vdev declarations will be treated in
>order as hotplug commands.\n"
> 	       "  --"OPT_IOVA_MODE"   Set IOVA mode. 'pa' for
>IOVA_PA\n"
> 	       "                      'va' for IOVA_VA\n"
> 	       "  -d LIB.so|DIR       Add a driver or driver directory\n"
>diff --git a/lib/librte_eal/common/eal_internal_cfg.h
>b/lib/librte_eal/common/eal_internal_cfg.h
>index a42f34923..0006f903f 100644
>--- a/lib/librte_eal/common/eal_internal_cfg.h
>+++ b/lib/librte_eal/common/eal_internal_cfg.h
>@@ -44,6 +44,7 @@ struct internal_config {
> 	unsigned hugepage_unlink;         /**< true to unlink backing files
>*/
> 	volatile unsigned no_pci;         /**< true to disable PCI */
> 	volatile unsigned no_hpet;        /**< true to disable HPET */
>+	volatile unsigned manual_probe;   /**< true to enable manual
>device probing. */
> 	volatile unsigned vmware_tsc_map; /**< true to use VMware
>TSC mapping
>
>	* instead of native TSC */
> 	volatile unsigned no_shconf;      /**< true if there is no shared
>config */
>diff --git a/lib/librte_eal/common/eal_options.h
>b/lib/librte_eal/common/eal_options.h
>index 9855429e5..588fa32a6 100644
>--- a/lib/librte_eal/common/eal_options.h
>+++ b/lib/librte_eal/common/eal_options.h
>@@ -69,6 +69,8 @@ enum {
> 	OPT_IOVA_MODE_NUM,
> #define OPT_MATCH_ALLOCATIONS  "match-allocations"
> 	OPT_MATCH_ALLOCATIONS_NUM,
>+#define OPT_MANUAL_PROBE "manual-probe"
>+	OPT_MANUAL_PROBE_NUM,
> 	OPT_LONG_MAX_NUM
> };
>
>diff --git a/lib/librte_eal/common/eal_private.h
>b/lib/librte_eal/common/eal_private.h
>index ddcfbe2e4..680c7db88 100644
>--- a/lib/librte_eal/common/eal_private.h
>+++ b/lib/librte_eal/common/eal_private.h
>@@ -443,4 +443,13 @@ rte_option_usage(void);
> uint64_t
> eal_get_baseaddr(void);
>
>+/**
>+ * Go through the devargs list and probe everything in order.
>+ *
>+ * @return
>+ *   0 on success, negative on error.
>+ */
>+int
>+rte_dev_probe_devargs_list(void);
>+
> #endif /* _EAL_PRIVATE_H_ */
>diff --git a/lib/librte_eal/common/include/rte_eal.h
>b/lib/librte_eal/common/include/rte_eal.h
>index 2f9ed298d..7195f6859 100644
>--- a/lib/librte_eal/common/include/rte_eal.h
>+++ b/lib/librte_eal/common/include/rte_eal.h
>@@ -421,6 +421,42 @@ int rte_eal_has_hugepages(void);
>  */
> int rte_eal_has_pci(void);
>
>+/**
>+ * Whether EAL probe is manual.
>+ * Enabled by the --manual-probe option or by
>+ * using rte_eal_manual_probe_set().
>+ *
>+ * When manual probing is enabled, batched bus probe of
>+ * their devices is disabled. All devices need to be probed
>+ * using the proper rte_dev API.
>+ *
>+ * In this mode, devices declared on the command line will
>+ * be probed using the bus hotplug API. It is used to enforce
>+ * a specific probe order.
>+ *
>+ * @return
>+ *   Nonzero if manual device probing is enabled.
>+ *
>+ * @see rte_eal_manual_probe_set
>+ */
>+__rte_experimental
>+int rte_eal_manual_probe(void);
>+
>+/**
>+ * Configure EAL probe mode -- manual or automatic.
>+ *
>+ * Enable or disable manual probe mode in EAL.
>+ * This function can be called at any time, but must be used
>+ * before calling rte_eal_init() to have any effect.
>+ *
>+ * @param enabled
>+ *   zero to disable manual probe, non-zero to enable it.
>+ *
>+ * @see rte_eal_manual_probe
>+ */
>+__rte_experimental
>+void rte_eal_manual_probe_set(int enabled);
>+
> /**
>  * Whether the EAL was asked to create UIO device.
>  *
>diff --git a/lib/librte_eal/rte_eal_version.map
>b/lib/librte_eal/rte_eal_version.map
>index e38d02530..13d04a8bc 100644
>--- a/lib/librte_eal/rte_eal_version.map
>+++ b/lib/librte_eal/rte_eal_version.map
>@@ -332,4 +332,8 @@ EXPERIMENTAL {
> 	# added in 19.11
> 	rte_log_get_stream;
> 	rte_mcfg_get_single_file_segments;
>+
>+	# added in 20.02
>+	rte_eal_manual_probe;
>+	rte_eal_manual_probe_set;
> };
>--
>2.25.0
Thomas Monjalon Feb. 3, 2020, 10:21 p.m. UTC | #2
03/02/2020 06:16, Pavan Nikhilesh Bhagavatula:
> @David Marchand @thomas@monjalon.net
> 
> Ping?
> 
> Are there any more changes required for this patch? It's been in queue since last October.

Sorry we have not decided whether it is a good idea or not.

All changes related to probing are very sensitive,
and we know a big refactoring would be better than stacking
more and more options and corner cases.

As we are busy with ABI stability stuff, we did not allocate
enough time to properly think about this feature.
Please accept our apologies, and let's consider it as
a high priority for 20.05 cycle.
Gaëtan Rivet Feb. 4, 2020, 10:03 a.m. UTC | #3
On 03/02/2020 23:21, Thomas Monjalon wrote:
> 03/02/2020 06:16, Pavan Nikhilesh Bhagavatula:
>> @David Marchand @thomas@monjalon.net
>>
>> Ping?
>>
>> Are there any more changes required for this patch? It's been in queue since last October.
> 
> Sorry we have not decided whether it is a good idea or not.
> 
> All changes related to probing are very sensitive,
> and we know a big refactoring would be better than stacking
> more and more options and corner cases.
> 
> As we are busy with ABI stability stuff, we did not allocate
> enough time to properly think about this feature.
> Please accept our apologies, and let's consider it as
> a high priority for 20.05 cycle.
> 
> 
> 

Hello Thomas,

This is unfortunate. I pushed Pavan to accept an alternative implementation of this functionality that was less obtrusive, to make the integration smoother. I took care to alleviate those risks from the common path.

The big refactoring is needed yes, but considering the current path I'm not seeing it happen in 20.05. If that means taking this patch as-is in 20.05 for Marvell users, I'm not sure much is gained from waiting 3 months, except minimal risk avoidance.
Thomas Monjalon Feb. 4, 2020, 11:07 a.m. UTC | #4
04/02/2020 11:03, Gaetan Rivet:
> On 03/02/2020 23:21, Thomas Monjalon wrote:
> > 03/02/2020 06:16, Pavan Nikhilesh Bhagavatula:
> >> @David Marchand @thomas@monjalon.net
> >>
> >> Ping?
> >>
> >> Are there any more changes required for this patch? It's been in queue since last October.
> > 
> > Sorry we have not decided whether it is a good idea or not.
> > 
> > All changes related to probing are very sensitive,
> > and we know a big refactoring would be better than stacking
> > more and more options and corner cases.
> > 
> > As we are busy with ABI stability stuff, we did not allocate
> > enough time to properly think about this feature.
> > Please accept our apologies, and let's consider it as
> > a high priority for 20.05 cycle.
> > 
> 
> Hello Thomas,
> 
> This is unfortunate. I pushed Pavan to accept an alternative implementation of this functionality that was less obtrusive, to make the integration smoother. I took care to alleviate those risks from the common path.
> 
> The big refactoring is needed yes, but considering the current path I'm not seeing it happen in 20.05. If that means taking this patch as-is in 20.05 for Marvell users, I'm not sure much is gained from waiting 3 months, except minimal risk avoidance.


Yes, life is full of bad decisions and consequences.

I still think there is a risk in adding new user expectations,
and maintaining some code to workaround unknown issues.

The real question here is to know why this patch?
Is it to workaround a broken driver?
Or to workaround a broken design in EAL and bus drivers?
Gaëtan Rivet Feb. 4, 2020, 12:43 p.m. UTC | #5
On 04/02/2020 12:07, Thomas Monjalon wrote:
> 04/02/2020 11:03, Gaetan Rivet:
>> On 03/02/2020 23:21, Thomas Monjalon wrote:
>>> 03/02/2020 06:16, Pavan Nikhilesh Bhagavatula:
>>>> @David Marchand @thomas@monjalon.net
>>>>
>>>> Ping?
>>>>
>>>> Are there any more changes required for this patch? It's been in queue since last October.
>>>
>>> Sorry we have not decided whether it is a good idea or not.
>>>
>>> All changes related to probing are very sensitive,
>>> and we know a big refactoring would be better than stacking
>>> more and more options and corner cases.
>>>
>>> As we are busy with ABI stability stuff, we did not allocate
>>> enough time to properly think about this feature.
>>> Please accept our apologies, and let's consider it as
>>> a high priority for 20.05 cycle.
>>>
>>
>> Hello Thomas,
>>
>> This is unfortunate. I pushed Pavan to accept an alternative implementation of this functionality that was less obtrusive, to make the integration smoother. I took care to alleviate those risks from the common path.
>>
>> The big refactoring is needed yes, but considering the current path I'm not seeing it happen in 20.05. If that means taking this patch as-is in 20.05 for Marvell users, I'm not sure much is gained from waiting 3 months, except minimal risk avoidance.
> 
> 
> Yes, life is full of bad decisions and consequences.


Ah, yes, but I stand by my initial opinion, the first implementation [1] was riskier and less useful.

> 
> I still think there is a risk in adding new user expectations,
> and maintaining some code to workaround unknown issues.
> 
> The real question here is to know why this patch?
> Is it to workaround a broken driver?
> Or to workaround a broken design in EAL and bus drivers?
> 
> 

Two birds - one stone here: OVS needed a way to disable automatic probing cleanly (current workaround seen in multiple deployment is to add a dummy whitelisted device, which will be ignored by the PCI bus --> it sets the bus in whitelist mode but avoid probing anything), and as a bonus this option allows using devices that depends on other devices being probed already (LAG, representors, failsafe, etc).

I'm not sure having a dependent-probe by default is good, and that would be a big change.

If we are doing the genesis of this patch, the initial motivation should be asked for more details from Marvell people and David for the OVS side.

[1]: First proposal:
        http://mails.dpdk.org/archives/dev/2019-September/144166.html
      My arguments:
        http://mails.dpdk.org/archives/dev/2019-September/144564.html
Thomas Monjalon Feb. 4, 2020, 3:06 p.m. UTC | #6
04/02/2020 13:43, Gaetan Rivet:
> On 04/02/2020 12:07, Thomas Monjalon wrote:
> > 04/02/2020 11:03, Gaetan Rivet:
> >> On 03/02/2020 23:21, Thomas Monjalon wrote:
> >>> 03/02/2020 06:16, Pavan Nikhilesh Bhagavatula:
> >>>> @David Marchand @thomas@monjalon.net
> >>>>
> >>>> Ping?
> >>>>
> >>>> Are there any more changes required for this patch? It's been in queue since last October.
> >>>
> >>> Sorry we have not decided whether it is a good idea or not.
> >>>
> >>> All changes related to probing are very sensitive,
> >>> and we know a big refactoring would be better than stacking
> >>> more and more options and corner cases.
> >>>
> >>> As we are busy with ABI stability stuff, we did not allocate
> >>> enough time to properly think about this feature.
> >>> Please accept our apologies, and let's consider it as
> >>> a high priority for 20.05 cycle.
> >>>
> >>
> >> Hello Thomas,
> >>
> >> This is unfortunate. I pushed Pavan to accept an alternative implementation of this functionality that was less obtrusive, to make the integration smoother. I took care to alleviate those risks from the common path.
> >>
> >> The big refactoring is needed yes, but considering the current path I'm not seeing it happen in 20.05. If that means taking this patch as-is in 20.05 for Marvell users, I'm not sure much is gained from waiting 3 months, except minimal risk avoidance.
> > 
> > 
> > Yes, life is full of bad decisions and consequences.
> 
> 
> Ah, yes, but I stand by my initial opinion, the first implementation [1] was riskier and less useful.
> 
> > 
> > I still think there is a risk in adding new user expectations,
> > and maintaining some code to workaround unknown issues.
> > 
> > The real question here is to know why this patch?
> > Is it to workaround a broken driver?
> > Or to workaround a broken design in EAL and bus drivers?
> 
> Two birds - one stone here: OVS needed a way to disable automatic probing cleanly (current workaround seen in multiple deployment is to add a dummy whitelisted device, which will be ignored by the PCI bus --> it sets the bus in whitelist mode but avoid probing anything), and as a bonus this option allows using devices that depends on other devices being probed already (LAG, representors, failsafe, etc).
> 
> I'm not sure having a dependent-probe by default is good, and that would be a big change.
> 
> If we are doing the genesis of this patch, the initial motivation should be asked for more details from Marvell people and David for the OVS side.
> 
> [1]: First proposal:
>         http://mails.dpdk.org/archives/dev/2019-September/144166.html
>       My arguments:
>         http://mails.dpdk.org/archives/dev/2019-September/144564.html


OK so there are two needs:

1/ Better control whitelist/blacklist mode.
We already know that a rework is needed here.
Unfortunately neither you or me had time to work on it,
and others who were interested disappeared.

2/ Associate ports with equivalent properties in applications.
This must be done in applications.
Tweaking the probe order is a hack.
Gaëtan Rivet Feb. 4, 2020, 4:02 p.m. UTC | #7
On 04/02/2020 16:06, Thomas Monjalon wrote:
> 04/02/2020 13:43, Gaetan Rivet:
>> On 04/02/2020 12:07, Thomas Monjalon wrote:
>>> 04/02/2020 11:03, Gaetan Rivet:
>>>> On 03/02/2020 23:21, Thomas Monjalon wrote:
>>>>> 03/02/2020 06:16, Pavan Nikhilesh Bhagavatula:
>>>>>> @David Marchand @thomas@monjalon.net
>>>>>>
>>>>>> Ping?
>>>>>>
>>>>>> Are there any more changes required for this patch? It's been in queue since last October.
>>>>>
>>>>> Sorry we have not decided whether it is a good idea or not.
>>>>>
>>>>> All changes related to probing are very sensitive,
>>>>> and we know a big refactoring would be better than stacking
>>>>> more and more options and corner cases.
>>>>>
>>>>> As we are busy with ABI stability stuff, we did not allocate
>>>>> enough time to properly think about this feature.
>>>>> Please accept our apologies, and let's consider it as
>>>>> a high priority for 20.05 cycle.
>>>>>
>>>>
>>>> Hello Thomas,
>>>>
>>>> This is unfortunate. I pushed Pavan to accept an alternative implementation of this functionality that was less obtrusive, to make the integration smoother. I took care to alleviate those risks from the common path.
>>>>
>>>> The big refactoring is needed yes, but considering the current path I'm not seeing it happen in 20.05. If that means taking this patch as-is in 20.05 for Marvell users, I'm not sure much is gained from waiting 3 months, except minimal risk avoidance.
>>>
>>>
>>> Yes, life is full of bad decisions and consequences.
>>
>>
>> Ah, yes, but I stand by my initial opinion, the first implementation [1] was riskier and less useful.
>>
>>>
>>> I still think there is a risk in adding new user expectations,
>>> and maintaining some code to workaround unknown issues.
>>>
>>> The real question here is to know why this patch?
>>> Is it to workaround a broken driver?
>>> Or to workaround a broken design in EAL and bus drivers?
>>
>> Two birds - one stone here: OVS needed a way to disable automatic probing cleanly (current workaround seen in multiple deployment is to add a dummy whitelisted device, which will be ignored by the PCI bus --> it sets the bus in whitelist mode but avoid probing anything), and as a bonus this option allows using devices that depends on other devices being probed already (LAG, representors, failsafe, etc).
>>
>> I'm not sure having a dependent-probe by default is good, and that would be a big change.
>>
>> If we are doing the genesis of this patch, the initial motivation should be asked for more details from Marvell people and David for the OVS side.
>>
>> [1]: First proposal:
>>          http://mails.dpdk.org/archives/dev/2019-September/144166.html
>>        My arguments:
>>          http://mails.dpdk.org/archives/dev/2019-September/144564.html
> 
> 
> OK so there are two needs:
> 
> 1/ Better control whitelist/blacklist mode.
> We already know that a rework is needed here.
> Unfortunately neither you or me had time to work on it,
> and others who were interested disappeared.
> 
> 2/ Associate ports with equivalent properties in applications.
> This must be done in applications.
> Tweaking the probe order is a hack.
> 
> 

An application that want to tightly control the port init order, currently (by doing exactly like me here, hotpluging one by one the ports), would still need the bigger hack that consist in inserting a whitelist PCI devargs with a dummy address, depending on a undocumented PCI bus feature consisting in ignoring matching errors but keeping probing policy from failed devargs processing.

Instead, with this patch this app can do

   rte_eal_manual_probe_set(1);
   rte_eal_init();

to have the same behavior and be able to hotplug ports as it sees fit.

You are worried about creating user expectations about this behavior (being forced to replicate in some way the functionality during the rewrite, as I understand it?), but then you are currently forcing users to expect this workaround to exist in the PCI bus, blocking devs from touching it as it will thus break current app configurations. I've seen systemd unit file using this -w dummy flag, as well as the programmatic equivalent. Which is better, to have to rework it cutting short these configs, or to propose beforehand a deprecation path?.

This rework won't happen in 20.05, nor in the medium term unless you decide to drive this change. This workaround serves three needs (PCI normalization, port congruence and port dependency) in a low-risk implementation.
Jerin Jacob Feb. 10, 2020, 2:51 p.m. UTC | #8
On Tue, Feb 4, 2020 at 9:32 PM Gaetan Rivet <grive@u256.net> wrote:
>
> On 04/02/2020 16:06, Thomas Monjalon wrote:
> > 04/02/2020 13:43, Gaetan Rivet:
> >> On 04/02/2020 12:07, Thomas Monjalon wrote:
> >>> 04/02/2020 11:03, Gaetan Rivet:
> >>>> On 03/02/2020 23:21, Thomas Monjalon wrote:
> >>>>> 03/02/2020 06:16, Pavan Nikhilesh Bhagavatula:
> >>>>>> @David Marchand @thomas@monjalon.net
> >>>>>>
> >>>>>> Ping?
> >>>>>>
> >>>>>> Are there any more changes required for this patch? It's been in queue since last October.
> >>>>>
> >>>>> Sorry we have not decided whether it is a good idea or not.
> >>>>>
> >>>>> All changes related to probing are very sensitive,
> >>>>> and we know a big refactoring would be better than stacking
> >>>>> more and more options and corner cases.
> >>>>>
> >>>>> As we are busy with ABI stability stuff, we did not allocate
> >>>>> enough time to properly think about this feature.
> >>>>> Please accept our apologies, and let's consider it as
> >>>>> a high priority for 20.05 cycle.
> >>>>>
> >>>>
> >>>> Hello Thomas,
> >>>>
> >>>> This is unfortunate. I pushed Pavan to accept an alternative implementation of this functionality that was less obtrusive, to make the integration smoother. I took care to alleviate those risks from the common path.
> >>>>
> >>>> The big refactoring is needed yes, but considering the current path I'm not seeing it happen in 20.05. If that means taking this patch as-is in 20.05 for Marvell users, I'm not sure much is gained from waiting 3 months, except minimal risk avoidance.
> >>>
> >>>
> >>> Yes, life is full of bad decisions and consequences.
> >>
> >>
> >> Ah, yes, but I stand by my initial opinion, the first implementation [1] was riskier and less useful.
> >>
> >>>
> >>> I still think there is a risk in adding new user expectations,
> >>> and maintaining some code to workaround unknown issues.
> >>>
> >>> The real question here is to know why this patch?
> >>> Is it to workaround a broken driver?
> >>> Or to workaround a broken design in EAL and bus drivers?
> >>
> >> Two birds - one stone here: OVS needed a way to disable automatic probing cleanly (current workaround seen in multiple deployment is to add a dummy whitelisted device, which will be ignored by the PCI bus --> it sets the bus in whitelist mode but avoid probing anything), and as a bonus this option allows using devices that depends on other devices being probed already (LAG, representors, failsafe, etc).
> >>
> >> I'm not sure having a dependent-probe by default is good, and that would be a big change.
> >>
> >> If we are doing the genesis of this patch, the initial motivation should be asked for more details from Marvell people and David for the OVS side.
> >>
> >> [1]: First proposal:
> >>          http://mails.dpdk.org/archives/dev/2019-September/144166.html
> >>        My arguments:
> >>          http://mails.dpdk.org/archives/dev/2019-September/144564.html
> >
> >
> > OK so there are two needs:
> >
> > 1/ Better control whitelist/blacklist mode.
> > We already know that a rework is needed here.
> > Unfortunately neither you or me had time to work on it,
> > and others who were interested disappeared.
> >
> > 2/ Associate ports with equivalent properties in applications.
> > This must be done in applications.
> > Tweaking the probe order is a hack.
> >
> >
>
> An application that want to tightly control the port init order, currently (by doing exactly like me here, hotpluging one by one the ports), would still need the bigger hack that consist in inserting a whitelist PCI devargs with a dummy address, depending on a undocumented PCI bus feature consisting in ignoring matching errors but keeping probing policy from failed devargs processing.
>
> Instead, with this patch this app can do
>
>    rte_eal_manual_probe_set(1);
>    rte_eal_init();
>
> to have the same behavior and be able to hotplug ports as it sees fit.
>
> You are worried about creating user expectations about this behavior (being forced to replicate in some way the functionality during the rewrite, as I understand it?), but then you are currently forcing users to expect this workaround to exist in the PCI bus, blocking devs from touching it as it will thus break current app configurations. I've seen systemd unit file using this -w dummy flag, as well as the programmatic equivalent. Which is better, to have to rework it cutting short these configs, or to propose beforehand a deprecation path?.
>
> This rework won't happen in 20.05, nor in the medium term unless you decide to drive this change. This workaround serves three needs (PCI normalization, port congruence and port dependency) in a low-risk implementation.

Thomas,

What would be the resolution of this? What is your recommendation to
fix the issue as you have the concern of this patch?

Issue:
1) When l2fwd does the forwarding for simplicity and performance
reason it just xor the port to find the destination port to forward.
2) If the adjacent ports are not symmetrical(example: one is 40G and
other 10G) then forwarding will drop the packets.

So, either
a) We need to control the probing order

b) Or Application need
1) To track the symmetrical ports and maintain the forwarding table  OR
2) Have the command-line option to specify destination port like l3fwd.

We can fix it in the application, but do we need to complicate l2fwd?
I am fine with that, if that is consensus.

Thoughts? If you think, there is a rework needed in eal then could you
enumerate the items for the rework.
Thomas Monjalon Feb. 10, 2020, 3:27 p.m. UTC | #9
10/02/2020 15:51, Jerin Jacob:
> On Tue, Feb 4, 2020 at 9:32 PM Gaetan Rivet <grive@u256.net> wrote:
> >
> > On 04/02/2020 16:06, Thomas Monjalon wrote:
> > > 04/02/2020 13:43, Gaetan Rivet:
> > >> On 04/02/2020 12:07, Thomas Monjalon wrote:
> > >>> 04/02/2020 11:03, Gaetan Rivet:
> > >>>> On 03/02/2020 23:21, Thomas Monjalon wrote:
> > >>>>> 03/02/2020 06:16, Pavan Nikhilesh Bhagavatula:
> > >>>>>> @David Marchand @thomas@monjalon.net
> > >>>>>>
> > >>>>>> Ping?
> > >>>>>>
> > >>>>>> Are there any more changes required for this patch? It's been in queue since last October.
> > >>>>>
> > >>>>> Sorry we have not decided whether it is a good idea or not.
> > >>>>>
> > >>>>> All changes related to probing are very sensitive,
> > >>>>> and we know a big refactoring would be better than stacking
> > >>>>> more and more options and corner cases.
> > >>>>>
> > >>>>> As we are busy with ABI stability stuff, we did not allocate
> > >>>>> enough time to properly think about this feature.
> > >>>>> Please accept our apologies, and let's consider it as
> > >>>>> a high priority for 20.05 cycle.
> > >>>>>
> > >>>>
> > >>>> Hello Thomas,
> > >>>>
> > >>>> This is unfortunate. I pushed Pavan to accept an alternative implementation of this functionality that was less obtrusive, to make the integration smoother. I took care to alleviate those risks from the common path.
> > >>>>
> > >>>> The big refactoring is needed yes, but considering the current path I'm not seeing it happen in 20.05. If that means taking this patch as-is in 20.05 for Marvell users, I'm not sure much is gained from waiting 3 months, except minimal risk avoidance.
> > >>>
> > >>>
> > >>> Yes, life is full of bad decisions and consequences.
> > >>
> > >>
> > >> Ah, yes, but I stand by my initial opinion, the first implementation [1] was riskier and less useful.
> > >>
> > >>>
> > >>> I still think there is a risk in adding new user expectations,
> > >>> and maintaining some code to workaround unknown issues.
> > >>>
> > >>> The real question here is to know why this patch?
> > >>> Is it to workaround a broken driver?
> > >>> Or to workaround a broken design in EAL and bus drivers?
> > >>
> > >> Two birds - one stone here: OVS needed a way to disable automatic probing cleanly (current workaround seen in multiple deployment is to add a dummy whitelisted device, which will be ignored by the PCI bus --> it sets the bus in whitelist mode but avoid probing anything), and as a bonus this option allows using devices that depends on other devices being probed already (LAG, representors, failsafe, etc).
> > >>
> > >> I'm not sure having a dependent-probe by default is good, and that would be a big change.
> > >>
> > >> If we are doing the genesis of this patch, the initial motivation should be asked for more details from Marvell people and David for the OVS side.
> > >>
> > >> [1]: First proposal:
> > >>          http://mails.dpdk.org/archives/dev/2019-September/144166.html
> > >>        My arguments:
> > >>          http://mails.dpdk.org/archives/dev/2019-September/144564.html
> > >
> > >
> > > OK so there are two needs:
> > >
> > > 1/ Better control whitelist/blacklist mode.
> > > We already know that a rework is needed here.
> > > Unfortunately neither you or me had time to work on it,
> > > and others who were interested disappeared.
> > >
> > > 2/ Associate ports with equivalent properties in applications.
> > > This must be done in applications.
> > > Tweaking the probe order is a hack.
> > >
> > >
> >
> > An application that want to tightly control the port init order, currently (by doing exactly like me here, hotpluging one by one the ports), would still need the bigger hack that consist in inserting a whitelist PCI devargs with a dummy address, depending on a undocumented PCI bus feature consisting in ignoring matching errors but keeping probing policy from failed devargs processing.
> >
> > Instead, with this patch this app can do
> >
> >    rte_eal_manual_probe_set(1);
> >    rte_eal_init();
> >
> > to have the same behavior and be able to hotplug ports as it sees fit.
> >
> > You are worried about creating user expectations about this behavior (being forced to replicate in some way the functionality during the rewrite, as I understand it?), but then you are currently forcing users to expect this workaround to exist in the PCI bus, blocking devs from touching it as it will thus break current app configurations. I've seen systemd unit file using this -w dummy flag, as well as the programmatic equivalent. Which is better, to have to rework it cutting short these configs, or to propose beforehand a deprecation path?.
> >
> > This rework won't happen in 20.05, nor in the medium term unless you decide to drive this change. This workaround serves three needs (PCI normalization, port congruence and port dependency) in a low-risk implementation.
> 
> Thomas,
> 
> What would be the resolution of this? What is your recommendation to
> fix the issue as you have the concern of this patch?
> 
> Issue:
> 1) When l2fwd does the forwarding for simplicity and performance
> reason it just xor the port to find the destination port to forward.
> 2) If the adjacent ports are not symmetrical(example: one is 40G and
> other 10G) then forwarding will drop the packets.
> 
> So, either
> a) We need to control the probing order
> 
> b) Or Application need
> 1) To track the symmetrical ports and maintain the forwarding table  OR
> 2) Have the command-line option to specify destination port like l3fwd.
> 
> We can fix it in the application, but do we need to complicate l2fwd?
> I am fine with that, if that is consensus.

You are describing an application issue,
that's why I believe it should be fixed in the application.

Should we have a command line option to configure the forwarding rules
in the application (2)? I think yes.
Should we implement an application logic to automatically create
the best forwarding rules (1)? It would be nice, but anyway,
I think we need manual config (2) as a fallback.


> Thoughts? If you think, there is a rework needed in eal then could you
> enumerate the items for the rework.

Sorry I don't have time to describe dive into EAL probing and
enumerate the items to rework.
The most important issues I remind are:
	- white/blacklist policy is a mess and should be done in a higher layer
	- devargs syntax should allow generic matching (thanks to class awareness and generic syntax)

Starting from these 2 items, we could imagine a generic path to
disable automatic probing, but I think the l2fwd logic should not
rely on probing order anyway.
Jerin Jacob Feb. 10, 2020, 4:33 p.m. UTC | #10
On Mon, Feb 10, 2020 at 8:57 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 10/02/2020 15:51, Jerin Jacob:
> > On Tue, Feb 4, 2020 at 9:32 PM Gaetan Rivet <grive@u256.net> wrote:
> > >
> > > On 04/02/2020 16:06, Thomas Monjalon wrote:
> > > > 04/02/2020 13:43, Gaetan Rivet:
> > > >> On 04/02/2020 12:07, Thomas Monjalon wrote:
> > > >>> 04/02/2020 11:03, Gaetan Rivet:
> > > >>>> On 03/02/2020 23:21, Thomas Monjalon wrote:
> > > >>>>> 03/02/2020 06:16, Pavan Nikhilesh Bhagavatula:
> > > >>>>>> @David Marchand @thomas@monjalon.net
> > > >>>>>>
> > > >>>>>> Ping?
> > > >>>>>>
> > > >>>>>> Are there any more changes required for this patch? It's been in queue since last October.
> > > >>>>>
> > > >>>>> Sorry we have not decided whether it is a good idea or not.
> > > >>>>>
> > > >>>>> All changes related to probing are very sensitive,
> > > >>>>> and we know a big refactoring would be better than stacking
> > > >>>>> more and more options and corner cases.
> > > >>>>>
> > > >>>>> As we are busy with ABI stability stuff, we did not allocate
> > > >>>>> enough time to properly think about this feature.
> > > >>>>> Please accept our apologies, and let's consider it as
> > > >>>>> a high priority for 20.05 cycle.
> > > >>>>>
> > > >>>>
> > > >>>> Hello Thomas,
> > > >>>>
> > > >>>> This is unfortunate. I pushed Pavan to accept an alternative implementation of this functionality that was less obtrusive, to make the integration smoother. I took care to alleviate those risks from the common path.
> > > >>>>
> > > >>>> The big refactoring is needed yes, but considering the current path I'm not seeing it happen in 20.05. If that means taking this patch as-is in 20.05 for Marvell users, I'm not sure much is gained from waiting 3 months, except minimal risk avoidance.
> > > >>>
> > > >>>
> > > >>> Yes, life is full of bad decisions and consequences.
> > > >>
> > > >>
> > > >> Ah, yes, but I stand by my initial opinion, the first implementation [1] was riskier and less useful.
> > > >>
> > > >>>
> > > >>> I still think there is a risk in adding new user expectations,
> > > >>> and maintaining some code to workaround unknown issues.
> > > >>>
> > > >>> The real question here is to know why this patch?
> > > >>> Is it to workaround a broken driver?
> > > >>> Or to workaround a broken design in EAL and bus drivers?
> > > >>
> > > >> Two birds - one stone here: OVS needed a way to disable automatic probing cleanly (current workaround seen in multiple deployment is to add a dummy whitelisted device, which will be ignored by the PCI bus --> it sets the bus in whitelist mode but avoid probing anything), and as a bonus this option allows using devices that depends on other devices being probed already (LAG, representors, failsafe, etc).
> > > >>
> > > >> I'm not sure having a dependent-probe by default is good, and that would be a big change.
> > > >>
> > > >> If we are doing the genesis of this patch, the initial motivation should be asked for more details from Marvell people and David for the OVS side.
> > > >>
> > > >> [1]: First proposal:
> > > >>          http://mails.dpdk.org/archives/dev/2019-September/144166.html
> > > >>        My arguments:
> > > >>          http://mails.dpdk.org/archives/dev/2019-September/144564.html
> > > >
> > > >
> > > > OK so there are two needs:
> > > >
> > > > 1/ Better control whitelist/blacklist mode.
> > > > We already know that a rework is needed here.
> > > > Unfortunately neither you or me had time to work on it,
> > > > and others who were interested disappeared.
> > > >
> > > > 2/ Associate ports with equivalent properties in applications.
> > > > This must be done in applications.
> > > > Tweaking the probe order is a hack.
> > > >
> > > >
> > >
> > > An application that want to tightly control the port init order, currently (by doing exactly like me here, hotpluging one by one the ports), would still need the bigger hack that consist in inserting a whitelist PCI devargs with a dummy address, depending on a undocumented PCI bus feature consisting in ignoring matching errors but keeping probing policy from failed devargs processing.
> > >
> > > Instead, with this patch this app can do
> > >
> > >    rte_eal_manual_probe_set(1);
> > >    rte_eal_init();
> > >
> > > to have the same behavior and be able to hotplug ports as it sees fit.
> > >
> > > You are worried about creating user expectations about this behavior (being forced to replicate in some way the functionality during the rewrite, as I understand it?), but then you are currently forcing users to expect this workaround to exist in the PCI bus, blocking devs from touching it as it will thus break current app configurations. I've seen systemd unit file using this -w dummy flag, as well as the programmatic equivalent. Which is better, to have to rework it cutting short these configs, or to propose beforehand a deprecation path?.
> > >
> > > This rework won't happen in 20.05, nor in the medium term unless you decide to drive this change. This workaround serves three needs (PCI normalization, port congruence and port dependency) in a low-risk implementation.
> >
> > Thomas,
> >
> > What would be the resolution of this? What is your recommendation to
> > fix the issue as you have the concern of this patch?
> >
> > Issue:
> > 1) When l2fwd does the forwarding for simplicity and performance
> > reason it just xor the port to find the destination port to forward.
> > 2) If the adjacent ports are not symmetrical(example: one is 40G and
> > other 10G) then forwarding will drop the packets.
> >
> > So, either
> > a) We need to control the probing order
> >
> > b) Or Application need
> > 1) To track the symmetrical ports and maintain the forwarding table  OR
> > 2) Have the command-line option to specify destination port like l3fwd.
> >
> > We can fix it in the application, but do we need to complicate l2fwd?
> > I am fine with that, if that is consensus.
>
> You are describing an application issue,
> that's why I believe it should be fixed in the application.

Thanks for the quick reply and I agree.


>
> Should we have a command line option to configure the forwarding rules
> in the application (2)? I think yes.
> Should we implement an application logic to automatically create
> the best forwarding rules (1)? It would be nice, but anyway,
> I think we need manual config (2) as a fallback.
>
>
> > Thoughts? If you think, there is a rework needed in eal then could you
> > enumerate the items for the rework.
>
> Sorry I don't have time to describe dive into EAL probing and
> enumerate the items to rework.
> The most important issues I remind are:
>         - white/blacklist policy is a mess and should be done in a higher layer
>         - devargs syntax should allow generic matching (thanks to class awareness and generic syntax)
>
> Starting from these 2 items, we could imagine a generic path to
> disable automatic probing, but I think the l2fwd logic should not
> rely on probing order anyway.

+ Bruce as l2fwd maintainer.

Since in any case, l2fwd needs to be updated, We will focus on l2fwd
change for v20.05 and leaving the fate of this patch to EAL
maintainers.
Let us know, Are we are OK with below change in l2fwd as
- Introduce an array-based port lookup table instead of hardcoding to
xor based lookup.
- if no argument specified fill dest port as xor of source
- If argument is specified override the lookup table with a
user-specified destination port.



>
>
Jerin Kollanukkaran April 4, 2020, 4:34 p.m. UTC | #11
> >
> > Should we have a command line option to configure the forwarding rules
> > in the application (2)? I think yes.
> > Should we implement an application logic to automatically create the
> > best forwarding rules (1)? It would be nice, but anyway, I think we
> > need manual config (2) as a fallback.
> >
> >
> > > Thoughts? If you think, there is a rework needed in eal then could
> > > you enumerate the items for the rework.
> >
> > Sorry I don't have time to describe dive into EAL probing and
> > enumerate the items to rework.
> > The most important issues I remind are:
> >         - white/blacklist policy is a mess and should be done in a higher layer
> >         - devargs syntax should allow generic matching (thanks to
> > class awareness and generic syntax)
> >
> > Starting from these 2 items, we could imagine a generic path to
> > disable automatic probing, but I think the l2fwd logic should not rely
> > on probing order anyway.
> 
> + Bruce as l2fwd maintainer.
> 
> Since in any case, l2fwd needs to be updated, We will focus on l2fwd change
> for v20.05 and leaving the fate of this patch to EAL maintainers.
> Let us know, Are we are OK with below change in l2fwd as
> - Introduce an array-based port lookup table instead of hardcoding to xor based
> lookup.
> - if no argument specified fill dest port as xor of source
> - If argument is specified override the lookup table with a user-specified
> destination port.

I think, this thread can be closed here.

L2fwd change sent as different patch.

Please review if it is for interest. 

http://patches.dpdk.org/patch/67722/


> 
> 
> 
> >
> >

Patch
diff mbox series

diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
index ed8b0e35b..d0717d4a0 100644
--- a/doc/guides/linux_gsg/eal_args.include.rst
+++ b/doc/guides/linux_gsg/eal_args.include.rst
@@ -69,6 +69,19 @@  Device-related options
 
        --vdev 'net_pcap0,rx_pcap=input.pcap,tx_pcap=output.pcap'
 
+*   ``--manual-probe``
+
+    Switch the ``EAL`` probe mode to manual. The main bus probe step
+    is disabled and applications are expected to manually probe
+    devices using ``rte_dev_probe()``.
+
+    Devices declared on the command-line using ``-w`` and ``-vdev``
+    are interpreted as hotplug commands. They are thus probed in the
+    order they are declared.
+
+    This makes this option useful to enforce a specific device probe
+    order, instead of relying on each bus scan implementation details.
+
 *   ``-d <path to shared object or directory>``
 
     Load external drivers. An argument can be a single shared object file, or a
diff --git a/doc/guides/rel_notes/release_20_02.rst b/doc/guides/rel_notes/release_20_02.rst
index 50e2c1484..f6b3b3def 100644
--- a/doc/guides/rel_notes/release_20_02.rst
+++ b/doc/guides/rel_notes/release_20_02.rst
@@ -56,6 +56,15 @@  New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **EAL will now allow probing devices manually.**
+
+  Previously, a user could not force an order when probing declared devices.
+  This could cause issues for drivers depending on another device being present.
+  A new option ``--manual-probe`` is now available to do just that.
+  This new option relies on the device bus supporting hotplug. It can
+  also be used to disable automatic probing from the ``PCI`` bus without
+  having to disable the whole bus.
+
 * **Added Wait Until Equal API.**
 
   A new API has been added to wait for a memory location to be updated with a
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index baa5b532a..145a96812 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -6,6 +6,7 @@ 
 #include <string.h>
 #include <sys/queue.h>
 
+#include <rte_eal.h>
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
@@ -63,6 +64,11 @@  rte_bus_probe(void)
 	int ret;
 	struct rte_bus *bus, *vbus = NULL;
 
+	if (rte_eal_manual_probe()) {
+		RTE_LOG(DEBUG, EAL, "Manual probing enabled.\n");
+		return rte_dev_probe_devargs_list();
+	}
+
 	TAILQ_FOREACH(bus, &rte_bus_list, next) {
 		if (!strcmp(bus->name, "vdev")) {
 			vbus = bus;
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 9e4f09d83..368afa273 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -21,6 +21,7 @@ 
 #include <rte_malloc.h>
 #include <rte_string_fns.h>
 
+#include "eal_internal_cfg.h"
 #include "eal_private.h"
 #include "hotplug_mp.h"
 
@@ -83,6 +84,59 @@  rte_dev_is_probed(const struct rte_device *dev)
 	return dev->driver != NULL;
 }
 
+int
+rte_eal_manual_probe(void)
+{
+	return internal_config.manual_probe;
+}
+
+void
+rte_eal_manual_probe_set(int enabled)
+{
+	internal_config.manual_probe = !!enabled;
+}
+
+int
+rte_dev_probe_devargs_list(void)
+{
+	struct rte_device *dev;
+	struct rte_devargs *da;
+	int ret;
+
+	RTE_EAL_DEVARGS_FOREACH(NULL, da) {
+		dev = da->bus->find_device(NULL, cmp_dev_name, da->name);
+		if (dev == NULL) {
+			RTE_LOG(ERR, EAL, "Unable to find device %s on bus %s\n",
+				da->name, da->bus->name);
+			continue;
+		}
+
+		if (rte_dev_is_probed(dev))
+			continue;
+
+		if (dev->bus->plug == NULL) {
+			RTE_LOG(ERR, EAL, "Manual probing (hotplug) not supported by bus %s, "
+					  "required by device %s\n",
+				dev->bus->name, dev->name);
+			continue;
+		}
+
+		ret = dev->bus->plug(dev);
+		/* Ignore positive return values, they are possibly
+		 * triggered by blacklisted devices on the PCI bus. Probing
+		 * should then continue.
+		 */
+		if (ret < 0) {
+			RTE_LOG(ERR, EAL, "Driver cannot attach device %s\n",
+				dev->name);
+			/* Fail on first real probe error. */
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
 /* helper function to build devargs, caller should free the memory */
 static int
 build_devargs(const char *busname, const char *devname,
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 5920233bc..f899eea4d 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -82,6 +82,7 @@  eal_long_options[] = {
 	{OPT_LEGACY_MEM,        0, NULL, OPT_LEGACY_MEM_NUM       },
 	{OPT_SINGLE_FILE_SEGMENTS, 0, NULL, OPT_SINGLE_FILE_SEGMENTS_NUM},
 	{OPT_MATCH_ALLOCATIONS, 0, NULL, OPT_MATCH_ALLOCATIONS_NUM},
+	{OPT_MANUAL_PROBE,      0, NULL, OPT_MANUAL_PROBE_NUM     },
 	{0,                     0, NULL, 0                        }
 };
 
@@ -1443,6 +1444,9 @@  eal_parse_common_option(int opt, const char *optarg,
 			return -1;
 		}
 		break;
+	case OPT_MANUAL_PROBE_NUM:
+		rte_eal_manual_probe_set(1);
+		break;
 
 	/* don't know what to do, leave this to caller */
 	default:
@@ -1669,6 +1673,10 @@  eal_common_usage(void)
 	       "  --"OPT_VDEV"              Add a virtual device.\n"
 	       "                      The argument format is <driver><id>[,key=val,...]\n"
 	       "                      (ex: --vdev=net_pcap0,iface=eth2).\n"
+	       "  --"OPT_MANUAL_PROBE"      Enable manual probing.\n"
+	       "                      Disable probe step for all buses.\n"
+	       "                      Devices will need to be probed using the hotplug API.\n"
+	       "                      PCI and vdev declarations will be treated in order as hotplug commands.\n"
 	       "  --"OPT_IOVA_MODE"   Set IOVA mode. 'pa' for IOVA_PA\n"
 	       "                      'va' for IOVA_VA\n"
 	       "  -d LIB.so|DIR       Add a driver or driver directory\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index a42f34923..0006f903f 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -44,6 +44,7 @@  struct internal_config {
 	unsigned hugepage_unlink;         /**< true to unlink backing files */
 	volatile unsigned no_pci;         /**< true to disable PCI */
 	volatile unsigned no_hpet;        /**< true to disable HPET */
+	volatile unsigned manual_probe;   /**< true to enable manual device probing. */
 	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
 										* instead of native TSC */
 	volatile unsigned no_shconf;      /**< true if there is no shared config */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 9855429e5..588fa32a6 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -69,6 +69,8 @@  enum {
 	OPT_IOVA_MODE_NUM,
 #define OPT_MATCH_ALLOCATIONS  "match-allocations"
 	OPT_MATCH_ALLOCATIONS_NUM,
+#define OPT_MANUAL_PROBE "manual-probe"
+	OPT_MANUAL_PROBE_NUM,
 	OPT_LONG_MAX_NUM
 };
 
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index ddcfbe2e4..680c7db88 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -443,4 +443,13 @@  rte_option_usage(void);
 uint64_t
 eal_get_baseaddr(void);
 
+/**
+ * Go through the devargs list and probe everything in order.
+ *
+ * @return
+ *   0 on success, negative on error.
+ */
+int
+rte_dev_probe_devargs_list(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 2f9ed298d..7195f6859 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -421,6 +421,42 @@  int rte_eal_has_hugepages(void);
  */
 int rte_eal_has_pci(void);
 
+/**
+ * Whether EAL probe is manual.
+ * Enabled by the --manual-probe option or by
+ * using rte_eal_manual_probe_set().
+ *
+ * When manual probing is enabled, batched bus probe of
+ * their devices is disabled. All devices need to be probed
+ * using the proper rte_dev API.
+ *
+ * In this mode, devices declared on the command line will
+ * be probed using the bus hotplug API. It is used to enforce
+ * a specific probe order.
+ *
+ * @return
+ *   Nonzero if manual device probing is enabled.
+ *
+ * @see rte_eal_manual_probe_set
+ */
+__rte_experimental
+int rte_eal_manual_probe(void);
+
+/**
+ * Configure EAL probe mode -- manual or automatic.
+ *
+ * Enable or disable manual probe mode in EAL.
+ * This function can be called at any time, but must be used
+ * before calling rte_eal_init() to have any effect.
+ *
+ * @param enabled
+ *   zero to disable manual probe, non-zero to enable it.
+ *
+ * @see rte_eal_manual_probe
+ */
+__rte_experimental
+void rte_eal_manual_probe_set(int enabled);
+
 /**
  * Whether the EAL was asked to create UIO device.
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index e38d02530..13d04a8bc 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -332,4 +332,8 @@  EXPERIMENTAL {
 	# added in 19.11
 	rte_log_get_stream;
 	rte_mcfg_get_single_file_segments;
+
+	# added in 20.02
+	rte_eal_manual_probe;
+	rte_eal_manual_probe_set;
 };