[v2,03/17] doc: add detail on using max SIMD bitwidth

Message ID 20200827161304.32300-4-ciara.power@intel.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series add max SIMD bitwidth to EAL

Checks

Context        Check    Description
ci/checkpatch  success  coding style OK

Commit Message

Ciara Power Aug. 27, 2020, 4:12 p.m. UTC
  This patch adds documentation on the usage of the max SIMD bitwidth EAL
setting, and how to use it to enable AVX-512 at runtime.

Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: John McNamara <john.mcnamara@intel.com>
Cc: Marko Kovacevic <marko.kovacevic@intel.com>

Signed-off-by: Ciara Power <ciara.power@intel.com>
---
 doc/guides/howto/avx512.rst                   | 36 +++++++++++++++++++
 doc/guides/linux_gsg/eal_args.include.rst     | 12 +++++++
 .../prog_guide/env_abstraction_layer.rst      | 31 ++++++++++++++++
 3 files changed, 79 insertions(+)
 create mode 100644 doc/guides/howto/avx512.rst
  

Comments

Ananyev, Konstantin Sept. 6, 2020, 10:20 p.m. UTC | #1
> This patch adds documentation on the usage of the max SIMD bitwidth EAL
> setting, and how to use it to enable AVX-512 at runtime.
> 
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: John McNamara <john.mcnamara@intel.com>
> Cc: Marko Kovacevic <marko.kovacevic@intel.com>
> 
> Signed-off-by: Ciara Power <ciara.power@intel.com>
> ---
>  doc/guides/howto/avx512.rst                   | 36 +++++++++++++++++++
>  doc/guides/linux_gsg/eal_args.include.rst     | 12 +++++++
>  .../prog_guide/env_abstraction_layer.rst      | 31 ++++++++++++++++
>  3 files changed, 79 insertions(+)
>  create mode 100644 doc/guides/howto/avx512.rst
> 
> diff --git a/doc/guides/howto/avx512.rst b/doc/guides/howto/avx512.rst
> new file mode 100644
> index 0000000000..ebae0f2b4f
> --- /dev/null
> +++ b/doc/guides/howto/avx512.rst
> @@ -0,0 +1,36 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2020 Intel Corporation.
> +
> +
> +Using AVX-512 with DPDK
> +=======================
> +
> +AVX-512 is not used by default in DPDK, but it can be selected at runtime by apps through the use of EAL API,
> +and by the user with a commandline argument. DPDK has a setting for max SIMD bitwidth,
> +which can be modified and will then limit the vector path taken by the code.

It's a good idea to have such an ability,
though just one global variable for all DPDK libs/drivers
seems a bit coarse to me.
Let's say we have 2 libs: libA and libB.
Both have an RTE_MAX_512_SIMD-specific code path,
though libA would cause a frequency level change, while libB wouldn't.
So the user (to avoid the frequency level change) would have to block
512_SIMD for both libs.
I think it would be much better to follow the strategy we use for log-level:
there is a global simd_width, but each DPDK entity (lib/driver) also has
its own simd_width that overrules the global one (more fine-grained control).

> +
> +
> +Using the API in apps
> +---------------------
> +
> +Apps can request DPDK uses AVX-512 at runtime, if it provides improved application performance.
> +This can be done by modifying the EAL setting for max SIMD bitwidth to 512, as by default it is 256,
> +which does not allow for AVX-512.
> +
> +.. code-block:: c
> +
> +   rte_set_max_simd_bitwidth(RTE_MAX_512_SIMD);
> +
> +This API should only be called once at initialization, before EAL init.

If the only possible usage scenario for that function is at init time, before EAL init,
then do we really need it at all,
given that we have the cmd-line flag anyway?
The user can achieve a similar goal with just: rte_eal_init(,..."--force-max-simd-bitwidth=..."...);

> +For more information on the possible enum values to use as a parameter, see :ref:`max_simd_bitwidth`.
> +
> +
> +Using the command-line argument
> +---------------------------------------------
> +
> +The user can select to use AVX-512 at runtime, using the following argument to set the max bitwidth::
> +
> +   ./app/dpdk-testpmd --force-max-simd-bitwidth=512
> +
> +This will override any further changes to the max SIMD bitwidth in DPDK,
> +which is useful for testing purposes.
> diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
> index 0fe4457968..bab3e14e47 100644
> --- a/doc/guides/linux_gsg/eal_args.include.rst
> +++ b/doc/guides/linux_gsg/eal_args.include.rst
> @@ -210,3 +210,15 @@ Other options
>  *    ``--no-telemetry``:
> 
>      Disable telemetry.
> +
> +*    ``--force-max-simd-bitwidth=<val>``:
> +
> +    Specify the maximum SIMD bitwidth size to handle. This limits which vector paths,
> +    if any, are taken, as any paths taken must use a bitwidth at or below the max bitwidth limit.
> +    For example, to allow all SIMD bitwidths up to and including AVX-512::
> +
> +        --force-max-simd-bitwidth=512
> +
> +    The following example shows limiting the bitwidth to 64-bits to disable all vector code::
> +
> +        --force-max-simd-bitwidth=64
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index f64ae953d1..74f26ed6c9 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -486,6 +486,37 @@ the desired addressing mode when virtual devices that are not directly attached
>  To facilitate forcing the IOVA mode to a specific value the EAL command line option ``--iova-mode`` can
>  be used to select either physical addressing('pa') or virtual addressing('va').
> 
> +.. _max_simd_bitwidth:
> +
> +
> +Max SIMD bitwidth
> +~~~~~~~~~~~~~~~~~
> +
> +The EAL provides a single setting to limit the max SIMD bitwidth used by DPDK,
> +which is used in determining the vector path, if any, chosen by a component.
> +The value can be set at runtime by an application using the 'rte_set_max_simd_bitwidth(uint16_t bitwidth)' function,
> +which should only be called once at initialization, before EAL init.
> +The value can be overridden by the user using the EAL command-line option '--force-max-simd-bitwidth'.
> +
> +When choosing a vector path, along with checking the CPU feature support,
> +the value of the max SIMD bitwidth must also be checked, and can be retrieved using the 'rte_get_max_simd_bitwidth()' function.
> +The value should be compared against the enum values for accepted max SIMD bitwidths:
> +
> +.. code-block:: c
> +
> +   enum rte_max_simd_t {
> +       RTE_NO_SIMD = 64,
> +       RTE_MAX_128_SIMD = 128,
> +       RTE_MAX_256_SIMD = 256,
> +       RTE_MAX_512_SIMD = 512
> +   };
> +
> +   if (rte_get_max_simd_bitwidth() >= RTE_MAX_512_SIMD)
> +       /* Take AVX-512 vector path */
> +   else if (rte_get_max_simd_bitwidth() >= RTE_MAX_256_SIMD)
> +       /* Take AVX2 vector path */
> +
> +
>  Memory Segments and Memory Zones (memzone)
>  ------------------------------------------
> 
> --
> 2.17.1
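
The selection logic described in the programmer's guide text quoted above (check CPU feature support, then compare the max SIMD bitwidth against the enum values) can be sketched roughly as below. This is only an illustration: the enum values are copied from the patch, while `example_path` and `example_select_path()` are hypothetical names. The sketch takes the bitwidth and CPU capabilities as plain parameters instead of calling rte_get_max_simd_bitwidth() or the real CPU feature checks, so that it compiles without DPDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the enum shown in the patch. */
enum rte_max_simd_t {
	RTE_NO_SIMD = 64,
	RTE_MAX_128_SIMD = 128,
	RTE_MAX_256_SIMD = 256,
	RTE_MAX_512_SIMD = 512
};

/* Hypothetical: the code paths a component might offer. */
enum example_path {
	EXAMPLE_PATH_SCALAR,
	EXAMPLE_PATH_AVX2,
	EXAMPLE_PATH_AVX512
};

/*
 * Hypothetical helper: pick the widest path that the CPU supports and
 * that the configured limit allows. In a real component, max_bitwidth
 * would come from rte_get_max_simd_bitwidth() and the two flags from
 * the CPU feature checks.
 */
static enum example_path
example_select_path(uint16_t max_bitwidth, bool cpu_has_avx2,
		bool cpu_has_avx512)
{
	if (cpu_has_avx512 && max_bitwidth >= RTE_MAX_512_SIMD)
		return EXAMPLE_PATH_AVX512;
	if (cpu_has_avx2 && max_bitwidth >= RTE_MAX_256_SIMD)
		return EXAMPLE_PATH_AVX2;
	/* Neither vector path is allowed: fall back to scalar code. */
	return EXAMPLE_PATH_SCALAR;
}
```

With the default limit of 256 stated in the patch, a CPU that supports AVX-512 would still end up on the AVX2 path in this sketch; only raising the limit to 512 selects the AVX-512 path.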
  
Bruce Richardson Sept. 7, 2020, 8:44 a.m. UTC | #2
On Sun, Sep 06, 2020 at 10:20:30PM +0000, Ananyev, Konstantin wrote:
> > This patch adds documentation on the usage of the max SIMD bitwidth EAL
> > setting, and how to use it to enable AVX-512 at runtime.
> > 
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: John McNamara <john.mcnamara@intel.com>
> > Cc: Marko Kovacevic <marko.kovacevic@intel.com>
> > 
> > Signed-off-by: Ciara Power <ciara.power@intel.com>
> > ---
> >  doc/guides/howto/avx512.rst                   | 36 +++++++++++++++++++
> >  doc/guides/linux_gsg/eal_args.include.rst     | 12 +++++++
> >  .../prog_guide/env_abstraction_layer.rst      | 31 ++++++++++++++++
> >  3 files changed, 79 insertions(+)
> >  create mode 100644 doc/guides/howto/avx512.rst
> > 
> > diff --git a/doc/guides/howto/avx512.rst b/doc/guides/howto/avx512.rst
> > new file mode 100644
> > index 0000000000..ebae0f2b4f
> > --- /dev/null
> > +++ b/doc/guides/howto/avx512.rst
> > @@ -0,0 +1,36 @@
> > +..  SPDX-License-Identifier: BSD-3-Clause
> > +    Copyright(c) 2020 Intel Corporation.
> > +
> > +
> > +Using AVX-512 with DPDK
> > +=======================
> > +
> > +AVX-512 is not used by default in DPDK, but it can be selected at runtime by apps through the use of EAL API,
> > +and by the user with a commandline argument. DPDK has a setting for max SIMD bitwidth,
> > +which can be modified and will then limit the vector path taken by the code.
> 
> It's a good idea to have such an ability,
> though just one global variable for all DPDK libs/drivers
> seems a bit coarse to me.
> Let's say we have 2 libs: libA and libB.
> Both have an RTE_MAX_512_SIMD-specific code path,
> though libA would cause a frequency level change, while libB wouldn't.
> So the user (to avoid the frequency level change) would have to block
> 512_SIMD for both libs.
> I think it would be much better to follow the strategy we use for log-level:
> there is a global simd_width, but each DPDK entity (lib/driver) also has
> its own simd_width that overrules the global one (more fine-grained control).

That for me is a nightmare scenario. How is the user meant to know what
libs could cause him a frequency change or not, or is he meant to determine that
empirically by trial and error on each platform? This scenario is
completely unlike logging in that it's non-obvious to the user, and so
needs to be kept as consumable as possible to the app-developer and the
user. Unless we find a concrete scenario where having a single switch is
causing real user problems, I'd much rather keep things simple. See also
answer below, where I point out that the main target of this is developers,
who can use this flag to indicate what vector bitwidth their app uses, and
then allow DPDK to match that.

> 
> > +
> > +
> > +Using the API in apps
> > +---------------------
> > +
> > +Apps can request DPDK uses AVX-512 at runtime, if it provides improved application performance.
> > +This can be done by modifying the EAL setting for max SIMD bitwidth to 512, as by default it is 256,
> > +which does not allow for AVX-512.
> > +
> > +.. code-block:: c
> > +
> > +   rte_set_max_simd_bitwidth(RTE_MAX_512_SIMD);
> > +
> > +This API should only be called once at initialization, before EAL init.
> 
> If the only possible usage scenario for that function is at init time, before EAL init,
> then do we really need it at all,
> given that we have the cmd-line flag anyway?
> The user can achieve a similar goal with just: rte_eal_init(,..."--force-max-simd-bitwidth=..."...);

Ideally, the user should never know or care about the cmdline flag; it's
only for testing. The main criterion for allowing DPDK to use longer
instruction sets is whether the application itself will similarly use them,
and that's something for the programmer to do. Having the programmer muck
about with cmdline arguments is less than ideal, so a proper API is
warranted here. The reason for the note about EAL init is that we don't
want libraries to have to check the max bitwidth each time an API is
called, so we want to have a way to prevent people changing things at
runtime. This therefore seemed simplest.

/Bruce
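
One way to read the "set once, before EAL init" rule discussed above is as a setter that refuses changes once initialization has completed, so that libraries can read the value once and cache it. The sketch below illustrates that idea only. All `example_*` names are hypothetical, and both the `-EPERM` return and the enforcement itself are assumptions; the patch only says the function "should" be called before EAL init.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* The default of 256 is the one stated in the patch documentation. */
static uint16_t example_max_bitwidth = 256;
static bool example_init_done;

/* Hypothetical setter: accepted only before initialization completes. */
static int
example_set_max_simd_bitwidth(uint16_t bitwidth)
{
	if (example_init_done)
		return -EPERM; /* assumption: late changes are rejected */
	example_max_bitwidth = bitwidth;
	return 0;
}

static uint16_t
example_get_max_simd_bitwidth(void)
{
	return example_max_bitwidth;
}

/*
 * Stand-in for EAL init: from here on the value is frozen, so a
 * component can read it once when choosing its code path and never
 * has to re-check it on the data path.
 */
static void
example_init(void)
{
	example_init_done = true;
}
```

An application would call the setter first and the init function second; any later call to the setter fails and leaves the stored value unchanged.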
  
Ananyev, Konstantin Sept. 7, 2020, 12:01 p.m. UTC | #3
> On Sun, Sep 06, 2020 at 10:20:30PM +0000, Ananyev, Konstantin wrote:
> > > This patch adds documentation on the usage of the max SIMD bitwidth EAL
> > > setting, and how to use it to enable AVX-512 at runtime.
> > >
> > > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > > Cc: John McNamara <john.mcnamara@intel.com>
> > > Cc: Marko Kovacevic <marko.kovacevic@intel.com>
> > >
> > > Signed-off-by: Ciara Power <ciara.power@intel.com>
> > > ---
> > >  doc/guides/howto/avx512.rst                   | 36 +++++++++++++++++++
> > >  doc/guides/linux_gsg/eal_args.include.rst     | 12 +++++++
> > >  .../prog_guide/env_abstraction_layer.rst      | 31 ++++++++++++++++
> > >  3 files changed, 79 insertions(+)
> > >  create mode 100644 doc/guides/howto/avx512.rst
> > >
> > > diff --git a/doc/guides/howto/avx512.rst b/doc/guides/howto/avx512.rst
> > > new file mode 100644
> > > index 0000000000..ebae0f2b4f
> > > --- /dev/null
> > > +++ b/doc/guides/howto/avx512.rst
> > > @@ -0,0 +1,36 @@
> > > +..  SPDX-License-Identifier: BSD-3-Clause
> > > +    Copyright(c) 2020 Intel Corporation.
> > > +
> > > +
> > > +Using AVX-512 with DPDK
> > > +=======================
> > > +
> > > +AVX-512 is not used by default in DPDK, but it can be selected at runtime by apps through the use of EAL API,
> > > +and by the user with a commandline argument. DPDK has a setting for max SIMD bitwidth,
> > > +which can be modified and will then limit the vector path taken by the code.
> >
> > It's a good idea to have such an ability,
> > though just one global variable for all DPDK libs/drivers
> > seems a bit coarse to me.
> > Let's say we have 2 libs: libA and libB.
> > Both have an RTE_MAX_512_SIMD-specific code path,
> > though libA would cause a frequency level change, while libB wouldn't.
> > So the user (to avoid the frequency level change) would have to block
> > 512_SIMD for both libs.
> > I think it would be much better to follow the strategy we use for log-level:
> > there is a global simd_width, but each DPDK entity (lib/driver) also has
> > its own simd_width that overrules the global one (more fine-grained control).
> 
> That for me is a nightmare scenario. How is the user meant to know what
> libs could cause him a frequency change or not, or is he meant to determine that
> empirically by trial and error on each platform? 

I suppose yes.
Let's say the user can try to run the app with global
--force-max-simd-bitwidth=256 and --force-max-simd-bitwidth=512
and check the difference.
If he is happy with the performance he gets, he can stick with one of the global values (256/512).
If not, he can try further by choosing a different max-simd-width for different components.

> This scenario is
> completely unlike logging in that it's non-obvious to the user, and so
> needs to be kept as consumable as possible to the app-developer and the
> user.

This feature is totally optional; if the user feels he doesn't need to care about it,
he can simply ignore it and use the default values.
Though for those who do care, one global value seems too restrictive.

> Unless we find a concrete scenario where having a single switch is
> causing real user problems, I'd much rather keep things simple.

As an example, I ran several perf tests with the acl avx512 code path and
so far didn't see any switches to CORE_POWER.LVL2_TURBO_LICENSE
(heavy AVX512 instructions).
I presume there might be other light-weight avx512 code paths (lpm, etc.).
Though for crypto cpu PMDs (aesni-mb, etc.) I think it would cause a switch
to LVL2.

> See also answer below, where I point out that the main target of this is developers,
> who can use this flag to indicate what vector bitwidth their app uses,
> and then allow DPDK to match that.

But in the majority of cases the developer doesn't know for sure what platform his app will run on
(except in the quite rare situation when the app is developed for one particular platform).
Again, for complex/multi-purpose applications (like VPP, DPDK-OVS) the developer can't even
always predict which modules will be used and which won't.
Also, the app can be configured so that different modules run on different cores
(say, the module that does ACL lookup on core X, the module that does the actual crypto on core Y).
All that depends on the particular deployment scenario.
So in many cases only the end user has all the information to decide what max-simd width will be optimal.

> 
> >
> > > +
> > > +
> > > +Using the API in apps
> > > +---------------------
> > > +
> > > +Apps can request DPDK uses AVX-512 at runtime, if it provides improved application performance.
> > > +This can be done by modifying the EAL setting for max SIMD bitwidth to 512, as by default it is 256,
> > > +which does not allow for AVX-512.
> > > +
> > > +.. code-block:: c
> > > +
> > > +   rte_set_max_simd_bitwidth(RTE_MAX_512_SIMD);
> > > +
> > > +This API should only be called once at initialization, before EAL init.
> >
> > If the only possible usage scenario for that function is at init time, before EAL init,
> > then do we really need it at all,
> > given that we have the cmd-line flag anyway?
> > The user can achieve a similar goal with just: rte_eal_init(,..."--force-max-simd-bitwidth=..."...);
> 
> Ideally, the user should never know or care about the cmdline flag, it's
> only for testing. The main criterion for allowing DPDK to use longer
> instruction sets is whether the application itself will similarly use them,
> and that's something for the programmer to do.

Unfortunately, I don't think the programmer has all the information to make such decisions either.
A lot depends on deployment scenarios, see above.
 
> Having the programmer muck
> about with cmdline arguments is less than ideal, so a proper API is
> warranted here.

Agree, a function call is more convenient for the developer.

> The reason for the note about EAL init is that we don't
> want libraries to have to check the max bitwidth each time an API is
> called, so we want to have a way to prevent people changing things at
> runtime. This therefore seemed simplest.

I understand that, but for that purpose just the cmd-line flag is enough,
that's why I asked whether we need an API call at all.
It seems a bit strange to me to introduce an API that is supposed to be called
only *before* eal_init(), but on the other hand I don't see much harm from it either.
So if you and the other guys still prefer to keep it - ok by me.
Konstantin
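
For readers following the thread, the log-level-style scheme proposed above (a global max SIMD width that a per-component value can overrule) might look roughly like the sketch below. Nothing here is part of the patch: every `example_*` name, the component names, and the table-lookup design are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-component override entry. */
struct example_simd_override {
	const char *component;
	uint16_t max_bitwidth;
};

/* Hypothetical global limit, as if the user had asked for 512. */
static uint16_t example_global_max = 512;

/*
 * Hypothetical override table: cap the crypto component at 256 bits
 * (for instance to avoid a frequency level change) while every other
 * component keeps following the global value.
 */
static const struct example_simd_override example_overrides[] = {
	{ "crypto", 256 },
};

/* Return the component's own limit if one is set, else the global one. */
static uint16_t
example_component_max(const char *component)
{
	size_t n = sizeof(example_overrides) / sizeof(example_overrides[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(example_overrides[i].component, component) == 0)
			return example_overrides[i].max_bitwidth;
	}
	return example_global_max;
}
```

In this sketch a lookup for "crypto" yields 256, while "acl" or "lpm" fall through to the global 512. It also makes the cost raised in the thread concrete: someone has to know which components belong in the table.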
  

Patch

diff --git a/doc/guides/howto/avx512.rst b/doc/guides/howto/avx512.rst
new file mode 100644
index 0000000000..ebae0f2b4f
--- /dev/null
+++ b/doc/guides/howto/avx512.rst
@@ -0,0 +1,36 @@ 
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Intel Corporation.
+
+
+Using AVX-512 with DPDK
+=======================
+
+AVX-512 is not used by default in DPDK, but it can be selected at runtime by apps through the use of EAL API,
+and by the user with a commandline argument. DPDK has a setting for max SIMD bitwidth,
+which can be modified and will then limit the vector path taken by the code.
+
+
+Using the API in apps
+---------------------
+
+Apps can request DPDK uses AVX-512 at runtime, if it provides improved application performance.
+This can be done by modifying the EAL setting for max SIMD bitwidth to 512, as by default it is 256,
+which does not allow for AVX-512.
+
+.. code-block:: c
+
+   rte_set_max_simd_bitwidth(RTE_MAX_512_SIMD);
+
+This API should only be called once at initialization, before EAL init.
+For more information on the possible enum values to use as a parameter, see :ref:`max_simd_bitwidth`.
+
+
+Using the command-line argument
+---------------------------------------------
+
+The user can select to use AVX-512 at runtime, using the following argument to set the max bitwidth::
+
+   ./app/dpdk-testpmd --force-max-simd-bitwidth=512
+
+This will override any further changes to the max SIMD bitwidth in DPDK,
+which is useful for testing purposes.
diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
index 0fe4457968..bab3e14e47 100644
--- a/doc/guides/linux_gsg/eal_args.include.rst
+++ b/doc/guides/linux_gsg/eal_args.include.rst
@@ -210,3 +210,15 @@  Other options
 *    ``--no-telemetry``:
 
     Disable telemetry.
+
+*    ``--force-max-simd-bitwidth=<val>``:
+
+    Specify the maximum SIMD bitwidth size to handle. This limits which vector paths,
+    if any, are taken, as any paths taken must use a bitwidth at or below the max bitwidth limit.
+    For example, to allow all SIMD bitwidths up to and including AVX-512::
+
+        --force-max-simd-bitwidth=512
+
+    The following example shows limiting the bitwidth to 64-bits to disable all vector code::
+
+        --force-max-simd-bitwidth=64
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index f64ae953d1..74f26ed6c9 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -486,6 +486,37 @@  the desired addressing mode when virtual devices that are not directly attached
 To facilitate forcing the IOVA mode to a specific value the EAL command line option ``--iova-mode`` can
 be used to select either physical addressing('pa') or virtual addressing('va').
 
+.. _max_simd_bitwidth:
+
+
+Max SIMD bitwidth
+~~~~~~~~~~~~~~~~~
+
+The EAL provides a single setting to limit the max SIMD bitwidth used by DPDK,
+which is used in determining the vector path, if any, chosen by a component.
+The value can be set at runtime by an application using the 'rte_set_max_simd_bitwidth(uint16_t bitwidth)' function,
+which should only be called once at initialization, before EAL init.
+The value can be overridden by the user using the EAL command-line option '--force-max-simd-bitwidth'.
+
+When choosing a vector path, along with checking the CPU feature support,
+the value of the max SIMD bitwidth must also be checked, and can be retrieved using the 'rte_get_max_simd_bitwidth()' function.
+The value should be compared against the enum values for accepted max SIMD bitwidths:
+
+.. code-block:: c
+
+   enum rte_max_simd_t {
+       RTE_NO_SIMD = 64,
+       RTE_MAX_128_SIMD = 128,
+       RTE_MAX_256_SIMD = 256,
+       RTE_MAX_512_SIMD = 512
+   };
+
+   if (rte_get_max_simd_bitwidth() >= RTE_MAX_512_SIMD)
+       /* Take AVX-512 vector path */
+   else if (rte_get_max_simd_bitwidth() >= RTE_MAX_256_SIMD)
+       /* Take AVX2 vector path */
+
+
 Memory Segments and Memory Zones (memzone)
 ------------------------------------------