[v3,17/18] net: add checks for max SIMD bitwidth

Message ID: 20200930130415.11211-18-ciara.power@intel.com (mailing list archive)
State: Superseded, archived
Delegated to: David Marchand
Series: add max SIMD bitwidth to EAL

Checks

Context         Check    Description
ci/checkpatch   success  coding style OK

Commit Message

Power, Ciara Sept. 30, 2020, 1:04 p.m. UTC
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path.

The vector path was initially chosen in RTE_INIT, however this is no
longer suitable as we cannot check the max SIMD bitwidth at that time.
The default chosen in RTE_INIT is now scalar. For best performance
and to use vector paths, apps must explicitly call the set algorithm
function before using other functions from this library, as this is
where vector handlers are now chosen.

Suggested-by: Jasvinder Singh <jasvinder.singh@intel.com>

Signed-off-by: Ciara Power <ciara.power@intel.com>

---
v3:
  - Moved choosing vector paths out of RTE_INIT.
  - Moved checking max_simd_bitwidth into the set_alg function.
---
 lib/librte_net/rte_net_crc.c | 26 +++++++++++++++++---------
 lib/librte_net/rte_net_crc.h |  3 ++-
 2 files changed, 19 insertions(+), 10 deletions(-)
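
The behaviour described in the commit message can be illustrated with a small, self-contained sketch. It does not use DPDK itself: `crc_set_alg()`, `eal_max_simd_bitwidth`, `active_alg` and the enum names are hypothetical stand-ins for `rte_net_crc_set_alg()`, the EAL max SIMD bitwidth setting and the library's handler table, and the real function caches the bitwidth rather than reading it on each call.

```c
#include <stdint.h>

/* Hypothetical stand-ins; none of these names are DPDK API. */
enum crc_alg { CRC_SCALAR, CRC_SSE42 };
enum { SIMD_NONE = 64, SIMD_128 = 128 };

/* Assumed to be filled in while EAL parses its options. */
static uint16_t eal_max_simd_bitwidth = SIMD_128;

/* Constructor-time default under this patch: scalar. */
static enum crc_alg active_alg = CRC_SCALAR;

/*
 * Models the patched set_alg logic: a vector request is honoured only
 * when the configured bitwidth allows it, otherwise scalar is selected.
 */
static void crc_set_alg(enum crc_alg alg)
{
	switch (alg) {
	case CRC_SSE42:
		if (eal_max_simd_bitwidth >= SIMD_128) {
			active_alg = CRC_SSE42;
			return;
		}
		/* fall-through */
	case CRC_SCALAR:
	default:
		active_alg = CRC_SCALAR;
		break;
	}
}
```

In a real application the equivalent sequence would presumably be rte_eal_init() followed by rte_net_crc_set_alg(RTE_NET_CRC_SSE42); without that second call the library stays on the scalar path under this version of the patch.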
  

Comments

Coyle, David Sept. 30, 2020, 3:03 p.m. UTC | #1
Hi Ciara,

> From: dev <dev-bounces@dpdk.org> On Behalf Of Ciara Power
> When choosing a vector path to take, an extra condition must be satisfied to
> ensure the max SIMD bitwidth allows for the CPU enabled path.
> 
> The vector path was initially chosen in RTE_INIT, however this is no longer
> suitable as we cannot check the max SIMD bitwidth at that time.
> The default chosen in RTE_INIT is now scalar. For best performance and to
> use vector paths, apps must explicitly call the set algorithm function before
> using other functions from this library, as this is where vector handlers are
> now chosen.

[DC] Has it been decided that it is ok to now require applications to pick the
CRC algorithm they want to use?

An application which previously automatically got SSE4.2 CRC, for example, will
now automatically only get scalar.

If this is ok, this should probably be called out explicitly in release notes as it may
not be immediately noticeable to users that they now need to select the CRC algo.

Actually, in general, the release notes need to be updated for this patchset.

> 
> Suggested-by: Jasvinder Singh <jasvinder.singh@intel.com>
> 
> Signed-off-by: Ciara Power <ciara.power@intel.com>
> 
> ---
> v3:
>   - Moved choosing vector paths out of RTE_INIT.
>   - Moved checking max_simd_bitwidth into the set_alg function.
> ---
>  lib/librte_net/rte_net_crc.c | 26 +++++++++++++++++---------
>  lib/librte_net/rte_net_crc.h |  3 ++-
>  2 files changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> index 9fd4794a9d..241eb16399 100644
> --- a/lib/librte_net/rte_net_crc.c
> +++ b/lib/librte_net/rte_net_crc.c

<snip>

> @@ -145,18 +149,26 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
>  void
>  rte_net_crc_set_alg(enum rte_net_crc_alg alg)
>  {
> +	if (max_simd_bitwidth == 0)
> +		max_simd_bitwidth = rte_get_max_simd_bitwidth();
> +
>  	switch (alg) {
>  #ifdef X86_64_SSE42_PCLMULQDQ
>  	case RTE_NET_CRC_SSE42:
> -		handlers = handlers_sse42;
> -		break;
> +		if (max_simd_bitwidth >= RTE_MAX_128_SIMD) {
> +			handlers = handlers_sse42;
> +			return;
> +		}
> +		RTE_LOG(INFO, NET, "Max SIMD Bitwidth too low, using scalar\n");

[DC] Not sure if you're aware, but there is another patchset which adds an AVX512 CRC
implementation and run-time checking of cpuflags to select the CRC path to use:
https://patchwork.dpdk.org/project/dpdk/list/?series=12596

There will be a task to merge these 2 patchsets if both are merged. It looks fairly
straightforward to me to merge these, but it would be good if you take a look too.
  
Jasvinder Singh Sept. 30, 2020, 3:49 p.m. UTC | #2
> -----Original Message-----
> From: Coyle, David <david.coyle@intel.com>
> Sent: Wednesday, September 30, 2020 4:04 PM
> To: Power, Ciara <ciara.power@intel.com>; dev@dpdk.org
> Cc: Power, Ciara <ciara.power@intel.com>; Singh, Jasvinder
> <jasvinder.singh@intel.com>; Olivier Matz <olivier.matz@6wind.com>;
> O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>; Ryan, Brendan
> <brendan.ryan@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v3 17/18] net: add checks for max SIMD
> bitwidth
> 
> Hi Ciara,
> 
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Ciara Power When
> > choosing a vector path to take, an extra condition must be satisfied
> > to ensure the max SIMD bitwidth allows for the CPU enabled path.
> >
> > The vector path was initially chosen in RTE_INIT, however this is no
> > longer suitable as we cannot check the max SIMD bitwidth at that time.
> > The default chosen in RTE_INIT is now scalar. For best performance and
> > to use vector paths, apps must explicitly call the set algorithm
> > function before using other functions from this library, as this is
> > where vector handlers are now chosen.
> 
> [DC] Has it been decided that it is ok to now require applications to pick the
> CRC algorithm they want to use?
> 
> An application which previously automatically got SSE4.2 CRC, for example,
> will now automatically only get scalar.
> 
> If this is ok, this should probably be called out explicitly in release notes as it
> may not be Immediately noticeable to users that they now need to select the
> CRC algo.
> 
> Actually, in general, the release notes need to be updated for this patchset.

The decision to move rte_net_crc_set_alg() out of RTE_INIT was taken to avoid
checking max_simd_bitwidth in the data path every time the rte_net_crc_calc()
API is invoked. Based on my understanding, max_simd_bitwidth is set after EAL
init, and if it were checked in rte_net_crc_calc(), it might override the
default CRC algo set during RTE_INIT. Therefore, to avoid the extra check on
max_simd_bitwidth in the data path, the better option is to apply this static
configuration once, after EAL init, in the rte_net_crc_set_alg() API.

 
  
Coyle, David Oct. 1, 2020, 2:16 p.m. UTC | #3
Hi Jasvinder/Ciara

> -----Original Message-----
> From: Singh, Jasvinder <jasvinder.singh@intel.com>
> >
> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Ciara Power When
> > > choosing a vector path to take, an extra condition must be satisfied
> > > to ensure the max SIMD bitwidth allows for the CPU enabled path.
> > >
> > > The vector path was initially chosen in RTE_INIT, however this is no
> > > longer suitable as we cannot check the max SIMD bitwidth at that time.
> > > The default chosen in RTE_INIT is now scalar. For best performance
> > > and to use vector paths, apps must explicitly call the set algorithm
> > > function before using other functions from this library, as this is
> > > where vector handlers are now chosen.
> >
> > [DC] Has it been decided that it is ok to now require applications to
> > pick the CRC algorithm they want to use?
> >
> > An application which previously automatically got SSE4.2 CRC, for
> > example, will now automatically only get scalar.
> >
> > If this is ok, this should probably be called out explicitly in
> > release notes as it may not be Immediately noticeable to users that
> > they now need to select the CRC algo.
> >
> > Actually, in general, the release notes need to be updated for this
> patchset.
> 
> The decision to move rte_set_alg() out of RTE_INIT was taken to avoid check
> on max_simd_bitwidth in data path for every single time when crc_calc() api
> is invoked. Based on my understanding, max_simd_bitwidth is set after eal
> init, and when used in crc_calc(), it might override the default crc algo set
> during RTE_INIT. Therefore, to avoid extra check on max_simd_bitwidth in
> data path,  better option will be to use this static configuration one time after
> eal init in the set_algo API.

[DC] Yes that is a good change to have made to avoid extra datapath checks.

Based on off-list discussion, I now also know the reason for defaulting
to scalar CRC in RTE_INIT. If a higher bitwidth CRC was chosen by RTE_INIT (e.g.
SSE4.2 CRC) but the max_simd_bitwidth was then set to RTE_NO_SIMD (64) through
the EAL parameter or a call to rte_set_max_simd_bitwidth(), then there is a mismatch
if rte_net_crc_set_alg() is not then called to reconfigure the CRC. Defaulting to scalar
avoids this mismatch and works on all archs.

As I mentioned before, I think this needs to be called out in release notes, as it's an
under-the-hood change which could cause app performance to drop if app developers
aren't aware of it - the API itself hasn't changed, so they may not read the doxygen :)
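
The mismatch described above can be reproduced in miniature. The following is only a sketch under stated assumptions: `select_path()`, `set_max_simd_bitwidth()`, `active_path` and the default value of 256 are hypothetical and are not DPDK symbols, and the constructor attribute is the GCC/Clang mechanism that RTE_INIT is built on.

```c
#include <stdint.h>

enum crc_path { PATH_SCALAR, PATH_VECTOR };

/* Hypothetical library default, in effect before any EAL option is parsed. */
static uint16_t max_simd_bitwidth = 256;
static enum crc_path active_path = PATH_SCALAR;

/* Picks the fastest path the current limit allows (64 models "no SIMD"). */
static void select_path(void)
{
	active_path = (max_simd_bitwidth >= 128) ? PATH_VECTOR : PATH_SCALAR;
}

/* Runs before main(), as an RTE_INIT constructor would. */
static void __attribute__((constructor)) crc_init(void)
{
	select_path();
}

/* Models the user lowering the limit later, e.g. through an EAL option. */
static void set_max_simd_bitwidth(uint16_t bits)
{
	max_simd_bitwidth = bits;
}
```

Here the constructor has already picked the vector path by the time main() starts. A later set_max_simd_bitwidth(64) leaves active_path unchanged until select_path() is called again, which is the stale state that defaulting to scalar sidesteps.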

  
Power, Ciara Oct. 1, 2020, 2:19 p.m. UTC | #4
Hi David,

Thanks for reviewing, 

>-----Original Message-----
>From: Coyle, David <david.coyle@intel.com>
>Sent: Thursday 1 October 2020 15:17
>To: Singh, Jasvinder <jasvinder.singh@intel.com>; Power, Ciara
><ciara.power@intel.com>; dev@dpdk.org
>Cc: Power, Ciara <ciara.power@intel.com>; Olivier Matz
><olivier.matz@6wind.com>; O'loingsigh, Mairtin
><mairtin.oloingsigh@intel.com>; Ryan, Brendan <brendan.ryan@intel.com>;
>Richardson, Bruce <bruce.richardson@intel.com>
>Subject: RE: [dpdk-dev] [PATCH v3 17/18] net: add checks for max SIMD
>bitwidth
>
>Hi Jasvinder/Ciara
>
>> -----Original Message-----
>> From: Singh, Jasvinder <jasvinder.singh@intel.com>
>> >
>> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Ciara Power When
>> > > choosing a vector path to take, an extra condition must be
>> > > satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
>path.
>> > >
>> > > The vector path was initially chosen in RTE_INIT, however this is
>> > > no longer suitable as we cannot check the max SIMD bitwidth at that
>time.
>> > > The default chosen in RTE_INIT is now scalar. For best performance
>> > > and to use vector paths, apps must explicitly call the set
>> > > algorithm function before using other functions from this library,
>> > > as this is where vector handlers are now chosen.
>> >
>> > [DC] Has it been decided that it is ok to now require applications
>> > to pick the CRC algorithm they want to use?
>> >
>> > An application which previously automatically got SSE4.2 CRC, for
>> > example, will now automatically only get scalar.
>> >
>> > If this is ok, this should probably be called out explicitly in
>> > release notes as it may not be Immediately noticeable to users that
>> > they now need to select the CRC algo.
>> >
>> > Actually, in general, the release notes need to be updated for this
>> patchset.
>>
>> The decision to move rte_set_alg() out of RTE_INIT was taken to avoid
>> check on max_simd_bitwidth in data path for every single time when
>> crc_calc() api is invoked. Based on my understanding,
>> max_simd_bitwidth is set after eal init, and when used in crc_calc(),
>> it might override the default crc algo set during RTE_INIT. Therefore,
>> to avoid extra check on max_simd_bitwidth in data path,  better option
>> will be to use this static configuration one time after eal init in the set_algo
>API.
>
>[DC] Yes that is a good change to have made to avoid extra datapath checks.
>
>Based on off-list discussion, I now also know the reason behind now
>defaulting to scalar CRC in RTE_INIT. If a higher bitwidth CRC was chosen by
>RTE_INIT (e.g.
>SSE4.2 CRC) but the max_simd_bitwidth was then set to RTE_NO_SIMD (64)
>through the EAL parameter or call to rte_set_max_simd_bitwidth(), then
>there is a mismatch if rte_net_crc_set_alg() is not then called to reconfigure
>the CRC. Defaulting to scalar avoids this mismatch and works on all archs
>
>As I mentioned before, I think this needs to be called out in release notes, as
>it's an under-the-hood change which could cause app performance to drop if
>app developers aren't aware of it - the API itself hasn't changed, so they may
>not read the doxygen :)
>

Yes, that is a good point. I can add a note to the release notes to call this out.

>>
>>
>> > >
>> > > Suggested-by: Jasvinder Singh <jasvinder.singh@intel.com>
>> > >
>> > > Signed-off-by: Ciara Power <ciara.power@intel.com>
>> > >
>> > > ---
>> > > v3:
>> > >   - Moved choosing vector paths out of RTE_INIT.
>> > >   - Moved checking max_simd_bitwidth into the set_alg function.
>> > > ---
>> > >  lib/librte_net/rte_net_crc.c | 26 +++++++++++++++++---------
>> > > lib/librte_net/rte_net_crc.h |  3 ++-
>> > >  2 files changed, 19 insertions(+), 10 deletions(-)
>> > >
>> > > diff --git a/lib/librte_net/rte_net_crc.c
>> > > b/lib/librte_net/rte_net_crc.c index
>> > > 9fd4794a9d..241eb16399 100644
>> > > --- a/lib/librte_net/rte_net_crc.c
>> > > +++ b/lib/librte_net/rte_net_crc.c
>> >
>> > <snip>
>> >
>> > > @@ -145,18 +149,26 @@ rte_crc32_eth_handler(const uint8_t *data,
>> > > uint32_t data_len)  void  rte_net_crc_set_alg(enum rte_net_crc_alg
>> > > alg)  {
>> > > +	if (max_simd_bitwidth == 0)
>> > > +		max_simd_bitwidth = rte_get_max_simd_bitwidth();
>> > > +
>> > >  	switch (alg) {
>> > >  #ifdef X86_64_SSE42_PCLMULQDQ
>> > >  	case RTE_NET_CRC_SSE42:
>> > > -		handlers = handlers_sse42;
>> > > -		break;
>> > > +		if (max_simd_bitwidth >= RTE_MAX_128_SIMD) {
>> > > +			handlers = handlers_sse42;
>> > > +			return;
>> > > +		}
>> > > +		RTE_LOG(INFO, NET, "Max SIMD Bitwidth too low, using
>> > > scalar\n");
>> >
>> > [DC] Not sure if you're aware but there is another patchset which
>> > adds an
>> > AVX512 CRC implementation and run-time checking of cpuflags to
>> > select the CRC path to use:
>> > https://patchwork.dpdk.org/project/dpdk/list/?series=12596
>> >
>> > There will be a task to merge these 2 patchsets if both are merged.
>> > It looks fairly straightforward to me to merge these, but it would
>> > be good if you take a look too

I have looked at that patchset and I agree; I think they will be straightforward to merge together.

Thanks,
Ciara
  
Olivier Matz Oct. 6, 2020, 9:58 a.m. UTC | #5
Hi,

On Wed, Sep 30, 2020 at 02:04:13PM +0100, Ciara Power wrote:
> When choosing a vector path to take, an extra condition must be
> satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
> path.
> 
> The vector path was initially chosen in RTE_INIT, however this is no
> longer suitable as we cannot check the max SIMD bitwidth at that time.
> The default chosen in RTE_INIT is now scalar. For best performance
> and to use vector paths, apps must explicitly call the set algorithm
> function before using other functions from this library, as this is
> where vector handlers are now chosen.
> 
> Suggested-by: Jasvinder Singh <jasvinder.singh@intel.com>
> 
> Signed-off-by: Ciara Power <ciara.power@intel.com>
> 
> ---
> v3:
>   - Moved choosing vector paths out of RTE_INIT.
>   - Moved checking max_simd_bitwidth into the set_alg function.
> ---
>  lib/librte_net/rte_net_crc.c | 26 +++++++++++++++++---------
>  lib/librte_net/rte_net_crc.h |  3 ++-
>  2 files changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> index 9fd4794a9d..241eb16399 100644
> --- a/lib/librte_net/rte_net_crc.c
> +++ b/lib/librte_net/rte_net_crc.c
> @@ -9,6 +9,7 @@
>  #include <rte_cpuflags.h>
>  #include <rte_common.h>
>  #include <rte_net_crc.h>
> +#include <rte_eal.h>
>  
>  #if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_PCLMULQDQ)
>  #define X86_64_SSE42_PCLMULQDQ     1
> @@ -60,6 +61,9 @@ static rte_net_crc_handler handlers_neon[] = {
>  };
>  #endif
>  
> +static uint16_t max_simd_bitwidth;
> +#define RTE_LOGTYPE_NET RTE_LOGTYPE_USER1

RTE_LOG_REGISTER() should be used instead.

> +
>  /**
>   * Reflect the bits about the middle
>   *
> @@ -145,18 +149,26 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
>  void
>  rte_net_crc_set_alg(enum rte_net_crc_alg alg)
>  {
> +	if (max_simd_bitwidth == 0)
> +		max_simd_bitwidth = rte_get_max_simd_bitwidth();
> +
>  	switch (alg) {
>  #ifdef X86_64_SSE42_PCLMULQDQ
>  	case RTE_NET_CRC_SSE42:
> -		handlers = handlers_sse42;
> -		break;
> +		if (max_simd_bitwidth >= RTE_MAX_128_SIMD) {
> +			handlers = handlers_sse42;
> +			return;
> +		}
> +		RTE_LOG(INFO, NET, "Max SIMD Bitwidth too low, using scalar\n");

If max_simd_bitwidth is too low, it will keep the previous value.
I think we should avoid saying "using scalar" in the log, even if it is
correct today. For instance, when the AVX implementation is added,
the log will become wrong.


>  #elif defined ARM64_NEON_PMULL
>  		/* fall-through */
>  	case RTE_NET_CRC_NEON:
> -		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> +		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL) &&
> +				max_simd_bitwidth >= RTE_MAX_128_SIMD) {
>  			handlers = handlers_neon;
> -			break;
> +			return;
>  		}
> +		RTE_LOG(INFO, NET, "Max SIMD Bitwidth too low or CPU flag not enabled, using scalar\n");
>  #endif
>  		/* fall-through */
>  	case RTE_NET_CRC_SCALAR:
> @@ -184,19 +196,15 @@ rte_net_crc_calc(const void *data,
>  /* Select highest available crc algorithm as default one */
>  RTE_INIT(rte_net_crc_init)
>  {
> -	enum rte_net_crc_alg alg = RTE_NET_CRC_SCALAR;
> -
>  	rte_net_crc_scalar_init();
>  
>  #ifdef X86_64_SSE42_PCLMULQDQ
> -	alg = RTE_NET_CRC_SSE42;
>  	rte_net_crc_sse42_init();
>  #elif defined ARM64_NEON_PMULL
>  	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> -		alg = RTE_NET_CRC_NEON;
>  		rte_net_crc_neon_init();
>  	}
>  #endif
>  
> -	rte_net_crc_set_alg(alg);
> +	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
>  }
> diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
> index 16e85ca970..7a45ebe193 100644
> --- a/lib/librte_net/rte_net_crc.h
> +++ b/lib/librte_net/rte_net_crc.h
> @@ -28,7 +28,8 @@ enum rte_net_crc_alg {
>  /**
>   * This API set the CRC computation algorithm (i.e. scalar version,
>   * x86 64-bit sse4.2 intrinsic version, etc.) and internal data
> - * structure.
> + * structure. This should be called before any other functions, to
> + * choose the algorithm for best performance.
>   *
>   * @param alg
>   *   This parameter is used to select the CRC implementation version.
> -- 
> 2.17.1
>
  
Olivier Matz Oct. 6, 2020, 10 a.m. UTC | #6
Hi,

On Thu, Oct 01, 2020 at 02:19:37PM +0000, Power, Ciara wrote:
> Hi David,
> 
> Thanks for reviewing, 
> 
> >-----Original Message-----
> >From: Coyle, David <david.coyle@intel.com>
> >Sent: Thursday 1 October 2020 15:17
> >To: Singh, Jasvinder <jasvinder.singh@intel.com>; Power, Ciara
> ><ciara.power@intel.com>; dev@dpdk.org
> >Cc: Power, Ciara <ciara.power@intel.com>; Olivier Matz
> ><olivier.matz@6wind.com>; O'loingsigh, Mairtin
> ><mairtin.oloingsigh@intel.com>; Ryan, Brendan <brendan.ryan@intel.com>;
> >Richardson, Bruce <bruce.richardson@intel.com>
> >Subject: RE: [dpdk-dev] [PATCH v3 17/18] net: add checks for max SIMD
> >bitwidth
> >
> >Hi Jasvinder/Ciara
> >
> >> -----Original Message-----
> >> From: Singh, Jasvinder <jasvinder.singh@intel.com>
> >> >
> >> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Ciara Power When
> >> > > choosing a vector path to take, an extra condition must be
> >> > > satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
> >path.
> >> > >
> >> > > The vector path was initially chosen in RTE_INIT, however this is
> >> > > no longer suitable as we cannot check the max SIMD bitwidth at that
> >time.
> >> > > The default chosen in RTE_INIT is now scalar. For best performance
> >> > > and to use vector paths, apps must explicitly call the set
> >> > > algorithm function before using other functions from this library,
> >> > > as this is where vector handlers are now chosen.
> >> >
> >> > [DC] Has it been decided that it is ok to now require applications
> >> > to pick the CRC algorithm they want to use?
> >> >
> >> > An application which previously automatically got SSE4.2 CRC, for
> >> > example, will now automatically only get scalar.
> >> >
> >> > If this is ok, this should probably be called out explicitly in
> >> > release notes as it may not be Immediately noticeable to users that
> >> > they now need to select the CRC algo.
> >> >
> >> > Actually, in general, the release notes need to be updated for this
> >> patchset.
> >>
> >> The decision to move rte_set_alg() out of RTE_INIT was taken to avoid
> >> check on max_simd_bitwidth in data path for every single time when
> >> crc_calc() api is invoked. Based on my understanding,
> >> max_simd_bitwidth is set after eal init, and when used in crc_calc(),
> >> it might override the default crc algo set during RTE_INIT. Therefore,
> >> to avoid extra check on max_simd_bitwidth in data path,  better option
> >> will be to use this static configuration one time after eal init in the set_algo
> >API.
> >
> >[DC] Yes that is a good change to have made to avoid extra datapath checks.
> >
> >Based on off-list discussion, I now also know the reason behind now
> >defaulting to scalar CRC in RTE_INIT. If a higher bitwidth CRC was chosen by
> >RTE_INIT (e.g.
> >SSE4.2 CRC) but the max_simd_bitwidth was then set to RTE_NO_SIMD (64)
> >through the EAL parameter or call to rte_set_max_simd_bitwidth(), then
> >there is a mismatch if rte_net_crc_set_alg() is not then called to reconfigure
> >the CRC. Defaulting to scalar avoids this mismatch and works on all archs
> >
> >As I mentioned before, I think this needs to be called out in release notes, as
> >it's an under-the-hood change which could cause app performance to drop if
> >app developers aren't aware of it - the API itself hasn't changed, so they may
> >not read the doxygen :)
> >
> 
> Yes that is a good point, I can add to the release notes for this to call it out. 

I don't think it is a good idea to have the scalar CRC by default.
To me, the fastest available CRC has to be enabled by default.

However, I understand the technical reason why you did it like this: the
SIMD bitwidth may not be known at the time the
RTE_INIT(rte_net_crc_init) function is called.

A simple approach to solve this issue would be to initialize the
rte_net_crc_handler pointer to a handlers_default. The first time a CRC
is called, the rte_crc32_*_default_handler() function would check the
configured SIMD bitwidth and set the handler to the correct one, so that
the test is not repeated on later calls.

This approach still does not solve the case where the SIMD bitwidth is
modified during the life of the application. In this case, a callback
would have to be registered to notify SIMD bitwidth changes... but I
don't think it is worth doing. Instead, it can be documented that
rte_set_max_simd_bitwidth() has to be called early, before
rte_eal_init().
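
The default-handler idea described above could look roughly like the sketch below. It is a simplified, self-contained model rather than the librte_net code: `crc32_default()`, `crc32_scalar()`, `crc32_vector()`, `crc32_calc()`, `configured_simd_bitwidth` and `resolve_count` are hypothetical names, and both "paths" share one bitwise CRC-32 routine so the example stays short.

```c
#include <stdint.h>

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t len);

/* Hypothetical stand-in for the value rte_get_max_simd_bitwidth() returns. */
static uint16_t configured_simd_bitwidth = 128;
static int resolve_count; /* how many times the bitwidth check has run */

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_bitwise(const uint8_t *data, uint32_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (uint32_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

static uint32_t crc32_scalar(const uint8_t *data, uint32_t len)
{
	return crc32_bitwise(data, len);
}

/* Placeholder for a SIMD implementation; same result, different code. */
static uint32_t crc32_vector(const uint8_t *data, uint32_t len)
{
	return crc32_bitwise(data, len);
}

static uint32_t crc32_default(const uint8_t *data, uint32_t len);

/* The handler starts out pointing at the resolving default. */
static crc_handler crc32_handler = crc32_default;

/* First call only: check the bitwidth once, then swap in the real handler. */
static uint32_t crc32_default(const uint8_t *data, uint32_t len)
{
	resolve_count++;
	crc32_handler = (configured_simd_bitwidth >= 128) ?
			crc32_vector : crc32_scalar;
	return crc32_handler(data, len);
}

/* Data-path entry point: one indirect call, no bitwidth test. */
static uint32_t crc32_calc(const uint8_t *data, uint32_t len)
{
	return crc32_handler(data, len);
}
```

After the first crc32_calc() the pointer no longer refers to crc32_default(), so the bitwidth is read exactly once. As noted above, a later change to the bitwidth would not be picked up by this scheme.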



  
Power, Ciara Oct. 7, 2020, 11:16 a.m. UTC | #7
Hi Olivier,

 
>-----Original Message-----
>From: Olivier Matz <olivier.matz@6wind.com>
>Sent: Tuesday 6 October 2020 11:01
>To: Power, Ciara <ciara.power@intel.com>
>Cc: Coyle, David <david.coyle@intel.com>; Singh, Jasvinder
><jasvinder.singh@intel.com>; dev@dpdk.org; O'loingsigh, Mairtin
><mairtin.oloingsigh@intel.com>; Ryan, Brendan <brendan.ryan@intel.com>;
>Richardson, Bruce <bruce.richardson@intel.com>
>Subject: Re: [dpdk-dev] [PATCH v3 17/18] net: add checks for max SIMD
>bitwidth
>
>Hi,
>
>On Thu, Oct 01, 2020 at 02:19:37PM +0000, Power, Ciara wrote:
>> Hi David,
>>
>> Thanks for reviewing,
>>
>> >-----Original Message-----
>> >From: Coyle, David <david.coyle@intel.com>
>> >Sent: Thursday 1 October 2020 15:17
>> >To: Singh, Jasvinder <jasvinder.singh@intel.com>; Power, Ciara
>> ><ciara.power@intel.com>; dev@dpdk.org
>> >Cc: Power, Ciara <ciara.power@intel.com>; Olivier Matz
>> ><olivier.matz@6wind.com>; O'loingsigh, Mairtin
>> ><mairtin.oloingsigh@intel.com>; Ryan, Brendan
>> ><brendan.ryan@intel.com>; Richardson, Bruce
>> ><bruce.richardson@intel.com>
>> >Subject: RE: [dpdk-dev] [PATCH v3 17/18] net: add checks for max SIMD
>> >bitwidth
>> >
>> >Hi Jasvinder/Ciara
>> >
>> >> -----Original Message-----
>> >> From: Singh, Jasvinder <jasvinder.singh@intel.com>
>> >> >
>> >> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Ciara Power When
>> >> > > choosing a vector path to take, an extra condition must be
>> >> > > satisfied to ensure the max SIMD bitwidth allows for the CPU
>> >> > > enabled
>> >path.
>> >> > >
>> >> > > The vector path was initially chosen in RTE_INIT, however this
>> >> > > is no longer suitable as we cannot check the max SIMD bitwidth
>> >> > > at that
>> >time.
>> >> > > The default chosen in RTE_INIT is now scalar. For best
>> >> > > performance and to use vector paths, apps must explicitly call
>> >> > > the set algorithm function before using other functions from
>> >> > > this library, as this is where vector handlers are now chosen.
>> >> >
>> >> > [DC] Has it been decided that it is ok to now require
>> >> > applications to pick the CRC algorithm they want to use?
>> >> >
>> >> > An application which previously automatically got SSE4.2 CRC, for
>> >> > example, will now automatically only get scalar.
>> >> >
>> >> > If this is ok, this should probably be called out explicitly in
>> >> > release notes as it may not be Immediately noticeable to users
>> >> > that they now need to select the CRC algo.
>> >> >
>> >> > Actually, in general, the release notes need to be updated for
>> >> > this
>> >> patchset.
>> >>
>> >> The decision to move rte_set_alg() out of RTE_INIT was taken to
>> >> avoid check on max_simd_bitwidth in data path for every single time
>> >> when
>> >> crc_calc() api is invoked. Based on my understanding,
>> >> max_simd_bitwidth is set after eal init, and when used in
>> >> crc_calc(), it might override the default crc algo set during
>> >> RTE_INIT. Therefore, to avoid extra check on max_simd_bitwidth in
>> >> data path,  better option will be to use this static configuration
>> >> one time after eal init in the set_algo
>> >API.
>> >
>> >[DC] Yes that is a good change to have made to avoid extra datapath
>checks.
>> >
>> >Based on off-list discussion, I now also know the reason behind now
>> >defaulting to scalar CRC in RTE_INIT. If a higher bitwidth CRC was
>> >chosen by RTE_INIT (e.g.
>> >SSE4.2 CRC) but the max_simd_bitwidth was then set to RTE_NO_SIMD
>> >(64) through the EAL parameter or call to
>> >rte_set_max_simd_bitwidth(), then there is a mismatch if
>> >rte_net_crc_set_alg() is not then called to reconfigure the CRC.
>> >Defaulting to scalar avoids this mismatch and works on all archs
>> >
>> >As I mentioned before, I think this needs to be called out in release
>> >notes, as it's an under-the-hood change which could cause app
>> >performance to drop if app developers aren't aware of it - the API
>> >itself hasn't changed, so they may not read the doxygen :)
>> >
>>
>> Yes that is a good point, I can add to the release notes for this to call it out.
>
>I don't think it is a good idea to have the scalar crc by default.
>To me, the fastest available CRC has to be enabled by default.
>
>I understand the technical reason why you did it like this however: the SIMD
>bitwidth may not be known at the time the
>RTE_INIT(rte_net_crc_init) function is called.
>
>A simple approach to solve this issue would be to initialize the
>rte_net_crc_handler pointer to a handlers_default. The first time a crc is
>called, the rte_crc32_*_default_handler() function would check the
>configured SIMD bitwidth, and set the handler to the correct one, to avoid to
>do the test for next time.

Thanks for this suggestion. I will try it for the next version; it looks like it will work quite well.
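
For readers following the thread, the lazy-selection idea quoted above could be sketched roughly as below. This is only an illustration under stated assumptions, not the code from any version of this patchset: configured_bitwidth stands in for the value returned by rte_get_max_simd_bitwidth(), crc32_vector is a placeholder that computes the same result as the scalar path, and every name in the block is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the EAL bitwidth values (not the DPDK names). */
enum simd_bitwidth { SIMD_NONE = 64, SIMD_128 = 128 };

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t len);

/* Assumption: stands in for the value rte_get_max_simd_bitwidth() returns. */
static uint16_t configured_bitwidth = SIMD_128;

/* Counters that exist only to make the behaviour observable. */
static int selection_checks, vector_calls, scalar_calls;

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320). */
static uint32_t crc32_bitwise(const uint8_t *data, uint32_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (uint32_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

static uint32_t crc32_scalar(const uint8_t *data, uint32_t len)
{
	scalar_calls++;
	return crc32_bitwise(data, len);
}

/* Placeholder for a vector implementation; same result as the scalar path. */
static uint32_t crc32_vector(const uint8_t *data, uint32_t len)
{
	vector_calls++;
	return crc32_bitwise(data, len);
}

static uint32_t crc32_default(const uint8_t *data, uint32_t len);

/* The handler starts at the default, so nothing is decided at init time. */
static crc_handler handler = crc32_default;

/* First call only: check the bitwidth once, then swap in the real handler. */
static uint32_t crc32_default(const uint8_t *data, uint32_t len)
{
	selection_checks++;
	handler = (configured_bitwidth >= SIMD_128) ? crc32_vector : crc32_scalar;
	return handler(data, len);
}

/* Data-path entry point: one indirect call, no bitwidth check. */
static uint32_t crc32_calc(const uint8_t *data, uint32_t len)
{
	assert(handler != NULL);
	return handler(data, len);
}
```

In this sketch the bitwidth check runs once, on the first CRC call, and every later call goes straight to the selected handler. The handler pointer is updated without any synchronisation; two threads racing on the first call would both store the same value, but a real implementation would need to decide whether that is acceptable.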

>This approach still does not solve the case where the SIMD bitwidth is
>modified during the life of the application. In this case, a callback would have
>to be registered to notify SIMD bitwidth changes... but I don't think it is worth
>to do it. Instead, it can be documented that
>rte_set_max_simd_bitwidth() has to be called early, before rte_eal_init().
>

Yes, it is documented in the Doxygen comment for the rte_set_max_simd_bitwidth() function
that it should be called early, as you mentioned.

<snip>

Thanks,
Ciara
  
Ananyev, Konstantin Oct. 8, 2020, 2:55 p.m. UTC | #8
> > >> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Ciara Power When
> > >> > > choosing a vector path to take, an extra condition must be
> > >> > > satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
> > >path.
> > >> > >
> > >> > > The vector path was initially chosen in RTE_INIT, however this is
> > >> > > no longer suitable as we cannot check the max SIMD bitwidth at that
> > >time.
> > >> > > The default chosen in RTE_INIT is now scalar. For best performance
> > >> > > and to use vector paths, apps must explicitly call the set
> > >> > > algorithm function before using other functions from this library,
> > >> > > as this is where vector handlers are now chosen.
> > >> >
> > >> > [DC] Has it been decided that it is ok to now require applications
> > >> > to pick the CRC algorithm they want to use?
> > >> >
> > >> > An application which previously automatically got SSE4.2 CRC, for
> > >> > example, will now automatically only get scalar.
> > >> >
> > >> > If this is ok, this should probably be called out explicitly in
> > >> > release notes as it may not be Immediately noticeable to users that
> > >> > they now need to select the CRC algo.
> > >> >
> > >> > Actually, in general, the release notes need to be updated for this
> > >> patchset.
> > >>
> > >> The decision to move rte_set_alg() out of RTE_INIT was taken to avoid
> > >> check on max_simd_bitwidth in data path for every single time when
> > >> crc_calc() api is invoked. Based on my understanding,
> > >> max_simd_bitwidth is set after eal init, and when used in crc_calc(),
> > >> it might override the default crc algo set during RTE_INIT. Therefore,
> > >> to avoid extra check on max_simd_bitwidth in data path,  better option
> > >> will be to use this static configuration one time after eal init in the set_algo
> > >API.
> > >
> > >[DC] Yes that is a good change to have made to avoid extra datapath checks.
> > >
> > >Based on off-list discussion, I now also know the reason behind now
> > >defaulting to scalar CRC in RTE_INIT. If a higher bitwidth CRC was chosen by
> > >RTE_INIT (e.g.
> > >SSE4.2 CRC) but the max_simd_bitwidth was then set to RTE_NO_SIMD (64)
> > >through the EAL parameter or call to rte_set_max_simd_bitwidth(), then
> > >there is a mismatch if rte_net_crc_set_alg() is not then called to reconfigure
> > >the CRC. Defaulting to scalar avoids this mismatch and works on all archs
> > >
> > >As I mentioned before, I think this needs to be called out in release notes, as
> > >it's an under-the-hood change which could cause app performance to drop if
> > >app developers aren't aware of it - the API itself hasn't changed, so they may
> > >not read the doxygen :)
> > >
> >
> > Yes that is a good point, I can add to the release notes for this to call it out.
> 
> I don't think it is a good idea to have the scalar crc by default.
> To me, the fastest available CRC has to be enabled by default.
> 
> I understand the technical reason why you did it like this however: the
> SIMD bitwidth may not be known at the time the
> RTE_INIT(rte_net_crc_init) function is called.
> 
> A simple approach to solve this issue would be to initialize the
> rte_net_crc_handler pointer to a handlers_default. The first time a crc
> is called, the rte_crc32_*_default_handler() function would check the
> configured SIMD bitwidth, and set the handler to the correct one, to
> avoid to do the test for next time.
> 
> This approach still does not solve the case where the SIMD bitwidth is
> modified during the life of the application. In this case, a callback
> would have to be registered to notify SIMD bitwidth changes... but I
> don't think it is worth to do it. Instead, it can be documented that
> rte_set_max_simd_bitwidth() has to be called early, before
> rte_eal_init().

Actually, I also thought about the callback approach.
It does complicate things a bit for sure, but on the positive side,
it allows us to solve the RTE_INIT() code-path selection problem
in a generic way, plus it means zero changes in the data path.
So it is probably worth considering.
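
As a rough illustration of what the callback approach might look like, the sketch below has a library register a function that is invoked whenever the max SIMD bitwidth changes. It is a sketch under assumptions only: no such callback API is proposed in this patchset, and simd_bitwidth_cb_register(), simd_bitwidth_set() and the other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SIMD_CALLBACKS 8

/* Hypothetical callback type: receives the new max SIMD bitwidth. */
typedef void (*simd_bitwidth_cb)(uint16_t new_bitwidth);

static simd_bitwidth_cb callbacks[MAX_SIMD_CALLBACKS];
static size_t nb_callbacks;
static uint16_t max_simd_bitwidth = 256; /* illustrative default */

/*
 * Hypothetical registration hook. The callback is invoked immediately with
 * the current value so the caller can pick its initial code path.
 * Returns 0 on success, -1 if the table is full.
 */
static int simd_bitwidth_cb_register(simd_bitwidth_cb cb)
{
	assert(cb != NULL);
	if (nb_callbacks == MAX_SIMD_CALLBACKS)
		return -1;
	callbacks[nb_callbacks++] = cb;
	cb(max_simd_bitwidth);
	return 0;
}

/* Hypothetical setter: stores the new value and notifies every library. */
static void simd_bitwidth_set(uint16_t bitwidth)
{
	max_simd_bitwidth = bitwidth;
	for (size_t i = 0; i < nb_callbacks; i++)
		callbacks[i](bitwidth);
}

/* Library side: the data path only ever reads crc_path. */
enum crc_path { CRC_PATH_SCALAR, CRC_PATH_VECTOR_128 };
static enum crc_path crc_path = CRC_PATH_SCALAR;

static void crc_on_bitwidth_change(uint16_t bitwidth)
{
	crc_path = (bitwidth >= 128) ? CRC_PATH_VECTOR_128 : CRC_PATH_SCALAR;
}
```

With this shape the library would register crc_on_bitwidth_change() from its constructor, and a later change to the bitwidth would reselect the path without adding any check to the data path.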

> 
> 
> 
> > >>
> > >>
> > >> > >
> > >> > > Suggested-by: Jasvinder Singh <jasvinder.singh@intel.com>
> > >> > >
> > >> > > Signed-off-by: Ciara Power <ciara.power@intel.com>
> > >> > >
> > >> > > ---
> > >> > > v3:
> > >> > >   - Moved choosing vector paths out of RTE_INIT.
> > >> > >   - Moved checking max_simd_bitwidth into the set_alg function.
> > >> > > ---
> > >> > >  lib/librte_net/rte_net_crc.c | 26 +++++++++++++++++---------
> > >> > > lib/librte_net/rte_net_crc.h |  3 ++-
> > >> > >  2 files changed, 19 insertions(+), 10 deletions(-)
> > >> > >
> > >> > > diff --git a/lib/librte_net/rte_net_crc.c
> > >> > > b/lib/librte_net/rte_net_crc.c index
> > >> > > 9fd4794a9d..241eb16399 100644
> > >> > > --- a/lib/librte_net/rte_net_crc.c
> > >> > > +++ b/lib/librte_net/rte_net_crc.c
> > >> >
> > >> > <snip>
> > >> >
> > >> > > @@ -145,18 +149,26 @@ rte_crc32_eth_handler(const uint8_t *data,
> > >> > > uint32_t data_len)  void  rte_net_crc_set_alg(enum rte_net_crc_alg
> > >> > > alg)  {
> > >> > > +	if (max_simd_bitwidth == 0)
> > >> > > +		max_simd_bitwidth = rte_get_max_simd_bitwidth();
> > >> > > +
> > >> > >  	switch (alg) {
> > >> > >  #ifdef X86_64_SSE42_PCLMULQDQ
> > >> > >  	case RTE_NET_CRC_SSE42:
> > >> > > -		handlers = handlers_sse42;
> > >> > > -		break;
> > >> > > +		if (max_simd_bitwidth >= RTE_MAX_128_SIMD) {
> > >> > > +			handlers = handlers_sse42;
> > >> > > +			return;
> > >> > > +		}
> > >> > > +		RTE_LOG(INFO, NET, "Max SIMD Bitwidth too low, using
> > >> > > scalar\n");
> > >> >
> > >> > [DC] Not sure if you're aware but there is another patchset which
> > >> > adds an
> > >> > AVX512 CRC implementation and run-time checking of cpuflags to
> > >> > select the CRC path to use:
> > >> > https://patchwork.dpdk.org/project/dpdk/list/?series=12596
> > >> >
> > >> > There will be a task to merge these 2 patchsets if both are merged.
> > >> > It looks fairly straightforward to me to merge these, but it would
> > >> > be good if you take a look too
> >
> > I have looked at that patchset, I agree, I think they will be straightforward to merge together.
> >
> > Thanks,
> > Ciara
  
Power, Ciara Oct. 13, 2020, 11:27 a.m. UTC | #9
Hi Konstantin,


>-----Original Message-----
>From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>Sent: Thursday 8 October 2020 15:55
>To: Olivier Matz <olivier.matz@6wind.com>; Power, Ciara
><ciara.power@intel.com>
>Cc: Coyle, David <david.coyle@intel.com>; Singh, Jasvinder
><jasvinder.singh@intel.com>; dev@dpdk.org; O'loingsigh, Mairtin
><mairtin.oloingsigh@intel.com>; Ryan, Brendan <brendan.ryan@intel.com>;
>Richardson, Bruce <bruce.richardson@intel.com>
>Subject: RE: [dpdk-dev] [PATCH v3 17/18] net: add checks for max SIMD
>bitwidth
>
>> > >> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Ciara Power
>> > >> > > When choosing a vector path to take, an extra condition must
>> > >> > > be satisfied to ensure the max SIMD bitwidth allows for the
>> > >> > > CPU enabled
>> > >path.
>> > >> > >
>> > >> > > The vector path was initially chosen in RTE_INIT, however
>> > >> > > this is no longer suitable as we cannot check the max SIMD
>> > >> > > bitwidth at that
>> > >time.
>> > >> > > The default chosen in RTE_INIT is now scalar. For best
>> > >> > > performance and to use vector paths, apps must explicitly
>> > >> > > call the set algorithm function before using other functions
>> > >> > > from this library, as this is where vector handlers are now chosen.
>> > >> >
>> > >> > [DC] Has it been decided that it is ok to now require
>> > >> > applications to pick the CRC algorithm they want to use?
>> > >> >
>> > >> > An application which previously automatically got SSE4.2 CRC,
>> > >> > for example, will now automatically only get scalar.
>> > >> >
>> > >> > If this is ok, this should probably be called out explicitly in
>> > >> > release notes as it may not be Immediately noticeable to users
>> > >> > that they now need to select the CRC algo.
>> > >> >
>> > >> > Actually, in general, the release notes need to be updated for
>> > >> > this
>> > >> patchset.
>> > >>
>> > >> The decision to move rte_set_alg() out of RTE_INIT was taken to
>> > >> avoid check on max_simd_bitwidth in data path for every single
>> > >> time when
>> > >> crc_calc() api is invoked. Based on my understanding,
>> > >> max_simd_bitwidth is set after eal init, and when used in
>> > >> crc_calc(), it might override the default crc algo set during
>> > >> RTE_INIT. Therefore, to avoid extra check on max_simd_bitwidth in
>> > >> data path,  better option will be to use this static
>> > >> configuration one time after eal init in the set_algo
>> > >API.
>> > >
>> > >[DC] Yes that is a good change to have made to avoid extra datapath
>checks.
>> > >
>> > >Based on off-list discussion, I now also know the reason behind now
>> > >defaulting to scalar CRC in RTE_INIT. If a higher bitwidth CRC was
>> > >chosen by RTE_INIT (e.g.
>> > >SSE4.2 CRC) but the max_simd_bitwidth was then set to RTE_NO_SIMD
>> > >(64) through the EAL parameter or call to
>> > >rte_set_max_simd_bitwidth(), then there is a mismatch if
>> > >rte_net_crc_set_alg() is not then called to reconfigure the CRC.
>> > >Defaulting to scalar avoids this mismatch and works on all archs
>> > >
>> > >As I mentioned before, I think this needs to be called out in
>> > >release notes, as it's an under-the-hood change which could cause
>> > >app performance to drop if app developers aren't aware of it - the
>> > >API itself hasn't changed, so they may not read the doxygen :)
>> > >
>> >
>> > Yes that is a good point, I can add to the release notes for this to call it
>out.
>>
>> I don't think it is a good idea to have the scalar crc by default.
>> To me, the fastest available CRC has to be enabled by default.
>>
>> I understand the technical reason why you did it like this however:
>> the SIMD bitwidth may not be known at the time the
>> RTE_INIT(rte_net_crc_init) function is called.
>>
>> A simple approach to solve this issue would be to initialize the
>> rte_net_crc_handler pointer to a handlers_default. The first time a
>> crc is called, the rte_crc32_*_default_handler() function would check
>> the configured SIMD bitwidth, and set the handler to the correct one,
>> to avoid to do the test for next time.
>>
>> This approach still does not solve the case where the SIMD bitwidth is
>> modified during the life of the application. In this case, a callback
>> would have to be registered to notify SIMD bitwidth changes... but I
>> don't think it is worth to do it. Instead, it can be documented that
>> rte_set_max_simd_bitwidth() has to be called early, before
>> rte_eal_init().
>
>Actually I also thought about callback approach.
>It does complicate things a bit for sure, but on a positive side - it allows to
>solve RTE_INIT() code-path selection problem in a generic way, plus it means
>zero changes in the data-path.
>So probably worth to consider it.
>

I am not sure adding callbacks to allow for runtime changes to max SIMD bitwidth is worth it.
I have sent a new version of my patchset which currently does not have this suggested rework to use callbacks.

Thanks,
Ciara

<snip>
  

Patch

diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
index 9fd4794a9d..241eb16399 100644
--- a/lib/librte_net/rte_net_crc.c
+++ b/lib/librte_net/rte_net_crc.c
@@ -9,6 +9,7 @@ 
 #include <rte_cpuflags.h>
 #include <rte_common.h>
 #include <rte_net_crc.h>
+#include <rte_eal.h>
 
 #if defined(RTE_ARCH_X86_64) && defined(RTE_MACHINE_CPUFLAG_PCLMULQDQ)
 #define X86_64_SSE42_PCLMULQDQ     1
@@ -60,6 +61,9 @@  static rte_net_crc_handler handlers_neon[] = {
 };
 #endif
 
+static uint16_t max_simd_bitwidth;
+#define RTE_LOGTYPE_NET RTE_LOGTYPE_USER1
+
 /**
  * Reflect the bits about the middle
  *
@@ -145,18 +149,26 @@  rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
 void
 rte_net_crc_set_alg(enum rte_net_crc_alg alg)
 {
+	if (max_simd_bitwidth == 0)
+		max_simd_bitwidth = rte_get_max_simd_bitwidth();
+
 	switch (alg) {
 #ifdef X86_64_SSE42_PCLMULQDQ
 	case RTE_NET_CRC_SSE42:
-		handlers = handlers_sse42;
-		break;
+		if (max_simd_bitwidth >= RTE_MAX_128_SIMD) {
+			handlers = handlers_sse42;
+			return;
+		}
+		RTE_LOG(INFO, NET, "Max SIMD Bitwidth too low, using scalar\n");
 #elif defined ARM64_NEON_PMULL
 		/* fall-through */
 	case RTE_NET_CRC_NEON:
-		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL) &&
+				max_simd_bitwidth >= RTE_MAX_128_SIMD) {
 			handlers = handlers_neon;
-			break;
+			return;
 		}
+		RTE_LOG(INFO, NET, "Max SIMD Bitwidth too low or CPU flag not enabled, using scalar\n");
 #endif
 		/* fall-through */
 	case RTE_NET_CRC_SCALAR:
@@ -184,19 +196,15 @@  rte_net_crc_calc(const void *data,
 /* Select highest available crc algorithm as default one */
 RTE_INIT(rte_net_crc_init)
 {
-	enum rte_net_crc_alg alg = RTE_NET_CRC_SCALAR;
-
 	rte_net_crc_scalar_init();
 
 #ifdef X86_64_SSE42_PCLMULQDQ
-	alg = RTE_NET_CRC_SSE42;
 	rte_net_crc_sse42_init();
 #elif defined ARM64_NEON_PMULL
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
-		alg = RTE_NET_CRC_NEON;
 		rte_net_crc_neon_init();
 	}
 #endif
 
-	rte_net_crc_set_alg(alg);
+	rte_net_crc_set_alg(RTE_NET_CRC_SCALAR);
 }
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
index 16e85ca970..7a45ebe193 100644
--- a/lib/librte_net/rte_net_crc.h
+++ b/lib/librte_net/rte_net_crc.h
@@ -28,7 +28,8 @@  enum rte_net_crc_alg {
 /**
  * This API set the CRC computation algorithm (i.e. scalar version,
  * x86 64-bit sse4.2 intrinsic version, etc.) and internal data
- * structure.
+ * structure. This should be called before any other functions, to
+ * choose the algorithm for best performance.
  *
  * @param alg
  *   This parameter is used to select the CRC implementation version.