[2/2] net: have checksum routines accept unaligned data

Message ID 20220707183450.3203361-2-hofors@lysator.liu.se (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [1/2] app/test: add cksum performance test |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/github-robot: build fail github build: failed
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing fail Testing issues
ci/iol-x86_64-compile-testing fail Testing issues
ci/iol-abi-testing success Testing PASS
ci/intel-Testing success Testing PASS

Commit Message

Mattias Rönnblom July 7, 2022, 6:34 p.m. UTC
  From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

__rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
data through an uint16_t pointer, which allowed the compiler to assume
the data was 16-bit aligned. This in turn would, with certain
architectures and compiler flag combinations, result in code with SIMD
load or store instructions with restrictions on data alignment.

This patch keeps the old algorithm, but data is read using memcpy()
instead of direct pointer access, forcing the compiler to always
generate code that handles unaligned input. The __may_alias__ GCC
attribute is no longer needed.

The data on which the Internet checksum functions operates are almost
always 16-bit aligned, but there are exceptions. In particular, the
PDCP protocol header may (literally) have an odd size.

Performance impact seems to range from none to a very slight
regression.

Bugzilla ID: 1035
Cc: stable@dpdk.org

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/net/rte_ip.h | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)
  

Comments

Morten Brørup July 7, 2022, 9:44 p.m. UTC | #1
> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Thursday, 7 July 2022 20.35
> 
> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> 
> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
> data through an uint16_t pointer, which allowed the compiler to assume
> the data was 16-bit aligned. This in turn would, with certain
> architectures and compiler flag combinations, result in code with SIMD
> load or store instructions with restrictions on data alignment.
> 
> This patch keeps the old algorithm, but data is read using memcpy()
> instead of direct pointer access, forcing the compiler to always
> generate code that handles unaligned input. The __may_alias__ GCC
> attribute is no longer needed.
> 
> The data on which the Internet checksum functions operates are almost
> always 16-bit aligned, but there are exceptions. In particular, the
> PDCP protocol header may (literally) have an odd size.
> 
> Performance impact seems to range from none to a very slight
> regression.
> 
> Bugzilla ID: 1035
> Cc: stable@dpdk.org
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
>  lib/net/rte_ip.h | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> index b502481670..a9e6251f14 100644
> --- a/lib/net/rte_ip.h
> +++ b/lib/net/rte_ip.h
> @@ -160,18 +160,23 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
> *ipv4_hdr)
>  static inline uint32_t
>  __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
>  {
> -	/* extend strict-aliasing rules */
> -	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> -	const u16_p *u16_buf = (const u16_p *)buf;
> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> +	const void *end;

I would set "end" here instead, possibly making the pointer const too. And add spaces around '/'.
const void * const end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) * sizeof(uint16_t));

> 
> -	for (; u16_buf != end; ++u16_buf)
> -		sum += *u16_buf;
> +	for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) *
> sizeof(uint16_t));
> +	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
> +		uint16_t v;
> +
> +		memcpy(&v, buf, sizeof(uint16_t));
> +		sum += v;
> +	}
> 
>  	/* if length is odd, keeping it byte order independent */
>  	if (unlikely(len % 2)) {
> +		uint8_t last;
>  		uint16_t left = 0;
> -		*(unsigned char *)&left = *(const unsigned char *)end;
> +
> +		memcpy(&last, end, 1);
> +		*(unsigned char *)&left = last;

Couldn't you just memcpy(&left, end, 1), and omit the temporary variable "last"?

>  		sum += left;
>  	}
> 
> --
> 2.25.1
> 

With our without my suggested changes, it looks good.

Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
  
Mattias Rönnblom July 8, 2022, 12:43 p.m. UTC | #2
On 2022-07-07 23:44, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Thursday, 7 July 2022 20.35
>>
>> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>
>> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
>> data through an uint16_t pointer, which allowed the compiler to assume
>> the data was 16-bit aligned. This in turn would, with certain
>> architectures and compiler flag combinations, result in code with SIMD
>> load or store instructions with restrictions on data alignment.
>>
>> This patch keeps the old algorithm, but data is read using memcpy()
>> instead of direct pointer access, forcing the compiler to always
>> generate code that handles unaligned input. The __may_alias__ GCC
>> attribute is no longer needed.
>>
>> The data on which the Internet checksum functions operates are almost
>> always 16-bit aligned, but there are exceptions. In particular, the
>> PDCP protocol header may (literally) have an odd size.
>>
>> Performance impact seems to range from none to a very slight
>> regression.
>>
>> Bugzilla ID: 1035
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
>>   lib/net/rte_ip.h | 19 ++++++++++++-------
>>   1 file changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
>> index b502481670..a9e6251f14 100644
>> --- a/lib/net/rte_ip.h
>> +++ b/lib/net/rte_ip.h
>> @@ -160,18 +160,23 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
>> *ipv4_hdr)
>>   static inline uint32_t
>>   __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
>>   {
>> -	/* extend strict-aliasing rules */
>> -	typedef uint16_t __attribute__((__may_alias__)) u16_p;
>> -	const u16_p *u16_buf = (const u16_p *)buf;
>> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
>> +	const void *end;
> 
> I would set "end" here instead, possibly making the pointer const too. And add spaces around '/'.
> const void * const end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) * sizeof(uint16_t));
> 

I don't think that makes the code more readable.

>>
>> -	for (; u16_buf != end; ++u16_buf)
>> -		sum += *u16_buf;
>> +	for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) *
>> sizeof(uint16_t));
>> +	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
>> +		uint16_t v;
>> +
>> +		memcpy(&v, buf, sizeof(uint16_t));
>> +		sum += v;
>> +	}
>>
>>   	/* if length is odd, keeping it byte order independent */
>>   	if (unlikely(len % 2)) {
>> +		uint8_t last;
>>   		uint16_t left = 0;
>> -		*(unsigned char *)&left = *(const unsigned char *)end;
>> +
>> +		memcpy(&last, end, 1);
>> +		*(unsigned char *)&left = last;
> 
> Couldn't you just memcpy(&left, end, 1), and omit the temporary variable "last"?
> 

Good point.

I don't like how this code is clever vis-à-vis byte order, but then I 
also don't have a better suggestion.

>>   		sum += left;
>>   	}
>>
>> --
>> 2.25.1
>>
> 
> With our without my suggested changes, it looks good.
> 
> Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
> 

Thanks!
  
Morten Brørup July 8, 2022, 1:02 p.m. UTC | #3
> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Friday, 8 July 2022 14.44
> 
> On 2022-07-07 23:44, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Thursday, 7 July 2022 20.35
> >>
> >> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>
> >> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed
> its
> >> data through an uint16_t pointer, which allowed the compiler to
> assume
> >> the data was 16-bit aligned. This in turn would, with certain
> >> architectures and compiler flag combinations, result in code with
> SIMD
> >> load or store instructions with restrictions on data alignment.
> >>
> >> This patch keeps the old algorithm, but data is read using memcpy()
> >> instead of direct pointer access, forcing the compiler to always
> >> generate code that handles unaligned input. The __may_alias__ GCC
> >> attribute is no longer needed.
> >>
> >> The data on which the Internet checksum functions operates are
> almost
> >> always 16-bit aligned, but there are exceptions. In particular, the
> >> PDCP protocol header may (literally) have an odd size.
> >>
> >> Performance impact seems to range from none to a very slight
> >> regression.
> >>
> >> Bugzilla ID: 1035
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >> ---
> >>   lib/net/rte_ip.h | 19 ++++++++++++-------
> >>   1 file changed, 12 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> >> index b502481670..a9e6251f14 100644
> >> --- a/lib/net/rte_ip.h
> >> +++ b/lib/net/rte_ip.h
> >> @@ -160,18 +160,23 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
> >> *ipv4_hdr)
> >>   static inline uint32_t
> >>   __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> >>   {
> >> -	/* extend strict-aliasing rules */
> >> -	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> >> -	const u16_p *u16_buf = (const u16_p *)buf;
> >> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> >> +	const void *end;
> >
> > I would set "end" here instead, possibly making the pointer const
> too. And add spaces around '/'.
> > const void * const end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) *
> sizeof(uint16_t));
> >
> 
> I don't think that makes the code more readable.

It's only a matter of taste... Your code, your decision. :-)

I think the spaces are required by the coding standard; not sure, though.

> 
> >>
> >> -	for (; u16_buf != end; ++u16_buf)
> >> -		sum += *u16_buf;
> >> +	for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) *
> >> sizeof(uint16_t));
> >> +	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
> >> +		uint16_t v;
> >> +
> >> +		memcpy(&v, buf, sizeof(uint16_t));
> >> +		sum += v;
> >> +	}
> >>
> >>   	/* if length is odd, keeping it byte order independent */
> >>   	if (unlikely(len % 2)) {
> >> +		uint8_t last;
> >>   		uint16_t left = 0;
> >> -		*(unsigned char *)&left = *(const unsigned char *)end;
> >> +
> >> +		memcpy(&last, end, 1);
> >> +		*(unsigned char *)&left = last;
> >
> > Couldn't you just memcpy(&left, end, 1), and omit the temporary
> variable "last"?
> >
> 
> Good point.
> 
> I don't like how this code is clever vis-à-vis byte order, but then I
> also don't have a better suggestion.

The byte ordering cleverness has its roots in RFC 1071.

Stephen suggested using a union, although in a slightly different context. I'm not sure it will be more readable here, because it will require #ifdef to support byte ordering. Just thought I'd mention it, for your consideration.

Your patch v2 just reached my inbox, and it looks good. No further response to this email is expected.
  
Mattias Rönnblom July 8, 2022, 1:52 p.m. UTC | #4
On 2022-07-08 15:02, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Friday, 8 July 2022 14.44
>>
>> On 2022-07-07 23:44, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>>>> Sent: Thursday, 7 July 2022 20.35
>>>>
>>>> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>>
>>>> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed
>> its
>>>> data through an uint16_t pointer, which allowed the compiler to
>> assume
>>>> the data was 16-bit aligned. This in turn would, with certain
>>>> architectures and compiler flag combinations, result in code with
>> SIMD
>>>> load or store instructions with restrictions on data alignment.
>>>>
>>>> This patch keeps the old algorithm, but data is read using memcpy()
>>>> instead of direct pointer access, forcing the compiler to always
>>>> generate code that handles unaligned input. The __may_alias__ GCC
>>>> attribute is no longer needed.
>>>>
>>>> The data on which the Internet checksum functions operates are
>> almost
>>>> always 16-bit aligned, but there are exceptions. In particular, the
>>>> PDCP protocol header may (literally) have an odd size.
>>>>
>>>> Performance impact seems to range from none to a very slight
>>>> regression.
>>>>
>>>> Bugzilla ID: 1035
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>> ---
>>>>    lib/net/rte_ip.h | 19 ++++++++++++-------
>>>>    1 file changed, 12 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
>>>> index b502481670..a9e6251f14 100644
>>>> --- a/lib/net/rte_ip.h
>>>> +++ b/lib/net/rte_ip.h
>>>> @@ -160,18 +160,23 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
>>>> *ipv4_hdr)
>>>>    static inline uint32_t
>>>>    __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
>>>>    {
>>>> -	/* extend strict-aliasing rules */
>>>> -	typedef uint16_t __attribute__((__may_alias__)) u16_p;
>>>> -	const u16_p *u16_buf = (const u16_p *)buf;
>>>> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
>>>> +	const void *end;
>>>
>>> I would set "end" here instead, possibly making the pointer const
>> too. And add spaces around '/'.
>>> const void * const end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) *
>> sizeof(uint16_t));
>>>
>>
>> I don't think that makes the code more readable.
> 
> It's only a matter of taste... Your code, your decision. :-)
> 
> I think the spaces are required by the coding standard; not sure, though.
> 

If it isn't in the coding standard, it should be. But if you add spaces, 
you have to break the line, to fit into 80 characters. A net loss, IMO.

>>
>>>>
>>>> -	for (; u16_buf != end; ++u16_buf)
>>>> -		sum += *u16_buf;
>>>> +	for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) *
>>>> sizeof(uint16_t));
>>>> +	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
>>>> +		uint16_t v;
>>>> +
>>>> +		memcpy(&v, buf, sizeof(uint16_t));
>>>> +		sum += v;
>>>> +	}
>>>>
>>>>    	/* if length is odd, keeping it byte order independent */
>>>>    	if (unlikely(len % 2)) {
>>>> +		uint8_t last;
>>>>    		uint16_t left = 0;
>>>> -		*(unsigned char *)&left = *(const unsigned char *)end;
>>>> +
>>>> +		memcpy(&last, end, 1);
>>>> +		*(unsigned char *)&left = last;
>>>
>>> Couldn't you just memcpy(&left, end, 1), and omit the temporary
>> variable "last"?
>>>
>>
>> Good point.
>>
>> I don't like how this code is clever vis-à-vis byte order, but then I
>> also don't have a better suggestion.
> 
> The byte ordering cleverness has its roots in RFC 1071.
> 
> Stephen suggested using a union, although in a slightly different context. I'm not sure it will be more readable here, because it will require #ifdef to support byte ordering. Just thought I'd mention it, for your consideration.
> 
> Your patch v2 just reached my inbox, and it looks good. No further response to this email is expected.
>
  
Bruce Richardson July 8, 2022, 2:10 p.m. UTC | #5
On Fri, Jul 08, 2022 at 01:52:12PM +0000, Mattias Rönnblom wrote:
> On 2022-07-08 15:02, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >> Sent: Friday, 8 July 2022 14.44
> >>
> >> On 2022-07-07 23:44, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >>>> Sent: Thursday, 7 July 2022 20.35
> >>>>
> >>>> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>>
> >>>> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed
> >> its
> >>>> data through an uint16_t pointer, which allowed the compiler to
> >> assume
> >>>> the data was 16-bit aligned. This in turn would, with certain
> >>>> architectures and compiler flag combinations, result in code with
> >> SIMD
> >>>> load or store instructions with restrictions on data alignment.
> >>>>
> >>>> This patch keeps the old algorithm, but data is read using memcpy()
> >>>> instead of direct pointer access, forcing the compiler to always
> >>>> generate code that handles unaligned input. The __may_alias__ GCC
> >>>> attribute is no longer needed.
> >>>>
> >>>> The data on which the Internet checksum functions operates are
> >> almost
> >>>> always 16-bit aligned, but there are exceptions. In particular, the
> >>>> PDCP protocol header may (literally) have an odd size.
> >>>>
> >>>> Performance impact seems to range from none to a very slight
> >>>> regression.
> >>>>
> >>>> Bugzilla ID: 1035
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>> ---
> >>>>    lib/net/rte_ip.h | 19 ++++++++++++-------
> >>>>    1 file changed, 12 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> >>>> index b502481670..a9e6251f14 100644
> >>>> --- a/lib/net/rte_ip.h
> >>>> +++ b/lib/net/rte_ip.h
> >>>> @@ -160,18 +160,23 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
> >>>> *ipv4_hdr)
> >>>>    static inline uint32_t
> >>>>    __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> >>>>    {
> >>>> -	/* extend strict-aliasing rules */
> >>>> -	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> >>>> -	const u16_p *u16_buf = (const u16_p *)buf;
> >>>> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> >>>> +	const void *end;
> >>>
> >>> I would set "end" here instead, possibly making the pointer const
> >> too. And add spaces around '/'.
> >>> const void * const end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) *
> >> sizeof(uint16_t));
> >>>
> >>
> >> I don't think that makes the code more readable.
> > 
> > It's only a matter of taste... Your code, your decision. :-)
> > 
> > I think the spaces are required by the coding standard; not sure, though.
> > 
> 
> If it isn't in the coding standard, it should be. But if you add spaces, 
> you have to break the line, to fit into 80 characters. A net loss, IMO.
> 

Just FYI, lines up to 100 chars are ok now. Automated checkpatch checks as
shown in patchwork only flag lines longer than that.

/Bruce
  
Morten Brørup July 8, 2022, 2:30 p.m. UTC | #6
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Friday, 8 July 2022 16.10
> 
> On Fri, Jul 08, 2022 at 01:52:12PM +0000, Mattias Rönnblom wrote:
> > On 2022-07-08 15:02, Morten Brørup wrote:
> > >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> > >> Sent: Friday, 8 July 2022 14.44
> > >>
> > >> On 2022-07-07 23:44, Morten Brørup wrote:
> > >>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> > >>>> Sent: Thursday, 7 July 2022 20.35
> > >>>>

[...]

> > >>>>    static inline uint32_t
> > >>>>    __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> > >>>>    {
> > >>>> -	/* extend strict-aliasing rules */
> > >>>> -	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > >>>> -	const u16_p *u16_buf = (const u16_p *)buf;
> > >>>> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > >>>> +	const void *end;
> > >>>
> > >>> I would set "end" here instead, possibly making the pointer const
> > >> too. And add spaces around '/'.
> > >>> const void * const end = RTE_PTR_ADD(buf, (len /
> sizeof(uint16_t)) *
> > >> sizeof(uint16_t));
> > >>>
> > >>
> > >> I don't think that makes the code more readable.
> > >
> > > It's only a matter of taste... Your code, your decision. :-)
> > >
> > > I think the spaces are required by the coding standard; not sure,
> though.
> > >
> >
> > If it isn't in the coding standard, it should be. But if you add
> spaces,
> > you have to break the line, to fit into 80 characters. A net loss,
> IMO.
> >
> 
> Just FYI, lines up to 100 chars are ok now. Automated checkpatch checks
> as
> shown in patchwork only flag lines longer than that.
> 
> /Bruce

The coding style [1] recommends max 80 characters, with an (easy to miss) note below, saying that up to 100 characters are acceptable. I think it should be simplified, replacing the 80 characters recommendation by a 100 characters limit, so the additional note can be removed. Why still recommend 80, when we really mean 100?

[1] https://doc.dpdk.org/guides/contributing/coding_style.html
  

Patch

diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index b502481670..a9e6251f14 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -160,18 +160,23 @@  rte_ipv4_hdr_len(const struct rte_ipv4_hdr *ipv4_hdr)
 static inline uint32_t
 __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
 {
-	/* extend strict-aliasing rules */
-	typedef uint16_t __attribute__((__may_alias__)) u16_p;
-	const u16_p *u16_buf = (const u16_p *)buf;
-	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
+	const void *end;
 
-	for (; u16_buf != end; ++u16_buf)
-		sum += *u16_buf;
+	for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) * sizeof(uint16_t));
+	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
+		uint16_t v;
+
+		memcpy(&v, buf, sizeof(uint16_t));
+		sum += v;
+	}
 
 	/* if length is odd, keeping it byte order independent */
 	if (unlikely(len % 2)) {
+		uint8_t last;
 		uint16_t left = 0;
-		*(unsigned char *)&left = *(const unsigned char *)end;
+
+		memcpy(&last, end, 1);
+		*(unsigned char *)&left = last;
 		sum += left;
 	}