rte_memcpy: fix off by one for size 16 and 32

Message ID 20240302204923.227105-1-stephen@networkplumber.org (mailing list archive)
State Rejected, archived
Delegated to: Thomas Monjalon
Series rte_memcpy: fix off by one for size 16 and 32

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-sample-apps-testing success Testing PASS

Commit Message

Stephen Hemminger March 2, 2024, 8:49 p.m. UTC
  The rte_memcpy code would execute extra instructions for sizes 16
and 32, which could potentially reference past the end of the data.

For a size of 16, only a single mov16 is needed;
likewise, for a size of 32, only a single mov32.

Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")

Suggested-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/eal/x86/include/rte_memcpy.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
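
For concreteness, a minimal editorial sketch of the n == 16 case the commit
message refers to: in the unpatched n <= 32 path (shown in the diff below),
the second rte_mov16() becomes a duplicate of the first.

	/* Unpatched n <= 32 path, specialized to n == 16 (sketch, not the
	 * posted patch): dst - 16 + n == dst and src - 16 + n == src, so the
	 * second rte_mov16() copies bytes [0..15] a second time.
	 */
	rte_mov16((uint8_t *)dst, (const uint8_t *)src);
	rte_mov16((uint8_t *)dst - 16 + 16,        /* == dst */
		  (const uint8_t *)src - 16 + 16); /* == src */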
  

Comments

Stephen Hemminger March 2, 2024, 8:56 p.m. UTC | #1
On Sat,  2 Mar 2024 12:49:23 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:

> The rte_memcpy code would do extra instructions for size 16
> and 32 which potentially could reference past end of data.
> 
> For size of 16, only single mov16 is needed.
> same for size of 32, only single mov32.
> 
> Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
> Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
> 
> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Self-NAK, more is needed here.

The code has lots of pre-existing bugs where it will reference past the end
of the data in some cases.
  
Morten Brørup March 2, 2024, 10:10 p.m. UTC | #2
I'm also working on a fix.

Med venlig hilsen / Kind regards,
-Morten Brørup

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 2 March 2024 21.57
> To: dev@dpdk.org
> Cc: Morten Brørup; Bruce Richardson; Konstantin Ananyev; Zhihong Wang;
> Yuanhan Liu; Xiaoyun Li
> Subject: Re: [PATCH] rte_memcpy: fix off by one for size 16 and 32
> 
> On Sat,  2 Mar 2024 12:49:23 -0800
> Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> > The rte_memcpy code would do extra instructions for size 16
> > and 32 which potentially could reference past end of data.
> >
> > For size of 16, only single mov16 is needed.
> > same for size of 32, only single mov32.
> >
> > Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
> > Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-
> time")
> >
> > Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> Self-NAK, more is needed here.
> 
> The code has lots of pre-existing bugs where it will reference past the
> end
> of the data in some cases.
  
Morten Brørup March 2, 2024, 11:57 p.m. UTC | #3
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 2 March 2024 21.49
> 
> The rte_memcpy code would do extra instructions for size 16
> and 32 which potentially could reference past end of data.

It's a somewhat weird concept, but they don't reference past the end of the data.
They reference the data in chunks of 16.
E.g. when copying 17 bytes: first copy [0..15] and then copy [1..16] (the "-16 + n" in the code).

Because that second reference starts at "-16 + n" and covers a 16-byte block, they never reference past the end of the data.
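
A minimal sketch of this overlapping-copy pattern, assuming 16 < n <= 32; the
helper name is hypothetical and plain memcpy() stands in for the vectorized
rte_mov16(), so this is illustration only, not the DPDK implementation:

	#include <stddef.h>
	#include <stdint.h>
	#include <string.h>

	/* Copy n bytes, 16 < n <= 32: the first and the last 16-byte chunks
	 * overlap, so no byte at or beyond src[n] / dst[n] is touched.
	 */
	static inline void
	copy_17_to_32(uint8_t *dst, const uint8_t *src, size_t n)
	{
		memcpy(dst, src, 16);                   /* bytes [0 .. 15]     */
		memcpy(dst + n - 16, src + n - 16, 16); /* bytes [n-16 .. n-1] */
	}

For n == 17 the second copy covers bytes [1..16], exactly as described above.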

> 
> For size of 16, only single mov16 is needed.
> same for size of 32, only single mov32.

I fixed the duplicate copies with my patch [1]. Please review.

[1]: https://inbox.dpdk.org/dev/20240302234812.9137-1-mb@smartsharesystems.com/T/#u
  
Mattias Rönnblom March 3, 2024, 6:46 a.m. UTC | #4
On 2024-03-02 21:56, Stephen Hemminger wrote:
> On Sat,  2 Mar 2024 12:49:23 -0800
> Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
>> The rte_memcpy code would do extra instructions for size 16
>> and 32 which potentially could reference past end of data.
>>
>> For size of 16, only single mov16 is needed.
>> same for size of 32, only single mov32.
>>
>> Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
>> Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
>>
>> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> Self-NAK, more is needed here.
> 
> The code has lots of pre-existing bugs where it will reference past the end
> of the data in some cases.

Memory beyond the buffer is not accessed in this case. The rte_mov16() 
copies just overlap.

A colleague pointed out the same "bug" to me a couple of years ago. We 
didn't realize what code would be generated in the n == 16 case though. 
That seems very much worth fixing.

Maybe it's worth adding a comment regarding the overlap.
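
Such a comment might look roughly like this (a sketch against the n <= 32
branch shown in the patch below, not something posted in this thread):

	if (n <= 32) {
		/* Copy the first and the last 16 bytes. For 16 < n <= 32 the
		 * two 16-byte copies overlap; nothing past dst + n or
		 * src + n is accessed.
		 */
		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
		rte_mov16((uint8_t *)dst - 16 + n,
			  (const uint8_t *)src - 16 + n);
		return ret;
	}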
  
Mattias Rönnblom March 3, 2024, 6:47 a.m. UTC | #5
On 2024-03-02 21:49, Stephen Hemminger wrote:
> The rte_memcpy code would do extra instructions for size 16
> and 32 which potentially could reference past end of data.
> 
> For size of 16, only single mov16 is needed.
> same for size of 32, only single mov32.
> 
> Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
> Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
> 
> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>   lib/eal/x86/include/rte_memcpy.h | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index 72a92290e05d..00479a4bfbe6 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -233,13 +233,15 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
>   	 */
>   	if (n <= 32) {
>   		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
> -		rte_mov16((uint8_t *)dst - 16 + n,
> +		if (n > 16)
> +			rte_mov16((uint8_t *)dst - 16 + n,
>   				  (const uint8_t *)src - 16 + n);
>   		return ret;
>   	}
>   	if (n <= 64) {
>   		rte_mov32((uint8_t *)dst, (const uint8_t *)src);
> -		rte_mov32((uint8_t *)dst - 32 + n,
> +		if (n > 32)

n is always > 32 here.

> +			rte_mov32((uint8_t *)dst - 32 + n,
>   				  (const uint8_t *)src - 32 + n);
>   		return ret;
>   	}
  

Patch

diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
index 72a92290e05d..00479a4bfbe6 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -233,13 +233,15 @@  rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 */
 	if (n <= 32) {
 		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
-		rte_mov16((uint8_t *)dst - 16 + n,
+		if (n > 16)
+			rte_mov16((uint8_t *)dst - 16 + n,
 				  (const uint8_t *)src - 16 + n);
 		return ret;
 	}
 	if (n <= 64) {
 		rte_mov32((uint8_t *)dst, (const uint8_t *)src);
-		rte_mov32((uint8_t *)dst - 32 + n,
+		if (n > 32)
+			rte_mov32((uint8_t *)dst - 32 + n,
 				  (const uint8_t *)src - 32 + n);
 		return ret;
 	}