[20.11] raw/ioat: added a flag to control copying handle parameters

Message ID 20200713071519.110662-1-Cheng1.jiang@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [20.11] raw/ioat: added a flag to control copying handle parameters |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Performance-Testing fail build patch failure
ci/Intel-compilation success Compilation OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS

Commit Message

Jiang, Cheng1 July 13, 2020, 7:15 a.m. UTC
  Added a flag which controls whether rte_ioat_enqueue_copy
and rte_ioat_completed_copies function should process
handle parameters to improve the performance when handle
parameters are not necessary to use. This is targeting
20.11 release.

Signed-off-by: Cheng Jiang <Cheng1.jiang@intel.com>
---
 drivers/raw/ioat/ioat_rawdev.c     |  1 +
 drivers/raw/ioat/rte_ioat_rawdev.h | 14 +++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)
  

Comments

Bruce Richardson July 13, 2020, 9:31 a.m. UTC | #1
On Mon, Jul 13, 2020 at 07:15:19AM +0000, Cheng Jiang wrote:
> Added a flag which controls whether rte_ioat_enqueue_copy
> and rte_ioat_completed_copies function should process
> handle parameters to improve the performance when handle
> parameters are not necessary to use. This is targeting
> 20.11 release.
> 
> Signed-off-by: Cheng Jiang <Cheng1.jiang@intel.com>
> ---
Thanks for this. I'm also preparing some changes to this driver for 20.11
release, so I'll base my changes on top of this one.
  
Bruce Richardson July 13, 2020, 9:55 a.m. UTC | #2
On Mon, Jul 13, 2020 at 07:15:19AM +0000, Cheng Jiang wrote:
> Added a flag which controls whether rte_ioat_enqueue_copy
> and rte_ioat_completed_copies function should process
> handle parameters to improve the performance when handle
> parameters are not necessary to use. This is targeting
> 20.11 release.
> 
> Signed-off-by: Cheng Jiang <Cheng1.jiang@intel.com>
> ---
>  drivers/raw/ioat/ioat_rawdev.c     |  1 +
>  drivers/raw/ioat/rte_ioat_rawdev.h | 14 +++++++++++---
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
<snip>
> @@ -208,6 +213,11 @@ rte_ioat_completed_copies(int dev_id, uint8_t max_copies,
>  	if (count > max_copies)
>  		count = max_copies;
>  
> +	ioat->next_read = read + count;
> +	ioat->completed += count;
> +	if (!ioat->hdls_enable)
> +		return count;
> +
>  	for (; i < count - 1; i += 2, read += 2) {
>  		__m128i hdls0 = _mm_load_si128(&ioat->hdls[read & mask]);
>  		__m128i hdls1 = _mm_load_si128(&ioat->hdls[(read + 1) & mask]);
> @@ -223,8 +233,6 @@ rte_ioat_completed_copies(int dev_id, uint8_t max_copies,
>  		dst_hdls[i] = hdls[1];
>  	}
>  
> -	ioat->next_read = read;
> -	ioat->completed += count;
>  	return count;
>  }

This change I think may cause problems if we ever want to have one thread
enqueuing and another taking completions. The next_read and completed
counters should really only be updated after we have finished reading the
completed handles array. Therefore, for safety, I tihnk it might be better
to keep the updates in their original places and put an "end:" label before
them. Then the "return count" in the middle of the function can be "goto
end;"

Regards,
/Bruce
  
Bruce Richardson July 13, 2020, 9:59 a.m. UTC | #3
On Mon, Jul 13, 2020 at 10:55:30AM +0100, Bruce Richardson wrote:
> On Mon, Jul 13, 2020 at 07:15:19AM +0000, Cheng Jiang wrote:
> > Added a flag which controls whether rte_ioat_enqueue_copy
> > and rte_ioat_completed_copies function should process
> > handle parameters to improve the performance when handle
> > parameters are not necessary to use. This is targeting
> > 20.11 release.
> > 
> > Signed-off-by: Cheng Jiang <Cheng1.jiang@intel.com>
> > ---
> >  drivers/raw/ioat/ioat_rawdev.c     |  1 +
> >  drivers/raw/ioat/rte_ioat_rawdev.h | 14 +++++++++++---
> >  2 files changed, 12 insertions(+), 3 deletions(-)
> > 
> <snip>
> > @@ -208,6 +213,11 @@ rte_ioat_completed_copies(int dev_id, uint8_t max_copies,
> >  	if (count > max_copies)
> >  		count = max_copies;
> >  
> > +	ioat->next_read = read + count;
> > +	ioat->completed += count;
> > +	if (!ioat->hdls_enable)
> > +		return count;
> > +
> >  	for (; i < count - 1; i += 2, read += 2) {
> >  		__m128i hdls0 = _mm_load_si128(&ioat->hdls[read & mask]);
> >  		__m128i hdls1 = _mm_load_si128(&ioat->hdls[(read + 1) & mask]);
> > @@ -223,8 +233,6 @@ rte_ioat_completed_copies(int dev_id, uint8_t max_copies,
> >  		dst_hdls[i] = hdls[1];
> >  	}
> >  
> > -	ioat->next_read = read;
> > -	ioat->completed += count;
> >  	return count;
> >  }
> 
> This change I think may cause problems if we ever want to have one thread
> enqueuing and another taking completions. The next_read and completed
> counters should really only be updated after we have finished reading the
> completed handles array. Therefore, for safety, I tihnk it might be better
> to keep the updates in their original places and put an "end:" label before
> them. Then the "return count" in the middle of the function can be "goto
> end;"
> 
A further suggestion to the changes to this function: if we are not
actually returning completion handles, then there is no need to limit the
count to "max_copies". Therefore move the "return count" or "goto end"
above the previous length check, and update the doxygen comments to
indicate that max_copies is ignored if "hdls_enable" is false, and that the
final two parameters can also be NULL when calling the function in this
case.

/Bruce
  
Bruce Richardson July 13, 2020, 1:28 p.m. UTC | #4
On Mon, Jul 13, 2020 at 07:15:19AM +0000, Cheng Jiang wrote:
> Added a flag which controls whether rte_ioat_enqueue_copy
> and rte_ioat_completed_copies function should process
> handle parameters to improve the performance when handle
> parameters are not necessary to use. This is targeting
> 20.11 release.
> 
> Signed-off-by: Cheng Jiang <Cheng1.jiang@intel.com>
> ---
>  drivers/raw/ioat/ioat_rawdev.c     |  1 +
>  drivers/raw/ioat/rte_ioat_rawdev.h | 14 +++++++++++---
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
> index 87fd088aa..5bf030785 100644
> --- a/drivers/raw/ioat/ioat_rawdev.c
> +++ b/drivers/raw/ioat/ioat_rawdev.c
> @@ -57,6 +57,7 @@ ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
>  		return -EINVAL;
>  
>  	ioat->ring_size = params->ring_size;
> +	ioat->hdls_enable = params->hdls_enable;
>  	if (ioat->desc_ring != NULL) {
>  		rte_memzone_free(ioat->desc_mz);
>  		ioat->desc_ring = NULL;
> diff --git a/drivers/raw/ioat/rte_ioat_rawdev.h b/drivers/raw/ioat/rte_ioat_rawdev.h
> index f765a6557..daca04dd3 100644
> --- a/drivers/raw/ioat/rte_ioat_rawdev.h
> +++ b/drivers/raw/ioat/rte_ioat_rawdev.h
> @@ -35,6 +35,7 @@
>   */
>  struct rte_ioat_rawdev_config {
>  	unsigned short ring_size;
> +	bool hdls_enable;
>  };
>  
You need a doxygen comment on the new structure member (and add to existing
one). While it's fairly clear what ring_size parameter should do, the
hdls_enable one needs a proper explanation.

I also think you might want to investigate changing it to "hdls_disable"
so that the default zero-value is the current behaviour. Requiring it to be
set as 1 means that existing code will likely recompile but fail to work.
For example, the ioat_rawdev_autotest fails on completion checks now.
  
Jiang, Cheng1 July 14, 2020, 10:45 a.m. UTC | #5
Hi Bruce,

Thank you very much for these comments which really make sense for me. I will correct these problems in the next version.

Thanks again.
Cheng


> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Monday, July 13, 2020 9:28 PM
> To: Jiang, Cheng1 <cheng1.jiang@intel.com>
> Cc: dev@dpdk.org; Fu, Patrick <patrick.fu@intel.com>
> Subject: Re: [PATCH 20.11] raw/ioat: added a flag to control copying handle
> parameters
> 
> On Mon, Jul 13, 2020 at 07:15:19AM +0000, Cheng Jiang wrote:
> > Added a flag which controls whether rte_ioat_enqueue_copy and
> > rte_ioat_completed_copies function should process handle parameters to
> > improve the performance when handle parameters are not necessary to
> > use. This is targeting
> > 20.11 release.
> >
> > Signed-off-by: Cheng Jiang <Cheng1.jiang@intel.com>
> > ---
> >  drivers/raw/ioat/ioat_rawdev.c     |  1 +
> >  drivers/raw/ioat/rte_ioat_rawdev.h | 14 +++++++++++---
> >  2 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/raw/ioat/ioat_rawdev.c
> > b/drivers/raw/ioat/ioat_rawdev.c index 87fd088aa..5bf030785 100644
> > --- a/drivers/raw/ioat/ioat_rawdev.c
> > +++ b/drivers/raw/ioat/ioat_rawdev.c
> > @@ -57,6 +57,7 @@ ioat_dev_configure(const struct rte_rawdev *dev,
> rte_rawdev_obj_t config)
> >  		return -EINVAL;
> >
> >  	ioat->ring_size = params->ring_size;
> > +	ioat->hdls_enable = params->hdls_enable;
> >  	if (ioat->desc_ring != NULL) {
> >  		rte_memzone_free(ioat->desc_mz);
> >  		ioat->desc_ring = NULL;
> > diff --git a/drivers/raw/ioat/rte_ioat_rawdev.h
> > b/drivers/raw/ioat/rte_ioat_rawdev.h
> > index f765a6557..daca04dd3 100644
> > --- a/drivers/raw/ioat/rte_ioat_rawdev.h
> > +++ b/drivers/raw/ioat/rte_ioat_rawdev.h
> > @@ -35,6 +35,7 @@
> >   */
> >  struct rte_ioat_rawdev_config {
> >  	unsigned short ring_size;
> > +	bool hdls_enable;
> >  };
> >
> You need a doxygen comment on the new structure member (and add to
> existing one). While it's fairly clear what ring_size parameter should do, the
> hdls_enable one needs a proper explanation.
> 
> I also think you might want to investigate changing it to "hdls_disable"
> so that the default zero-value is the current behaviour. Requiring it to be set
> as 1 means that existing code will likely recompile but fail to work.
> For example, the ioat_rawdev_autotest fails on completion checks now.
  

Patch

diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
index 87fd088aa..5bf030785 100644
--- a/drivers/raw/ioat/ioat_rawdev.c
+++ b/drivers/raw/ioat/ioat_rawdev.c
@@ -57,6 +57,7 @@  ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 		return -EINVAL;
 
 	ioat->ring_size = params->ring_size;
+	ioat->hdls_enable = params->hdls_enable;
 	if (ioat->desc_ring != NULL) {
 		rte_memzone_free(ioat->desc_mz);
 		ioat->desc_ring = NULL;
diff --git a/drivers/raw/ioat/rte_ioat_rawdev.h b/drivers/raw/ioat/rte_ioat_rawdev.h
index f765a6557..daca04dd3 100644
--- a/drivers/raw/ioat/rte_ioat_rawdev.h
+++ b/drivers/raw/ioat/rte_ioat_rawdev.h
@@ -35,6 +35,7 @@ 
  */
 struct rte_ioat_rawdev_config {
 	unsigned short ring_size;
+	bool hdls_enable;
 };
 
 /**
@@ -52,6 +53,8 @@  struct rte_ioat_rawdev {
 
 	unsigned short ring_size;
 	struct rte_ioat_generic_hw_desc *desc_ring;
+
+	bool hdls_enable; /* control if handles need to be copied */
 	__m128i *hdls; /* completion handles for returning to user */
 
 
@@ -121,8 +124,10 @@  rte_ioat_enqueue_copy(int dev_id, phys_addr_t src, phys_addr_t dst,
 	desc->u.control_raw = (uint32_t)((!!fence << 4) | (!(write & 0xF)) << 3);
 	desc->src_addr = src;
 	desc->dest_addr = dst;
+	if (ioat->hdls_enable)
+		ioat->hdls[write] = _mm_set_epi64x((int64_t)dst_hdl,
+					(int64_t)src_hdl);
 
-	ioat->hdls[write] = _mm_set_epi64x((int64_t)dst_hdl, (int64_t)src_hdl);
 	rte_prefetch0(&ioat->desc_ring[ioat->next_write & mask]);
 
 	ioat->enqueued++;
@@ -208,6 +213,11 @@  rte_ioat_completed_copies(int dev_id, uint8_t max_copies,
 	if (count > max_copies)
 		count = max_copies;
 
+	ioat->next_read = read + count;
+	ioat->completed += count;
+	if (!ioat->hdls_enable)
+		return count;
+
 	for (; i < count - 1; i += 2, read += 2) {
 		__m128i hdls0 = _mm_load_si128(&ioat->hdls[read & mask]);
 		__m128i hdls1 = _mm_load_si128(&ioat->hdls[(read + 1) & mask]);
@@ -223,8 +233,6 @@  rte_ioat_completed_copies(int dev_id, uint8_t max_copies,
 		dst_hdls[i] = hdls[1];
 	}
 
-	ioat->next_read = read;
-	ioat->completed += count;
 	return count;
 }