[v2] cryptodev: change raw data path dequeue API

Message ID 20210331172038.1718973-1-roy.fan.zhang@intel.com (mailing list archive)
State Accepted, archived
Delegated to: akhil goyal

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/iol-abi-testing           success  Testing PASS
ci/iol-intel-Performance     success  Performance Testing PASS
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/iol-testing               success  Testing PASS
ci/Intel-compilation         success  Compilation OK
ci/intel-Testing             success  Testing PASS

Commit Message

Fan Zhang March 31, 2021, 5:20 p.m. UTC
This patch changes the experimental raw data path dequeue burst API.
Originally, the API required the user to provide a callback function
to obtain the maximum dequeue count. This change gives the user the
additional option of passing the expected dequeue count directly.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/test_cryptodev.c              |  8 +-------
 doc/guides/rel_notes/release_21_05.rst |  3 +++
 drivers/crypto/qat/qat_sym_hw_dp.c     | 21 ++++++++++++++++++---
 lib/librte_cryptodev/rte_cryptodev.c   |  5 +++--
 lib/librte_cryptodev/rte_cryptodev.h   |  8 ++++++++
 5 files changed, 33 insertions(+), 12 deletions(-)
  

Comments

Akhil Goyal April 13, 2021, 10:19 a.m. UTC | #1
Hi Fan,

> This patch changes the experimental raw data path dequeue burst API.
> Originally the API enforces the user to provide callback function
> to get maximum dequeue count. This change gives the user one more
> option to pass directly the expected dequeue count.
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>  app/test/test_cryptodev.c              |  8 +-------
>  doc/guides/rel_notes/release_21_05.rst |  3 +++
>  drivers/crypto/qat/qat_sym_hw_dp.c     | 21 ++++++++++++++++++---
>  lib/librte_cryptodev/rte_cryptodev.c   |  5 +++--
>  lib/librte_cryptodev/rte_cryptodev.h   |  8 ++++++++
>  5 files changed, 33 insertions(+), 12 deletions(-)
> 
> diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
> index f91debc168..a910547423 100644
> --- a/app/test/test_cryptodev.c
> +++ b/app/test/test_cryptodev.c
> @@ -162,12 +162,6 @@ ceil_byte_length(uint32_t num_bits)
>  		return (num_bits >> 3);
>  }
> 
> -static uint32_t
> -get_raw_dp_dequeue_count(void *user_data __rte_unused)
> -{
> -	return 1;
> -}
> -
>  static void
>  post_process_raw_dp_op(void *user_data,	uint32_t index __rte_unused,
>  		uint8_t is_op_success)
> @@ -345,7 +339,7 @@ process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id,
>  	n = n_success = 0;
>  	while (count++ < MAX_RAW_DEQUEUE_COUNT && n == 0) {
>  		n = rte_cryptodev_raw_dequeue_burst(ctx,
> -			get_raw_dp_dequeue_count, post_process_raw_dp_op,
> +			NULL, 1, post_process_raw_dp_op,
>  				(void **)&ret_op, 0, &n_success,
>  				&dequeue_status);
>  		if (dequeue_status < 0) {
> diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
> index 8e686cc627..943f1596c5 100644
> --- a/doc/guides/rel_notes/release_21_05.rst
> +++ b/doc/guides/rel_notes/release_21_05.rst
> @@ -130,6 +130,9 @@ API Changes
>     Also, make sure to start the actual text at the margin.
>     =======================================================
> 
> +* cryptodev: the function ``rte_cryptodev_raw_dequeue_burst`` is added a
> +  parameter ``max_nb_to_dequeue`` to give user a more flexible dequeue control.
> +

Shouldn't we remove the callback completely?
What is the use case for having two different methods of passing a
simple dequeue count?
Why do we need such flexibility?

Regards,
Akhil
  
Fan Zhang April 15, 2021, 10:14 a.m. UTC | #2
Hi Akhil,

It is possible that the user doesn't know how many ops to dequeue.
For example, in VPP crypto, up to 64 buffers (vnet_crypto_async_frame_elt_t) are wrapped into the following data structure:

typedef struct
{
  CLIB_CACHE_LINE_ALIGN_MARK (cacheline0);
  vnet_crypto_async_frame_state_t state;
  vnet_crypto_async_op_id_t op:8;
  u16 n_elts;
  vnet_crypto_async_frame_elt_t elts[VNET_CRYPTO_FRAME_SIZE];
  u32 buffer_indices[VNET_CRYPTO_FRAME_SIZE];
  u16 next_node_index[VNET_CRYPTO_FRAME_SIZE];
  u32 enqueue_thread_index;
} vnet_crypto_async_frame_t;

Instead of passing a vnet_crypto_async_frame_elt_t pointer as metadata to cryptodev, we have to pass a vnet_crypto_async_frame_t pointer into cryptodev.
The callback function parses the first dequeued metadata to get n_elts, and that many ops are then dequeued.

But if we cannot dequeue the whole frame, passing the number of ops not yet dequeued to the next dequeue_burst operation lets us dequeue the rest of the frame. In this case we only have to cache at most one frame pointer, for the half-dequeued frame.

As the patch states, this covers both cases: the user can either dequeue a wrapped data structure holding multiple buffers, or dequeue a burst of packets, which gives people more flexibility.

Regards,
Fan
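
The two dequeue modes described in this reply can be sketched without DPDK. The following is a minimal mock, not the cryptodev implementation: `struct frame`, `struct mock_ring`, `mock_dequeue_burst`, `frame_count` and `drain_frame` are all hypothetical names, and the mock hands out whatever has completed instead of modelling any driver's behaviour. It only shows the calling pattern: the first call uses the callback to read n_elts from the frame, and a follow-up call passes the remaining count directly with a NULL callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VPP-style frame: one user-data pointer
 * covers n_elts crypto ops. */
struct frame {
	uint16_t n_elts;
};

/* Hypothetical completion ring (no wrap-around, for brevity):
 * done[i] holds the user data of a finished op. */
struct mock_ring {
	void *done[64];
	uint32_t head;	/* next op to dequeue */
	uint32_t tail;	/* one past the last completed op */
};

typedef uint32_t (*get_count_t)(void *user_data);

/* Use the callback when one is given, otherwise max_nb_to_dequeue.
 * Returns the number of ops actually dequeued. */
static uint32_t
mock_dequeue_burst(struct mock_ring *r, get_count_t get_count,
		uint32_t max_nb_to_dequeue, void **out_user_data)
{
	uint32_t avail = r->tail - r->head;
	uint32_t n;

	assert(r->tail <= 64 && r->head <= r->tail);
	if (avail == 0)
		return 0;

	n = get_count ? get_count(r->done[r->head]) : max_nb_to_dequeue;
	if (n > avail)
		n = avail;	/* partial dequeue: the caller retries later */
	if (n == 0)
		return 0;

	*out_user_data = r->done[r->head];
	r->head += n;
	return n;
}

/* Callback mode: read the op count from the frame itself. */
static uint32_t
frame_count(void *user_data)
{
	return ((struct frame *)user_data)->n_elts;
}

/* Dequeue one frame, possibly across several calls. Only one frame
 * pointer is cached while a frame is half dequeued. Frames with
 * n_elts == 0 are not handled in this sketch. */
static uint32_t
drain_frame(struct mock_ring *r, struct frame **cached, uint32_t *remaining)
{
	void *ud = NULL;
	uint32_t n;

	if (*cached == NULL) {
		/* New frame: let the callback discover n_elts. */
		n = mock_dequeue_burst(r, frame_count, 0, &ud);
		if (n == 0)
			return 0;
		*cached = ud;
		*remaining = (*cached)->n_elts - n;
	} else {
		/* Half-dequeued frame: pass the leftover count directly. */
		n = mock_dequeue_burst(r, NULL, *remaining, &ud);
		*remaining -= n;
	}
	if (*remaining == 0)
		*cached = NULL;
	return n;
}
```

A caller would invoke `drain_frame` repeatedly and treat the frame as complete once the cached pointer is reset to NULL.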

  
Akhil Goyal April 15, 2021, 2:45 p.m. UTC | #3
> 
> Hi Akhil,
> 
> It is possible the user don't know how many ops to dequeue.
> For example in VPP crypto up to 64 buffers (vnet_crypto_async_frame_elt_t)
> are wrapped into the following data structure
> 
> typedef struct
> {
>   CLIB_CACHE_LINE_ALIGN_MARK (cacheline0);
>   vnet_crypto_async_frame_state_t state;
>   vnet_crypto_async_op_id_t op:8;
>   u16 n_elts;
>   vnet_crypto_async_frame_elt_t elts[VNET_CRYPTO_FRAME_SIZE];
>   u32 buffer_indices[VNET_CRYPTO_FRAME_SIZE];
>   u16 next_node_index[VNET_CRYPTO_FRAME_SIZE];
>   u32 enqueue_thread_index;
> } vnet_crypto_async_frame_t;
> 
> Instead of passing vnet_crypto_async_frame_elt_t Pointer as metadata to
> cryptodev, we have to pass vnet_crypto_async_frame_t pointer into
> cryptodev.
> The callback function helps parse the first dequeued metadata to get n_elts
> and will dequeue that many ops.
> 
> But in case we cannot dequeue the whole frame, passing the number of ops
> not dequeued yet in the next dequeue_burst operation should help us to
> dequeue the whole frame. In this case we only have to cache up to 1 frame
> pointer for half dequeued frame.
> 
> As the patch stated this should help cover both cases for user either dequeue
> the wrapped data structure with multiple buffers, or dequeue a burst of
> packets - hence giving people more flexibility.
> 
> Regards,
> Fan
> 
Ok.

Acked-by: Akhil Goyal <gakhil@marvell.com>
  
Akhil Goyal April 16, 2021, 10:12 a.m. UTC | #4
> Ok.
> 
> Acked-by: Akhil Goyal <gakhil@marvell.com>

Applied to dpdk-next-crypto

Thanks.
  

Patch

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index f91debc168..a910547423 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -162,12 +162,6 @@  ceil_byte_length(uint32_t num_bits)
 		return (num_bits >> 3);
 }
 
-static uint32_t
-get_raw_dp_dequeue_count(void *user_data __rte_unused)
-{
-	return 1;
-}
-
 static void
 post_process_raw_dp_op(void *user_data,	uint32_t index __rte_unused,
 		uint8_t is_op_success)
@@ -345,7 +339,7 @@  process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id,
 	n = n_success = 0;
 	while (count++ < MAX_RAW_DEQUEUE_COUNT && n == 0) {
 		n = rte_cryptodev_raw_dequeue_burst(ctx,
-			get_raw_dp_dequeue_count, post_process_raw_dp_op,
+			NULL, 1, post_process_raw_dp_op,
 				(void **)&ret_op, 0, &n_success,
 				&dequeue_status);
 		if (dequeue_status < 0) {
diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
index 8e686cc627..943f1596c5 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -130,6 +130,9 @@  API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* cryptodev: the function ``rte_cryptodev_raw_dequeue_burst`` is added a
+  parameter ``max_nb_to_dequeue`` to give user a more flexible dequeue control.
+
 
 ABI Changes
 -----------
diff --git a/drivers/crypto/qat/qat_sym_hw_dp.c b/drivers/crypto/qat/qat_sym_hw_dp.c
index 01afb883e3..2f64de44a1 100644
--- a/drivers/crypto/qat/qat_sym_hw_dp.c
+++ b/drivers/crypto/qat/qat_sym_hw_dp.c
@@ -707,6 +707,7 @@  qat_sym_dp_enqueue_chain_jobs(void *qp_data, uint8_t *drv_ctx,
 static __rte_always_inline uint32_t
 qat_sym_dp_dequeue_burst(void *qp_data, uint8_t *drv_ctx,
 	rte_cryptodev_raw_get_dequeue_count_t get_dequeue_count,
+	uint32_t max_nb_to_dequeue,
 	rte_cryptodev_raw_post_dequeue_t post_dequeue,
 	void **out_user_data, uint8_t is_user_data_array,
 	uint32_t *n_success_jobs, int *return_status)
@@ -736,9 +737,23 @@  qat_sym_dp_dequeue_burst(void *qp_data, uint8_t *drv_ctx,
 
 	resp_opaque = (void *)(uintptr_t)resp->opaque_data;
 	/* get the dequeue count */
-	n = get_dequeue_count(resp_opaque);
-	if (unlikely(n == 0))
-		return 0;
+	if (get_dequeue_count) {
+		n = get_dequeue_count(resp_opaque);
+		if (unlikely(n == 0))
+			return 0;
+		else if (n > 1) {
+			head = (head + rx_queue->msg_size * (n - 1)) &
+				rx_queue->modulo_mask;
+			resp = (struct icp_qat_fw_comn_resp *)(
+				(uint8_t *)rx_queue->base_addr + head);
+			if (*(uint32_t *)resp == ADF_RING_EMPTY_SIG)
+				return 0;
+		}
+	} else {
+		if (unlikely(max_nb_to_dequeue == 0))
+			return 0;
+		n = max_nb_to_dequeue;
+	}
 
 	out_user_data[0] = resp_opaque;
 	status = QAT_SYM_DP_IS_RESP_SUCCESS(resp);
diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index 40f55a3cd0..0c16b04f80 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -2232,13 +2232,14 @@  rte_cryptodev_raw_enqueue_done(struct rte_crypto_raw_dp_ctx *ctx,
 uint32_t
 rte_cryptodev_raw_dequeue_burst(struct rte_crypto_raw_dp_ctx *ctx,
 	rte_cryptodev_raw_get_dequeue_count_t get_dequeue_count,
+	uint32_t max_nb_to_dequeue,
 	rte_cryptodev_raw_post_dequeue_t post_dequeue,
 	void **out_user_data, uint8_t is_user_data_array,
 	uint32_t *n_success_jobs, int *status)
 {
 	return (*ctx->dequeue_burst)(ctx->qp_data, ctx->drv_ctx_data,
-		get_dequeue_count, post_dequeue, out_user_data,
-		is_user_data_array, n_success_jobs, status);
+		get_dequeue_count, max_nb_to_dequeue, post_dequeue,
+		out_user_data, is_user_data_array, n_success_jobs, status);
 }
 
 int
diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
index ae34f33f69..b2a1255112 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -1546,6 +1546,9 @@  typedef void (*rte_cryptodev_raw_post_dequeue_t)(void *user_data,
  * @param	drv_ctx			Driver specific context data.
  * @param	get_dequeue_count	User provided callback function to
  *					obtain dequeue operation count.
+ * @param	max_nb_to_dequeue	When get_dequeue_count is NULL this
+ *					value is used to pass the maximum
+ *					number of operations to be dequeued.
  * @param	post_dequeue		User provided callback function to
  *					post-process a dequeued operation.
  * @param	out_user_data		User data pointer array to be retrieve
@@ -1580,6 +1583,7 @@  typedef void (*rte_cryptodev_raw_post_dequeue_t)(void *user_data,
 typedef uint32_t (*cryptodev_sym_raw_dequeue_burst_t)(void *qp,
 	uint8_t *drv_ctx,
 	rte_cryptodev_raw_get_dequeue_count_t get_dequeue_count,
+	uint32_t max_nb_to_dequeue,
 	rte_cryptodev_raw_post_dequeue_t post_dequeue,
 	void **out_user_data, uint8_t is_user_data_array,
 	uint32_t *n_success, int *dequeue_status);
@@ -1747,6 +1751,9 @@  rte_cryptodev_raw_enqueue_done(struct rte_crypto_raw_dp_ctx *ctx,
  *					data.
  * @param	get_dequeue_count	User provided callback function to
  *					obtain dequeue operation count.
+ * @param	max_nb_to_dequeue	When get_dequeue_count is NULL this
+ *					value is used to pass the maximum
+ *					number of operations to be dequeued.
  * @param	post_dequeue		User provided callback function to
  *					post-process a dequeued operation.
  * @param	out_user_data		User data pointer array to be retrieve
@@ -1782,6 +1789,7 @@  __rte_experimental
 uint32_t
 rte_cryptodev_raw_dequeue_burst(struct rte_crypto_raw_dp_ctx *ctx,
 	rte_cryptodev_raw_get_dequeue_count_t get_dequeue_count,
+	uint32_t max_nb_to_dequeue,
 	rte_cryptodev_raw_post_dequeue_t post_dequeue,
 	void **out_user_data, uint8_t is_user_data_array,
 	uint32_t *n_success, int *dequeue_status);
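
In callback mode, the QAT hunk above advances the ring offset by (n - 1) messages, wraps it with the modulo mask, and returns 0 if that slot still holds the empty signature. The offset arithmetic can be tried in isolation with the sketch below. `struct mock_queue`, `nth_response_ready` and the `EMPTY_SIG` value are assumptions made for illustration; only the `(head + msg_size * (n - 1)) & modulo_mask` expression is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed placeholder standing in for ADF_RING_EMPTY_SIG. */
#define EMPTY_SIG 0x7F7F7F7Fu

/* Hypothetical ring: power-of-two size in bytes, fixed-size messages. */
struct mock_queue {
	uint8_t *base_addr;
	uint32_t msg_size;
	uint32_t modulo_mask;	/* ring size in bytes minus one */
};

/* Return 1 if the n-th response, counted from byte offset head,
 * has been written, and 0 if it is still empty or n is 0. */
static int
nth_response_ready(const struct mock_queue *q, uint32_t head, uint32_t n)
{
	uint32_t off, sig;

	if (n == 0)
		return 0;
	/* Same offset computation as in the patch, including wrap-around. */
	off = (head + q->msg_size * (n - 1)) & q->modulo_mask;
	memcpy(&sig, q->base_addr + off, sizeof(sig));
	return sig != EMPTY_SIG;
}
```

Checking only the last of the n slots is enough in this sketch because it assumes responses are written in ring order.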