[v24,3/6] dmadev: add data plane API support

Message ID 20211009093340.43237-4-fengchengwen@huawei.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series: support dmadev

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Chengwen Feng Oct. 9, 2021, 9:33 a.m. UTC
  This patch adds the data plane API for dmadev.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
---
 doc/guides/prog_guide/dmadev.rst       |  22 ++
 doc/guides/rel_notes/release_21_11.rst |   2 +-
 lib/dmadev/meson.build                 |   1 +
 lib/dmadev/rte_dmadev.c                | 134 ++++++++
 lib/dmadev/rte_dmadev.h                | 451 +++++++++++++++++++++++++
 lib/dmadev/rte_dmadev_core.h           |  78 +++++
 lib/dmadev/rte_dmadev_pmd.h            |   7 +
 lib/dmadev/version.map                 |   6 +
 8 files changed, 700 insertions(+), 1 deletion(-)
 create mode 100644 lib/dmadev/rte_dmadev_core.h
  

Comments

Chengwen Feng Oct. 9, 2021, 10:03 a.m. UTC | #1
This patch follows Konstantin's idea and introduces rte_dma_fp_object to hide
implementation details.
This change modifies the first parameter of the drivers' data plane interface
from 'struct rte_dma_dev *dev' to 'void *dev_private'.
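
For illustration, a hypothetical driver callback before and after this change
(all my_pmd_* names are invented; the new signature matches the rte_dma_copy_t
typedef in rte_dmadev_core.h below):

/* Up to v23 (sketch): the PMD reached through the device structure. */
static int
my_pmd_copy_v23(struct rte_dma_dev *dev, uint16_t vchan, rte_iova_t src,
		rte_iova_t dst, uint32_t length, uint64_t flags)
{
	struct my_pmd_priv *priv = dev->dev_private;

	return my_pmd_enqueue_copy(priv, vchan, src, dst, length, flags);
}

/* v24: the fast-path object hands the PMD its private data directly. */
static int
my_pmd_copy(void *dev_private, uint16_t vchan, rte_iova_t src,
	    rte_iova_t dst, uint32_t length, uint64_t flags)
{
	struct my_pmd_priv *priv = dev_private;

	return my_pmd_enqueue_copy(priv, vchan, src, dst, length, flags);
}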

On 2021/10/9 17:33, Chengwen Feng wrote:
> This patch adds the data plane API for dmadev.
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
> Reviewed-by: Conor Walsh <conor.walsh@intel.com>
> ---
>  doc/guides/prog_guide/dmadev.rst       |  22 ++
>  doc/guides/rel_notes/release_21_11.rst |   2 +-
>  lib/dmadev/meson.build                 |   1 +
>  lib/dmadev/rte_dmadev.c                | 134 ++++++++
>  lib/dmadev/rte_dmadev.h                | 451 +++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h           |  78 +++++
>  lib/dmadev/rte_dmadev_pmd.h            |   7 +
>  lib/dmadev/version.map                 |   6 +
>  8 files changed, 700 insertions(+), 1 deletion(-)
>  create mode 100644 lib/dmadev/rte_dmadev_core.h
> 

[snip]
  
Bruce Richardson Oct. 11, 2021, 10:40 a.m. UTC | #2
On Sat, Oct 09, 2021 at 05:33:37PM +0800, Chengwen Feng wrote:
> This patch adds the data plane API for dmadev.
>

A few initial comments inline. I'll work on rebasing my follow-up patchset
to this, and let you know if I have any more feedback based on that.

/Bruce
 
> diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
> index a6a5680d2b..891ceeb988 100644
> --- a/lib/dmadev/rte_dmadev.c
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -17,6 +17,7 @@
>  
>  static int16_t dma_devices_max;
>  
> +struct rte_dma_fp_object *rte_dma_fp_objs;

While I think I like this approach of making more of the dmadev hidden, I
think we need a better name for this. While there is the dev_private
pointer in it, the struct is pretty much the datapath functions, so how
about "rte_dma_funcs" as a name?

>  struct rte_dma_dev *rte_dma_devices;
>  

<snip>

> +/**
> + * @internal
> + * Fast-path dmadev functions and related data are held in a flat array.
> + * One entry per dmadev.
> + *
> + * On 64-bit systems contents of this structure occupy exactly two 64B lines.
> + * On 32-bit systems contents of this structure fit into one 64B line.
> + *
> + * The 'dev_private' field was placed in the first cache line to optimize
> + * performance because the PMD driver mainly depends on this field.
> + */
> +struct rte_dma_fp_object {
> +	void *dev_private; /**< PMD-specific private data. */
> +	rte_dma_copy_t             copy;
> +	rte_dma_copy_sg_t          copy_sg;
> +	rte_dma_fill_t             fill;
> +	rte_dma_submit_t           submit;
> +	rte_dma_completed_t        completed;
> +	rte_dma_completed_status_t completed_status;
> +	void *reserved_cl0;
> +	/** Reserve space for future IO functions, while keeping data and
> +	 * dev_ops pointers on the second cacheline.
> +	 */
This comment is out of date.

> +	void *reserved_cl1[6];
> +} __rte_cache_aligned;

Small suggestion: since there is no data at the end of the structure,
rather than adding in padding arrays which need to be adjusted as we add
fields into the struct, let's just change the "__rte_cache_aligned" macro
to "__rte_aligned(128)". This will explicitly set the size to 128-bytes and
allow us to remove the reserved fields - making it easier to add new
pointers.
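
For illustration, the struct under that suggestion would become (a sketch of
the proposal, not the merged code):

struct rte_dma_fp_object {
	void *dev_private; /**< PMD-specific private data. */
	rte_dma_copy_t             copy;
	rte_dma_copy_sg_t          copy_sg;
	rte_dma_fill_t             fill;
	rte_dma_submit_t           submit;
	rte_dma_completed_t        completed;
	rte_dma_completed_status_t completed_status;
	/* No reserved fields needed: the explicit alignment fixes the size
	 * at 128 bytes, so future pointers can be added without adjusting
	 * any padding arrays.
	 */
} __rte_aligned(128);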

> +
> +extern struct rte_dma_fp_object *rte_dma_fp_objs;
> +
> +#endif /* RTE_DMADEV_CORE_H */
  
Chengwen Feng Oct. 11, 2021, 12:31 p.m. UTC | #3
On 2021/10/11 18:40, Bruce Richardson wrote:
> On Sat, Oct 09, 2021 at 05:33:37PM +0800, Chengwen Feng wrote:
>> This patch adds the data plane API for dmadev.
>>
> 
> A few initial comments inline. I'll work on rebasing my follow-up patchset
> to this, and let you know if I have any more feedback based on that.
> 
> /Bruce
>  
>> diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
>> index a6a5680d2b..891ceeb988 100644
>> --- a/lib/dmadev/rte_dmadev.c
>> +++ b/lib/dmadev/rte_dmadev.c
>> @@ -17,6 +17,7 @@
>>  
>>  static int16_t dma_devices_max;
>>  
>> +struct rte_dma_fp_object *rte_dma_fp_objs;
> 
> While I think I like this approach of making more of the dmadev hidden, I
> think we need a better name for this. While there is the dev_private
> pointer in it, the struct is pretty much the datapath functions, so how
> about "rte_dma_funcs" as a name?

I notice ethdev and eventdev both use rte_xxx_fp_ops, but this structure
has other fields (e.g. data pointers) in addition to ops, so the 'ops' suffix
would be inappropriate. I used 'object', which is widely used in
object-oriented programming.

It would be better to use uniform naming across ethdev/eventdev/dmadev and so
on; I would be happy to hear more opinions.

> 
>>  struct rte_dma_dev *rte_dma_devices;
>>  
> 
> <snip>
> 
>> +/**
>> + * @internal
>> + * Fast-path dmadev functions and related data are held in a flat array.
>> + * One entry per dmadev.
>> + *
>> + * On 64-bit systems contents of this structure occupy exactly two 64B lines.
>> + * On 32-bit systems contents of this structure fit into one 64B line.
>> + *
>> + * The 'dev_private' field was placed in the first cache line to optimize
>> + * performance because the PMD driver mainly depends on this field.
>> + */
>> +struct rte_dma_fp_object {
>> +	void *dev_private; /**< PMD-specific private data. */
>> +	rte_dma_copy_t             copy;
>> +	rte_dma_copy_sg_t          copy_sg;
>> +	rte_dma_fill_t             fill;
>> +	rte_dma_submit_t           submit;
>> +	rte_dma_completed_t        completed;
>> +	rte_dma_completed_status_t completed_status;
>> +	void *reserved_cl0;
>> +	/** Reserve space for future IO functions, while keeping data and
>> +	 * dev_ops pointers on the second cacheline.
>> +	 */
> This comment is out of date.
> 
>> +	void *reserved_cl1[6];
>> +} __rte_cache_aligned;
> 
> Small suggestion: since there is no data at the end of the structure,
> rather than adding in padding arrays which need to be adjusted as we add
> fields into the struct, let's just change the "__rte_cache_aligned" macro
> to "__rte_aligned(128)". This will explicitly set the size to 128-bytes and
> allow us to remove the reserved fields - making it easier to add new
> pointers.

Agree

> 
>> +
>> +extern struct rte_dma_fp_object *rte_dma_fp_objs;
>> +
>> +#endif /* RTE_DMADEV_CORE_H */
> 
> .
> 

Thanks
  

Patch

diff --git a/doc/guides/prog_guide/dmadev.rst b/doc/guides/prog_guide/dmadev.rst
index 5c70ad3d6a..2e2a4bb62a 100644
--- a/doc/guides/prog_guide/dmadev.rst
+++ b/doc/guides/prog_guide/dmadev.rst
@@ -96,3 +96,25 @@  can be used to get the device info and supported features.
 
 Silent mode is a special device capability which does not require the
 application to invoke dequeue APIs.
+
+
+Enqueue / Dequeue APIs
+~~~~~~~~~~~~~~~~~~~~~~
+
+Enqueue APIs such as ``rte_dma_copy`` and ``rte_dma_fill`` can be used to
+enqueue operations to hardware. If an enqueue is successful, a ``ring_idx`` is
+returned. This ``ring_idx`` can be used by applications to track per-operation
+metadata in an application-defined circular ring.
+
+The ``rte_dma_submit`` API is used to issue the doorbell to hardware.
+Alternatively the ``RTE_DMA_OP_FLAG_SUBMIT`` flag can be passed to the enqueue
+APIs to also issue the doorbell to hardware.
+
+There are two dequeue APIs, ``rte_dma_completed`` and
+``rte_dma_completed_status``, which are used to obtain the results of the
+enqueue requests. ``rte_dma_completed`` will return the number of successfully
+completed operations. ``rte_dma_completed_status`` will return the number of
+completed operations along with the status of each operation (filled into the
+``status`` array passed by the user). These two APIs can also return the last
+completed operation's ``ring_idx``, which can help users track operations
+within their own application-defined rings.
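
A usage sketch of the flow above, assuming a configured and started device,
where dev_id, vchan and the IOVA arrays come from the application:

#include <stdbool.h>
#include <rte_dmadev.h>

static uint16_t
copy_batch(int16_t dev_id, uint16_t vchan, const rte_iova_t *src,
	   const rte_iova_t *dst, const uint32_t *len, uint16_t nb_jobs)
{
	uint16_t i, done = 0, last_idx;
	bool has_error = false;

	/* Enqueue; each successful call returns that job's ring_idx. */
	for (i = 0; i < nb_jobs; i++)
		if (rte_dma_copy(dev_id, vchan, src[i], dst[i], len[i], 0) < 0)
			break; /* e.g. -ENOSPC when the ring is full */

	/* One doorbell covers the whole batch. */
	rte_dma_submit(dev_id, vchan);

	/* Poll until everything enqueued above completes or an error shows. */
	while (done < i && !has_error)
		done += rte_dma_completed(dev_id, vchan, i - done,
					  &last_idx, &has_error);
	return done;
}
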
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index f935a3f395..d1d7abf694 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -144,7 +144,7 @@  New Features
 * **Introduced dmadev library with:**
 
   * Device allocation functions.
-  * Control plane API.
+  * Control and data plane API.
 
 
 Removed Items
diff --git a/lib/dmadev/meson.build b/lib/dmadev/meson.build
index f8d54c6e74..d2fc85e8c7 100644
--- a/lib/dmadev/meson.build
+++ b/lib/dmadev/meson.build
@@ -3,4 +3,5 @@ 
 
 sources = files('rte_dmadev.c')
 headers = files('rte_dmadev.h')
+indirect_headers += files('rte_dmadev_core.h')
 driver_sdk_headers += files('rte_dmadev_pmd.h')
diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
index a6a5680d2b..891ceeb988 100644
--- a/lib/dmadev/rte_dmadev.c
+++ b/lib/dmadev/rte_dmadev.c
@@ -17,6 +17,7 @@ 
 
 static int16_t dma_devices_max;
 
+struct rte_dma_fp_object *rte_dma_fp_objs;
 struct rte_dma_dev *rte_dma_devices;
 
 RTE_LOG_REGISTER_DEFAULT(rte_dma_logtype, INFO);
@@ -97,6 +98,38 @@  dma_find_by_name(const char *name)
 	return NULL;
 }
 
+static void dma_fp_object_reset(int16_t dev_id);
+
+static int
+dma_fp_data_prepare(void)
+{
+	size_t size;
+	void *ptr;
+	int i;
+
+	if (rte_dma_fp_objs != NULL)
+		return 0;
+
+	/* Fast-path objects must be cache-line aligned, but the return value
+	 * of malloc may not be aligned to the cache line. Therefore, extra
+	 * memory is allocated so the pointer can be realigned.
+	 * Note: we do not call posix_memalign/aligned_alloc because their
+	 * availability is libc-version dependent.
+	 */
+	size = dma_devices_max * sizeof(struct rte_dma_fp_object) +
+		RTE_CACHE_LINE_SIZE;
+	ptr = malloc(size);
+	if (ptr == NULL)
+		return -ENOMEM;
+	memset(ptr, 0, size);
+
+	rte_dma_fp_objs = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
+	for (i = 0; i < dma_devices_max; i++)
+		dma_fp_object_reset(i);
+
+	return 0;
+}
+
 static int
 dma_dev_data_prepare(void)
 {
@@ -117,8 +150,15 @@  dma_dev_data_prepare(void)
 static int
 dma_data_prepare(void)
 {
+	int ret;
+
 	if (dma_devices_max == 0)
 		dma_devices_max = RTE_DMADEV_DEFAULT_MAX;
+
+	ret = dma_fp_data_prepare();
+	if (ret)
+		return ret;
+
 	return dma_dev_data_prepare();
 }
 
@@ -317,6 +357,8 @@  rte_dma_configure(int16_t dev_id, const struct rte_dma_conf *dev_conf)
 	return ret;
 }
 
+static void dma_fp_object_setup(int16_t dev_id, const struct rte_dma_dev *dev);
+
 int
 rte_dma_start(int16_t dev_id)
 {
@@ -344,6 +386,7 @@  rte_dma_start(int16_t dev_id)
 		return ret;
 
 mark_started:
+	dma_fp_object_setup(dev_id, dev);
 	dev->dev_started = 1;
 	return 0;
 }
@@ -370,6 +413,7 @@  rte_dma_stop(int16_t dev_id)
 		return ret;
 
 mark_stopped:
+	dma_fp_object_reset(dev_id);
 	dev->dev_started = 0;
 	return 0;
 }
@@ -604,3 +648,93 @@  rte_dma_dump(int16_t dev_id, FILE *f)
 
 	return 0;
 }
+
+static int
+dummy_copy(__rte_unused void *dev_private, __rte_unused uint16_t vchan,
+	   __rte_unused rte_iova_t src, __rte_unused rte_iova_t dst,
+	   __rte_unused uint32_t length, __rte_unused uint64_t flags)
+{
+	RTE_DMA_LOG(ERR, "copy is not configured or not supported.");
+	return -EINVAL;
+}
+
+static int
+dummy_copy_sg(__rte_unused void *dev_private, __rte_unused uint16_t vchan,
+	      __rte_unused const struct rte_dma_sge *src,
+	      __rte_unused const struct rte_dma_sge *dst,
+	      __rte_unused uint16_t nb_src, __rte_unused uint16_t nb_dst,
+	      __rte_unused uint64_t flags)
+{
+	RTE_DMA_LOG(ERR, "copy_sg is not configured or not supported.");
+	return -EINVAL;
+}
+
+static int
+dummy_fill(__rte_unused void *dev_private, __rte_unused uint16_t vchan,
+	   __rte_unused uint64_t pattern, __rte_unused rte_iova_t dst,
+	   __rte_unused uint32_t length, __rte_unused uint64_t flags)
+{
+	RTE_DMA_LOG(ERR, "fill is not configured or not supported.");
+	return -EINVAL;
+}
+
+static int
+dummy_submit(__rte_unused void *dev_private, __rte_unused uint16_t vchan)
+{
+	RTE_DMA_LOG(ERR, "submit is not configured or not supported.");
+	return -EINVAL;
+}
+
+static uint16_t
+dummy_completed(__rte_unused void *dev_private, __rte_unused uint16_t vchan,
+		__rte_unused const uint16_t nb_cpls,
+		__rte_unused uint16_t *last_idx, __rte_unused bool *has_error)
+{
+	RTE_DMA_LOG(ERR, "completed is not configured or not supported.");
+	return 0;
+}
+
+static uint16_t
+dummy_completed_status(__rte_unused void *dev_private,
+		       __rte_unused uint16_t vchan,
+		       __rte_unused const uint16_t nb_cpls,
+		       __rte_unused uint16_t *last_idx,
+		       __rte_unused enum rte_dma_status_code *status)
+{
+	RTE_DMA_LOG(ERR,
+		    "completed_status is not configured or not supported.");
+	return 0;
+}
+
+static void
+dma_fp_object_reset(int16_t dev_id)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+
+	obj->copy             = dummy_copy;
+	obj->copy_sg          = dummy_copy_sg;
+	obj->fill             = dummy_fill;
+	obj->submit           = dummy_submit;
+	obj->completed        = dummy_completed;
+	obj->completed_status = dummy_completed_status;
+}
+
+static void
+dma_fp_object_setup(int16_t dev_id, const struct rte_dma_dev *dev)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+
+	obj->dev_private = dev->dev_private;
+	if (dev->dev_ops->copy)
+		obj->copy = dev->dev_ops->copy;
+	if (dev->dev_ops->copy_sg)
+		obj->copy_sg = dev->dev_ops->copy_sg;
+	if (dev->dev_ops->fill)
+		obj->fill = dev->dev_ops->fill;
+	if (dev->dev_ops->submit)
+		obj->submit = dev->dev_ops->submit;
+	if (dev->dev_ops->completed)
+		obj->completed = dev->dev_ops->completed;
+	if (dev->dev_ops->completed_status)
+		obj->completed_status = dev->dev_ops->completed_status;
+}
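
The manual-alignment idiom in dma_fp_data_prepare() above generalizes to the
following sketch (the helper name is invented; like the dmadev array, the
memory is deliberately never freed):

#include <stdlib.h>
#include <string.h>
#include <rte_common.h> /* RTE_PTR_ALIGN, RTE_CACHE_LINE_SIZE */

/* Allocate zeroed, cache-line-aligned memory without relying on
 * posix_memalign/aligned_alloc, whose availability depends on the
 * libc version.
 */
static void *
zmalloc_cache_aligned(size_t size)
{
	void *raw = malloc(size + RTE_CACHE_LINE_SIZE);

	if (raw == NULL)
		return NULL;
	memset(raw, 0, size + RTE_CACHE_LINE_SIZE);
	return RTE_PTR_ALIGN(raw, RTE_CACHE_LINE_SIZE);
}
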
diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
index 34a4c26851..95b6a0a810 100644
--- a/lib/dmadev/rte_dmadev.h
+++ b/lib/dmadev/rte_dmadev.h
@@ -65,6 +65,77 @@ 
  * Finally, an application can close a dmadev by invoking the rte_dma_close()
  * function.
  *
+ * The dataplane APIs include two parts:
+ * The first part is the submission of operation requests:
+ *     - rte_dma_copy()
+ *     - rte_dma_copy_sg()
+ *     - rte_dma_fill()
+ *     - rte_dma_submit()
+ *
+ * These APIs could work with different virtual DMA channels which have
+ * different contexts.
+ *
+ * The first three APIs are used to submit an operation request to a virtual
+ * DMA channel. If the submission is successful, a non-negative
+ * ring_idx <= UINT16_MAX is returned; otherwise a negative number is returned.
+ *
+ * The last API is used to issue the doorbell to hardware. Alternatively, the
+ * flags parameter of the first three APIs (@see RTE_DMA_OP_FLAG_SUBMIT) can be
+ * used to do the same work.
+ * @note When enqueuing a set of jobs to the device, having a separate submit
+ * outside a loop makes for clearer code than having a check for the last
+ * iteration inside the loop to set a special submit flag.  However, for cases
+ * where one item alone is to be submitted or there is a small set of jobs to
+ * be submitted sequentially, having a submit flag provides a lower-overhead
+ * way of doing the submission while still keeping the code clean.
+ *
+ * The second part is to obtain the results of the requests:
+ *     - rte_dma_completed()
+ *         - return the number of operation requests completed successfully.
+ *     - rte_dma_completed_status()
+ *         - return the number of operation requests completed.
+ *
+ * @note If the dmadev works in silent mode (@see RTE_DMA_CAPA_SILENT), the
+ * application does not invoke the above two completed APIs.
+ *
+ * About the ring_idx returned by the enqueue APIs (e.g. rte_dma_copy(),
+ * rte_dma_fill()), the rules are as follows:
+ *     - The ring_idx of each virtual DMA channel is independent.
+ *     - For a given virtual DMA channel, the ring_idx is monotonically
+ *       incremented; when it reaches UINT16_MAX, it wraps back to zero.
+ *     - This ring_idx can be used by applications to track per-operation
+ *       metadata in an application-defined circular ring.
+ *     - The initial ring_idx of a virtual DMA channel is zero; after the
+ *       device is stopped, the ring_idx is reset to zero.
+ *
+ * One example:
+ *     - step-1: start one dmadev
+ *     - step-2: enqueue a copy operation, the returned ring_idx is 0
+ *     - step-3: enqueue a copy operation again, the returned ring_idx is 1
+ *     - ...
+ *     - step-101: stop the dmadev
+ *     - step-102: start the dmadev
+ *     - step-103: enqueue a copy operation, the returned ring_idx is 0
+ *     - ...
+ *     - step-x+0: enqueue a fill operation, the returned ring_idx is 65535
+ *     - step-x+1: enqueue a copy operation, the returned ring_idx is 0
+ *     - ...
+ *
+ * The DMA operation address used in enqueue APIs (i.e. rte_dma_copy(),
+ * rte_dma_copy_sg(), rte_dma_fill()) is defined as rte_iova_t type.
+ *
+ * The dmadev supports two types of address: memory address and device address.
+ *
+ * - memory address: the source and destination address of the memory-to-memory
+ * transfer type, or the source address of the memory-to-device transfer type,
+ * or the destination address of the device-to-memory transfer type.
+ * @note If the device supports SVA (@see RTE_DMA_CAPA_SVA), the memory address
+ * can be any VA address, otherwise it must be an IOVA address.
+ *
+ * - device address: the source and destination address of the device-to-device
+ * transfer type, or the source address of the device-to-memory transfer type,
+ * or the destination address of the memory-to-device transfer type.
+ *
  * About MT-safe, all the functions of the dmadev API implemented by a PMD are
  * lock-free functions which assume to not be invoked in parallel on different
  * logical cores to work on the same target dmadev object.
@@ -590,6 +661,386 @@  int rte_dma_stats_reset(int16_t dev_id, uint16_t vchan);
 __rte_experimental
 int rte_dma_dump(int16_t dev_id, FILE *f);
 
+/**
+ * DMA transfer result status code defines.
+ *
+ * @see rte_dma_completed_status
+ */
+enum rte_dma_status_code {
+	/** The operation completed successfully. */
+	RTE_DMA_STATUS_SUCCESSFUL,
+	/** The operation failed to complete due to an abort by the user.
+	 * This is mainly used when processing dev_stop: the user could modify
+	 * the descriptors (e.g. change one bit to tell hardware to abort this
+	 * job), which allows outstanding requests to complete as much as
+	 * possible and so reduces the time needed to stop the device.
+	 */
+	RTE_DMA_STATUS_USER_ABORT,
+	/** The operation failed to complete due to the following scenario:
+	 * the jobs in a particular batch are not attempted because they
+	 * appeared after a fence where a previous job failed. In some HW
+	 * implementations it's possible for jobs from later batches to be
+	 * completed, though, so report the status of the not-attempted jobs
+	 * before reporting those newer completed jobs.
+	 */
+	RTE_DMA_STATUS_NOT_ATTEMPTED,
+	/** The operation failed to complete due to an invalid source address. */
+	RTE_DMA_STATUS_INVALID_SRC_ADDR,
+	/** The operation failed to complete due to an invalid destination address. */
+	RTE_DMA_STATUS_INVALID_DST_ADDR,
+	/** The operation failed to complete due to an invalid source or
+	 * destination address; covers the case where an address error is
+	 * known, but which address is in error is not known.
+	 */
+	RTE_DMA_STATUS_INVALID_ADDR,
+	/** The operation failed to complete due to an invalid length. */
+	RTE_DMA_STATUS_INVALID_LENGTH,
+	/** The operation failed to complete due to an invalid opcode.
+	 * A DMA descriptor could have multiple formats, which are
+	 * distinguished by the opcode field.
+	 */
+	RTE_DMA_STATUS_INVALID_OPCODE,
+	/** The operation failed to complete due to a bus read error. */
+	RTE_DMA_STATUS_BUS_READ_ERROR,
+	/** The operation failed to complete due to a bus write error. */
+	RTE_DMA_STATUS_BUS_WRITE_ERROR,
+	/** The operation failed to complete due to a bus error; covers the
+	 * case where the direction of the bus error is not known.
+	 */
+	RTE_DMA_STATUS_BUS_ERROR,
+	/** The operation failed to complete due to data poisoning. */
+	RTE_DMA_STATUS_DATA_POISION,
+	/** The operation failed to complete due to a descriptor read error. */
+	RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR,
+	/** The operation failed to complete due to a device link error.
+	 * Indicates a link error in the memory-to-device/device-to-memory/
+	 * device-to-device transfer scenarios.
+	 */
+	RTE_DMA_STATUS_DEV_LINK_ERROR,
+	/** The operation failed to complete due to a lookup page fault. */
+	RTE_DMA_STATUS_PAGE_FAULT,
+	/** The operation failed to complete due to an unknown reason.
+	 * The value is 0x100 (256), which reserves space for future errors.
+	 */
+	RTE_DMA_STATUS_ERROR_UNKNOWN = 0x100,
+};
+
+/**
+ * A structure used to hold scatter-gather DMA operation request entry.
+ *
+ * @see rte_dma_copy_sg
+ */
+struct rte_dma_sge {
+	rte_iova_t addr; /**< The DMA operation address. */
+	uint32_t length; /**< The DMA operation length. */
+};
+
+#include "rte_dmadev_core.h"
+
+/**@{@name DMA operation flag
+ * @see rte_dma_copy()
+ * @see rte_dma_copy_sg()
+ * @see rte_dma_fill()
+ */
+#define RTE_DMA_OP_FLAG_FENCE	RTE_BIT64(0)
+/**< Fence flag.
+ * It means the operation with this flag must be processed only after all
+ * previous operations are completed.
+ * If the specified DMA HW works in-order (i.e. it has a default fence between
+ * operations), this flag could be a NOP.
+ */
+#define RTE_DMA_OP_FLAG_SUBMIT	RTE_BIT64(1)
+/**< Submit flag.
+ * It means the operation with this flag must issue the doorbell to hardware
+ * after the job is enqueued.
+ */
+#define RTE_DMA_OP_FLAG_LLC	RTE_BIT64(2)
+/**< Hint to write data to the low-level cache.
+ * Used for performance optimization; this is just a hint, and there is no
+ * capability bit for it, so the driver should not return an error if this
+ * flag is set.
+ */
+/**@}*/
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue a copy operation onto the virtual DMA channel.
+ *
+ * This queues up a copy operation to be performed by hardware. If the 'flags'
+ * parameter contains RTE_DMA_OP_FLAG_SUBMIT, the doorbell is triggered to
+ * begin this operation; otherwise the doorbell is not triggered.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param src
+ *   The address of the source buffer.
+ * @param dst
+ *   The address of the destination buffer.
+ * @param length
+ *   The length of the data to be copied.
+ * @param flags
+ *   Flags for this operation.
+ *   @see RTE_DMA_OP_FLAG_*
+ *
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued job.
+ *   - -ENOSPC: if no space left to enqueue.
+ *   - other values < 0 on failure.
+ */
+__rte_experimental
+static inline int
+rte_dma_copy(int16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
+	     uint32_t length, uint64_t flags)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dma_is_valid(dev_id) || length == 0)
+		return -EINVAL;
+	RTE_FUNC_PTR_OR_ERR_RET(*obj->copy, -ENOTSUP);
+#endif
+
+	return (*obj->copy)(obj->dev_private, vchan, src, dst, length, flags);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue a scatter-gather list copy operation onto the virtual DMA channel.
+ *
+ * This queues up a scatter-gather list copy operation to be performed by
+ * hardware. If the 'flags' parameter contains RTE_DMA_OP_FLAG_SUBMIT, the
+ * doorbell is triggered to begin this operation; otherwise it is not.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param src
+ *   Pointer to the array of source scatter-gather entries.
+ * @param dst
+ *   Pointer to the array of destination scatter-gather entries.
+ * @param nb_src
+ *   The number of source scatter-gather entries.
+ *   @see struct rte_dma_info::max_sges
+ * @param nb_dst
+ *   The number of destination scatter-gather entries.
+ *   @see struct rte_dma_info::max_sges
+ * @param flags
+ *   Flags for this operation.
+ *   @see RTE_DMA_OP_FLAG_*
+ *
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued job.
+ *   - -ENOSPC: if no space left to enqueue.
+ *   - other values < 0 on failure.
+ */
+__rte_experimental
+static inline int
+rte_dma_copy_sg(int16_t dev_id, uint16_t vchan, struct rte_dma_sge *src,
+		struct rte_dma_sge *dst, uint16_t nb_src, uint16_t nb_dst,
+		uint64_t flags)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dma_is_valid(dev_id) || src == NULL || dst == NULL ||
+	    nb_src == 0 || nb_dst == 0)
+		return -EINVAL;
+	RTE_FUNC_PTR_OR_ERR_RET(*obj->copy_sg, -ENOTSUP);
+#endif
+
+	return (*obj->copy_sg)(obj->dev_private, vchan, src, dst, nb_src,
+			       nb_dst, flags);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue a fill operation onto the virtual DMA channel.
+ *
+ * This queues up a fill operation to be performed by hardware. If the 'flags'
+ * parameter contains RTE_DMA_OP_FLAG_SUBMIT, the doorbell is triggered to
+ * begin this operation; otherwise the doorbell is not triggered.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param pattern
+ *   The pattern to populate the destination buffer with.
+ * @param dst
+ *   The address of the destination buffer.
+ * @param length
+ *   The length of the destination buffer.
+ * @param flags
+ *   Flags for this operation.
+ *   @see RTE_DMA_OP_FLAG_*
+ *
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued job.
+ *   - -ENOSPC: if no space left to enqueue.
+ *   - other values < 0 on failure.
+ */
+__rte_experimental
+static inline int
+rte_dma_fill(int16_t dev_id, uint16_t vchan, uint64_t pattern,
+	     rte_iova_t dst, uint32_t length, uint64_t flags)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dma_is_valid(dev_id) || length == 0)
+		return -EINVAL;
+	RTE_FUNC_PTR_OR_ERR_RET(*obj->fill, -ENOTSUP);
+#endif
+
+	return (*obj->fill)(obj->dev_private, vchan, pattern, dst, length,
+			    flags);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Trigger hardware to begin performing enqueued operations.
+ *
+ * This API is used to write the "doorbell" to the hardware to trigger it
+ * to begin the operations previously enqueued by rte_dma_copy/fill().
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+static inline int
+rte_dma_submit(int16_t dev_id, uint16_t vchan)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dma_is_valid(dev_id))
+		return -EINVAL;
+	RTE_FUNC_PTR_OR_ERR_RET(*obj->submit, -ENOTSUP);
+#endif
+
+	return (*obj->submit)(obj->dev_private, vchan);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Return the number of operations that have been successfully completed.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param nb_cpls
+ *   The maximum number of completed operations that can be processed.
+ * @param[out] last_idx
+ *   The last completed operation's ring_idx.
+ *   If not required, NULL can be passed in.
+ * @param[out] has_error
+ *   Indicates if there is a transfer error.
+ *   If not required, NULL can be passed in.
+ *
+ * @return
+ *   The number of operations that successfully completed. This return value
+ *   must be less than or equal to the value of nb_cpls.
+ */
+__rte_experimental
+static inline uint16_t
+rte_dma_completed(int16_t dev_id, uint16_t vchan, const uint16_t nb_cpls,
+		  uint16_t *last_idx, bool *has_error)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+	uint16_t idx;
+	bool err;
+
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dma_is_valid(dev_id) || nb_cpls == 0)
+		return 0;
+	RTE_FUNC_PTR_OR_ERR_RET(*obj->completed, 0);
+#endif
+
+	/* Ensure the pointer values are non-null to simplify drivers.
+	 * In most cases these should be compile time evaluated, since this is
+	 * an inline function.
+	 * - If NULL is explicitly passed as parameter, then compiler knows the
+	 *   value is NULL
+	 * - If address of local variable is passed as parameter, then compiler
+	 *   can know it's non-NULL.
+	 */
+	if (last_idx == NULL)
+		last_idx = &idx;
+	if (has_error == NULL)
+		has_error = &err;
+
+	*has_error = false;
+	return (*obj->completed)(obj->dev_private, vchan, nb_cpls, last_idx,
+				 has_error);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Return the number of operations that have been completed, including both
+ * successfully completed and failed operations.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param nb_cpls
+ *   Indicates the size of the status array.
+ * @param[out] last_idx
+ *   The last completed operation's ring_idx.
+ *   If not required, NULL can be passed in.
+ * @param[out] status
+ *   This is a pointer to an array of length 'nb_cpls' that holds the completion
+ *   status code of each operation.
+ *   @see enum rte_dma_status_code
+ *
+ * @return
+ *   The number of operations that completed. This return value must be less
+ *   than or equal to the value of nb_cpls.
+ *   If this number is greater than zero (say n), then the first n values in
+ *   the status array are set.
+ */
+__rte_experimental
+static inline uint16_t
+rte_dma_completed_status(int16_t dev_id, uint16_t vchan,
+			 const uint16_t nb_cpls, uint16_t *last_idx,
+			 enum rte_dma_status_code *status)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+	uint16_t idx;
+
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dma_is_valid(dev_id) || nb_cpls == 0 || status == NULL)
+		return 0;
+	RTE_FUNC_PTR_OR_ERR_RET(*obj->completed_status, 0);
+#endif
+
+	if (last_idx == NULL)
+		last_idx = &idx;
+
+	return (*obj->completed_status)(obj->dev_private, vchan, nb_cpls,
+					last_idx, status);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
new file mode 100644
index 0000000000..be08faa202
--- /dev/null
+++ b/lib/dmadev/rte_dmadev_core.h
@@ -0,0 +1,78 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 HiSilicon Limited
+ * Copyright(c) 2021 Intel Corporation
+ */
+
+#ifndef RTE_DMADEV_CORE_H
+#define RTE_DMADEV_CORE_H
+
+/**
+ * @file
+ *
+ * DMA Device internal header.
+ *
+ * This header contains internal data types which are used by the data plane
+ * inline functions.
+ *
+ * Applications should not use these functions directly.
+ */
+
+/** @internal Used to enqueue a copy operation. */
+typedef int (*rte_dma_copy_t)(void *dev_private, uint16_t vchan,
+			      rte_iova_t src, rte_iova_t dst,
+			      uint32_t length, uint64_t flags);
+
+/** @internal Used to enqueue a scatter-gather list copy operation. */
+typedef int (*rte_dma_copy_sg_t)(void *dev_private, uint16_t vchan,
+				 const struct rte_dma_sge *src,
+				 const struct rte_dma_sge *dst,
+				 uint16_t nb_src, uint16_t nb_dst,
+				 uint64_t flags);
+
+/** @internal Used to enqueue a fill operation. */
+typedef int (*rte_dma_fill_t)(void *dev_private, uint16_t vchan,
+			      uint64_t pattern, rte_iova_t dst,
+			      uint32_t length, uint64_t flags);
+
+/** @internal Used to trigger hardware to begin working. */
+typedef int (*rte_dma_submit_t)(void *dev_private, uint16_t vchan);
+
+/** @internal Used to return number of successfully completed operations. */
+typedef uint16_t (*rte_dma_completed_t)(void *dev_private,
+				uint16_t vchan, const uint16_t nb_cpls,
+				uint16_t *last_idx, bool *has_error);
+
+/** @internal Used to return number of completed operations. */
+typedef uint16_t (*rte_dma_completed_status_t)(void *dev_private,
+			uint16_t vchan, const uint16_t nb_cpls,
+			uint16_t *last_idx, enum rte_dma_status_code *status);
+
+/**
+ * @internal
+ * Fast-path dmadev functions and related data are held in a flat array.
+ * One entry per dmadev.
+ *
+ * On 64-bit systems contents of this structure occupy exactly two 64B lines.
+ * On 32-bit systems contents of this structure fit into one 64B line.
+ *
+ * The 'dev_private' field was placed in the first cache line to optimize
+ * performance because the PMD driver mainly depends on this field.
+ */
+struct rte_dma_fp_object {
+	void *dev_private; /**< PMD-specific private data. */
+	rte_dma_copy_t             copy;
+	rte_dma_copy_sg_t          copy_sg;
+	rte_dma_fill_t             fill;
+	rte_dma_submit_t           submit;
+	rte_dma_completed_t        completed;
+	rte_dma_completed_status_t completed_status;
+	void *reserved_cl0;
+	/** Reserve space for future IO functions, while keeping data and
+	 * dev_ops pointers on the second cacheline.
+	 */
+	void *reserved_cl1[6];
+} __rte_cache_aligned;
+
+extern struct rte_dma_fp_object *rte_dma_fp_objs;
+
+#endif /* RTE_DMADEV_CORE_H */
diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
index 5fcf0f60b8..07056b45e7 100644
--- a/lib/dmadev/rte_dmadev_pmd.h
+++ b/lib/dmadev/rte_dmadev_pmd.h
@@ -75,6 +75,13 @@  struct rte_dma_dev_ops {
 	rte_dma_stats_reset_t      stats_reset;
 
 	rte_dma_dump_t             dev_dump;
+
+	rte_dma_copy_t             copy;
+	rte_dma_copy_sg_t          copy_sg;
+	rte_dma_fill_t             fill;
+	rte_dma_submit_t           submit;
+	rte_dma_completed_t        completed;
+	rte_dma_completed_status_t completed_status;
 };
 /**
  * Possible states of a DMA device.
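
For illustration, a hypothetical PMD would expose its data plane entry points
through these new dev_ops fields (a sketch; all my_pmd_* names are invented):

#include <rte_dmadev_pmd.h>

/* Data-path callbacks, implemented elsewhere in the PMD. */
static int my_pmd_copy(void *dev_private, uint16_t vchan, rte_iova_t src,
		       rte_iova_t dst, uint32_t length, uint64_t flags);
static int my_pmd_submit(void *dev_private, uint16_t vchan);
static uint16_t my_pmd_completed(void *dev_private, uint16_t vchan,
				 const uint16_t nb_cpls, uint16_t *last_idx,
				 bool *has_error);

static const struct rte_dma_dev_ops my_pmd_ops = {
	/* ... control plane ops elided ... */
	.copy      = my_pmd_copy,
	.submit    = my_pmd_submit,
	.completed = my_pmd_completed,
	/* Ops left NULL (copy_sg, fill, completed_status here) keep the
	 * library's dummy handlers, which log an error and fail cleanly.
	 */
};
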
diff --git a/lib/dmadev/version.map b/lib/dmadev/version.map
index e925dfcd6d..4d40104689 100644
--- a/lib/dmadev/version.map
+++ b/lib/dmadev/version.map
@@ -2,10 +2,15 @@  EXPERIMENTAL {
 	global:
 
 	rte_dma_close;
+	rte_dma_completed;
+	rte_dma_completed_status;
 	rte_dma_configure;
+	rte_dma_copy;
+	rte_dma_copy_sg;
 	rte_dma_count_avail;
 	rte_dma_dev_max;
 	rte_dma_dump;
+	rte_dma_fill;
 	rte_dma_get_dev_id_by_name;
 	rte_dma_info_get;
 	rte_dma_is_valid;
@@ -13,6 +18,7 @@  EXPERIMENTAL {
 	rte_dma_stats_get;
 	rte_dma_stats_reset;
 	rte_dma_stop;
+	rte_dma_submit;
 	rte_dma_vchan_setup;
 
 	local: *;