[v1,1/3] dmadev: add inter-domain operations

Message ID 8866a5c7ea36e476b2a92e3e4cea6c2c127ab82f.1691768110.git.anatoly.burakov@intel.com (mailing list archive)
State Rejected
Delegated to: Thomas Monjalon
Headers
Series Add support for inter-domain DMA operations

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

Anatoly Burakov Aug. 11, 2023, 4:14 p.m. UTC
Add a flag to indicate that a specific device supports inter-domain
operations, and add an API for inter-domain copy and fill.

Inter-domain operation is an operation that is very similar to regular
DMA operation, except either source or destination addresses can be in a
different process's address space, indicated by source and destination
handle values. These values are currently meant to be provided by
private drivers' API's.

This commit also adds a controller ID field into the DMA device API.
This is an arbitrary value that may not be implemented by hardware, but
it is meant to represent some kind of device hierarchy.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/prog_guide/dmadev.rst |  18 +++++
 lib/dmadev/rte_dmadev.c          |   2 +
 lib/dmadev/rte_dmadev.h          | 133 +++++++++++++++++++++++++++++++
 lib/dmadev/rte_dmadev_core.h     |  12 +++
 4 files changed, 165 insertions(+)
  

Comments

Anoob Joseph Aug. 18, 2023, 8:08 a.m. UTC | #1
Hi Anatoly,

Marvell CNXK DMA hardware also supports this feature, and it would be a good feature to add. Thanks for introducing the feature. Please see inline.

Thanks,
Anoob

> -----Original Message-----
> From: Anatoly Burakov <anatoly.burakov@intel.com>
> Sent: Friday, August 11, 2023 9:45 PM
> To: dev@dpdk.org; Chengwen Feng <fengchengwen@huawei.com>; Kevin
> Laatz <kevin.laatz@intel.com>; Bruce Richardson
> <bruce.richardson@intel.com>
> Cc: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> Subject: [EXT] [PATCH v1 1/3] dmadev: add inter-domain operations
> 
> Add a flag to indicate that a specific device supports inter-domain operations,
> and add an API for inter-domain copy and fill.
> 
> Inter-domain operation is an operation that is very similar to regular DMA
> operation, except either source or destination addresses can be in a
> different process's address space, indicated by source and destination
> handle values. These values are currently meant to be provided by private
> drivers' API's.
> 
> This commit also adds a controller ID field into the DMA device API.
> This is an arbitrary value that may not be implemented by hardware, but it is
> meant to represent some kind of device hierarchy.
> 
> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  doc/guides/prog_guide/dmadev.rst |  18 +++++
>  lib/dmadev/rte_dmadev.c          |   2 +
>  lib/dmadev/rte_dmadev.h          | 133 +++++++++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h     |  12 +++
>  4 files changed, 165 insertions(+)
> 
<snip>

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue an inter-domain copy operation.
> + *
> + * This queues up an inter-domain copy operation to be performed by hardware, if
> + * the 'flags' parameter contains RTE_DMA_OP_FLAG_SUBMIT then trigger doorbell
> + * to begin this operation, otherwise do not trigger doorbell.
> + *
> + * The source and destination handle parameters are arbitrary opaque values,
> + * currently meant to be provided by private device driver API's. If the source
> + * handle value is meaningful, RTE_DMA_OP_FLAG_SRC_HANDLE flag must be set.
> + * Similarly, if the destination handle value is meaningful,
> + * RTE_DMA_OP_FLAG_DST_HANDLE flag must be set. Source and destination handle
> + * values are meant to provide information to the hardware about source and/or
> + * destination PASID for the inter-domain copy operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param src
> + *   The address of the source buffer (if `src_handle` is set, source address
> + *   will be in address space of process referred to by source handle).
> + * @param dst
> + *   The address of the destination buffer (if `dst_handle` is set, destination
> + *   address will be in address space of process referred to by destination
> + *   handle).
> + * @param length
> + *   The length of the data to be copied.
> + * @param src_handle
> + *   Source handle value (if used, RTE_DMA_OP_FLAG_SRC_HANDLE flag must be set).
> + * @param dst_handle
> + *   Destination handle value (if used, RTE_DMA_OP_FLAG_DST_HANDLE flag must be
> + *   set).
> + * @param flags
> + *   Flags for this operation.
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued job.
> + *   - -ENOSPC: if no space left to enqueue.
> + *   - other values < 0 on failure.
> + */
> +__rte_experimental
> +static inline int
> +rte_dma_copy_inter_dom(int16_t dev_id, uint16_t vchan, rte_iova_t src,
> +		rte_iova_t dst, uint32_t length, uint16_t src_handle,
> +		uint16_t dst_handle, uint64_t flags)
> +{

[Anoob] Won't this lead to duplication of all datapath APIs? Also, this approach assumes that 'inter-domain' operations always support run-time setting of 'src_handle' and 'dst_handle' within one DMA channel, which not all platforms may support.

Can we move the 'src_handle' and 'dst_handle' registration to rte_dma_vchan_setup, so that the handles are configured in the control path and the existing datapath APIs work as is? The proposed op flags can then be used to determine whether an 'inter-domain' operation is requested. Having a fixed 'src_handle' and 'dst_handle' per vchan would also be better for performance.
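
To make the suggestion concrete, here is a minimal, self-contained sketch of handles that are fixed at vchan setup time. Every name in it (sketch_vchan_conf, sketch_vchan_setup, sketch_copy, the SKETCH_* flags) is hypothetical and only mimics the shape of the dmadev API; it is not the existing rte_dma_vchan_conf, and the "descriptor" is a plain struct rather than real hardware state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_VCHANS 4
/* Hypothetical op flags, mirroring the two flags proposed in the patch. */
#define SKETCH_OP_FLAG_SRC_HANDLE (UINT64_C(1) << 16)
#define SKETCH_OP_FLAG_DST_HANDLE (UINT64_C(1) << 17)

/* Hypothetical control-path configuration: handles are fixed per vchan. */
struct sketch_vchan_conf {
	uint16_t src_handle;
	uint16_t dst_handle;
};

/* What the mock "hardware" would see for one enqueued copy. */
struct sketch_desc {
	uint64_t src;
	uint64_t dst;
	uint32_t length;
	uint16_t src_handle;
	uint16_t dst_handle;
	uint64_t flags;
};

static struct sketch_vchan_conf sketch_vchans[SKETCH_MAX_VCHANS];

/* Control path: register the handles once, before any copy is enqueued. */
static int
sketch_vchan_setup(uint16_t vchan, const struct sketch_vchan_conf *conf)
{
	if (vchan >= SKETCH_MAX_VCHANS || conf == NULL)
		return -EINVAL;
	sketch_vchans[vchan] = *conf;
	return 0;
}

/*
 * Data path: same arguments as a regular copy (plus an output descriptor
 * for this mock). The op flags only select whether the preconfigured
 * handles are applied; the handle values never travel through this call.
 */
static int
sketch_copy(uint16_t vchan, uint64_t src, uint64_t dst, uint32_t length,
		uint64_t flags, struct sketch_desc *out)
{
	if (vchan >= SKETCH_MAX_VCHANS || length == 0 || out == NULL)
		return -EINVAL;
	out->src = src;
	out->dst = dst;
	out->length = length;
	out->flags = flags;
	out->src_handle = (flags & SKETCH_OP_FLAG_SRC_HANDLE) ?
			sketch_vchans[vchan].src_handle : 0;
	out->dst_handle = (flags & SKETCH_OP_FLAG_DST_HANDLE) ?
			sketch_vchans[vchan].dst_handle : 0;
	return 0;
}
```

With this shape, a caller configures a vchan once and then keeps using the ordinary copy call, setting the handle flags only on operations that should cross domains.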

<snip>
  
fengchengwen Oct. 8, 2023, 2:33 a.m. UTC | #2
Hi Anatoly,

On 2023/8/12 0:14, Anatoly Burakov wrote:
> Add a flag to indicate that a specific device supports inter-domain
> operations, and add an API for inter-domain copy and fill.
> 
> Inter-domain operation is an operation that is very similar to regular
> DMA operation, except either source or destination addresses can be in a
> different process's address space, indicated by source and destination
> handle values. These values are currently meant to be provided by
> private drivers' API's.
> 
> This commit also adds a controller ID field into the DMA device API.
> This is an arbitrary value that may not be implemented by hardware, but
> it is meant to represent some kind of device hierarchy.
> 
> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---

...

> +__rte_experimental
> +static inline int
> +rte_dma_copy_inter_dom(int16_t dev_id, uint16_t vchan, rte_iova_t src,
> +		rte_iova_t dst, uint32_t length, uint16_t src_handle,
> +		uint16_t dst_handle, uint64_t flags)

I would suggest adding a more general extension:
rte_dma_copy*(int16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
              uint32_t length, uint64_t flags, void *param)
The param is only valid when certain flag bits are set.
For this inter-domain extension, we could define an inter-domain param struct.
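
A minimal sketch of what such an extension could look like, for illustration only. The names (sketch_copy_ext, sketch_inter_dom_param, SKETCH_OP_FLAG_INTER_DOM) are hypothetical, the dev_id/vchan arguments are dropped to keep it short, and the param is taken as const void * here, which is an assumption rather than part of the suggestion above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit that marks 'param' as an inter-domain struct. */
#define SKETCH_OP_FLAG_INTER_DOM (UINT64_C(1) << 16)

/* Hypothetical inter-domain parameter block. */
struct sketch_inter_dom_param {
	uint16_t src_handle;
	uint16_t dst_handle;
};

/* Mock job record standing in for a hardware descriptor. */
struct sketch_job {
	uint64_t src;
	uint64_t dst;
	uint32_t length;
	uint16_t src_handle;
	uint16_t dst_handle;
	int inter_dom;
};

/*
 * Generic copy: 'param' is interpreted only when a flag bit says so,
 * otherwise it is ignored and may be NULL.
 */
static int
sketch_copy_ext(uint64_t src, uint64_t dst, uint32_t length, uint64_t flags,
		const void *param, struct sketch_job *job)
{
	if (length == 0 || job == NULL)
		return -EINVAL;

	job->src = src;
	job->dst = dst;
	job->length = length;
	job->src_handle = 0;
	job->dst_handle = 0;
	job->inter_dom = 0;

	if (flags & SKETCH_OP_FLAG_INTER_DOM) {
		const struct sketch_inter_dom_param *p = param;

		if (p == NULL)
			return -EINVAL;
		job->src_handle = p->src_handle;
		job->dst_handle = p->dst_handle;
		job->inter_dom = 1;
	}
	return 0;
}
```

The extra cost on the regular path is one flag test and one pointer argument, which is the parameter-passing overhead that would need measuring on each platform.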


Whether to extend the current rte_dma_copy() API or add a new API mainly
depends, I think, on the performance impact of passing the extra parameter.
I suggest more discussion across different platforms and calling conventions.


Finally, could you describe the application scenarios for this feature?


Thanks.
  
Jerin Jacob Oct. 9, 2023, 5:05 a.m. UTC | #3
On Sun, Oct 8, 2023 at 8:03 AM fengchengwen <fengchengwen@huawei.com> wrote:
>
> Hi Anatoly,
>
> On 2023/8/12 0:14, Anatoly Burakov wrote:
> > Add a flag to indicate that a specific device supports inter-domain
> > operations, and add an API for inter-domain copy and fill.
> >
> > Inter-domain operation is an operation that is very similar to regular
> > DMA operation, except either source or destination addresses can be in a
> > different process's address space, indicated by source and destination
> > handle values. These values are currently meant to be provided by
> > private drivers' API's.
> >
> > This commit also adds a controller ID field into the DMA device API.
> > This is an arbitrary value that may not be implemented by hardware, but
> > it is meant to represent some kind of device hierarchy.
> >
> > Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
>
> ...
>
> > +__rte_experimental
> > +static inline int
> > +rte_dma_copy_inter_dom(int16_t dev_id, uint16_t vchan, rte_iova_t src,
> > +             rte_iova_t dst, uint32_t length, uint16_t src_handle,
> > +             uint16_t dst_handle, uint64_t flags)
>
> I would suggest add more general extension:
> rte_dma_copy*(int16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
>               uint32_t length, uint64_t flags, void *param)
> The param only valid under some flags bits.
> As for this inter-domain extension: we could define inter-domain param struct.
>
>
> Whether add in current rte_dma_copy() API or add one new API, I think it mainly
> depend on performance impact of parameter transfer. Suggest more discuss for
> different platform and call specification.

Or move src_handle/dst_handle to the vchan config to enable better performance.
The application creates N vchans based on its requirements.

>
>
> And last, Could you introduce the application scenarios of this feature?

Looks like VM to VM or container to container copy.

>
>
> Thanks.
>
  
Vladimir Medvedkin Oct. 27, 2023, 1:46 p.m. UTC | #4
Hi Satananda, Anoob, Chengwen, Jerin, all,

After a number of internal discussions we have decided that we're going 
to postpone this feature/patchset till next release.

 >[Satananda] Have you considered extending  rte_dma_port_param and 
rte_dma_vchan_conf to represent interdomain memory transfer setup as a 
separate port type like RTE_DMA_PORT_INTER_DOMAIN ?

 >[Anoob] Can we move this 'src_handle' and 'dst_handle' registration to 
rte_dma_vchan_setup so that the 'src_handle' and 'dst_handle' can be 
configured in control path and the existing datapath APIs can work as is.

 >[Jerin] Or move src_handle/dst_handle to vchan config

We've listened to feedback on implementation, and have prototyped a 
vchan-based interface. This has a number of advantages and 
disadvantages, both in terms of API usage and in terms of our specific 
driver.

Setting up inter-domain operations as separate vchans allows us to store 
data inside the PMD and not duplicate any API paths, so having multiple 
vchans addresses that problem. However, this also means that any new 
vchans added while the PMD is active (such as attaching to a new 
process) will have to be gated by start/stop. This is probably fine from 
an API point of view, but a hassle for the user (previously, we could've 
just started using the new inter-domain handle right away).

Another usability issue with multiple vchan approach is that now, each 
vchan will have its own enqueue/submit/completion cycle, so any use case 
relying on one thread communicating with many processes will have to 
process each vchan separately, instead of everything going into one 
vchan - again, looks fine API-wise, but a hassle for the user, since 
this requires calling submit and completion for each vchan, and in some 
cases it requires maintaining some kind of reordering queue. (On the 
other hand, it would be much easier to separate operations intended for 
different processes with this approach, so perhaps this is not such a 
big issue)
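
To illustrate the per-vchan cycle described above, here is a minimal mock in plain C. It is only a sketch: sketch_vchan and the sketch_* helpers are hypothetical stand-ins, not dmadev calls, and the mock "hardware" finishes every submitted job instantly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vchan state: each vchan keeps its own counters. */
struct sketch_vchan {
	uint16_t enqueued; /* jobs queued but not yet submitted */
	uint16_t in_hw;    /* jobs submitted but not yet reaped */
};

/* Queue one job on a vchan without ringing the doorbell. */
static void
sketch_enqueue(struct sketch_vchan *vc)
{
	vc->enqueued++;
}

/* Ring the doorbell for everything queued on this vchan. */
static void
sketch_submit(struct sketch_vchan *vc)
{
	vc->in_hw += vc->enqueued;
	vc->enqueued = 0;
}

/* Reap up to 'max' finished jobs from this vchan. */
static uint16_t
sketch_completed(struct sketch_vchan *vc, uint16_t max)
{
	uint16_t n = vc->in_hw < max ? vc->in_hw : max;

	vc->in_hw -= n;
	return n;
}

/*
 * One thread talking to many processes has to walk every vchan:
 * one submit and one completion poll per vchan, per iteration.
 */
static uint32_t
sketch_service_all(struct sketch_vchan *vcs, uint16_t nb_vchans,
		uint16_t burst)
{
	uint32_t done = 0;
	uint16_t i;

	for (i = 0; i < nb_vchans; i++) {
		sketch_submit(&vcs[i]);
		done += sketch_completed(&vcs[i], burst);
	}
	return done;
}
```

Completions come back grouped by vchan rather than in global enqueue order, which is where the reordering queue mentioned above would come in.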

Finally, there is also an IDXD-specific issue. Currently, IDXD HW 
acceleration is implemented in such a way that each work queue will have 
a unique DMA device ID (rather than a unique vchan), and each device can 
technically process requests for both local and remote memory (local to 
remote, remote to local, remote to remote), all in one queue - as it was 
in our original implementation.

By changing the implementation to use vchans, we're essentially bifurcating 
this single queue - all vchans would have their own rings etc., but the 
enqueue-to-hardware operation is still common to all vchans, because 
there's a single underlying queue as far as hardware is concerned. The 
queue is atomic in hardware, and the ENQCMD instruction returns a status 
on enqueue failure (such as when too many requests are in flight). So in 
principle we could ignore the number of in-flight operations and rely on 
ENQCMD failures to drive error/retry handling; the problem is that this 
failure only happens on submit, not on enqueue.

So, in essence, with IDXD driver we have two choices: either we 
implement some kind of in-flight counter to prevent our driver from 
submitting too many requests (that is, vchans will have to cooperate - 
use atomics or similar), or every user will have to handle not just 
errors on enqueue, but also on submit (which I don't believe many people 
do currently, even though technically submit can return failure - all 
non-test usage in DPDK seems to assume submit realistically won't fail, 
and I'd like to keep it that way).
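
A minimal sketch of the first option, assuming C11 atomics and a single counter shared by all vchans of one work queue. The names (sketch_hw_queue, sketch_reserve, sketch_release) are hypothetical and this is not the IDXD driver's code; it only shows how reserving slots up front lets the enqueue side report -ENOSPC instead of pushing the failure to submit.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* One hardware queue shared by every vchan of the device (hypothetical). */
struct sketch_hw_queue {
	unsigned int depth;   /* max descriptors the hardware accepts */
	atomic_uint inflight; /* descriptors reserved across all vchans */
};

/*
 * Reserve 'n' slots for a batch before it is enqueued. Fails with
 * -ENOSPC if the batch would exceed the queue depth, so the caller
 * sees the error at enqueue time rather than at submit time.
 */
static int
sketch_reserve(struct sketch_hw_queue *q, unsigned int n)
{
	unsigned int cur = atomic_load_explicit(&q->inflight,
			memory_order_relaxed);

	do {
		/* cur never exceeds depth, so the subtraction cannot wrap */
		if (n > q->depth - cur)
			return -ENOSPC;
	} while (!atomic_compare_exchange_weak_explicit(&q->inflight, &cur,
			cur + n, memory_order_acquire, memory_order_relaxed));
	return 0;
}

/* Return 'n' slots once their completions have been reaped. */
static void
sketch_release(struct sketch_hw_queue *q, unsigned int n)
{
	atomic_fetch_sub_explicit(&q->inflight, n, memory_order_release);
}
```

Reserving and releasing per batch rather than per descriptor keeps the atomics at the start and end of a cycle, not on every operation.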

We're in the process of measuring the performance impact of different 
implementations. However, I should note that while atomic operations on 
the data path are unfortunate, realistically these atomics are accessed 
only at the beginning/end of every 'enqueue-submit-complete' cycle, and 
not on every operation. At first glance there is no observable performance 
penalty in the regular use case (assuming we are not calling submit for 
every enqueued job).

 >[Satananda]Do you have usecases where a process from 3rd domain sets 
up transfer between memories from 2 domains? i.e process 1 is src, 
process 2 is dest and process 3 executes transfer.

This use case works with the proposed API on our hardware.

 >[Chengwen]And last, Could you introduce the application scenarios of 
this feature?

We have used this feature to improve performance for the memif driver.


  
Jerin Jacob Nov. 23, 2023, 5:24 a.m. UTC | #5
On Fri, Oct 27, 2023 at 7:16 PM Medvedkin, Vladimir
<vladimir.medvedkin@intel.com> wrote:
>
> Hi Satananda, Anoob, Chengwen, Jerin, all,
>
> After a number of internal discussions we have decided that we're going
> to postpone this feature/patchset till next release.
>
>  >[Satananda] Have you considered extending  rte_dma_port_param and
> rte_dma_vchan_conf to represent interdomain memory transfer setup as a
> separate port type like RTE_DMA_PORT_INTER_DOMAIN ?
>
>  >[Anoob] Can we move this 'src_handle' and 'dst_handle' registration to
> rte_dma_vchan_setup so that the 'src_handle' and 'dst_handle' can be
> configured in control path and the existing datapath APIs can work as is.
>
>  >[Jerin] Or move src_handle/dst_handle to vchan config
>
> We've listened to feedback on implementation, and have prototyped a
> vchan-based interface. This has a number of advantages and
> disadvantages, both in terms of API usage and in terms of our specific
> driver.
>
> Setting up inter-domain operations as separate vchans allow us to store
> data inside the PMD and not duplicate any API paths, so having multiple
> vchans addresses that problem. However, this also means that any new
> vchans added while the PMD is active (such as attaching to a new

This could be mitigated by setting up the maximum number of vchans up front,
before start(), and using them on demand.

> process) will have to be gated by start/stop. This is probably fine from
> API point of view, but a hassle for user (previously, we could've just
> started using the new inter-domain handle right away).
>
> Another usability issue with multiple vchan approach is that now, each
> vchan will have its own enqueue/submit/completion cycle, so any use case
> relying on one thread communicating with many processes will have to
> process each vchan separately, instead of everything going into one
> vchan - again, looks fine API-wise, but a hassle for the user, since
> this requires calling submit and completion for each vchan, and in some
> cases it requires maintaining some kind of reordering queue. (On the
> other hand, it would be much easier to separate operations intended for
> different processes with this approach, so perhaps this is not such a
> big issue)

IMO, the design principle behind vchan was:
- A single HW queue serves N vchans.
- A vchan is nothing more than a template: in the slow path it builds the
desired HW instruction format for use in the fast path, or writes some
slow-path registers that define the attributes of the vchan.

IMO, the above-mentioned usability constraints will be there in all
PMDs, as vchans mux a single HW queue.

IMO, the decision between vchan and fast path API could depend on:
a) The number of vchans required. In this case, we are using them for VM to
VM or container to container copy, so I think it is limited.
b) HW support. To reduce the size of the HW descriptor, some HW
configures certain features in the slow path only (they cannot be
changed at runtime without reconfiguring the vchan).
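
The template idea can be sketched as follows. This is a simplified mock, not any PMD's descriptor layout: sketch_hw_desc, its fields, and the SKETCH_CTRL_* bits are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical control bits of a mock hardware descriptor. */
#define SKETCH_CTRL_OP_COPY   0x1u
#define SKETCH_CTRL_INTER_DOM 0x100u

/* Mock hardware descriptor (layout is made up for illustration). */
struct sketch_hw_desc {
	uint32_t ctrl;
	uint16_t src_handle;
	uint16_t dst_handle;
	uint64_t src;
	uint64_t dst;
	uint32_t length;
};

/* A vchan holds a descriptor template that is built once. */
struct sketch_vchan {
	struct sketch_hw_desc tmpl;
};

/* Slow path: bake the vchan attributes into the template. */
static void
sketch_vchan_init(struct sketch_vchan *vc, uint16_t src_handle,
		uint16_t dst_handle, int inter_dom)
{
	memset(&vc->tmpl, 0, sizeof(vc->tmpl));
	vc->tmpl.ctrl = SKETCH_CTRL_OP_COPY |
			(inter_dom ? SKETCH_CTRL_INTER_DOM : 0u);
	vc->tmpl.src_handle = src_handle;
	vc->tmpl.dst_handle = dst_handle;
}

/* Fast path: copy the template and fill in only the per-job fields. */
static struct sketch_hw_desc
sketch_build_copy(const struct sketch_vchan *vc, uint64_t src, uint64_t dst,
		uint32_t length)
{
	struct sketch_hw_desc d = vc->tmpl;

	d.src = src;
	d.dst = dst;
	d.length = length;
	return d;
}
```

Several such vchans can feed the same hardware queue; changing a handle means rebuilding the template in the slow path, which matches the constraint in (b).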
  

Patch

diff --git a/doc/guides/prog_guide/dmadev.rst b/doc/guides/prog_guide/dmadev.rst
index 2aa26d33b8..e4e5196416 100644
--- a/doc/guides/prog_guide/dmadev.rst
+++ b/doc/guides/prog_guide/dmadev.rst
@@ -108,6 +108,24 @@  completed operations along with the status of each operation (filled into the
 completed operation's ``ring_idx`` which could help user track operations within
 their own application-defined rings.
 
+.. _dmadev_inter_dom:
+
+
+Inter-domain operations
+~~~~~~~~~~~~~~~~~~~~~~~
+
+For some devices, inter-domain DMA operations may be supported (indicated by
+the ``RTE_DMA_CAPA_OPS_INTER_DOM`` capability flag being set). An inter-domain
+operation (such as ``rte_dma_copy_inter_dom``) is similar to a regular DMA
+operation, except the user also needs to specify source and destination
+handles, which the hardware then uses to get the source and/or destination
+PASID to perform the operation. When the ``src_handle`` value is set, the
+``RTE_DMA_OP_FLAG_SRC_HANDLE`` op flag must also be set. Similarly, when the
+``dst_handle`` value is set, the ``RTE_DMA_OP_FLAG_DST_HANDLE`` op flag must be set.
+
+Currently, source and destination handles are opaque values the user has to get
+from private API's of those DMA device drivers that support the operation.
+
 
 Querying Device Statistics
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
index 8c095e1f35..ff00612f84 100644
--- a/lib/dmadev/rte_dmadev.c
+++ b/lib/dmadev/rte_dmadev.c
@@ -425,6 +425,8 @@  rte_dma_info_get(int16_t dev_id, struct rte_dma_info *dev_info)
 	if (*dev->dev_ops->dev_info_get == NULL)
 		return -ENOTSUP;
 	memset(dev_info, 0, sizeof(struct rte_dma_info));
+	/* set to -1 by default, as other drivers may not implement this */
+	dev_info->controller_id = -1;
 	ret = (*dev->dev_ops->dev_info_get)(dev, dev_info,
 					    sizeof(struct rte_dma_info));
 	if (ret != 0)
diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
index e61d71959e..1cad36f0b6 100644
--- a/lib/dmadev/rte_dmadev.h
+++ b/lib/dmadev/rte_dmadev.h
@@ -278,6 +278,8 @@  int16_t rte_dma_next_dev(int16_t start_dev_id);
 #define RTE_DMA_CAPA_OPS_COPY_SG	RTE_BIT64(33)
 /** Support fill operation. */
 #define RTE_DMA_CAPA_OPS_FILL		RTE_BIT64(34)
+/** Support inter-domain operation. */
+#define RTE_DMA_CAPA_OPS_INTER_DOM	RTE_BIT64(48)
 /**@}*/
 
 /**
@@ -307,6 +309,8 @@  struct rte_dma_info {
 	int16_t numa_node;
 	/** Number of virtual DMA channel configured. */
 	uint16_t nb_vchans;
+	/** Controller ID, -1 if unknown */
+	int16_t controller_id;
 };
 
 /**
@@ -819,6 +823,16 @@  struct rte_dma_sge {
  * capability bit for this, driver should not return error if this flag was set.
  */
 #define RTE_DMA_OP_FLAG_LLC     RTE_BIT64(2)
+/** Source handle is set.
+ * Used for inter-domain operations to indicate source handle value will be
+ * meaningful and can be used by hardware to learn source PASID.
+ */
+#define RTE_DMA_OP_FLAG_SRC_HANDLE RTE_BIT64(16)
+/** Destination handle is set.
+ * Used for inter-domain operations to indicate destination handle value will be
+ * meaningful and can be used by hardware to learn destination PASID.
+ */
+#define RTE_DMA_OP_FLAG_DST_HANDLE RTE_BIT64(17)
 /**@}*/
 
 /**
@@ -1141,6 +1155,125 @@  rte_dma_burst_capacity(int16_t dev_id, uint16_t vchan)
 	return (*obj->burst_capacity)(obj->dev_private, vchan);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue an inter-domain copy operation.
+ *
+ * This queues up an inter-domain copy operation to be performed by hardware, if
+ * the 'flags' parameter contains RTE_DMA_OP_FLAG_SUBMIT then trigger doorbell
+ * to begin this operation, otherwise do not trigger doorbell.
+ *
+ * The source and destination handle parameters are arbitrary opaque values,
+ * currently meant to be provided by private device driver API's. If the source
+ * handle value is meaningful, RTE_DMA_OP_FLAG_SRC_HANDLE flag must be set.
+ * Similarly, if the destination handle value is meaningful,
+ * RTE_DMA_OP_FLAG_DST_HANDLE flag must be set. Source and destination handle
+ * values are meant to provide information to the hardware about source and/or
+ * destination PASID for the inter-domain copy operation.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param src
+ *   The address of the source buffer (if `src_handle` is set, source address
+ *   will be in address space of process referred to by source handle).
+ * @param dst
+ *   The address of the destination buffer (if `dst_handle` is set, destination
+ *   address will be in address space of process referred to by destination
+ *   handle).
+ * @param length
+ *   The length of the data to be copied.
+ * @param src_handle
+ *   Source handle value (if used, RTE_DMA_OP_FLAG_SRC_HANDLE flag must be set).
+ * @param dst_handle
+ *   Destination handle value (if used, RTE_DMA_OP_FLAG_DST_HANDLE flag must be
+ *   set).
+ * @param flags
+ *   Flags for this operation.
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued job.
+ *   - -ENOSPC: if no space left to enqueue.
+ *   - other values < 0 on failure.
+ */
+__rte_experimental
+static inline int
+rte_dma_copy_inter_dom(int16_t dev_id, uint16_t vchan, rte_iova_t src,
+		rte_iova_t dst, uint32_t length, uint16_t src_handle,
+		uint16_t dst_handle, uint64_t flags)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dma_is_valid(dev_id) || length == 0)
+		return -EINVAL;
+	if (*obj->copy_inter_dom == NULL)
+		return -ENOTSUP;
+#endif
+	return (*obj->copy_inter_dom)(obj->dev_private, vchan, src, dst, length,
+			src_handle, dst_handle, flags);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue an inter-domain fill operation.
+ *
+ * This queues up an inter-domain fill operation to be performed by hardware, if
+ * the 'flags' parameter contains RTE_DMA_OP_FLAG_SUBMIT then trigger doorbell
+ * to begin this operation, otherwise do not trigger doorbell.
+ *
+ * The source and destination handle parameters are arbitrary opaque values,
+ * currently meant to be provided by private device driver API's. If the source
+ * handle value is meaningful, RTE_DMA_OP_FLAG_SRC_HANDLE flag must be set.
+ * Similarly, if the destination handle value is meaningful,
+ * RTE_DMA_OP_FLAG_DST_HANDLE flag must be set. Source and destination handle
+ * values are meant to provide information to the hardware about source and/or
+ * destination PASID for the inter-domain fill operation.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param pattern
+ *   The pattern to populate the destination buffer with.
+ * @param dst
+ *   The address of the destination buffer.
+ * @param length
+ *   The length of the destination buffer.
+ * @param dst_handle
+ *   Destination handle value (if used, RTE_DMA_OP_FLAG_DST_HANDLE flag must be
+ *   set).
+ * @param flags
+ *   Flags for this operation.
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued job.
+ *   - -ENOSPC: if no space left to enqueue.
+ *   - other values < 0 on failure.
+ */
+__rte_experimental
+static inline int
+rte_dma_fill_inter_dom(int16_t dev_id, uint16_t vchan, uint64_t pattern,
+		rte_iova_t dst, uint32_t length, uint16_t dst_handle,
+		uint64_t flags)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dma_is_valid(dev_id) || length == 0)
+		return -EINVAL;
+	if (*obj->fill_inter_dom == NULL)
+		return -ENOTSUP;
+#endif
+
+	return (*obj->fill_inter_dom)(obj->dev_private, vchan, pattern, dst,
+			length, dst_handle, flags);
+}
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
index 064785686f..b3a020f9de 100644
--- a/lib/dmadev/rte_dmadev_core.h
+++ b/lib/dmadev/rte_dmadev_core.h
@@ -50,6 +50,16 @@  typedef uint16_t (*rte_dma_completed_status_t)(void *dev_private,
 /** @internal Used to check the remaining space in descriptor ring. */
 typedef uint16_t (*rte_dma_burst_capacity_t)(const void *dev_private, uint16_t vchan);
 
+/** @internal Used to enqueue an inter-domain copy operation. */
+typedef int (*rte_dma_copy_inter_dom_t)(void *dev_private, uint16_t vchan,
+			rte_iova_t src, rte_iova_t dst,	unsigned int length,
+			uint16_t src_handle, uint16_t dst_handle, uint64_t flags);
+/** @internal Used to enqueue an inter-domain fill operation. */
+typedef int (*rte_dma_fill_inter_dom_t)(void *dev_private, uint16_t vchan,
+			uint64_t pattern, rte_iova_t dst, uint32_t length,
+			uint16_t dst_handle, uint64_t flags);
+
+
 /**
  * @internal
  * Fast-path dmadev functions and related data are hold in a flat array.
@@ -73,6 +83,8 @@  struct rte_dma_fp_object {
 	rte_dma_completed_t        completed;
 	rte_dma_completed_status_t completed_status;
 	rte_dma_burst_capacity_t   burst_capacity;
+	rte_dma_copy_inter_dom_t   copy_inter_dom;
+	rte_dma_fill_inter_dom_t   fill_inter_dom;
 } __rte_aligned(128);
 
 extern struct rte_dma_fp_object *rte_dma_fp_objs;