[v21,1/7] dmadev: introduce DMA device library public APIs

Message ID 20210907125649.49794-2-fengchengwen@huawei.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series support dmadev |

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

fengchengwen Sept. 7, 2021, 12:56 p.m. UTC
  The 'dmadevice' is a generic type of DMA device.

This patch introduce the 'dmadevice' public APIs which expose generic
operations that can enable configuration and I/O with the DMA devices.

Maintainers update is also included in this patch.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Jerin Jacob <jerinjacobk@gmail.com>
Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
---
 MAINTAINERS                            |   4 +
 doc/api/doxy-api-index.md              |   1 +
 doc/api/doxy-api.conf.in               |   1 +
 doc/guides/rel_notes/release_21_11.rst |   5 +
 lib/dmadev/meson.build                 |   4 +
 lib/dmadev/rte_dmadev.h                | 951 +++++++++++++++++++++++++
 lib/dmadev/version.map                 |  24 +
 lib/meson.build                        |   1 +
 8 files changed, 991 insertions(+)
 create mode 100644 lib/dmadev/meson.build
 create mode 100644 lib/dmadev/rte_dmadev.h
 create mode 100644 lib/dmadev/version.map
  

Comments

Thomas Monjalon Sept. 9, 2021, 10:33 a.m. UTC | #1
Hi,

I am having a surface look at the API.
I hope we can do better than previous libs.

07/09/2021 14:56, Chengwen Feng:
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -496,6 +496,10 @@ F: drivers/raw/skeleton/
>  F: app/test/test_rawdev.c
>  F: doc/guides/prog_guide/rawdev.rst
>  
> +DMA device API - EXPERIMENTAL
> +M: Chengwen Feng <fengchengwen@huawei.com>
> +F: lib/dmadev/


I think it should before (preferably) or after eventdev,
but let's keep rawdev as the last one.

Then please apply the same order in other non-alphabetical lists (doc, meson, etc).

> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + * Copyright(c) 2021 Marvell International Ltd.
> + * Copyright(c) 2021 SmartShare Systems.
> + */

No need for final dot in copyright lines.

> +#ifndef _RTE_DMADEV_H_
> +#define _RTE_DMADEV_H_

No need for surrounding underscores.

> +
> +/**
> + * @file rte_dmadev.h
> + *
> + * RTE DMA (Direct Memory Access) device APIs.

RTE has no meaning when used in a sentence.
And given it is a DPDK library, you don't really need to specify.
I would also remove the final "s" as the library is one interface.

> + *
> + * The DMA framework is built on the following model:
> + *
> + *     ---------------   ---------------       ---------------
> + *     | virtual DMA |   | virtual DMA |       | virtual DMA |
> + *     | channel     |   | channel     |       | channel     |
> + *     ---------------   ---------------       ---------------
> + *            |                |                      |
> + *            ------------------                      |
> + *                     |                              |
> + *               ------------                    ------------
> + *               |  dmadev  |                    |  dmadev  |
> + *               ------------                    ------------
> + *                     |                              |
> + *            ------------------               ------------------
> + *            | HW-DMA-channel |               | HW-DMA-channel |
> + *            ------------------               ------------------
> + *                     |                              |
> + *                     --------------------------------
> + *                                     |
> + *                           ---------------------
> + *                           | HW-DMA-Controller |
> + *                           ---------------------

You don't hyphens between the words I think.

> + *
> + * The DMA controller could have multiple HW-DMA-channels (aka. HW-DMA-queues),
> + * each HW-DMA-channel should be represented by a dmadev.
> + *
> + * The dmadev could create multiple virtual DMA channels, each virtual DMA
> + * channel represents a different transfer context. The DMA operation request
> + * must be submitted to the virtual DMA channel. e.g. Application could create
> + * virtual DMA channel 0 for memory-to-memory transfer scenario, and create
> + * virtual DMA channel 1 for memory-to-device transfer scenario.

What is the benefit of virtual channels compared to have separate dmadevs
for the same HW channel?

> + *
> + * The dmadev are dynamically allocated by rte_dmadev_pmd_allocate() during the
> + * PCI/SoC device probing phase performed at EAL initialization time. And could

Not clear what is this phase. Do you mean bus probing?

> + * be released by rte_dmadev_pmd_release() during the PCI/SoC device removing
> + * phase.

Again what is this phase?
I think freeing should be done only via the "close" function.

> + *
> + * This framework uses 'uint16_t dev_id' as the device identifier of a dmadev,
> + * and 'uint16_t vchan' as the virtual DMA channel identifier in one dmadev.

I think it is better to use signed int16_t so you can express "none" in the API,
which can simplify some functions and error management.

> + *
> + * The functions exported by the dmadev API to setup a device designated by its
> + * device identifier must be invoked in the following order:
> + *     - rte_dmadev_configure()
> + *     - rte_dmadev_vchan_setup()
> + *     - rte_dmadev_start()
> + *
> + * Then, the application can invoke dataplane APIs to process jobs.

You mean "dataplane functions".

> + *
> + * If the application wants to change the configuration (i.e. invoke
> + * rte_dmadev_configure() or rte_dmadev_vchan_setup()), it must invoke
> + * rte_dmadev_stop() first to stop the device and then do the reconfiguration
> + * before invoking rte_dmadev_start() again. The dataplane APIs should not be

again, APIs -> functions

> + * invoked when the device is stopped.
> + *
> + * Finally, an application can close a dmadev by invoking the
> + * rte_dmadev_close() function.
> + *
> + * The dataplane APIs include two parts:
> + * The first part is the submission of operation requests:
> + *     - rte_dmadev_copy()
> + *     - rte_dmadev_copy_sg()
> + *     - rte_dmadev_fill()
> + *     - rte_dmadev_submit()
> + *
> + * These APIs could work with different virtual DMA channels which have
> + * different contexts.
> + *
> + * The first three APIs are used to submit the operation request to the virtual
> + * DMA channel, if the submission is successful, an uint16_t ring_idx is
> + * returned, otherwise a negative number is returned.

unsigned or negative? looks weird.

> + *
> + * The last API was used to issue doorbell to hardware, and also there are flags
> + * (@see RTE_DMA_OP_FLAG_SUBMIT) parameter of the first three APIs could do the
> + * same work.

I don't understand this sentence.
You mean rte_dmadev_submit function?
Why past tense "was"?
Why having a redundant function?

> + *
> + * The second part is to obtain the result of requests:
> + *     - rte_dmadev_completed()
> + *         - return the number of operation requests completed successfully.
> + *     - rte_dmadev_completed_status()
> + *         - return the number of operation requests completed.
> + *
> + * @note The two completed APIs also support return the last completed
> + * operation's ring_idx.

Not sure why this note here.

> + * @note If the dmadev works in silent mode (@see RTE_DMADEV_CAPA_SILENT),
> + * application does not invoke the above two completed APIs.
> + *
> + * About the ring_idx which enqueue APIs (e.g. rte_dmadev_copy()
> + * rte_dmadev_fill()) returned, the rules are as follows:

I feel a word is missing above.

> + *     - ring_idx for each virtual DMA channel are independent.
> + *     - For a virtual DMA channel, the ring_idx is monotonically incremented,
> + *       when it reach UINT16_MAX, it wraps back to zero.
> + *     - This ring_idx can be used by applications to track per-operation
> + *       metadata in an application-defined circular ring.
> + *     - The initial ring_idx of a virtual DMA channel is zero, after the
> + *       device is stopped, the ring_idx needs to be reset to zero.
> + *
> + * One example:
> + *     - step-1: start one dmadev
> + *     - step-2: enqueue a copy operation, the ring_idx return is 0
> + *     - step-3: enqueue a copy operation again, the ring_idx return is 1
> + *     - ...
> + *     - step-101: stop the dmadev
> + *     - step-102: start the dmadev
> + *     - step-103: enqueue a copy operation, the ring_idx return is 0
> + *     - ...
> + *     - step-x+0: enqueue a fill operation, the ring_idx return is 65535
> + *     - step-x+1: enqueue a copy operation, the ring_idx return is 0
> + *     - ...
> + *
> + * The DMA operation address used in enqueue APIs (i.e. rte_dmadev_copy(),
> + * rte_dmadev_copy_sg(), rte_dmadev_fill()) defined as rte_iova_t type. The

"is" defined

> + * dmadev supports two types of address: memory address and device address.

Please try to start new sentences on a new line.

> + *
> + * - memory address: the source and destination address of the memory-to-memory
> + * transfer type, or the source address of the memory-to-device transfer type,
> + * or the destination address of the device-to-memory transfer type.
> + * @note If the device support SVA (@see RTE_DMADEV_CAPA_SVA), the memory
> + * address can be any VA address, otherwise it must be an IOVA address.
> + *
> + * - device address: the source and destination address of the device-to-device
> + * transfer type, or the source address of the device-to-memory transfer type,
> + * or the destination address of the memory-to-device transfer type.
> + *
> + * By default, all the functions of the dmadev API exported by a PMD are

What do you mean "by default"? Is there some exceptions?

> + * lock-free functions which assume to not be invoked in parallel on different
> + * logical cores to work on the same target dmadev object.
> + * @note Different virtual DMA channels on the same dmadev *DO NOT* support
> + * parallel invocation because these virtual DMA channels share the same
> + * HW-DMA-channel.
> + *
> + */
[...]
> +#define RTE_DMADEV_NAME_MAX_LEN	RTE_DEV_NAME_MAX_LEN

Why not using RTE_DEV_NAME_MAX_LEN directly?
If you keep it, it should be commented, explaining whether it takes '\0'
into account or not.

> +__rte_experimental
> +bool
> +rte_dmadev_is_valid_dev(uint16_t dev_id);

I would suggest dropping the final "_dev" in the function name.


> +uint16_t
> +rte_dmadev_count(void);

It would be safer to name it rte_dmadev_count_avail
in case we need new kind of device count later.

> +
> +/* Enumerates DMA device capabilities. */

You should group them with a doxygen group syntax.
See https://patches.dpdk.org/project/dpdk/patch/20210830104232.598703-1-thomas@monjalon.net/

> +#define RTE_DMADEV_CAPA_MEM_TO_MEM	(1ull << 0)

Please use RTE_BIT macro (32 or 64).

> +/**< DMA device support memory-to-memory transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */

It is preferred to have documentation before the item.

[...]

> +/**
> + * A structure used to retrieve the information of a DMA device.
> + */
> +struct rte_dmadev_info {
> +	struct rte_device *device; /**< Generic Device information. */

Please do not expose this.

> +	uint64_t dev_capa; /**< Device capabilities (RTE_DMADEV_CAPA_*). */
> +	uint16_t max_vchans;
> +	/**< Maximum number of virtual DMA channels supported. */
> +	uint16_t max_desc;
> +	/**< Maximum allowed number of virtual DMA channel descriptors. */
> +	uint16_t min_desc;
> +	/**< Minimum allowed number of virtual DMA channel descriptors. */
> +	uint16_t max_sges;
> +	/**< Maximum number of source or destination scatter-gather entry
> +	 * supported.
> +	 * If the device does not support COPY_SG capability, this value can be
> +	 * zero.
> +	 * If the device supports COPY_SG capability, then rte_dmadev_copy_sg()
> +	 * parameter nb_src/nb_dst should not exceed this value.
> +	 */
> +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured. */

What about adding NUMA node?

    /* Local NUMA memory ID. -1 if unknown. */
    int16_t numa_node;

> +};

> +int
> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);

In .h files, the return type should not be on a separate line.

> +
> +/**
> + * A structure used to configure a DMA device.
> + */

You should mention where it is used with @see.

> +struct rte_dmadev_conf {
> +	uint16_t nb_vchans;
> +	/**< The number of virtual DMA channels to set up for the DMA device.
> +	 * This value cannot be greater than the field 'max_vchans' of struct
> +	 * rte_dmadev_info which get from rte_dmadev_info_get().
> +	 */
> +	bool enable_silent;
> +	/**< Indicates whether to enable silent mode.
> +	 * false-default mode, true-silent mode.
> +	 * This value can be set to true only when the SILENT capability is
> +	 * supported.
> +	 *
> +	 * @see RTE_DMADEV_CAPA_SILENT
> +	 */
> +};
[...]
> +#define RTE_DMADEV_ALL_VCHAN	0xFFFFu

Please do not add this kind of constant without a doxygen comment.


It seems you don't manage the maximum number of devices.
It is fixed in the .c file:
	struct rte_dmadev rte_dmadevices[RTE_DMADEV_MAX_DEVS];

Instead I would suggest a more dynamic approach with an init function,
so the application can extend it before calling rte_eal_init.
Please see how it is implemented here:
https://patches.dpdk.org/project/dpdk/patch/20210730135533.417611-2-thomas@monjalon.net/

Above patch could also inspire you to start some docs in this patch.


This series add one file per patch.
Instead it would be better to have groups of features per patch,
meaning the implementation and the driver interface should be
in the same patch.
You can split like this:
	1/ device allocation
	2/ configuration and start/stop
	3/ dataplane functions

I would suggest 2 more patches:
	4/ event notification
see https://patches.dpdk.org/project/dpdk/patch/20210730135533.417611-3-thomas@monjalon.net/
	5/ multi-process
see https://patches.dpdk.org/project/dpdk/patch/20210730135533.417611-5-thomas@monjalon.net/


Thanks for the work
  
Bruce Richardson Sept. 9, 2021, 11:18 a.m. UTC | #2
On Thu, Sep 09, 2021 at 12:33:00PM +0200, Thomas Monjalon wrote:
> Hi,
> 
> I am having a surface look at the API.
> I hope we can do better than previous libs.
> 
A few bits of feedback on your comments and the API below.

/Bruce

> 07/09/2021 14:56, Chengwen Feng:
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -496,6 +496,10 @@ F: drivers/raw/skeleton/
> >  F: app/test/test_rawdev.c
> >  F: doc/guides/prog_guide/rawdev.rst
> >  
> > +DMA device API - EXPERIMENTAL
> > +M: Chengwen Feng <fengchengwen@huawei.com>
> > +F: lib/dmadev/
> 
> 
> I think it should before (preferably) or after eventdev,
> but let's keep rawdev as the last one.
> 
> Then please apply the same order in other non-alphabetical lists (doc, meson, etc).
> 
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2021 HiSilicon Limited.
> > + * Copyright(c) 2021 Intel Corporation.
> > + * Copyright(c) 2021 Marvell International Ltd.
> > + * Copyright(c) 2021 SmartShare Systems.
> > + */
> 
> No need for final dot in copyright lines.
> 
> > +#ifndef _RTE_DMADEV_H_
> > +#define _RTE_DMADEV_H_
> 
> No need for surrounding underscores.
> 
> > +
> > +/**
> > + * @file rte_dmadev.h
> > + *
> > + * RTE DMA (Direct Memory Access) device APIs.
> 
> RTE has no meaning when used in a sentence.
> And given it is a DPDK library, you don't really need to specify.
> I would also remove the final "s" as the library is one interface.
> 
> > + *
> > + * The DMA framework is built on the following model:
> > + *
> > + *     ---------------   ---------------       ---------------
> > + *     | virtual DMA |   | virtual DMA |       | virtual DMA |
> > + *     | channel     |   | channel     |       | channel     |
> > + *     ---------------   ---------------       ---------------
> > + *            |                |                      |
> > + *            ------------------                      |
> > + *                     |                              |
> > + *               ------------                    ------------
> > + *               |  dmadev  |                    |  dmadev  |
> > + *               ------------                    ------------
> > + *                     |                              |
> > + *            ------------------               ------------------
> > + *            | HW-DMA-channel |               | HW-DMA-channel |
> > + *            ------------------               ------------------
> > + *                     |                              |
> > + *                     --------------------------------
> > + *                                     |
> > + *                           ---------------------
> > + *                           | HW-DMA-Controller |
> > + *                           ---------------------
> 
> You don't hyphens between the words I think.
> 
> > + *
> > + * The DMA controller could have multiple HW-DMA-channels (aka. HW-DMA-queues),
> > + * each HW-DMA-channel should be represented by a dmadev.
> > + *
> > + * The dmadev could create multiple virtual DMA channels, each virtual DMA
> > + * channel represents a different transfer context. The DMA operation request
> > + * must be submitted to the virtual DMA channel. e.g. Application could create
> > + * virtual DMA channel 0 for memory-to-memory transfer scenario, and create
> > + * virtual DMA channel 1 for memory-to-device transfer scenario.
> 
> What is the benefit of virtual channels compared to have separate dmadevs
> for the same HW channel?
> 
> > + *
> > + * The dmadev are dynamically allocated by rte_dmadev_pmd_allocate() during the
> > + * PCI/SoC device probing phase performed at EAL initialization time. And could
> 
> Not clear what is this phase. Do you mean bus probing?
> 
> > + * be released by rte_dmadev_pmd_release() during the PCI/SoC device removing
> > + * phase.
> 
> Again what is this phase?
> I think freeing should be done only via the "close" function.
> 
> > + *
> > + * This framework uses 'uint16_t dev_id' as the device identifier of a dmadev,
> > + * and 'uint16_t vchan' as the virtual DMA channel identifier in one dmadev.
> 
> I think it is better to use signed int16_t so you can express "none" in the API,
> which can simplify some functions and error management.
> 
> > + *
> > + * The functions exported by the dmadev API to setup a device designated by its
> > + * device identifier must be invoked in the following order:
> > + *     - rte_dmadev_configure()
> > + *     - rte_dmadev_vchan_setup()
> > + *     - rte_dmadev_start()
> > + *
> > + * Then, the application can invoke dataplane APIs to process jobs.
> 
> You mean "dataplane functions".
> 
> > + *
> > + * If the application wants to change the configuration (i.e. invoke
> > + * rte_dmadev_configure() or rte_dmadev_vchan_setup()), it must invoke
> > + * rte_dmadev_stop() first to stop the device and then do the reconfiguration
> > + * before invoking rte_dmadev_start() again. The dataplane APIs should not be
> 
> again, APIs -> functions
> 
> > + * invoked when the device is stopped.
> > + *
> > + * Finally, an application can close a dmadev by invoking the
> > + * rte_dmadev_close() function.
> > + *
> > + * The dataplane APIs include two parts:
> > + * The first part is the submission of operation requests:
> > + *     - rte_dmadev_copy()
> > + *     - rte_dmadev_copy_sg()
> > + *     - rte_dmadev_fill()
> > + *     - rte_dmadev_submit()
> > + *
> > + * These APIs could work with different virtual DMA channels which have
> > + * different contexts.
> > + *
> > + * The first three APIs are used to submit the operation request to the virtual
> > + * DMA channel, if the submission is successful, an uint16_t ring_idx is
> > + * returned, otherwise a negative number is returned.
> 
> unsigned or negative? looks weird.
> 

May be, but it works well. We could perhaps rephase to make it less weird
though:
"if the submission is successful, a positive ring_idx <= UINT16_MAX is
 returned, otherwise a negative number is returned."

> > + *
> > + * The last API was used to issue doorbell to hardware, and also there are flags
> > + * (@see RTE_DMA_OP_FLAG_SUBMIT) parameter of the first three APIs could do the
> > + * same work.
> 
> I don't understand this sentence.
> You mean rte_dmadev_submit function?
> Why past tense "was"?
> Why having a redundant function?
> 

Just because there are two ways to do something does not mean that one of
them is redundant, as both may be more suitable for different situations.
When enqueuing a set of jobs to the device, having a separate submit
outside a loop makes for clearer code than having a check for the last
iteration inside the loop to set a special submit flag.  However, for cases
where one item alone is to be submitted or there is a small set of jobs to
be submitted sequentially, having a submit flag provides a lower-overhead
way of doing the submission while still keeping the code clean.

> > + *
> > + * The second part is to obtain the result of requests:
> > + *     - rte_dmadev_completed()
> > + *         - return the number of operation requests completed successfully.
> > + *     - rte_dmadev_completed_status()
> > + *         - return the number of operation requests completed.
> > + *
> > + * @note The two completed APIs also support return the last completed
> > + * operation's ring_idx.
> 
> Not sure why this note here.
> 
> > + * @note If the dmadev works in silent mode (@see RTE_DMADEV_CAPA_SILENT),
> > + * application does not invoke the above two completed APIs.
> > + *
> > + * About the ring_idx which enqueue APIs (e.g. rte_dmadev_copy()
> > + * rte_dmadev_fill()) returned, the rules are as follows:
> 
> I feel a word is missing above.
> 
> > + *     - ring_idx for each virtual DMA channel are independent.
> > + *     - For a virtual DMA channel, the ring_idx is monotonically incremented,
> > + *       when it reach UINT16_MAX, it wraps back to zero.
> > + *     - This ring_idx can be used by applications to track per-operation
> > + *       metadata in an application-defined circular ring.
> > + *     - The initial ring_idx of a virtual DMA channel is zero, after the
> > + *       device is stopped, the ring_idx needs to be reset to zero.
> > + *
> > + * One example:
> > + *     - step-1: start one dmadev
> > + *     - step-2: enqueue a copy operation, the ring_idx return is 0
> > + *     - step-3: enqueue a copy operation again, the ring_idx return is 1
> > + *     - ...
> > + *     - step-101: stop the dmadev
> > + *     - step-102: start the dmadev
> > + *     - step-103: enqueue a copy operation, the ring_idx return is 0
> > + *     - ...
> > + *     - step-x+0: enqueue a fill operation, the ring_idx return is 65535
> > + *     - step-x+1: enqueue a copy operation, the ring_idx return is 0
> > + *     - ...
> > + *
> > + * The DMA operation address used in enqueue APIs (i.e. rte_dmadev_copy(),
> > + * rte_dmadev_copy_sg(), rte_dmadev_fill()) defined as rte_iova_t type. The
> 
> "is" defined
> 
> > + * dmadev supports two types of address: memory address and device address.
> 
> Please try to start new sentences on a new line.
> 
> > + *
> > + * - memory address: the source and destination address of the memory-to-memory
> > + * transfer type, or the source address of the memory-to-device transfer type,
> > + * or the destination address of the device-to-memory transfer type.
> > + * @note If the device support SVA (@see RTE_DMADEV_CAPA_SVA), the memory
> > + * address can be any VA address, otherwise it must be an IOVA address.
> > + *
> > + * - device address: the source and destination address of the device-to-device
> > + * transfer type, or the source address of the device-to-memory transfer type,
> > + * or the destination address of the memory-to-device transfer type.
> > + *
> > + * By default, all the functions of the dmadev API exported by a PMD are
> 
> What do you mean "by default"? Is there some exceptions?
> 
> > + * lock-free functions which assume to not be invoked in parallel on different
> > + * logical cores to work on the same target dmadev object.
> > + * @note Different virtual DMA channels on the same dmadev *DO NOT* support
> > + * parallel invocation because these virtual DMA channels share the same
> > + * HW-DMA-channel.
> > + *
> > + */
> [...]
> > +#define RTE_DMADEV_NAME_MAX_LEN	RTE_DEV_NAME_MAX_LEN
> 
> Why not using RTE_DEV_NAME_MAX_LEN directly?
> If you keep it, it should be commented, explaining whether it takes '\0'
> into account or not.
> 
> > +__rte_experimental
> > +bool
> > +rte_dmadev_is_valid_dev(uint16_t dev_id);
> 
> I would suggest dropping the final "_dev" in the function name.
> 

The alternative, which I would support, is replacing "rte_dmadev" with
"rte_dma" across the API. This would then become "rte_dma_is_valid_dev"
which is clearer, since the dev is not part of the standard prefix. It also
would fit in with a possible future function of "rte_dma_is_valid_vchan"
for instance.

> 
> > +uint16_t
> > +rte_dmadev_count(void);
> 
> It would be safer to name it rte_dmadev_count_avail
> in case we need new kind of device count later.
> 

If we change "dmadev" to "dma" this could then be
"rte_dma_count_avail_dev".

> > +
> > +/* Enumerates DMA device capabilities. */
> 
> You should group them with a doxygen group syntax.
> See https://patches.dpdk.org/project/dpdk/patch/20210830104232.598703-1-thomas@monjalon.net/
> 
> > +#define RTE_DMADEV_CAPA_MEM_TO_MEM	(1ull << 0)
> 
> Please use RTE_BIT macro (32 or 64).
> 
> > +/**< DMA device support memory-to-memory transfer.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> 
> It is preferred to have documentation before the item.
> 
> [...]
> 
> > +/**
> > + * A structure used to retrieve the information of a DMA device.
> > + */
> > +struct rte_dmadev_info {
> > +	struct rte_device *device; /**< Generic Device information. */
> 
> Please do not expose this.
> 
> > +	uint64_t dev_capa; /**< Device capabilities (RTE_DMADEV_CAPA_*). */
> > +	uint16_t max_vchans;
> > +	/**< Maximum number of virtual DMA channels supported. */
> > +	uint16_t max_desc;
> > +	/**< Maximum allowed number of virtual DMA channel descriptors. */
> > +	uint16_t min_desc;
> > +	/**< Minimum allowed number of virtual DMA channel descriptors. */
> > +	uint16_t max_sges;
> > +	/**< Maximum number of source or destination scatter-gather entry
> > +	 * supported.
> > +	 * If the device does not support COPY_SG capability, this value can be
> > +	 * zero.
> > +	 * If the device supports COPY_SG capability, then rte_dmadev_copy_sg()
> > +	 * parameter nb_src/nb_dst should not exceed this value.
> > +	 */
> > +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured. */
> 
> What about adding NUMA node?
> 
>     /* Local NUMA memory ID. -1 if unknown. */
>     int16_t numa_node;
> 

That was omitted as it could be got through the device structure. If device
is removed, we need to ensure all fields needed from device, such as numa
node, are made available here.

> > +};
> 
> > +int
> > +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);
>
<snip>
  
Thomas Monjalon Sept. 9, 2021, 11:29 a.m. UTC | #3
09/09/2021 13:18, Bruce Richardson:
> On Thu, Sep 09, 2021 at 12:33:00PM +0200, Thomas Monjalon wrote:
> > 07/09/2021 14:56, Chengwen Feng:
> > > + * The first three APIs are used to submit the operation request to the virtual
> > > + * DMA channel, if the submission is successful, an uint16_t ring_idx is
> > > + * returned, otherwise a negative number is returned.
> > 
> > unsigned or negative? looks weird.
> 
> May be, but it works well. We could perhaps rephase to make it less weird
> though:
> "if the submission is successful, a positive ring_idx <= UINT16_MAX is
>  returned, otherwise a negative number is returned."

I am advocating for int16_t,
it makes a lot of things simpler.

> > > + *
> > > + * The last API was used to issue doorbell to hardware, and also there are flags
> > > + * (@see RTE_DMA_OP_FLAG_SUBMIT) parameter of the first three APIs could do the
> > > + * same work.
> > 
> > I don't understand this sentence.
> > You mean rte_dmadev_submit function?
> > Why past tense "was"?
> > Why having a redundant function?
> > 
> 
> Just because there are two ways to do something does not mean that one of
> them is redundant, as both may be more suitable for different situations.

I agree.

> When enqueuing a set of jobs to the device, having a separate submit
> outside a loop makes for clearer code than having a check for the last
> iteration inside the loop to set a special submit flag.  However, for cases
> where one item alone is to be submitted or there is a small set of jobs to
> be submitted sequentially, having a submit flag provides a lower-overhead
> way of doing the submission while still keeping the code clean.

This kind of explanation may be missing in doxygen?

> > > +bool
> > > +rte_dmadev_is_valid_dev(uint16_t dev_id);
> > 
> > I would suggest dropping the final "_dev" in the function name.
> > 
> 
> The alternative, which I would support, is replacing "rte_dmadev" with
> "rte_dma" across the API. This would then become "rte_dma_is_valid_dev"
> which is clearer, since the dev is not part of the standard prefix. It also
> would fit in with a possible future function of "rte_dma_is_valid_vchan"
> for instance.

Yes
The question is whether it would make sense to reserver rte_dma_ prefix
for some DMA functions which would be outside of dmadev lib?
If you think that all DMA functions will be in dmadev,
then yes we can shorten the prefix to rte_dma_.

> > > +uint16_t
> > > +rte_dmadev_count(void);
> > 
> > It would be safer to name it rte_dmadev_count_avail
> > in case we need new kind of device count later.
> > 
> 
> If we change "dmadev" to "dma" this could then be
> "rte_dma_count_avail_dev".

Yes

> > > +/**
> > > + * A structure used to retrieve the information of a DMA device.
> > > + */
> > > +struct rte_dmadev_info {
> > > +	struct rte_device *device; /**< Generic Device information. */
> > 
> > Please do not expose this.
> > 
> > > +	uint64_t dev_capa; /**< Device capabilities (RTE_DMADEV_CAPA_*). */
> > > +	uint16_t max_vchans;
> > > +	/**< Maximum number of virtual DMA channels supported. */
> > > +	uint16_t max_desc;
> > > +	/**< Maximum allowed number of virtual DMA channel descriptors. */
> > > +	uint16_t min_desc;
> > > +	/**< Minimum allowed number of virtual DMA channel descriptors. */
> > > +	uint16_t max_sges;
> > > +	/**< Maximum number of source or destination scatter-gather entry
> > > +	 * supported.
> > > +	 * If the device does not support COPY_SG capability, this value can be
> > > +	 * zero.
> > > +	 * If the device supports COPY_SG capability, then rte_dmadev_copy_sg()
> > > +	 * parameter nb_src/nb_dst should not exceed this value.
> > > +	 */
> > > +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured. */
> > 
> > What about adding NUMA node?
> > 
> >     /* Local NUMA memory ID. -1 if unknown. */
> >     int16_t numa_node;
> > 
> 
> That was omitted as it could be got through the device structure. If device
> is removed, we need to ensure all fields needed from device, such as numa
> node, are made available here.

Ah yes, forgot about rte_device :)
Yes please remove rte_device from this struct.
  
Bruce Richardson Sept. 9, 2021, 12:45 p.m. UTC | #4
On Thu, Sep 09, 2021 at 01:29:33PM +0200, Thomas Monjalon wrote:
> 09/09/2021 13:18, Bruce Richardson:
> > On Thu, Sep 09, 2021 at 12:33:00PM +0200, Thomas Monjalon wrote:
> > > 07/09/2021 14:56, Chengwen Feng:
> > > > + * The first three APIs are used to submit the operation request to the virtual
> > > > + * DMA channel, if the submission is successful, an uint16_t ring_idx is
> > > > + * returned, otherwise a negative number is returned.
> > > 
> > > unsigned or negative? looks weird.
> > 
> > May be, but it works well. We could perhaps rephase to make it less weird
> > though:
> > "if the submission is successful, a positive ring_idx <= UINT16_MAX is
> >  returned, otherwise a negative number is returned."
> 
> I am advocating for int16_t,
> it makes a lot of things simpler.
> 

No, it doesn't work as you can't have wrap-around of the IDs once you use
signed values - and that impacts both the end app and the internals of the
drivers. Let's keep it as-is otherwise it will have massive impacts -
including potential perf impacts.

> > > > + *
> > > > + * The last API was used to issue doorbell to hardware, and also there are flags
> > > > + * (@see RTE_DMA_OP_FLAG_SUBMIT) parameter of the first three APIs could do the
> > > > + * same work.
> > > 
> > > I don't understand this sentence.
> > > You mean rte_dmadev_submit function?
> > > Why past tense "was"?
> > > Why having a redundant function?
> > > 
> > 
> > Just because there are two ways to do something does not mean that one of
> > them is redundant, as both may be more suitable for different situations.
> 
> I agree.
> 
> > When enqueuing a set of jobs to the device, having a separate submit
> > outside a loop makes for clearer code than having a check for the last
> > iteration inside the loop to set a special submit flag.  However, for cases
> > where one item alone is to be submitted or there is a small set of jobs to
> > be submitted sequentially, having a submit flag provides a lower-overhead
> > way of doing the submission while still keeping the code clean.
> 
> This kind of explanation may be missing in doxygen?
> 

It can be added, sure.

> > > > +bool
> > > > +rte_dmadev_is_valid_dev(uint16_t dev_id);
> > > 
> > > I would suggest dropping the final "_dev" in the function name.
> > > 
> > 
> > The alternative, which I would support, is replacing "rte_dmadev" with
> > "rte_dma" across the API. This would then become "rte_dma_is_valid_dev"
> > which is clearer, since the dev is not part of the standard prefix. It also
> > would fit in with a possible future function of "rte_dma_is_valid_vchan"
> > for instance.
> 
> Yes
> The question is whether it would make sense to reserver rte_dma_ prefix
> for some DMA functions which would be outside of dmadev lib?
> If you think that all DMA functions will be in dmadev,
> then yes we can shorten the prefix to rte_dma_.
> 

Well, any DPDK dma functions which are not in dma library should have the
prefix of the library they are in e.g. rte_eal_dma_*, rte_pci_dma_*
Therefore, I don't think name conflicts should be an issue, and I like
having less typing to do in function names (and I believe Morten was
strongly proposing this previously too)

> > > > +uint16_t
> > > > +rte_dmadev_count(void);
> > > 
> > > It would be safer to name it rte_dmadev_count_avail
> > > in case we need new kind of device count later.
> > > 
> > 
> > If we change "dmadev" to "dma" this could then be
> > "rte_dma_count_avail_dev".
> 
> Yes
> 
> > > > +/**
> > > > + * A structure used to retrieve the information of a DMA device.
> > > > + */
> > > > +struct rte_dmadev_info {
> > > > +	struct rte_device *device; /**< Generic Device information. */
> > > 
> > > Please do not expose this.
> > > 
> > > > +	uint64_t dev_capa; /**< Device capabilities (RTE_DMADEV_CAPA_*). */
> > > > +	uint16_t max_vchans;
> > > > +	/**< Maximum number of virtual DMA channels supported. */
> > > > +	uint16_t max_desc;
> > > > +	/**< Maximum allowed number of virtual DMA channel descriptors. */
> > > > +	uint16_t min_desc;
> > > > +	/**< Minimum allowed number of virtual DMA channel descriptors. */
> > > > +	uint16_t max_sges;
> > > > +	/**< Maximum number of source or destination scatter-gather entry
> > > > +	 * supported.
> > > > +	 * If the device does not support COPY_SG capability, this value can be
> > > > +	 * zero.
> > > > +	 * If the device supports COPY_SG capability, then rte_dmadev_copy_sg()
> > > > +	 * parameter nb_src/nb_dst should not exceed this value.
> > > > +	 */
> > > > +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured. */
> > > 
> > > What about adding NUMA node?
> > > 
> > >     /* Local NUMA memory ID. -1 if unknown. */
> > >     int16_t numa_node;
> > > 
> > 
> > That was omitted as it could be got through the device structure. If device
> > is removed, we need to ensure all fields needed from device, such as numa
> > node, are made available here.
> 
> Ah yes, forgot about rte_device :)
> Yes please remove rte_device from this struct.
>
  
fengchengwen Sept. 9, 2021, 1:33 p.m. UTC | #5
Thanks for reviewing, mostly OK, a few inlined.

On 2021/9/9 18:33, Thomas Monjalon wrote:
> Hi,
> 
> I am having a surface look at the API.
> I hope we can do better than previous libs.
> 
> 07/09/2021 14:56, Chengwen Feng:
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -496,6 +496,10 @@ F: drivers/raw/skeleton/
>>  F: app/test/test_rawdev.c
>>  F: doc/guides/prog_guide/rawdev.rst
>>  
>> +DMA device API - EXPERIMENTAL
>> +M: Chengwen Feng <fengchengwen@huawei.com>
>> +F: lib/dmadev/
> 
> 
> I think it should before (preferably) or after eventdev,
> but let's keep rawdev as the last one.
> 
> Then please apply the same order in other non-alphabetical lists (doc, meson, etc).
> 
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2021 HiSilicon Limited.
>> + * Copyright(c) 2021 Intel Corporation.
>> + * Copyright(c) 2021 Marvell International Ltd.
>> + * Copyright(c) 2021 SmartShare Systems.
>> + */
> 
> No need for final dot in copyright lines.
> 
>> +#ifndef _RTE_DMADEV_H_
>> +#define _RTE_DMADEV_H_
> 
> No need for surrounding underscores.
> 
>> +
>> +/**
>> + * @file rte_dmadev.h
>> + *
>> + * RTE DMA (Direct Memory Access) device APIs.
> 
> RTE has no meaning when used in a sentence.
> And given it is a DPDK library, you don't really need to specify.
> I would also remove the final "s" as the library is one interface.
> 
>> + *
>> + * The DMA framework is built on the following model:
>> + *
>> + *     ---------------   ---------------       ---------------
>> + *     | virtual DMA |   | virtual DMA |       | virtual DMA |
>> + *     | channel     |   | channel     |       | channel     |
>> + *     ---------------   ---------------       ---------------
>> + *            |                |                      |
>> + *            ------------------                      |
>> + *                     |                              |
>> + *               ------------                    ------------
>> + *               |  dmadev  |                    |  dmadev  |
>> + *               ------------                    ------------
>> + *                     |                              |
>> + *            ------------------               ------------------
>> + *            | HW-DMA-channel |               | HW-DMA-channel |
>> + *            ------------------               ------------------
>> + *                     |                              |
>> + *                     --------------------------------
>> + *                                     |
>> + *                           ---------------------
>> + *                           | HW-DMA-Controller |
>> + *                           ---------------------
> 
> You don't hyphens between the words I think.
> 
>> + *
>> + * The DMA controller could have multiple HW-DMA-channels (aka. HW-DMA-queues),
>> + * each HW-DMA-channel should be represented by a dmadev.
>> + *
>> + * The dmadev could create multiple virtual DMA channels, each virtual DMA
>> + * channel represents a different transfer context. The DMA operation request
>> + * must be submitted to the virtual DMA channel. e.g. Application could create
>> + * virtual DMA channel 0 for memory-to-memory transfer scenario, and create
>> + * virtual DMA channel 1 for memory-to-device transfer scenario.
> 
> What is the benefit of virtual channels compared to have separate dmadevs
> for the same HW channel?

This design is from the previous discussion [1]. If a dmadev is created for each
virtual channel, there are many associations between the dmadevs. For example,
channel operations of some devices need to interact with the kernel, the corresponding
kernel operation handles need to be shared among multiple dmadevs. It's going to get
more complicated.

[1] https://lore.kernel.org/dpdk-dev/c4a0ee30-f7b8-f8a1-463c-8eedaec82aea@huawei.com/

> 
>> + *
>> + * The dmadev are dynamically allocated by rte_dmadev_pmd_allocate() during the
>> + * PCI/SoC device probing phase performed at EAL initialization time. And could
> 
> Not clear what is this phase. Do you mean bus probing?

yes, it's bus probing

> 
>> + * be released by rte_dmadev_pmd_release() during the PCI/SoC device removing
>> + * phase.
> 
> Again what is this phase?
> I think freeing should be done only via the "close" function.

OK
The allocate/release will be reconstructed with reference to rte_eth_dev_pci_generic_probe.

> 
>> + *
>> + * This framework uses 'uint16_t dev_id' as the device identifier of a dmadev,
>> + * and 'uint16_t vchan' as the virtual DMA channel identifier in one dmadev.
> 
> I think it is better to use signed int16_t so you can express "none" in the API,
> which can simplify some functions and error management.
> 
>> + *
>> + * The functions exported by the dmadev API to setup a device designated by its
>> + * device identifier must be invoked in the following order:
>> + *     - rte_dmadev_configure()
>> + *     - rte_dmadev_vchan_setup()
>> + *     - rte_dmadev_start()
>> + *
>> + * Then, the application can invoke dataplane APIs to process jobs.
> 
> You mean "dataplane functions".
> 
>> + *
>> + * If the application wants to change the configuration (i.e. invoke
>> + * rte_dmadev_configure() or rte_dmadev_vchan_setup()), it must invoke
>> + * rte_dmadev_stop() first to stop the device and then do the reconfiguration
>> + * before invoking rte_dmadev_start() again. The dataplane APIs should not be
> 
> again, APIs -> functions
> 
>> + * invoked when the device is stopped.
>> + *
>> + * Finally, an application can close a dmadev by invoking the
>> + * rte_dmadev_close() function.
>> + *
>> + * The dataplane APIs include two parts:
>> + * The first part is the submission of operation requests:
>> + *     - rte_dmadev_copy()
>> + *     - rte_dmadev_copy_sg()
>> + *     - rte_dmadev_fill()
>> + *     - rte_dmadev_submit()
>> + *
>> + * These APIs could work with different virtual DMA channels which have
>> + * different contexts.
>> + *
>> + * The first three APIs are used to submit the operation request to the virtual
>> + * DMA channel, if the submission is successful, an uint16_t ring_idx is
>> + * returned, otherwise a negative number is returned.
> 
> unsigned or negative? looks weird.
> 
>> + *
>> + * The last API was used to issue doorbell to hardware, and also there are flags
>> + * (@see RTE_DMA_OP_FLAG_SUBMIT) parameter of the first three APIs could do the
>> + * same work.
> 
> I don't understand this sentence.
> You mean rte_dmadev_submit function?
> Why past tense "was"?
> Why having a redundant function?
> 
>> + *
>> + * The second part is to obtain the result of requests:
>> + *     - rte_dmadev_completed()
>> + *         - return the number of operation requests completed successfully.
>> + *     - rte_dmadev_completed_status()
>> + *         - return the number of operation requests completed.
>> + *
>> + * @note The two completed APIs also support return the last completed
>> + * operation's ring_idx.
> 
> Not sure why this note here.
> 
>> + * @note If the dmadev works in silent mode (@see RTE_DMADEV_CAPA_SILENT),
>> + * application does not invoke the above two completed APIs.
>> + *
>> + * About the ring_idx which enqueue APIs (e.g. rte_dmadev_copy()
>> + * rte_dmadev_fill()) returned, the rules are as follows:
> 
> I feel a word is missing above.

Can you point it out specifically ?
PS: I specifically examined by access https://www.nounplus.net/grammarcheck/ and found
it prompts the 'enqueue' to 'enqueues or enqueued'.

> 
>> + *     - ring_idx for each virtual DMA channel are independent.
>> + *     - For a virtual DMA channel, the ring_idx is monotonically incremented,
>> + *       when it reach UINT16_MAX, it wraps back to zero.
>> + *     - This ring_idx can be used by applications to track per-operation
>> + *       metadata in an application-defined circular ring.
>> + *     - The initial ring_idx of a virtual DMA channel is zero, after the
>> + *       device is stopped, the ring_idx needs to be reset to zero.
>> + *
>> + * One example:
>> + *     - step-1: start one dmadev
>> + *     - step-2: enqueue a copy operation, the ring_idx return is 0
>> + *     - step-3: enqueue a copy operation again, the ring_idx return is 1
>> + *     - ...
>> + *     - step-101: stop the dmadev
>> + *     - step-102: start the dmadev
>> + *     - step-103: enqueue a copy operation, the ring_idx return is 0
>> + *     - ...
>> + *     - step-x+0: enqueue a fill operation, the ring_idx return is 65535
>> + *     - step-x+1: enqueue a copy operation, the ring_idx return is 0
>> + *     - ...
>> + *
>> + * The DMA operation address used in enqueue APIs (i.e. rte_dmadev_copy(),
>> + * rte_dmadev_copy_sg(), rte_dmadev_fill()) defined as rte_iova_t type. The
> 
> "is" defined
> 
>> + * dmadev supports two types of address: memory address and device address.
> 
> Please try to start new sentences on a new line.
> 
>> + *
>> + * - memory address: the source and destination address of the memory-to-memory
>> + * transfer type, or the source address of the memory-to-device transfer type,
>> + * or the destination address of the device-to-memory transfer type.
>> + * @note If the device support SVA (@see RTE_DMADEV_CAPA_SVA), the memory
>> + * address can be any VA address, otherwise it must be an IOVA address.
>> + *
>> + * - device address: the source and destination address of the device-to-device
>> + * transfer type, or the source address of the device-to-memory transfer type,
>> + * or the destination address of the memory-to-device transfer type.
>> + *
>> + * By default, all the functions of the dmadev API exported by a PMD are
> 
> What do you mean "by default"? Is there some exceptions?
> 
>> + * lock-free functions which assume to not be invoked in parallel on different
>> + * logical cores to work on the same target dmadev object.
>> + * @note Different virtual DMA channels on the same dmadev *DO NOT* support
>> + * parallel invocation because these virtual DMA channels share the same
>> + * HW-DMA-channel.
>> + *
>> + */
> [...]
>> +#define RTE_DMADEV_NAME_MAX_LEN	RTE_DEV_NAME_MAX_LEN
> 
> Why not using RTE_DEV_NAME_MAX_LEN directly?
> If you keep it, it should be commented, explaining whether it takes '\0'
> into account or not.
> 
>> +__rte_experimental
>> +bool
>> +rte_dmadev_is_valid_dev(uint16_t dev_id);
> 
> I would suggest dropping the final "_dev" in the function name.
> 
> 
>> +uint16_t
>> +rte_dmadev_count(void);
> 
> It would be safer to name it rte_dmadev_count_avail
> in case we need new kind of device count later.
> 
>> +
>> +/* Enumerates DMA device capabilities. */
> 
> You should group them with a doxygen group syntax.
> See https://patches.dpdk.org/project/dpdk/patch/20210830104232.598703-1-thomas@monjalon.net/
> 
>> +#define RTE_DMADEV_CAPA_MEM_TO_MEM	(1ull << 0)
> 
> Please use RTE_BIT macro (32 or 64).
> 
>> +/**< DMA device support memory-to-memory transfer.
>> + *
>> + * @see struct rte_dmadev_info::dev_capa
>> + */
> 
> It is preferred to have documentation before the item.

Is this particularly strong?
I use postfix comment style for whole doxygen comments. I think it's better to maintain a unified
style.

> 
> [...]
> 
>> +/**
>> + * A structure used to retrieve the information of a DMA device.
>> + */
>> +struct rte_dmadev_info {
>> +	struct rte_device *device; /**< Generic Device information. */
> 
> Please do not expose this.
> 
>> +	uint64_t dev_capa; /**< Device capabilities (RTE_DMADEV_CAPA_*). */
>> +	uint16_t max_vchans;
>> +	/**< Maximum number of virtual DMA channels supported. */
>> +	uint16_t max_desc;
>> +	/**< Maximum allowed number of virtual DMA channel descriptors. */
>> +	uint16_t min_desc;
>> +	/**< Minimum allowed number of virtual DMA channel descriptors. */
>> +	uint16_t max_sges;
>> +	/**< Maximum number of source or destination scatter-gather entry
>> +	 * supported.
>> +	 * If the device does not support COPY_SG capability, this value can be
>> +	 * zero.
>> +	 * If the device supports COPY_SG capability, then rte_dmadev_copy_sg()
>> +	 * parameter nb_src/nb_dst should not exceed this value.
>> +	 */
>> +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured. */
> 
> What about adding NUMA node?
> 
>     /* Local NUMA memory ID. -1 if unknown. */
>     int16_t numa_node;
> 
>> +};
> 
>> +int
>> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);
> 
> In .h files, the return type should not be on a separate line.
> 
>> +
>> +/**
>> + * A structure used to configure a DMA device.
>> + */
> 
> You should mention where it is used with @see.
> 
>> +struct rte_dmadev_conf {
>> +	uint16_t nb_vchans;
>> +	/**< The number of virtual DMA channels to set up for the DMA device.
>> +	 * This value cannot be greater than the field 'max_vchans' of struct
>> +	 * rte_dmadev_info which get from rte_dmadev_info_get().
>> +	 */
>> +	bool enable_silent;
>> +	/**< Indicates whether to enable silent mode.
>> +	 * false-default mode, true-silent mode.
>> +	 * This value can be set to true only when the SILENT capability is
>> +	 * supported.
>> +	 *
>> +	 * @see RTE_DMADEV_CAPA_SILENT
>> +	 */
>> +};
> [...]
>> +#define RTE_DMADEV_ALL_VCHAN	0xFFFFu
> 
> Please do not add this kind of constant without a doxygen comment.
> 
> 
> It seems you don't manage the maximum number of devices.
> It is fixed in the .c file:
> 	struct rte_dmadev rte_dmadevices[RTE_DMADEV_MAX_DEVS];
> 
> Instead I would suggest a more dynamic approach with an init function,
> so the application can extend it before calling rte_eal_init.
> Please see how it is implemented here:
> https://patches.dpdk.org/project/dpdk/patch/20210730135533.417611-2-thomas@monjalon.net/
> 
> Above patch could also inspire you to start some docs in this patch.
> 
> 
> This series add one file per patch.
> Instead it would be better to have groups of features per patch,
> meaning the implementation and the driver interface should be
> in the same patch.
> You can split like this:
> 	1/ device allocation
> 	2/ configuration and start/stop
> 	3/ dataplane functions
> 
> I would suggest 2 more patches:
> 	4/ event notification
> see https://patches.dpdk.org/project/dpdk/patch/20210730135533.417611-3-thomas@monjalon.net/
> 	5/ multi-process
> see https://patches.dpdk.org/project/dpdk/patch/20210730135533.417611-5-thomas@monjalon.net/

The split mode you recommend is better.
But is this particularly strong ?  Because many acked-by and reviewed-by base on stand-alone file.
Does this division mean that a new acked/reviewed ?

> 
> 
> Thanks for the work
> 
> 
> .
>
  
fengchengwen Sept. 9, 2021, 1:54 p.m. UTC | #6
On 2021/9/9 20:45, Bruce Richardson wrote:
> On Thu, Sep 09, 2021 at 01:29:33PM +0200, Thomas Monjalon wrote:
>> 09/09/2021 13:18, Bruce Richardson:
>>> On Thu, Sep 09, 2021 at 12:33:00PM +0200, Thomas Monjalon wrote:
>>>> 07/09/2021 14:56, Chengwen Feng:
>>>>> + * The first three APIs are used to submit the operation request to the virtual
>>>>> + * DMA channel, if the submission is successful, an uint16_t ring_idx is
>>>>> + * returned, otherwise a negative number is returned.
>>>>
>>>> unsigned or negative? looks weird.
>>>
>>> May be, but it works well. We could perhaps rephase to make it less weird
>>> though:
>>> "if the submission is successful, a positive ring_idx <= UINT16_MAX is
>>>  returned, otherwise a negative number is returned."
>>
>> I am advocating for int16_t,
>> it makes a lot of things simpler.
>>
> 
> No, it doesn't work as you can't have wrap-around of the IDs once you use
> signed values - and that impacts both the end app and the internals of the
> drivers. Let's keep it as-is otherwise it will have massive impacts -
> including potential perf impacts.
> 
>>>>> + *
>>>>> + * The last API was used to issue doorbell to hardware, and also there are flags
>>>>> + * (@see RTE_DMA_OP_FLAG_SUBMIT) parameter of the first three APIs could do the
>>>>> + * same work.
>>>>
>>>> I don't understand this sentence.
>>>> You mean rte_dmadev_submit function?
>>>> Why past tense "was"?
>>>> Why having a redundant function?
>>>>
>>>
>>> Just because there are two ways to do something does not mean that one of
>>> them is redundant, as both may be more suitable for different situations.
>>
>> I agree.
>>
>>> When enqueuing a set of jobs to the device, having a separate submit
>>> outside a loop makes for clearer code than having a check for the last
>>> iteration inside the loop to set a special submit flag.  However, for cases
>>> where one item alone is to be submitted or there is a small set of jobs to
>>> be submitted sequentially, having a submit flag provides a lower-overhead
>>> way of doing the submission while still keeping the code clean.
>>
>> This kind of explanation may be missing in doxygen?
>>
> 
> It can be added, sure.
> 
>>>>> +bool
>>>>> +rte_dmadev_is_valid_dev(uint16_t dev_id);
>>>>
>>>> I would suggest dropping the final "_dev" in the function name.
>>>>
>>>
>>> The alternative, which I would support, is replacing "rte_dmadev" with
>>> "rte_dma" across the API. This would then become "rte_dma_is_valid_dev"
>>> which is clearer, since the dev is not part of the standard prefix. It also
>>> would fit in with a possible future function of "rte_dma_is_valid_vchan"
>>> for instance.
>>
>> Yes
>> The question is whether it would make sense to reserver rte_dma_ prefix
>> for some DMA functions which would be outside of dmadev lib?
>> If you think that all DMA functions will be in dmadev,
>> then yes we can shorten the prefix to rte_dma_.
>>
> 
> Well, any DPDK dma functions which are not in dma library should have the
> prefix of the library they are in e.g. rte_eal_dma_*, rte_pci_dma_*
> Therefore, I don't think name conflicts should be an issue, and I like
> having less typing to do in function names (and I believe Morten was
> strongly proposing this previously too)

The dmadev is rather short, if change I prefer all public API with rte_dma_ prefix,
and don't have rte_dma_dev_ prefix for such start/stop/close. (ps: the rte_eth_ also
have rte_eth_dev_close which is painful for OCD).

Also should the filename(e.g. rte_dmadev.h) and directory-name(lib/dmadev) also change ?

> 
>>>>> +uint16_t
>>>>> +rte_dmadev_count(void);
>>>>
>>>> It would be safer to name it rte_dmadev_count_avail
>>>> in case we need new kind of device count later.
>>>>
>>>
>>> If we change "dmadev" to "dma" this could then be
>>> "rte_dma_count_avail_dev".
>>
>> Yes
>>
>>>>> +/**
>>>>> + * A structure used to retrieve the information of a DMA device.
>>>>> + */
>>>>> +struct rte_dmadev_info {
>>>>> +	struct rte_device *device; /**< Generic Device information. */
>>>>
>>>> Please do not expose this.
>>>>
>>>>> +	uint64_t dev_capa; /**< Device capabilities (RTE_DMADEV_CAPA_*). */
>>>>> +	uint16_t max_vchans;
>>>>> +	/**< Maximum number of virtual DMA channels supported. */
>>>>> +	uint16_t max_desc;
>>>>> +	/**< Maximum allowed number of virtual DMA channel descriptors. */
>>>>> +	uint16_t min_desc;
>>>>> +	/**< Minimum allowed number of virtual DMA channel descriptors. */
>>>>> +	uint16_t max_sges;
>>>>> +	/**< Maximum number of source or destination scatter-gather entry
>>>>> +	 * supported.
>>>>> +	 * If the device does not support COPY_SG capability, this value can be
>>>>> +	 * zero.
>>>>> +	 * If the device supports COPY_SG capability, then rte_dmadev_copy_sg()
>>>>> +	 * parameter nb_src/nb_dst should not exceed this value.
>>>>> +	 */
>>>>> +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured. */
>>>>
>>>> What about adding NUMA node?
>>>>
>>>>     /* Local NUMA memory ID. -1 if unknown. */
>>>>     int16_t numa_node;
>>>>
>>>
>>> That was omitted as it could be got through the device structure. If device
>>> is removed, we need to ensure all fields needed from device, such as numa
>>> node, are made available here.
>>
>> Ah yes, forgot about rte_device :)
>> Yes please remove rte_device from this struct.
>>
> .
>
  
Thomas Monjalon Sept. 9, 2021, 2:19 p.m. UTC | #7
09/09/2021 15:33, fengchengwen:
> On 2021/9/9 18:33, Thomas Monjalon wrote:
> > 07/09/2021 14:56, Chengwen Feng:
> >> + * The DMA controller could have multiple HW-DMA-channels (aka. HW-DMA-queues),
> >> + * each HW-DMA-channel should be represented by a dmadev.
> >> + *
> >> + * The dmadev could create multiple virtual DMA channels, each virtual DMA
> >> + * channel represents a different transfer context. The DMA operation request
> >> + * must be submitted to the virtual DMA channel. e.g. Application could create
> >> + * virtual DMA channel 0 for memory-to-memory transfer scenario, and create
> >> + * virtual DMA channel 1 for memory-to-device transfer scenario.
> > 
> > What is the benefit of virtual channels compared to have separate dmadevs
> > for the same HW channel?
> 
> This design is from the previous discussion [1]. If a dmadev is created for each
> virtual channel, there are many associations between the dmadevs. For example,
> channel operations of some devices need to interact with the kernel, the corresponding
> kernel operation handles need to be shared among multiple dmadevs. It's going to get
> more complicated.
> 
> [1] https://lore.kernel.org/dpdk-dev/c4a0ee30-f7b8-f8a1-463c-8eedaec82aea@huawei.com/

OK thanks for the explanation.

[...]
> >> + * be released by rte_dmadev_pmd_release() during the PCI/SoC device removing
> >> + * phase.
> > 
> > Again what is this phase?
> > I think freeing should be done only via the "close" function.
> 
> OK
> The allocate/release will be reconstructed with reference to rte_eth_dev_pci_generic_probe.

You shouldn't always mimic ethdev, there can be some misconceptions :)
I think you don't need PCI specific helper.

[...]
> >> + * @note If the dmadev works in silent mode (@see RTE_DMADEV_CAPA_SILENT),
> >> + * application does not invoke the above two completed APIs.
> >> + *
> >> + * About the ring_idx which enqueue APIs (e.g. rte_dmadev_copy()
> >> + * rte_dmadev_fill()) returned, the rules are as follows:
> > 
> > I feel a word is missing above.
> 
> Can you point it out specifically ?
> PS: I specifically examined by access https://www.nounplus.net/grammarcheck/ and found
> it prompts the 'enqueue' to 'enqueues or enqueued'.

After second read, I think it is a tense problem.
What about "returned" -> "return" ?

[...]
> >> +/**< DMA device support memory-to-memory transfer.
> >> + *
> >> + * @see struct rte_dmadev_info::dev_capa
> >> + */
> > 
> > It is preferred to have documentation before the item.
> 
> Is this particularly strong?
> I use postfix comment style for whole doxygen comments. I think it's better to maintain a unified
> style.

In general prefix comment is preferred.
Postfix comments are OK for short comments on the same line.

[...]
> > This series add one file per patch.
> > Instead it would be better to have groups of features per patch,
> > meaning the implementation and the driver interface should be
> > in the same patch.
> > You can split like this:
> > 	1/ device allocation
> > 	2/ configuration and start/stop
> > 	3/ dataplane functions
> > 
> > I would suggest 2 more patches:
> > 	4/ event notification
> > see https://patches.dpdk.org/project/dpdk/patch/20210730135533.417611-3-thomas@monjalon.net/
> > 	5/ multi-process
> > see https://patches.dpdk.org/project/dpdk/patch/20210730135533.417611-5-thomas@monjalon.net/
> 
> The split mode you recommend is better.
> But is this particularly strong ?

Yes, that's really better.

> Because many acked-by and reviewed-by base on stand-alone file.
> Does this division mean that a new acked/reviewed ?

You can keep the acks which are commong to the first 4 patches I guess
and ask for re-ack to others.
  
Thomas Monjalon Sept. 9, 2021, 2:26 p.m. UTC | #8
09/09/2021 15:54, fengchengwen:
> On 2021/9/9 20:45, Bruce Richardson wrote:
> > On Thu, Sep 09, 2021 at 01:29:33PM +0200, Thomas Monjalon wrote:
> >> 09/09/2021 13:18, Bruce Richardson:
> >>> On Thu, Sep 09, 2021 at 12:33:00PM +0200, Thomas Monjalon wrote:
> >>>> 07/09/2021 14:56, Chengwen Feng:
> >>>>> + * The first three APIs are used to submit the operation request to the virtual
> >>>>> + * DMA channel, if the submission is successful, an uint16_t ring_idx is
> >>>>> + * returned, otherwise a negative number is returned.
> >>>>
> >>>> unsigned or negative? looks weird.
> >>>
> >>> May be, but it works well. We could perhaps rephase to make it less weird
> >>> though:
> >>> "if the submission is successful, a positive ring_idx <= UINT16_MAX is
> >>>  returned, otherwise a negative number is returned."
> >>
> >> I am advocating for int16_t,
> >> it makes a lot of things simpler.
> > 
> > No, it doesn't work as you can't have wrap-around of the IDs once you use
> > signed values - and that impacts both the end app and the internals of the
> > drivers. Let's keep it as-is otherwise it will have massive impacts -
> > including potential perf impacts.

Not sure to understand what you mean.
Please could you explain what does not work and what is the perf impact?
I guess you want unsigned index for rings, then OK.
For device ID however, I believe signed integer is useful.

[...]
> >>>>> +bool
> >>>>> +rte_dmadev_is_valid_dev(uint16_t dev_id);
> >>>>
> >>>> I would suggest dropping the final "_dev" in the function name.
> >>>
> >>> The alternative, which I would support, is replacing "rte_dmadev" with
> >>> "rte_dma" across the API. This would then become "rte_dma_is_valid_dev"
> >>> which is clearer, since the dev is not part of the standard prefix. It also
> >>> would fit in with a possible future function of "rte_dma_is_valid_vchan"
> >>> for instance.
> >>
> >> Yes
> >> The question is whether it would make sense to reserver rte_dma_ prefix
> >> for some DMA functions which would be outside of dmadev lib?
> >> If you think that all DMA functions will be in dmadev,
> >> then yes we can shorten the prefix to rte_dma_.
> >>
> > 
> > Well, any DPDK dma functions which are not in dma library should have the
> > prefix of the library they are in e.g. rte_eal_dma_*, rte_pci_dma_*

Quite often, we skip the eal_ prefix, that's why I was thinking about
a possible namespace conflict. Anyway it could be managed.

> > Therefore, I don't think name conflicts should be an issue, and I like
> > having less typing to do in function names (and I believe Morten was
> > strongly proposing this previously too)
> 
> The dmadev is rather short, if change I prefer all public API with rte_dma_ prefix,
> and don't have rte_dma_dev_ prefix for such start/stop/close. (ps: the rte_eth_ also
> have rte_eth_dev_close which is painful for OCD).

Yes OK for rte_dma_ prefix everywhere.

> Also should the filename(e.g. rte_dmadev.h) and directory-name(lib/dmadev) also change ?

I believe it's better to keep dmadev as name of the lib and filename,
so it's consistent with other device classes.
What are the other opinions?
  
Bruce Richardson Sept. 9, 2021, 2:28 p.m. UTC | #9
On Thu, Sep 09, 2021 at 09:54:27PM +0800, fengchengwen wrote:
> On 2021/9/9 20:45, Bruce Richardson wrote:
> > On Thu, Sep 09, 2021 at 01:29:33PM +0200, Thomas Monjalon wrote:
> >> 09/09/2021 13:18, Bruce Richardson:
> >>> On Thu, Sep 09, 2021 at 12:33:00PM +0200, Thomas Monjalon wrote:
> >>>> 07/09/2021 14:56, Chengwen Feng:
> >>>>> + * The first three APIs are used to submit the operation request to the virtual
> >>>>> + * DMA channel, if the submission is successful, an uint16_t ring_idx is
> >>>>> + * returned, otherwise a negative number is returned.
> >>>>
> >>>> unsigned or negative? looks weird.
> >>>
> >>> May be, but it works well. We could perhaps rephase to make it less weird
> >>> though:
> >>> "if the submission is successful, a positive ring_idx <= UINT16_MAX is
> >>>  returned, otherwise a negative number is returned."
> >>
> >> I am advocating for int16_t,
> >> it makes a lot of things simpler.
> >>
> > 
> > No, it doesn't work as you can't have wrap-around of the IDs once you use
> > signed values - and that impacts both the end app and the internals of the
> > drivers. Let's keep it as-is otherwise it will have massive impacts -
> > including potential perf impacts.
> > 
> >>>>> + *
> >>>>> + * The last API was used to issue doorbell to hardware, and also there are flags
> >>>>> + * (@see RTE_DMA_OP_FLAG_SUBMIT) parameter of the first three APIs could do the
> >>>>> + * same work.
> >>>>
> >>>> I don't understand this sentence.
> >>>> You mean rte_dmadev_submit function?
> >>>> Why past tense "was"?
> >>>> Why having a redundant function?
> >>>>
> >>>
> >>> Just because there are two ways to do something does not mean that one of
> >>> them is redundant, as both may be more suitable for different situations.
> >>
> >> I agree.
> >>
> >>> When enqueuing a set of jobs to the device, having a separate submit
> >>> outside a loop makes for clearer code than having a check for the last
> >>> iteration inside the loop to set a special submit flag.  However, for cases
> >>> where one item alone is to be submitted or there is a small set of jobs to
> >>> be submitted sequentially, having a submit flag provides a lower-overhead
> >>> way of doing the submission while still keeping the code clean.
> >>
> >> This kind of explanation may be missing in doxygen?
> >>
> > 
> > It can be added, sure.
> > 
> >>>>> +bool
> >>>>> +rte_dmadev_is_valid_dev(uint16_t dev_id);
> >>>>
> >>>> I would suggest dropping the final "_dev" in the function name.
> >>>>
> >>>
> >>> The alternative, which I would support, is replacing "rte_dmadev" with
> >>> "rte_dma" across the API. This would then become "rte_dma_is_valid_dev"
> >>> which is clearer, since the dev is not part of the standard prefix. It also
> >>> would fit in with a possible future function of "rte_dma_is_valid_vchan"
> >>> for instance.
> >>
> >> Yes
> >> The question is whether it would make sense to reserver rte_dma_ prefix
> >> for some DMA functions which would be outside of dmadev lib?
> >> If you think that all DMA functions will be in dmadev,
> >> then yes we can shorten the prefix to rte_dma_.
> >>
> > 
> > Well, any DPDK dma functions which are not in dma library should have the
> > prefix of the library they are in e.g. rte_eal_dma_*, rte_pci_dma_*
> > Therefore, I don't think name conflicts should be an issue, and I like
> > having less typing to do in function names (and I believe Morten was
> > strongly proposing this previously too)
> 
> The dmadev is rather short, if change I prefer all public API with rte_dma_ prefix,
> and don't have rte_dma_dev_ prefix for such start/stop/close. (ps: the rte_eth_ also
> have rte_eth_dev_close which is painful for OCD).

I agree that having rte_dma_dev_* is unpleasant naming for those functions,
so if we use rte_dma_ as prefix, any dev should be at the end instead:
i.e. rte_dma_stop_dev, rte_dma_start_dev, rte_dma_close_dev, etc.

> 
> Also should the filename(e.g. rte_dmadev.h) and directory-name(lib/dmadev) also change ?
> 
I would keep those names intact.

/Bruce
  
Bruce Richardson Sept. 9, 2021, 2:31 p.m. UTC | #10
On Thu, Sep 09, 2021 at 04:26:40PM +0200, Thomas Monjalon wrote:
> 09/09/2021 15:54, fengchengwen:
> > On 2021/9/9 20:45, Bruce Richardson wrote:
> > > On Thu, Sep 09, 2021 at 01:29:33PM +0200, Thomas Monjalon wrote:
> > >> 09/09/2021 13:18, Bruce Richardson:
> > >>> On Thu, Sep 09, 2021 at 12:33:00PM +0200, Thomas Monjalon wrote:
> > >>>> 07/09/2021 14:56, Chengwen Feng:
> > >>>>> + * The first three APIs are used to submit the operation request to the virtual
> > >>>>> + * DMA channel, if the submission is successful, an uint16_t ring_idx is
> > >>>>> + * returned, otherwise a negative number is returned.
> > >>>>
> > >>>> unsigned or negative? looks weird.
> > >>>
> > >>> May be, but it works well. We could perhaps rephase to make it less weird
> > >>> though:
> > >>> "if the submission is successful, a positive ring_idx <= UINT16_MAX is
> > >>>  returned, otherwise a negative number is returned."
> > >>
> > >> I am advocating for int16_t,
> > >> it makes a lot of things simpler.
> > > 
> > > No, it doesn't work as you can't have wrap-around of the IDs once you use
> > > signed values - and that impacts both the end app and the internals of the
> > > drivers. Let's keep it as-is otherwise it will have massive impacts -
> > > including potential perf impacts.
> 
> Not sure to understand what you mean.
> Please could you explain what does not work and what is the perf impact?
> I guess you want unsigned index for rings, then OK.

Yes, that is it.

> For device ID however, I believe signed integer is useful.

No objection to that.

> 
> [...]
> > >>>>> +bool
> > >>>>> +rte_dmadev_is_valid_dev(uint16_t dev_id);
> > >>>>
> > >>>> I would suggest dropping the final "_dev" in the function name.
> > >>>
> > >>> The alternative, which I would support, is replacing "rte_dmadev" with
> > >>> "rte_dma" across the API. This would then become "rte_dma_is_valid_dev"
> > >>> which is clearer, since the dev is not part of the standard prefix. It also
> > >>> would fit in with a possible future function of "rte_dma_is_valid_vchan"
> > >>> for instance.
> > >>
> > >> Yes
> > >> The question is whether it would make sense to reserver rte_dma_ prefix
> > >> for some DMA functions which would be outside of dmadev lib?
> > >> If you think that all DMA functions will be in dmadev,
> > >> then yes we can shorten the prefix to rte_dma_.
> > >>
> > > 
> > > Well, any DPDK dma functions which are not in dma library should have the
> > > prefix of the library they are in e.g. rte_eal_dma_*, rte_pci_dma_*
> 
> Quite often, we skip the eal_ prefix, that's why I was thinking about
> a possible namespace conflict. Anyway it could be managed.
> 
> > > Therefore, I don't think name conflicts should be an issue, and I like
> > > having less typing to do in function names (and I believe Morten was
> > > strongly proposing this previously too)
> > 
> > The dmadev is rather short, if change I prefer all public API with rte_dma_ prefix,
> > and don't have rte_dma_dev_ prefix for such start/stop/close. (ps: the rte_eth_ also
> > have rte_eth_dev_close which is painful for OCD).
> 
> Yes OK for rte_dma_ prefix everywhere.
> 
> > Also should the filename(e.g. rte_dmadev.h) and directory-name(lib/dmadev) also change ?
> 
> I believe it's better to keep dmadev as name of the lib and filename,
> so it's consistent with other device classes.
> What are the other opinions?

Definitely keep. It's one thing to have additional characters in the header
name, another to have them in the APIs.

/Bruce
  
Morten Brørup Sept. 9, 2021, 3:12 p.m. UTC | #11
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Thursday, 9 September 2021 16.29
> 
> On Thu, Sep 09, 2021 at 09:54:27PM +0800, fengchengwen wrote:
> > On 2021/9/9 20:45, Bruce Richardson wrote:
> > > On Thu, Sep 09, 2021 at 01:29:33PM +0200, Thomas Monjalon wrote:
> > >> 09/09/2021 13:18, Bruce Richardson:
> > >>> On Thu, Sep 09, 2021 at 12:33:00PM +0200, Thomas Monjalon wrote:
> > >>>> 07/09/2021 14:56, Chengwen Feng:

[snip]

> > >>>>> +bool
> > >>>>> +rte_dmadev_is_valid_dev(uint16_t dev_id);
> > >>>>
> > >>>> I would suggest dropping the final "_dev" in the function name.
> > >>>>
> > >>>
> > >>> The alternative, which I would support, is replacing "rte_dmadev"
> with
> > >>> "rte_dma" across the API. This would then become
> "rte_dma_is_valid_dev"
> > >>> which is clearer, since the dev is not part of the standard
> prefix. It also
> > >>> would fit in with a possible future function of
> "rte_dma_is_valid_vchan"
> > >>> for instance.
> > >>
> > >> Yes
> > >> The question is whether it would make sense to reserver rte_dma_
> prefix
> > >> for some DMA functions which would be outside of dmadev lib?
> > >> If you think that all DMA functions will be in dmadev,
> > >> then yes we can shorten the prefix to rte_dma_.
> > >>
> > >
> > > Well, any DPDK dma functions which are not in dma library should
> have the
> > > prefix of the library they are in e.g. rte_eal_dma_*, rte_pci_dma_*
> > > Therefore, I don't think name conflicts should be an issue, and I
> like
> > > having less typing to do in function names (and I believe Morten
> was
> > > strongly proposing this previously too)
> >
> > The dmadev is rather short, if change I prefer all public API with
> rte_dma_ prefix,
> > and don't have rte_dma_dev_ prefix for such start/stop/close. (ps:
> the rte_eth_ also
> > have rte_eth_dev_close which is painful for OCD).
> 
> I agree that having rte_dma_dev_* is unpleasant naming for those
> functions,
> so if we use rte_dma_ as prefix, any dev should be at the end instead:
> i.e. rte_dma_stop_dev, rte_dma_start_dev, rte_dma_close_dev, etc.
> 

I agree about using rte_dma_ as general prefix.

But I disagree about rte_dma_<action>_<object>() function names, such as rte_dma_stop_dev().

We should follow the convention of rte_dma_<object>_<action>(), like in the ethdev library, e.g. rte_eth_dev_get(), rte_eth_fec_get_capability().

Or simply rte_dma_<action>(), if the object is obvious and can be omitted.

I.e. rte_dma_dev_stop() or rte_dma_stop().

> >
> > Also should the filename(e.g. rte_dmadev.h) and directory-
> name(lib/dmadev) also change ?
> >
> I would keep those names intact.

Keep intact, as Bruce suggests. This also aligns with the ethdev library.

> 
> /Bruce
  
fengchengwen Sept. 16, 2021, 3:57 a.m. UTC | #12
Hi Thomas,

Most fixed in V22, some please see inline comment

Thanks.

On 2021/9/9 18:33, Thomas Monjalon wrote:
> Hi,
> 
> I am having a surface look at the API.
> I hope we can do better than previous libs.
> 
> 07/09/2021 14:56, Chengwen Feng:
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -496,6 +496,10 @@ F: drivers/raw/skeleton/
>>  F: app/test/test_rawdev.c
>>  F: doc/guides/prog_guide/rawdev.rst
>>  
>> +DMA device API - EXPERIMENTAL
>> +M: Chengwen Feng <fengchengwen@huawei.com>
>> +F: lib/dmadev/

[snip]

>> +
>> +/* Enumerates DMA device capabilities. */
> 
> You should group them with a doxygen group syntax.
> See https://patches.dpdk.org/project/dpdk/patch/20210830104232.598703-1-thomas@monjalon.net/

Because RTE_DMADEV_CAPA_* has multiple lines of comments, the effect is not good
when using group syntax.

Also consider using enum to define, but its value is uint64_t, and enumeration is
generally of the int type.

So it stays the same here.

> 
>> +#define RTE_DMADEV_CAPA_MEM_TO_MEM	(1ull << 0)
> 
> Please use RTE_BIT macro (32 or 64).
> 
>> +/**< DMA device support memory-to-memory transfer.
>> + *
>> + * @see struct rte_dmadev_info::dev_capa
>> + */
> 

[snip]

> 
> This series add one file per patch.
> Instead it would be better to have groups of features per patch,
> meaning the implementation and the driver interface should be
> in the same patch.
> You can split like this:
> 	1/ device allocation
> 	2/ configuration and start/stop
> 	3/ dataplane functions
> 
> I would suggest 2 more patches:
> 	4/ event notification
> see https://patches.dpdk.org/project/dpdk/patch/20210730135533.417611-3-thomas@monjalon.net/
> 	5/ multi-process
> see https://patches.dpdk.org/project/dpdk/patch/20210730135533.417611-5-thomas@monjalon.net/
> 

The multi-process have many modify for device allocation, because the coupling
between the two is deep, I combines them to one patch.

> 
> Thanks for the work
> 
> 
> .
>
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 1e0d303394..9885cc56b7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -496,6 +496,10 @@  F: drivers/raw/skeleton/
 F: app/test/test_rawdev.c
 F: doc/guides/prog_guide/rawdev.rst
 
+DMA device API - EXPERIMENTAL
+M: Chengwen Feng <fengchengwen@huawei.com>
+F: lib/dmadev/
+
 
 Memory Pool Drivers
 -------------------
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a03..ce08250639 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -27,6 +27,7 @@  The public API headers are grouped by topics:
   [event_timer_adapter]    (@ref rte_event_timer_adapter.h),
   [event_crypto_adapter]   (@ref rte_event_crypto_adapter.h),
   [rawdev]             (@ref rte_rawdev.h),
+  [dmadev]             (@ref rte_dmadev.h),
   [metrics]            (@ref rte_metrics.h),
   [bitrate]            (@ref rte_bitrate.h),
   [latency]            (@ref rte_latencystats.h),
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6..a44a92b5fe 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -34,6 +34,7 @@  INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/cmdline \
                           @TOPDIR@/lib/compressdev \
                           @TOPDIR@/lib/cryptodev \
+                          @TOPDIR@/lib/dmadev \
                           @TOPDIR@/lib/distributor \
                           @TOPDIR@/lib/efd \
                           @TOPDIR@/lib/ethdev \
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 675b573834..3562822b3d 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -62,6 +62,11 @@  New Features
   * Added bus-level parsing of the devargs syntax.
   * Kept compatibility with the legacy syntax as parsing fallback.
 
+* **Added dmadev library support.**
+
+  The dmadev library provides a DMA device framework for management and
+  provision of hardware and software DMA devices.
+
 
 Removed Items
 -------------
diff --git a/lib/dmadev/meson.build b/lib/dmadev/meson.build
new file mode 100644
index 0000000000..6d5bd85373
--- /dev/null
+++ b/lib/dmadev/meson.build
@@ -0,0 +1,4 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 HiSilicon Limited.
+
+headers = files('rte_dmadev.h')
diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
new file mode 100644
index 0000000000..76d71615eb
--- /dev/null
+++ b/lib/dmadev/rte_dmadev.h
@@ -0,0 +1,951 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 HiSilicon Limited.
+ * Copyright(c) 2021 Intel Corporation.
+ * Copyright(c) 2021 Marvell International Ltd.
+ * Copyright(c) 2021 SmartShare Systems.
+ */
+
+#ifndef _RTE_DMADEV_H_
+#define _RTE_DMADEV_H_
+
+/**
+ * @file rte_dmadev.h
+ *
+ * RTE DMA (Direct Memory Access) device APIs.
+ *
+ * The DMA framework is built on the following model:
+ *
+ *     ---------------   ---------------       ---------------
+ *     | virtual DMA |   | virtual DMA |       | virtual DMA |
+ *     | channel     |   | channel     |       | channel     |
+ *     ---------------   ---------------       ---------------
+ *            |                |                      |
+ *            ------------------                      |
+ *                     |                              |
+ *               ------------                    ------------
+ *               |  dmadev  |                    |  dmadev  |
+ *               ------------                    ------------
+ *                     |                              |
+ *            ------------------               ------------------
+ *            | HW-DMA-channel |               | HW-DMA-channel |
+ *            ------------------               ------------------
+ *                     |                              |
+ *                     --------------------------------
+ *                                     |
+ *                           ---------------------
+ *                           | HW-DMA-Controller |
+ *                           ---------------------
+ *
+ * The DMA controller could have multiple HW-DMA-channels (aka. HW-DMA-queues),
+ * each HW-DMA-channel should be represented by a dmadev.
+ *
+ * The dmadev could create multiple virtual DMA channels, each virtual DMA
+ * channel represents a different transfer context. The DMA operation request
+ * must be submitted to the virtual DMA channel. e.g. Application could create
+ * virtual DMA channel 0 for memory-to-memory transfer scenario, and create
+ * virtual DMA channel 1 for memory-to-device transfer scenario.
+ *
+ * The dmadev are dynamically allocated by rte_dmadev_pmd_allocate() during the
+ * PCI/SoC device probing phase performed at EAL initialization time. And could
+ * be released by rte_dmadev_pmd_release() during the PCI/SoC device removing
+ * phase.
+ *
+ * This framework uses 'uint16_t dev_id' as the device identifier of a dmadev,
+ * and 'uint16_t vchan' as the virtual DMA channel identifier in one dmadev.
+ *
+ * The functions exported by the dmadev API to setup a device designated by its
+ * device identifier must be invoked in the following order:
+ *     - rte_dmadev_configure()
+ *     - rte_dmadev_vchan_setup()
+ *     - rte_dmadev_start()
+ *
+ * Then, the application can invoke dataplane APIs to process jobs.
+ *
+ * If the application wants to change the configuration (i.e. invoke
+ * rte_dmadev_configure() or rte_dmadev_vchan_setup()), it must invoke
+ * rte_dmadev_stop() first to stop the device and then do the reconfiguration
+ * before invoking rte_dmadev_start() again. The dataplane APIs should not be
+ * invoked when the device is stopped.
+ *
+ * Finally, an application can close a dmadev by invoking the
+ * rte_dmadev_close() function.
+ *
+ * The dataplane APIs include two parts:
+ * The first part is the submission of operation requests:
+ *     - rte_dmadev_copy()
+ *     - rte_dmadev_copy_sg()
+ *     - rte_dmadev_fill()
+ *     - rte_dmadev_submit()
+ *
+ * These APIs could work with different virtual DMA channels which have
+ * different contexts.
+ *
+ * The first three APIs are used to submit the operation request to the virtual
+ * DMA channel, if the submission is successful, an uint16_t ring_idx is
+ * returned, otherwise a negative number is returned.
+ *
+ * The last API was used to issue doorbell to hardware, and also there are flags
+ * (@see RTE_DMA_OP_FLAG_SUBMIT) parameter of the first three APIs could do the
+ * same work.
+ *
+ * The second part is to obtain the result of requests:
+ *     - rte_dmadev_completed()
+ *         - return the number of operation requests completed successfully.
+ *     - rte_dmadev_completed_status()
+ *         - return the number of operation requests completed.
+ *
+ * @note The two completed APIs also support return the last completed
+ * operation's ring_idx.
+ * @note If the dmadev works in silent mode (@see RTE_DMADEV_CAPA_SILENT),
+ * application does not invoke the above two completed APIs.
+ *
+ * About the ring_idx which enqueue APIs (e.g. rte_dmadev_copy()
+ * rte_dmadev_fill()) returned, the rules are as follows:
+ *     - ring_idx for each virtual DMA channel are independent.
+ *     - For a virtual DMA channel, the ring_idx is monotonically incremented,
+ *       when it reach UINT16_MAX, it wraps back to zero.
+ *     - This ring_idx can be used by applications to track per-operation
+ *       metadata in an application-defined circular ring.
+ *     - The initial ring_idx of a virtual DMA channel is zero, after the
+ *       device is stopped, the ring_idx needs to be reset to zero.
+ *
+ * One example:
+ *     - step-1: start one dmadev
+ *     - step-2: enqueue a copy operation, the ring_idx return is 0
+ *     - step-3: enqueue a copy operation again, the ring_idx return is 1
+ *     - ...
+ *     - step-101: stop the dmadev
+ *     - step-102: start the dmadev
+ *     - step-103: enqueue a copy operation, the ring_idx return is 0
+ *     - ...
+ *     - step-x+0: enqueue a fill operation, the ring_idx return is 65535
+ *     - step-x+1: enqueue a copy operation, the ring_idx return is 0
+ *     - ...
+ *
+ * The DMA operation address used in enqueue APIs (i.e. rte_dmadev_copy(),
+ * rte_dmadev_copy_sg(), rte_dmadev_fill()) defined as rte_iova_t type. The
+ * dmadev supports two types of address: memory address and device address.
+ *
+ * - memory address: the source and destination address of the memory-to-memory
+ * transfer type, or the source address of the memory-to-device transfer type,
+ * or the destination address of the device-to-memory transfer type.
+ * @note If the device support SVA (@see RTE_DMADEV_CAPA_SVA), the memory
+ * address can be any VA address, otherwise it must be an IOVA address.
+ *
+ * - device address: the source and destination address of the device-to-device
+ * transfer type, or the source address of the device-to-memory transfer type,
+ * or the destination address of the memory-to-device transfer type.
+ *
+ * By default, all the functions of the dmadev API exported by a PMD are
+ * lock-free functions which assume to not be invoked in parallel on different
+ * logical cores to work on the same target dmadev object.
+ * @note Different virtual DMA channels on the same dmadev *DO NOT* support
+ * parallel invocation because these virtual DMA channels share the same
+ * HW-DMA-channel.
+ *
+ */
+
+#include <stdint.h>
+
+#include <rte_common.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DMADEV_NAME_MAX_LEN	RTE_DEV_NAME_MAX_LEN
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the device identifier for the named DMA device.
+ *
+ * @param name
+ *   DMA device name.
+ *
+ * @return
+ *   Returns DMA device identifier on success.
+ *   - <0: Failure to find named DMA device.
+ */
+__rte_experimental
+int
+rte_dmadev_get_dev_id(const char *name);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param dev_id
+ *   DMA device index.
+ *
+ * @return
+ *   - If the device index is valid (true) or not (false).
+ */
+__rte_experimental
+bool
+rte_dmadev_is_valid_dev(uint16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the total number of DMA devices that have been successfully
+ * initialised.
+ *
+ * @return
+ *   The total number of usable DMA devices.
+ */
+__rte_experimental
+uint16_t
+rte_dmadev_count(void);
+
+/* Enumerates DMA device capabilities. */
+#define RTE_DMADEV_CAPA_MEM_TO_MEM	(1ull << 0)
+/**< DMA device support memory-to-memory transfer.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+
+#define RTE_DMADEV_CAPA_MEM_TO_DEV	(1ull << 1)
+/**< DMA device support memory-to-device transfer.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ * @see struct rte_dmadev_port_param::port_type
+ */
+
+#define RTE_DMADEV_CAPA_DEV_TO_MEM	(1ull << 2)
+/**< DMA device support device-to-memory transfer.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ * @see struct rte_dmadev_port_param::port_type
+ */
+
+#define RTE_DMADEV_CAPA_DEV_TO_DEV	(1ull << 3)
+/**< DMA device support device-to-device transfer.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ * @see struct rte_dmadev_port_param::port_type
+ */
+
+#define RTE_DMADEV_CAPA_SVA		(1ull << 4)
+/**< DMA device support SVA which could use VA as DMA address.
+ * If device support SVA then application could pass any VA address like memory
+ * from rte_malloc(), rte_memzone(), malloc, stack memory.
+ * If device don't support SVA, then application should pass IOVA address which
+ * from rte_malloc(), rte_memzone().
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+
+#define RTE_DMADEV_CAPA_SILENT		(1ull << 5)
+/**< DMA device support work in silent mode.
+ * In this mode, application don't required to invoke rte_dmadev_completed*()
+ * API.
+ *
+ * @see struct rte_dmadev_conf::silent_mode
+ */
+
+#define RTE_DMADEV_CAPA_OPS_COPY	(1ull << 32)
+/**< DMA device support copy ops.
+ * This capability start with index of 32, so that it could leave gap between
+ * normal capability and ops capability.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+
+#define RTE_DMADEV_CAPA_OPS_COPY_SG	(1ull << 33)
+/**< DMA device support scatter-gather list copy ops.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+
+#define RTE_DMADEV_CAPA_OPS_FILL	(1ull << 34)
+/**< DMA device support fill ops.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+
+/**
+ * A structure used to retrieve the information of a DMA device.
+ */
+struct rte_dmadev_info {
+	struct rte_device *device; /**< Generic Device information. */
+	uint64_t dev_capa; /**< Device capabilities (RTE_DMADEV_CAPA_*). */
+	uint16_t max_vchans;
+	/**< Maximum number of virtual DMA channels supported. */
+	uint16_t max_desc;
+	/**< Maximum allowed number of virtual DMA channel descriptors. */
+	uint16_t min_desc;
+	/**< Minimum allowed number of virtual DMA channel descriptors. */
+	uint16_t max_sges;
+	/**< Maximum number of source or destination scatter-gather entry
+	 * supported.
+	 * If the device does not support COPY_SG capability, this value can be
+	 * zero.
+	 * If the device supports COPY_SG capability, then rte_dmadev_copy_sg()
+	 * parameter nb_src/nb_dst should not exceed this value.
+	 */
+	uint16_t nb_vchans; /**< Number of virtual DMA channel configured. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Retrieve information of a DMA device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param[out] dev_info
+ *   A pointer to a structure of type *rte_dmadev_info* to be filled with the
+ *   information of the device.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int
+rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);
+
+/**
+ * A structure used to configure a DMA device.
+ */
+struct rte_dmadev_conf {
+	uint16_t nb_vchans;
+	/**< The number of virtual DMA channels to set up for the DMA device.
+	 * This value cannot be greater than the field 'max_vchans' of struct
+	 * rte_dmadev_info which get from rte_dmadev_info_get().
+	 */
+	bool enable_silent;
+	/**< Indicates whether to enable silent mode.
+	 * false-default mode, true-silent mode.
+	 * This value can be set to true only when the SILENT capability is
+	 * supported.
+	 *
+	 * @see RTE_DMADEV_CAPA_SILENT
+	 */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Configure a DMA device.
+ *
+ * This function must be invoked first before any other function in the
+ * API. This function can also be re-invoked when a device is in the
+ * stopped state.
+ *
+ * @param dev_id
+ *   The identifier of the device to configure.
+ * @param dev_conf
+ *   The DMA device configuration structure encapsulated into rte_dmadev_conf
+ *   object.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int
+rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Start a DMA device.
+ *
+ * The device start step is the last one and consists of setting the DMA
+ * to start accepting jobs.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int
+rte_dmadev_start(uint16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Stop a DMA device.
+ *
+ * The device can be restarted with a call to rte_dmadev_start().
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int
+rte_dmadev_stop(uint16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Close a DMA device.
+ *
+ * The device cannot be restarted after this call.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int
+rte_dmadev_close(uint16_t dev_id);
+
+/**
+ * DMA transfer direction defines.
+ */
+enum rte_dma_direction {
+	RTE_DMA_DIR_MEM_TO_MEM,
+	/**< DMA transfer direction - from memory to memory.
+	 *
+	 * @see struct rte_dmadev_vchan_conf::direction
+	 */
+	RTE_DMA_DIR_MEM_TO_DEV,
+	/**< DMA transfer direction - from memory to device.
+	 * In a typical scenario, the SoCs are installed on host servers as
+	 * iNICs through the PCIe interface. In this case, the SoCs works in
+	 * EP(endpoint) mode, it could initiate a DMA move request from memory
+	 * (which is SoCs memory) to device (which is host memory).
+	 *
+	 * @see struct rte_dmadev_vchan_conf::direction
+	 */
+	RTE_DMA_DIR_DEV_TO_MEM,
+	/**< DMA transfer direction - from device to memory.
+	 * In a typical scenario, the SoCs are installed on host servers as
+	 * iNICs through the PCIe interface. In this case, the SoCs works in
+	 * EP(endpoint) mode, it could initiate a DMA move request from device
+	 * (which is host memory) to memory (which is SoCs memory).
+	 *
+	 * @see struct rte_dmadev_vchan_conf::direction
+	 */
+	RTE_DMA_DIR_DEV_TO_DEV,
+	/**< DMA transfer direction - from device to device.
+	 * In a typical scenario, the SoCs are installed on host servers as
+	 * iNICs through the PCIe interface. In this case, the SoCs works in
+	 * EP(endpoint) mode, it could initiate a DMA move request from device
+	 * (which is host memory) to the device (which is another host memory).
+	 *
+	 * @see struct rte_dmadev_vchan_conf::direction
+	 */
+};
+
+/**
+ * enum rte_dmadev_port_type - DMA access port type defines.
+ *
+ * @see struct rte_dmadev_port_param::port_type
+ */
+enum rte_dmadev_port_type {
+	RTE_DMADEV_PORT_NONE,
+	RTE_DMADEV_PORT_PCIE, /**< The DMA access port is PCIe. */
+};
+
+/**
+ * A structure used to descript DMA access port parameters.
+ *
+ * @see struct rte_dmadev_vchan_conf::src_port
+ * @see struct rte_dmadev_vchan_conf::dst_port
+ */
+struct rte_dmadev_port_param {
+	enum rte_dmadev_port_type port_type;
+	/**< The device access port type.
+	 * @see enum rte_dmadev_port_type
+	 */
+	union {
+		/** PCIe access port parameters.
+		 *
+		 * The following model shows SoC's PCIe module connects to
+		 * multiple PCIe hosts and multiple endpoints. The PCIe module
+		 * has an integrated DMA controller.
+		 *
+		 * If the DMA wants to access the memory of host A, it can be
+		 * initiated by PF1 in core0, or by VF0 of PF0 in core0.
+		 *
+		 * \code{.unparsed}
+		 * System Bus
+		 *    |     ----------PCIe module----------
+		 *    |     Bus
+		 *    |     Interface
+		 *    |     -----        ------------------
+		 *    |     |   |        | PCIe Core0     |
+		 *    |     |   |        |                |        -----------
+		 *    |     |   |        |   PF-0 -- VF-0 |        | Host A  |
+		 *    |     |   |--------|        |- VF-1 |--------| Root    |
+		 *    |     |   |        |   PF-1         |        | Complex |
+		 *    |     |   |        |   PF-2         |        -----------
+		 *    |     |   |        ------------------
+		 *    |     |   |
+		 *    |     |   |        ------------------
+		 *    |     |   |        | PCIe Core1     |
+		 *    |     |   |        |                |        -----------
+		 *    |     |   |        |   PF-0 -- VF-0 |        | Host B  |
+		 *    |-----|   |--------|   PF-1 -- VF-0 |--------| Root    |
+		 *    |     |   |        |        |- VF-1 |        | Complex |
+		 *    |     |   |        |   PF-2         |        -----------
+		 *    |     |   |        ------------------
+		 *    |     |   |
+		 *    |     |   |        ------------------
+		 *    |     |DMA|        |                |        ------
+		 *    |     |   |        |                |--------| EP |
+		 *    |     |   |--------| PCIe Core2     |        ------
+		 *    |     |   |        |                |        ------
+		 *    |     |   |        |                |--------| EP |
+		 *    |     |   |        |                |        ------
+		 *    |     -----        ------------------
+		 *
+		 * \endcode
+		 *
+		 * @note If some fields can not be supported by the
+		 * hardware/driver, then the driver ignores those fields.
+		 * Please check driver-specific documentation for limitations
+		 * and capablites.
+		 */
+		struct {
+			uint64_t coreid : 4; /**< PCIe core id used. */
+			uint64_t pfid : 8; /**< PF id used. */
+			uint64_t vfen : 1; /**< VF enable bit. */
+			uint64_t vfid : 16; /**< VF id used. */
+			uint64_t pasid : 20;
+			/**< The pasid filed in TLP packet. */
+			uint64_t attr : 3;
+			/**< The attributes filed in TLP packet. */
+			uint64_t ph : 2;
+			/**< The processing hint filed in TLP packet. */
+			uint64_t st : 16;
+			/**< The steering tag filed in TLP packet. */
+		} pcie;
+	};
+	uint64_t reserved[2]; /**< Reserved for future fields. */
+};
+
+/**
+ * A structure used to configure a virtual DMA channel.
+ */
+struct rte_dmadev_vchan_conf {
+	enum rte_dma_direction direction;
+	/**< Transfer direction
+	 * @see enum rte_dma_direction
+	 */
+	uint16_t nb_desc;
+	/**< Number of descriptor for the virtual DMA channel */
+	struct rte_dmadev_port_param src_port;
+	/**< 1) Used to describes the device access port parameter in the
+	 * device-to-memory transfer scenario.
+	 * 2) Used to describes the source device access port parameter in the
+	 * device-to-device transfer scenario.
+	 * @see struct rte_dmadev_port_param
+	 */
+	struct rte_dmadev_port_param dst_port;
+	/**< 1) Used to describes the device access port parameter in the
+	 * memory-to-device transfer scenario.
+	 * 2) Used to describes the destination device access port parameter in
+	 * the device-to-device transfer scenario.
+	 * @see struct rte_dmadev_port_param
+	 */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate and set up a virtual DMA channel.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel. The value must be in the range
+ *   [0, nb_vchans - 1] previously supplied to rte_dmadev_configure().
+ * @param conf
+ *   The virtual DMA channel configuration structure encapsulated into
+ *   rte_dmadev_vchan_conf object.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int
+rte_dmadev_vchan_setup(uint16_t dev_id, uint16_t vchan,
+		       const struct rte_dmadev_vchan_conf *conf);
+
+/**
+ * A structure used to retrieve statistics.
+ */
+struct rte_dmadev_stats {
+	uint64_t submitted;
+	/**< Count of operations which were submitted to hardware. */
+	uint64_t completed;
+	/**< Count of operations which were completed, including successful and
+	 * failed completions.
+	 */
+	uint64_t errors;
+	/**< Count of operations which failed to complete. */
+};
+
+#define RTE_DMADEV_ALL_VCHAN	0xFFFFu
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Retrieve basic statistics of a or all virtual DMA channel(s).
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ *   If equal RTE_DMADEV_ALL_VCHAN means all channels.
+ * @param[out] stats
+ *   The basic statistics structure encapsulated into rte_dmadev_stats
+ *   object.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int
+rte_dmadev_stats_get(uint16_t dev_id, uint16_t vchan,
+		     struct rte_dmadev_stats *stats);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Reset basic statistics of a or all virtual DMA channel(s).
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ *   If equal RTE_DMADEV_ALL_VCHAN means all channels.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int
+rte_dmadev_stats_reset(uint16_t dev_id, uint16_t vchan);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Dump DMA device info.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param f
+ *   The file to write the output to.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int
+rte_dmadev_dump(uint16_t dev_id, FILE *f);
+
+/**
+ * rte_dma_status_code - DMA transfer result status code defines.
+ */
+enum rte_dma_status_code {
+	RTE_DMA_STATUS_SUCCESSFUL,
+	/**< The operation completed successfully. */
+	RTE_DMA_STATUS_USER_ABORT,
+	/**< The operation failed to complete due abort by user.
+	 * This is mainly used when processing dev_stop, user could modidy the
+	 * descriptors (e.g. change one bit to tell hardware abort this job),
+	 * it allows outstanding requests to be complete as much as possible,
+	 * so reduce the time to stop the device.
+	 */
+	RTE_DMA_STATUS_NOT_ATTEMPTED,
+	/**< The operation failed to complete due to following scenarios:
+	 * The jobs in a particular batch are not attempted because they
+	 * appeared after a fence where a previous job failed. In some HW
+	 * implementation it's possible for jobs from later batches would be
+	 * completed, though, so report the status from the not attempted jobs
+	 * before reporting those newer completed jobs.
+	 */
+	RTE_DMA_STATUS_INVALID_SRC_ADDR,
+	/**< The operation failed to complete due invalid source address. */
+	RTE_DMA_STATUS_INVALID_DST_ADDR,
+	/**< The operation failed to complete due invalid destination
+	 * address.
+	 */
+	RTE_DMA_STATUS_INVALID_ADDR,
+	/**< The operation failed to complete due invalid source or destination
+	 * address, cover the case that only knows the address error, but not
+	 * sure which address error.
+	 */
+	RTE_DMA_STATUS_INVALID_LENGTH,
+	/**< The operation failed to complete due invalid length. */
+	RTE_DMA_STATUS_INVALID_OPCODE,
+	/**< The operation failed to complete due invalid opcode.
+	 * The DMA descriptor could have multiple format, which are
+	 * distinguished by the opcode field.
+	 */
+	RTE_DMA_STATUS_BUS_READ_ERROR,
+	/**< The operation failed to complete due bus read error. */
+	RTE_DMA_STATUS_BUS_WRITE_ERROR,
+	/**< The operation failed to complete due bus write error. */
+	RTE_DMA_STATUS_BUS_ERROR,
+	/**< The operation failed to complete due bus error, cover the case that
+	 * only knows the bus error, but not sure which direction error.
+	 */
+	RTE_DMA_STATUS_DATA_POISION,
+	/**< The operation failed to complete due data poison. */
+	RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR,
+	/**< The operation failed to complete due descriptor read error. */
+	RTE_DMA_STATUS_DEV_LINK_ERROR,
+	/**< The operation failed to complete due device link error.
+	 * Used to indicates that the link error in the memory-to-device/
+	 * device-to-memory/device-to-device transfer scenario.
+	 */
+	RTE_DMA_STATUS_PAGE_FAULT,
+	/**< The operation failed to complete due lookup page fault. */
+	RTE_DMA_STATUS_ERROR_UNKNOWN = 0x100,
+	/**< The operation failed to complete due unknown reason.
+	 * The initial value is 256, which reserves space for future errors.
+	 */
+};
+
+/**
+ * A structure used to hold scatter-gather DMA operation request entry.
+ */
+struct rte_dmadev_sge {
+	rte_iova_t addr; /**< The DMA operation address. */
+	uint32_t length; /**< The DMA operation length. */
+};
+
+/* DMA flags to augment operation preparation. */
+#define RTE_DMA_OP_FLAG_FENCE	(1ull << 0)
+/**< DMA fence flag.
+ * It means the operation with this flag must be processed only after all
+ * previous operations are completed.
+ * If the specify DMA HW works in-order (it means it has default fence between
+ * operations), this flag could be NOP.
+ *
+ * @see rte_dmadev_copy()
+ * @see rte_dmadev_copy_sg()
+ * @see rte_dmadev_fill()
+ */
+
+#define RTE_DMA_OP_FLAG_SUBMIT	(1ull << 1)
+/**< DMA submit flag.
+ * It means the operation with this flag must issue doorbell to hardware after
+ * enqueued jobs.
+ */
+
+#define RTE_DMA_OP_FLAG_LLC	(1ull << 2)
+/**< DMA write data to low level cache hint.
+ * Used for performance optimization, this is just a hint, and there is no
+ * capability bit for this, driver should not return error if this flag was set.
+ */
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue a copy operation onto the virtual DMA channel.
+ *
+ * This queues up a copy operation to be performed by hardware, if the 'flags'
+ * parameter contains RTE_DMA_OP_FLAG_SUBMIT then trigger doorbell to begin
+ * this operation, otherwise do not trigger doorbell.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param src
+ *   The address of the source buffer.
+ * @param dst
+ *   The address of the destination buffer.
+ * @param length
+ *   The length of the data to be copied.
+ * @param flags
+ *   An flags for this operation.
+ *   @see RTE_DMA_OP_FLAG_*
+ *
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued job.
+ *   - -ENOSPC: if no space left to enqueue.
+ *   - other values < 0 on failure.
+ */
+__rte_experimental
+int
+rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
+		uint32_t length, uint64_t flags);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue a scatter-gather list copy operation onto the virtual DMA channel.
+ *
+ * This queues up a scatter-gather list copy operation to be performed by
+ * hardware, if the 'flags' parameter contains RTE_DMA_OP_FLAG_SUBMIT then
+ * trigger doorbell to begin this operation, otherwise do not trigger doorbell.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param src
+ *   The pointer of source scatter-gather entry array.
+ * @param dst
+ *   The pointer of destination scatter-gather entry array.
+ * @param nb_src
+ *   The number of source scatter-gather entry.
+ *   @see struct rte_dmadev_info::max_sges
+ * @param nb_dst
+ *   The number of destination scatter-gather entry.
+ *   @see struct rte_dmadev_info::max_sges
+ * @param flags
+ *   An flags for this operation.
+ *   @see RTE_DMA_OP_FLAG_*
+ *
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued job.
+ *   - -ENOSPC: if no space left to enqueue.
+ *   - other values < 0 on failure.
+ */
+__rte_experimental
+int
+rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vchan, struct rte_dmadev_sge *src,
+		   struct rte_dmadev_sge *dst, uint16_t nb_src, uint16_t nb_dst,
+		   uint64_t flags);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue a fill operation onto the virtual DMA channel.
+ *
+ * This queues up a fill operation to be performed by hardware, if the 'flags'
+ * parameter contains RTE_DMA_OP_FLAG_SUBMIT then trigger doorbell to begin
+ * this operation, otherwise do not trigger doorbell.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param pattern
+ *   The pattern to populate the destination buffer with.
+ * @param dst
+ *   The address of the destination buffer.
+ * @param length
+ *   The length of the destination buffer.
+ * @param flags
+ *   An flags for this operation.
+ *   @see RTE_DMA_OP_FLAG_*
+ *
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued job.
+ *   - -ENOSPC: if no space left to enqueue.
+ *   - other values < 0 on failure.
+ */
+__rte_experimental
+int
+rte_dmadev_fill(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
+		rte_iova_t dst, uint32_t length, uint64_t flags);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Trigger hardware to begin performing enqueued operations.
+ *
+ * This API is used to write the "doorbell" to the hardware to trigger it
+ * to begin the operations previously enqueued by rte_dmadev_copy/fill().
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int
+rte_dmadev_submit(uint16_t dev_id, uint16_t vchan);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Returns the number of operations that have been successfully completed.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param nb_cpls
+ *   The maximum number of completed operations that can be processed.
+ * @param[out] last_idx
+ *   The last completed operation's ring_idx.
+ *   If not required, NULL can be passed in.
+ * @param[out] has_error
+ *   Indicates if there are transfer error.
+ *   If not required, NULL can be passed in.
+ *
+ * @return
+ *   The number of operations that successfully completed. This return value
+ *   must be less than or equal to the value of nb_cpls.
+ */
+__rte_experimental
+uint16_t
+rte_dmadev_completed(uint16_t dev_id, uint16_t vchan, const uint16_t nb_cpls,
+		     uint16_t *last_idx, bool *has_error);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Returns the number of operations that have been completed, and the
+ * operations result may succeed or fail.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param nb_cpls
+ *   Indicates the size of status array.
+ * @param[out] last_idx
+ *   The last completed operation's ring_idx.
+ *   If not required, NULL can be passed in.
+ * @param[out] status
+ *   This is a pointer to an array of length 'nb_cpls' that holds the completion
+ *   status code of each operation.
+ *   @see enum rte_dma_status_code
+ *
+ * @return
+ *   The number of operations that completed. This return value must be less
+ *   than or equal to the value of nb_cpls.
+ *   If this number is greater than zero (assuming n), then n values in the
+ *   status array are also set.
+ */
+__rte_experimental
+uint16_t
+rte_dmadev_completed_status(uint16_t dev_id, uint16_t vchan,
+			    const uint16_t nb_cpls, uint16_t *last_idx,
+			    enum rte_dma_status_code *status);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DMADEV_H_ */
diff --git a/lib/dmadev/version.map b/lib/dmadev/version.map
new file mode 100644
index 0000000000..2e37882364
--- /dev/null
+++ b/lib/dmadev/version.map
@@ -0,0 +1,24 @@ 
+EXPERIMENTAL {
+	global:
+
+	rte_dmadev_close;
+	rte_dmadev_completed;
+	rte_dmadev_completed_status;
+	rte_dmadev_configure;
+	rte_dmadev_copy;
+	rte_dmadev_copy_sg;
+	rte_dmadev_count;
+	rte_dmadev_dump;
+	rte_dmadev_fill;
+	rte_dmadev_get_dev_id;
+	rte_dmadev_info_get;
+	rte_dmadev_is_valid_dev;
+	rte_dmadev_start;
+	rte_dmadev_stats_get;
+	rte_dmadev_stats_reset;
+	rte_dmadev_stop;
+	rte_dmadev_submit;
+	rte_dmadev_vchan_setup;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323..a542c238d2 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -44,6 +44,7 @@  libraries = [
         'power',
         'pdump',
         'rawdev',
+        'dmadev',
         'regexdev',
         'rib',
         'reorder',