[dpdk-dev,RFC,v2] libeventdev: event driven programming model framework for DPDK

Message ID 1476214216-31982-1-git-send-email-jerin.jacob@caviumnetworks.com (mailing list archive)
State RFC, archived

Commit Message

Jerin Jacob Oct. 11, 2016, 7:30 p.m. UTC
  Thanks to Intel and NXP folks for the positive and constructive feedback
I've received so far. Here is the updated RFC(v2).

I've attempted to address as many comments as possible.

This series adds rte_eventdev.h to the DPDK tree with
adequate documentation in doxygen format.

Updates are also available online:

Related draft header file (this patch):
https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h

PDF version (doxygen output):
https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf

Repo:
https://github.com/jerinjacobk/libeventdev

v1..v2

- Added Cavium, Intel, NXP copyrights in header file

- Changed the concept of flow queues to flow ids.
This is to avoid dictating a specific structure to hold the flows.
A s/w implementation can do atomic load balancing on multiple
flow ids more efficiently than maintaining each event in a specific flow queue.

- Changed the scheduling group to event queue.
A scheduling group is more of a stream of events, so an event queue is a better
abstraction.

- Introduced the event port concept. Instead of tying eventdev access to the
lcore, a higher level of abstraction called an event port is needed; it is the
application i/f to the eventdev for dequeuing and enqueuing events.
One or more event queues can be linked to a single event port.
There can be more than one event port per lcore, allowing multiple lightweight
threads to have their own i/f into eventdev, if the implementation supports it.
An event port will be bound to an lcore or a lightweight thread to keep the
application workflow portable.
The event port abstraction also encapsulates the dequeue depth and enqueue depth
for scheduler implementations that can schedule multiple events at a time and
whose output events can be buffered.

- Added configuration options for the event queue (nb_atomic_flows,
nb_atomic_order_sequences, single consumer etc.)
and the event port (dequeue_queue_depth, enqueue_queue_depth etc.) to define the
limits on resource usage. (Useful for an optimized software implementation.)

- Introduced RTE_EVENT_DEV_CAP_QUEUE_QOS and RTE_EVENT_DEV_CAP_EVENT_QOS
schemes of priority handling

- Added event port to event queue servicing priority.
This allows two event ports to connect to the same event queue with
different priorities.

- Changed the workflow to schedule/dequeue/enqueue.
An implementation is free to define schedule as NOOP.
A distributed s/w scheduler can use this to schedule events;
also a centralized s/w scheduler can make this a NOOP on non-scheduler cores.

- Removed Cavium HW specific schedule_from_group API

- Removed Cavium HW specific ctxt_update/ctxt_wait APIs.
Introduced a more generic "event pinning" concept, i.e.
if the normal workflow is dequeue -> do work based on event type -> enqueue,
a pin_event argument to enqueue
(where the pinned event is returned through the normal dequeue)
allows the application workflow to remain the same whether or not an
implementation supports it.

- Added dequeue() burst variant

- Added the definition of a closed/open system, where an open system is memory
backed and a closed system eventdev has limited capacity.
In such systems, it is also useful to denote per event port how many packets
can be active in the system.
This can serve as a threshold for ethdev-like devices so they don't overwhelm
core-to-core events.

- Added the option to specify the maximum amount of time (in ns) the application
needs to wait on dequeue()

- Removed the scheme of expressing the number of flows in log2 format

Open item, or the item that needs improvement.
----------------------------------------------
- Abstract the differences in event QoS management between the different priority
schemes available in different HW or SW implementations, while keeping the
application workflow portable.

Based on the feedback, there are three different kinds of QoS support available
in three different HW or SW implementations:
1) Priority associated with the event queue
2) Priority associated with each event enqueue
(The same flow can have two different priorities on two separate enqueues)
3) Priority associated with the flow (each flow has a unique priority)

In v2, the differences are abstracted based on device capability
(RTE_EVENT_DEV_CAP_QUEUE_QOS for the first scheme,
RTE_EVENT_DEV_CAP_EVENT_QOS for the second and third schemes).
This approach would call for a different application workflow for
nontrivial QoS-enabled applications.
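
To make the capability-based split concrete, here is a minimal sketch of how
an application might decide where to attach a priority. The two flag values
are copied from the draft header; the enum and pick_qos_scheme() are
hypothetical names used only for illustration, and preferring per-event QoS
when both bits are set is an assumption rather than something the RFC states.

```c
#include <stdint.h>

/* Capability flags as defined in the draft rte_eventdev.h of this RFC. */
#define RTE_EVENT_DEV_CAP_QUEUE_QOS (1 << 0)
#define RTE_EVENT_DEV_CAP_EVENT_QOS (1 << 1)

/* Hypothetical application-side enum, not part of the RFC. */
enum qos_scheme {
	QOS_NONE,      /* device advertises no prioritization */
	QOS_PER_QUEUE, /* scheme 1: priority set at queue setup */
	QOS_PER_EVENT  /* schemes 2 and 3: priority set on each enqueue */
};

/* Hypothetical helper: choose where the application supplies priority,
 * based on the capability bitmap reported in event_dev_cap.
 * Assumption: per-event QoS is preferred when both bits are set. */
static enum qos_scheme
pick_qos_scheme(uint32_t event_dev_cap)
{
	if (event_dev_cap & RTE_EVENT_DEV_CAP_EVENT_QOS)
		return QOS_PER_EVENT;
	if (event_dev_cap & RTE_EVENT_DEV_CAP_QUEUE_QOS)
		return QOS_PER_QUEUE;
	return QOS_NONE;
}
```

An application would call this once with the event_dev_cap value obtained from
rte_event_dev_info_get() and branch on the result, which is exactly the
workflow divergence described above.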

Looking forward to getting comments from both the application and the driver
implementation perspectives.

/Jerin

---
 doc/api/doxy-api-index.md          |    1 +
 doc/api/doxy-api.conf              |    1 +
 lib/librte_eventdev/rte_eventdev.h | 1204 ++++++++++++++++++++++++++++++++++++
 3 files changed, 1206 insertions(+)
 create mode 100644 lib/librte_eventdev/rte_eventdev.h
  

Comments

Bill Fischofer Oct. 14, 2016, 4:14 a.m. UTC | #1
Hi Jerin,

This looks reasonable and seems a welcome addition to DPDK. A few questions
noted inline:

On Tue, Oct 11, 2016 at 2:30 PM, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index 6675f96..28c1329 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -40,6 +40,7 @@ There are many libraries, so their headers may be grouped by topics:
>    [ethdev]             (@ref rte_ethdev.h),
>    [ethctrl]            (@ref rte_eth_ctrl.h),
>    [cryptodev]          (@ref rte_cryptodev.h),
> +  [eventdev]           (@ref rte_eventdev.h),
>    [devargs]            (@ref rte_devargs.h),
>    [bond]               (@ref rte_eth_bond.h),
>    [vhost]              (@ref rte_virtio_net.h),
> diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
> index 9dc7ae5..9841477 100644
> --- a/doc/api/doxy-api.conf
> +++ b/doc/api/doxy-api.conf
> @@ -41,6 +41,7 @@ INPUT                   = doc/api/doxy-api-index.md \
>                            lib/librte_cryptodev \
>                            lib/librte_distributor \
>                            lib/librte_ether \
> +                          lib/librte_eventdev \
>                            lib/librte_hash \
>                            lib/librte_ip_frag \
>                            lib/librte_jobstats \
> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
> new file mode 100644
> index 0000000..f60e461
> --- /dev/null
> +++ b/lib/librte_eventdev/rte_eventdev.h
> @@ -0,0 +1,1204 @@
> +/*
> + *   BSD LICENSE
> + *
> + *   Copyright 2016 Cavium.
> + *   Copyright 2016 Intel Corporation.
> + *   Copyright 2016 NXP.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Cavium nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_EVENTDEV_H_
> +#define _RTE_EVENTDEV_H_
> +
> +/**
> + * @file
> + *
> + * RTE Event Device API
> + *
> + * The Event Device API is composed of two parts:
> + *
> + * - The application-oriented Event API that includes functions to setup
> + *   an event device (configure it, setup its queues, ports and start it), to
> + *   establish the link between queues to port and to receive events, and so on.
> + *
> + * - The driver-oriented Event API that exports a function allowing
> + *   an event poll Mode Driver (PMD) to simultaneously register itself as
> + *   an event device driver.
> + *
> + * Event device components:
> + *
> + *                     +-----------------+
> + *                     | +-------------+ |
> + *        +-------+    | |    flow 0   | |
> + *        |Packet |    | +-------------+ |
> + *        |event  |    | +-------------+ |
> + *        |       |    | |    flow 1   | |event_port_link(port0, queue0)
> + *        +-------+    | +-------------+ |     |     +--------+
> + *        +-------+    | +-------------+ o-----v-----o        |dequeue +------+
> + *        |Crypto |    | |    flow n   | |           | event  +------->|Core 0|
> + *        |work   |    | +-------------+ o----+      | port 0 |        |      |
> + *        |done ev|    |  event queue 0  |    |      +--------+        +------+
> + *        +-------+    +-----------------+    |
> + *        +-------+                           |
> + *        |Timer  |    +-----------------+    |      +--------+
> + *        |expiry |    | +-------------+ |    +------o        |dequeue +------+
> + *        |event  |    | |    flow 0   | o-----------o event  +------->|Core 1|
> + *        +-------+    | +-------------+ |      +----o port 1 |        |      |
> + *       Event enqueue | +-------------+ |      |    +--------+        +------+
> + *     o-------------> | |    flow 1   | |      |
> + *        enqueue(     | +-------------+ |      |
> + *        queue_id,    |                 |      |    +--------+        +------+
> + *        flow_id,     | +-------------+ |      |    |        |dequeue |Core 2|
> + *        sched_type,  | |    flow n   | o-----------o event  +------->|      |
> + *        event_type,  | +-------------+ |      |    | port 2 |        +------+
> + *        subev_type,  |  event queue 1  |      |    +--------+
> + *        event)       +-----------------+      |    +--------+
> + *                                              |    |        |dequeue +------+
> + *        +-------+    +-----------------+      |    | event  +------->|Core n|
> + *        |Core   |    | +-------------+ o-----------o port n |        |      |
> + *        |(SW)   |    | |    flow 0   | |      |    +--------+        +--+---+
> + *        |event  |    | +-------------+ |      |                         |
> + *        +-------+    | +-------------+ |      |                         |
> + *            ^        | |    flow 1   | |      |                         |
> + *            |        | +-------------+ o------+                         |
> + *            |        | +-------------+ |                                |
> + *            |        | |    flow n   | |                                |
> + *            |        | +-------------+ |                                |
> + *            |        |  event queue n  |                                |
> + *            |        +-----------------+                                |
> + *            |                                                           |
> + *            +-----------------------------------------------------------+
> + *
> + *
> + *
> + * Event device: A hardware or software-based event scheduler.
> + *
> + * Event: A unit of scheduling that encapsulates a packet or other datatype
> + * like SW generated event from the core, Crypto work completion notification,
> + * Timer expiry event notification etc as well as metadata.
> + * The metadata includes flow ID, scheduling type, event priority, event_type,
> + * sub_event_type etc.
> + *
> + * Event queue: A queue containing events that are scheduled by the event dev.
> + * An event queue contains events of different flows associated with scheduling
> + * types, such as atomic, ordered, or parallel.
> + *
> + * Event port: An application's interface into the event dev for enqueue and
> + * dequeue operations. Each event port can be linked with one or more
> + * event queues for dequeue operations.
> + *
> + * By default, all the functions of the Event Device API exported by a PMD
> + * are lock-free functions which assume to not be invoked in parallel on
> + * different logical cores to work on the same target object. For instance,
> + * the dequeue function of a PMD cannot be invoked in parallel on two logical
> + * cores to operates on same event port. Of course, this function
> + * can be invoked in parallel by different logical cores on different ports.
> + * It is the responsibility of the upper level application to enforce this rule.
> + *
> + * In all functions of the Event API, the Event device is
> + * designated by an integer >= 0 named the device identifier *dev_id*
> + *
> + * At the Event driver level, Event devices are represented by a generic
> + * data structure of type *rte_event_dev*.
> + *
> + * Event devices are dynamically registered during the PCI/SoC device probing
> + * phase performed at EAL initialization time.
> + * When an Event device is being probed, a *rte_event_dev* structure and
> + * a new device identifier are allocated for that device. Then, the
> + * event_dev_init() function supplied by the Event driver matching the probed
> + * device is invoked to properly initialize the device.
> + *
> + * The role of the device init function consists of resetting the hardware or
> + * software event driver implementations.
> + *
> + * If the device init operation is successful, the correspondence between
> + * the device identifier assigned to the new device and its associated
> + * *rte_event_dev* structure is effectively registered.
> + * Otherwise, both the *rte_event_dev* structure and the device identifier are
> + * freed.
> + *
> + * The functions exported by the application Event API to setup a device
> + * designated by its device identifier must be invoked in the following order:
> + *     - rte_event_dev_configure()
> + *     - rte_event_queue_setup()
> + *     - rte_event_port_setup()
> + *     - rte_event_port_link()
> + *     - rte_event_dev_start()
> + *
> + * Then, the application can invoke, in any order, the functions
> + * exported by the Event API to schedule events, dequeue events, enqueue events,
> + * change event queue(s) to event port [un]link establishment and so on.
> + *
> + * Application may use rte_event_[queue/port]_default_conf_get() to get the
> + * default configuration to set up an event queue or event port by
> + * overriding few default values.
> + *
> + * If the application wants to change the configuration (i.e. call
> + * rte_event_dev_configure(), rte_event_queue_setup(), or
> + * rte_event_port_setup()), it must call rte_event_dev_stop() first to stop the
> + * device and then do the reconfiguration before calling rte_event_dev_start()
> + * again. The schedule, enqueue and dequeue functions should not be invoked
> + * when the device is stopped.
>

Given this requirement, the question is what happens to events that are "in
flight" at the time rte_event_dev_stop() is called? Is stop an asynchronous
operation that quiesces the event_dev and allows in-flight events to drain
from queues/ports prior to fully stopping, or is some sort of separate
explicit quiesce mechanism required? If stop is synchronous and simply
halts the event_dev, then how is an application to know if subsequent
configure/setup calls would leave these pending events with no place to
stand?

> + *
> + * Finally, an application can close an Event device by invoking the
> + * rte_event_dev_close() function.
> + *
> + * Each function of the application Event API invokes a specific function
> + * of the PMD that controls the target device designated by its device
> + * identifier.
> + *
> + * For this purpose, all device-specific functions of an Event driver are
> + * supplied through a set of pointers contained in a generic structure of type
> + * *event_dev_ops*.
> + * The address of the *event_dev_ops* structure is stored in the *rte_event_dev*
> + * structure by the device init function of the Event driver, which is
> + * invoked during the PCI/SoC device probing phase, as explained earlier.
> + *
> + * In other words, each function of the Event API simply retrieves the
> + * *rte_event_dev* structure associated with the device identifier and
> + * performs an indirect invocation of the corresponding driver function
> + * supplied in the *event_dev_ops* structure of the *rte_event_dev* structure.
> + *
> + * For performance reasons, the address of the fast-path functions of the
> + * Event driver is not contained in the *event_dev_ops* structure.
> + * Instead, they are directly stored at the beginning of the *rte_event_dev*
> + * structure to avoid an extra indirect memory access during their invocation.
> + *
> + * RTE event device drivers do not use interrupts for enqueue or dequeue
> + * operation. Instead, Event drivers export Poll-Mode enqueue and dequeue
> + * functions to applications.
> + *
> + * An event driven based application has following typical workflow on fastpath:
> + * \code{.c}
> + *     while (1) {
> + *
> + *             rte_event_schedule(dev_id);
> + *
> + *             rte_event_dequeue(...);
> + *
> + *             (event processing)
> + *
> + *             rte_event_enqueue(...);
> + *     }
> + * \endcode
> + *
> + * The *schedule* operation is intended to do event scheduling, and the
> + * *dequeue* operation returns the scheduled events. An implementation
> + * is free to define the semantics between *schedule* and *dequeue*. For
> + * example, a system based on a hardware scheduler can define its
> + * rte_event_schedule() to be an NOOP, whereas a software scheduler can use
> + * the *schedule* operation to schedule events.
> + *
> + * The events are injected to event device through *enqueue* operation by
> + * event producers in the system. The typical event producers are ethdev
> + * subsystem for generating packet events, core(SW) for generating events based
> + * on different stages of application processing, cryptodev for generating
> + * crypto work completion notification etc
> + *
> + * The *dequeue* operation gets one or more events from the event ports.
> + * The application process the events and send to downstream event queue through
> + * rte_event_enqueue() if it is an intermediate stage of event processing, on
> + * the final stage, the application may send to different subsystem like ethdev
> + * to send the packet/event on the wire using ethdev rte_eth_tx_burst() API.
> + *
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_pci.h>
> +#include <rte_dev.h>
> +#include <rte_devargs.h>
> +#include <rte_errno.h>
> +
> +/**
> + * Get the total number of event devices that have been successfully
> + * initialised.
> + *
> + * @return
> + *   The total number of usable event devices.
> + */
> +extern uint8_t
> +rte_event_dev_count(void);
> +
> +/**
> + * Get the device identifier for the named event device.
> + *
> + * @param name
> + *   Event device name to select the event device identifier.
> + *
> + * @return
> + *   Returns event device identifier on success.
> + *   - <0: Failure to find named event device.
> + */
> +extern uint8_t
> +rte_event_dev_get_dev_id(const char *name);
> +
> +/**
> + * Return the NUMA socket to which a device is connected.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @return
> + *   The NUMA socket id to which the device is connected or
> + *   a default of zero if the socket could not be determined.
> + *   - -1: dev_id value is out of range.
> + */
> +extern int
> +rte_event_dev_socket_id(uint8_t dev_id);
> +
> +/* Event device capability bitmap flags */
> +#define RTE_EVENT_DEV_CAP_QUEUE_QOS        (1 << 0)
> +/**< Event scheduling prioritization is based on the priority associated with
> + *  each event queue.
> + *
> + *  \see rte_event_queue_setup(), RTE_EVENT_QUEUE_PRIORITY_NORMAL
> + */
> +#define RTE_EVENT_DEV_CAP_EVENT_QOS        (1 << 1)
> +/**< Event scheduling prioritization is based on the priority associated with
> + *  each event. Priority of each event is supplied in *rte_event* structure
> + *  on each enqueue operation.
> + *
> + *  \see rte_event_enqueue()
> + */
> +
> +/**
> + * Event device information
> + */
> +struct rte_event_dev_info {
> +       const char *driver_name;        /**< Event driver name */
> +       struct rte_pci_device *pci_dev; /**< PCI information */
> +       uint32_t min_dequeue_wait_ns;
> +       /**< Minimum supported global dequeue wait delay(ns) by this device */
> +       uint32_t max_dequeue_wait_ns;
> +       /**< Maximum supported global dequeue wait delay(ns) by this device */
> +       uint32_t dequeue_wait_ns;
>

Am I reading this correctly that there is no way to support an indefinite
waiting capability? Or is this just saying that if a timed wait is
performed there are min/max limits for the wait duration?
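
On the timed-wait reading, the header only asks that the configured value lie
between min_dequeue_wait_ns and max_dequeue_wait_ns as reported by
rte_event_dev_info_get(). A minimal sketch of honouring that range is below;
clamp_dequeue_wait_ns() is a hypothetical helper and not part of the RFC.
Note also that a uint32_t nanosecond value tops out at about 4.29 seconds, so
the field cannot by itself encode an indefinite wait.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the RFC: bring a requested dequeue wait
 * (in ns) into the [min_ns, max_ns] range advertised by the device.
 * Assumes min_ns <= max_ns. */
static uint32_t
clamp_dequeue_wait_ns(uint32_t requested_ns, uint32_t min_ns, uint32_t max_ns)
{
	if (requested_ns < min_ns)
		return min_ns;
	if (requested_ns > max_ns)
		return max_ns;
	return requested_ns;
}
```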


> +       /**< Configured global dequeue wait delay(ns) for this device */
> +       uint8_t max_event_queues;
> +       /**< Maximum event_queues supported by this device */
> +       uint32_t max_event_queue_flows;
> +       /**< Maximum supported flows in an event queue by this device*/
> +       uint8_t max_event_queue_priority_levels;
> +       /**< Maximum number of event queue priority levels by this device.
> +        * Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability
> +        */
> +       uint8_t nb_event_queues;
> +       /**< Configured number of event queues for this device */
>

Is 256 a sufficient number of queues? While various SoCs may have limits,
why impose such a small limit architecturally?
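
For reference, the 8-bit fields bound the design slightly below 256 when used
as a count: queue ids 0..255 can address 256 queues, but a uint8_t
nb_event_queues can only express counts up to 255. A small sketch of that
check, using a hypothetical helper name that is not part of the RFC:

```c
#include <stdint.h>

/* Hypothetical helper, not part of the RFC: returns 1 if 'wanted' can be
 * stored in a uint8_t count field such as nb_event_queues, else 0. */
static int
count_fits_u8(unsigned int wanted)
{
	return wanted <= UINT8_MAX;
}
```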


> +       uint8_t max_event_priority_levels;
> +       /**< Maximum number of event priority levels by this device.
> +        * Valid when the device has RTE_EVENT_DEV_CAP_EVENT_QOS capability
> +         */
> +       uint8_t max_event_ports;
> +       /**< Maximum number of event ports supported by this device */
> +       uint8_t nb_event_ports;
> +       /**< Configured number of event ports for this device */
>

Same question as for queues. 256 seems a small number to be architecting
for this.


> +       uint8_t max_event_port_dequeue_queue_depth;
> +       /**< Maximum dequeue queue depth for any event port.
> +        * Implementations can schedule N events at a time to an event port.
> +        * A device that does not support bulk dequeue will set this as 1.
> +        * \see rte_event_port_setup()
> +        */
> +       uint32_t max_event_port_enqueue_queue_depth;
> +       /**< Maximum enqueue queue depth for any event port. Implementations
> +        * can batch N events at a time to enqueue through event port
> +        * \see rte_event_port_setup()
> +        */
> +       int32_t max_num_events;
> +       /**< A *closed system* event dev has a limit on the number of events it
> +        * can manage at a time. An *open system* event dev does not have a
> +        * limit and will specify this as -1.
> +        */
> +       uint32_t event_dev_cap;
> +       /**< Event device capabilities(RTE_EVENT_DEV_CAP_)*/
> +};
> +
> +/**
> + * Retrieve the contextual information of an event device.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @param[out] dev_info
> + *   A pointer to a structure of type *rte_event_dev_info* to be filled with the
> + *   contextual information of the device.
> + *
> + */
> +extern void
> +rte_event_dev_info_get(uint8_t dev_id, struct rte_event_dev_info *dev_info);
> +
> +/* Event device configuration bitmap flags */
> +#define RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT (1 << 0)
> +/**< Override the global *dequeue_wait_ns* and use per dequeue wait in ns.
> + *  \see rte_event_dequeue_wait_time(), rte_event_dequeue()
> + */
> +
> +/** Event device configuration structure */
> +struct rte_event_dev_config {
> +       uint32_t dequeue_wait_ns;
> +       /**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.
> +        * This value should be in the range of *min_dequeue_wait_ns* and
> +        * *max_dequeue_wait_ns* which previously provided in
> +        * rte_event_dev_info_get()
> +        * \see RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> +        */
> +       int32_t nb_events_limit;
> +       /**< Applies to *closed system* event dev only. This field indicates a
> +        * limit to ethdev-like devices to limit the number of events injected
> +        * into the system to not overwhelm core-to-core events.
> +        * This value cannot exceed the *max_num_events* which previously
> +        * provided in rte_event_dev_info_get()
> +        */
> +       uint8_t nb_event_queues;
> +       /**< Number of event queues to configure on this device.
> +        * This value cannot exceed the *max_event_queues* which previously
> +        * provided in rte_event_dev_info_get()
> +        */
> +       uint8_t nb_event_ports;
> +       /**< Number of event ports to configure on this device.
> +        * This value cannot exceed the *max_event_ports* which previously
> +        * provided in rte_event_dev_info_get()
> +        */
> +       uint32_t event_dev_cfg;
> +       /**< Event device config flags(RTE_EVENT_DEV_CFG_)*/
> +};
> +
> +/**
> + * Configure an event device.
> + *
> + * This function must be invoked first before any other function in the
> + * API. This function can also be re-invoked when a device is in the
> + * stopped state.
> + *
> + * The caller may use rte_event_dev_info_get() to get the capability of each
> + * resources available for this event device.
> + *
> + * @param dev_id
> + *   The identifier of the device to configure.
> + * @param config
> + *   The event device configuration structure.
> + *
> + * @return
> + *   - 0: Success, device configured.
> + *   - <0: Error code returned by the driver configuration function.
> + */
> +extern int
> +rte_event_dev_configure(uint8_t dev_id, struct rte_event_dev_config *config);
> +
> +
> +/* Event queue specific APIs */
> +
> +#define RTE_EVENT_QUEUE_PRIORITY_HIGHEST   0
> +/**< Highest event queue priority */
> +#define RTE_EVENT_QUEUE_PRIORITY_NORMAL    128
> +/**< Normal event queue priority */
> +#define RTE_EVENT_QUEUE_PRIORITY_LOWEST    255
> +/**< Lowest event queue priority */
> +
> +/* Event queue configuration bitmap flags */
> +#define RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER    (1 << 0)
> +/**< This event queue links only to a single event port.
> + *
> + *  \see rte_event_port_setup(), rte_event_port_link()
> + */
> +
> +/** Event queue configuration structure */
> +struct rte_event_queue_conf {
> +       uint32_t nb_atomic_flows;
> +       /**< The maximum number of active flows this queue can track at any
> +        * given time. The value must be in the range of
> +        * [1, max_event_queue_flows], which was previously supplied
> +        * to rte_event_dev_configure().
> +        */
> +       uint32_t nb_atomic_order_sequences;
> +       /**< The maximum number of outstanding events waiting to be (egress-)
> +        * reordered by this queue. In other words, the number of entries in
> +        * this queue’s reorder buffer. The value must be in the range of
> +        * [1, max_event_queue_flows], which was previously supplied
> +        * to rte_event_dev_configure().
>

What happens if this limit is exceeded? While atomic limits are bounded by
the number of lcores, the same cannot be said for ordered queues.
Presumably the queue would refuse further dequeues once this limit is
reached until pending reorders are resolved to permit continued processing?
If so that should be stated explicitly.
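
To make the question concrete, here is a minimal sketch of the behaviour presumed above: a queue that holds back further dequeues once its reorder buffer is full. Everything in it (struct reorder_model and the reorder_model_* helpers) is hypothetical and only models the idea; it is not part of the RFC and says nothing about what an actual implementation does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an ordered queue's reorder buffer (not RFC API). */
struct reorder_model {
	uint32_t capacity;  /* plays the role of nb_atomic_order_sequences */
	uint32_t in_flight; /* events dequeued but not yet reordered on egress */
};

/* Returns true if one more event may be scheduled to a port, or false if
 * the queue has to hold back because every reorder entry is in use. */
static bool
reorder_model_try_dequeue(struct reorder_model *m)
{
	if (m->in_flight >= m->capacity)
		return false;
	m->in_flight++;
	return true;
}

/* Called when one outstanding event has been reordered or released. */
static void
reorder_model_complete(struct reorder_model *m)
{
	if (m->in_flight > 0)
		m->in_flight--;
}
```

If the intended semantics differ (for example, dropping order guarantees instead of stalling), the documentation should say which.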


> +        */
> +       uint32_t event_queue_cfg; /**< Queue config flags(EVENT_QUEUE_CFG_) */
> +       uint8_t priority;
> +       /**< Priority for this event queue relative to other event queues.
> +        * The requested priority should be in the range of
> +        * [RTE_EVENT_QUEUE_PRIORITY_HIGHEST, RTE_EVENT_QUEUE_PRIORITY_LOWEST].
> +        * The implementation shall normalize the requested priority to
> +        * event device supported priority value.
> +        * Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability
> +        */
> +};
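
The normalization mentioned in the *priority* field is left to the implementation. As an illustration only, one possible mapping from the 0..255 API range onto a device with fewer priority levels could look like the sketch below. The helper name normalize_priority and the nb_levels parameter are assumptions made for this example, not something the RFC defines.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale a requested priority in [0, 255]
 * (0 = highest, 255 = lowest) onto a device that supports nb_levels
 * distinct levels numbered [0, nb_levels - 1], keeping 0 as highest.
 */
static uint8_t
normalize_priority(uint8_t requested, uint8_t nb_levels)
{
	if (nb_levels == 0)
		return 0;
	/* requested * nb_levels / 256, computed in 32 bits to avoid overflow */
	return (uint8_t)(((uint32_t)requested * nb_levels) >> 8);
}
```

With this mapping, requests that fall into the same 256/nb_levels wide band collapse to the same device level, so applications cannot assume that two distinct API priorities are distinguishable on every device.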
> +
> +/**
> + * Retrieve the default configuration information of an event queue designated
> + * by its *queue_id* from the event driver for an event device.
> + *
> + * This function is intended to be used in conjunction with
> + * rte_event_queue_setup() where the caller needs to set up the queue by
> + * overriding a few default values.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param queue_id
> + *   The index of the event queue to get the configuration information.
> + *   The value must be in the range [0, nb_event_queues - 1]
> + *   previously supplied to rte_event_dev_configure().
> + * @param[out] queue_conf
> + *   The pointer to the default event queue configuration data.
> + *
> + * \see rte_event_queue_setup()
> + *
> + */
> +extern void
> +rte_event_queue_default_conf_get(uint8_t dev_id, uint8_t queue_id,
> +                                struct rte_event_queue_conf *queue_conf);
> +
> +/**
> + * Allocate and set up an event queue for an event device.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param queue_id
> + *   The index of the event queue to setup. The value must be in the range
> + *   [0, nb_event_queues - 1] previously supplied to rte_event_dev_configure().
> + * @param queue_conf
> + *   The pointer to the configuration data to be used for the event queue.
> + *   NULL value is allowed, in which case the default configuration is used.
> + *
> + * \see rte_event_queue_default_conf_get()
> + *
> + * @return
> + *   - 0: Success, event queue correctly set up.
> + *   - <0: event queue configuration failed
> + */
> +extern int
> +rte_event_queue_setup(uint8_t dev_id, uint8_t queue_id,
> +                     struct rte_event_queue_conf *queue_conf);
> +
> +/**
> + * Get the number of event queues on a specific event device
> + *
> + * @param dev_id
> + *   Event device identifier.
> + * @return
> + *   - The number of configured event queues
> + */
> +extern uint16_t
> +rte_event_queue_count(uint8_t dev_id);
> +
> +/**
> + * Get the priority of the event queue on a specific event device
> + *
> + * @param dev_id
> + *   Event device identifier.
> + * @param queue_id
> + *   Event queue identifier.
> + * @return
> + *   - If the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability then the
> + *    configured priority of the event queue in
> + *    [RTE_EVENT_QUEUE_PRIORITY_HIGHEST, RTE_EVENT_QUEUE_PRIORITY_LOWEST] range
> + *    else the value one
> + */
> +extern uint8_t
> +rte_event_queue_priority(uint8_t dev_id, uint8_t queue_id);
> +
> +/* Event port specific APIs */
> +
> +/** Event port configuration structure */
> +struct rte_event_port_conf {
> +       int32_t new_event_threshold;
> +       /**< A backpressure threshold for new event enqueues on this port.
> +        * Use for *closed system* event dev where event capacity is limited,
> +        * and cannot exceed the capacity of the event dev.
> +        * Configuring ports with different thresholds can make higher priority
> +        * traffic less likely to be backpressured.
> +        * For example, a port used to inject NIC Rx packets into the event dev
> +        * can have a lower threshold so as not to overwhelm the device,
> +        * while ports used for worker pools can have a higher threshold.
> +        */
> +       uint8_t dequeue_queue_depth;
> +       /**< Configure number of bulk dequeues for this event port.
> +        * This value cannot exceed the *max_event_port_dequeue_queue_depth*
> +        * which was previously supplied to rte_event_dev_configure()
> +        */
> +       uint8_t enqueue_queue_depth;
> +       /**< Configure number of bulk enqueues for this event port.
> +        * This value cannot exceed the *max_event_port_enqueue_queue_depth*
> +        * which was previously supplied to rte_event_dev_configure()
> +        */
> +};
> +
> +/**
> + * Retrieve the default configuration information of an event port designated
> + * by its *port_id* from the event driver for an event device.
> + *
> + * This function is intended to be used in conjunction with
> + * rte_event_port_setup() where the caller needs to set up the port by
> + * overriding a few default values.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param port_id
> + *   The index of the event port to get the configuration information.
> + *   The value must be in the range [0, nb_event_ports - 1]
> + *   previously supplied to rte_event_dev_configure().
> + * @param[out] port_conf
> + *   The pointer to the default event port configuration data
> + *
> + * \see rte_event_port_setup()
> + *
> + */
> +extern void
> +rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
> +                               struct rte_event_port_conf *port_conf);
> +
> +/**
> + * Allocate and set up an event port for an event device.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param port_id
> + *   The index of the event port to setup. The value must be in the range
> + *   [0, nb_event_ports - 1] previously supplied to rte_event_dev_configure().
> + * @param port_conf
> + *   The pointer to the configuration data to be used for the port.
> + *   NULL value is allowed, in which case the default configuration is used.
> + *
> + * \see rte_event_port_default_conf_get()
> + *
> + * @return
> + *   - 0: Success, event port correctly set up.
> + *   - <0: Port configuration failed
> + *   - (-EDQUOT) Quota exceeded(Application tried to link the queue configured
> + *   with RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER to more than one event port)
> + */
> +extern int
> +rte_event_port_setup(uint8_t dev_id, uint8_t port_id,
> +                    struct rte_event_port_conf *port_conf);
> +
> +/**
> + * Get the dequeue queue depth configured for the event port designated
> + * by its *port_id* on a specific event device
> + *
> + * @param dev_id
> + *   Event device identifier.
> + * @param port_id
> + *   Event port identifier.
> + * @return
> + *   - The number of configured dequeue queue depth
> + *
> + * \see rte_event_dequeue_burst()
> + */
> +extern uint8_t
> +rte_event_port_dequeue_depth(uint8_t dev_id, uint8_t port_id);
> +
> +/**
> + * Get the enqueue queue depth configured for the event port designated
> + * by its *port_id* on a specific event device
> + *
> + * @param dev_id
> + *   Event device identifier.
> + * @param port_id
> + *   Event port identifier.
> + * @return
> + *   - The number of configured enqueue queue depth
> + *
> + * \see rte_event_enqueue_burst()
> + */
> +extern uint8_t
> +rte_event_port_enqueue_depth(uint8_t dev_id, uint8_t port_id);
> +
> +/**
> + * Get the number of ports on a specific event device
> + *
> + * @param dev_id
> + *   Event device identifier.
> + * @return
> + *   - The number of configured ports
> + */
> +extern uint8_t
> +rte_event_port_count(uint8_t dev_id);
> +
> +/**
> + * Start an event device.
> + *
> + * The device start step is the last one and consists of setting the event
> + * queues to start accepting events and scheduling them to event ports.
> + *
> + * On success, all basic functions exported by the API (event enqueue,
> + * event dequeue and so on) can be invoked.
> + *
> + * @param dev_id
> + *   Event device identifier
> + * @return
> + *   - 0: Success, device started.
> + *   - <0: Error code of the driver device start function.
> + */
> +extern int
> +rte_event_dev_start(uint8_t dev_id);
> +
> +/**
> + * Stop an event device. The device can be restarted with a call to
> + * rte_event_dev_start()
> + *
> + * @param dev_id
> + *   Event device identifier.
> + */
> +extern void
> +rte_event_dev_stop(uint8_t dev_id);
>

Having this be a void function implies this function cannot fail. Is that
assumption always correct?
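
As a sketch of the alternative, a stop call that reports failure could follow the same int convention as rte_event_dev_close(). The model below is purely illustrative: struct dev_model, its fields, and the chosen error codes are assumptions for this example and not a proposal for the actual driver interface.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state, used only to illustrate the return convention. */
struct dev_model {
	bool started; /* device has been started */
	bool hw_busy; /* device cannot quiesce right now */
};

/* Returns 0 on success and a negative errno value on failure. */
static int
dev_model_stop(struct dev_model *d)
{
	if (!d->started)
		return -EINVAL; /* nothing to stop */
	if (d->hw_busy)
		return -EBUSY;  /* caller may retry later */
	d->started = false;
	return 0;
}
```

If no implementation can ever hit such a case, keeping the function void is fine, but then the documentation should state that stop always succeeds.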


> +
> +/**
> + * Close an event device. The device cannot be restarted!
> + *
> + * @param dev_id
> + *   Event device identifier
> + *
> + * @return
> + *  - 0 on successfully closing device
> + *  - <0 on failure to close device
> + */
> +extern int
> +rte_event_dev_close(uint8_t dev_id);
> +
> +/* Scheduler type definitions */
> +#define RTE_SCHED_TYPE_ORDERED         0
> +/**< Ordered scheduling
> + *
> + * Events from an ordered flow of an event queue can be scheduled to multiple
> + * ports for concurrent processing while maintaining the original event order.
> + * This scheme enables the user to achieve high single flow throughput by
> + * avoiding SW synchronization for ordering between ports which are bound to
> + * cores.
> + *
> + * The source flow ordering from an event queue is maintained when events are
> + * enqueued to their destination queue within the same ordered flow context.
> + * An event port holds the context until the application calls
> + * rte_event_dequeue() from the same port, which implicitly releases the
> + * context.
> + * User may allow the scheduler to release the context earlier than that
> + * by calling rte_event_release()
> + *
> + * Events from the source queue appear in their original order when dequeued
> + * from a destination queue.
> + * Event ordering is based on the received event(s), but also other
> + * (newly allocated or stored) events are ordered when enqueued within the same
> + * ordered context. Events not enqueued (e.g. released or stored) within the
> + * context are considered missing from reordering and are skipped at this time
> + * (but can be ordered again within another context).
> + *
> + * \see rte_event_dequeue(), rte_event_release()
> + */
> +
> +#define RTE_SCHED_TYPE_ATOMIC          1
> +/**< Atomic scheduling
> + *
> + * Events from an atomic flow of an event queue can be scheduled only to a
> + * single port at a time. The port is guaranteed to have exclusive (atomic)
> + * access to the associated flow context, which enables the user to avoid SW
> + * synchronization. Atomic flows also help to maintain event ordering
> + * since only one port at a time can process events from a flow of an
> + * event queue.
> + *
> + * The atomic queue synchronization context is dedicated to the port until
> + * the application calls rte_event_dequeue() from the same port, which
> + * implicitly releases the context. User may allow the scheduler to release
> + * the context earlier than that by calling rte_event_release()
> + *
> + * \see rte_event_dequeue(), rte_event_release()
> + */
> +
> +#define RTE_SCHED_TYPE_PARALLEL                2
> +/**< Parallel scheduling
> + *
> + * The scheduler performs priority scheduling, load balancing, etc. functions
> + * but does not provide additional event synchronization or ordering.
> + * It is free to schedule events from a single parallel flow of an event queue
> + * to multiple event ports for concurrent processing.
> + * The application is responsible for flow context synchronization and
> + * event ordering (SW synchronization).
> + */
> +
> +/* Event types to classify the event source */
> +#define RTE_EVENT_TYPE_ETHDEV          0x0
> +/**< The event generated from ethdev subsystem */
> +#define RTE_EVENT_TYPE_CRYPTODEV       0x1
> +/**< The event generated from cryptodev subsystem */
> +#define RTE_EVENT_TYPE_TIMERDEV                0x2
> +/**< The event generated from timerdev subsystem */
> +#define RTE_EVENT_TYPE_CORE            0x3
> +/**< The event generated from core.
> + * Application may use *sub_event_type* to further classify the event
> + */
> +#define RTE_EVENT_TYPE_MAX             0x10
> +/**< Maximum number of event types */
> +
> +/* Event priority */
> +#define RTE_EVENT_PRIORITY_HIGHEST      0
> +/**< Highest event priority */
> +#define RTE_EVENT_PRIORITY_NORMAL       128
> +/**< Normal event priority */
> +#define RTE_EVENT_PRIORITY_LOWEST       255
> +/**< Lowest event priority */
> +
> +/**
> + * The generic *rte_event* structure to hold the event attributes
> + * for dequeue and enqueue operation
> + */
> +struct rte_event {
> +       /** WORD0 */
> +       RTE_STD_C11
> +        union {
> +               uint64_t u64;
> +               /** Event attributes for dequeue or enqueue operation */
> +               struct {
> +                       uint32_t flow_id:24;
> +                       /**< Targeted flow identifier for the enqueue and
> +                        * dequeue operation.
> +                        * The value must be in the range of
> +                        * [1, max_event_queue_flows], which was
> +                        * previously supplied to rte_event_dev_configure().
> +                        */
> +                       uint32_t queue_id:8;
> +                       /**< Targeted event queue identifier for the enqueue or
> +                        * dequeue operation.
> +                        * The value must be in the range of
> +                        * [0, nb_event_queues - 1] which was previously
> +                        * supplied to rte_event_dev_configure().
> +                        */
> +                       uint8_t  sched_type;
> +                       /**< Scheduler synchronization type (RTE_SCHED_TYPE_)
> +                        * associated with flow id on a given event queue
> +                        * for the enqueue and dequeue operation.
> +                        */
> +                       uint8_t  event_type;
> +                       /**< Event type to classify the event source. */
> +                       uint8_t  sub_event_type;
> +                       /**< Sub-event types based on the event source.
> +                        * \see RTE_EVENT_TYPE_CORE
> +                        */
> +                       uint8_t  priority;
> +                       /**< Event priority relative to other events in the
> +                        * event queue. The requested priority should be in the
> +                        * range of [RTE_EVENT_PRIORITY_HIGHEST,
> +                        * RTE_EVENT_PRIORITY_LOWEST].
> +                        * The implementation shall normalize the requested
> +                        * priority to supported priority value.
> +                        * Valid when the device has RTE_EVENT_DEV_CAP_EVENT_QOS
> +                        * capability.
> +                        */
> +               };
> +       };
> +       /** WORD1 */
> +       RTE_STD_C11
> +       union {
> +               uintptr_t event;
> +               /**< Opaque event pointer */
> +               struct rte_mbuf *mbuf;
> +               /**< mbuf pointer if dequeued event is associated with mbuf */
> +       };
> +};
> +
> +/**
> + * Schedule one or more events in the event dev.
> + *
> + * An event dev implementation may define this as a NOOP, for instance if
> + * the event dev performs its scheduling in hardware.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + */
> +extern void
> +rte_event_schedule(uint8_t dev_id);
> +
> +/**
> + * Enqueue the event object supplied in the *rte_event* structure on an
> + * event device designated by its *dev_id* through the event port specified by
> + * *port_id*. The event object specifies the event queue on which this
> + * event will be enqueued.
> + *
> + * @param dev_id
> + *   Event device identifier.
> + * @param port_id
> + *   The identifier of the event port.
> + * @param ev
> + *   Pointer to struct rte_event
> + * @param pin_event
> + *   Hint to the scheduler that the event can be pinned to the same port for
> + *   the next scheduling stage. For implementations that support it, this
> + *   allows the same core to process the next stage in the pipeline for a given
> + *   event, taking advantage of cache locality. The pinned event will be
> + *   received through rte_event_dequeue(). This is a hint and the event is
> + *   not guaranteed to be pinned to the port. This hint is valid only when the
> + *   event is dequeued with rte_event_dequeue() followed by rte_event_enqueue().
> + *
> + * @return
> + *  - 0 on success
> + *  - <0 on failure. Failure can occur if the event port's output queue is
> + *     backpressured, for instance.
> + */
> +extern int
> +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev,
> +                 bool pin_event);
> +
> +/**
> + * Enqueue a burst of event objects supplied in *rte_event* structure on an
> + * event device designated by its *dev_id* through the event port specified by
> + * *port_id*. Each event object specifies the event queue on which it
> + * will be enqueued.
> + *
> + * The rte_event_enqueue_burst() function is invoked to enqueue
> + * multiple event objects.
> + * It is the burst variant of rte_event_enqueue() function.
> + *
> + * The *num* parameter is the number of event objects to enqueue which are
> + * supplied in the *ev* array of *rte_event* structure.
> + *
> + * The rte_event_enqueue_burst() function returns the number of
> + * event objects it actually enqueued. A return value equal to *num* means
> + * that all event objects have been enqueued.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param port_id
> + *   The identifier of the event port.
> + * @param ev
> + *   An array of *num* *rte_event* structures
> + *   which contain the event object enqueue operations to be processed.
> + * @param num
> + *   The number of event objects to enqueue, typically number of
> + *   rte_event_port_enqueue_depth() available for this port.
> + * @param pin_event
> + *   Hint to the scheduler that the event can be pinned to the same port for
> + *   the next scheduling stage. For implementations that support it, this
> + *   allows the same core to process the next stage in the pipeline for a given
> + *   event, taking advantage of cache locality. The pinned event will be
> + *   received through rte_event_dequeue(). This is a hint and the event is
> + *   not guaranteed to be pinned to the port. This hint is valid only when the
> + *   event is dequeued with rte_event_dequeue() followed by rte_event_enqueue().
> + *
> + * @return
> + *   The number of event objects actually enqueued on the event device. The
> + *   return value can be less than the value of the *num* parameter when the
> + *   event device's queue is full or if invalid parameters are specified in a
> + *   *rte_event*. If return value is less than *num*, the remaining events at
> + *   the end of ev[] are not consumed, and the caller has to take care of them.
> + *
> + * \see rte_event_enqueue(), rte_event_port_enqueue_depth()
> + */
> +extern int
> +rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
> +                       struct rte_event ev[], int num, bool pin_event);
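
Since a short return leaves the tail of ev[] with the caller, applications will typically wrap the burst call in a bounded retry loop. The sketch below shows that pattern against a stand-in enqueue function rather than the real API: struct ev_model, enqueue_burst_fn, struct toy_port, toy_port_enqueue and enqueue_all are all hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Stand-in for struct rte_event: two 64-bit words, as in the RFC layout. */
struct ev_model {
	uint64_t word0;
	uintptr_t word1;
};

/* Stand-in for a burst enqueue: returns how many leading events it took. */
typedef int (*enqueue_burst_fn)(struct ev_model ev[], int num, void *ctx);

/* Toy port that accepts at most per_call events per invocation. */
struct toy_port {
	int per_call;
	int accepted;
};

static int
toy_port_enqueue(struct ev_model ev[], int num, void *ctx)
{
	struct toy_port *p = ctx;
	int n = num < p->per_call ? num : p->per_call;

	(void)ev; /* the toy port does not inspect the events */
	p->accepted += n;
	return n;
}

/* Retry until all events are enqueued or max_tries calls have been made.
 * Returns the number of events enqueued; the rest remain with the caller. */
static int
enqueue_all(enqueue_burst_fn enq, void *ctx, struct ev_model ev[], int num,
	    int max_tries)
{
	int sent = 0;

	while (sent < num && max_tries-- > 0) {
		int n = enq(&ev[sent], num - sent, ctx);

		if (n < 0)
			break; /* hard error: stop retrying */
		sent += n;
	}
	return sent;
}
```

The retry bound matters: if the port is backpressured indefinitely, an unbounded loop would stall the core, so the caller still needs a policy (drop, buffer, or back off) for whatever enqueue_all() leaves behind.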
> +
> +/**
> + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> + *
> + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> + * application can use this function to convert wait value in nanoseconds to
> + * implementation-specific wait value supplied in rte_event_dequeue()
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param ns
> + *   Wait time in nanoseconds
> + *
> + * @return
> + * Value for the *wait* parameter in rte_event_dequeue() function
> + *
> + * \see rte_event_dequeue(), RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> + * \see rte_event_dev_configure()
> + *
> + */
> +extern uint64_t
> +rte_event_dequeue_wait_time(uint8_t dev_id, uint64_t ns);
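
The unit of the returned *wait* value is implementation specific. Purely as an illustration, a device whose wait value is expressed in ticks of a fixed-frequency timer might convert as in the sketch below. The function name ns_to_wait_ticks and the timer_hz parameter are assumptions for this example; the RFC does not define how a driver performs the conversion.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/*
 * Hypothetical conversion from nanoseconds to timer ticks.
 * Whole seconds and the sub-second remainder are converted separately so
 * that the intermediate product does not overflow 64 bits for realistic
 * timer frequencies (a few GHz).
 */
static uint64_t
ns_to_wait_ticks(uint64_t ns, uint64_t timer_hz)
{
	uint64_t secs = ns / NS_PER_SEC;
	uint64_t rem_ns = ns % NS_PER_SEC;

	return secs * timer_hz + (rem_ns * timer_hz) / NS_PER_SEC;
}
```

The division truncates, so a very short wait can convert to zero ticks, which the dequeue API treats as no-wait; the documentation may want to say whether drivers round up in that case.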
> +
> +/**
> + * Dequeue an event from the event port specified by *port_id* on the
> + * event device designated by its *dev_id*.
> + *
> + * rte_event_dequeue() does not dictate the specifics of scheduling algorithm as
> + * each eventdev driver may have different criteria to schedule an event.
> + * However, in general, from an application perspective scheduler may use the
> + * following scheme to dispatch an event to the port.
> + *
> + * 1) Selection of event queue based on
> + *   a) The list of event queues linked to the event port.
> + *   b) If the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability then event
> + *   queue selection from list is based on event queue priority relative to
> + *   other event queue supplied as *priority* in rte_event_queue_setup()
> + *   c) If the device has RTE_EVENT_DEV_CAP_EVENT_QOS capability then event
> + *   queue selection from the list is based on event priority supplied as
> + *   *priority* in rte_event_enqueue_burst()
> + * 2) Selection of event
> + *   a) The number of flows available in selected event queue.
> + *   b) Schedule type method associated with the event
> + *
> + * On a successful dequeue, the event port holds flow id and schedule type
> + * context associated with the dispatched event. The context is automatically
> + * released in the next rte_event_dequeue() invocation, or rte_event_release()
> + * can be called to release the context early.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param port_id
> + *   The identifier of the event port.
> + * @param[out] ev
> + *   Pointer to struct rte_event. On successful event dispatch, implementation
> + *   updates the event attributes.
> + * @param wait
> + *   0 - no-wait, returns immediately if there is no event.
> + *   >0 - wait for the event, if the device is configured with
> + *   RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
> + *   the event is available or *wait* time has elapsed.
> + *   If the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> + *   then this function will wait until the event is available or
> + *   *dequeue_wait_ns* ns, which was previously supplied to
> + *   rte_event_dev_configure(), has elapsed.
> + *
> + * @return
> + * When true, a valid event has been dispatched by the scheduler.
> + *
> + */
> +extern bool
> +rte_event_dequeue(uint8_t dev_id, uint8_t port_id,
> +                 struct rte_event *ev, uint64_t wait);
> +
> +/**
> + * Dequeue a burst of event objects from the event port designated by its
> + * *port_id*, on an event device designated by its *dev_id*.
> + *
> + * The rte_event_dequeue_burst() function is invoked to dequeue
> + * multiple event objects. It is the burst variant of rte_event_dequeue()
> + * function.
> + *
> + * The *num* parameter is the maximum number of event objects to dequeue which
> + * are returned in the *ev* array of *rte_event* structure.
> + *
> + * The rte_event_dequeue_burst() function returns the number of
> + * event objects it actually dequeued. A return value equal to
> + * *num* means that all event objects have been dequeued.
> + *
> + * The number of events dequeued is the number of scheduler contexts held by
> + * this port. These contexts are automatically released in the next
> + * rte_event_dequeue() invocation, or rte_event_release() can be called once
> + * per event to release the contexts early.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param port_id
> + *   The identifier of the event port.
> + * @param[out] ev
> + *   An array of *num* *rte_event* structures which is populated
> + *   with the dequeued event objects.
> + * @param num
> + *   The maximum number of event objects to dequeue, typically number of
> + *   rte_event_port_dequeue_depth() available for this port.
> + * @param wait
> + *   0 - no-wait, returns immediately if there is no event.
> + *   >0 - wait for the event, if the device is configured with
> + *   RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until the
> + *   event is available or *wait* time has elapsed.
> + *   If the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> + *   then this function will wait until the event is available or
> + *   *dequeue_wait_ns* ns, which was previously supplied to
> + *   rte_event_dev_configure(), has elapsed.
> + *
> + * @return
> + * The number of event objects actually dequeued from the port. The return
> + * value can be less than the value of the *num* parameter when the
> + * event port's queue is not full.
> + *
> + * \see rte_event_dequeue(), rte_event_port_dequeue_depth()
> + */
> +extern int
> +rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id,
> +                       struct rte_event *ev, int num, uint64_t wait);
> +
> +/**
> + * Release the current flow context associated with a schedule type which
> + * was dequeued from a given event queue through the event port designated by
> + * its *port_id*
> + *
> + * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
> + * then this function hints the scheduler that the user has completed critical
> + * section processing in the current atomic context.
> + * The scheduler is now allowed to schedule events from the same flow from
> + * an event queue to another port. However, the context may be still held
> + * until the next rte_event_dequeue() or rte_event_dequeue_burst() call, this
> + * call allows but does not force the scheduler to release the context early.
> + *
> + * Early atomic context release may increase parallelism and thus system
> + * performance, but the user needs to design carefully the split into critical
> + * vs non-critical sections.
> + *
> + * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
> + * then this function hints the scheduler that the user has done all that is
> + * needed to maintain event order in the current ordered context.
> + * The scheduler is allowed to release the ordered context of this port and
> + * avoid reordering any following enqueues.
> + *
> + * Early ordered context release may increase parallelism and thus system
> + * performance.
> + *
> + * If current flow's scheduler type method is *RTE_SCHED_TYPE_PARALLEL*
> + * or no scheduling context is held then this function may be a NOOP,
> + * depending on the implementation.
> + *
> + * If multiple events are dequeued with rte_event_dequeue_burst(),
> + * rte_event_release() will release each flow context associated with a
> + * schedule type of an event through *index*, which denotes the order in
> + * which it was dequeued with rte_event_dequeue_burst()
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param port_id
> + *   The identifier of the event port.
> + * @param index
> + *   The index of the event that was dequeued with rte_event_dequeue_burst()
> + *   which needs to be released. The value zero is used if the event was
> + *   dequeued with rte_event_dequeue()
> + *
> + *  \see rte_event_dequeue(), rte_event_dequeue_burst()
> + */
> +extern void
> +rte_event_release(uint8_t dev_id, uint8_t port_id, uint8_t index);
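
The interaction between burst dequeue, the per-event *index*, and the implicit release on the next dequeue can be modelled as below. This is a bookkeeping sketch only: struct port_ctx_model, MODEL_MAX_BURST and the model_* functions are hypothetical names, and a real scheduler is free to treat the release as a hint and keep the context until the next dequeue.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_BURST 16 /* hypothetical cap on the dequeue depth */

/* Hypothetical per-port record of which dequeued events still hold a context. */
struct port_ctx_model {
	bool held[MODEL_MAX_BURST];
	uint8_t nb_dequeued;
};

/* A new dequeue of n events implicitly releases all earlier contexts and
 * takes one context per newly dequeued event. */
static void
model_on_dequeue(struct port_ctx_model *p, uint8_t n)
{
	uint8_t i;

	if (n > MODEL_MAX_BURST)
		n = MODEL_MAX_BURST;
	for (i = 0; i < MODEL_MAX_BURST; i++)
		p->held[i] = i < n;
	p->nb_dequeued = n;
}

/* Early release of the context of the index-th event of the last burst.
 * An out-of-range index is ignored, mirroring the possible NOOP case. */
static void
model_release(struct port_ctx_model *p, uint8_t index)
{
	if (index < p->nb_dequeued)
		p->held[index] = false;
}

/* Number of contexts the port still holds. */
static uint8_t
model_held_count(const struct port_ctx_model *p)
{
	uint8_t i, n = 0;

	for (i = 0; i < p->nb_dequeued; i++)
		n += p->held[i] ? 1 : 0;
	return n;
}
```

One consequence worth documenting: with a uint8_t *index*, the release API implicitly limits a burst to 256 events per port.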
> +
> +#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_HIGHEST  0
> +/**< Highest event queue servicing priority */
> +#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL   128
> +/**< Normal event queue servicing priority */
> +#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_LOWEST   255
> +/**< Lowest event queue servicing priority */
> +
> +/** Structure to hold the queue to port link establishment attributes */
> +struct rte_event_queue_link {
> +       uint8_t queue_id;
> +       /**< Event queue identifier to select the source queue to link */
> +       uint8_t priority;
> +       /**< The priority of the event queue for this event port.
> +        * The priority defines the event port's servicing priority for
> +        * event queue, which may be ignored by an implementation.
> +        * The requested priority should be in the range of
> +        * [RTE_EVENT_QUEUE_SERVICE_PRIORITY_HIGHEST,
> +        * RTE_EVENT_QUEUE_SERVICE_PRIORITY_LOWEST].
> +        * The implementation shall normalize the requested priority to
> +        * implementation supported priority value.
> +        */
> +};
> +
> +/**
> + * Link multiple source event queues supplied in *rte_event_queue_link*
> + * structure as *queue_id* to the destination event port designated by its
> + * *port_id* on the event device designated by its *dev_id*.
> + *
> + * The link establishment shall enable the event port *port_id* to
> + * receive events from the specified event queue *queue_id*
> + *
> + * An event queue may link to one or more event ports.
> + * The number of links that can be established from an event queue to an event
> + * port is implementation defined.
> + *
> + * Event queue(s) to event port link establishment can be changed at runtime
> + * without re-configuring the device to support scaling and to reduce the
> + * latency of critical work by establishing the link with more event ports
> + * at runtime.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @param port_id
> + *   Event port identifier to select the destination port to link.
> + *
> + * @param link
> + *   An array of *num* *rte_event_queue_link* structures
> + *   which contain the event queue to event port link establishment attributes.
> + *   NULL value is allowed, in which case this function links all the configured
> + *   event queues *nb_event_queues* which was previously supplied to
> + *   rte_event_dev_configure() to the event port *port_id* with normal servicing
> + *   priority(RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL).
> + *
> + * @param num
> + *   The number of links to establish
> + *
> + * @return
> + * The number of links actually established on the event device. The return
> + * value can be less than the value of the *num* parameter when the
> + * implementation has the limitation on specific queue to port link
> + * establishment or if invalid parameters are specified
> + * in a *rte_event_queue_link*.
> + * If the return value is less than *num*, the remaining links at the end of
> + * link[] are not established, and the caller has to take care of them.
> + * If return value is less than *num* then implementation shall update the
> + * rte_errno accordingly. Possible rte_errno values are
> + * (-EDQUOT) Quota exceeded(Application tried to link the queue configured with
> + *  RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER to more than one event port)
> + * (-EINVAL) Invalid parameter
> + *
> + */
> +extern int
> +rte_event_port_link(uint8_t dev_id, uint8_t port_id,
> +                   struct rte_event_queue_link link[], int num);
> +
> +/**
> + * Unlink multiple source event queues supplied in *queues* from the destination
> + * event port designated by its *port_id* on the event device designated
> + * by its *dev_id*.
> + *
> + * The unlink establishment shall disable the event port *port_id* from
> + * receiving events from the specified event queue *queue_id*
> + *
> + * Event queue(s) to event port unlink establishment can be changed at runtime
> + * without re-configuring the device.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @param port_id
> + *   Event port identifier to select the destination port to unlink.
> + *
> + * @param queues
> + *   An array of *num* event queues to be unlinked from the event port.
> + *   NULL value is allowed, in which case this function unlinks all the
> + *   event queue(s) from the event port *port_id*.
> + *
> + * @param num
> + *   The number of unlinks to establish
> + *
> + * @return
> + * The number of unlinks actually established on the event device. The
> return
> + * value can be less than the value of the *num* parameter when the
> + * implementation has the limitation on specific queue to port unlink
> + * establishment or if invalid parameters are specified.
> + * If the return value is less than *num*, the remaining queues at the
> end of
> + * queues[] are not established, and the caller has to take care of them.
> + * If return value is less than *num* then implementation shall update the
> + * rte_errno accordingly, Possible rte_errno values are
> + * (-EINVAL) Invalid parameter
> + *
> + */
> +extern int
> +rte_event_port_unlink(uint8_t dev_id, uint8_t port_id,
> +                   uint8_t queues[], int num);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_EVENTDEV_H_ */
> --
> 2.5.5
>
>
  
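The partial-success return of rte_event_port_link() described in the quoted documentation can be sketched as below. This is only an illustration under assumptions: struct queue_link, stub_errno and stub_event_port_link() are hypothetical stand-ins (the stub rejects any queue_id outside a made-up range), and link_all() is a helper name invented here, not part of the RFC.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_event_queue_link; the field names
 * are assumptions, the real layout is defined in the RFC header. */
struct queue_link {
	uint8_t queue_id;
	uint8_t priority;
};

/* Hypothetical stand-in for rte_errno. */
static int stub_errno;

#define STUB_NB_QUEUES 4 /* queues "configured" on the stub device */

/* Stub with the same shape as the RFC's rte_event_port_link(): establishes
 * links in array order, stops at the first invalid entry and returns the
 * number of links established. */
static int
stub_event_port_link(uint8_t dev_id, uint8_t port_id,
		     struct queue_link link[], int num)
{
	int i;

	(void)dev_id;
	(void)port_id;
	for (i = 0; i < num; i++) {
		if (link[i].queue_id >= STUB_NB_QUEUES) {
			stub_errno = -EINVAL;
			break;
		}
	}
	return i;
}

/* Try to establish all links. On a short return, the entry at index `done`
 * is the first one that was not established; report it through *first_bad
 * and leave link[done..num-1] for the caller to handle. */
static int
link_all(uint8_t dev_id, uint8_t port_id, struct queue_link link[], int num,
	 int *first_bad)
{
	int done = stub_event_port_link(dev_id, port_id, link, num);

	*first_bad = (done < num) ? done : -1;
	return done;
}
```

A caller would inspect *first_bad and the errno value to decide whether to skip the offending entry and retry with the rest of the array, or to give up.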
Jerin Jacob Oct. 14, 2016, 9:26 a.m. UTC | #2
On Thu, Oct 13, 2016 at 11:14:38PM -0500, Bill Fischofer wrote:
> Hi Jerin,

Hi Bill,

Thanks for the review.

[snip]
> > + * If the device init operation is successful, the correspondence between
> > + * the device identifier assigned to the new device and its associated
> > + * *rte_event_dev* structure is effectively registered.
> > + * Otherwise, both the *rte_event_dev* structure and the device
> > identifier are
> > + * freed.
> > + *
> > + * The functions exported by the application Event API to setup a device
> > + * designated by its device identifier must be invoked in the following
> > order:
> > + *     - rte_event_dev_configure()
> > + *     - rte_event_queue_setup()
> > + *     - rte_event_port_setup()
> > + *     - rte_event_port_link()
> > + *     - rte_event_dev_start()
> > + *
> > + * Then, the application can invoke, in any order, the functions
> > + * exported by the Event API to schedule events, dequeue events, enqueue
> > events,
> > + * change event queue(s) to event port [un]link establishment and so on.
> > + *
> > + * Application may use rte_event_[queue/port]_default_conf_get() to get
> > the
> > + * default configuration to set up an event queue or event port by
> > + * overriding few default values.
> > + *
> > + * If the application wants to change the configuration (i.e. call
> > + * rte_event_dev_configure(), rte_event_queue_setup(), or
> > + * rte_event_port_setup()), it must call rte_event_dev_stop() first to
> > stop the
> > + * device and then do the reconfiguration before calling
> > rte_event_dev_start()
> > + * again. The schedule, enqueue and dequeue functions should not be
> > invoked
> > + * when the device is stopped.
> >
> 
> Given this requirement, the question is what happens to events that are "in
> flight" at the time rte_event_dev_stop() is called? Is stop an asynchronous
> operation that quiesces the event _dev and allows in-flight events to drain
> from queues/ports prior to fully stopping, or is some sort of separate
> explicit quiesce mechanism required? If stop is synchronous and simply
> halts the event_dev, then how is an application to know if subsequent
> configure/setup calls would leave these pending events with no place to
> stand?
>

From an application API perspective, rte_event_dev_stop() is a synchronous function.
If stop has been called in order to reconfigure the device (number of queues, ports, etc.),
then preservation of "in flight" entries is implementation defined;
otherwise, "in flight" entries will be preserved.
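
As a rough sketch of the stop / reconfigure / start sequence from the application side: the model below is not the RFC API, just a small state machine with hypothetical names (struct dev_model, model_*). Rejecting a configure call on a started device with -1 is an assumption made for illustration; the documentation only says the application must stop the device first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one event device's state; not part of the RFC. */
struct dev_model {
	bool started;
	int nb_queues;
};

/* Models rte_event_dev_configure(): reconfiguration is only accepted while
 * the device is stopped. Returns 0 on success, -1 otherwise (assumption). */
static int
model_configure(struct dev_model *d, int nb_queues)
{
	if (d->started)
		return -1;
	d->nb_queues = nb_queues;
	return 0;
}

/* Models rte_event_dev_start(): may report an error through its return. */
static int
model_start(struct dev_model *d)
{
	d->started = true;
	return 0;
}

/* Models rte_event_dev_stop(): synchronous, no return value. */
static void
model_stop(struct dev_model *d)
{
	d->started = false;
}

/* stop -> reconfigure -> start, in the order the documentation requires.
 * Whether in-flight events survive this is implementation defined. */
static int
model_reconfigure(struct dev_model *d, int nb_queues)
{
	model_stop(d);
	if (model_configure(d, nb_queues) != 0)
		return -1;
	return model_start(d);
}
```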

[snip]
 
> > +extern int
> > +rte_event_dev_socket_id(uint8_t dev_id);
> > +
> > +/* Event device capability bitmap flags */
> > +#define RTE_EVENT_DEV_CAP_QUEUE_QOS        (1 << 0)
> > +/**< Event scheduling prioritization is based on the priority associated
> > with
> > + *  each event queue.
> > + *
> > + *  \see rte_event_queue_setup(), RTE_EVENT_QUEUE_PRIORITY_NORMAL
> > + */
> > +#define RTE_EVENT_DEV_CAP_EVENT_QOS        (1 << 1)
> > +/**< Event scheduling prioritization is based on the priority associated
> > with
> > + *  each event. Priority of each event is supplied in *rte_event*
> > structure
> > + *  on each enqueue operation.
> > + *
> > + *  \see rte_event_enqueue()
> > + */
> > +
> > +/**
> > + * Event device information
> > + */
> > +struct rte_event_dev_info {
> > +       const char *driver_name;        /**< Event driver name */
> > +       struct rte_pci_device *pci_dev; /**< PCI information */
> > +       uint32_t min_dequeue_wait_ns;
> > +       /**< Minimum supported global dequeue wait delay(ns) by this
> > device */
> > +       uint32_t max_dequeue_wait_ns;
> > +       /**< Maximum supported global dequeue wait delay(ns) by this
> > device */
> > +       uint32_t dequeue_wait_ns;
> >
> 
> Am I reading this correctly that there is no way to support an indefinite
> waiting capability? Or is this just saying that if a timed wait is
> performed there are min/max limits for the wait duration?

An application can wait indefinitely if required; see the
RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT configuration option.

A trivial application may not need a different wait value on each dequeue. This is
a performance optimization opportunity for the implementation.

> 
> 
> > +       /**< Configured global dequeue wait delay(ns) for this device */
> > +       uint8_t max_event_queues;
> > +       /**< Maximum event_queues supported by this device */
> > +       uint32_t max_event_queue_flows;
> > +       /**< Maximum supported flows in an event queue by this device*/
> > +       uint8_t max_event_queue_priority_levels;
> > +       /**< Maximum number of event queue priority levels by this device.
> > +        * Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability
> > +        */
> > +       uint8_t nb_event_queues;
> > +       /**< Configured number of event queues for this device */
> >
> 
> Is 256 a sufficient number of queues? While various SoCs may have limits,
> why impose such a small limit architecturally?

Each event queue can potentially support millions of flows, so 256 may not be a
small limit. The reason for choosing an 8-bit queue_id is to hold all the attributes
of "struct rte_event" in 128 bits, so that SIMD optimization is possible
in the implementation.
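
To illustrate the size argument, here is a hypothetical 128-bit event layout. Only the 8-bit queue_id and the 128-bit total come from this discussion; struct event128, the other field names and their widths are assumptions, and the actual struct rte_event layout is whatever the header defines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit event layout (illustration only). */
struct event128 {
	uint32_t flow_id;
	uint8_t queue_id;        /* 8 bits, hence at most 256 event queues */
	uint8_t sched_type;
	uint8_t event_type;
	uint8_t sub_event_type;
	uint64_t payload;        /* e.g. an mbuf pointer stored as an integer */
};

/* Compile-time check that the event really is 16 bytes. */
typedef char event128_size_check[(sizeof(struct event128) == 16) ? 1 : -1];

/* Copy one event as a single 16-byte block; a compiler or PMD may lower
 * this to one 128-bit SIMD load/store pair. */
static void
event_copy(struct event128 *dst, const struct event128 *src)
{
	memcpy(dst, src, sizeof(*dst));
}
```

Widening queue_id to 16 bits in a layout like this would either push the struct past 16 bytes or take bits from another field.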

> > +       /**< The maximum number of active flows this queue can track at any
> > +        * given time. The value must be in the range of
> > +        * [1 - max_event_queue_flows)] which previously supplied
> > +        * to rte_event_dev_configure().
> > +        */
> > +       uint32_t nb_atomic_order_sequences;
> > +       /**< The maximum number of outstanding events waiting to be
> > (egress-)
> > +        * reordered by this queue. In other words, the number of entries
> > in
> > +        * this queue’s reorder buffer.The value must be in the range of
> > +        * [1 - max_event_queue_flows)] which previously supplied
> > +        * to rte_event_dev_configure().
> >
> 
> What happens if this limit is exceeded? While atomic limits are bounded by
> the number of lcores, the same cannot be said for ordered queues.
> Presumably the queue would refuse further dequeues once this limit is
> reached until pending reorders are resolved to permit continued processing?
> If so that should be stated explicitly.

OK. I will update the details.

> 
> 
> > + *
> > + *
> > + * The device start step is the last one and consists of setting the event
> > + * queues to start accepting the events and schedules to event ports.
> > + *
> > + * On success, all basic functions exported by the API (event enqueue,
> > + * event dequeue and so on) can be invoked.
> > + *
> > + * @param dev_id
> > + *   Event device identifier
> > + * @return
> > + *   - 0: Success, device started.
> > + *   - <0: Error code of the driver device start function.
> > + */
> > +extern int
> > +rte_event_dev_start(uint8_t dev_id);
> > +
> > +/**
> > + * Stop an event device. The device can be restarted with a call to
> > + * rte_event_dev_start()
> > + *
> > + * @param dev_id
> > + *   Event device identifier.
> > + */
> > +extern void
> > +rte_event_dev_stop(uint8_t dev_id);
> >
> 
> Having this be a void function implies this function cannot fail. Is that
> assumption always correct?

Yes. A subsequent rte_event_dev_start() can return an error if the implementation
really has some critical issue with starting the device.

> 
> 
> > +
> > +/**
> > + * Close an event device. The device cannot be restarted!
> > + *
> > + * @param dev_id
> > + *   Event device identifier
> > + *
> > + * @return
> > + *  - 0 on successfully closing device
> > + *  - <0 on failure to close device
> > + */
> > +extern int
  
Hemant Agrawal Oct. 14, 2016, 10:30 a.m. UTC | #3
Hi Bill/Jerin,

> 

> Thanks for the review.

> 

> [snip]

> > > + * If the device init operation is successful, the correspondence

> > > + between

> > > + * the device identifier assigned to the new device and its

> > > + associated

> > > + * *rte_event_dev* structure is effectively registered.

> > > + * Otherwise, both the *rte_event_dev* structure and the device

> > > identifier are

> > > + * freed.

> > > + *

> > > + * The functions exported by the application Event API to setup a

> > > + device

> > > + * designated by its device identifier must be invoked in the

> > > + following

> > > order:

> > > + *     - rte_event_dev_configure()

> > > + *     - rte_event_queue_setup()

> > > + *     - rte_event_port_setup()

> > > + *     - rte_event_port_link()

> > > + *     - rte_event_dev_start()

> > > + *

> > > + * Then, the application can invoke, in any order, the functions

> > > + * exported by the Event API to schedule events, dequeue events,

> > > + enqueue

> > > events,

> > > + * change event queue(s) to event port [un]link establishment and so on.

> > > + *

> > > + * Application may use rte_event_[queue/port]_default_conf_get() to

> > > + get

> > > the

> > > + * default configuration to set up an event queue or event port by

> > > + * overriding few default values.

> > > + *

> > > + * If the application wants to change the configuration (i.e. call

> > > + * rte_event_dev_configure(), rte_event_queue_setup(), or

> > > + * rte_event_port_setup()), it must call rte_event_dev_stop() first

> > > + to

> > > stop the

> > > + * device and then do the reconfiguration before calling

> > > rte_event_dev_start()

> > > + * again. The schedule, enqueue and dequeue functions should not be

> > > invoked

> > > + * when the device is stopped.

> > >

> >

> > Given this requirement, the question is what happens to events that

> > are "in flight" at the time rte_event_dev_stop() is called? Is stop an

> > asynchronous operation that quiesces the event _dev and allows

> > in-flight events to drain from queues/ports prior to fully stopping,

> > or is some sort of separate explicit quiesce mechanism required? If

> > stop is synchronous and simply halts the event_dev, then how is an

> > application to know if subsequent configure/setup calls would leave

> > these pending events with no place to stand?

> >

> 

> From an application API perspective rte_event_dev_stop() is a synchronous

> function.

> If the stop has been called for re-configuring the number of queues, ports etc of

> the device, then "in flight" entry preservation will be implementation defined.

> else "in flight" entries will be preserved.

> 

> [snip]

> 

> > > +extern int

> > > +rte_event_dev_socket_id(uint8_t dev_id);

> > > +

> > > +/* Event device capability bitmap flags */

> > > +#define RTE_EVENT_DEV_CAP_QUEUE_QOS        (1 << 0)

> > > +/**< Event scheduling prioritization is based on the priority

> > > +associated

> > > with

> > > + *  each event queue.

> > > + *

> > > + *  \see rte_event_queue_setup(), RTE_EVENT_QUEUE_PRIORITY_NORMAL

> > > +*/

> > > +#define RTE_EVENT_DEV_CAP_EVENT_QOS        (1 << 1)

> > > +/**< Event scheduling prioritization is based on the priority

> > > +associated

> > > with

> > > + *  each event. Priority of each event is supplied in *rte_event*

> > > structure

> > > + *  on each enqueue operation.

> > > + *

> > > + *  \see rte_event_enqueue()

> > > + */

> > > +

> > > +/**

> > > + * Event device information

> > > + */

> > > +struct rte_event_dev_info {

> > > +       const char *driver_name;        /**< Event driver name */

> > > +       struct rte_pci_device *pci_dev; /**< PCI information */

> > > +       uint32_t min_dequeue_wait_ns;

> > > +       /**< Minimum supported global dequeue wait delay(ns) by this

> > > device */

> > > +       uint32_t max_dequeue_wait_ns;

> > > +       /**< Maximum supported global dequeue wait delay(ns) by this

> > > device */

> > > +       uint32_t dequeue_wait_ns;

> > >

> >

> > Am I reading this correctly that there is no way to support an

> > indefinite waiting capability? Or is this just saying that if a timed

> > wait is performed there are min/max limits for the wait duration?

> 

> Application can wait indefinite if required. see

> RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT configuration option.

> 

> Trivial application may not need different wait values on each dequeue.This is a

> performance optimization opportunity for implementation.


Jerin, this is irrespective of the wait configuration, whether you are using a per-device wait or a per-dequeue wait.
Can the value of MAX_U32 or MAX_U64 be treated as an infinite wait?

> 

> >

> >

> > > +       /**< Configured global dequeue wait delay(ns) for this device */

> > > +       uint8_t max_event_queues;

> > > +       /**< Maximum event_queues supported by this device */

> > > +       uint32_t max_event_queue_flows;

> > > +       /**< Maximum supported flows in an event queue by this device*/

> > > +       uint8_t max_event_queue_priority_levels;

> > > +       /**< Maximum number of event queue priority levels by this device.

> > > +        * Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS

> capability

> > > +        */

> > > +       uint8_t nb_event_queues;

> > > +       /**< Configured number of event queues for this device */

> > >

> >

> > Is 256 a sufficient number of queues? While various SoCs may have

> > limits, why impose such a small limit architecturally?

> 

> Each event queue potentially can support millions of flows.That way, 256 may

> not be a small limit.The reason to choose queue_id as 8bit to hold all the

> attribute of "struct rte_event" in 128bit. So that SIMD optimization is possible in

> the implementation.

> 

[Hemant] 256 is not a small number of event queues. Please consider an event queue as the logical equivalent of a scheduler group in ODP.

> > > +       /**< The maximum number of active flows this queue can track at any

> > > +        * given time. The value must be in the range of

> > > +        * [1 - max_event_queue_flows)] which previously supplied

> > > +        * to rte_event_dev_configure().

> > > +        */

> > > +       uint32_t nb_atomic_order_sequences;

> > > +       /**< The maximum number of outstanding events waiting to be

> > > (egress-)

> > > +        * reordered by this queue. In other words, the number of

> > > + entries

> > > in

> > > +        * this queue’s reorder buffer.The value must be in the range of

> > > +        * [1 - max_event_queue_flows)] which previously supplied

> > > +        * to rte_event_dev_configure().

> > >

> >

> > What happens if this limit is exceeded? While atomic limits are

> > bounded by the number of lcores, the same cannot be said for ordered

> queues.

> > Presumably the queue would refuse further dequeues once this limit is

> > reached until pending reorders are resolved to permit continued processing?

> > If so that should be stated explicitly.

> 

> OK. I will update details.

> 

> >

> >

> > > + *

> > > + *

> > > + * The device start step is the last one and consists of setting

> > > +the event

> > > + * queues to start accepting the events and schedules to event ports.

> > > + *

> > > + * On success, all basic functions exported by the API (event

> > > +enqueue,

> > > + * event dequeue and so on) can be invoked.

> > > + *

> > > + * @param dev_id

> > > + *   Event device identifier

> > > + * @return

> > > + *   - 0: Success, device started.

> > > + *   - <0: Error code of the driver device start function.

> > > + */

> > > +extern int

> > > +rte_event_dev_start(uint8_t dev_id);

> > > +

> > > +/**

> > > + * Stop an event device. The device can be restarted with a call to

> > > + * rte_event_dev_start()

> > > + *

> > > + * @param dev_id

> > > + *   Event device identifier.

> > > + */

> > > +extern void

> > > +rte_event_dev_stop(uint8_t dev_id);

> > >

> >

> > Having this be a void function implies this function cannot fail. Is

> > that assumption always correct?

> 

> Yes. Subsequent rte_event_dev_start() can return error if the implementation

> really have some critical issues on starting the device.

> 

> >

> >

> > > +

> > > +/**

> > > + * Close an event device. The device cannot be restarted!

> > > + *

> > > + * @param dev_id

> > > + *   Event device identifier

> > > + *

> > > + * @return

> > > + *  - 0 on successfully closing device

> > > + *  - <0 on failure to close device  */ extern int
  
Jerin Jacob Oct. 14, 2016, 12:52 p.m. UTC | #4
On Fri, Oct 14, 2016 at 10:30:33AM +0000, Hemant Agrawal wrote:

> > > Am I reading this correctly that there is no way to support an
> > > indefinite waiting capability? Or is this just saying that if a timed
> > > wait is performed there are min/max limits for the wait duration?
> > 
> > Application can wait indefinite if required. see
> > RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT configuration option.
> > 
> > Trivial application may not need different wait values on each dequeue.This is a
> > performance optimization opportunity for implementation.
> 
>  Jerin, It is irrespective of wait configuration, whether you are using per device wait or per dequeuer wait. 
>  Can the value of MAX_U32 or MAX_U64 be treated as infinite weight? 

That would be yet another check in the fast path of the implementation, I
think, for a more fine-grained wait scheme. Let the application configure the device
with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT so that the application can have
two different function-pointer-based implementations of the dequeue function
if required.

With the RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT configuration, MAX_U64 implicitly
becomes an infinite wait, as the wait is uint64_t.
I can add this info in v3 if required.
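
A minimal sketch of the function-pointer idea, assuming a hypothetical dequeue signature: the variant is picked once at configure time, so the fast path never tests a flag. dequeue_fn, the two variants, select_dequeue() and MODEL_GLOBAL_WAIT_NS are names made up for this example, and the stubs only record the wait value instead of actually waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dequeue signature; the RFC's real prototype may differ. */
typedef int (*dequeue_fn)(uint8_t port_id, uint64_t wait);

/* Records the wait value the last call would have used. */
static uint64_t last_wait;

#define MODEL_GLOBAL_WAIT_NS 1000 /* hypothetical device-wide wait */

/* Variant for the global-wait configuration: the per-call wait argument is
 * ignored, so there is no per-dequeue wait handling in this path. */
static int
dequeue_global_wait(uint8_t port_id, uint64_t wait)
{
	(void)port_id;
	(void)wait;
	last_wait = MODEL_GLOBAL_WAIT_NS;
	return 0;
}

/* Variant for the per-dequeue-wait configuration: the wait is honoured per
 * call, and UINT64_MAX stands for "wait forever". */
static int
dequeue_per_call_wait(uint8_t port_id, uint64_t wait)
{
	(void)port_id;
	last_wait = wait;
	return 0;
}

/* Chosen once at configure time. */
static dequeue_fn
select_dequeue(bool per_dequeue_wait)
{
	return per_dequeue_wait ? dequeue_per_call_wait : dequeue_global_wait;
}
```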

Jerin

> 
> >
  
Eads, Gage Oct. 14, 2016, 3 p.m. UTC | #5
Thanks Jerin, this looks good. I've put a few notes/questions inline.

Thanks,
Gage

>  -----Original Message-----

>  From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]

>  Sent: Tuesday, October 11, 2016 2:30 PM

>  To: dev@dpdk.org

>  Cc: thomas.monjalon@6wind.com; Richardson, Bruce

>  <bruce.richardson@intel.com>; Vangati, Narender

>  <narender.vangati@intel.com>; hemant.agrawal@nxp.com; Eads, Gage

>  <gage.eads@intel.com>; Jerin Jacob <jerin.jacob@caviumnetworks.com>

>  Subject: [dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming

>  model framework for DPDK

>  

>  Thanks to Intel and NXP folks for the positive and constructive feedback

>  I've received so far. Here is the updated RFC(v2).

>  

>  I've attempted to address as many comments as possible.

>  

>  This series adds rte_eventdev.h to the DPDK tree with

>  adequate documentation in doxygen format.

>  

>  Updates are also available online:

>  

>  Related draft header file (this patch):

>  https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h

>  

>  PDF version(doxgen output):

>  https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf

>  

>  Repo:

>  https://github.com/jerinjacobk/libeventdev

>  

>  v1..v2

>  

>  - Added Cavium, Intel, NXP copyrights in header file

>  

>  - Changed the concept of flow queues to flow ids.

>  This is avoid dictating a specific structure to hold the flows.

>  A s/w implementation can do atomic load balancing on multiple

>  flow ids more efficiently than maintaining each event in a specific flow queue.

>  

>  - Change the scheduling group to event queue.

>  A scheduling group is more a stream of events, so an event queue is a better

>   abstraction.

>  

>  - Introduced event port concept, Instead of trying eventdev access to the lcore,

>  a higher level of abstraction called event port is needed which is the

>  application i/f to the eventdev to dequeue and enqueue the events.

>  One or more event queues can be linked to single event port.

>  There can be more than one event port per lcore allowing multiple lightweight

>  threads to have their own i/f into eventdev, if the implementation supports it.

>  An event port will be bound to a lcore or a lightweight thread to keep

>  portable application workflow.

>  An event port abstraction also encapsulates dequeue depth and enqueue depth

>  for

>  a scheduler implementations which can schedule multiple events at a time and

>  output events that can be buffered.

>  

>  - Added configuration options with event queue(nb_atomic_flows,

>  nb_atomic_order_sequences, single consumer etc)

>  and event port(dequeue_queue_depth, enqueue_queue_depth etc) to define

>  the

>  limits on the resource usage.(Useful for optimized software implementation)

>  

>  - Introduced RTE_EVENT_DEV_CAP_QUEUE_QOS and

>  RTE_EVENT_DEV_CAP_EVENT_QOS

>  schemes of priority handling

>  

>  - Added event port to event queue servicing priority.

>  This allows two event ports to connect to the same event queue with

>  different priorities.

>  

>  - Changed the workflow as schedule/dequeue/enqueue.

>  An implementation is free to define schedule as NOOP.

>  A distributed s/w scheduler can use this to schedule events;

>  also a centralized s/w scheduler can make this a NOOP on non-scheduler cores.

>  

>  - Removed Cavium HW specific schedule_from_group API

>  

>  - Removed Cavium HW specific ctxt_update/ctxt_wait APIs.

>   Introduced a more generic "event pinning" concept. i.e

>  If the normal workflow is a dequeue -> do work based on event type ->

>  enqueue,

>  a pin_event argument to enqueue

>  where the pinned event is returned through the normal dequeue)

>  allows application workflow to remain the same whether or not an

>  implementation supports it.

>  

>  - Added dequeue() burst variant

>  

>  - Added the definition of a closed/open system - where open system is memory

>  backed and closed system eventdev has limited capacity.

>  In such systems, it is also useful to denote per event port how many packets

>  can be active in the system.

>  This can serve as a threshold for ethdev like devices so they don't overwhelm

>  core to core events.

>  

>  - Added the option to specify maximum amount of time(in ns) application needs

>  wait on dequeue()

>  

>  - Removed the scheme of expressing the number of flows in log2 format

>  

>  Open item or the item needs improvement.

>  ----------------------------------------

>  - Abstract the differences in event QoS management with different priority

>  schemes

>  available in different HW or SW implementations with portable application

>  workflow.

>  

>  Based on the feedback, there three different kinds of QoS support available in

>  three different HW or SW implementations.

>  1) Priority associated with the event queue

>  2) Priority associated with each event enqueue

>  (Same flow can have two different priority on two separate enqueue)

>  3) Priority associated with the flow(each flow has unique priority)

>  

>  In v2, The differences abstracted based on device capability

>  (RTE_EVENT_DEV_CAP_QUEUE_QOS for the first scheme,

>  RTE_EVENT_DEV_CAP_EVENT_QOS for the second and third scheme).

>  This scheme would call for different application workflow for

>  nontrivial QoS-enabled applications.

>  

>  Looking forward to getting comments from both application and driver

>  implementation perspective.

>  

>  /Jerin

>  

>  ---

>   doc/api/doxy-api-index.md          |    1 +

>   doc/api/doxy-api.conf              |    1 +

>   lib/librte_eventdev/rte_eventdev.h | 1204

>  ++++++++++++++++++++++++++++++++++++

>   3 files changed, 1206 insertions(+)

>   create mode 100644 lib/librte_eventdev/rte_eventdev.h

>  

>  diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md

>  index 6675f96..28c1329 100644

>  --- a/doc/api/doxy-api-index.md

>  +++ b/doc/api/doxy-api-index.md

>  @@ -40,6 +40,7 @@ There are many libraries, so their headers may be

>  grouped by topics:

>     [ethdev]             (@ref rte_ethdev.h),

>     [ethctrl]            (@ref rte_eth_ctrl.h),

>     [cryptodev]          (@ref rte_cryptodev.h),

>  +  [eventdev]           (@ref rte_eventdev.h),

>     [devargs]            (@ref rte_devargs.h),

>     [bond]               (@ref rte_eth_bond.h),

>     [vhost]              (@ref rte_virtio_net.h),

>  diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf

>  index 9dc7ae5..9841477 100644

>  --- a/doc/api/doxy-api.conf

>  +++ b/doc/api/doxy-api.conf

>  @@ -41,6 +41,7 @@ INPUT                   = doc/api/doxy-api-index.md \

>                             lib/librte_cryptodev \

>                             lib/librte_distributor \

>                             lib/librte_ether \

>  +                          lib/librte_eventdev \

>                             lib/librte_hash \

>                             lib/librte_ip_frag \

>                             lib/librte_jobstats \

>  diff --git a/lib/librte_eventdev/rte_eventdev.h

>  b/lib/librte_eventdev/rte_eventdev.h

>  new file mode 100644

>  index 0000000..f60e461

>  --- /dev/null

>  +++ b/lib/librte_eventdev/rte_eventdev.h

>  @@ -0,0 +1,1204 @@

>  +/*

>  + *   BSD LICENSE

>  + *

>  + *   Copyright 2016 Cavium.

>  + *   Copyright 2016 Intel Corporation.

>  + *   Copyright 2016 NXP.

>  + *

>  + *   Redistribution and use in source and binary forms, with or without

>  + *   modification, are permitted provided that the following conditions

>  + *   are met:

>  + *

>  + *     * Redistributions of source code must retain the above copyright

>  + *       notice, this list of conditions and the following disclaimer.

>  + *     * Redistributions in binary form must reproduce the above copyright

>  + *       notice, this list of conditions and the following disclaimer in

>  + *       the documentation and/or other materials provided with the

>  + *       distribution.

>  + *     * Neither the name of Cavium nor the names of its

>  + *       contributors may be used to endorse or promote products derived

>  + *       from this software without specific prior written permission.

>  + *

>  + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND

>  CONTRIBUTORS

>  + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT

>  NOT

>  + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND

>  FITNESS FOR

>  + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE

>  COPYRIGHT

>  + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,

>  INCIDENTAL,

>  + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT

>  NOT

>  + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS

>  OF USE,

>  + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED

>  AND ON ANY

>  + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR

>  TORT

>  + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF

>  THE USE

>  + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH

>  DAMAGE.

>  + */

>  +

>  +#ifndef _RTE_EVENTDEV_H_

>  +#define _RTE_EVENTDEV_H_

>  +

>  +/**

>  + * @file

>  + *

>  + * RTE Event Device API

>  + *

>  + * The Event Device API is composed of two parts:

>  + *

>  + * - The application-oriented Event API that includes functions to setup

>  + *   an event device (configure it, setup its queues, ports and start it), to

>  + *   establish the link between queues to port and to receive events, and so on.

>  + *

>  + * - The driver-oriented Event API that exports a function allowing

>  + *   an event poll Mode Driver (PMD) to simultaneously register itself as

>  + *   an event device driver.

>  + *

>  + * Event device components:

>  + *

>  + *                     +-----------------+

>  + *                     | +-------------+ |

>  + *        +-------+    | |    flow 0   | |

>  + *        |Packet |    | +-------------+ |

>  + *        |event  |    | +-------------+ |

>  + *        |       |    | |    flow 1   | |event_port_link(port0, queue0)

>  + *        +-------+    | +-------------+ |     |     +--------+

>  + *        +-------+    | +-------------+ o-----v-----o        |dequeue +------+

>  + *        |Crypto |    | |    flow n   | |           | event  +------->|Core 0|

>  + *        |work   |    | +-------------+ o----+      | port 0 |        |      |

>  + *        |done ev|    |  event queue 0  |    |      +--------+        +------+

>  + *        +-------+    +-----------------+    |

>  + *        +-------+                           |

>  + *        |Timer  |    +-----------------+    |      +--------+

>  + *        |expiry |    | +-------------+ |    +------o        |dequeue +------+

>  + *        |event  |    | |    flow 0   | o-----------o event  +------->|Core 1|

>  + *        +-------+    | +-------------+ |      +----o port 1 |        |      |

>  + *       Event enqueue | +-------------+ |      |    +--------+        +------+

>  + *     o-------------> | |    flow 1   | |      |

>  + *        enqueue(     | +-------------+ |      |

>  + *        queue_id,    |                 |      |    +--------+        +------+

>  + *        flow_id,     | +-------------+ |      |    |        |dequeue |Core 2|

>  + *        sched_type,  | |    flow n   | o-----------o event  +------->|      |

>  + *        event_type,  | +-------------+ |      |    | port 2 |        +------+

>  + *        subev_type,  |  event queue 1  |      |    +--------+

>  + *        event)       +-----------------+      |    +--------+

>  + *                                              |    |        |dequeue +------+

>  + *        +-------+    +-----------------+      |    | event  +------->|Core n|

>  + *        |Core   |    | +-------------+ o-----------o port n |        |      |

>  + *        |(SW)   |    | |    flow 0   | |      |    +--------+        +--+---+

>  + *        |event  |    | +-------------+ |      |                         |

>  + *        +-------+    | +-------------+ |      |                         |

>  + *            ^        | |    flow 1   | |      |                         |

>  + *            |        | +-------------+ o------+                         |

>  + *            |        | +-------------+ |                                |

>  + *            |        | |    flow n   | |                                |

>  + *            |        | +-------------+ |                                |

>  + *            |        |  event queue n  |                                |

>  + *            |        +-----------------+                                |

>  + *            |                                                           |

>  + *            +-----------------------------------------------------------+

>  + *

>  + *

>  + *

>  + * Event device: A hardware or software-based event scheduler.

>  + *

>  + * Event: A unit of scheduling that encapsulates a packet or other datatype

>  + * like SW generated event from the core, Crypto work completion notification,

>  + * Timer expiry event notification etc as well as metadata.

>  + * The metadata includes flow ID, scheduling type, event priority, event_type,

>  + * sub_event_type etc.

>  + *

>  + * Event queue: A queue containing events that are scheduled by the event dev.

>  + * An event queue contains events of different flows associated with scheduling

>  + * types, such as atomic, ordered, or parallel.

>  + *

>  + * Event port: An application's interface into the event dev for enqueue and

>  + * dequeue operations. Each event port can be linked with one or more

>  + * event queues for dequeue operations.

>  + *

>  + * By default, all the functions of the Event Device API exported by a PMD

>  + * are lock-free functions that are assumed not to be invoked in parallel on

>  + * different logical cores to work on the same target object. For instance,

>  + * the dequeue function of a PMD cannot be invoked in parallel on two logical

>  + * cores to operate on the same event port. Of course, this function

>  + * can be invoked in parallel by different logical cores on different ports.

>  + * It is the responsibility of the upper level application to enforce this rule.

>  + *

>  + * In all functions of the Event API, the Event device is

>  + * designated by an integer >= 0 named the device identifier *dev_id*

>  + *

>  + * At the Event driver level, Event devices are represented by a generic

>  + * data structure of type *rte_event_dev*.

>  + *

>  + * Event devices are dynamically registered during the PCI/SoC device probing

>  + * phase performed at EAL initialization time.

>  + * When an Event device is being probed, a *rte_event_dev* structure and

>  + * a new device identifier are allocated for that device. Then, the

>  + * event_dev_init() function supplied by the Event driver matching the probed

>  + * device is invoked to properly initialize the device.

>  + *

>  + * The role of the device init function consists of resetting the hardware or

>  + * software event driver implementations.

>  + *

>  + * If the device init operation is successful, the correspondence between

>  + * the device identifier assigned to the new device and its associated

>  + * *rte_event_dev* structure is effectively registered.

>  + * Otherwise, both the *rte_event_dev* structure and the device identifier are

>  + * freed.

>  + *

>  + * The functions exported by the application Event API to setup a device

>  + * designated by its device identifier must be invoked in the following order:

>  + *     - rte_event_dev_configure()

>  + *     - rte_event_queue_setup()

>  + *     - rte_event_port_setup()

>  + *     - rte_event_port_link()

>  + *     - rte_event_dev_start()

>  + *

>  + * Then, the application can invoke, in any order, the functions

>  + * exported by the Event API to schedule events, dequeue events, enqueue

>  + * events, link or unlink event queue(s) and event ports, and so on.

>  + *

>  + * The application may use rte_event_[queue/port]_default_conf_get() to get

>  + * the default configuration, and then set up an event queue or event port

>  + * by overriding a few default values.

>  + *

>  + * If the application wants to change the configuration (i.e. call

>  + * rte_event_dev_configure(), rte_event_queue_setup(), or

>  + * rte_event_port_setup()), it must call rte_event_dev_stop() first to stop the

>  + * device and then do the reconfiguration before calling rte_event_dev_start()

>  + * again. The schedule, enqueue and dequeue functions should not be invoked

>  + * when the device is stopped.

>  + *

>  + * Finally, an application can close an Event device by invoking the

>  + * rte_event_dev_close() function.

>  + *

>  + * Each function of the application Event API invokes a specific function

>  + * of the PMD that controls the target device designated by its device

>  + * identifier.

>  + *

>  + * For this purpose, all device-specific functions of an Event driver are

>  + * supplied through a set of pointers contained in a generic structure of type

>  + * *event_dev_ops*.

>  + * The address of the *event_dev_ops* structure is stored in the

>  + * *rte_event_dev* structure by the device init function of the Event driver,

>  + * which is invoked during the PCI/SoC device probing phase, as explained earlier.

>  + *

>  + * In other words, each function of the Event API simply retrieves the

>  + * *rte_event_dev* structure associated with the device identifier and

>  + * performs an indirect invocation of the corresponding driver function

>  + * supplied in the *event_dev_ops* structure of the *rte_event_dev* structure.

>  + *

>  + * For performance reasons, the addresses of the fast-path functions of the

>  + * Event driver are not contained in the *event_dev_ops* structure.

>  + * Instead, they are directly stored at the beginning of the *rte_event_dev*

>  + * structure to avoid an extra indirect memory access during their invocation.

>  + *

>  + * RTE event device drivers do not use interrupts for enqueue or dequeue

>  + * operation. Instead, Event drivers export Poll-Mode enqueue and dequeue

>  + * functions to applications.

>  + *

>  + * An event-driven application has the following typical workflow on the fastpath:

>  + * \code{.c}

>  + *	while (1) {

>  + *

>  + *		rte_event_schedule(dev_id);

>  + *

>  + *		rte_event_dequeue(...);

>  + *

>  + *		(event processing)

>  + *

>  + *		rte_event_enqueue(...);

>  + *	}

>  + * \endcode

>  + *

>  + * The *schedule* operation is intended to do event scheduling, and the

>  + * *dequeue* operation returns the scheduled events. An implementation

>  + * is free to define the semantics between *schedule* and *dequeue*. For

>  + * example, a system based on a hardware scheduler can define its

>  + * rte_event_schedule() to be an NOOP, whereas a software scheduler can use

>  + * the *schedule* operation to schedule events.

>  + *

>  + * Events are injected into the event device through the *enqueue* operation by

>  + * event producers in the system. Typical event producers are the ethdev

>  + * subsystem for generating packet events, the core (SW) for generating events

>  + * based on different stages of application processing, cryptodev for generating

>  + * crypto work completion notifications, etc.

>  + *

>  + * The *dequeue* operation gets one or more events from the event ports.

>  + * The application processes the events and sends them to a downstream event

>  + * queue through rte_event_enqueue() if it is an intermediate stage of event

>  + * processing. On the final stage, the application may send them to a different

>  + * subsystem such as ethdev, to send the packet/event on the wire using the

>  + * ethdev rte_eth_tx_burst() API.

>  + *

>  + */
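The stop-before-reconfigure rule in the overview above can be pictured as a small state check. This is only a minimal sketch under stated assumptions: every name prefixed with sketch_ is hypothetical and not part of the proposed header, and the -EBUSY/-EINVAL return values are my assumption, not something the RFC specifies.

```c
#include <errno.h>

/* Hypothetical lifecycle states; not part of the proposed header. */
enum sketch_state { SKETCH_STOPPED, SKETCH_STARTED };

struct sketch_dev {
	enum sketch_state state;
	int configured; /* non-zero once configure has succeeded */
};

/* Configuration is only legal while the device is stopped. */
static int
sketch_configure(struct sketch_dev *dev)
{
	if (dev->state == SKETCH_STARTED)
		return -EBUSY; /* assumed error code */
	dev->configured = 1;
	return 0;
}

/* Start requires a prior successful configure. */
static int
sketch_start(struct sketch_dev *dev)
{
	if (!dev->configured)
		return -EINVAL; /* assumed error code */
	dev->state = SKETCH_STARTED;
	return 0;
}

/* Stop always succeeds and re-enables configuration. */
static void
sketch_stop(struct sketch_dev *dev)
{
	dev->state = SKETCH_STOPPED;
}
```

With such a check, a configure call on a started device fails until the stop call has been made, which matches the order the comment asks applications to follow.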

>  +

>  +#ifdef __cplusplus

>  +extern "C" {

>  +#endif

>  +

>  +#include <rte_pci.h>

>  +#include <rte_dev.h>

>  +#include <rte_devargs.h>

>  +#include <rte_errno.h>

>  +

>  +/**

>  + * Get the total number of event devices that have been successfully

>  + * initialised.

>  + *

>  + * @return

>  + *   The total number of usable event devices.

>  + */

>  +extern uint8_t

>  +rte_event_dev_count(void);

>  +

>  +/**

>  + * Get the device identifier for the named event device.

>  + *

>  + * @param name

>  + *   Event device name to select the event device identifier.

>  + *

>  + * @return

>  + *   Returns event device identifier on success.

>  + *   - <0: Failure to find named event device.

>  + */

>  +extern uint8_t

>  +rte_event_dev_get_dev_id(const char *name);


This return type should be int8_t, or some signed type, to support the failure case.
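To illustrate the concern, here is a minimal sketch with a hypothetical two-entry device table. The sketch_ names are mine and stand in for the real lookup; the point is only what happens to a negative failure value when it is forced through a uint8_t return type.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical device table, for illustration only. */
static const char *const sketch_dev_names[] = { "evdev0", "evdev1" };

/* Signed return: >= 0 is a device id, < 0 reports failure. */
static int
sketch_get_dev_id(const char *name)
{
	size_t i;
	size_t n = sizeof(sketch_dev_names) / sizeof(sketch_dev_names[0]);

	for (i = 0; i < n; i++)
		if (strcmp(sketch_dev_names[i], name) == 0)
			return (int)i;
	return -1;
}

/* The same lookup through a uint8_t return type: -1 wraps to 255,
 * which the caller cannot tell apart from a valid device id. */
static uint8_t
sketch_get_dev_id_u8(const char *name)
{
	return (uint8_t)sketch_get_dev_id(name);
}
```

A caller testing the uint8_t result with `< 0` would never see the failure, which is why a signed return type is needed if the documented `<0` case is to be reachable.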

>  +

>  +/**

>  + * Return the NUMA socket to which a device is connected.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + * @return

>  + *   The NUMA socket id to which the device is connected or

>  + *   a default of zero if the socket could not be determined.

>  + *   - -1: dev_id value is out of range.

>  + */

>  +extern int

>  +rte_event_dev_socket_id(uint8_t dev_id);

>  +

>  +/* Event device capability bitmap flags */

>  +#define RTE_EVENT_DEV_CAP_QUEUE_QOS        (1 << 0)

>  +/**< Event scheduling prioritization is based on the priority associated with

>  + *  each event queue.

>  + *

>  + *  \see rte_event_queue_setup(), RTE_EVENT_QUEUE_PRIORITY_NORMAL

>  + */

>  +#define RTE_EVENT_DEV_CAP_EVENT_QOS        (1 << 1)

>  +/**< Event scheduling prioritization is based on the priority associated with

>  + *  each event. Priority of each event is supplied in *rte_event* structure

>  + *  on each enqueue operation.

>  + *

>  + *  \see rte_event_enqueue()

>  + */

>  +

>  +/**

>  + * Event device information

>  + */

>  +struct rte_event_dev_info {

>  +	const char *driver_name;	/**< Event driver name */

>  +	struct rte_pci_device *pci_dev;	/**< PCI information */

>  +	uint32_t min_dequeue_wait_ns;

>  +	/**< Minimum supported global dequeue wait delay(ns) by this device */

>  +	uint32_t max_dequeue_wait_ns;

>  +	/**< Maximum supported global dequeue wait delay(ns) by this device */

>  +	uint32_t dequeue_wait_ns;

>  +	/**< Configured global dequeue wait delay(ns) for this device */

>  +	uint8_t max_event_queues;

>  +	/**< Maximum event_queues supported by this device */

>  +	uint32_t max_event_queue_flows;

>  +	/**< Maximum supported flows in an event queue by this device*/

>  +	uint8_t max_event_queue_priority_levels;

>  +	/**< Maximum number of event queue priority levels by this device.

>  +	 * Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability

>  +	 */

>  +	uint8_t nb_event_queues;

>  +	/**< Configured number of event queues for this device */

>  +	uint8_t max_event_priority_levels;

>  +	/**< Maximum number of event priority levels by this device.

>  +	 * Valid when the device has RTE_EVENT_DEV_CAP_EVENT_QOS capability

>  +	 */

>  +	uint8_t max_event_ports;

>  +	/**< Maximum number of event ports supported by this device */

>  +	uint8_t nb_event_ports;

>  +	/**< Configured number of event ports for this device */

>  +	uint8_t max_event_port_dequeue_queue_depth;

>  +	/**< Maximum dequeue queue depth for any event port.

>  +	 * Implementations can schedule N events at a time to an event port.

>  +	 * A device that does not support bulk dequeue will set this as 1.

>  +	 * \see rte_event_port_setup()

>  +	 */

>  +	uint32_t max_event_port_enqueue_queue_depth;

>  +	/**< Maximum enqueue queue depth for any event port. Implementations

>  +	 * can batch N events at a time to enqueue through event port

>  +	 * \see rte_event_port_setup()

>  +	 */

>  +	int32_t max_num_events;

>  +	/**< A *closed system* event dev has a limit on the number of events it

>  +	 * can manage at a time. An *open system* event dev does not have a

>  +	 * limit and will specify this as -1.

>  +	 */

>  +	uint32_t event_dev_cap;

>  +	/**< Event device capabilities(RTE_EVENT_DEV_CAP_)*/

>  +};

>  +

>  +/**

>  + * Retrieve the contextual information of an event device.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + *

>  + * @param[out] dev_info

>  + *   A pointer to a structure of type *rte_event_dev_info* to be filled with the

>  + *   contextual information of the device.

>  + *

>  + */

>  +extern void

>  +rte_event_dev_info_get(uint8_t dev_id, struct rte_event_dev_info *dev_info);


I'm wondering if this return type should be int, so we can return an error if the dev_id is invalid.
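Roughly what I have in mind, as a minimal sketch only: the struct is a trimmed, hypothetical stand-in for rte_event_dev_info, the device table is made up, and -EINVAL is an assumed error code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DEVS 2 /* hypothetical number of probed devices */

/* Trimmed, hypothetical stand-in for struct rte_event_dev_info. */
struct sketch_dev_info {
	uint8_t max_event_queues;
	uint8_t max_event_ports;
};

static const struct sketch_dev_info sketch_devs[SKETCH_MAX_DEVS] = {
	{ 8, 4 },
	{ 16, 8 },
};

/* Returns 0 on success, -EINVAL for an out-of-range dev_id or a
 * NULL output pointer (assumed error code). */
static int
sketch_dev_info_get(uint8_t dev_id, struct sketch_dev_info *dev_info)
{
	if (dev_id >= SKETCH_MAX_DEVS || dev_info == NULL)
		return -EINVAL;
	*dev_info = sketch_devs[dev_id];
	return 0;
}
```

With a void return, the caller has no way to know that dev_info was left untouched for a bad dev_id.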

>  +

>  +/* Event device configuration bitmap flags */

>  +#define RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT (1 << 0)

>  +/**< Override the global *dequeue_wait_ns* and use per dequeue wait in ns.

>  + *  \see rte_event_dequeue_wait_time(), rte_event_dequeue()

>  + */

>  +

>  +/** Event device configuration structure */

>  +struct rte_event_dev_config {

>  +	uint32_t dequeue_wait_ns;

>  +	/**< rte_event_dequeue() waits for *dequeue_wait_ns* ns on this device.

>  +	 * This value should be in the range of *min_dequeue_wait_ns* and

>  +	 * *max_dequeue_wait_ns*, which were previously provided by

>  +	 * rte_event_dev_info_get()

>  +	 * \see RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT

>  +	 */

>  +	int32_t nb_events_limit;

>  +	/**< Applies to *closed system* event dev only. This field indicates a

>  +	 * limit to ethdev-like devices to limit the number of events injected

>  +	 * into the system to not overwhelm core-to-core events.

>  +	 * This value cannot exceed the *max_num_events*, which was previously

>  +	 * provided by rte_event_dev_info_get()

>  +	 */

>  +	uint8_t nb_event_queues;

>  +	/**< Number of event queues to configure on this device.

>  +	 * This value cannot exceed the *max_event_queues*, which was previously

>  +	 * provided by rte_event_dev_info_get()

>  +	 */

>  +	uint8_t nb_event_ports;

>  +	/**< Number of event ports to configure on this device.

>  +	 * This value cannot exceed the *max_event_ports*, which was previously

>  +	 * provided by rte_event_dev_info_get()

>  +	 */

>  +	uint32_t event_dev_cfg;

>  +	/**< Event device config flags(RTE_EVENT_DEV_CFG_)*/

>  +};

>  +

>  +/**

>  + * Configure an event device.

>  + *

>  + * This function must be invoked first before any other function in the

>  + * API. This function can also be re-invoked when a device is in the

>  + * stopped state.

>  + *

>  + * The caller may use rte_event_dev_info_get() to get the capabilities of the

>  + * resources available for this event device.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device to configure.

>  + * @param config

>  + *   The event device configuration structure.

>  + *

>  + * @return

>  + *   - 0: Success, device configured.

>  + *   - <0: Error code returned by the driver configuration function.

>  + */

>  +extern int

>  +rte_event_dev_configure(uint8_t dev_id, struct rte_event_dev_config *config);
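The limits documented for rte_event_dev_config could be checked along these lines. This is a minimal sketch, not the proposed implementation: the two structs are trimmed, hypothetical stand-ins for rte_event_dev_info and rte_event_dev_config, the helper name is made up, and -EINVAL is an assumed error code.

```c
#include <errno.h>
#include <stdint.h>

/* Trimmed, hypothetical stand-in for struct rte_event_dev_info. */
struct sketch_info {
	uint32_t min_dequeue_wait_ns;
	uint32_t max_dequeue_wait_ns;
	int32_t max_num_events; /* -1 means open system (no limit) */
	uint8_t max_event_queues;
	uint8_t max_event_ports;
};

/* Trimmed, hypothetical stand-in for struct rte_event_dev_config. */
struct sketch_config {
	uint32_t dequeue_wait_ns;
	int32_t nb_events_limit;
	uint8_t nb_event_queues;
	uint8_t nb_event_ports;
};

/* Returns 0 if cfg respects every limit reported in info. */
static int
sketch_check_config(const struct sketch_info *info,
		    const struct sketch_config *cfg)
{
	if (cfg->dequeue_wait_ns < info->min_dequeue_wait_ns ||
	    cfg->dequeue_wait_ns > info->max_dequeue_wait_ns)
		return -EINVAL;
	/* The event limit applies to closed systems only. */
	if (info->max_num_events != -1 &&
	    cfg->nb_events_limit > info->max_num_events)
		return -EINVAL;
	if (cfg->nb_event_queues > info->max_event_queues ||
	    cfg->nb_event_ports > info->max_event_ports)
		return -EINVAL;
	return 0;
}
```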

>  +

>  +

>  +/* Event queue specific APIs */

>  +

>  +#define RTE_EVENT_QUEUE_PRIORITY_HIGHEST   0

>  +/**< Highest event queue priority */

>  +#define RTE_EVENT_QUEUE_PRIORITY_NORMAL    128

>  +/**< Normal event queue priority */

>  +#define RTE_EVENT_QUEUE_PRIORITY_LOWEST    255

>  +/**< Lowest event queue priority */

>  +

>  +/* Event queue configuration bitmap flags */

>  +#define RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER    (1 << 0)

>  +/**< This event queue links only to a single event port.

>  + *

>  + *  \see rte_event_port_setup(), rte_event_port_link()

>  + */

>  +

>  +/** Event queue configuration structure */

>  +struct rte_event_queue_conf {

>  +	uint32_t nb_atomic_flows;

>  +	/**< The maximum number of active flows this queue can track at any

>  +	 * given time. The value must be in the range of

>  +	 * [1, max_event_queue_flows], which was previously provided by

>  +	 * rte_event_dev_info_get().

>  +	 */

>  +	uint32_t nb_atomic_order_sequences;

>  +	/**< The maximum number of outstanding events waiting to be (egress-)

>  +	 * reordered by this queue. In other words, the number of entries in

>  +	 * this queue’s reorder buffer. The value must be in the range of

>  +	 * [1, max_event_queue_flows], which was previously provided by

>  +	 * rte_event_dev_info_get().

>  +	 */

>  +	uint32_t event_queue_cfg; /**< Queue config flags(EVENT_QUEUE_CFG_) */

>  +	uint8_t priority;

>  +	/**< Priority for this event queue relative to other event queues.

>  +	 * The requested priority should be in the range of

>  +	 * [RTE_EVENT_QUEUE_PRIORITY_HIGHEST, RTE_EVENT_QUEUE_PRIORITY_LOWEST].

>  +	 * The implementation shall normalize the requested priority to

>  +	 * event device supported priority value.

>  +	 * Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability

>  +	 */

>  +};
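The priority normalization mentioned in the *priority* field comment is left to the implementation. One possible mapping, shown purely as a sketch: linear scaling from the 0..255 request range onto the device's level count. The function name and the linear rule are my assumptions; a PMD is free to normalize differently.

```c
#include <stdint.h>

/* Map a requested priority in [0, 255] (0 = highest) onto one of
 * nb_levels device priority levels, numbered 0 .. nb_levels - 1.
 * Assumes 1 <= nb_levels <= 256; linear scaling is an assumption. */
static uint8_t
sketch_normalize_priority(uint8_t requested, uint16_t nb_levels)
{
	return (uint8_t)(((uint32_t)requested * nb_levels) / 256);
}
```

For a device reporting 8 levels, this maps RTE_EVENT_QUEUE_PRIORITY_HIGHEST (0) to level 0, RTE_EVENT_QUEUE_PRIORITY_NORMAL (128) to level 4 and RTE_EVENT_QUEUE_PRIORITY_LOWEST (255) to level 7.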

>  +

>  +/**

>  + * Retrieve the default configuration information of an event queue designated

>  + * by its *queue_id* from the event driver for an event device.

>  + *

>  + * This function is intended to be used in conjunction with rte_event_queue_setup()

>  + * where the caller needs to set up the queue by overriding a few default values.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + * @param queue_id

>  + *   The index of the event queue to get the configuration information.

>  + *   The value must be in the range [0, nb_event_queues - 1]

>  + *   previously supplied to rte_event_dev_configure().

>  + * @param[out] queue_conf

>  + *   The pointer to the default event queue configuration data.

>  + *

>  + * \see rte_event_queue_setup()

>  + *

>  + */

>  +extern void

>  +rte_event_queue_default_conf_get(uint8_t dev_id, uint8_t queue_id,

>  +				 struct rte_event_queue_conf *queue_conf);

>  +

>  +/**

>  + * Allocate and set up an event queue for an event device.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + * @param queue_id

>  + *   The index of the event queue to setup. The value must be in the range

>  + *   [0, nb_event_queues - 1] previously supplied to rte_event_dev_configure().

>  + * @param queue_conf

>  + *   The pointer to the configuration data to be used for the event queue.

>  + *   NULL value is allowed, in which case the default configuration is used.

>  + *

>  + * \see rte_event_queue_default_conf_get()

>  + *

>  + * @return

>  + *   - 0: Success, event queue correctly set up.

>  + *   - <0: event queue configuration failed

>  + */

>  +extern int

>  +rte_event_queue_setup(uint8_t dev_id, uint8_t queue_id,

>  +		      struct rte_event_queue_conf *queue_conf);

>  +

>  +/**

>  + * Get the number of event queues on a specific event device

>  + *

>  + * @param dev_id

>  + *   Event device identifier.

>  + * @return

>  + *   - The number of configured event queues

>  + */

>  +extern uint16_t

>  +rte_event_queue_count(uint8_t dev_id);

>  +

>  +/**

>  + * Get the priority of the event queue on a specific event device

>  + *

>  + * @param dev_id

>  + *   Event device identifier.

>  + * @param queue_id

>  + *   Event queue identifier.

>  + * @return

>  + *   - If the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability then the

>  + *    configured priority of the event queue in

>  + *    [RTE_EVENT_QUEUE_PRIORITY_HIGHEST, RTE_EVENT_QUEUE_PRIORITY_LOWEST] range

>  + *    else the value one

>  + */

>  +extern uint8_t

>  +rte_event_queue_priority(uint8_t dev_id, uint8_t queue_id);

>  +

>  +/* Event port specific APIs */

>  +

>  +/** Event port configuration structure */

>  +struct rte_event_port_conf {

>  +	int32_t new_event_threshold;

>  +	/**< A backpressure threshold for new event enqueues on this port.

>  +	 * Use for *closed system* event dev where event capacity is limited,

>  +	 * and cannot exceed the capacity of the event dev.

>  +	 * Configuring ports with different thresholds can make higher priority

>  +	 * traffic less likely to  be backpressured.

>  +	 * For example, a port used to inject NIC Rx packets into the event dev

>  +	 * can have a lower threshold so as not to overwhelm the device,

>  +	 * while ports used for worker pools can have a higher threshold.

>  +	 */

>  +	uint8_t dequeue_queue_depth;

>  +	/**< Configure number of bulk dequeues for this event port.

>  +	 * This value cannot exceed the *max_event_port_dequeue_queue_depth*,

>  +	 * which was previously provided by rte_event_dev_info_get()

>  +	 */

>  +	uint8_t enqueue_queue_depth;

>  +	/**< Configure number of bulk enqueues for this event port.

>  +	 * This value cannot exceed the *max_event_port_enqueue_queue_depth*,

>  +	 * which was previously provided by rte_event_dev_info_get()

>  +	 */

>  +};

>  +

>  +/**

>  + * Retrieve the default configuration information of an event port designated

>  + * by its *port_id* from the event driver for an event device.

>  + *

>  + * This function is intended to be used in conjunction with rte_event_port_setup()

>  + * where the caller needs to set up the port by overriding a few default values.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + * @param port_id

>  + *   The index of the event port to get the configuration information.

>  + *   The value must be in the range [0, nb_event_ports - 1]

>  + *   previously supplied to rte_event_dev_configure().

>  + * @param[out] port_conf

>  + *   The pointer to the default event port configuration data

>  + *

>  + * \see rte_event_port_setup()

>  + *

>  + */

>  +extern void

>  +rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,

>  +				struct rte_event_port_conf *port_conf);

>  +

>  +/**

>  + * Allocate and set up an event port for an event device.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + * @param port_id

>  + *   The index of the event port to setup. The value must be in the range

>  + *   [0, nb_event_ports - 1] previously supplied to rte_event_dev_configure().

>  + * @param port_conf

>  + *   The pointer to the configuration data to be used for the event port.

>  + *   NULL value is allowed, in which case the default configuration is used.

>  + *

>  + * \see rte_event_port_default_conf_get()

>  + *

>  + * @return

>  + *   - 0: Success, event port correctly set up.

>  + *   - <0: Port configuration failed

>  + *   - (-EDQUOT) Quota exceeded (the application tried to link a queue configured

>  + *   with RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER to more than one event port)

>  + */

>  +extern int

>  +rte_event_port_setup(uint8_t dev_id, uint8_t port_id,

>  +		     struct rte_event_port_conf *port_conf);

>  +

>  +/**

>  + * Get the dequeue queue depth configured for the event port designated

>  + * by its *port_id* on a specific event device

>  + *

>  + * @param dev_id

>  + *   Event device identifier.

>  + * @param port_id

>  + *   Event port identifier.

>  + * @return

>  + *   - The configured dequeue queue depth

>  + *

>  + * \see rte_event_dequeue_burst()

>  + */

>  +extern uint8_t

>  +rte_event_port_dequeue_depth(uint8_t dev_id, uint8_t port_id);

>  +

>  +/**

>  + * Get the enqueue queue depth configured for the event port designated

>  + * by its *port_id* on a specific event device

>  + *

>  + * @param dev_id

>  + *   Event device identifier.

>  + * @param port_id

>  + *   Event port identifier.

>  + * @return

>  + *   - The configured enqueue queue depth

>  + *

>  + * \see rte_event_enqueue_burst()

>  + */

>  +extern uint8_t

>  +rte_event_port_enqueue_depth(uint8_t dev_id, uint8_t port_id);

>  +

>  +/**

>  + * Get the number of ports on a specific event device

>  + *

>  + * @param dev_id

>  + *   Event device identifier.

>  + * @return

>  + *   - The number of configured ports

>  + */

>  +extern uint8_t

>  +rte_event_port_count(uint8_t dev_id);

>  +

>  +/**

>  + * Start an event device.

>  + *

>  + * The device start step is the last one and consists of setting the event

>  + * queues to start accepting events and scheduling them to event ports.

>  + *

>  + * On success, all basic functions exported by the API (event enqueue,

>  + * event dequeue and so on) can be invoked.

>  + *

>  + * @param dev_id

>  + *   Event device identifier

>  + * @return

>  + *   - 0: Success, device started.

>  + *   - <0: Error code of the driver device start function.

>  + */

>  +extern int

>  +rte_event_dev_start(uint8_t dev_id);

>  +

>  +/**

>  + * Stop an event device. The device can be restarted with a call to

>  + * rte_event_dev_start()

>  + *

>  + * @param dev_id

>  + *   Event device identifier.

>  + */

>  +extern void

>  +rte_event_dev_stop(uint8_t dev_id);

>  +

>  +/**

>  + * Close an event device. The device cannot be restarted!

>  + *

>  + * @param dev_id

>  + *   Event device identifier

>  + *

>  + * @return

>  + *  - 0 on successfully closing device

>  + *  - <0 on failure to close device

>  + */

>  +extern int

>  +rte_event_dev_close(uint8_t dev_id);

>  +

>  +/* Scheduler type definitions */

>  +#define RTE_SCHED_TYPE_ORDERED		0

>  +/**< Ordered scheduling

>  + *

>  + * Events from an ordered flow of an event queue can be scheduled to multiple

>  + * ports for concurrent processing while maintaining the original event order.

>  + * This scheme enables the user to achieve high single flow throughput by

>  + * avoiding SW synchronization for ordering between ports that are bound to cores.

>  + *

>  + * The source flow ordering from an event queue is maintained when events are

>  + * enqueued to their destination queue within the same ordered flow context.

>  + * An event port holds the context until the application calls rte_event_dequeue()

>  + * from the same port, which implicitly releases the context.

>  + * The user may allow the scheduler to release the context earlier than that

>  + * by calling rte_event_release()

>  + *

>  + * Events from the source queue appear in their original order when dequeued

>  + * from a destination queue.

>  + * Event ordering is based on the received event(s), but also other

>  + * (newly allocated or stored) events are ordered when enqueued within the same

>  + * ordered context. Events not enqueued (e.g. released or stored) within the

>  + * context are considered missing from reordering and are skipped at this time

>  + * (but can be ordered again within another context).

>  + *

>  + * \see rte_event_dequeue(), rte_event_release()

>  + */

>  +

>  +#define RTE_SCHED_TYPE_ATOMIC		1

>  +/**< Atomic scheduling

>  + *

>  + * Events from an atomic flow of an event queue can be scheduled only to a

>  + * single port at a time. The port is guaranteed to have exclusive (atomic)

>  + * access to the associated flow context, which enables the user to avoid SW

>  + * synchronization. Atomic flows also help to maintain event ordering

>  + * since only one port at a time can process events from a flow of an

>  + * event queue.

>  + *

>  + * The atomic queue synchronization context is dedicated to the port until the

>  + * application calls rte_event_dequeue() from the same port, which implicitly

>  + * releases the context. The user may allow the scheduler to release the context

>  + * earlier than that by calling rte_event_release()

>  + *

>  + * \see rte_event_dequeue(), rte_event_release()

>  + */

>  +

>  +#define RTE_SCHED_TYPE_PARALLEL		2

>  +/**< Parallel scheduling

>  + *

>  + * The scheduler performs priority scheduling, load balancing, etc. functions

>  + * but does not provide additional event synchronization or ordering.

>  + * It is free to schedule events from a single parallel flow of an event queue

>  + * to multiple event ports for concurrent processing.

>  + * The application is responsible for flow context synchronization and

>  + * event ordering (SW synchronization).

>  + */

>  +

>  +/* Event types to classify the event source */

>  +#define RTE_EVENT_TYPE_ETHDEV		0x0

>  +/**< The event generated from ethdev subsystem */

>  +#define RTE_EVENT_TYPE_CRYPTODEV	0x1

>  +/**< The event generated from cryptodev subsystem */

>  +#define RTE_EVENT_TYPE_TIMERDEV		0x2

>  +/**< The event generated from timerdev subsystem */

>  +#define RTE_EVENT_TYPE_CORE		0x3

>  +/**< The event generated from core.

>  + * Application may use *sub_event_type* to further classify the event

>  + */

>  +#define RTE_EVENT_TYPE_MAX		0x10

>  +/**< Maximum number of event types */

>  +

>  +/* Event priority */

>  +#define RTE_EVENT_PRIORITY_HIGHEST      0

>  +/**< Highest event priority */

>  +#define RTE_EVENT_PRIORITY_NORMAL       128

>  +/**< Normal event priority */

>  +#define RTE_EVENT_PRIORITY_LOWEST       255

>  +/**< Lowest event priority */

>  +

>  +/**

>  + * The generic *rte_event* structure to hold the event attributes

>  + * for dequeue and enqueue operation

>  + */

>  +struct rte_event {

>  +	/** WORD0 */

>  +	RTE_STD_C11

>  +	union {

>  +		uint64_t u64;

>  +		/** Event attributes for dequeue or enqueue operation */

>  +		struct {

>  +			uint32_t flow_id:24;

>  +			/**< Targeted flow identifier for the enqueue and

>  +			 * dequeue operation.

>  +			 * The value must be in the range of

>  +			 * [1, max_event_queue_flows], which was

>  +			 * previously provided by rte_event_dev_info_get().

>  +			 */

>  +			uint32_t queue_id:8;

>  +			/**< Targeted event queue identifier for the enqueue or

>  +			 * dequeue operation.

>  +			 * The value must be in the range of

>  +			 * [0, nb_event_queues - 1], which was previously supplied to

>  +			 * rte_event_dev_configure().

>  +			 */

>  +			uint8_t  sched_type;

>  +			/**< Scheduler synchronization type (RTE_SCHED_TYPE_)

>  +			 * associated with flow id on a given event queue

>  +			 * for the enqueue and dequeue operation.

>  +			 */

>  +			uint8_t  event_type;

>  +			/**< Event type to classify the event source. */

>  +			uint8_t  sub_event_type;

>  +			/**< Sub-event types based on the event source.

>  +			 * \see RTE_EVENT_TYPE_CORE

>  +			 */

>  +			uint8_t  priority;

>  +			/**< Event priority relative to other events in the

>  +			 * event queue. The requested priority should be in the

>  +			 * range of [RTE_EVENT_PRIORITY_HIGHEST,

>  +			 * RTE_EVENT_PRIORITY_LOWEST].

>  +			 * The implementation shall normalize the requested

>  +			 * priority to supported priority value.

>  +			 * Valid when the device has RTE_EVENT_DEV_CAP_EVENT_QOS

>  +			 * capability.

>  +			 */

>  +		};

>  +	};

>  +	/** WORD1 */

>  +	RTE_STD_C11

>  +	union {

>  +		uintptr_t event;

>  +		/**< Opaque event pointer */

>  +		struct rte_mbuf *mbuf;

>  +		/**< mbuf pointer if dequeued event is associated with mbuf */

>  +	};

>  +};
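A side note on the WORD0 layout: the attribute fields add up to 24 + 8 + 4 * 8 = 64 bits, so they overlay the *u64* member and can be copied in one 64-bit access. The sketch below mirrors the struct under a hypothetical name, with a void pointer standing in for the mbuf pointer. Bit-field layout is implementation-defined, so the 8-byte overlay is an assumption that holds on the common GCC/Clang ABIs, not something the C standard guarantees.

```c
#include <stdint.h>

/* Hypothetical mirror of struct rte_event (C11 anonymous members). */
struct sketch_event {
	union {
		uint64_t u64;
		struct {
			uint32_t flow_id:24;
			uint32_t queue_id:8;
			uint8_t  sched_type;
			uint8_t  event_type;
			uint8_t  sub_event_type;
			uint8_t  priority;
		};
	};
	union {
		uintptr_t event;
		void *mbuf; /* stands in for struct rte_mbuf * */
	};
};

/* Copy every WORD0 attribute with a single 64-bit assignment. */
static void
sketch_copy_attrs(struct sketch_event *dst, const struct sketch_event *src)
{
	dst->u64 = src->u64;
}
```

This is presumably why the union is there: a forwarding stage can carry the attributes over from the dequeued event without touching the fields one by one.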

>  +

>  +/**

>  + * Schedule one or more events in the event dev.

>  + *

>  + * An event dev implementation may define this as a NOOP, for instance if

>  + * the event dev performs its scheduling in hardware.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + */

>  +extern void

>  +rte_event_schedule(uint8_t dev_id);


One idea: Have the function return the number of scheduled packets (or 0 for implementations that do scheduling in hardware). This could be a helpful diagnostic for the software scheduler.
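A minimal sketch of the idea, using a toy model of a software scheduler. Everything here is hypothetical: the struct, the counters and the function names are only for illustration and say nothing about how a real software PMD would track its state.

```c
#include <stdint.h>

/* Toy model of a software scheduler's bookkeeping (hypothetical). */
struct sketch_sw_sched {
	uint32_t pending;   /* events enqueued but not yet scheduled */
	uint32_t port_free; /* free slots across all port dequeue queues */
};

/* One scheduling pass: move as many pending events as the ports
 * can accept and report how many were moved. */
static uint32_t
sketch_schedule(struct sketch_sw_sched *s)
{
	uint32_t n = s->pending < s->port_free ? s->pending : s->port_free;

	s->pending -= n;
	s->port_free -= n;
	return n;
}

/* Hardware-scheduled device: the call is a NOOP and reports 0. */
static uint32_t
sketch_schedule_hw(void)
{
	return 0;
}
```

An application could then log or count the return value to see whether the scheduler core is keeping up, at no cost to hardware implementations.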

>  +

>  +/**

>  + * Enqueue the event object supplied in the *rte_event* structure on an

>  + * event device designated by its *dev_id* through the event port specified by

>  + * *port_id*. The event object specifies the event queue on which this

>  + * event will be enqueued.

>  + *

>  + * @param dev_id

>  + *   Event device identifier.

>  + * @param port_id

>  + *   The identifier of the event port.

>  + * @param ev

>  + *   Pointer to struct rte_event

>  + * @param pin_event

>  + *   Hint to the scheduler that the event can be pinned to the same port for

>  + *   the next scheduling stage. For implementations that support it, this

>  + *   allows the same core to process the next stage in the pipeline for a given

>  + *   event, taking advantage of cache locality. The pinned event will be

>  + *   received through rte_event_dequeue(). This is a hint and the event is

>  + *   not guaranteed to be pinned to the port. This hint is valid only when the

>  + *   event is dequeued with rte_event_dequeue() followed by rte_event_enqueue().

>  + *

>  + * @return

>  + *  - 0 on success

>  + *  - <0 on failure. Failure can occur if the event port's output queue is

>  + *     backpressured, for instance.

>  + */

>  +extern int

>  +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev,

>  +		  bool pin_event);

>  +

>  +/**

>  + * Enqueue a burst of event objects supplied in *rte_event* structures on an

>  + * event device designated by its *dev_id* through the event port specified by

>  + * *port_id*. Each event object specifies the event queue on which it

>  + * will be enqueued.

>  + *

>  + * The rte_event_enqueue_burst() function is invoked to enqueue

>  + * multiple event objects.

>  + * It is the burst variant of rte_event_enqueue() function.

>  + *

>  + * The *num* parameter is the number of event objects to enqueue which are

>  + * supplied in the *ev* array of *rte_event* structure.

>  + *

>  + * The rte_event_enqueue_burst() function returns the number of

>  + * event objects it actually enqueued. A return value equal to *num* means

>  + * that all event objects have been enqueued.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + * @param port_id

>  + *   The identifier of the event port.

>  + * @param ev

>  + *   An array of *num* *rte_event* structures

>  + *   which contain the event object enqueue operations to be processed.

>  + * @param num

>  + *   The number of event objects to enqueue, typically number of

>  + *   rte_event_port_enqueue_depth() available for this port.

>  + * @param pin_event

>  + *   Hint to the scheduler that the event can be pinned to the same port for

>  + *   the next scheduling stage. For implementations that support it, this

>  + *   allows the same core to process the next stage in the pipeline for a given

>  + *   event, taking advantage of cache locality. The pinned event will be

>  + *   received through rte_event_dequeue(). This is a hint and the event is

>  + *   not guaranteed to be pinned to the port. This hint is valid only when the

>  + *   event is dequeued with rte_event_dequeue() followed by

>  + *   rte_event_enqueue().

>  + *

>  + * @return

>  + *   The number of event objects actually enqueued on the event device. The

>  + *   return value can be less than the value of the *num* parameter when the

>  + *   event device's queue is full or if invalid parameters are specified in a

>  + *   *rte_event*. If return value is less than *num*, the remaining events at

>  + *   the end of ev[] are not consumed, and the caller has to take care of them.

>  + *

>  + * \see rte_event_enqueue(), rte_event_port_enqueue_depth()

>  + */

>  +extern int

>  +rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,

>  +			struct rte_event ev[], int num, bool pin_event);

>  +

>  +/**

>  + * Converts nanoseconds to *wait* value for rte_event_dequeue()

>  + *

>  + * If the device is configured with the RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT

>  + * flag, the application can use this function to convert a wait value in

>  + * nanoseconds to the implementation-specific wait value supplied to

>  + * rte_event_dequeue().

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + * @param ns

>  + *   Wait time in nanoseconds

>  + *

>  + * @return

>  + * Value for the *wait* parameter in rte_event_dequeue() function

>  + *

>  + * \see rte_event_dequeue(), RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT

>  + * \see rte_event_dev_configure()

>  + *

>  + */

>  +extern uint64_t

>  +rte_event_dequeue_wait_time(uint8_t dev_id, uint64_t ns);

>  +

>  +/**

>  + * Dequeue an event from the event port specified by *port_id* on the

>  + * event device designated by its *dev_id*.

>  + *

>  + * rte_event_dequeue() does not dictate the specifics of the scheduling

>  + * algorithm, as each eventdev driver may have different criteria to schedule

>  + * an event. However, in general, from an application perspective, the

>  + * scheduler may use the following scheme to dispatch an event to the port.

>  + *

>  + * 1) Selection of event queue based on

>  + *   a) The list of event queues linked to the event port.

>  + *   b) If the device has the RTE_EVENT_DEV_CAP_QUEUE_QOS capability, then

>  + *   event queue selection from the list is based on the event queue priority

>  + *   relative to other event queues, supplied as *priority* in

>  + *   rte_event_queue_setup()

>  + *   c) If the device has the RTE_EVENT_DEV_CAP_EVENT_QOS capability, then

>  + *   event queue selection from the list is based on the event priority

>  + *   supplied as *priority* in rte_event_enqueue_burst()

>  + * 2) Selection of event

>  + *   a) The number of flows available in selected event queue.

>  + *   b) Schedule type method associated with the event

>  + *

>  + * On a successful dequeue, the event port holds flow id and schedule type

>  + * context associated with the dispatched event. The context is automatically

>  + * released in the next rte_event_dequeue() invocation, or rte_event_release()

>  + * can be called to release the context early.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + * @param port_id

>  + *   The identifier of the event port.

>  + * @param[out] ev

>  + *   Pointer to struct rte_event. On successful event dispatch, implementation

>  + *   updates the event attributes.

>  + * @param wait

>  + *   0 - no-wait, returns immediately if there is no event.

>  + *   >0 - wait for the event. If the device is configured with

>  + *   RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT, this function will wait until

>  + *   an event is available or the *wait* time elapses.

>  + *   If the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT,

>  + *   this function will wait until an event is available or for the

>  + *   *dequeue_wait_ns* ns previously supplied to rte_event_dev_configure().

>  + *

>  + * @return

>  + * When true, a valid event has been dispatched by the scheduler.

>  + *

>  + */

>  +extern bool

>  +rte_event_dequeue(uint8_t dev_id, uint8_t port_id,

>  +		  struct rte_event *ev, uint64_t wait);

>  +

>  +/**

>  + * Dequeue a burst of event objects from the event port designated by its

>  + * *port_id*, on an event device designated by its *dev_id*.

>  + *

>  + * The rte_event_dequeue_burst() function is invoked to dequeue

>  + * multiple event objects. It is the burst variant of rte_event_dequeue()

>  + * function.

>  + *

>  + * The *num* parameter is the maximum number of event objects to dequeue,

>  + * which are returned in the *ev* array of *rte_event* structures.

>  + *

>  + * The rte_event_dequeue_burst() function returns the number of

>  + * event objects it actually dequeued. A return value equal to

>  + * *num* means that all event objects have been dequeued.

>  + *

>  + * The number of events dequeued is the number of scheduler contexts held by

>  + * this port. These contexts are automatically released in the next

>  + * rte_event_dequeue() invocation, or rte_event_release() can be called once

>  + * per event to release the contexts early.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + * @param port_id

>  + *   The identifier of the event port.

>  + * @param[out] ev

>  + *   An array of *num* *rte_event* structures which is populated

>  + *   with the dequeued event objects.

>  + * @param num

>  + *   The maximum number of event objects to dequeue, typically number of

>  + *   rte_event_port_dequeue_depth() available for this port.

>  + * @param wait

>  + *   0 - no-wait, returns immediately if there is no event.

>  + *   >0 - wait for the event. If the device is configured with

>  + *   RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT, this function will wait until

>  + *   an event is available or the *wait* time elapses.

>  + *   If the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT,

>  + *   this function will wait until an event is available or for the

>  + *   *dequeue_wait_ns* ns previously supplied to rte_event_dev_configure().

>  + *

>  + * @return

>  + * The number of event objects actually dequeued from the port. The return

>  + * value can be less than the value of the *num* parameter when fewer than

>  + * *num* events are available on the event port.

>  + *

>  + * \see rte_event_dequeue(), rte_event_port_dequeue_depth()

>  + */

>  +extern int

>  +rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id,

>  +			struct rte_event *ev, int num, uint64_t wait);

>  +

>  +/**

>  + * Release the current flow context, associated with a schedule type, of an

>  + * event dequeued from a given event queue through the event port designated

>  + * by its *port_id*.

>  + *

>  + * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*

>  + * then this function hints the scheduler that the user has completed critical

>  + * section processing in the current atomic context.

>  + * The scheduler is now allowed to schedule events from the same flow from

>  + * an event queue to another port. However, the context may be still held

>  + * until the next rte_event_dequeue() or rte_event_dequeue_burst() call, this

>  + * call allows but does not force the scheduler to release the context early.

>  + *

>  + * Early atomic context release may increase parallelism and thus system

>  + * performance, but the user needs to design carefully the split into critical

>  + * vs non-critical sections.

>  + *

>  + * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*

>  + * then this function hints the scheduler that the user has done all that is

>  + * needed to maintain event order in the current ordered context.

>  + * The scheduler is allowed to release the ordered context of this port and

>  + * avoid reordering any following enqueues.

>  + *

>  + * Early ordered context release may increase parallelism and thus system

>  + * performance.

>  + *

>  + * If current flow's scheduler type method is *RTE_SCHED_TYPE_PARALLEL*

>  + * or no scheduling context is held then this function may be an NOOP,

>  + * depending on the implementation.

>  + *

>  + * If multiple events are dequeued with rte_event_dequeue_burst(),

>  + * rte_event_release() releases the flow context, associated with a

>  + * schedule type, of the event selected through *index*, which denotes the

>  + * order in which the event was dequeued with rte_event_dequeue_burst().

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + * @param port_id

>  + *   The identifier of the event port.

>  + * @param index

>  + *   The index of the event, as dequeued with rte_event_dequeue_burst(),

>  + *   whose context needs to be released. The value zero is used if the event

>  + *   was dequeued with rte_event_dequeue().

>  + *

>  + *  \see rte_event_dequeue(), rte_event_dequeue_burst()

>  + */

>  +extern void

>  +rte_event_release(uint8_t dev_id, uint8_t port_id, uint8_t index);

>  +

>  +#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_HIGHEST  0

>  +/**< Highest event queue servicing priority */

>  +#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL   128

>  +/**< Normal event queue servicing priority */

>  +#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_LOWEST   255

>  +/**< Lowest event queue servicing priority */

>  +

>  +/** Structure to hold the queue to port link establishment attributes */

>  +struct rte_event_queue_link {

>  +	uint8_t queue_id;

>  +	/**< Event queue identifier to select the source queue to link */

>  +	uint8_t priority;

>  +	/**< The priority of the event queue for this event port.

>  +	 * The priority defines the event port's servicing priority for

>  +	 * event queue, which may be ignored by an implementation.

>  +	 * The requested priority should be in the range of

>  +	 * [RTE_EVENT_QUEUE_SERVICE_PRIORITY_HIGHEST,

>  +	 * RTE_EVENT_QUEUE_SERVICE_PRIORITY_LOWEST].

>  +	 * The implementation shall normalize the requested priority to

>  +	 * implementation supported priority value.

>  +	 */

>  +};

>  +

>  +/**

>  + * Link multiple source event queues supplied in *rte_event_queue_link*

>  + * structure as *queue_id* to the destination event port designated by its

>  + * *port_id* on the event device designated by its *dev_id*.

>  + *

>  + * The link establishment shall enable the event port *port_id* to

>  + * receive events from the specified event queue *queue_id*.

>  + *

>  + * An event queue may link to one or more event ports.

>  + * The number of links that can be established from an event queue to event ports is

>  + * implementation defined.

>  + *

>  + * Event queue(s) to event port link establishment can be changed at runtime

>  + * without re-configuring the device to support scaling and to reduce the

>  + * latency of critical work by establishing the link with more event ports

>  + * at runtime.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + *

>  + * @param port_id

>  + *   Event port identifier to select the destination port to link.

>  + *

>  + * @param link

>  + *   An array of *num* *rte_event_queue_link* structures

>  + *   which contain the event queue to event port link establishment attributes.

>  + *   NULL value is allowed, in which case this function links all the configured

>  + *   event queues *nb_event_queues* which were previously supplied to

>  + *   rte_event_dev_configure() to the event port *port_id* with normal

>  + *   servicing priority (RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL).

>  + *

>  + * @param num

>  + *   The number of links to establish

>  + *

>  + * @return

>  + * The number of links actually established on the event device. The return

>  + * value can be less than the value of the *num* parameter when the

>  + * implementation has the limitation on specific queue to port link

>  + * establishment or if invalid parameters are specified

>  + * in a *rte_event_queue_link*.

>  + * If the return value is less than *num*, the remaining links at the end of

>  + * link[] are not established, and the caller has to take care of them.

>  + * If the return value is less than *num*, the implementation shall update

>  + * rte_errno accordingly. Possible rte_errno values are:

>  + * (-EDQUOT) Quota exceeded (the application tried to link a queue configured

>  + *  with RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER to more than one event port)

>  + * (-EINVAL) Invalid parameter

>  + *

>  + */

>  +extern int

>  +rte_event_port_link(uint8_t dev_id, uint8_t port_id,

>  +		    struct rte_event_queue_link link[], int num);

>  +

>  +/**

>  + * Unlink multiple source event queues supplied in *queues* from the

>  + * destination event port designated by its *port_id* on the event device

>  + * designated by its *dev_id*.

>  + *

>  + * The unlink operation shall stop the event port *port_id* from

>  + * receiving events from the specified event queue *queue_id*.

>  + *

>  + * Event queue(s) to event port links can be removed at runtime

>  + * without re-configuring the device.

>  + *

>  + * @param dev_id

>  + *   The identifier of the device.

>  + *

>  + * @param port_id

>  + *   Event port identifier to select the destination port to unlink.

>  + *

>  + * @param queues

>  + *   An array of *num* event queues to be unlinked from the event port.

>  + *   NULL value is allowed, in which case this function unlinks all the

>  + *   event queue(s) from the event port *port_id*.

>  + *

>  + * @param num

>  + *   The number of unlinks to establish

>  + *

>  + * @return

>  + * The number of unlinks actually established on the event device. The return

>  + * value can be less than the value of the *num* parameter when the

>  + * implementation has the limitation on specific queue to port unlink

>  + * establishment or if invalid parameters are specified.

>  + * If the return value is less than *num*, the remaining queues at the end of

>  + * queues[] are not established, and the caller has to take care of them.

>  + * If return value is less than *num* then implementation shall update the

>  + * rte_errno accordingly, Possible rte_errno values are

>  + * (-EINVAL) Invalid parameter

>  + *

>  + */

>  +extern int

>  +rte_event_port_unlink(uint8_t dev_id, uint8_t port_id,

>  +		    uint8_t queues[], int num);

>  +

>  +#ifdef __cplusplus

>  +}

>  +#endif

>  +

>  +#endif /* _RTE_EVENTDEV_H_ */

>  --

>  2.5.5
  
Bruce Richardson Oct. 14, 2016, 4:02 p.m. UTC | #6
On Wed, Oct 12, 2016 at 01:00:16AM +0530, Jerin Jacob wrote:
> Thanks to Intel and NXP folks for the positive and constructive feedback
> I've received so far. Here is the updated RFC(v2).
> 
> I've attempted to address as many comments as possible.
> 
> This series adds rte_eventdev.h to the DPDK tree with
> adequate documentation in doxygen format.
> 
> Updates are also available online:
> 
> Related draft header file (this patch):
> https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
> 
> PDF version(doxgen output):
> https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf
> 
> Repo:
> https://github.com/jerinjacobk/libeventdev
> 

Thanks for all the work on this.

<snip>
> +/* Event device configuration bitmap flags */
> +#define RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT (1 << 0)
> +/**< Override the global *dequeue_wait_ns* and use per dequeue wait in ns.
> + *  \see rte_event_dequeue_wait_time(), rte_event_dequeue()
> + */

Can you clarify why this is needed? If an app wants to use the same
dequeue wait times for all dequeues can it not specify that itself via
the wait time parameter, rather than having a global dequeue wait value?

/Bruce
  
Jerin Jacob Oct. 17, 2016, 4:18 a.m. UTC | #7
On Fri, Oct 14, 2016 at 03:00:57PM +0000, Eads, Gage wrote:
> Thanks Jerin, this looks good. I've put a few notes/questions inline.

Thanks Gage.

> 
> >  +
> >  +/**
> >  + * Get the device identifier for the named event device.
> >  + *
> >  + * @param name
> >  + *   Event device name to select the event device identifier.
> >  + *
> >  + * @return
> >  + *   Returns event device identifier on success.
> >  + *   - <0: Failure to find named event device.
> >  + */
> >  +extern uint8_t
> >  +rte_event_dev_get_dev_id(const char *name);
> 
> This return type should be int8_t, or some signed type, to support the failure case.

Makes sense. I will change it to int, to be consistent with rte_cryptodev_get_dev_id().
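
To illustrate why the signed return type matters, here is a minimal sketch. The names sketch_lookup(), sketch_get_dev_id_u8(), sketch_get_dev_id_int() and the one-entry device table are hypothetical stand-ins, not the eventdev API. With a uint8_t return, the -1 failure value is converted to 255, so a caller's "< 0" check can never fire:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical one-entry device table, for illustration only. */
static const char *const sketch_dev_names[] = { "event_dev0" };

/* Shared lookup: returns the index of the named device, or -1. */
int
sketch_lookup(const char *name)
{
	size_t i;
	size_t n = sizeof(sketch_dev_names) / sizeof(sketch_dev_names[0]);

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(sketch_dev_names[i], name) == 0)
			return (int)i;
	return -1;
}

/* v2-style prototype: the -1 failure value is converted to 255. */
uint8_t
sketch_get_dev_id_u8(const char *name)
{
	return (uint8_t)sketch_lookup(name);
}

/* Signed variant: the failure value survives and "< 0" works. */
int
sketch_get_dev_id_int(const char *name)
{
	return sketch_lookup(name);
}
```

With the unsigned variant, a lookup failure is indistinguishable from a valid device id 255; the int variant keeps the documented "<0 on failure" contract.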

> 
> >  +};
> >  +
> >  +/**
> >  + * Schedule one or more events in the event dev.
> >  + *
> >  + * An event dev implementation may define this is a NOOP, for instance if
> >  + * the event dev performs its scheduling in hardware.
> >  + *
> >  + * @param dev_id
> >  + *   The identifier of the device.
> >  + */
> >  +extern void
> >  +rte_event_schedule(uint8_t dev_id);
> 
> One idea: Have the function return the number of scheduled packets (or 0 for implementations that do scheduling in hardware). This could be a helpful diagnostic for the software scheduler.

How about returning an implementation-specific value instead,
rather than defining a particular meaning for the returned value?
That would make sure it works with all HW/SW implementations. Something like below:

/**
 * Schedule one or more events in the event dev.
 *
 * An event dev implementation may define this as a NOOP, for instance if
 * the event dev performs its scheduling in hardware.
 *
 * @param dev_id
 *   The identifier of the device.
 * @return
 *   Implementation-specific value from the event driver, for diagnostic purposes
 */
extern int
rte_event_schedule(uint8_t dev_id);
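
As a rough sketch of what a software driver might do behind such a prototype: the struct sketch_sw_sched state and the sketch_sw_schedule() hook below are hypothetical, and "number of events moved in this call" is only one possible choice of implementation-specific value. A hardware driver could simply return 0 from a NOOP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical software-scheduler state; not part of the RFC. */
struct sketch_sw_sched {
	unsigned int pending;     /* events waiting to be scheduled */
	unsigned int burst_limit; /* max events handled per call */
};

/*
 * Hypothetical driver hook behind rte_event_schedule(). This sketch
 * picks "events moved in this call" as its diagnostic value; the
 * application should not rely on that meaning across drivers.
 */
int
sketch_sw_schedule(struct sketch_sw_sched *s)
{
	unsigned int n;

	assert(s != NULL);
	n = s->pending < s->burst_limit ? s->pending : s->burst_limit;
	s->pending -= n;
	return (int)n;
}
```

An application would typically only log or count this value, since its meaning is left to the driver.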
  
Jerin Jacob Oct. 17, 2016, 5:10 a.m. UTC | #8
On Fri, Oct 14, 2016 at 05:02:21PM +0100, Bruce Richardson wrote:
> On Wed, Oct 12, 2016 at 01:00:16AM +0530, Jerin Jacob wrote:
> > <snip>
> 
> Thanks for all the work on this.

Thanks

> 
> <snip>
> > +/* Event device configuration bitmap flags */
> > +#define RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT (1 << 0)
> > +/**< Override the global *dequeue_wait_ns* and use per dequeue wait in ns.
> > + *  \see rte_event_dequeue_wait_time(), rte_event_dequeue()
> > + */
> 
> Can you clarify why this is needed? If an app wants to use the same
> dequeue wait times for all dequeues can it not specify that itself via
> the wait time parameter, rather than having a global dequeue wait value?

The rationale for choosing this scheme is to allow an optimized
rte_event_dequeue() for some implementations without losing application
portability, while still meeting application needs.

We mostly have two different types of HW schemes to define the wait time:

HW1) Only a global wait value for the eventdev, shared across all
dequeues
HW2) Per queue wait value

In terms of applications:
APP1) Trivial applications do not need a different wait value for each
dequeue
APP2) Non-trivial applications do need different wait values

This config option lets an implementation take advantage of the case where
the application demands only APP1 on HW1, without losing application
portability (i.e. if the application demands the APP2 scheme, then an
HW1-based implementation can use a different function pointer to implement
the dequeue function).

The overall theme of the proposal is to have more configuration options (like
RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER) to enable high-performance SW/HW implementations.
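
The function pointer idea for an HW1-style driver could look roughly like the sketch below. Only the RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag comes from the RFC; struct sketch_dev, the sketch_* functions and the hw_writes counter are hypothetical, and the dequeue functions never return an event since no event source is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT (1 << 0) /* from the RFC */

/* Hypothetical device state for an HW1-style driver (global wait only). */
struct sketch_dev {
	uint64_t global_wait;   /* programmed once at configure time */
	uint64_t last_wait;     /* wait actually used by the last dequeue */
	unsigned int hw_writes; /* counts (costly) wait-register updates */
	bool (*dequeue)(struct sketch_dev *dev, uint64_t wait);
};

/* APP1 path: ignore the per-call value, use the preprogrammed wait. */
bool
sketch_dequeue_global_wait(struct sketch_dev *dev, uint64_t wait)
{
	(void)wait;
	dev->last_wait = dev->global_wait;
	return false; /* no event source in this sketch */
}

/* APP2 path: reprogram the wait on every call. */
bool
sketch_dequeue_per_call_wait(struct sketch_dev *dev, uint64_t wait)
{
	dev->hw_writes++;
	dev->last_wait = wait;
	return false; /* no event source in this sketch */
}

/* Pick the dequeue implementation once, at configure time. */
void
sketch_dev_configure(struct sketch_dev *dev, uint32_t flags, uint64_t wait)
{
	assert(dev != NULL);
	dev->global_wait = wait;
	dev->last_wait = 0;
	dev->hw_writes = 0;
	dev->dequeue = (flags & RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT) ?
		sketch_dequeue_per_call_wait : sketch_dequeue_global_wait;
}
```

The point of the design is that the fast path carries no per-call branch on the flag: an APP1 application on HW1 never pays for the wait-register update, while an APP2 application still runs unmodified.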
  
Eads, Gage Oct. 17, 2016, 8:26 p.m. UTC | #9
>  -----Original Message-----
>  From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
>  Sent: Sunday, October 16, 2016 11:18 PM
>  To: Eads, Gage <gage.eads@intel.com>
>  Cc: dev@dpdk.org; thomas.monjalon@6wind.com; Richardson, Bruce
>  <bruce.richardson@intel.com>; Vangati, Narender
>  <narender.vangati@intel.com>; hemant.agrawal@nxp.com
>  Subject: Re: [dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven
>  programming model framework for DPDK
>  
>  On Fri, Oct 14, 2016 at 03:00:57PM +0000, Eads, Gage wrote:
>  > Thanks Jerin, this looks good. I've put a few notes/questions inline.
>  
>  Thanks Gage.
>  
>  >
>  > >  +
>  > >  +/**
>  > >  + * Get the device identifier for the named event device.
>  > >  + *
>  > >  + * @param name
>  > >  + *   Event device name to select the event device identifier.
>  > >  + *
>  > >  + * @return
>  > >  + *   Returns event device identifier on success.
>  > >  + *   - <0: Failure to find named event device.
>  > >  + */
>  > >  +extern uint8_t
>  > >  +rte_event_dev_get_dev_id(const char *name);
>  >
>  > This return type should be int8_t, or some signed type, to support the failure
>  case.
>  
>  Makes sense. I will change to int to make consistent with
>  rte_cryptodev_get_dev_id()
>  
>  >
>  > >  +};
>  > >  +
>  > >  +/**
>  > >  + * Schedule one or more events in the event dev.
>  > >  + *
>  > >  + * An event dev implementation may define this is a NOOP, for instance if
>  > >  + * the event dev performs its scheduling in hardware.
>  > >  + *
>  > >  + * @param dev_id
>  > >  + *   The identifier of the device.
>  > >  + */
>  > >  +extern void
>  > >  +rte_event_schedule(uint8_t dev_id);
>  >
>  > One idea: Have the function return the number of scheduled packets (or 0 for
>  implementations that do scheduling in hardware). This could be a helpful
>  diagnostic for the software scheduler.
>  
>  How about returning an implementation specific value ?
>  Rather than defining certain function associated with returned value.
>  Just to  make sure it works with all HW/SW implementations. Something like
>  below,
>  
>  /**
>   * Schedule one or more events in the event dev.
>   *
>   * An event dev implementation may define this is a NOOP, for instance if
>   * the event dev performs its scheduling in hardware.
>   *
>   * @param dev_id
>   *   The identifier of the device.
>   * @return
>   *   Implementation specific value from the event driver for diagnostic purpose
>   */
>  extern int
>  rte_event_schedule(uint8_t dev_id);
>  
>  

That's fine by me.

I also had a comment on the return value of rte_event_dev_info_get() in my previous email: "I'm wondering if this return type should be int, so we can return an error if the dev_id is invalid."

What do you think?

Thanks,
Gage

>  
>
  
Jerin Jacob Oct. 18, 2016, 11:19 a.m. UTC | #10
On Mon, Oct 17, 2016 at 08:26:33PM +0000, Eads, Gage wrote:
> 
> 
> >  -----Original Message-----
> >  From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> >  Sent: Sunday, October 16, 2016 11:18 PM
> >  To: Eads, Gage <gage.eads@intel.com>
> >  Cc: dev@dpdk.org; thomas.monjalon@6wind.com; Richardson, Bruce
> >  <bruce.richardson@intel.com>; Vangati, Narender
> >  <narender.vangati@intel.com>; hemant.agrawal@nxp.com
> >  Subject: Re: [dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven
> >  programming model framework for DPDK
> >  
> >  On Fri, Oct 14, 2016 at 03:00:57PM +0000, Eads, Gage wrote:
> >  > Thanks Jerin, this looks good. I've put a few notes/questions inline.
> >  
> >  Thanks Gage.
> >  
> >  >
> >  > >  +
> >  > >  +/**
> >  > >  + * Get the device identifier for the named event device.
> >  > >  + *
> >  > >  + * @param name
> >  > >  + *   Event device name to select the event device identifier.
> >  > >  + *
> >  > >  + * @return
> >  > >  + *   Returns event device identifier on success.
> >  > >  + *   - <0: Failure to find named event device.
> >  > >  + */
> >  > >  +extern uint8_t
> >  > >  +rte_event_dev_get_dev_id(const char *name);
> >  >
> >  > This return type should be int8_t, or some signed type, to support the failure
> >  case.
> >  
> >  Makes sense. I will change to int to make consistent with
> >  rte_cryptodev_get_dev_id()
> >  
> >  >
> >  > >  +};
> >  > >  +
> >  > >  +/**
> >  > >  + * Schedule one or more events in the event dev.
> >  > >  + *
> >  > >  + * An event dev implementation may define this is a NOOP, for instance if
> >  > >  + * the event dev performs its scheduling in hardware.
> >  > >  + *
> >  > >  + * @param dev_id
> >  > >  + *   The identifier of the device.
> >  > >  + */
> >  > >  +extern void
> >  > >  +rte_event_schedule(uint8_t dev_id);
> >  >
> >  > One idea: Have the function return the number of scheduled packets (or 0 for
> >  implementations that do scheduling in hardware). This could be a helpful
> >  diagnostic for the software scheduler.
> >  
> >  How about returning an implementation specific value ?
> >  Rather than defining certain function associated with returned value.
> >  Just to  make sure it works with all HW/SW implementations. Something like
> >  below,
> >  
> >  /**
> >   * Schedule one or more events in the event dev.
> >   *
> >   * An event dev implementation may define this is a NOOP, for instance if
> >   * the event dev performs its scheduling in hardware.
> >   *
> >   * @param dev_id
> >   *   The identifier of the device.
> >   * @return
> >   *   Implementation specific value from the event driver for diagnostic purpose
> >   */
> >  extern int
> >  rte_event_schedule(uint8_t dev_id);
> >  
> >  
> 
> That's fine by me.

OK. I will change it in v3.

> 
> I also had a comment on the return value of rte_event_dev_info_get() in my previous email: "I'm wondering if this return type should be int, so we can return an error if the dev_id is invalid."
> 
> What do you think?

The void return was based on cryptodev_info_get(). I think it makes
sense to return "int". I will change it in v3.
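
A minimal sketch of the int-returning variant, assuming an -EINVAL error code for an invalid dev_id. The names sketch_event_dev_info_get(), struct sketch_dev_info, SKETCH_MAX_DEVS and the placeholder field value are all hypothetical; the real info structure in the RFC has more fields.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DEVS 2 /* hypothetical device count */

/* Hypothetical, cut-down info structure, for illustration only. */
struct sketch_dev_info {
	uint8_t max_event_queues;
};

/* Returns 0 on success, -EINVAL if dev_id does not name a device. */
int
sketch_event_dev_info_get(uint8_t dev_id, struct sketch_dev_info *info)
{
	assert(info != NULL);
	if (dev_id >= SKETCH_MAX_DEVS)
		return -EINVAL; /* invalid device: the caller can detect it */
	info->max_event_queues = 64; /* placeholder value */
	return 0;
}
```

With a void return, the caller has no way to tell that info was left unfilled for a bad dev_id.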


> 
> Thanks,
> Gage
> 
> >  
> >
  
Jerin Jacob Oct. 25, 2016, 5:49 p.m. UTC | #11
On Wed, Oct 12, 2016 at 01:00:16AM +0530, Jerin Jacob wrote:
> <snip>

Hi Community,

So far, I have received constructive feedback from Intel, NXP and Linaro folks.
Let me know if anyone else is interested in contributing to the definition of eventdev.

If there are no major issues in the proposed spec, then Cavium would like to work on
implementing and upstreaming the common code (lib/librte_eventdev/) and
an associated HW driver. (The requested minor changes to v2 will be addressed
in the next version.)

We are planning to submit the work for the 17.02 or 17.05 release (based on
how the implementation goes).

/Jerin
Cavium
  
Van Haaren, Harry Oct. 26, 2016, 12:11 p.m. UTC | #12
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> 
> So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> Let me know if anyone else is interested in contributing to the definition of eventdev.
> 
> If there are no major issues in the proposed spec, then Cavium would like to work on
> implementing and upstreaming the common code (lib/librte_eventdev/) and
> an associated HW driver. (The requested minor changes to v2 will be addressed
> in the next version.)

Hi All,

I would like to propose a minor change to the rte_event struct, allowing some bits to be implementation specific. Currently the rte_event struct has no space to allow an implementation to store any metadata about the event. For software performance it would be really helpful if some bits were available for the implementation to keep flags about each event.

I suggest reworking the struct as below, which opens up 6 bits that were otherwise wasted, and defining them as implementation specific. By implementation specific it is understood that the implementation can overwrite any information stored in those bits, and the application must not expect the data to remain after the event is scheduled.

OLD:
struct rte_event {
	uint32_t flow_id:24;
	uint32_t queue_id:8;
	uint8_t  sched_type; /* Note only 2 bits of 8 are required */

NEW:
struct rte_event {
	uint32_t flow_id:24;
	uint32_t sched_type:2; /* reduced size : but 2 bits is enough for the enqueue types Ordered,Atomic,Parallel.*/
	uint32_t implementation:6; /* available for implementation specific metadata */
	uint8_t queue_id; /* still 8 bits as before */


Thoughts? -Harry
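
To make the NEW layout concrete, here is a small sketch that declares only the fields shown above (the remaining rte_event members are omitted). struct sketch_event and sketch_schedule_one() are hypothetical names. Note that uint32_t bit-fields and their exact packing are compiler-dependent in C, so the sketch checks only that values round-trip and that clobbering the implementation bits leaves the other fields intact.

```c
#include <assert.h>
#include <stdint.h>

/* Fields from the NEW proposal only; other rte_event members omitted. */
struct sketch_event {
	uint32_t flow_id:24;
	uint32_t sched_type:2;     /* Ordered, Atomic, Parallel */
	uint32_t implementation:6; /* may be overwritten by the scheduler */
	uint8_t queue_id;          /* still 8 bits as before */
};

/* Hypothetical scheduler step: clobbers the implementation bits only. */
void
sketch_schedule_one(struct sketch_event *ev, uint8_t impl_flags)
{
	assert(impl_flags < 64); /* only 6 bits are available */
	ev->implementation = impl_flags;
}
```

An application following the proposed rule would treat the implementation field as undefined after scheduling and never read it back.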
  
Jerin Jacob Oct. 26, 2016, 12:24 p.m. UTC | #13
On Wed, Oct 26, 2016 at 12:11:03PM +0000, Van Haaren, Harry wrote:
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > 
> > So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> > Let me know if anyone else is interested in contributing to the definition of eventdev.
> > 
> > If there are no major issues in the proposed spec, then Cavium would like to work on
> > implementing and upstreaming the common code (lib/librte_eventdev/) and
> > an associated HW driver. (The requested minor changes to v2 will be addressed
> > in the next version.)
> 
> Hi All,
> 
> I would like to propose a minor change to the rte_event struct, allowing some bits to be implementation specific. Currently the rte_event struct has no space to allow an implementation to store any metadata about the event. For software performance it would be really helpful if some bits were available for the implementation to keep flags about each event.

OK.

> 
> I suggest to rework the struct as below which opens 6 bits that were otherwise wasted, and define them as implementation specific. By implementation specific it is understood that the implementation can overwrite any information stored in those bits, and the application must not expect the data to remain after the event is scheduled.
> 
> OLD:
> struct rte_event {
> 	uint32_t flow_id:24;
> 	uint32_t queue_id:8;
> 	uint8_t  sched_type; /* Note only 2 bits of 8 are required */
> 
> NEW:
> struct rte_event {
> 	uint32_t flow_id:24;
> 	uint32_t sched_type:2; /* reduced size : but 2 bits is enough for the enqueue types Ordered,Atomic,Parallel.*/
> 	uint32_t implementation:6; /* available for implementation specific metadata */
> 	uint8_t queue_id; /* still 8 bits as before */
> 
> 
> Thoughts? -Harry

Looks good to me. I will add it in v3.
  
Bruce Richardson Oct. 26, 2016, 12:43 p.m. UTC | #14
On Tue, Oct 25, 2016 at 11:19:05PM +0530, Jerin Jacob wrote:
> On Wed, Oct 12, 2016 at 01:00:16AM +0530, Jerin Jacob wrote:
> > Thanks to Intel and NXP folks for the positive and constructive feedback
> > I've received so far. Here is the updated RFC(v2).
> > 
> > I've attempted to address as many comments as possible.
> > 
> > This series adds rte_eventdev.h to the DPDK tree with
> > adequate documentation in doxygen format.
> > 
> > Updates are also available online:
> > 
> > Related draft header file (this patch):
> > https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
> > 
> > PDF version(doxgen output):
> > https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf
> > 
> > Repo:
> > https://github.com/jerinjacobk/libeventdev
> >
> 
> Hi Community,
> 
> So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> Let me know, if anyone else interested in contributing to the definition of eventdev?
> 
> If there are no major issues in proposed spec, then Cavium would like work on
> implementing and up-streaming the common code(lib/librte_eventdev/) and
> an associated HW driver.(Requested minor changes of v2 will be addressed
> in next version).
> 
> We are planning to submit the work for 17.02 or 17.05 release(based on
> how implementation goes).
> 

Hi Jerin,

thanks for driving this. In terms of the common code framework, when
would you see that you might have something to upstream for that? As you
know, we've been working on a software implementation which we are now
looking to move to the eventdev APIs, and which also needs this common
code to support it. 

If it can accelerate this effort, we can perhaps provide as an RFC
the common code part that we have implemented for our work, or else we
are happy to migrate to use common code you provide if it can be
upstreamed fairly soon.

Regards,
/Bruce
  
Bruce Richardson Oct. 26, 2016, 12:54 p.m. UTC | #15
On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> On Wed, Oct 26, 2016 at 12:11:03PM +0000, Van Haaren, Harry wrote:
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > 
> > > So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> > > Let me know, if anyone else interested in contributing to the definition of eventdev?
> > > 
> > > If there are no major issues in proposed spec, then Cavium would like work on
> > > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > > an associated HW driver.(Requested minor changes of v2 will be addressed
> > > in next version).
> > 
> > Hi All,
> > 
> > I will propose a minor change to the rte_event struct, allowing some bits to be implementation specific. Currently the rte_event struct has no space to allow an implementation store any metadata about the event. For software performance it would be really helpful if there are some bits available for the implementation to keep some flags about each event.
> 
> OK.
> 
> > 
> > I suggest to rework the struct as below which opens 6 bits that were otherwise wasted, and define them as implementation specific. By implementation specific it is understood that the implementation can overwrite any information stored in those bits, and the application must not expect the data to remain after the event is scheduled.
> > 
> > OLD:
> > struct rte_event {
> > 	uint32_t flow_id:24;
> > 	uint32_t queue_id:8;
> > 	uint8_t  sched_type; /* Note only 2 bits of 8 are required */
> > 
> > NEW:
> > struct rte_event {
> > 	uint32_t flow_id:24;
> > 	uint32_t sched_type:2; /* reduced size : but 2 bits is enough for the enqueue types Ordered,Atomic,Parallel.*/
> > 	uint32_t implementation:6; /* available for implementation specific metadata */
> > 	uint8_t queue_id; /* still 8 bits as before */
> > 
> > 
> > Thoughts? -Harry
> 
> Looks good to me. I will add it in v3.
> 
Thanks. One other suggestion is that it might be useful to provide
support for having typed queues explicitly in the API. Right now, when
you create a queue, the queue_conf structure takes as parameters how
many atomic flows are needed for the queue, or how many reorder
slots need to be reserved for it. This implicitly hints at the type of
traffic which will be sent to the queue, but I'm wondering if it's
better to make it explicit. There are certain optimisations that can be
looked at if we know that a queue only handles packets of a particular
type. [Not having to handle reordering when pulling events from a core
can be a big win for software!].

How about adding: "allowed_event_types" as a field to
rte_event_queue_conf, with possible values:
* atomic
* ordered
* parallel
* mixed - allowing all 3 types. I think allowing 2 of three types might
    make things too complicated.

An open question would then be how to behave when the queue type and
requested event type conflict. We can either throw an error, or just
ignore the event type and always treat enqueued events as being of the
queue type. I prefer the latter, because it's faster not having to
error-check, and it pushes the responsibility on the app to know what
it's doing.

/Bruce
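
A minimal sketch of the second option above (ignore the requested event type and treat every enqueued event as being of the queue type). All names and numeric values here, including `sk_allowed_event_types` and `sk_effective_sched_type()`, are hypothetical and are not part of the RFC header.

```c
#include <assert.h>
#include <stdint.h>

/* Scheduling types named in the RFC; numeric values are assumed. */
enum sk_sched_type {
	SK_SCHED_ORDERED,
	SK_SCHED_ATOMIC,
	SK_SCHED_PARALLEL,
};

/* Hypothetical values for the suggested allowed_event_types field. */
enum sk_allowed_event_types {
	SK_ALLOWED_MIXED = 0, /* all three types allowed */
	SK_ALLOWED_ATOMIC,
	SK_ALLOWED_ORDERED,
	SK_ALLOWED_PARALLEL,
};

/*
 * Resolve the type an event is scheduled with. For a typed queue the
 * requested type is ignored without raising an error; only a mixed
 * queue honours the per-event request.
 */
static inline enum sk_sched_type
sk_effective_sched_type(enum sk_allowed_event_types allowed,
			enum sk_sched_type requested)
{
	switch (allowed) {
	case SK_ALLOWED_ATOMIC:
		return SK_SCHED_ATOMIC;
	case SK_ALLOWED_ORDERED:
		return SK_SCHED_ORDERED;
	case SK_ALLOWED_PARALLEL:
		return SK_SCHED_PARALLEL;
	case SK_ALLOWED_MIXED:
	default:
		return requested;
	}
}
```

With this shape there is no error path on enqueue, so the application is responsible for sending the right kind of traffic to each queue.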
  
Jerin Jacob Oct. 26, 2016, 5:30 p.m. UTC | #16
On Wed, Oct 26, 2016 at 01:43:25PM +0100, Bruce Richardson wrote:
> On Tue, Oct 25, 2016 at 11:19:05PM +0530, Jerin Jacob wrote:
> > On Wed, Oct 12, 2016 at 01:00:16AM +0530, Jerin Jacob wrote:
> > > Thanks to Intel and NXP folks for the positive and constructive feedback
> > > I've received so far. Here is the updated RFC(v2).
> > > 
> > > I've attempted to address as many comments as possible.
> > > 
> > > This series adds rte_eventdev.h to the DPDK tree with
> > > adequate documentation in doxygen format.
> > > 
> > > Updates are also available online:
> > > 
> > > Related draft header file (this patch):
> > > https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
> > > 
> > > PDF version(doxgen output):
> > > https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf
> > > 
> > > Repo:
> > > https://github.com/jerinjacobk/libeventdev
> > >
> > 
> > Hi Community,
> > 
> > So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> > Let me know, if anyone else interested in contributing to the definition of eventdev?
> > 
> > If there are no major issues in proposed spec, then Cavium would like work on
> > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > an associated HW driver.(Requested minor changes of v2 will be addressed
> > in next version).
> > 
> > We are planning to submit the work for 17.02 or 17.05 release(based on
> > how implementation goes).
> > 
> 
> Hi Jerin,

Hi Bruce,

> 
> thanks for driving this. In terms of the common code framework, when
> would you see that you might have something to upstream for that? As you
> know, we've been working on a software implementation which we are now
> looking to move to the eventdev APIs, and which also needs this common
> code to support it. 
> 
> If it can accelerate this effort, we can perhaps provide as an RFC
> the common code part that we have implemented for our work, or else we
> are happy to migrate to use common code you provide if it can be
> upstreamed fairly soon.

I have already started on the common code framework. I will send the common code
as an RFC in a couple of days, with vdev and PCI bus interfaces.

> 
> Regards,
> /Bruce
  
Vincent Jardin Oct. 26, 2016, 6:37 p.m. UTC | #17
On October 26, 2016 2:11:26 PM "Van Haaren, Harry" 
<harry.van.haaren@intel.com> wrote:

>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
>>
>> So far, I have received constructive feedback from Intel, NXP and Linaro folks.
>> Let me know, if anyone else interested in contributing to the definition of 
>> eventdev?
>>
>> If there are no major issues in proposed spec, then Cavium would like work on
>> implementing and up-streaming the common code(lib/librte_eventdev/) and
>> an associated HW driver.(Requested minor changes of v2 will be addressed
>> in next version).
>
> Hi All,
>
> I will propose a minor change to the rte_event struct, allowing some bits 
> to be implementation specific. Currently the rte_event struct has no space 
> to allow an implementation store any metadata about the event. For software 
> performance it would be really helpful if there are some bits available for 
> the implementation to keep some flags about each event.
>
> I suggest to rework the struct as below which opens 6 bits that were 
> otherwise wasted, and define them as implementation specific. By 
> implementation specific it is understood that the implementation can 
> overwrite any information stored in those bits, and the application must 
> not expect the data to remain after the event is scheduled.
>
> OLD:
> struct rte_event {
> 	uint32_t flow_id:24;
> 	uint32_t queue_id:8;
> 	uint8_t  sched_type; /* Note only 2 bits of 8 are required */
>
> NEW:
> struct rte_event {
> 	uint32_t flow_id:24;
> 	uint32_t sched_type:2; /* reduced size : but 2 bits is enough for the 
> enqueue types Ordered,Atomic,Parallel.*/
> 	uint32_t implementation:6; /* available for implementation specific 
> metadata */
> 	uint8_t queue_id; /* still 8 bits as before */

Bitfields are efficient on Octeon. What about the other CPUs you have in 
mind? x86 is not as efficient.


>
>
> Thoughts? -Harry
  
Jerin Jacob Oct. 28, 2016, 3:01 a.m. UTC | #18
On Wed, Oct 26, 2016 at 01:54:14PM +0100, Bruce Richardson wrote:
> On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> > On Wed, Oct 26, 2016 at 12:11:03PM +0000, Van Haaren, Harry wrote:
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> Thanks. One other suggestion is that it might be useful to provide
> support for having typed queues explicitly in the API. Right now, when
> you create an queue, the queue_conf structure takes as parameters how
> many atomic flows that are needed for the queue, or how many reorder
> slots need to be reserved for it. This implicitly hints at the type of
> traffic which will be sent to the queue, but I'm wondering if it's
> better to make it explicit. There are certain optimisations that can be
> looked at if we know that a queue only handles packets of a particular
> type. [Not having to handle reordering when pulling events from a core
> can be a big win for software!].

If it helps the SW implementation, then I think we can add this to the queue
configuration. 

> 
> How about adding: "allowed_event_types" as a field to
> rte_event_queue_conf, with possible values:
> * atomic
> * ordered
> * parallel
> * mixed - allowing all 3 types. I think allowing 2 of three types might
>     make things too complicated.
> 
> An open question would then be how to behave when the queue type and
> requested event type conflict. We can either throw an error, or just
> ignore the event type and always treat enqueued events as being of the
> queue type. I prefer the latter, because it's faster not having to
> error-check, and it pushes the responsibility on the app to know what
> it's doing.

How about making "mixed" the default and letting the application configure what
is not required? That way the application's responsibility is clear.
Something similar to ETH_TXQ_FLAGS_NOMULTSEGS and ETH_TXQ_FLAGS_NOREFCOUNT,
with a default.

/Jerin


> 
> /Bruce
  
Bruce Richardson Oct. 28, 2016, 8:36 a.m. UTC | #19
On Fri, Oct 28, 2016 at 08:31:41AM +0530, Jerin Jacob wrote:
> On Wed, Oct 26, 2016 at 01:54:14PM +0100, Bruce Richardson wrote:
> > On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> > > On Wed, Oct 26, 2016 at 12:11:03PM +0000, Van Haaren, Harry wrote:
> > > > > -----Original Message-----
> > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > Thanks. One other suggestion is that it might be useful to provide
> > support for having typed queues explicitly in the API. Right now, when
> > you create an queue, the queue_conf structure takes as parameters how
> > many atomic flows that are needed for the queue, or how many reorder
> > slots need to be reserved for it. This implicitly hints at the type of
> > traffic which will be sent to the queue, but I'm wondering if it's
> > better to make it explicit. There are certain optimisations that can be
> > looked at if we know that a queue only handles packets of a particular
> > type. [Not having to handle reordering when pulling events from a core
> > can be a big win for software!].
> 
> If it helps in SW implementation, then I think we can add this in queue
> configuration. 
> 
> > 
> > How about adding: "allowed_event_types" as a field to
> > rte_event_queue_conf, with possible values:
> > * atomic
> > * ordered
> > * parallel
> > * mixed - allowing all 3 types. I think allowing 2 of three types might
> >     make things too complicated.
> > 
> > An open question would then be how to behave when the queue type and
> > requested event type conflict. We can either throw an error, or just
> > ignore the event type and always treat enqueued events as being of the
> > queue type. I prefer the latter, because it's faster not having to
> > error-check, and it pushes the responsibility on the app to know what
> > it's doing.
> 
> How about making default as "mixed" and let application configures what
> is not required?. That way application responsibility is clear.
> something similar to ETH_TXQ_FLAGS_NOMULTSEGS, ETH_TXQ_FLAGS_NOREFCOUNT
> with default.
> 
I suppose it could work, but why bother doing that? If an app knows it's
only going to use one traffic type, why not let it just state what it
will do rather than try to specify what it won't do. If mixed is needed,
then it's easy enough to specify - and we can make it the zero/default
value too.

Our software implementation, for now, supports only one type per queue,
which we suspect should meet a lot of use-cases. We'll have to see about
adding in mixed types in future.

/Bruce
  
Jerin Jacob Oct. 28, 2016, 9:06 a.m. UTC | #20
On Fri, Oct 28, 2016 at 09:36:46AM +0100, Bruce Richardson wrote:
> On Fri, Oct 28, 2016 at 08:31:41AM +0530, Jerin Jacob wrote:
> > On Wed, Oct 26, 2016 at 01:54:14PM +0100, Bruce Richardson wrote:
> > > On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> > > > On Wed, Oct 26, 2016 at 12:11:03PM +0000, Van Haaren, Harry wrote:
> > > > > > -----Original Message-----
> > > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > Thanks. One other suggestion is that it might be useful to provide
> > > support for having typed queues explicitly in the API. Right now, when
> > > you create an queue, the queue_conf structure takes as parameters how
> > > many atomic flows that are needed for the queue, or how many reorder
> > > slots need to be reserved for it. This implicitly hints at the type of
> > > traffic which will be sent to the queue, but I'm wondering if it's
> > > better to make it explicit. There are certain optimisations that can be
> > > looked at if we know that a queue only handles packets of a particular
> > > type. [Not having to handle reordering when pulling events from a core
> > > can be a big win for software!].
> > 
> > If it helps in SW implementation, then I think we can add this in queue
> > configuration. 
> > 
> > > 
> > > How about adding: "allowed_event_types" as a field to
> > > rte_event_queue_conf, with possible values:
> > > * atomic
> > > * ordered
> > > * parallel
> > > * mixed - allowing all 3 types. I think allowing 2 of three types might
> > >     make things too complicated.
> > > 
> > > An open question would then be how to behave when the queue type and
> > > requested event type conflict. We can either throw an error, or just
> > > ignore the event type and always treat enqueued events as being of the
> > > queue type. I prefer the latter, because it's faster not having to
> > > error-check, and it pushes the responsibility on the app to know what
> > > it's doing.
> > 
> > How about making default as "mixed" and let application configures what
> > is not required?. That way application responsibility is clear.
> > something similar to ETH_TXQ_FLAGS_NOMULTSEGS, ETH_TXQ_FLAGS_NOREFCOUNT
> > with default.
> > 
> I suppose it could work, but why bother doing that? If an app knows it's
> only going to use one traffic type, why not let it just state what it
> will do rather than try to specify what it won't do. If mixed is needed,

My thought was more in line with the ethdev spec: for example, ref-counting is the
default, and if the application needs an exception it sets ETH_TXQ_FLAGS_NOREFCOUNT.
But it is OK if you need it the other way.

> then it's easy enough to specify - and we can make it the zero/default
> value too.

OK. Then we will make MIX the zero/default value and add "allowed_event_types" to
the event queue config.

/Jerin
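
A short sketch of what "mixed as the zero/default value" means in practice: a queue configuration that is simply zero-initialised allows all three event types, and an application only sets the field when it wants a typed queue. The struct and field names other than allowed_event_types (`sk_event_queue_conf`, `atomic_flows`, `reorder_slots`) are placeholders assumed for this example, not names from the RFC header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding with mixed as the zero value. */
enum sk_allowed_event_types {
	SK_ALLOWED_MIXED = 0,
	SK_ALLOWED_ATOMIC,
	SK_ALLOWED_ORDERED,
	SK_ALLOWED_PARALLEL,
};

/* Placeholder queue configuration; only the last field is from the thread. */
struct sk_event_queue_conf {
	uint32_t atomic_flows;   /* assumed name: atomic flows needed */
	uint32_t reorder_slots;  /* assumed name: reorder slots to reserve */
	enum sk_allowed_event_types allowed_event_types;
};

/* Zero the whole configuration; the queue then defaults to mixed. */
static inline void
sk_event_queue_conf_default(struct sk_event_queue_conf *conf)
{
	memset(conf, 0, sizeof(*conf));
}
```

An application that knows it will only send atomic traffic would call the default helper and then set `allowed_event_types = SK_ALLOWED_ATOMIC`.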

> 
> Our software implementation for now, only supports one type per queue -
> which we suspect should meet a lot of use-cases. We'll have to see about
> adding in mixed types in future.
> 
> /Bruce
  
Van Haaren, Harry Oct. 28, 2016, 1:10 p.m. UTC | #21
> From: Vincent Jardin [mailto:vincent.jardin@6wind.com]
> Sent: Wednesday, October 26, 2016 7:37 PM
> On October 26, 2016 2:11:26 PM "Van Haaren, Harry"
> <harry.van.haaren@intel.com> wrote:
> 
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> >>
> >> So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> >> Let me know, if anyone else interested in contributing to the definition of
> >> eventdev?
> >>
> >> If there are no major issues in proposed spec, then Cavium would like work on
> >> implementing and up-streaming the common code(lib/librte_eventdev/) and
> >> an associated HW driver.(Requested minor changes of v2 will be addressed
> >> in next version).
> >
> > Hi All,
> >
> > I will propose a minor change to the rte_event struct, allowing some bits
> > to be implementation specific. Currently the rte_event struct has no space
> > to allow an implementation store any metadata about the event. For software
> > performance it would be really helpful if there are some bits available for
> > the implementation to keep some flags about each event.
> >
> > I suggest to rework the struct as below which opens 6 bits that were
> > otherwise wasted, and define them as implementation specific. By
> > implementation specific it is understood that the implementation can
> > overwrite any information stored in those bits, and the application must
> > not expect the data to remain after the event is scheduled.
> >
> > OLD:
> > struct rte_event {
> > 	uint32_t flow_id:24;
> > 	uint32_t queue_id:8;
> > 	uint8_t  sched_type; /* Note only 2 bits of 8 are required */
> >
> > NEW:
> > struct rte_event {
> > 	uint32_t flow_id:24;
> > 	uint32_t sched_type:2; /* reduced size : but 2 bits is enough for the
> > enqueue types Ordered,Atomic,Parallel.*/
> > 	uint32_t implementation:6; /* available for implementation specific
> > metadata */
> > 	uint8_t queue_id; /* still 8 bits as before */
> 
> Bitfileds are efficients on Octeon. What's about other CPUs you have in
> mind? x86 is not as efficient.


Given the rte_event struct is 16 bytes and there's no free space to use, I see no alternative to using bitfields in this case. Suggestions for a better way to lay out the structure to avoid the bitfield are welcome.

Regards, -Harry
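
As a point of comparison only, the usual way to avoid compiler bitfields while keeping the same packing is explicit shifts and masks on a 32-bit word. The sketch below is not something proposed in this thread; the macro and function names (`SK_*`, `sk_pack()` and friends) and the bit positions are assumptions chosen to mirror the 24/2/6 split discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions mirroring the 24/2/6 split discussed above. */
#define SK_FLOW_ID_MASK  0x00ffffffu /* bits 0..23  */
#define SK_SCHED_SHIFT   24u         /* bits 24..25 */
#define SK_SCHED_MASK    0x3u
#define SK_IMPL_SHIFT    26u         /* bits 26..31 */
#define SK_IMPL_MASK     0x3fu

/* Pack the three fields into one 32-bit word; excess bits are dropped. */
static inline uint32_t
sk_pack(uint32_t flow_id, uint32_t sched_type, uint32_t impl)
{
	return (flow_id & SK_FLOW_ID_MASK) |
	       ((sched_type & SK_SCHED_MASK) << SK_SCHED_SHIFT) |
	       ((impl & SK_IMPL_MASK) << SK_IMPL_SHIFT);
}

static inline uint32_t
sk_flow_id(uint32_t word)
{
	return word & SK_FLOW_ID_MASK;
}

static inline uint32_t
sk_sched_type(uint32_t word)
{
	return (word >> SK_SCHED_SHIFT) & SK_SCHED_MASK;
}

static inline uint32_t
sk_impl(uint32_t word)
{
	return (word >> SK_IMPL_SHIFT) & SK_IMPL_MASK;
}
```

Whether this is any faster than bitfields on a given CPU would need measuring; the main difference is that the bit positions are fixed by the code instead of by the compiler.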
  
Van Haaren, Harry Oct. 28, 2016, 1:48 p.m. UTC | #22
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> Sent: Tuesday, October 25, 2016 6:49 PM
<snip>
> 
> Hi Community,
> 
> So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> Let me know, if anyone else interested in contributing to the definition of eventdev?
> 
> If there are no major issues in proposed spec, then Cavium would like work on
> implementing and up-streaming the common code(lib/librte_eventdev/) and
> an associated HW driver.(Requested minor changes of v2 will be addressed
> in next version).


Hi All,

I've been looking at the eventdev API from a use-case point of view, and I'm unclear on how the API caters for two use cases. I have simplified these as much as possible; think of them as theoretical unit tests for the API :)


Fragmentation:
1. Dequeue 8 packets
2. Process 2 packets
3. Processing 3rd, this packet needs fragmentation into two packets
4. Process remaining 5 packets as normal

What function calls does the application make to achieve this?
In particular, how can the scheduler know that the 3rd packet is the one being fragmented, and how is packet order kept valid?


Dropping packets:
1. Dequeue 8 packets
2. Process 2 packets
3. Processing 3rd, this packet needs to be dropped
4. Process remaining 5 packets as normal

What function calls does the application make to achieve this?
Again, in particular, how does the scheduler know that the 3rd packet is being dropped?


Regards, -Harry
  
Bruce Richardson Oct. 28, 2016, 2:16 p.m. UTC | #23
On Fri, Oct 28, 2016 at 02:48:57PM +0100, Van Haaren, Harry wrote:
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Tuesday, October 25, 2016 6:49 PM
> <snip>
> > 
> > Hi Community,
> > 
> > So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> > Let me know, if anyone else interested in contributing to the definition of eventdev?
> > 
> > If there are no major issues in proposed spec, then Cavium would like work on
> > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > an associated HW driver.(Requested minor changes of v2 will be addressed
> > in next version).
> 
> 
> Hi All,
> 
> I've been looking at the eventdev API from a use-case point of view, and I'm unclear on a how the API caters for two uses. I have simplified these as much as possible, think of them as a theoretical unit-test for the API :)
> 
> 
> Fragmentation:
> 1. Dequeue 8 packets
> 2. Process 2 packets
> 3. Processing 3rd, this packet needs fragmentation into two packets
> 4. Process remaining 5 packets as normal
> 
> What function calls does the application make to achieve this?
> In particular, I'm referring to how can the scheduler know that the 3rd packet is the one being fragmented, and how to keep packet order valid. 
> 
> 
> Dropping packets:
> 1. Dequeue 8 packets
> 2. Process 2 packets
> 3. Processing 3rd, this packet needs to be dropped
> 4. Process remaining 5 packets as normal
> 
> What function calls does the application make to achieve this?
> Again, in particular how does the scheduler know that the 3rd packet is being dropped.
> 
> 
> Regards, -Harry

Hi,

these questions apply particularly to the reordered type, which has a lot more
complications than the other types in terms of sending packets back into
the scheduler. However, atomic types will still suffer from problems
with things the way they are. Again, if we assume a burst of 8 packets,
then to forward those packets we need to re-enqueue them to the
scheduler, and then also send 8 releases to the scheduler to
release the atomic locks for those packets.
This means that for each packet we have to send two messages to a
scheduler core, which is really inefficient.

This number of messages is critical for any software implementation, as
the cost of moving items core-to-core is going to be a big bottleneck
(perhaps the biggest bottleneck) in the system. It's for this reason we
need to use burst APIs - as with rte_rings.

The way we have solved this in our implementation is to introduce
an event operation type. The four operations we implemented are as below
(using packet as a synonym for event here, since these would mostly
apply to packets flowing through a system):

* NEW     - just a regular enqueue of a packet, without any previous context
* FORWARD - enqueue a packet, and mark the flow processing for the
            equivalent packet that was dequeued as completed, i.e.
            release any atomic locks, or reorder this packet with
            respect to any other outstanding packets from the event queue.
* DROP    - this is roughly equivalent to the existing "release" API call,
            except that having it as an enqueue type allows us to
            release multiple items in a single call, and also to mix
            releases with new packets and forwarded packets
* PARTIAL - this indicates that the packet being enqueued should be
            treated according to the context of the current packet, but
            that that context should not be released/completed by the
            enqueue of this packet. This only really applies to
            reordered events, and is needed to do fragmentation and/or
            multicast of packets with reordering.


Therefore, I think we need to use some of the bits just freed up in the
event structure to include an enqueue operation type. Without it, I just
can't see how the API can ever support burst operation on packets.

Regards,
/Bruce
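
A minimal sketch of how the four operation types let Harry's two test cases fit into a single enqueue burst. Everything here is hypothetical: the struct `sk_event`, the verdict enum and `sk_build_burst()` are invented for illustration, and the choice of one PARTIAL followed by one FORWARD for a two-way fragmentation is an assumption based on the PARTIAL description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sk_event_op { SK_OP_NEW, SK_OP_FORWARD, SK_OP_DROP, SK_OP_PARTIAL };

/* Hypothetical event: an id stands in for the packet payload. */
struct sk_event {
	uint32_t pkt_id;
	uint8_t  frag; /* fragment index, 0 for unfragmented packets */
	uint8_t  op;   /* one of enum sk_event_op */
};

/* What the worker decided for each dequeued packet. */
enum sk_verdict { SK_PASS, SK_DROP_PKT, SK_FRAGMENT_IN_TWO };

/*
 * Turn one dequeued burst into one enqueue burst.
 * `out` must have room for 2 * n events. Returns the number written.
 */
static size_t
sk_build_burst(const struct sk_event *in, const enum sk_verdict *verdict,
	       size_t n, struct sk_event *out)
{
	size_t i, k = 0;

	for (i = 0; i < n; i++) {
		switch (verdict[i]) {
		case SK_DROP_PKT:
			/* Release the context without forwarding anything. */
			out[k] = in[i];
			out[k++].op = SK_OP_DROP;
			break;
		case SK_FRAGMENT_IN_TWO:
			/* First fragment keeps the context open... */
			out[k] = in[i];
			out[k].frag = 0;
			out[k++].op = SK_OP_PARTIAL;
			/* ...the last fragment completes it. */
			out[k] = in[i];
			out[k].frag = 1;
			out[k++].op = SK_OP_FORWARD;
			break;
		case SK_PASS:
		default:
			out[k] = in[i];
			out[k++].op = SK_OP_FORWARD;
			break;
		}
	}
	return k;
}
```

In both cases the worker sends one burst to the scheduler: 8 events for the drop case and 9 for the fragmentation case, instead of separate enqueue and release messages per packet.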
  
Jerin Jacob Nov. 2, 2016, 8:06 a.m. UTC | #24
On Fri, Oct 28, 2016 at 01:48:57PM +0000, Van Haaren, Harry wrote:
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Tuesday, October 25, 2016 6:49 PM
> <snip>
> > 
> > Hi Community,
> > 
> > So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> > Let me know, if anyone else interested in contributing to the definition of eventdev?
> > 
> > If there are no major issues in proposed spec, then Cavium would like work on
> > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > an associated HW driver.(Requested minor changes of v2 will be addressed
> > in next version).
> 
> 
> Hi All,
> 
> I've been looking at the eventdev API from a use-case point of view, and I'm unclear on a how the API caters for two uses. I have simplified these as much as possible, think of them as a theoretical unit-test for the API :)
> 
> 
> Fragmentation:
> 1. Dequeue 8 packets
> 2. Process 2 packets
> 3. Processing 3rd, this packet needs fragmentation into two packets
> 4. Process remaining 5 packets as normal
> 
> What function calls does the application make to achieve this?
> In particular, I'm referring to how can the scheduler know that the 3rd packet is the one being fragmented, and how to keep packet order valid. 
> 

OK. I will try to share my views on IP fragmentation on event _HW_
models (at least on Cavium HW); then we can see how we can converge.

First, the fragmentation-specific logic should be decoupled from the event
model, as it is specific to packets and the L3 layer (not to generic events).

Now, let us consider fragmentation handling in the non-burst case with a single flow.
The following text outlines the event flow:

a) Set up an event device with a single event queue.
b) Link multiple ports to the single event queue.
c) The event producer enqueues packets p0..p7 to the event queue with the ORDERED
type. (Let's assume packet p2 needs to be fragmented, i.e. the application
needs to create p2.0 and p2.1 from p2.)
d) Since it is an ORDERED type, packets p0 to p7 are distributed to multiple
ports in parallel (each assigned to an lcore or lightweight thread).
e) Each lcore/lightweight thread gets packets from its designated event port,
processes them in parallel, and enqueues them back with the ATOMIC type to maintain
ordering.
f) The lcore that dequeues p2 sees that it needs to be
fragmented due to MTU size etc., so it calls rte_ipv4_fragment_packet()
and stores the fragments p2.0 and p2.1 in the private area of the p2 mbuf.
Then, like the other workers, it enqueues p2 to the atomic queue to maintain
the order.
g) On the atomic flow, when an lcore dequeues the packets, they arrive in order p0..p7.
The application sends p0 to p7 on the wire. When the application checks the p2 mbuf
private area, it sees that p2 is fragmented and sends p2.0 and p2.1
on the wire instead.

OR

skip the fragmentation step in (f) and, in step (g),
while processing p2, run rte_ipv4_fragment_packet(), split the packet,
and transmit the fragments (in case the application doesn't want to deal with the mbuf private area).
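
Step (g) of the first variant can be sketched as below. This is a simplified model, not DPDK code: `sk_pkt` stands in for an mbuf, `frag_ids` stands in for the fragment mbufs kept in the mbuf private area, and `sk_tx_in_order()` is a hypothetical name. The only assumption about the scheduler is the one stated in the steps above, namely that the atomic stage delivers p0..p7 in their original order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_FRAGS 4

/* Stand-in for an mbuf plus its private area. */
struct sk_pkt {
	uint32_t id;
	uint32_t nb_frags;               /* 0 means not fragmented */
	uint32_t frag_ids[SK_MAX_FRAGS]; /* fragments stored in step (f) */
};

/*
 * Walk the packets in dequeue order and record what goes on the wire.
 * A fragmented packet is replaced by its fragments at the same position,
 * so ordering relative to the other packets is preserved.
 * Returns the number of ids written to `wire` (at most `cap`).
 */
static size_t
sk_tx_in_order(const struct sk_pkt *pkts, size_t n, uint32_t *wire, size_t cap)
{
	size_t i, k = 0;
	uint32_t f;

	for (i = 0; i < n; i++) {
		if (pkts[i].nb_frags == 0) {
			if (k < cap)
				wire[k++] = pkts[i].id;
			continue;
		}
		for (f = 0; f < pkts[i].nb_frags && f < SK_MAX_FRAGS; f++) {
			if (k < cap)
				wire[k++] = pkts[i].frag_ids[f];
		}
	}
	return k;
}
```

The point of the design is that the event device never sees the fragments as separate events; only p2 travels through the scheduler, and the split becomes visible at transmit time.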

Now, when it comes to the BURST scheme: we are planning to create a SW
structure as a virtual event port and associate N (N=rte_event_port_dequeue_depth())
physical HW event ports with the virtual port.
That way, it comes as just an extension to the non-burst API, and the
release call takes an explicit "index" that identifies the physical event port
associated with the virtual port.

/Jerin

> 
> Dropping packets:
> 1. Dequeue 8 packets
> 2. Process 2 packets
> 3. Processing 3rd, this packet needs to be dropped
> 4. Process remaining 5 packets as normal
> 
> What function calls does the application make to achieve this?
> Again, in particular how does the scheduler know that the 3rd packet is being dropped.

rte_event_release(..,..,3)??

> 
> 
> Regards, -Harry
  
Jerin Jacob Nov. 2, 2016, 8:59 a.m. UTC | #25
On Fri, Oct 28, 2016 at 03:16:18PM +0100, Bruce Richardson wrote:
> On Fri, Oct 28, 2016 at 02:48:57PM +0100, Van Haaren, Harry wrote:
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > Sent: Tuesday, October 25, 2016 6:49 PM
> > <snip>
> > > 
> > > Hi Community,
> > > 
> > > So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> > > Let me know, if anyone else interested in contributing to the definition of eventdev?
> > > 
> > > If there are no major issues in proposed spec, then Cavium would like work on
> > > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > > an associated HW driver.(Requested minor changes of v2 will be addressed
> > > in next version).
> > 
> > 
> > Hi All,
> > 
> > I've been looking at the eventdev API from a use-case point of view, and I'm unclear on a how the API caters for two uses. I have simplified these as much as possible, think of them as a theoretical unit-test for the API :)
> > 
> > 
> > Fragmentation:
> > 1. Dequeue 8 packets
> > 2. Process 2 packets
> > 3. Processing 3rd, this packet needs fragmentation into two packets
> > 4. Process remaining 5 packets as normal
> > 
> > What function calls does the application make to achieve this?
> > In particular, I'm referring to how can the scheduler know that the 3rd packet is the one being fragmented, and how to keep packet order valid. 
> > 
> > 
> > Dropping packets:
> > 1. Dequeue 8 packets
> > 2. Process 2 packets
> > 3. Processing 3rd, this packet needs to be dropped
> > 4. Process remaining 5 packets as normal
> > 
> > What function calls does the application make to achieve this?
> > Again, in particular how does the scheduler know that the 3rd packet is being dropped.
> > 
> > 
> > Regards, -Harry
> 
> Hi,
> 
> these questions apply particularly to reordered which has a lot more
> complications than the other types in terms of sending packets back into
> the scheduler. However, atomic types will still suffer from problems
> with things the way they are - again if we assume a burst of 8 packets,
> then to forward those packets, we need to re-enqueue them again to the
> scheduler, and also then send 8 releases to the scheduler as well, to
> release the atomic locks for those packets.
> This means that for each packet we have to send two messages to a
> scheduler core, something that is really inefficient.
> 
> This number of messages is critical for any software implementation, as
> the cost of moving items core-to-core is going to be a big bottleneck
> (perhaps the biggest bottleneck) in the system. It's for this reason we
> need to use burst APIs - as with rte_rings.

I agree. That is the reason why we have rte_event_*_burst().

> 
> How we have solved this in our implementation, is to allow there to be
> an event operation type. The four operations we implemented are as below
> (using packet as a synonym for event here, since these would mostly
> apply to packets flowing through a system):
> 
> * NEW     - just a regular enqueue of a packet, without any previous context

Makes sense. I was trying to derive it implicitly, but it makes sense for the
application to request it.

> * FORWARD - enqueue a packet, and mark the flow processing for the
>             equivalent packet that was dequeued as completed, i.e.
> 	    release any atomic locks, or reorder this packet with
> 	    respect to any other outstanding packets from the event queue.

Default case

> * DROP    - this is roughly equivalent to the existing "release" API call,
>             except that having it as an enqueue type allows us to
> 	    release multiple items in a single call, and also to mix
> 	    releases with new packets and forwarded packets

Yes. This maps to rte_event_release(), which, with its index parameter, is kind of
doing the job already. But it makes sense as a flag, to enable bursts.
It does call for removing the index parameter, though. It looks like the index
parameter is an issue for the Intel implementation. If so, maybe we (Cavium) can
fill in the index at dequeue as implementation-specific bits, as Harry
suggested, and use it at enqueue.
http://dpdk.org/ml/archives/dev/2016-October/049459.html

Any thoughts from NXP?

> * PARTIAL - this indicates that the packet being enqueued should be
> 	    treated according to the context of the current packet, but
> 	    that that context should not be released/completed by the
> 	    enqueue of this packet. This only really applies for
> 	    reordered events, and is needed to do fragmentation and or
> 	    multicast of packets with reordering.

I believe PARTIAL is something HW implementations will have trouble with.
I have outlined another way to handle this without coupling fragmentation logic
to the scheduler:
http://dpdk.org/ml/archives/dev/2016-November/049707.html

If this makes sense to everyone, then maybe we can:
- Introduce "event operation type" bits (NEW, DROP, FORWARD (may not be required, as it is the default)) in enqueue
- Leave the remaining bits in "struct rte_event" (128B) as "implementation" defined
- Remove rte_event_release() and use the event operation type (DROP) as its
  replacement.

/Jerin

> 
> 
> Therefore, I think we need to use some of the bits just freed up in the
> event structure to include an enqueue operation type. Without it, I just
> can't see how the API can ever support burst operation on packets.
> 
> Regards,
> /Bruce
>
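
Harry's "dropping packets" case, expressed with the operation types discussed
above, might look roughly like the sketch below. Everything in it is
hypothetical: enum ev_op, struct ev, tag_burst() and drop_if_no_packet() are
illustrative names, not part of rte_eventdev.h in this RFC. PARTIAL is left
out because of the HW concern raised in the reply. The point is only that
releases (DROP) and forwards can be mixed in one burst, so the application
sends one message per dequeued event instead of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation types, named after the list in this thread.
 * They are NOT defined by the RFC v2 header. */
enum ev_op {
	EV_OP_NEW,     /* enqueue without any previous context */
	EV_OP_FORWARD, /* enqueue and complete the dequeued context */
	EV_OP_DROP,    /* complete the dequeued context, enqueue nothing */
};

/* Hypothetical, cut-down event: only the fields this sketch needs. */
struct ev {
	uint32_t flow_id;
	uint8_t queue_id;
	uint8_t op;    /* one of enum ev_op */
	void *pkt;
};

/* Example drop rule (assumption): an event without a packet is dropped. */
static int
drop_if_no_packet(const struct ev *e)
{
	return e->pkt == NULL;
}

/*
 * Tag every dequeued event with an operation so that the whole burst can be
 * handed back to the scheduler in a single enqueue call.
 * Returns the number of events tagged as DROP.
 */
static unsigned int
tag_burst(struct ev *evs, unsigned int n, uint8_t next_queue,
	  int (*should_drop)(const struct ev *))
{
	unsigned int i, dropped = 0;

	assert(evs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (should_drop(&evs[i])) {
			evs[i].op = EV_OP_DROP;
			dropped++;
		} else {
			evs[i].op = EV_OP_FORWARD;
			evs[i].queue_id = next_queue;
		}
	}
	return dropped;
}
```

With a burst of 8 where only the 3rd event has no packet, tag_burst() marks
that one event as DROP and the other 7 as FORWARD to next_queue; the tagged
array would then go to a single burst enqueue.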
  
Jerin Jacob Nov. 2, 2016, 10:47 a.m. UTC | #26
On Wed, Oct 26, 2016 at 12:11:03PM +0000, Van Haaren, Harry wrote:
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > 
> > So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> > Let me know, if anyone else interested in contributing to the definition of eventdev?
> > 
> > If there are no major issues in proposed spec, then Cavium would like work on
> > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > an associated HW driver.(Requested minor changes of v2 will be addressed
> > in next version).
>

Hi All,

Two queries,

1) In the SW implementation, is there any connection between "struct
rte_event_port_conf"'s dequeue_queue_depth and enqueue_queue_depth?
i.e. should it be enqueue_queue_depth >= dequeue_queue_depth?
I was thinking of adding such common checks in the common layer.

2) Any comments on the following item (the section between the ---- lines), which needs improvement?
-------------------------------------------------------------------------------
Abstract the differences in event QoS management with different
priority schemes available in different HW or SW implementations with portable
application workflow.

Based on the feedback, there are three different kinds of QoS support
available in three different HW or SW implementations.
1) Priority associated with the event queue
2) Priority associated with each event enqueue
(The same flow can have two different priorities on two separate enqueues)
3) Priority associated with the flow (each flow has a unique priority)

In v2, the differences are abstracted based on device capability
(RTE_EVENT_DEV_CAP_QUEUE_QOS for the first scheme,
RTE_EVENT_DEV_CAP_EVENT_QOS for the second and third schemes).
This scheme would call for a different application workflow for
nontrivial QoS-enabled applications.
-------------------------------------------------------------------------------
After thinking about it for a while, I think RTE_EVENT_DEV_CAP_EVENT_QOS is a
superset. If so, the subset RTE_EVENT_DEV_CAP_QUEUE_QOS can be
implemented with RTE_EVENT_DEV_CAP_EVENT_QOS, i.e. we may not need two
flags; just one flag, RTE_EVENT_DEV_CAP_EVENT_QOS, is enough to fix the
portability issue with basic QoS-enabled applications.

i.e. introduce RTE_EVENT_DEV_CAP_EVENT_QOS as a config option at the device
configure stage if the application needs fine granularity on QoS per event
enqueue. For trivial applications, the configured
rte_event_queue_conf->priority can be used as the rte_event.priority
given to rte_event_enqueue().

Thoughts?

/Jerin
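
One way to read the single-flag proposal above is as a fallback rule for the
priority that the scheduler acts on. The sketch below is only an illustration
under that assumption: effective_priority() and its parameters are
hypothetical, and only the RTE_EVENT_DEV_CAP_EVENT_QOS bit value is taken from
the draft header in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Capability bit as defined in the draft rte_eventdev.h of this RFC. */
#define RTE_EVENT_DEV_CAP_EVENT_QOS (1 << 1)

/*
 * Hypothetical helper: pick the priority used for scheduling.
 * dev_cfg_flags stands for the flags the application passed at device
 * configure time (an assumption about where the option would live).
 * - per-event QoS requested: honour the priority carried by the event
 * - otherwise: every event inherits the priority of its event queue
 */
static uint8_t
effective_priority(uint32_t dev_cfg_flags, uint8_t queue_prio,
		   uint8_t event_prio)
{
	if (dev_cfg_flags & RTE_EVENT_DEV_CAP_EVENT_QOS)
		return event_prio;
	return queue_prio;
}

/* Usage example: the same event under both configurations. */
static void
effective_priority_example(void)
{
	/* Trivial application: the queue priority wins. */
	assert(effective_priority(0, 128, 5) == 128);
	/* Fine-grained QoS requested: the event priority wins. */
	assert(effective_priority(RTE_EVENT_DEV_CAP_EVENT_QOS, 128, 5) == 5);
}
```

Under this reading a trivial application never touches rte_event.priority,
which is what keeps its workflow portable across the three schemes.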
  
Bruce Richardson Nov. 2, 2016, 11:45 a.m. UTC | #27
On Wed, Nov 02, 2016 at 04:17:04PM +0530, Jerin Jacob wrote:
> On Wed, Oct 26, 2016 at 12:11:03PM +0000, Van Haaren, Harry wrote:
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > 
> > > So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> > > Let me know, if anyone else interested in contributing to the definition of eventdev?
> > > 
> > > If there are no major issues in proposed spec, then Cavium would like work on
> > > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > > an associated HW driver.(Requested minor changes of v2 will be addressed
> > > in next version).
> >
> 
> Hi All,
> 
> Two queries,
> 
> 1) In SW implementation, Is their any connection between "struct
> rte_event_port_conf"'s dequeue_queue_depth and enqueue_queue_depth ?
> i.e it should be enqueue_queue_depth >= dequeue_queue_depth. Right ?
> Thought of adding the common checks in common layer.

I think this is probably best left to the driver layers to enforce. For
us, such a restriction doesn't really make sense, though in many cases
that would be the usual setup. For accurate load balancing, the dequeue
queue depth would be small, and the burst size would probably equal the
queue depth, meaning the enqueue depth needs to be at least as big.
However, for better throughput, or in cases where all traffic is being
coalesced to a single core e.g. for transmit out a network port, there
is no need to keep the dequeue queue shallow and so it can be many times
the burst size, while the enqueue queue can be kept to 1-2 times the
burst size.

> 
> 2)Any comments on follow item(section under ----) that needs improvement.
> -------------------------------------------------------------------------------
> Abstract the differences in event QoS management with different
> priority schemes available in different HW or SW implementations with portable
> application workflow.
> 
> Based on the feedback, there three different kinds of QoS support
> available in
> three different HW or SW implementations.
> 1) Priority associated with the event queue
> 2) Priority associated with each event enqueue
> (Same flow can have two different priority on two separate enqueue)
> 3) Priority associated with the flow(each flow has unique priority)
> 
> In v2, The differences abstracted based on device capability
> (RTE_EVENT_DEV_CAP_QUEUE_QOS for the first scheme,
> RTE_EVENT_DEV_CAP_EVENT_QOS for the second and third scheme).
> This scheme would call for different application workflow for
> nontrivial QoS-enabled applications.
> -------------------------------------------------------------------------------
> After thinking a while, I think, RTE_EVENT_DEV_CAP_EVENT_QOS is a
> super-set.if so, the subset RTE_EVENT_DEV_CAP_QUEUE_QOS can be
> implemented with RTE_EVENT_DEV_CAP_EVENT_QOS. i.e We may not need two
> flags, Just one flag RTE_EVENT_DEV_CAP_EVENT_QOS is enough to fix
> portability issue with basic QoS enabled applications.
> 
> i.e Introduce RTE_EVENT_DEV_CAP_EVENT_QOS as config option in device
> configure stage if application needs fine granularity on QoS per event
> enqueue.For trivial applications, configured
> rte_event_queue_conf->priority can be used as rte_event_enqueue(struct
> rte_event.priority)
> 
So all implementations should support the concept of priority among
queues, and then there is optional support for event or flow based
prioritization. Is that a correct interpretation of what you propose?

/Bruce
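
The queue-depth reasoning earlier in this reply can be sketched as a
driver-level check that depends on the burst size rather than on the relation
between the two depths. This is an assumption for illustration, not a rule
from the RFC: struct port_depths and port_depths_fit_burst() are hypothetical,
and only the two field names come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the two depth fields named in this thread. */
struct port_depths {
	uint16_t dequeue_queue_depth;
	uint16_t enqueue_queue_depth;
};

/*
 * Hypothetical driver-level check: the port must be able to hold one full
 * burst in each direction. It deliberately does not compare the two depths
 * with each other.
 */
static bool
port_depths_fit_burst(const struct port_depths *d, uint16_t burst)
{
	return burst > 0 &&
	       d->dequeue_queue_depth >= burst &&
	       d->enqueue_queue_depth >= burst;
}

/* Usage example with the two setups described in the reply. */
static void
port_depths_example(void)
{
	const uint16_t burst = 8;
	/* Accurate load balancing: shallow dequeue, equal to the burst. */
	struct port_depths balanced = {
		.dequeue_queue_depth = 8, .enqueue_queue_depth = 8 };
	/* Traffic coalesced to one TX core: deep dequeue, enqueue of 2x burst. */
	struct port_depths tx_core = {
		.dequeue_queue_depth = 64, .enqueue_queue_depth = 16 };

	assert(port_depths_fit_burst(&balanced, burst));
	assert(port_depths_fit_burst(&tx_core, burst));
}
```

The tx_core setup passes this check although its enqueue depth is smaller
than its dequeue depth, so a common-layer check of
enqueue_queue_depth >= dequeue_queue_depth would reject it.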
  
Bruce Richardson Nov. 2, 2016, 11:48 a.m. UTC | #28
On Wed, Nov 02, 2016 at 01:36:34PM +0530, Jerin Jacob wrote:
> On Fri, Oct 28, 2016 at 01:48:57PM +0000, Van Haaren, Harry wrote:
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > Sent: Tuesday, October 25, 2016 6:49 PM
> > <snip>
> > > 
> > > Hi Community,
> > > 
> > > So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> > > Let me know, if anyone else interested in contributing to the definition of eventdev?
> > > 
> > > If there are no major issues in proposed spec, then Cavium would like work on
> > > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > > an associated HW driver.(Requested minor changes of v2 will be addressed
> > > in next version).
> > 
> > 
> > Hi All,
> > 
> > I've been looking at the eventdev API from a use-case point of view, and I'm unclear on a how the API caters for two uses. I have simplified these as much as possible, think of them as a theoretical unit-test for the API :)
> > 
> > 
> > Fragmentation:
> > 1. Dequeue 8 packets
> > 2. Process 2 packets
> > 3. Processing 3rd, this packet needs fragmentation into two packets
> > 4. Process remaining 5 packets as normal
> > 
> > What function calls does the application make to achieve this?
> > In particular, I'm referring to how can the scheduler know that the 3rd packet is the one being fragmented, and how to keep packet order valid. 
> > 
> 
> OK. I will try to share my views on IP fragmentation on event _HW_
> models(at least on Cavium HW) then we can see, how we can converge.
> 
> First, The fragmentation specific logic should be decoupled from the event
> model as it specific to packet and L3 layer(Not specific to generic event)
> 
I would view fragmentation as just one example of a workload like this,
multicast and broadcast may be two other cases. Yes, they all apply to
packet, but the general feature support is just how to provide support
for one event generating multiple further events which should be linked
together for reordering. [I think this only really applies in the
reordered case - which leads to another question: in your experience
do you see other event types other than packet being handled in a
"reordered" manner?]

/Bruce
  
Jerin Jacob Nov. 2, 2016, 12:34 p.m. UTC | #29
On Wed, Nov 02, 2016 at 11:45:07AM +0000, Bruce Richardson wrote:
> On Wed, Nov 02, 2016 at 04:17:04PM +0530, Jerin Jacob wrote:
> > On Wed, Oct 26, 2016 at 12:11:03PM +0000, Van Haaren, Harry wrote:
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > > 
> > > > So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> > > > Let me know, if anyone else interested in contributing to the definition of eventdev?
> > > > 
> > > > If there are no major issues in proposed spec, then Cavium would like work on
> > > > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > > > an associated HW driver.(Requested minor changes of v2 will be addressed
> > > > in next version).
> > >
> > 
> > Hi All,
> > 
> > Two queries,
> > 
> > 1) In SW implementation, Is their any connection between "struct
> > rte_event_port_conf"'s dequeue_queue_depth and enqueue_queue_depth ?
> > i.e it should be enqueue_queue_depth >= dequeue_queue_depth. Right ?
> > Thought of adding the common checks in common layer.
> 
> I think this is probably best left to the driver layers to enforce. For
> us, such a restriction doesn't really make sense, though in many cases
> that would be the usual setup. For accurate load balancing, the dequeue
> queue depth would be small, and the burst size would probably equal the
> queue depth, meaning the enqueue depth needs to be at least as big.
> However, for better throughput, or in cases where all traffic is being
> coalesced to a single core e.g. for transmit out a network port, there
> is no need to keep the dequeue queue shallow and so it can be many times
> the burst size, while the enqueue queue can be kept to 1-2 times the
> burst size.
> 

OK

> > 
> > 2)Any comments on follow item(section under ----) that needs improvement.
> > -------------------------------------------------------------------------------
> > Abstract the differences in event QoS management with different
> > priority schemes available in different HW or SW implementations with portable
> > application workflow.
> > 
> > Based on the feedback, there three different kinds of QoS support
> > available in
> > three different HW or SW implementations.
> > 1) Priority associated with the event queue
> > 2) Priority associated with each event enqueue
> > (Same flow can have two different priority on two separate enqueue)
> > 3) Priority associated with the flow(each flow has unique priority)
> > 
> > In v2, The differences abstracted based on device capability
> > (RTE_EVENT_DEV_CAP_QUEUE_QOS for the first scheme,
> > RTE_EVENT_DEV_CAP_EVENT_QOS for the second and third scheme).
> > This scheme would call for different application workflow for
> > nontrivial QoS-enabled applications.
> > -------------------------------------------------------------------------------
> > After thinking a while, I think, RTE_EVENT_DEV_CAP_EVENT_QOS is a
> > super-set.if so, the subset RTE_EVENT_DEV_CAP_QUEUE_QOS can be
> > implemented with RTE_EVENT_DEV_CAP_EVENT_QOS. i.e We may not need two
> > flags, Just one flag RTE_EVENT_DEV_CAP_EVENT_QOS is enough to fix
> > portability issue with basic QoS enabled applications.
> > 
> > i.e Introduce RTE_EVENT_DEV_CAP_EVENT_QOS as config option in device
> > configure stage if application needs fine granularity on QoS per event
> > enqueue.For trivial applications, configured
> > rte_event_queue_conf->priority can be used as rte_event_enqueue(struct
> > rte_event.priority)
> > 
> So all implementations should support the concept of priority among
> queues, and then there is optional support for event or flow based
> prioritization. Is that a correct interpretation of what you propose?

Yes, if you _can_ implement it and if it is possible in the system.

> 
> /Bruce
>
  
Jerin Jacob Nov. 2, 2016, 12:57 p.m. UTC | #30
On Wed, Nov 02, 2016 at 11:48:37AM +0000, Bruce Richardson wrote:
> On Wed, Nov 02, 2016 at 01:36:34PM +0530, Jerin Jacob wrote:
> > On Fri, Oct 28, 2016 at 01:48:57PM +0000, Van Haaren, Harry wrote:
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > > Sent: Tuesday, October 25, 2016 6:49 PM
> > > <snip>
> > > > 
> > > > Hi Community,
> > > > 
> > > > So far, I have received constructive feedback from Intel, NXP and Linaro folks.
> > > > Let me know, if anyone else interested in contributing to the definition of eventdev?
> > > > 
> > > > If there are no major issues in proposed spec, then Cavium would like work on
> > > > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > > > an associated HW driver.(Requested minor changes of v2 will be addressed
> > > > in next version).
> > > 
> > > 
> > > Hi All,
> > > 
> > > I've been looking at the eventdev API from a use-case point of view, and I'm unclear on a how the API caters for two uses. I have simplified these as much as possible, think of them as a theoretical unit-test for the API :)
> > > 
> > > 
> > > Fragmentation:
> > > 1. Dequeue 8 packets
> > > 2. Process 2 packets
> > > 3. Processing 3rd, this packet needs fragmentation into two packets
> > > 4. Process remaining 5 packets as normal
> > > 
> > > What function calls does the application make to achieve this?
> > > In particular, I'm referring to how can the scheduler know that the 3rd packet is the one being fragmented, and how to keep packet order valid. 
> > > 
> > 
> > OK. I will try to share my views on IP fragmentation on event _HW_
> > models(at least on Cavium HW) then we can see, how we can converge.
> > 
> > First, The fragmentation specific logic should be decoupled from the event
> > model as it specific to packet and L3 layer(Not specific to generic event)
> > 
> I would view fragmentation as just one example of a workload like this,
> multicast and broadcast may be two other cases. Yes, they all apply to
> packet, but the general feature support is just how to provide support
> for one event generating multiple further events which should be linked
> together for reordering. [I think this only really applies in the

AFAIK, there are two different schemes to "maintain ordering". The first one
is based on "reordering buffers", i.e. a list data structure is used to hold the
events first, and then, when it comes to correcting the order (ORDERED->ATOMIC),
the order is corrected based on those "reordering buffers".
But some HW implementations use a "port" state based reordering scheme
(i.e. no external reorder buffer to keep track of the order).

So I think that, to have a portable application workflow for the use case where
multiple events are generated from one event, the generated events need to be
stored in the parent event and processed downstream as required, like the
fragmentation example in

http://dpdk.org/ml/archives/dev/2016-November/049707.html

The above scheme should be OK in your implementation, right?


> reordered case - which leads to another question: in your experience
> do you see other event types other than packet being handled in a
> "reordered" manner?]

We use both timer events and crypto completion events, etc., with the ORDERED
type, but not with a "one event creates N events" scheme on those.

> 
> /Bruce
>
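
A minimal sketch of the "store the generated events in the parent event" idea
from the reply above, assuming the fragments are kept as a singly linked chain
hanging off the parent. struct frag, struct parent_ev and both helpers are
hypothetical; a real application would more likely chain mbufs, and the linked
post may describe a different layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment descriptor. */
struct frag {
	struct frag *next;
	uint16_t len;
};

/* Hypothetical parent event that carries its fragments downstream. */
struct parent_ev {
	uint32_t flow_id;
	struct frag *head;
	struct frag *tail;
};

/* ORDERED stage: append a fragment; chain order is creation order. */
static void
parent_ev_add_frag(struct parent_ev *ev, struct frag *f)
{
	assert(ev != NULL && f != NULL);
	f->next = NULL;
	if (ev->tail == NULL)
		ev->head = f;
	else
		ev->tail->next = f;
	ev->tail = f;
}

/*
 * Downstream stage (after the scheduler has restored the order of the parent
 * events): unlink up to max fragments, in chain order, into out[].
 * Returns the number of fragments written to out[].
 */
static unsigned int
parent_ev_drain(struct parent_ev *ev, struct frag **out, unsigned int max)
{
	unsigned int n = 0;

	while (ev->head != NULL && n < max) {
		struct frag *f = ev->head;

		ev->head = f->next;
		f->next = NULL;
		out[n++] = f;
	}
	if (ev->head == NULL)
		ev->tail = NULL;
	return n;
}
```

Only the parent event goes through the scheduler here, so the scheduler never
has to know that one event turned into several; the fragments leave in chain
order once the parent reaches the downstream stage.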
  

Patch

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 6675f96..28c1329 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -40,6 +40,7 @@  There are many libraries, so their headers may be grouped by topics:
   [ethdev]             (@ref rte_ethdev.h),
   [ethctrl]            (@ref rte_eth_ctrl.h),
   [cryptodev]          (@ref rte_cryptodev.h),
+  [eventdev]           (@ref rte_eventdev.h),
   [devargs]            (@ref rte_devargs.h),
   [bond]               (@ref rte_eth_bond.h),
   [vhost]              (@ref rte_virtio_net.h),
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 9dc7ae5..9841477 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -41,6 +41,7 @@  INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_cryptodev \
                           lib/librte_distributor \
                           lib/librte_ether \
+                          lib/librte_eventdev \
                           lib/librte_hash \
                           lib/librte_ip_frag \
                           lib/librte_jobstats \
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
new file mode 100644
index 0000000..f60e461
--- /dev/null
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -0,0 +1,1204 @@ 
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 Cavium.
+ *   Copyright 2016 Intel Corporation.
+ *   Copyright 2016 NXP.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Cavium nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_EVENTDEV_H_
+#define _RTE_EVENTDEV_H_
+
+/**
+ * @file
+ *
+ * RTE Event Device API
+ *
+ * The Event Device API is composed of two parts:
+ *
+ * - The application-oriented Event API that includes functions to set up
+ *   an event device (configure it, set up its queues and ports, and start it),
+ *   to link queues to ports, to receive events, and so on.
+ *
+ * - The driver-oriented Event API that exports a function allowing
+ *   an event Poll Mode Driver (PMD) to simultaneously register itself as
+ *   an event device driver.
+ *
+ * Event device components:
+ *
+ *                     +-----------------+
+ *                     | +-------------+ |
+ *        +-------+    | |    flow 0   | |
+ *        |Packet |    | +-------------+ |
+ *        |event  |    | +-------------+ |
+ *        |       |    | |    flow 1   | |event_port_link(port0, queue0)
+ *        +-------+    | +-------------+ |     |     +--------+
+ *        +-------+    | +-------------+ o-----v-----o        |dequeue +------+
+ *        |Crypto |    | |    flow n   | |           | event  +------->|Core 0|
+ *        |work   |    | +-------------+ o----+      | port 0 |        |      |
+ *        |done ev|    |  event queue 0  |    |      +--------+        +------+
+ *        +-------+    +-----------------+    |
+ *        +-------+                           |
+ *        |Timer  |    +-----------------+    |      +--------+
+ *        |expiry |    | +-------------+ |    +------o        |dequeue +------+
+ *        |event  |    | |    flow 0   | o-----------o event  +------->|Core 1|
+ *        +-------+    | +-------------+ |      +----o port 1 |        |      |
+ *       Event enqueue | +-------------+ |      |    +--------+        +------+
+ *     o-------------> | |    flow 1   | |      |
+ *        enqueue(     | +-------------+ |      |
+ *        queue_id,    |                 |      |    +--------+        +------+
+ *        flow_id,     | +-------------+ |      |    |        |dequeue |Core 2|
+ *        sched_type,  | |    flow n   | o-----------o event  +------->|      |
+ *        event_type,  | +-------------+ |      |    | port 2 |        +------+
+ *        subev_type,  |  event queue 1  |      |    +--------+
+ *        event)       +-----------------+      |    +--------+
+ *                                              |    |        |dequeue +------+
+ *        +-------+    +-----------------+      |    | event  +------->|Core n|
+ *        |Core   |    | +-------------+ o-----------o port n |        |      |
+ *        |(SW)   |    | |    flow 0   | |      |    +--------+        +--+---+
+ *        |event  |    | +-------------+ |      |                         |
+ *        +-------+    | +-------------+ |      |                         |
+ *            ^        | |    flow 1   | |      |                         |
+ *            |        | +-------------+ o------+                         |
+ *            |        | +-------------+ |                                |
+ *            |        | |    flow n   | |                                |
+ *            |        | +-------------+ |                                |
+ *            |        |  event queue n  |                                |
+ *            |        +-----------------+                                |
+ *            |                                                           |
+ *            +-----------------------------------------------------------+
+ *
+ *
+ *
+ * Event device: A hardware or software-based event scheduler.
+ *
+ * Event: A unit of scheduling that encapsulates a packet or another datatype,
+ * such as a SW-generated event from a core, a crypto work completion
+ * notification or a timer expiry notification, as well as metadata.
+ * The metadata includes flow ID, scheduling type, event priority, event_type,
+ * sub_event_type, etc.
+ *
+ * Event queue: A queue containing events that are scheduled by the event dev.
+ * An event queue contains events of different flows associated with scheduling
+ * types, such as atomic, ordered, or parallel.
+ *
+ * Event port: An application's interface into the event dev for enqueue and
+ * dequeue operations. Each event port can be linked with one or more
+ * event queues for dequeue operations.
+ *
+ * By default, all the functions of the Event Device API exported by a PMD
+ * are lock-free functions which are assumed not to be invoked in parallel on
+ * different logical cores to work on the same target object. For instance,
+ * the dequeue function of a PMD cannot be invoked in parallel on two logical
+ * cores to operate on the same event port. Of course, this function
+ * can be invoked in parallel by different logical cores on different ports.
+ * It is the responsibility of the upper level application to enforce this rule.
+ *
+ * In all functions of the Event API, the Event device is
+ * designated by an integer >= 0 named the device identifier *dev_id*.
+ *
+ * At the Event driver level, Event devices are represented by a generic
+ * data structure of type *rte_event_dev*.
+ *
+ * Event devices are dynamically registered during the PCI/SoC device probing
+ * phase performed at EAL initialization time.
+ * When an Event device is being probed, a *rte_event_dev* structure and
+ * a new device identifier are allocated for that device. Then, the
+ * event_dev_init() function supplied by the Event driver matching the probed
+ * device is invoked to properly initialize the device.
+ *
+ * The role of the device init function consists of resetting the hardware or
+ * software event driver implementations.
+ *
+ * If the device init operation is successful, the correspondence between
+ * the device identifier assigned to the new device and its associated
+ * *rte_event_dev* structure is effectively registered.
+ * Otherwise, both the *rte_event_dev* structure and the device identifier are
+ * freed.
+ *
+ * The functions exported by the application Event API to set up a device
+ * designated by its device identifier must be invoked in the following order:
+ *     - rte_event_dev_configure()
+ *     - rte_event_queue_setup()
+ *     - rte_event_port_setup()
+ *     - rte_event_port_link()
+ *     - rte_event_dev_start()
+ *
+ * Then, the application can invoke, in any order, the functions
+ * exported by the Event API to schedule events, dequeue events, enqueue events,
+ * change the event queue(s) to event port [un]link setup, and so on.
+ *
+ * An application may use rte_event_[queue/port]_default_conf_get() to get
+ * the default configuration, and then set up an event queue or event port by
+ * overriding a few default values.
+ *
+ * If the application wants to change the configuration (i.e. call
+ * rte_event_dev_configure(), rte_event_queue_setup(), or
+ * rte_event_port_setup()), it must call rte_event_dev_stop() first to stop the
+ * device and then do the reconfiguration before calling rte_event_dev_start()
+ * again. The schedule, enqueue and dequeue functions should not be invoked
+ * when the device is stopped.
+ *
+ * Finally, an application can close an Event device by invoking the
+ * rte_event_dev_close() function.
+ *
+ * Each function of the application Event API invokes a specific function
+ * of the PMD that controls the target device designated by its device
+ * identifier.
+ *
+ * For this purpose, all device-specific functions of an Event driver are
+ * supplied through a set of pointers contained in a generic structure of type
+ * *event_dev_ops*.
+ * The address of the *event_dev_ops* structure is stored in the *rte_event_dev*
+ * structure by the device init function of the Event driver, which is
+ * invoked during the PCI/SoC device probing phase, as explained earlier.
+ *
+ * In other words, each function of the Event API simply retrieves the
+ * *rte_event_dev* structure associated with the device identifier and
+ * performs an indirect invocation of the corresponding driver function
+ * supplied in the *event_dev_ops* structure of the *rte_event_dev* structure.
+ *
+ * For performance reasons, the address of the fast-path functions of the
+ * Event driver is not contained in the *event_dev_ops* structure.
+ * Instead, they are directly stored at the beginning of the *rte_event_dev*
+ * structure to avoid an extra indirect memory access during their invocation.
+ *
+ * RTE event device drivers do not use interrupts for enqueue or dequeue
+ * operation. Instead, Event drivers export Poll-Mode enqueue and dequeue
+ * functions to applications.
+ *
+ * An event-driven application typically has the following fastpath workflow:
+ * \code{.c}
+ *	while (1) {
+ *
+ *		rte_event_schedule(dev_id);
+ *
+ *		rte_event_dequeue(...);
+ *
+ *		(event processing)
+ *
+ *		rte_event_enqueue(...);
+ *	}
+ * \endcode
+ *
+ * The *schedule* operation is intended to do event scheduling, and the
+ * *dequeue* operation returns the scheduled events. An implementation
+ * is free to define the semantics between *schedule* and *dequeue*. For
+ * example, a system based on a hardware scheduler can define its
+ * rte_event_schedule() to be a NOOP, whereas a software scheduler can use
+ * the *schedule* operation to schedule events.
+ *
+ * Events are injected into the event device through the *enqueue* operation
+ * by event producers in the system. Typical event producers are the ethdev
+ * subsystem for generating packet events, a core (SW) for generating events
+ * based on different stages of application processing, cryptodev for generating
+ * crypto work completion notifications, etc.
+ *
+ * The *dequeue* operation gets one or more events from the event ports.
+ * The application processes the events and sends them to a downstream event
+ * queue through rte_event_enqueue() if it is an intermediate stage of event
+ * processing. On the final stage, the application may hand them to a different
+ * subsystem like ethdev, e.g. to transmit packets with rte_eth_tx_burst().
+ *
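+ * The workflow above can be sketched for one intermediate-stage worker as
+ * follows. This is only an illustration: process_stage() and
+ * handle_enqueue_failure() are hypothetical application helpers, and
+ * *dev_id*, *port_id* and *next_queue_id* are assumed to be chosen by the
+ * application; none of them is defined by this API.
+ * \code{.c}
+ *	struct rte_event ev;
+ *
+ *	while (1) {
+ *		// May be a NOOP when scheduling is done in hardware.
+ *		rte_event_schedule(dev_id);
+ *
+ *		// wait == 0: return immediately if no event is available.
+ *		if (!rte_event_dequeue(dev_id, port_id, &ev, 0))
+ *			continue;
+ *
+ *		process_stage(&ev);		// hypothetical helper
+ *
+ *		// Forward the event to the downstream event queue.
+ *		ev.queue_id = next_queue_id;
+ *		if (rte_event_enqueue(dev_id, port_id, &ev, false) < 0)
+ *			handle_enqueue_failure(&ev);	// hypothetical helper
+ *	}
+ * \endcode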
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_pci.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_errno.h>
+
+/**
+ * Get the total number of event devices that have been successfully
+ * initialised.
+ *
+ * @return
+ *   The total number of usable event devices.
+ */
+extern uint8_t
+rte_event_dev_count(void);
+
+/**
+ * Get the device identifier for the named event device.
+ *
+ * @param name
+ *   Event device name to select the event device identifier.
+ *
+ * @return
+ *   Returns event device identifier on success.
+ *   - <0: Failure to find named event device.
+ */
+extern int
+rte_event_dev_get_dev_id(const char *name);
+
+/**
+ * Return the NUMA socket to which a device is connected.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @return
+ *   The NUMA socket id to which the device is connected or
+ *   a default of zero if the socket could not be determined.
+ *   - -1: dev_id value is out of range.
+ */
+extern int
+rte_event_dev_socket_id(uint8_t dev_id);
+
+/* Event device capability bitmap flags */
+#define RTE_EVENT_DEV_CAP_QUEUE_QOS        (1 << 0)
+/**< Event scheduling prioritization is based on the priority associated with
+ *  each event queue.
+ *
+ *  \see rte_event_queue_setup(), RTE_EVENT_QUEUE_PRIORITY_NORMAL
+ */
+#define RTE_EVENT_DEV_CAP_EVENT_QOS        (1 << 1)
+/**< Event scheduling prioritization is based on the priority associated with
+ *  each event. Priority of each event is supplied in *rte_event* structure
+ *  on each enqueue operation.
+ *
+ *  \see rte_event_enqueue()
+ */
+
+/**
+ * Event device information
+ */
+struct rte_event_dev_info {
+	const char *driver_name;	/**< Event driver name */
+	struct rte_pci_device *pci_dev;	/**< PCI information */
+	uint32_t min_dequeue_wait_ns;
+	/**< Minimum supported global dequeue wait delay(ns) by this device */
+	uint32_t max_dequeue_wait_ns;
+	/**< Maximum supported global dequeue wait delay(ns) by this device */
+	uint32_t dequeue_wait_ns;
+	/**< Configured global dequeue wait delay(ns) for this device */
+	uint8_t max_event_queues;
+	/**< Maximum event_queues supported by this device */
+	uint32_t max_event_queue_flows;
+	/**< Maximum supported flows in an event queue by this device */
+	uint8_t max_event_queue_priority_levels;
+	/**< Maximum number of event queue priority levels by this device.
+	 * Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability
+	 */
+	uint8_t nb_event_queues;
+	/**< Configured number of event queues for this device */
+	uint8_t max_event_priority_levels;
+	/**< Maximum number of event priority levels by this device.
+	 * Valid when the device has RTE_EVENT_DEV_CAP_EVENT_QOS capability
+	 */
+	uint8_t max_event_ports;
+	/**< Maximum number of event ports supported by this device */
+	uint8_t nb_event_ports;
+	/**< Configured number of event ports for this device */
+	uint8_t max_event_port_dequeue_queue_depth;
+	/**< Maximum dequeue queue depth for any event port.
+	 * Implementations can schedule N events at a time to an event port.
+	 * A device that does not support bulk dequeue will set this as 1.
+	 * \see rte_event_port_setup()
+	 */
+	uint32_t max_event_port_enqueue_queue_depth;
+	/**< Maximum enqueue queue depth for any event port. Implementations
+	 * can batch N events at a time to enqueue through event port
+	 * \see rte_event_port_setup()
+	 */
+	int32_t max_num_events;
+	/**< A *closed system* event dev has a limit on the number of events it
+	 * can manage at a time. An *open system* event dev does not have a
+	 * limit and will specify this as -1.
+	 */
+	uint32_t event_dev_cap;
+	/**< Event device capabilities (RTE_EVENT_DEV_CAP_) */
+};
+
+/**
+ * Retrieve the contextual information of an event device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param[out] dev_info
+ *   A pointer to a structure of type *rte_event_dev_info* to be filled with the
+ *   contextual information of the device.
+ *
+ */
+extern void
+rte_event_dev_info_get(uint8_t dev_id, struct rte_event_dev_info *dev_info);
+
+/* Event device configuration bitmap flags */
+#define RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT (1 << 0)
+/**< Override the global *dequeue_wait_ns* and use per dequeue wait in ns.
+ *  \see rte_event_dequeue_wait_time(), rte_event_dequeue()
+ */
+
+/** Event device configuration structure */
+struct rte_event_dev_config {
+	uint32_t dequeue_wait_ns;
+	/**< rte_event_dequeue() waits for *dequeue_wait_ns* ns on this device.
+	 * This value should be in the range of *min_dequeue_wait_ns* and
+	 * *max_dequeue_wait_ns* previously provided by
+	 * rte_event_dev_info_get()
+	 * \see RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
+	 */
+	int32_t nb_events_limit;
+	/**< Applies to *closed system* event dev only. This field caps the
+	 * number of events that ethdev-like devices may inject into the
+	 * system, so that core-to-core events are not overwhelmed.
+	 * This value cannot exceed the *max_num_events* previously
+	 * provided by rte_event_dev_info_get()
+	 */
+	uint8_t nb_event_queues;
+	/**< Number of event queues to configure on this device.
+	 * This value cannot exceed the *max_event_queues* previously
+	 * provided by rte_event_dev_info_get()
+	 */
+	uint8_t nb_event_ports;
+	/**< Number of event ports to configure on this device.
+	 * This value cannot exceed the *max_event_ports* previously
+	 * provided by rte_event_dev_info_get()
+	 */
+	uint32_t event_dev_cfg;
+	/**< Event device config flags (RTE_EVENT_DEV_CFG_) */
+};
+
+/**
+ * Configure an event device.
+ *
+ * This function must be invoked before any other function in the
+ * API. This function can also be re-invoked when a device is in the
+ * stopped state.
+ *
+ * The caller may use rte_event_dev_info_get() to get the capabilities of the
+ * resources available for this event device.
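+ *
+ * A minimal sketch of the query-then-configure sequence, assuming the
+ * application wants every queue and port the device offers and the smallest
+ * supported dequeue wait. These choices are illustrative only and are not
+ * mandated by the API; the code is assumed to run inside an application
+ * setup function that returns a negative value on error.
+ * \code{.c}
+ *	struct rte_event_dev_info info;
+ *	struct rte_event_dev_config config = {0};
+ *
+ *	rte_event_dev_info_get(dev_id, &info);
+ *
+ *	config.dequeue_wait_ns = info.min_dequeue_wait_ns;
+ *	config.nb_events_limit = info.max_num_events;
+ *	config.nb_event_queues = info.max_event_queues;
+ *	config.nb_event_ports = info.max_event_ports;
+ *
+ *	if (rte_event_dev_configure(dev_id, &config) < 0)
+ *		return -1;	// the driver rejected the configuration
+ * \endcode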
+ *
+ * @param dev_id
+ *   The identifier of the device to configure.
+ * @param config
+ *   The event device configuration structure.
+ *
+ * @return
+ *   - 0: Success, device configured.
+ *   - <0: Error code returned by the driver configuration function.
+ */
+extern int
+rte_event_dev_configure(uint8_t dev_id, struct rte_event_dev_config *config);
+
+
+/* Event queue specific APIs */
+
+#define RTE_EVENT_QUEUE_PRIORITY_HIGHEST   0
+/**< Highest event queue priority */
+#define RTE_EVENT_QUEUE_PRIORITY_NORMAL    128
+/**< Normal event queue priority */
+#define RTE_EVENT_QUEUE_PRIORITY_LOWEST    255
+/**< Lowest event queue priority */
+
+/* Event queue configuration bitmap flags */
+#define RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER    (1 << 0)
+/**< This event queue links only to a single event port.
+ *
+ *  \see rte_event_port_setup(), rte_event_port_link()
+ */
+
+/** Event queue configuration structure */
+struct rte_event_queue_conf {
+	uint32_t nb_atomic_flows;
+	/**< The maximum number of active flows this queue can track at any
+	 * given time. The value must be in the range
+	 * [1, max_event_queue_flows] previously provided by
+	 * rte_event_dev_info_get().
+	 */
+	uint32_t nb_atomic_order_sequences;
+	/**< The maximum number of outstanding events waiting to be (egress-)
+	 * reordered by this queue. In other words, the number of entries in
+	 * this queue's reorder buffer. The value must be in the range
+	 * [1, max_event_queue_flows] previously provided by
+	 * rte_event_dev_info_get().
+	 */
+	uint32_t event_queue_cfg;
+	/**< Queue config flags (RTE_EVENT_QUEUE_CFG_) */
+	uint8_t priority;
+	/**< Priority for this event queue relative to other event queues.
+	 * The requested priority should be in the range of
+	 * [RTE_EVENT_QUEUE_PRIORITY_HIGHEST, RTE_EVENT_QUEUE_PRIORITY_LOWEST].
+	 * The implementation shall normalize the requested priority to
+	 * event device supported priority value.
+	 * Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability
+	 */
+};
+
+/**
+ * Retrieve the default configuration information of an event queue designated
+ * by its *queue_id* from the event driver for an event device.
+ *
+ * This function is intended to be used in conjunction with
+ * rte_event_queue_setup() when the caller needs to set up the queue by
+ * overriding a few default values.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param queue_id
+ *   The index of the event queue to get the configuration information.
+ *   The value must be in the range [0, nb_event_queues - 1]
+ *   previously supplied to rte_event_dev_configure().
+ * @param[out] queue_conf
+ *   The pointer to the default event queue configuration data.
+ *
+ * \see rte_event_queue_setup()
+ *
+ */
+extern void
+rte_event_queue_default_conf_get(uint8_t dev_id, uint8_t queue_id,
+				 struct rte_event_queue_conf *queue_conf);
+
+/**
+ * Allocate and set up an event queue for an event device.
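+ *
+ * A minimal sketch, assuming the application only wants to raise the
+ * priority of queue 0 and keep the driver defaults for every other field.
+ * The queue index and the chosen priority are illustrative, and the code is
+ * assumed to run inside an application setup function.
+ * \code{.c}
+ *	struct rte_event_queue_conf conf;
+ *
+ *	// Start from the driver defaults, then override a single field.
+ *	rte_event_queue_default_conf_get(dev_id, 0, &conf);
+ *	conf.priority = RTE_EVENT_QUEUE_PRIORITY_HIGHEST;
+ *
+ *	if (rte_event_queue_setup(dev_id, 0, &conf) < 0)
+ *		return -1;	// event queue configuration failed
+ * \endcode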
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param queue_id
+ *   The index of the event queue to setup. The value must be in the range
+ *   [0, nb_event_queues - 1] previously supplied to rte_event_dev_configure().
+ * @param queue_conf
+ *   The pointer to the configuration data to be used for the event queue.
+ *   NULL value is allowed, in which case the default configuration is used.
+ *
+ * \see rte_event_queue_default_conf_get()
+ *
+ * @return
+ *   - 0: Success, event queue correctly set up.
+ *   - <0: event queue configuration failed
+ */
+extern int
+rte_event_queue_setup(uint8_t dev_id, uint8_t queue_id,
+		      struct rte_event_queue_conf *queue_conf);
+
+/**
+ * Get the number of event queues on a specific event device
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @return
+ *   - The number of configured event queues
+ */
+extern uint16_t
+rte_event_queue_count(uint8_t dev_id);
+
+/**
+ * Get the priority of the event queue on a specific event device
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @param queue_id
+ *   Event queue identifier.
+ * @return
+ *   - If the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability, the
+ *    configured priority of the event queue in the
+ *    [RTE_EVENT_QUEUE_PRIORITY_HIGHEST, RTE_EVENT_QUEUE_PRIORITY_LOWEST] range;
+ *    otherwise the value one.
+ */
+extern uint8_t
+rte_event_queue_priority(uint8_t dev_id, uint8_t queue_id);
+
+/* Event port specific APIs */
+
+/** Event port configuration structure */
+struct rte_event_port_conf {
+	int32_t new_event_threshold;
+	/**< A backpressure threshold for new event enqueues on this port.
+	 * Use for *closed system* event dev where event capacity is limited,
+	 * and cannot exceed the capacity of the event dev.
+	 * Configuring ports with different thresholds can make higher priority
+	 * traffic less likely to be backpressured.
+	 * For example, a port used to inject NIC Rx packets into the event dev
+	 * can have a lower threshold so as not to overwhelm the device,
+	 * while ports used for worker pools can have a higher threshold.
+	 */
+	uint8_t dequeue_queue_depth;
+	/**< Configure number of bulk dequeues for this event port.
+	 * This value cannot exceed the *max_event_port_dequeue_queue_depth*
+	 * previously provided by rte_event_dev_info_get()
+	 */
+	uint8_t enqueue_queue_depth;
+	/**< Configure number of bulk enqueues for this event port.
+	 * This value cannot exceed the *max_event_port_enqueue_queue_depth*
+	 * previously provided by rte_event_dev_info_get()
+	 */
+};
+
+/**
+ * Retrieve the default configuration information of an event port designated
+ * by its *port_id* from the event driver for an event device.
+ *
+ * This function is intended to be used in conjunction with
+ * rte_event_port_setup() when the caller needs to set up the port by
+ * overriding a few default values.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The index of the event port to get the configuration information.
+ *   The value must be in the range [0, nb_event_ports - 1]
+ *   previously supplied to rte_event_dev_configure().
+ * @param[out] port_conf
+ *   The pointer to the default event port configuration data
+ *
+ * \see rte_event_port_setup()
+ *
+ */
+extern void
+rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
+				struct rte_event_port_conf *port_conf);
+
+/**
+ * Allocate and set up an event port for an event device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The index of the event port to setup. The value must be in the range
+ *   [0, nb_event_ports - 1] previously supplied to rte_event_dev_configure().
+ * @param port_conf
+ *   The pointer to the configuration data to be used for the port.
+ *   NULL value is allowed, in which case the default configuration is used.
+ *
+ * \see rte_event_port_default_conf_get()
+ *
+ * @return
+ *   - 0: Success, event port correctly set up.
+ *   - <0: Port configuration failed
+ *   - (-EDQUOT) Quota exceeded (the application tried to link a queue
+ *   configured with RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER to more than one
+ *   event port)
+ */
+extern int
+rte_event_port_setup(uint8_t dev_id, uint8_t port_id,
+		     struct rte_event_port_conf *port_conf);
+
+/**
+ * Get the dequeue queue depth configured for the event port designated
+ * by its *port_id* on a specific event device
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @param port_id
+ *   Event port identifier.
+ * @return
+ *   - The configured dequeue queue depth
+ *
+ * \see rte_event_dequeue_burst()
+ */
+extern uint8_t
+rte_event_port_dequeue_depth(uint8_t dev_id, uint8_t port_id);
+
+/**
+ * Get the enqueue queue depth configured for the event port designated
+ * by its *port_id* on a specific event device
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @param port_id
+ *   Event port identifier.
+ * @return
+ *   - The configured enqueue queue depth
+ *
+ * \see rte_event_enqueue_burst()
+ */
+extern uint8_t
+rte_event_port_enqueue_depth(uint8_t dev_id, uint8_t port_id);
+
+/**
+ * Get the number of ports on a specific event device
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @return
+ *   - The number of configured ports
+ */
+extern uint8_t
+rte_event_port_count(uint8_t dev_id);
+
+/**
+ * Start an event device.
+ *
+ * The device start step is the last one. It sets the event queues to start
+ * accepting events and scheduling them to event ports.
+ *
+ * On success, all basic functions exported by the API (event enqueue,
+ * event dequeue and so on) can be invoked.
+ *
+ * @param dev_id
+ *   Event device identifier
+ * @return
+ *   - 0: Success, device started.
+ *   - <0: Error code of the driver device start function.
+ */
+extern int
+rte_event_dev_start(uint8_t dev_id);
+
+/**
+ * Stop an event device. The device can be restarted with a call to
+ * rte_event_dev_start()
+ *
+ * @param dev_id
+ *   Event device identifier.
+ */
+extern void
+rte_event_dev_stop(uint8_t dev_id);
+
+/**
+ * Close an event device. The device cannot be restarted!
+ *
+ * @param dev_id
+ *   Event device identifier
+ *
+ * @return
+ *  - 0 on successfully closing device
+ *  - <0 on failure to close device
+ */
+extern int
+rte_event_dev_close(uint8_t dev_id);
+
+/* Scheduler type definitions */
+#define RTE_SCHED_TYPE_ORDERED		0
+/**< Ordered scheduling
+ *
+ * Events from an ordered flow of an event queue can be scheduled to multiple
+ * ports for concurrent processing while maintaining the original event order.
+ * This scheme enables the user to achieve high single flow throughput by
+ * avoiding SW synchronization for ordering between ports which are bound to
+ * cores.
+ *
+ * The source flow ordering from an event queue is maintained when events are
+ * enqueued to their destination queue within the same ordered flow context.
+ * An event port holds the context until the application calls
+ * rte_event_dequeue() from the same port, which implicitly releases the
+ * context. The user may allow the scheduler to release the context earlier
+ * than that by calling rte_event_release().
+ *
+ * Events from the source queue appear in their original order when dequeued
+ * from a destination queue.
+ * Event ordering is based on the received event(s), but also other
+ * (newly allocated or stored) events are ordered when enqueued within the same
+ * ordered context. Events not enqueued (e.g. released or stored) within the
+ * context are considered missing from reordering and are skipped at this time
+ * (but can be ordered again within another context).
+ *
+ * \see rte_event_dequeue(), rte_event_release()
+ */
+
+#define RTE_SCHED_TYPE_ATOMIC		1
+/**< Atomic scheduling
+ *
+ * Events from an atomic flow of an event queue can be scheduled only to a
+ * single port at a time. The port is guaranteed to have exclusive (atomic)
+ * access to the associated flow context, which enables the user to avoid SW
+ * synchronization. Atomic flows also help to maintain event ordering
+ * since only one port at a time can process events from a flow of an
+ * event queue.
+ *
+ * The atomic queue synchronization context is dedicated to the port until
+ * the application calls rte_event_dequeue() from the same port, which
+ * implicitly releases the context. The user may allow the scheduler to release
+ * the context earlier than that by calling rte_event_release().
+ *
+ * \see rte_event_dequeue(), rte_event_release()
+ */
+
+#define RTE_SCHED_TYPE_PARALLEL		2
+/**< Parallel scheduling
+ *
+ * The scheduler performs priority scheduling, load balancing, etc. functions
+ * but does not provide additional event synchronization or ordering.
+ * It is free to schedule events from a single parallel flow of an event queue
+ * to multiple event ports for concurrent processing.
+ * The application is responsible for flow context synchronization and
+ * event ordering (SW synchronization).
+ */
+
+/* Event types to classify the event source */
+#define RTE_EVENT_TYPE_ETHDEV		0x0
+/**< The event generated from ethdev subsystem */
+#define RTE_EVENT_TYPE_CRYPTODEV	0x1
+/**< The event generated from cryptodev subsystem */
+#define RTE_EVENT_TYPE_TIMERDEV		0x2
+/**< The event generated from timerdev subsystem */
+#define RTE_EVENT_TYPE_CORE		0x3
+/**< The event generated from core.
+ * Application may use *sub_event_type* to further classify the event
+ */
+#define RTE_EVENT_TYPE_MAX		0x10
+/**< Maximum number of event types */
+
+/* Event priority */
+#define RTE_EVENT_PRIORITY_HIGHEST      0
+/**< Highest event priority */
+#define RTE_EVENT_PRIORITY_NORMAL       128
+/**< Normal event priority */
+#define RTE_EVENT_PRIORITY_LOWEST       255
+/**< Lowest event priority */
+
+/**
+ * The generic *rte_event* structure to hold the event attributes
+ * for dequeue and enqueue operation
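+ *
+ * A minimal sketch of preparing an event for enqueue, assuming an mbuf *m*,
+ * a flow identifier *flow* and a destination queue *qid* chosen by the
+ * application. The atomic schedule type and the core event type used here
+ * are illustrative choices, not requirements.
+ * \code{.c}
+ *	struct rte_event ev;
+ *
+ *	ev.u64 = 0;	// clear all WORD0 attributes in one store
+ *	ev.flow_id = flow;
+ *	ev.queue_id = qid;
+ *	ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
+ *	ev.event_type = RTE_EVENT_TYPE_CORE;
+ *	ev.priority = RTE_EVENT_PRIORITY_NORMAL;
+ *	ev.mbuf = m;	// WORD1 carries the event payload
+ * \endcode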
+ */
+struct rte_event {
+	/** WORD0 */
+	RTE_STD_C11
+	union {
+		uint64_t u64;
+		/** Event attributes for dequeue or enqueue operation */
+		struct {
+			uint32_t flow_id:24;
+			/**< Targeted flow identifier for the enqueue and
+			 * dequeue operation.
+			 * The value must be in the range
+			 * [1, max_event_queue_flows] previously provided
+			 * by rte_event_dev_info_get().
+			 */
+			uint32_t queue_id:8;
+			/**< Targeted event queue identifier for the enqueue or
+			 * dequeue operation.
+			 * The value must be in the range
+			 * [0, nb_event_queues - 1] previously supplied to
+			 * rte_event_dev_configure().
+			 */
+			uint8_t  sched_type;
+			/**< Scheduler synchronization type (RTE_SCHED_TYPE_)
+			 * associated with flow id on a given event queue
+			 * for the enqueue and dequeue operation.
+			 */
+			uint8_t  event_type;
+			/**< Event type to classify the event source. */
+			uint8_t  sub_event_type;
+			/**< Sub-event types based on the event source.
+			 * \see RTE_EVENT_TYPE_CORE
+			 */
+			uint8_t  priority;
+			/**< Event priority relative to other events in the
+			 * event queue. The requested priority should be in the
+			 * range of [RTE_EVENT_PRIORITY_HIGHEST,
+			 * RTE_EVENT_PRIORITY_LOWEST].
+			 * The implementation shall normalize the requested
+			 * priority to supported priority value.
+			 * Valid when the device has RTE_EVENT_DEV_CAP_EVENT_QOS
+			 * capability.
+			 */
+		};
+	};
+	/** WORD1 */
+	RTE_STD_C11
+	union {
+		uintptr_t event;
+		/**< Opaque event pointer */
+		struct rte_mbuf *mbuf;
+		/**< mbuf pointer if dequeued event is associated with mbuf */
+	};
+};
+
+/**
+ * Schedule one or more events in the event dev.
+ *
+ * An event dev implementation may define this as a NOOP, for instance if
+ * the event dev performs its scheduling in hardware.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ */
+extern void
+rte_event_schedule(uint8_t dev_id);
+
+/**
+ * Enqueue the event object supplied in the *rte_event* structure on an
+ * event device designated by its *dev_id* through the event port specified by
+ * *port_id*. The event object specifies the event queue on which this
+ * event will be enqueued.
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param ev
+ *   Pointer to struct rte_event
+ * @param pin_event
+ *   Hint to the scheduler that the event can be pinned to the same port for
+ *   the next scheduling stage. For implementations that support it, this
+ *   allows the same core to process the next stage in the pipeline for a given
+ *   event, taking advantage of cache locality. The pinned event will be
+ *   received through rte_event_dequeue(). This is a hint and the event is
+ *   not guaranteed to be pinned to the port. This hint is valid only when the
+ *   event is dequeued with rte_event_dequeue() followed by rte_event_enqueue().
+ *
+ * @return
+ *  - 0 on success
+ *  - <0 on failure. Failure can occur if the event port's output queue is
+ *     backpressured, for instance.
+ */
+extern int
+rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev,
+		  bool pin_event);
+
+/**
+ * Enqueue a burst of event objects supplied in *rte_event* structure on an
+ * event device designated by its *dev_id* through the event port specified by
+ * *port_id*. Each event object specifies the event queue on which it
+ * will be enqueued.
+ *
+ * The rte_event_enqueue_burst() function is invoked to enqueue
+ * multiple event objects.
+ * It is the burst variant of rte_event_enqueue() function.
+ *
+ * The *num* parameter is the number of event objects to enqueue which are
+ * supplied in the *ev* array of *rte_event* structure.
+ *
+ * The rte_event_enqueue_burst() function returns the number of
+ * event objects it actually enqueued. A return value equal to *num* means
+ * that all event objects have been enqueued.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param ev
+ *   An array of *num* *rte_event* structures
+ *   which contain the event object enqueue operations to be processed.
+ * @param num
+ *   The number of event objects to enqueue, typically the value returned by
+ *   rte_event_port_enqueue_depth() for this port.
+ * @param pin_event
+ *   Hint to the scheduler that the event can be pinned to the same port for
+ *   the next scheduling stage. For implementations that support it, this
+ *   allows the same core to process the next stage in the pipeline for a given
+ *   event, taking advantage of cache locality. The pinned event will be
+ *   received through rte_event_dequeue(). This is a hint and the event is
+ *   not guaranteed to be pinned to the port. This hint is valid only when the
+ *   event is dequeued with rte_event_dequeue() followed by rte_event_enqueue().
+ *
+ * @return
+ *   The number of event objects actually enqueued on the event device. The
+ *   return value can be less than the value of the *num* parameter when the
+ *   event device's queue is full or if invalid parameters are specified in a
+ *   *rte_event*. If return value is less than *num*, the remaining events at
+ *   the end of ev[] are not consumed, and the caller has to take care of them.
+ *
+ * \see rte_event_enqueue(), rte_event_port_enqueue_depth()
+ */
+extern int
+rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
+			struct rte_event ev[], int num, bool pin_event);
+
+/**
+ * Convert nanoseconds to the *wait* value for rte_event_dequeue()
+ *
+ * If the device is configured with the RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
+ * flag, the application can use this function to convert a wait value in
+ * nanoseconds to the implementation-specific wait value supplied to
+ * rte_event_dequeue()
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param ns
+ *   Wait time in nanoseconds
+ *
+ * @return
+ * Value for the *wait* parameter in rte_event_dequeue() function
+ *
+ * \see rte_event_dequeue(), RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
+ * \see rte_event_dev_configure()
+ *
+ */
+extern uint64_t
+rte_event_dequeue_wait_time(uint8_t dev_id, uint64_t ns);
+
+/**
+ * Dequeue an event from the event port specified by *port_id* on the
+ * event device designated by its *dev_id*.
+ *
+ * rte_event_dequeue() does not dictate the specifics of scheduling algorithm as
+ * each eventdev driver may have different criteria to schedule an event.
+ * However, in general, from an application perspective, the scheduler may
+ * use the following scheme to dispatch an event to the port.
+ *
+ * 1) Selection of event queue based on
+ *   a) The list of event queues linked to the event port.
+ *   b) If the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability then event
+ *   queue selection from the list is based on event queue priority relative
+ *   to other event queues, supplied as *priority* in rte_event_queue_setup()
+ *   c) If the device has RTE_EVENT_DEV_CAP_EVENT_QOS capability then event
+ *   queue selection from the list is based on event priority supplied as
+ *   *priority* in rte_event_enqueue_burst()
+ * 2) Selection of event based on
+ *   a) The number of flows available in the selected event queue.
+ *   b) The schedule type associated with the event
+ *
+ * On a successful dequeue, the event port holds flow id and schedule type
+ * context associated with the dispatched event. The context is automatically
+ * released in the next rte_event_dequeue() invocation, or rte_event_release()
+ * can be called to release the context early.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param[out] ev
+ *   Pointer to struct rte_event. On successful event dispatch, implementation
+ *   updates the event attributes.
+ * @param wait
+ *   0 - no-wait, returns immediately if there is no event.
+ *   >0 - wait for the event. If the device is configured with
+ *   RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
+ *   an event is available or *wait* time has elapsed.
+ *   If the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
+ *   then this function will wait until an event is available or
+ *   *dequeue_wait_ns* ns, previously supplied to rte_event_dev_configure(),
+ *   has elapsed.
+ *
+ * @return
+ * When true, a valid event has been dispatched by the scheduler.
+ *
+ */
+extern bool
+rte_event_dequeue(uint8_t dev_id, uint8_t port_id,
+		  struct rte_event *ev, uint64_t wait);
+
+/**
+ * Dequeue a burst of event objects from the event port designated by its
+ * *port_id*, on an event device designated by its *dev_id*.
+ *
+ * The rte_event_dequeue_burst() function is invoked to dequeue
+ * multiple event objects. It is the burst variant of rte_event_dequeue()
+ * function.
+ *
+ * The *num* parameter is the maximum number of event objects to dequeue which
+ * are returned in the *ev* array of *rte_event* structure.
+ *
+ * The rte_event_dequeue_burst() function returns the number of
+ * event objects it actually dequeued. A return value equal to
+ * *num* means that all event objects have been dequeued.
+ *
+ * The number of events dequeued is the number of scheduler contexts held by
+ * this port. These contexts are automatically released in the next
+ * rte_event_dequeue() invocation, or rte_event_release() can be called once
+ * per event to release the contexts early.
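+ *
+ * A minimal sketch of a burst loop with early context release, assuming a
+ * hypothetical process_event() application helper and a caller-chosen burst
+ * size of 16. Calling rte_event_release() is optional; without it the
+ * contexts are released by the next dequeue on this port.
+ * \code{.c}
+ *	struct rte_event ev[16];
+ *	int i, nb;
+ *
+ *	// wait == 0: return immediately if no event is available.
+ *	nb = rte_event_dequeue_burst(dev_id, port_id, ev, 16, 0);
+ *	for (i = 0; i < nb; i++) {
+ *		process_event(&ev[i]);	// hypothetical helper
+ *		// Release the context of the i-th dequeued event early.
+ *		rte_event_release(dev_id, port_id, i);
+ *	}
+ * \endcode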
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param[out] ev
+ *   An array of *num* *rte_event* structures which is populated
+ *   with the dequeued event objects.
+ * @param num
+ *   The maximum number of event objects to dequeue, typically the value
+ *   returned by rte_event_port_dequeue_depth() for this port.
+ * @param wait
+ *   0 - no-wait, returns immediately if there is no event.
+ *   >0 - wait for the event. If the device is configured with
+ *   RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
+ *   an event is available or *wait* time has elapsed.
+ *   If the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
+ *   then this function will wait until an event is available or
+ *   *dequeue_wait_ns* ns, previously supplied to rte_event_dev_configure(),
+ *   has elapsed.
+ *
+ * @return
+ * The number of event objects actually dequeued from the port. The return
+ * value can be less than the value of the *num* parameter when the
+ * event port's queue is not full.
+ *
+ * \see rte_event_dequeue(), rte_event_port_dequeue_depth()
+ */
+extern int
+rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id,
+			struct rte_event *ev, int num, uint64_t wait);
+
+/**
+ * Release the current flow context, associated with a schedule type, of an
+ * event dequeued from a given event queue through the event port designated
+ * by its *port_id*
+ *
+ * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
+ * then this function hints the scheduler that the user has completed critical
+ * section processing in the current atomic context.
+ * The scheduler is now allowed to schedule events from the same flow from
+ * an event queue to another port. However, the context may still be held
+ * until the next rte_event_dequeue() or rte_event_dequeue_burst() call; this
+ * call allows but does not force the scheduler to release the context early.
+ *
+ * Early atomic context release may increase parallelism and thus system
+ * performance, but the user needs to design carefully the split into critical
+ * vs non-critical sections.
+ *
+ * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
+ * then this function hints the scheduler that the user has done all that is
+ * needed to maintain event order in the current ordered context.
+ * The scheduler is allowed to release the ordered context of this port and
+ * avoid reordering any following enqueues.
+ *
+ * Early ordered context release may increase parallelism and thus system
+ * performance.
+ *
+ * If current flow's scheduler type method is *RTE_SCHED_TYPE_PARALLEL*
+ * or no scheduling context is held then this function may be a NOOP,
+ * depending on the implementation.
+ *
+ * If multiple events are dequeued with rte_event_dequeue_burst(),
+ * rte_event_release() releases the flow context of one event at a time,
+ * selected through *index*, which denotes the position of the event in the
+ * order in which it was dequeued with rte_event_dequeue_burst()
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param index
+ *   The index of the event, as dequeued with rte_event_dequeue_burst(),
+ *   whose context needs to be released. The value zero is used if the event
+ *   was dequeued with rte_event_dequeue()
+ *
+ *  \see rte_event_dequeue(), rte_event_dequeue_burst()
+ */
+extern void
+rte_event_release(uint8_t dev_id, uint8_t port_id, uint8_t index);
+
+#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_HIGHEST  0
+/**< Highest event queue servicing priority */
+#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL   128
+/**< Normal event queue servicing priority */
+#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_LOWEST   255
+/**< Lowest event queue servicing priority */
+
+/** Structure to hold the queue to port link establishment attributes */
+struct rte_event_queue_link {
+	uint8_t queue_id;
+	/**< Event queue identifier to select the source queue to link */
+	uint8_t priority;
+	/**< The priority of the event queue for this event port.
+	 * The priority defines the event port's servicing priority for
+	 * this event queue, which may be ignored by an implementation.
+	 * The requested priority should be in the range
+	 * [RTE_EVENT_QUEUE_SERVICE_PRIORITY_HIGHEST,
+	 * RTE_EVENT_QUEUE_SERVICE_PRIORITY_LOWEST].
+	 * The implementation shall normalize the requested priority to
+	 * an implementation-supported priority value.
+	 */
+};
+
+/**
+ * Link multiple source event queues, supplied as *queue_id* in the
+ * *rte_event_queue_link* structures, to the destination event port designated
+ * by its *port_id* on the event device designated by its *dev_id*.
+ *
+ * The link establishment shall enable the event port *port_id* to
+ * receive events from the specified event queue *queue_id*.
+ *
+ * An event queue may link to one or more event ports.
+ * The number of links that can be established from an event queue to event
+ * ports is implementation defined.
+ *
+ * Event queue(s) to event port link establishment can be changed at runtime
+ * without re-configuring the device to support scaling and to reduce the
+ * latency of critical work by establishing the link with more event ports
+ * at runtime.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param port_id
+ *   Event port identifier to select the destination port to link.
+ *
+ * @param link
+ *   An array of *num* *rte_event_queue_link* structures which contain
+ *   the event queue to event port link establishment attributes.
+ *   NULL value is allowed, in which case this function links all the configured
+ *   event queues (*nb_event_queues*, as previously supplied to
+ *   rte_event_dev_configure()) to the event port *port_id* with normal
+ *   servicing priority (RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL).
+ *
+ * @param num
+ *   The number of links to establish
+ *
+ * @return
+ * The number of links actually established on the event device. The return
+ * value can be less than the value of the *num* parameter when the
+ * implementation has a limitation on a specific queue to port link
+ * establishment or if invalid parameters are specified
+ * in a *rte_event_queue_link*.
+ * If the return value is less than *num*, the remaining links at the end of
+ * link[] are not established, and the caller has to take care of them.
+ * If the return value is less than *num* then the implementation shall set
+ * rte_errno accordingly. Possible rte_errno values are:
+ * (-EDQUOT) Quota exceeded (the application tried to link a queue configured
+ *  with RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER to more than one event port)
+ * (-EINVAL) Invalid parameter
+ *
+ */
+extern int
+rte_event_port_link(uint8_t dev_id, uint8_t port_id,
+		    struct rte_event_queue_link link[], int num);
+
+/**
+ * Unlink multiple source event queues supplied in *queues* from the destination
+ * event port designated by its *port_id* on the event device designated
+ * by its *dev_id*.
+ *
+ * The unlink operation shall stop the event port *port_id* from
+ * receiving events from the specified event queues.
+ *
+ * Event queues can be unlinked from an event port at runtime
+ * without re-configuring the device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param port_id
+ *   Event port identifier to select the destination port to unlink.
+ *
+ * @param queues
+ *   An array of *num* event queues to be unlinked from the event port.
+ *   NULL value is allowed, in which case this function unlinks all the
+ *   event queue(s) from the event port *port_id*.
+ *
+ * @param num
+ *   The number of event queues to unlink
+ *
+ * @return
+ * The number of queues actually unlinked on the event device. The return
+ * value can be less than the value of the *num* parameter when the
+ * implementation has a limitation on a specific queue to port unlink
+ * operation or if invalid parameters are specified.
+ * If the return value is less than *num*, the remaining queues at the end of
+ * queues[] are not unlinked, and the caller has to take care of them.
+ * If the return value is less than *num* then the implementation shall set
+ * rte_errno accordingly. Possible rte_errno values are:
+ * (-EINVAL) Invalid parameter
+ *
+ */
+extern int
+rte_event_port_unlink(uint8_t dev_id, uint8_t port_id,
+		    uint8_t queues[], int num);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_EVENTDEV_H_ */
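
The partial-success contract documented above for rte_event_port_link() and
rte_event_port_unlink() can be illustrated with a small stand-alone mock.
This is only a sketch, not a driver: mock_port_link(), mock_port_unlink(),
mock_errno, MOCK_NB_QUEUES and the mock_* arrays are hypothetical names that
model a single event port, and the return values chosen for the NULL case are
assumptions, since the header does not spell them out. The struct and the
priority macros are copied from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the header in this patch. */
#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_HIGHEST  0
#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL   128
#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_LOWEST   255

struct rte_event_queue_link {
	uint8_t queue_id;
	uint8_t priority;
};

/* Everything below is a hypothetical single-port mock, not a real driver. */
#define MOCK_NB_QUEUES 4               /* assumed nb_event_queues */

static int mock_errno;                 /* stand-in for rte_errno */
static uint8_t mock_linked[MOCK_NB_QUEUES];
static uint8_t mock_prio[MOCK_NB_QUEUES];

/* Mock of rte_event_port_link(): returns the number of links established. */
static int
mock_port_link(const struct rte_event_queue_link link[], int num)
{
	int i;

	if (link == NULL) {
		/* NULL: link every configured queue at normal priority. */
		for (i = 0; i < MOCK_NB_QUEUES; i++) {
			mock_linked[i] = 1;
			mock_prio[i] = RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL;
		}
		return MOCK_NB_QUEUES; /* assumption: count of queues linked */
	}

	for (i = 0; i < num; i++) {
		if (link[i].queue_id >= MOCK_NB_QUEUES) {
			mock_errno = -EINVAL; /* sign as written in the header */
			break;                /* link[i..num-1] stay unlinked */
		}
		mock_linked[link[i].queue_id] = 1;
		mock_prio[link[i].queue_id] = link[i].priority;
	}
	return i;
}

/* Mock of rte_event_port_unlink(): returns the number of queues unlinked. */
static int
mock_port_unlink(const uint8_t queues[], int num)
{
	int i, n = 0;

	if (queues == NULL) {
		/* NULL: unlink every queue currently linked to the port. */
		for (i = 0; i < MOCK_NB_QUEUES; i++) {
			n += mock_linked[i];
			mock_linked[i] = 0;
		}
		return n; /* assumption: count of queues that were linked */
	}

	for (i = 0; i < num; i++) {
		if (queues[i] >= MOCK_NB_QUEUES) {
			mock_errno = -EINVAL;
			break;                /* queues[i..num-1] untouched */
		}
		mock_linked[queues[i]] = 0;
	}
	return i;
}
```

A caller that passes three links where the second names an invalid queue would
get 1 back from the mock, find -EINVAL in mock_errno, and know that entries
1 and 2 of link[] still need handling, which mirrors the "remaining links at
the end of link[]" wording in the header.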