[RFC,1/3] eventdev: allow for event devices requiring maintenance

Message ID 20200408175655.18879-1-mattias.ronnblom@ericsson.com (mailing list archive)
State RFC, archived
Delegated to: Jerin Jacob
Series [RFC,1/3] eventdev: allow for event devices requiring maintenance

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Mattias Rönnblom April 8, 2020, 5:56 p.m. UTC
  Extend Eventdev API to allow for event devices which require various
forms of internal processing to happen, even when events are not
enqueued to or dequeued from a port.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
 lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
 2 files changed, 79 insertions(+)
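
Only fragments of the patch body are quoted in the discussion below. As a rough sketch, the addition centers on a capability flag plus a maintenance call along these lines (the flag is taken from the quoted diff; the prototype is a plausible reconstruction, not the verbatim patch):

#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
/* New capability: the device needs periodic maintenance calls. */

/*
 * Plausible shape of the proposed call: give a port that is currently
 * neither enqueued to nor dequeued from a chance to run its internal
 * processing.
 */
void
rte_event_maintain(uint8_t dev_id, uint8_t port_id);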
  

Comments

Jerin Jacob April 8, 2020, 7:36 p.m. UTC | #1
On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> Extend Eventdev API to allow for event devices which require various
> forms of internal processing to happen, even when events are not
> enqueued to or dequeued from a port.
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
>  lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
>  lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
>  2 files changed, 79 insertions(+)
>
> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
> index 226f352ad..d69150792 100644
> --- a/lib/librte_eventdev/rte_eventdev.h
> +++ b/lib/librte_eventdev/rte_eventdev.h
> @@ -289,6 +289,15 @@ struct rte_event;
>   * single queue to each port or map a single queue to many port.
>   */
>
> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
> +/**< Event device requires calls to rte_event_maintain() during

This scheme would call for DSW-specific API handling in the fastpath.

> + * periods when neither rte_event_dequeue_burst() nor

The typical worker thread will be
while (1) {
                rte_event_dequeue_burst();
                 ..process..
                rte_event_enqueue_burst();
}
If so, why can't the DSW driver do the maintenance in the driver context, in the
dequeue() call?
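
For reference, a filled-in version of the loop above (a sketch: BURST_SIZE and process_event() are illustrative application-level names, not part of the eventdev API):

#define BURST_SIZE 32

struct rte_event ev[BURST_SIZE];

for (;;) {
        uint16_t i, n, sent;

        n = rte_event_dequeue_burst(dev_id, port_id, ev, BURST_SIZE, 0);

        for (i = 0; i < n; i++)
                process_event(&ev[i]); /* application-specific work */

        /* Re-enqueue, retrying until all events are accepted. */
        for (sent = 0; sent < n; )
                sent += rte_event_enqueue_burst(dev_id, port_id,
                                                ev + sent, n - sent);
}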


> + * rte_event_enqueue_burst() are called on a port. This will allow the
> + * event device to perform internal processing, such as flushing
> + * buffered events, return credits to a global pool, or process
> + * signaling related to load balancing.
> + */
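
Under the proposed scheme, an application would discover the requirement through the device capabilities. A minimal sketch, assuming the RFC's names:

struct rte_event_dev_info info;
int needs_maint;

rte_event_dev_info_get(dev_id, &info);
needs_maint = (info.event_dev_cap & RTE_EVENT_DEV_CAP_REQUIRES_MAINT) != 0;

/* Only a device that sets the flag obliges the application to call
 * rte_event_maintain() on otherwise-idle ports. */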
  
Mattias Rönnblom April 9, 2020, 12:21 p.m. UTC | #2
On 2020-04-08 21:36, Jerin Jacob wrote:
> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>> Extend Eventdev API to allow for event devices which require various
>> forms of internal processing to happen, even when events are not
>> enqueued to or dequeued from a port.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
>>   lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
>>   lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
>>   2 files changed, 79 insertions(+)
>>
>> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
>> index 226f352ad..d69150792 100644
>> --- a/lib/librte_eventdev/rte_eventdev.h
>> +++ b/lib/librte_eventdev/rte_eventdev.h
>> @@ -289,6 +289,15 @@ struct rte_event;
>>    * single queue to each port or map a single queue to many port.
>>    */
>>
>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
>> +/**< Event device requires calls to rte_event_maintain() during
> This scheme would call for DSW specific API handling in fastpath.


Initially this would be so, but buffering events might yield performance 
benefits for more event devices than DSW.


In an application, it's often convenient, but sub-optimal from a 
performance point of view, to do single-event enqueue operations. The 
alternative is to use an application-level buffer, and then flush this 
buffer with rte_event_enqueue_burst(). If you allow the event device to 
buffer, you get the simplicity of single-event enqueue operations, but 
without taking any noticeable performance hit.
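
The application-level alternative described above could look like this sketch (APP_BURST, struct app_buffer and app_enqueue() are illustrative names):

#define APP_BURST 32

struct app_buffer {
        struct rte_event evs[APP_BURST];
        uint16_t count;
};

/* Convenient single-event enqueue on top of an application-level buffer. */
static void
app_enqueue(uint8_t dev_id, uint8_t port_id, struct app_buffer *buf,
            const struct rte_event *ev)
{
        buf->evs[buf->count++] = *ev;

        if (buf->count == APP_BURST) {
                uint16_t sent = 0;

                /* Flush, retrying until the whole buffer is accepted. */
                while (sent < buf->count)
                        sent += rte_event_enqueue_burst(dev_id, port_id,
                                                        buf->evs + sent,
                                                        buf->count - sent);
                buf->count = 0;
        }
}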


>> + * periods when neither rte_event_dequeue_burst() nor
> The typical worker thread will be
> while (1) {
>                  rte_event_dequeue_burst();
>                   ..proess..
>                  rte_event_enqueue_burst();
> }
> If so, Why DSW driver can't do the maintenance in driver context in
> dequeue() call.
>

DSW already does maintenance on dequeue, and works well in the above 
scenario. The typical worker does not need to care about the 
rte_event_maintain() functions, since it dequeues events on a regular basis.


What this RFC addresses is the more atypical (but still fairly common) 
case of a port that is neither dequeued from nor enqueued to on a regular 
basis. The timer and Ethernet Rx adapters are examples of such ports.


>> + * rte_event_enqueue_burst() are called on a port. This will allow the
>> + * event device to perform internal processing, such as flushing
>> + * buffered events, return credits to a global pool, or process
>> + * signaling related to load balancing.
>> + */
  
Jerin Jacob April 9, 2020, 1:32 p.m. UTC | #3
On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> On 2020-04-08 21:36, Jerin Jacob wrote:
> > On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >> Extend Eventdev API to allow for event devices which require various
> >> forms of internal processing to happen, even when events are not
> >> enqueued to or dequeued from a port.
> >>
> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >> ---
> >>   lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
> >>   lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
> >>   2 files changed, 79 insertions(+)
> >>
> >> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
> >> index 226f352ad..d69150792 100644
> >> --- a/lib/librte_eventdev/rte_eventdev.h
> >> +++ b/lib/librte_eventdev/rte_eventdev.h
> >> @@ -289,6 +289,15 @@ struct rte_event;
> >>    * single queue to each port or map a single queue to many port.
> >>    */
> >>
> >> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
> >> +/**< Event device requires calls to rte_event_maintain() during
> > This scheme would call for DSW specific API handling in fastpath.
>
>
> Initially this would be so, but buffering events might yield performance
> benefits for more event devices than DSW.
>
>
> In an application, it's often convenient, but sub-optimal from a
> performance point of view, to do single-event enqueue operations. The
> alternative is to use an application-level buffer, and the flush this
> buffer with rte_event_enqueue_burst(). If you allow the event device to
> buffer, you get the simplicity of single-event enqueue operations, but
> without taking any noticeable performance hit.

IMO, it is better to aggregate the burst in the application, as sending
events one by one to the driver to aggregate has a performance cost due to
the function pointer overhead.

Another concern is the frequency at which the application must call the
rte_event_maintain() function, as the timing requirements will vary from
driver to driver and from application to application.
IMO, it is not portable, and I believe the application should not be
aware of those details. If the driver needs a specific maintenance
function for any other reason, then it is better to use the DPDK SERVICE core infra.
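
For reference, the SERVICE core approach suggested here would look roughly like the sketch below (the callback body is left hypothetical; as Mattias notes further down, DSW's per-port data structures are not MT-safe, which is what rules this out for DSW):

#include <stdio.h>
#include <rte_common.h>
#include <rte_service_component.h>

static int32_t
evdev_maintain_cb(void *args)
{
        uint8_t dev_id = *(uint8_t *)args;

        /* Hypothetical: run the device's background work here. */
        RTE_SET_USED(dev_id);
        return 0;
}

static void
register_maintain_service(uint8_t *dev_id)
{
        struct rte_service_spec spec = {
                .callback = evdev_maintain_cb,
                .callback_userdata = dev_id,
        };
        uint32_t id;

        snprintf(spec.name, sizeof(spec.name), "evdev_maintain");
        rte_service_component_register(&spec, &id);
        rte_service_component_runstate_set(id, 1);
}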

>
>
> >> + * periods when neither rte_event_dequeue_burst() nor
> > The typical worker thread will be
> > while (1) {
> >                  rte_event_dequeue_burst();
> >                   ..proess..
> >                  rte_event_enqueue_burst();
> > }
> > If so, Why DSW driver can't do the maintenance in driver context in
> > dequeue() call.
> >
>
> DSW already does maintenance on dequeue, and works well in the above
> scenario. The typical worker does not need to care about the
> rte_event_maintain() functions, since it dequeues events on a regular basis.
>
>
> What this RFC addresses is the more atypical (but still fairly common)
> case of a port being neither dequeued to or enqueued from on a regular
> basis. The timer and ethernet rx adapters are examples of such.

If it is an adapter-specific use case problem, then maybe we have
an option to fix the problem in the adapter-specific API usage, or in that area.


>
>
> >> + * rte_event_enqueue_burst() are called on a port. This will allow the
> >> + * event device to perform internal processing, such as flushing
> >> + * buffered events, return credits to a global pool, or process
> >> + * signaling related to load balancing.
> >> + */
>
>
  
Eads, Gage April 9, 2020, 1:33 p.m. UTC | #4
> >> diff --git a/lib/librte_eventdev/rte_eventdev.h
> >> b/lib/librte_eventdev/rte_eventdev.h
> >> index 226f352ad..d69150792 100644
> >> --- a/lib/librte_eventdev/rte_eventdev.h
> >> +++ b/lib/librte_eventdev/rte_eventdev.h
> >> @@ -289,6 +289,15 @@ struct rte_event;
> >>    * single queue to each port or map a single queue to many port.
> >>    */
> >>
> >> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9) /**<
> Event
> >> +device requires calls to rte_event_maintain() during
> > This scheme would call for DSW specific API handling in fastpath.
> 
> 
> Initially this would be so, but buffering events might yield performance
> benefits for more event devices than DSW.
> 

I agree. For applications that process and enqueue one event at a time, buffering in the PMD could give a performance boost with minimal code changes (assuming the application can tolerate higher packet latency caused by buffering).

> 
> In an application, it's often convenient, but sub-optimal from a
> performance point of view, to do single-event enqueue operations. The
> alternative is to use an application-level buffer, and the flush this
> buffer with rte_event_enqueue_burst(). If you allow the event device to
> buffer, you get the simplicity of single-event enqueue operations, but
> without taking any noticeable performance hit.
> 
> 
> >> + * periods when neither rte_event_dequeue_burst() nor
> > The typical worker thread will be
> > while (1) {
> >                  rte_event_dequeue_burst();
> >                   ..proess..
> >                  rte_event_enqueue_burst();
> > }
> > If so, Why DSW driver can't do the maintenance in driver context in
> > dequeue() call.
> >
> 
> DSW already does maintenance on dequeue, and works well in the above
> scenario. The typical worker does not need to care about the
> rte_event_maintain() functions, since it dequeues events on a regular basis.
> 
> 
> What this RFC addresses is the more atypical (but still fairly common)
> case of a port being neither dequeued to or enqueued from on a regular
> basis. The timer and ethernet rx adapters are examples of such.
> 

Those two adapters have application-level buffering already, so adding PMD-level buffering feels unnecessary. Could DSW support this behavior on a port-by-port basis?

If so, I'm picturing something like:
- Add a "PMD buffering" eventdev capability
- If an eventdev has that capability, its ports can be configured for PMD-level buffering (default: no buffering)
-- Convert "uint8_t disable_implicit_release" to a flags bitmap (e.g. "uint8_t event_port_cfg"), with one flag for implicit release disable and another for PMD-level buffering
-- I suspect we can maintain ABI compatibility with function versioning on rte_event_port_setup() and rte_event_port_default_conf_get(), and this flags bitmap could be extended out to 32 bits in 20.11.
- Add "flush" semantics either to a new interface or extend an existing one. I'm partial to a new interface, to avoid an additional check in e.g. the dequeue code. And putting the flush in dequeue doesn't allow an app to batch across multiple iterations of the dequeue-process-enqueue loop.
- Extend rte_event_port_attr_get() to allow users to query this new setting. Adapters that don't call the flush function could error out if the adapter's port is configured for PMD-level buffering.
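
A hypothetical rendering of the flags-bitmap idea above (the flag and field names are invented for illustration; at the time of this thread, struct rte_event_port_conf only has the disable_implicit_release field):

/* Hypothetical flags for a "uint8_t event_port_cfg" bitmap. */
#define RTE_EVENT_PORT_CFG_DISABLE_IMPL_REL (1 << 0)
#define RTE_EVENT_PORT_CFG_PMD_BUFFERING    (1 << 1)

struct rte_event_port_conf conf;

rte_event_port_default_conf_get(dev_id, port_id, &conf);
conf.event_port_cfg |= RTE_EVENT_PORT_CFG_PMD_BUFFERING; /* hypothetical field */
rte_event_port_setup(dev_id, port_id, &conf);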

(eventdev should also forbid "PMD-level buffering" and "implicit release" used together...it's easy to imagine double-release errors occurring otherwise.)

I think this accomplishes Mattias' objective, and there's no effect on existing apps or adapters unless they choose to enable this behavior.

Granted, existing apps would likely see performance loss with dsw until they enable this config option. But perhaps it's worth it to get this behavior properly supported in the interface.

Thanks,
Gage
  
Mattias Rönnblom April 9, 2020, 2:02 p.m. UTC | #5
On 2020-04-09 15:32, Jerin Jacob wrote:
> On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>> On 2020-04-08 21:36, Jerin Jacob wrote:
>>> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>> Extend Eventdev API to allow for event devices which require various
>>>> forms of internal processing to happen, even when events are not
>>>> enqueued to or dequeued from a port.
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>> ---
>>>>    lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
>>>>    lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
>>>>    2 files changed, 79 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
>>>> index 226f352ad..d69150792 100644
>>>> --- a/lib/librte_eventdev/rte_eventdev.h
>>>> +++ b/lib/librte_eventdev/rte_eventdev.h
>>>> @@ -289,6 +289,15 @@ struct rte_event;
>>>>     * single queue to each port or map a single queue to many port.
>>>>     */
>>>>
>>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
>>>> +/**< Event device requires calls to rte_event_maintain() during
>>> This scheme would call for DSW specific API handling in fastpath.
>>
>> Initially this would be so, but buffering events might yield performance
>> benefits for more event devices than DSW.
>>
>>
>> In an application, it's often convenient, but sub-optimal from a
>> performance point of view, to do single-event enqueue operations. The
>> alternative is to use an application-level buffer, and the flush this
>> buffer with rte_event_enqueue_burst(). If you allow the event device to
>> buffer, you get the simplicity of single-event enqueue operations, but
>> without taking any noticeable performance hit.
> IMO, It is better to aggregate the burst by the application,  as sending
> event by event to the driver to aggregate has performance due to cost
> function pointer overhead.


That's a very slight overhead - but for optimal performance, sure. It'll 
come at a cost in terms of code complexity. Just look at the adapters. 
They do this already. I think some applications are ready to take the 
extra 5-10 clock cycles or so it'll cost them to do the function call 
(provided the event device has buffering support).


> Another concern is the frequency of calling rte_event_maintain() function by
> the application, as the timing requirements will vary differently by
> the driver to driver and application to application.
> IMO, It is not portable and I believe the application should not be
> aware of those details. If the driver needs specific maintenance
> function for any other reason then better to use DPDK SERVICE core infra.


The only thing the application needs to be aware of is that it needs to 
call rte_event_maintain() as often as it would have called dequeue() in 
your "typical worker" example. Making sure this call is cheap enough is 
up to the driver, and this needs to hold true for all event devices that 
need maintenance.


If you plan to use a non-buffering hardware device driver or a soft, 
centralized scheduler that doesn't need this, it will also not set the 
flag, and thus the application need not care about the 
rte_event_maintain() function. DPDK code such as the eventdev adapters 
do need to care, but the increase in complexity is slight, and the cost 
of calling rte_event_maintain() on a maintenance-free device is very 
low (since the then-NULL function pointer is in the eventdev struct, 
likely on a cache-line already dragged in).
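
A sketch of why the no-op path can stay cheap, assuming an RFC-style maintain function pointer in the device struct (the maintain field is hypothetical):

static inline void
rte_event_maintain(uint8_t dev_id, uint8_t port_id)
{
        struct rte_eventdev *dev = &rte_eventdevs[dev_id];

        /*
         * Maintenance-free drivers leave the pointer NULL; the call then
         * reduces to a load from an already-hot cache line plus a
         * well-predicted branch.
         */
        if (dev->maintain != NULL)
                dev->maintain(dev->data->ports[port_id]);
}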


Unfortunately, DPDK doesn't have a per-core delayed-work mechanism. 
Flushing event buffers (and other DSW "background work") can't be done 
on a service core, since they would work on non-MT-safe data structures 
on the worker thread's event ports.


>>
>>>> + * periods when neither rte_event_dequeue_burst() nor
>>> The typical worker thread will be
>>> while (1) {
>>>                   rte_event_dequeue_burst();
>>>                    ..proess..
>>>                   rte_event_enqueue_burst();
>>> }
>>> If so, Why DSW driver can't do the maintenance in driver context in
>>> dequeue() call.
>>>
>> DSW already does maintenance on dequeue, and works well in the above
>> scenario. The typical worker does not need to care about the
>> rte_event_maintain() functions, since it dequeues events on a regular basis.
>>
>>
>> What this RFC addresses is the more atypical (but still fairly common)
>> case of a port being neither dequeued to or enqueued from on a regular
>> basis. The timer and ethernet rx adapters are examples of such.
> If it is an Adapter specific use case problem then maybe, we have
> an option to fix the problem in adapter specific API usage or in that area.
>

It's not adapter-specific, I think. There might be producer-only ports, 
for example, which don't provide a constant stream of events, but 
rather intermittent bursts. A traffic generator is one example of such 
an application, and there might be other, less synthetic ones as well.
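
A sketch of such a producer-only port, with the RFC's maintain call covering the idle stretches (BURST_SIZE and generate_events() are illustrative application-level names; enqueue return value handling is omitted):

#define BURST_SIZE 32

struct rte_event events[BURST_SIZE];

for (;;) {
        uint16_t n = generate_events(events, BURST_SIZE);

        if (n > 0)
                rte_event_enqueue_burst(dev_id, port_id, events, n);
        else
                /* Keep the otherwise-idle port serviced. */
                rte_event_maintain(dev_id, port_id);
}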


>>
>>>> + * rte_event_enqueue_burst() are called on a port. This will allow the
>>>> + * event device to perform internal processing, such as flushing
>>>> + * buffered events, return credits to a global pool, or process
>>>> + * signaling related to load balancing.
>>>> + */
>>
  
Mattias Rönnblom April 9, 2020, 2:14 p.m. UTC | #6
On 2020-04-09 15:33, Eads, Gage wrote:
>>>> diff --git a/lib/librte_eventdev/rte_eventdev.h
>>>> b/lib/librte_eventdev/rte_eventdev.h
>>>> index 226f352ad..d69150792 100644
>>>> --- a/lib/librte_eventdev/rte_eventdev.h
>>>> +++ b/lib/librte_eventdev/rte_eventdev.h
>>>> @@ -289,6 +289,15 @@ struct rte_event;
>>>>     * single queue to each port or map a single queue to many port.
>>>>     */
>>>>
>>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9) /**<
>> Event
>>>> +device requires calls to rte_event_maintain() during
>>> This scheme would call for DSW specific API handling in fastpath.
>>
>> Initially this would be so, but buffering events might yield performance
>> benefits for more event devices than DSW.
>>
> I agree. For applications that process and enqueue one event at a time, buffering in the PMD could give a performance boost with minimal code changes (assuming the application can tolerate higher packet latency caused by buffering).
>
>> In an application, it's often convenient, but sub-optimal from a
>> performance point of view, to do single-event enqueue operations. The
>> alternative is to use an application-level buffer, and the flush this
>> buffer with rte_event_enqueue_burst(). If you allow the event device to
>> buffer, you get the simplicity of single-event enqueue operations, but
>> without taking any noticeable performance hit.
>>
>>
>>>> + * periods when neither rte_event_dequeue_burst() nor
>>> The typical worker thread will be
>>> while (1) {
>>>                   rte_event_dequeue_burst();
>>>                    ..proess..
>>>                   rte_event_enqueue_burst();
>>> }
>>> If so, Why DSW driver can't do the maintenance in driver context in
>>> dequeue() call.
>>>
>> DSW already does maintenance on dequeue, and works well in the above
>> scenario. The typical worker does not need to care about the
>> rte_event_maintain() functions, since it dequeues events on a regular basis.
>>
>>
>> What this RFC addresses is the more atypical (but still fairly common)
>> case of a port being neither dequeued to or enqueued from on a regular
>> basis. The timer and ethernet rx adapters are examples of such.
>>
> Those two adapters have application-level buffering already, so adding PMD-level buffering feels unnecessary. Could DSW support this behavior on a port-by-port basis?


Flushing event buffers is just one of DSW's "background tasks". It also 
updates the load estimate, so that other ports, considering migrating 
flows to this port, have recent-enough data to work with. In addition, 
DSW periodically considers flow migration (i.e. load balancing), which 
includes signaling between the ports. Even idle ports need to respond 
to these signals, thus they need to be "maintained". While buffering can 
be made optional, the rest of the above can't.


DPDK eventdev seems to aspire to allow distributed scheduler 
implementations, considering it has had the 
RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED flag for a long time. I've put some 
thought into this, and I have yet to find a solution which can avoid 
this kind of "background task" for an efficient atomic scheduler with 
dynamic load balancing.


> If so, I'm picturing something like:
> - Add a "PMD buffering" eventdev capability
> - If an eventdev has that capability, its ports can be configured for PMD-level buffering (default: no buffering)
> -- Convert " uint8_t disable_implicit_release" to a flags bitmap (e.g. "uint8_t event_port_cfg"), with one flag for implicit release disable and another for PMD-level buffering
> -- I suspect we can maintain ABI compatibility with function versioning on rte_event_port_setup() and rte_event_port_default_conf_get(), and this flags bitmap could be extended out to 32 bits in 20.11.
> - Add "flush" semantics either to a new interface or extend an existing one. I'm partial to a new interface, to avoid an additional check in e.g. the dequeue code. And putting the flush in dequeue doesn't allow an app to batch across multiple iterations of the dequeue-process-enqueue loop.
> - Extend rte_event_port_attr_get() to allow users to query this new setting. Adapters that don't call the flush function could error out if the adapter's port is configured for PMD-level buffering.
>
> (eventdev should also forbid "PMD-level buffering" and "implicit release" used together...it's easy to imagine double-release errors occurring otherwise.)
> I think this accomplishes Mattias' objective, and there's no effect on existing apps or adapters unless they choose to enable this behavior.
>
> Granted, existing apps would likely see performance loss with dsw until they enable this config option. But perhaps it's worth it to get this behavior properly supported in the interface.
>
> Thanks,
> Gage
  
Jerin Jacob April 10, 2020, 1 p.m. UTC | #7
On Thu, Apr 9, 2020 at 7:32 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> On 2020-04-09 15:32, Jerin Jacob wrote:
> > On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >> On 2020-04-08 21:36, Jerin Jacob wrote:
> >>> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
> >>> <mattias.ronnblom@ericsson.com> wrote:
> >>>> Extend Eventdev API to allow for event devices which require various
> >>>> forms of internal processing to happen, even when events are not
> >>>> enqueued to or dequeued from a port.
> >>>>
> >>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>> ---
> >>>>    lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
> >>>>    lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
> >>>>    2 files changed, 79 insertions(+)
> >>>>
> >>>> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
> >>>> index 226f352ad..d69150792 100644
> >>>> --- a/lib/librte_eventdev/rte_eventdev.h
> >>>> +++ b/lib/librte_eventdev/rte_eventdev.h
> >>>> @@ -289,6 +289,15 @@ struct rte_event;
> >>>>     * single queue to each port or map a single queue to many port.
> >>>>     */
> >>>>
> >>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
> >>>> +/**< Event device requires calls to rte_event_maintain() during
> >>> This scheme would call for DSW specific API handling in fastpath.
> >>
> >> Initially this would be so, but buffering events might yield performance
> >> benefits for more event devices than DSW.
> >>
> >>
> >> In an application, it's often convenient, but sub-optimal from a
> >> performance point of view, to do single-event enqueue operations. The
> >> alternative is to use an application-level buffer, and the flush this
> >> buffer with rte_event_enqueue_burst(). If you allow the event device to
> >> buffer, you get the simplicity of single-event enqueue operations, but
> >> without taking any noticeable performance hit.
> > IMO, It is better to aggregate the burst by the application,  as sending
> > event by event to the driver to aggregate has performance due to cost
> > function pointer overhead.
>
>
> That's a very slight overhead - but for optimal performance, sure. It'll
> come at a cost in terms of code complexity. Just look at the adapters.
> They do this already. I think some applications are ready to take the
> extra 5-10 clock cycles or so it'll cost them to do the function call
> (provided the event device had buffering support).

So is there any advantage in moving the aggregation logic to the PMD? It is costly.

>
>
> > Another concern is the frequency of calling rte_event_maintain() function by
> > the application, as the timing requirements will vary differently by
> > the driver to driver and application to application.
> > IMO, It is not portable and I believe the application should not be
> > aware of those details. If the driver needs specific maintenance
> > function for any other reason then better to use DPDK SERVICE core infra.
>
>
> The only thing the application needs to be aware of, is that it needs to
> call rte_event_maintain() as often as it would have called dequeue() in
> your "typical worker" example. To make sure this call is cheap-enough is
> up to the driver, and this needs to hold true for all event devices that
> needs maintenance.

Why can't rte_event_maintain()'s work be done in either dequeue() or enqueue(),
in the driver context? Either one of them has to be called
periodically by the application
in any case.


>
>
> If you plan to use a non-buffering hardware device driver or a soft,
> centralized scheduler that doesn't need this, it will also not set the
> flag, and thus the application needs not care about the
> rte_event_maintain() function. DPDK code such as the eventdev adapters
> do need to care, but the increase in complexity is slight, and the cost
> of calling rte_maintain_event() on a maintenance-free devices is very
> low (since the then-NULL function pointer is in the eventdev struct,
> likely on a cache-line already dragged in).
>
>
> Unfortunately, DPDK doesn't have a per-core delayed-work mechanism.
> Flushing event buffers (and other DSW "background work") can't be done
> on a service core, since they would work on non-MT-safe data structures
> on the worker thread's event ports.

Yes. Otherwise, DSW needs to be updated to be MT safe.

>
>
> >>
> >>>> + * periods when neither rte_event_dequeue_burst() nor
> >>> The typical worker thread will be
> >>> while (1) {
> >>>                   rte_event_dequeue_burst();
> >>>                    ..proess..
> >>>                   rte_event_enqueue_burst();
> >>> }
> >>> If so, Why DSW driver can't do the maintenance in driver context in
> >>> dequeue() call.
> >>>
> >> DSW already does maintenance on dequeue, and works well in the above
> >> scenario. The typical worker does not need to care about the
> >> rte_event_maintain() functions, since it dequeues events on a regular basis.
> >>
> >>
> >> What this RFC addresses is the more atypical (but still fairly common)
> >> case of a port being neither dequeued to or enqueued from on a regular
> >> basis. The timer and ethernet rx adapters are examples of such.
> > If it is an Adapter specific use case problem then maybe, we have
> > an option to fix the problem in adapter specific API usage or in that area.
> >
>
> It's not adapter specific, I think. There might be producer-only ports,
> for example, which doesn't provide a constant stream of events, but
> rather intermittent bursts. A traffic generator is one example of such
> an application, and there might be other, less synthetic ones as well.

In that case, the application knows the purpose of the eventdev port.
Would changing the eventdev spec to allow configuring a "port" or "queue" for
that use case help? Meaning, DSW or
any driver could get the hint and change the fastpath function pointers
accordingly.
For instance, do maintenance on enqueue() for such ports, as sketched below.
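
A hypothetical illustration of such a hint (nothing like this existed in the spec at the time; the field and flag names are invented):

#define RTE_EVENT_PORT_HINT_PRODUCER (1 << 0) /* hypothetical */

struct rte_event_port_conf conf;

rte_event_port_default_conf_get(dev_id, port_id, &conf);
conf.hints = RTE_EVENT_PORT_HINT_PRODUCER; /* hypothetical field */
rte_event_port_setup(dev_id, port_id, &conf);

/* A driver seeing the hint could install an enqueue() variant that also
 * performs maintenance for this port. */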


>
>
> >>
> >>>> + * rte_event_enqueue_burst() are called on a port. This will allow the
> >>>> + * event device to perform internal processing, such as flushing
> >>>> + * buffered events, return credits to a global pool, or process
> >>>> + * signaling related to load balancing.
> >>>> + */
> >>
>
  
Mattias Rönnblom April 14, 2020, 3:57 p.m. UTC | #8
On 2020-04-10 15:00, Jerin Jacob wrote:
> On Thu, Apr 9, 2020 at 7:32 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>> On 2020-04-09 15:32, Jerin Jacob wrote:
>>> On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>> On 2020-04-08 21:36, Jerin Jacob wrote:
>>>>> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
>>>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>>> Extend Eventdev API to allow for event devices which require various
>>>>>> forms of internal processing to happen, even when events are not
>>>>>> enqueued to or dequeued from a port.
>>>>>>
>>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>>>> ---
>>>>>>     lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
>>>>>>     lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
>>>>>>     2 files changed, 79 insertions(+)
>>>>>>
>>>>>> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
>>>>>> index 226f352ad..d69150792 100644
>>>>>> --- a/lib/librte_eventdev/rte_eventdev.h
>>>>>> +++ b/lib/librte_eventdev/rte_eventdev.h
>>>>>> @@ -289,6 +289,15 @@ struct rte_event;
>>>>>>      * single queue to each port or map a single queue to many port.
>>>>>>      */
>>>>>>
>>>>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
>>>>>> +/**< Event device requires calls to rte_event_maintain() during
>>>>> This scheme would call for DSW specific API handling in fastpath.
>>>> Initially this would be so, but buffering events might yield performance
>>>> benefits for more event devices than DSW.
>>>>
>>>>
>>>> In an application, it's often convenient, but sub-optimal from a
>>>> performance point of view, to do single-event enqueue operations. The
>>>> alternative is to use an application-level buffer, and the flush this
>>>> buffer with rte_event_enqueue_burst(). If you allow the event device to
>>>> buffer, you get the simplicity of single-event enqueue operations, but
>>>> without taking any noticeable performance hit.
>>> IMO, It is better to aggregate the burst by the application,  as sending
>>> event by event to the driver to aggregate has performance due to cost
>>> function pointer overhead.
>>
>> That's a very slight overhead - but for optimal performance, sure. It'll
>> come at a cost in terms of code complexity. Just look at the adapters.
>> They do this already. I think some applications are ready to take the
>> extra 5-10 clock cycles or so it'll cost them to do the function call
>> (provided the event device had buffering support).
> So Is there any advantage of moving aggregation logic to PMD? it is costly.


What do you mean by aggregation logic?


>
>>
>>> Another concern is the frequency of calling rte_event_maintain() function by
>>> the application, as the timing requirements will vary differently by
>>> the driver to driver and application to application.
>>> IMO, It is not portable and I believe the application should not be
>>> aware of those details. If the driver needs specific maintenance
>>> function for any other reason then better to use DPDK SERVICE core infra.
>>
>> The only thing the application needs to be aware of, is that it needs to
>> call rte_event_maintain() as often as it would have called dequeue() in
>> your "typical worker" example. To make sure this call is cheap-enough is
>> up to the driver, and this needs to hold true for all event devices that
>> needs maintenance.
> Why not rte_event_maintain() can't do either in dequeue() or enqueue()
> in the driver context? Either one of them has to be called
> periodically by application
> in any case?


No, producer-only ports can go idle for long times. Applications 
that don't "go idle" need not worry about the maintain function.


>
>>
>> If you plan to use a non-buffering hardware device driver or a soft,
>> centralized scheduler that doesn't need this, it will also not set the
>> flag, and thus the application needs not care about the
>> rte_event_maintain() function. DPDK code such as the eventdev adapters
>> do need to care, but the increase in complexity is slight, and the cost
>> of calling rte_maintain_event() on a maintenance-free devices is very
>> low (since the then-NULL function pointer is in the eventdev struct,
>> likely on a cache-line already dragged in).
>>
>>
>> Unfortunately, DPDK doesn't have a per-core delayed-work mechanism.
>> Flushing event buffers (and other DSW "background work") can't be done
>> on a service core, since they would work on non-MT-safe data structures
>> on the worker thread's event ports.
> Yes. Otherwise, DSW needs to update to support MT safe.


I haven't been looking at this in detail, but I suspect it will be both 
complex and not very performant. One of the problems that needs to be solved 
in such a solution is the "pausing" of flows during migration. All 
participating lcores need to ACK that a flow is paused.


>
>>
>>>>>> + * periods when neither rte_event_dequeue_burst() nor
>>>>> The typical worker thread will be
>>>>> while (1) {
>>>>>                    rte_event_dequeue_burst();
>>>>>                     ..proess..
>>>>>                    rte_event_enqueue_burst();
>>>>> }
>>>>> If so, Why DSW driver can't do the maintenance in driver context in
>>>>> dequeue() call.
>>>>>
>>>> DSW already does maintenance on dequeue, and works well in the above
>>>> scenario. The typical worker does not need to care about the
>>>> rte_event_maintain() functions, since it dequeues events on a regular basis.
>>>>
>>>>
>>>> What this RFC addresses is the more atypical (but still fairly common)
>>>> case of a port being neither dequeued to or enqueued from on a regular
>>>> basis. The timer and ethernet rx adapters are examples of such.
>>> If it is an Adapter specific use case problem then maybe, we have
>>> an option to fix the problem in adapter specific API usage or in that area.
>>>
>> It's not adapter specific, I think. There might be producer-only ports,
>> for example, which doesn't provide a constant stream of events, but
>> rather intermittent bursts. A traffic generator is one example of such
>> an application, and there might be other, less synthetic ones as well.
> In that case, the application knows the purpose of the eventdev port.
> Is changing eventdev spec to configure "port" or "queue" for that use
> case help? Meaning, DSW or
> Any driver can get the hint and change the function pointers
> accordingly for fastpath.
> For instance, do maintenance on enqueue() for such ports or so.


This is what DSW does already today. A dequeue() call with a zero-length 
event array serves the purpose of rte_event_maintain(). It's a bit of a 
hack, in my opinion.
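
The workaround amounts to the following (a sketch; it relies on behavior the API specification does not really define):

struct rte_event dummy;

/* Zero-length dequeue, used purely for its maintenance side effect. */
rte_event_dequeue_burst(dev_id, port_id, &dummy, 0, 0);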


>
>>
>>>>>> + * rte_event_enqueue_burst() are called on a port. This will allow the
>>>>>> + * event device to perform internal processing, such as flushing
>>>>>> + * buffered events, return credits to a global pool, or process
>>>>>> + * signaling related to load balancing.
>>>>>> + */
  
Jerin Jacob April 14, 2020, 4:15 p.m. UTC | #9
On Tue, Apr 14, 2020 at 9:27 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> On 2020-04-10 15:00, Jerin Jacob wrote:
> > On Thu, Apr 9, 2020 at 7:32 PM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >> On 2020-04-09 15:32, Jerin Jacob wrote:
> >>> On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom
> >>> <mattias.ronnblom@ericsson.com> wrote:
> >>>> On 2020-04-08 21:36, Jerin Jacob wrote:
> >>>>> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
> >>>>> <mattias.ronnblom@ericsson.com> wrote:
> >>>>>> Extend Eventdev API to allow for event devices which require various
> >>>>>> forms of internal processing to happen, even when events are not
> >>>>>> enqueued to or dequeued from a port.
> >>>>>>
> >>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>>>> ---
> >>>>>>     lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
> >>>>>>     lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
> >>>>>>     2 files changed, 79 insertions(+)
> >>>>>>
> >>>>>> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
> >>>>>> index 226f352ad..d69150792 100644
> >>>>>> --- a/lib/librte_eventdev/rte_eventdev.h
> >>>>>> +++ b/lib/librte_eventdev/rte_eventdev.h
> >>>>>> @@ -289,6 +289,15 @@ struct rte_event;
> >>>>>>      * single queue to each port or map a single queue to many port.
> >>>>>>      */
> >>>>>>
> >>>>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
> >>>>>> +/**< Event device requires calls to rte_event_maintain() during
> >>>>> This scheme would call for DSW specific API handling in fastpath.
> >>>> Initially this would be so, but buffering events might yield performance
> >>>> benefits for more event devices than DSW.
> >>>>
> >>>>
> >>>> In an application, it's often convenient, but sub-optimal from a
> >>>> performance point of view, to do single-event enqueue operations. The
> >>>> alternative is to use an application-level buffer, and the flush this
> >>>> buffer with rte_event_enqueue_burst(). If you allow the event device to
> >>>> buffer, you get the simplicity of single-event enqueue operations, but
> >>>> without taking any noticeable performance hit.
> >>> IMO, It is better to aggregate the burst by the application,  as sending
> >>> event by event to the driver to aggregate has performance due to cost
> >>> function pointer overhead.
> >>
> >> That's a very slight overhead - but for optimal performance, sure. It'll
> >> come at a cost in terms of code complexity. Just look at the adapters.
> >> They do this already. I think some applications are ready to take the
> >> extra 5-10 clock cycles or so it'll cost them to do the function call
> >> (provided the event device had buffering support).
> > So Is there any advantage of moving aggregation logic to PMD? it is costly.
>
>
> What do you mean by aggregation logic?

aggregation == buffering.

>
>
> >
> >>
> >>> Another concern is the frequency of calling rte_event_maintain() function by
> >>> the application, as the timing requirements will vary differently by
> >>> the driver to driver and application to application.
> >>> IMO, It is not portable and I believe the application should not be
> >>> aware of those details. If the driver needs specific maintenance
> >>> function for any other reason then better to use DPDK SERVICE core infra.
> >>
> >> The only thing the application needs to be aware of, is that it needs to
> >> call rte_event_maintain() as often as it would have called dequeue() in
> >> your "typical worker" example. To make sure this call is cheap-enough is
> >> up to the driver, and this needs to hold true for all event devices that
> >> needs maintenance.
> > Why not rte_event_maintain() can't do either in dequeue() or enqueue()
> > in the driver context? Either one of them has to be called
> > periodically by application
> > in any case?
>
>
> No, producer-only ports can go idle for long times. For applications
> that don't "go idle" need not worry about the maintain function.

If I understand it correctly, if the worker does not call enqueue() or dequeue()
for a long time, then maintain() needs to be called by the application.

That's where my concern with this API lies. What is the definition of a long
time frame? (ns or ticks?)
Will it change from driver to driver and from arch to arch? And it is
leaking the driver requirements into the application.


>
>
> >
> >>
> >> If you plan to use a non-buffering hardware device driver or a soft,
> >> centralized scheduler that doesn't need this, it will also not set the
> >> flag, and thus the application needs not care about the
> >> rte_event_maintain() function. DPDK code such as the eventdev adapters
> >> do need to care, but the increase in complexity is slight, and the cost
> >> of calling rte_maintain_event() on a maintenance-free devices is very
> >> low (since the then-NULL function pointer is in the eventdev struct,
> >> likely on a cache-line already dragged in).
> >>
> >>
> >> Unfortunately, DPDK doesn't have a per-core delayed-work mechanism.
> >> Flushing event buffers (and other DSW "background work") can't be done
> >> on a service core, since they would work on non-MT-safe data structures
> >> on the worker thread's event ports.
> > Yes. Otherwise, DSW needs to update to support MT safe.
>
>
> I haven't been looking at this in detail, but I suspect it will be both
> complex and not very performant. One of problems that need to be solved
> in such a solution, is the "pausing" of flows during migration. All
> participating lcores needs to ACK that a flow is paused.

Could you share a patch with the details on how much it costs?

>
>
> >
> >>
> >>>>>> + * periods when neither rte_event_dequeue_burst() nor
> >>>>> The typical worker thread will be
> >>>>> while (1) {
> >>>>>                    rte_event_dequeue_burst();
> >>>>>                     ..proess..
> >>>>>                    rte_event_enqueue_burst();
> >>>>> }
> >>>>> If so, Why DSW driver can't do the maintenance in driver context in
> >>>>> dequeue() call.
> >>>>>
> >>>> DSW already does maintenance on dequeue, and works well in the above
> >>>> scenario. The typical worker does not need to care about the
> >>>> rte_event_maintain() functions, since it dequeues events on a regular basis.
> >>>>
> >>>>
> >>>> What this RFC addresses is the more atypical (but still fairly common)
> >>>> case of a port being neither dequeued to or enqueued from on a regular
> >>>> basis. The timer and ethernet rx adapters are examples of such.
> >>> If it is an Adapter specific use case problem then maybe, we have
> >>> an option to fix the problem in adapter specific API usage or in that area.
> >>>
> >> It's not adapter specific, I think. There might be producer-only ports,
> >> for example, which doesn't provide a constant stream of events, but
> >> rather intermittent bursts. A traffic generator is one example of such
> >> an application, and there might be other, less synthetic ones as well.
> > In that case, the application knows the purpose of the eventdev port.
> > Is changing eventdev spec to configure "port" or "queue" for that use
> > case help? Meaning, DSW or
> > Any driver can get the hint and change the function pointers
> > accordingly for fastpath.
> > For instance, do maintenance on enqueue() for such ports or so.
>
>
> This is what DSW does already today. A dequeue() call with a zero-length
> event array serves the purpose of rte_event_maintain(). It's a bit of a
> hack, in my opinion.

I agree that it is a hack.

One more concern: we are adding an API for the new driver's requirements
without updating the example application. Sharing an example application
patch would
enable us to understand the cost and usage (i.e., the cost relative to other SW drivers).



>
>
> >
> >>
> >>>>>> + * rte_event_enqueue_burst() are called on a port. This will allow the
> >>>>>> + * event device to perform internal processing, such as flushing
> >>>>>> + * buffered events, return credits to a global pool, or process
> >>>>>> + * signaling related to load balancing.
> >>>>>> + */
>
>
  
Mattias Rönnblom April 14, 2020, 5:55 p.m. UTC | #10
On 2020-04-14 18:15, Jerin Jacob wrote:
> On Tue, Apr 14, 2020 at 9:27 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>> On 2020-04-10 15:00, Jerin Jacob wrote:
>>> On Thu, Apr 9, 2020 at 7:32 PM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>> On 2020-04-09 15:32, Jerin Jacob wrote:
>>>>> On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom
>>>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>>> On 2020-04-08 21:36, Jerin Jacob wrote:
>>>>>>> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
>>>>>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>>>>> Extend Eventdev API to allow for event devices which require various
>>>>>>>> forms of internal processing to happen, even when events are not
>>>>>>>> enqueued to or dequeued from a port.
>>>>>>>>
>>>>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>>>>>> ---
>>>>>>>>      lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
>>>>>>>>      lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
>>>>>>>>      2 files changed, 79 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
>>>>>>>> index 226f352ad..d69150792 100644
>>>>>>>> --- a/lib/librte_eventdev/rte_eventdev.h
>>>>>>>> +++ b/lib/librte_eventdev/rte_eventdev.h
>>>>>>>> @@ -289,6 +289,15 @@ struct rte_event;
>>>>>>>>       * single queue to each port or map a single queue to many port.
>>>>>>>>       */
>>>>>>>>
>>>>>>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
>>>>>>>> +/**< Event device requires calls to rte_event_maintain() during
>>>>>>> This scheme would call for DSW specific API handling in fastpath.
>>>>>> Initially this would be so, but buffering events might yield performance
>>>>>> benefits for more event devices than DSW.
>>>>>>
>>>>>>
>>>>>> In an application, it's often convenient, but sub-optimal from a
>>>>>> performance point of view, to do single-event enqueue operations. The
>>>>>> alternative is to use an application-level buffer, and the flush this
>>>>>> buffer with rte_event_enqueue_burst(). If you allow the event device to
>>>>>> buffer, you get the simplicity of single-event enqueue operations, but
>>>>>> without taking any noticeable performance hit.
>>>>> IMO, It is better to aggregate the burst by the application,  as sending
>>>>> event by event to the driver to aggregate has performance due to cost
>>>>> function pointer overhead.
>>>> That's a very slight overhead - but for optimal performance, sure. It'll
>>>> come at a cost in terms of code complexity. Just look at the adapters.
>>>> They do this already. I think some applications are ready to take the
>>>> extra 5-10 clock cycles or so it'll cost them to do the function call
>>>> (provided the event device had buffering support).
>>> So Is there any advantage of moving aggregation logic to PMD? it is costly.
>>
>> What do you mean by aggregation logic?
> aggregation == buffering.


The reason I put it into DSW to begin with was that it yielded a 
significant performance benefit, for situations where the application 
would enqueue() few or even single events per call. For DSW it will 
translate to lower DPDK event ring overhead. I would imagine it could 
improve performance for hardware-based event devices as well.


>>
>>>>> Another concern is the frequency of calling rte_event_maintain() function by
>>>>> the application, as the timing requirements will vary differently by
>>>>> the driver to driver and application to application.
>>>>> IMO, It is not portable and I believe the application should not be
>>>>> aware of those details. If the driver needs specific maintenance
>>>>> function for any other reason then better to use DPDK SERVICE core infra.
>>>> The only thing the application needs to be aware of, is that it needs to
>>>> call rte_event_maintain() as often as it would have called dequeue() in
>>>> your "typical worker" example. To make sure this call is cheap-enough is
>>>> up to the driver, and this needs to hold true for all event devices that
>>>> needs maintenance.
>>> Why not rte_event_maintain() can't do either in dequeue() or enqueue()
>>> in the driver context? Either one of them has to be called
>>> periodically by application
>>> in any case?
>>
>> No, producer-only ports can go idle for long times. For applications
>> that don't "go idle" need not worry about the maintain function.
> If I understand it correctly, If the worker does not call enqueue() or dequeue()
> for a long time then maintain() needs to be called by the application.
>
> That's where I concern with this API. What is the definition of long
> time frame?(ns or ticks?)
> Will it be changing driver to driver and arch to arch? And it is
> leaking the driver requirements to the application.
>

It's difficult to quantify exactly, but the rate should be on the same 
order of magnitude as the rate at which you would call dequeue() on a 
consumer-type worker port. All the RTE_EVENT_DEV_CAP_* flags essentially represent such 
leakage, where the event device driver and/or the underlying hardware 
express various capabilities and limitations.


I'm not sure it needs to be much more complicated for the application to 
handle than the change to the event adapters I included in the patch. 
There, it boils down to the service function call rate, which would be high 
during low load (causing buffers to be flushed quickly, etc.), and a little 
lower during high lcore load.


>>
>>>> If you plan to use a non-buffering hardware device driver or a soft,
>>>> centralized scheduler that doesn't need this, it will also not set the
>>>> flag, and thus the application needs not care about the
>>>> rte_event_maintain() function. DPDK code such as the eventdev adapters
>>>> do need to care, but the increase in complexity is slight, and the cost
>>>> of calling rte_maintain_event() on a maintenance-free devices is very
>>>> low (since the then-NULL function pointer is in the eventdev struct,
>>>> likely on a cache-line already dragged in).
>>>>
>>>>
>>>> Unfortunately, DPDK doesn't have a per-core delayed-work mechanism.
>>>> Flushing event buffers (and other DSW "background work") can't be done
>>>> on a service core, since they would work on non-MT-safe data structures
>>>> on the worker thread's event ports.
>>> Yes. Otherwise, DSW needs to update to support MT safe.
>>
>> I haven't been looking at this in detail, but I suspect it will be both
>> complex and not very performant. One of problems that need to be solved
>> in such a solution, is the "pausing" of flows during migration. All
>> participating lcores needs to ACK that a flow is paused.
> Could you share the patch on the details on how much it costs?


I don't have a ready-made solution for making lcore ports thread-safe.


>
>>
>>>>>>>> + * periods when neither rte_event_dequeue_burst() nor
>>>>>>> The typical worker thread will be
>>>>>>> while (1) {
>>>>>>>                     rte_event_dequeue_burst();
>>>>>>>                      ..proess..
>>>>>>>                     rte_event_enqueue_burst();
>>>>>>> }
>>>>>>> If so, Why DSW driver can't do the maintenance in driver context in
>>>>>>> dequeue() call.
>>>>>>>
>>>>>> DSW already does maintenance on dequeue, and works well in the above
>>>>>> scenario. The typical worker does not need to care about the
>>>>>> rte_event_maintain() functions, since it dequeues events on a regular basis.
>>>>>>
>>>>>>
>>>>>> What this RFC addresses is the more atypical (but still fairly common)
>>>>>> case of a port being neither dequeued to or enqueued from on a regular
>>>>>> basis. The timer and ethernet rx adapters are examples of such.
>>>>> If it is an Adapter specific use case problem then maybe, we have
>>>>> an option to fix the problem in adapter specific API usage or in that area.
>>>>>
>>>> It's not adapter specific, I think. There might be producer-only ports,
>>>> for example, which doesn't provide a constant stream of events, but
>>>> rather intermittent bursts. A traffic generator is one example of such
>>>> an application, and there might be other, less synthetic ones as well.
>>> In that case, the application knows the purpose of the eventdev port.
>>> Is changing eventdev spec to configure "port" or "queue" for that use
>>> case help? Meaning, DSW or
>>> Any driver can get the hint and change the function pointers
>>> accordingly for fastpath.
>>> For instance, do maintenance on enqueue() for such ports or so.
>>
>> This is what DSW does already today. A dequeue() call with a zero-length
>> event array serves the purpose of rte_event_maintain(). It's a bit of a
>> hack, in my opinion.
> I agree that it is the hack.
>
> One more concern we have we are adding API for the new driver requirements and
> not updating the example application. Sharing the example application
> patch would
> enable us to understand the cost and usage.(i.e Cost wrt to other SW drivers)


Good point.



>
>>
>>>>>>>> + * rte_event_enqueue_burst() are called on a port. This will allow the
>>>>>>>> + * event device to perform internal processing, such as flushing
>>>>>>>> + * buffered events, return credits to a global pool, or process
>>>>>>>> + * signaling related to load balancing.
>>>>>>>> + */
>>
  
Jerin Jacob April 16, 2020, 5:19 p.m. UTC | #11
On Tue, Apr 14, 2020 at 11:25 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> On 2020-04-14 18:15, Jerin Jacob wrote:
> > On Tue, Apr 14, 2020 at 9:27 PM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >> On 2020-04-10 15:00, Jerin Jacob wrote:
> >>> On Thu, Apr 9, 2020 at 7:32 PM Mattias Rönnblom
> >>> <mattias.ronnblom@ericsson.com> wrote:
> >>>> On 2020-04-09 15:32, Jerin Jacob wrote:
> >>>>> On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom
> >>>>> <mattias.ronnblom@ericsson.com> wrote:
> >>>>>> On 2020-04-08 21:36, Jerin Jacob wrote:
> >>>>>>> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
> >>>>>>> <mattias.ronnblom@ericsson.com> wrote:
> >>>>>>>> Extend Eventdev API to allow for event devices which require various
> >>>>>>>> forms of internal processing to happen, even when events are not
> >>>>>>>> enqueued to or dequeued from a port.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>>>>>> ---
> >>>>>>>>      lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
> >>>>>>>>      lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
> >>>>>>>>      2 files changed, 79 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
> >>>>>>>> index 226f352ad..d69150792 100644
> >>>>>>>> --- a/lib/librte_eventdev/rte_eventdev.h
> >>>>>>>> +++ b/lib/librte_eventdev/rte_eventdev.h
> >>>>>>>> @@ -289,6 +289,15 @@ struct rte_event;
> >>>>>>>>       * single queue to each port or map a single queue to many port.
> >>>>>>>>       */
> >>>>>>>>
> >>>>>>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
> >>>>>>>> +/**< Event device requires calls to rte_event_maintain() during
> >>>>>>> This scheme would call for DSW specific API handling in fastpath.
> >>>>>> Initially this would be so, but buffering events might yield performance
> >>>>>> benefits for more event devices than DSW.
> >>>>>>
> >>>>>>
> >>>>>> In an application, it's often convenient, but sub-optimal from a
> >>>>>> performance point of view, to do single-event enqueue operations. The
> >>>>>> alternative is to use an application-level buffer, and the flush this
> >>>>>> buffer with rte_event_enqueue_burst(). If you allow the event device to
> >>>>>> buffer, you get the simplicity of single-event enqueue operations, but
> >>>>>> without taking any noticeable performance hit.
> >>>>> IMO, It is better to aggregate the burst by the application,  as sending
> >>>>> event by event to the driver to aggregate has performance due to cost
> >>>>> function pointer overhead.
> >>>> That's a very slight overhead - but for optimal performance, sure. It'll
> >>>> come at a cost in terms of code complexity. Just look at the adapters.
> >>>> They do this already. I think some applications are ready to take the
> >>>> extra 5-10 clock cycles or so it'll cost them to do the function call
> >>>> (provided the event device had buffering support).
> >>> So Is there any advantage of moving aggregation logic to PMD? it is costly.
> >>
> >> What do you mean by aggregation logic?
> > aggregation == buffering.
>
>
> The reason I put it into DSW to begin with was that it yielded a
> significant performance benefit, for situations where the application
> would enqueue() few or even single events per call. For DSW it will
> translate to lower DPDK event ring overhead. I would imagine it could
> improve performance for hardware-based event devices as well.

Yes, we are already doing this in the application. It makes sense for buffering.

>
>
> >>
> >>>>> Another concern is the frequency of calling the rte_event_maintain() function
> >>>>> by the application, as the timing requirements will vary from driver to
> >>>>> driver and from application to application.
> >>>>> IMO, it is not portable, and I believe the application should not be
> >>>>> aware of those details. If the driver needs a specific maintenance
> >>>>> function for any other reason, then it is better to use the DPDK SERVICE core infra.
> >>>> The only thing the application needs to be aware of is that it needs to
> >>>> call rte_event_maintain() as often as it would have called dequeue() in
> >>>> your "typical worker" example. To make sure this call is cheap enough is
> >>>> up to the driver, and this needs to hold true for all event devices that
> >>>> need maintenance.
> >>> Why can't rte_event_maintain()'s work be done in dequeue() or enqueue()
> >>> in the driver context? Either one of them has to be called
> >>> periodically by the application
> >>> in any case?
> >>
> >> No, producer-only ports can go idle for long times. Applications
> >> that don't "go idle" need not worry about the maintain function.
> > If I understand it correctly, if the worker does not call enqueue() or dequeue()
> > for a long time, then maintain() needs to be called by the application.
> >
> > That's where my concern with this API lies. What is the definition of a long
> > time frame (ns or ticks)?
> > Will it change from driver to driver and arch to arch? And it is
> > leaking the driver requirements to the application.
> >
>
> It's difficult to quantify exactly, but the rate should be of the same
> order of magnitude as that at which you would call dequeue() on a consumer-type worker

The challenge is: if another driver has a different requirement for maintain()
in terms of frequency, and it is a public API, then how will we abstract that?

> port. All the RTE_EVENT_DEV_CAP_* flags essentially represent such
> leakage, where the event device driver and/or the underlying hardware
> express various capabilities and limitations.

I agree. But in the fastpath, we do not introduce any such _functional_ difference.
If it is on the slow path, then there is no issue at all.


>
>
> I'm not sure it needs to be much more complicated for the application to
> handle than the change to the event adapters I included in the patch.
> There, it boils down to the service function call rate, which would be high
> during low load (causing buffers to be flushed quickly, etc.), and a little
> lower during high lcore load.
>
>
> >>
> >>>> If you plan to use a non-buffering hardware device driver or a soft,
> >>>> centralized scheduler that doesn't need this, it will also not set the
> >>>> flag, and thus the application need not care about the
> >>>> rte_event_maintain() function. DPDK code such as the eventdev adapters
> >>>> does need to care, but the increase in complexity is slight, and the cost
> >>>> of calling rte_event_maintain() on a maintenance-free device is very
> >>>> low (since the then-NULL function pointer is in the eventdev struct,
> >>>> likely on a cache-line already dragged in).
> >>>>
> >>>>
> >>>> Unfortunately, DPDK doesn't have a per-core delayed-work mechanism.
> >>>> Flushing event buffers (and other DSW "background work") can't be done
> >>>> on a service core, since they would work on non-MT-safe data structures
> >>>> on the worker thread's event ports.
> >>> Yes. Otherwise, DSW needs to be updated to support MT safety.
> >>
> >> I haven't been looking at this in detail, but I suspect it will be both
> >> complex and not very performant. One of the problems that needs to be solved
> >> in such a solution is the "pausing" of flows during migration. All
> >> participating lcores need to ACK that a flow is paused.
> > Could you share a patch with details on how much it costs?
>
>
> I don't have a ready-made solution for making lcore ports thread-safe.

OK

>
>
> >
> >>
> >>>>>>>> + * periods when neither rte_event_dequeue_burst() nor
> >>>>>>> The typical worker thread will be
> >>>>>>> while (1) {
> >>>>>>>                     rte_event_dequeue_burst();
> >>>>>>>                      ..process..
> >>>>>>>                     rte_event_enqueue_burst();
> >>>>>>> }
> >>>>>>> If so, Why DSW driver can't do the maintenance in driver context in
> >>>>>>> dequeue() call.
> >>>>>>>
> >>>>>> DSW already does maintenance on dequeue, and works well in the above
> >>>>>> scenario. The typical worker does not need to care about the
> >>>>>> rte_event_maintain() functions, since it dequeues events on a regular basis.
> >>>>>>
> >>>>>>
> >>>>>> What this RFC addresses is the more atypical (but still fairly common)
> >>>>>> case of a port being neither dequeued from nor enqueued to on a regular
> >>>>>> basis. The timer and ethernet rx adapters are examples of such.
> >>>>> If it is an adapter-specific use case problem, then maybe we have
> >>>>> an option to fix the problem in the adapter-specific API usage or in that area.
> >>>>>
> >>>> It's not adapter specific, I think. There might be producer-only ports,
> >>>> for example, which don't provide a constant stream of events, but
> >>>> rather intermittent bursts. A traffic generator is one example of such
> >>>> an application, and there might be other, less synthetic ones as well.
> >>> In that case, the application knows the purpose of the eventdev port.
> >>> Would changing the eventdev spec to configure a "port" or "queue" for that use
> >>> case help? Meaning, DSW or
> >>> any driver can get the hint and change the function pointers
> >>> accordingly for the fastpath.
> >>> For instance, do maintenance on enqueue() for such ports, or the like.
> >>
> >> This is what DSW does already today. A dequeue() call with a zero-length
> >> event array serves the purpose of rte_event_maintain(). It's a bit of a
> >> hack, in my opinion.
> > I agree that it is the hack.
> >
> > One more concern we have is that we are adding an API for the new driver
> > requirements without updating the example applications. Sharing an example
> > application patch would
> > enable us to understand the cost and usage (i.e., the cost wrt other SW drivers).
>
>
> Good point.

I suggest sharing a patch based on existing app/adapter usage; based on that, we
can analyze the abstraction and the cost further.



>
>
>
> >
> >>
> >>>>>>>> + * rte_event_enqueue_burst() are called on a port. This will allow the
> >>>>>>>> + * event device to perform internal processing, such as flushing
> >>>>>>>> + * buffered events, return credits to a global pool, or process
> >>>>>>>> + * signaling related to load balancing.
> >>>>>>>> + */
> >>
>
  
Mattias Rönnblom April 20, 2020, 9:05 a.m. UTC | #12
On 2020-04-16 19:19, Jerin Jacob wrote:
> On Tue, Apr 14, 2020 at 11:25 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>> On 2020-04-14 18:15, Jerin Jacob wrote:
>>> On Tue, Apr 14, 2020 at 9:27 PM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>> On 2020-04-10 15:00, Jerin Jacob wrote:
>>>>> On Thu, Apr 9, 2020 at 7:32 PM Mattias Rönnblom
>>>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>>> On 2020-04-09 15:32, Jerin Jacob wrote:
>>>>>>> On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom
>>>>>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>>>>> On 2020-04-08 21:36, Jerin Jacob wrote:
>>>>>>>>> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
>>>>>>>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>>>>>>> Extend Eventdev API to allow for event devices which require various
>>>>>>>>>> forms of internal processing to happen, even when events are not
>>>>>>>>>> enqueued to or dequeued from a port.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>>>>>>>> ---
>>>>>>>>>>       lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
>>>>>>>>>>       lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
>>>>>>>>>>       2 files changed, 79 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
>>>>>>>>>> index 226f352ad..d69150792 100644
>>>>>>>>>> --- a/lib/librte_eventdev/rte_eventdev.h
>>>>>>>>>> +++ b/lib/librte_eventdev/rte_eventdev.h
>>>>>>>>>> @@ -289,6 +289,15 @@ struct rte_event;
>>>>>>>>>>        * single queue to each port or map a single queue to many port.
>>>>>>>>>>        */
>>>>>>>>>>
>>>>>>>>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
>>>>>>>>>> +/**< Event device requires calls to rte_event_maintain() during
>>>>>>>>> This scheme would call for DSW specific API handling in fastpath.
>>>>>>>> Initially this would be so, but buffering events might yield performance
>>>>>>>> benefits for more event devices than DSW.
>>>>>>>>
>>>>>>>>
>>>>>>>> In an application, it's often convenient, but sub-optimal from a
>>>>>>>> performance point of view, to do single-event enqueue operations. The
>>>>>>>> alternative is to use an application-level buffer, and then flush this
>>>>>>>> buffer with rte_event_enqueue_burst(). If you allow the event device to
>>>>>>>> buffer, you get the simplicity of single-event enqueue operations, but
>>>>>>>> without taking any noticeable performance hit.
>>>>>>> IMO, it is better to aggregate the burst in the application, as sending
>>>>>>> event by event to the driver to aggregate has a performance cost due to
>>>>>>> the function pointer overhead.
>>>>>> That's a very slight overhead - but for optimal performance, sure. It'll
>>>>>> come at a cost in terms of code complexity. Just look at the adapters.
>>>>>> They do this already. I think some applications are ready to take the
>>>>>> extra 5-10 clock cycles or so it'll cost them to do the function call
>>>>>> (provided the event device had buffering support).
>>>>> So is there any advantage in moving the aggregation logic to the PMD? It is costly.
>>>> What do you mean by aggregation logic?
>>> aggregation == buffering.
>>
>> The reason I put it into DSW to begin with was that it yielded a
>> significant performance benefit, for situations where the application
>> would enqueue() few or even single events per call. For DSW it will
>> translate to lower DPDK event ring overhead. I would imagine it could
>> improve performance for hardware-based event devices as well.
> Yes, we are already doing this in the application. It makes sense for buffering.


Should I read this as "we are assuming the application will do this"?


(I'm not saying it's a totally unreasonable assumption.)
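To make the pattern concrete: below is a minimal sketch of the
application-level buffering being discussed. The buffer size, the
retry policy and all names are illustrative assumptions, not taken
from DSW or from any existing application.

#include <rte_eventdev.h>

#define EV_BUF_SIZE 32 /* arbitrary example size */

struct ev_buffer {
	struct rte_event events[EV_BUF_SIZE];
	uint16_t count;
};

/* Buffer one event; flush the whole burst when the buffer is full. */
static void
app_buffered_enqueue(uint8_t dev_id, uint8_t port_id,
		     struct ev_buffer *buf, const struct rte_event *ev)
{
	buf->events[buf->count++] = *ev;

	if (buf->count == EV_BUF_SIZE) {
		uint16_t sent = 0;

		/* Retry until the event device has accepted all events. */
		while (sent < buf->count)
			sent += rte_event_enqueue_burst(dev_id, port_id,
							&buf->events[sent],
							buf->count - sent);
		buf->count = 0;
	}
}

With event-device-level buffering, as in DSW, the buffer and the flush
logic above move into the PMD, behind plain single-event enqueue calls.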


>>
>>>>>>> Another concern is the frequency of calling the rte_event_maintain() function
>>>>>>> by the application, as the timing requirements will vary from driver to
>>>>>>> driver and from application to application.
>>>>>>> IMO, it is not portable, and I believe the application should not be
>>>>>>> aware of those details. If the driver needs a specific maintenance
>>>>>>> function for any other reason, then it is better to use the DPDK SERVICE core infra.
>>>>>> The only thing the application needs to be aware of is that it needs to
>>>>>> call rte_event_maintain() as often as it would have called dequeue() in
>>>>>> your "typical worker" example. To make sure this call is cheap enough is
>>>>>> up to the driver, and this needs to hold true for all event devices that
>>>>>> need maintenance.
>>>>> Why can't rte_event_maintain()'s work be done in dequeue() or enqueue()
>>>>> in the driver context? Either one of them has to be called
>>>>> periodically by the application
>>>>> in any case?
>>>> No, producer-only ports can go idle for long times. Applications
>>>> that don't "go idle" need not worry about the maintain function.
>>> If I understand it correctly, if the worker does not call enqueue() or dequeue()
>>> for a long time, then maintain() needs to be called by the application.
>>>
>>> That's where my concern with this API lies. What is the definition of a long
>>> time frame (ns or ticks)?
>>> Will it change from driver to driver and arch to arch? And it is
>>> leaking the driver requirements to the application.
>>>
>> It's difficult to quantify exactly, but the rate should be of the same
>> order of magnitude as that at which you would call dequeue() on a consumer-type worker
> The challenge is: if another driver has a different requirement for maintain()
> in terms of frequency, and it is a public API, then how will we abstract that?


It's a challenge indeed, as is so often the case for poll-type APIs.


My thought, and what's implemented in DSW today, is that the application
will call maintain() with a high frequency, as I've mentioned before. If
the event device periodically needs to perform some heavy-weight
operation, such an operation will not be performed on every call. Rather,
the event device keeps a counter, and performs these only on every Nth
call. For DSW, one such operation is when a DSW port considers if it
needs to migrate flows, and if so, where those flows should migrate.


Whatever maintain() is doing, it makes the event device machinery make
progress in some way or another. Calling it rarely will translate to
latency. Failing to maintain a sufficient call rate shouldn't affect
correctness.
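
A sketch of the every-Nth-call amortization described above. The names,
the ratio and the helper functions are made up for illustration; this is
not the actual DSW code.

#define CALLS_PER_HEAVY_OP 128 /* hypothetical ratio */

struct example_port {
	uint32_t maintain_calls;
	/* driver-internal state (buffers, migration candidates, ...) */
};

/* Hypothetical helpers, standing in for driver internals. */
static void example_flush_buffered_events(struct example_port *p);
static void example_consider_migration(struct example_port *p);

/* Called at a high rate, directly or from enqueue()/dequeue(). */
static void
example_port_maintain(void *port)
{
	struct example_port *p = port;

	/* Cheap work is done on every call. */
	example_flush_buffered_events(p);

	/* Heavy-weight work is amortized over many calls. */
	if (p->maintain_calls++ % CALLS_PER_HEAVY_OP == 0)
		example_consider_migration(p);
}

A lower call rate only stretches the interval between the heavy-weight
operations, adding latency rather than breaking correctness.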


>> port. All the RTE_EVENT_DEV_CAP_* flags essentially represent such
>> leakage, where the event device driver and/or the underlying hardware
>> express various capabilities and limitations.
> I agree. But in the fastpath, we do not introduce any such _functional_ difference.
> If it is on the slow path, then there is no issue at all.
>

I'm not sure I follow. What do you mean by functional here?


The capabilities certainly change API semantics, and as a result how 
applications need to behave - including fast-path behavior. For example, 
consider an application which prefers 
RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE-capable event devices, but 
also needs a code path for those that don't.
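
Such a capability check might look as follows; a minimal sketch, with
error handling reduced to a panic.

#include <stdbool.h>
#include <rte_debug.h>
#include <rte_eventdev.h>

static bool
implicit_release_disable_supported(uint8_t dev_id)
{
	struct rte_event_dev_info info;

	/* Query the device's capability flags. */
	if (rte_event_dev_info_get(dev_id, &info) != 0)
		rte_panic("invalid event device\n");

	return info.event_dev_cap &
		RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE;
}

Based on the result, the application selects between a fast path which
relies on implicit releases being disabled and a fallback path which
issues explicit RTE_EVENT_OP_RELEASE events.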


>>
>> I'm not sure it needs to be much more complicated for the application to
>> handle than the change to the event adapters I included in the patch.
>> There, it boils down to the service function call rate, which would be high
>> during low load (causing buffers to be flushed quickly, etc.), and a little
>> lower during high lcore load.
>>
>>
>>>>>> If you plan to use a non-buffering hardware device driver or a soft,
>>>>>> centralized scheduler that doesn't need this, it will also not set the
>>>>>> flag, and thus the application need not care about the
>>>>>> rte_event_maintain() function. DPDK code such as the eventdev adapters
>>>>>> does need to care, but the increase in complexity is slight, and the cost
>>>>>> of calling rte_event_maintain() on a maintenance-free device is very
>>>>>> low (since the then-NULL function pointer is in the eventdev struct,
>>>>>> likely on a cache-line already dragged in).
>>>>>>
>>>>>>
>>>>>> Unfortunately, DPDK doesn't have a per-core delayed-work mechanism.
>>>>>> Flushing event buffers (and other DSW "background work") can't be done
>>>>>> on a service core, since they would work on non-MT-safe data structures
>>>>>> on the worker thread's event ports.
>>>>> Yes. Otherwise, DSW needs to be updated to support MT safety.
>>>> I haven't been looking at this in detail, but I suspect it will be both
>>>> complex and not very performant. One of the problems that needs to be solved
>>>> in such a solution is the "pausing" of flows during migration. All
>>>> participating lcores need to ACK that a flow is paused.
>>> Could you share a patch with details on how much it costs?
>>
>> I don't have a ready-made solution for making lcore ports thread-safe.
> OK
>
>>
>>>>>>>>>> + * periods when neither rte_event_dequeue_burst() nor
>>>>>>>>> The typical worker thread will be
>>>>>>>>> while (1) {
>>>>>>>>>                      rte_event_dequeue_burst();
>>>>>>>>>                       ..process..
>>>>>>>>>                      rte_event_enqueue_burst();
>>>>>>>>> }
>>>>>>>>> If so, Why DSW driver can't do the maintenance in driver context in
>>>>>>>>> dequeue() call.
>>>>>>>>>
>>>>>>>> DSW already does maintenance on dequeue, and works well in the above
>>>>>>>> scenario. The typical worker does not need to care about the
>>>>>>>> rte_event_maintain() functions, since it dequeues events on a regular basis.
>>>>>>>>
>>>>>>>>
>>>>>>>> What this RFC addresses is the more atypical (but still fairly common)
>>>>>>>> case of a port being neither dequeued from nor enqueued to on a regular
>>>>>>>> basis. The timer and ethernet rx adapters are examples of such.
>>>>>>> If it is an adapter-specific use case problem, then maybe we have
>>>>>>> an option to fix the problem in the adapter-specific API usage or in that area.
>>>>>>>
>>>>>> It's not adapter specific, I think. There might be producer-only ports,
>>>>>> for example, which don't provide a constant stream of events, but
>>>>>> rather intermittent bursts. A traffic generator is one example of such
>>>>>> an application, and there might be other, less synthetic ones as well.
>>>>> In that case, the application knows the purpose of the eventdev port.
>>>>> Would changing the eventdev spec to configure a "port" or "queue" for that use
>>>>> case help? Meaning, DSW or
>>>>> any driver can get the hint and change the function pointers
>>>>> accordingly for the fastpath.
>>>>> For instance, do maintenance on enqueue() for such ports, or the like.
>>>> This is what DSW does already today. A dequeue() call with a zero-length
>>>> event array serves the purpose of rte_event_maintain(). It's a bit of a
>>>> hack, in my opinion.
>>> I agree that it is the hack.
>>>
>>> One more concern we have is that we are adding an API for the new driver
>>> requirements without updating the example applications. Sharing an example
>>> application patch would
>>> enable us to understand the cost and usage (i.e., the cost wrt other SW drivers).
>>
>> Good point.
> I suggest sharing a patch based on existing app/adapter usage; based on that, we
> can analyze the abstraction and the cost further.
>

Will do.


>
>>
>>
>>>>>>>>>> + * rte_event_enqueue_burst() are called on a port. This will allow the
>>>>>>>>>> + * event device to perform internal processing, such as flushing
>>>>>>>>>> + * buffered events, return credits to a global pool, or process
>>>>>>>>>> + * signaling related to load balancing.
>>>>>>>>>> + */
  
Mattias Rönnblom May 13, 2020, 6:56 p.m. UTC | #13
On 2020-04-14 19:55, Mattias Rönnblom wrote:
> On 2020-04-14 18:15, Jerin Jacob wrote:
>
<snip>
>> One more concern we have is that we are adding an API for the new driver
>> requirements without updating the example applications. Sharing an example
>> application patch would
>> enable us to understand the cost and usage (i.e., the cost wrt other SW drivers).
>
> Good point.
>
>

I went through the example applications (l2fwd, l3fwd and
eventdev_pipeline). From what I can tell, they require no changes. All 
follow the dequeue()+enqueue() pattern, and thus are both consumers and 
producers.
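
A producer-only worker, by contrast, would need the new call. A sketch
based on the API proposed in this patch; ev_available() and next_event()
are hypothetical application-side functions.

#include <stdbool.h>
#include <rte_eventdev.h>

/* Hypothetical application-side helpers. */
static bool ev_available(void);
static struct rte_event next_event(void);

static void
producer_loop(uint8_t dev_id, uint8_t port_id, volatile bool *stop)
{
	while (!*stop) {
		if (ev_available()) {
			struct rte_event ev = next_event();

			/* Back-pressured enqueue: keep the port
			 * maintained while waiting for the device to
			 * accept the event. */
			while (rte_event_enqueue_burst(dev_id, port_id,
						       &ev, 1) != 1)
				rte_event_maintain(dev_id, port_id);
		} else {
			/* Idle: let the event device flush buffers,
			 * return credits, etc. */
			rte_event_maintain(dev_id, port_id);
		}
	}
}

With today's DSW, the closest equivalent is the zero-length dequeue
workaround mentioned earlier in the thread:
rte_event_dequeue_burst(dev_id, port_id, NULL, 0, 0).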
  

Patch

diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index 226f352ad..d69150792 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -289,6 +289,15 @@  struct rte_event;
  * single queue to each port or map a single queue to many port.
  */
 
+#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
+/**< Event device requires calls to rte_event_maintain() during
+ * periods when neither rte_event_dequeue_burst() nor
+ * rte_event_enqueue_burst() are called on a port. This will allow the
+ * event device to perform internal processing, such as flushing
+ * buffered events, return credits to a global pool, or process
+ * signaling related to load balancing.
+ */
+
 /* Event device priority levels */
 #define RTE_EVENT_DEV_PRIORITY_HIGHEST   0
 /**< Highest priority expressed across eventdev subsystem
@@ -1226,6 +1235,9 @@  typedef uint16_t (*event_dequeue_burst_t)(void *port, struct rte_event ev[],
 		uint16_t nb_events, uint64_t timeout_ticks);
 /**< @internal Dequeue burst of events from port of a device */
 
+typedef void (*event_maintain_t)(void *port);
+/**< @internal Maintain a port of a device */
+
 typedef uint16_t (*event_tx_adapter_enqueue)(void *port,
 				struct rte_event ev[], uint16_t nb_events);
 /**< @internal Enqueue burst of events on port of a device */
@@ -1301,6 +1313,8 @@  struct rte_eventdev {
 	/**< Pointer to PMD dequeue function. */
 	event_dequeue_burst_t dequeue_burst;
 	/**< Pointer to PMD dequeue burst function. */
+	event_maintain_t maintain;
+	/**< Pointer to PMD port maintenance function. */
 	event_tx_adapter_enqueue_same_dest txa_enqueue_same_dest;
 	/**< Pointer to PMD eth Tx adapter burst enqueue function with
 	 * events destined to same Eth port & Tx queue.
@@ -1634,6 +1648,57 @@  rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
 				timeout_ticks);
 }
 
+/**
+ * Maintain an event port.
+ *
+ * This function is only relevant for event devices which have the
+ * RTE_EVENT_DEV_CAP_REQUIRES_MAINT flag set. Such devices require
+ * the application to call rte_event_maintain() on a port during periods
+ * in which it is neither enqueuing events to nor dequeuing events from
+ * this port. No port may be left unattended.
+ *
+ * An event device's rte_event_maintain() is a low-overhead function. In
+ * situations where rte_event_maintain() must be called, the application
+ * should do so often.
+ *
+ * rte_event_maintain() may be called on event devices which haven't
+ * set the RTE_EVENT_DEV_CAP_REQUIRES_MAINT flag, in which case it is a
+ * no-operation.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @return
+ *  None. When RTE_LIBRTE_EVENTDEV_DEBUG is enabled, rte_errno is set to
+ *  EINVAL if *dev_id* or *port_id* is invalid. On devices without
+ *  RTE_EVENT_DEV_CAP_REQUIRES_MAINT set, this function is a no-operation.
+ *
+ * @see RTE_EVENT_DEV_CAP_REQUIRES_MAINT
+ */
+static inline void
+rte_event_maintain(uint8_t dev_id, uint8_t port_id)
+{
+	struct rte_eventdev *dev = &rte_eventdevs[dev_id];
+	event_maintain_t fn;
+
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+	if (dev_id >= RTE_EVENT_MAX_DEVS || !rte_eventdevs[dev_id].attached) {
+		rte_errno = EINVAL;
+		return;
+	}
+
+	if (port_id >= dev->data->nb_ports) {
+		rte_errno = EINVAL;
+		return;
+	}
+#endif
+	fn = dev->maintain;
+
+	if (fn != NULL)
+		fn(dev->data->ports[port_id]);
+}
+
 /**
  * Link multiple source event queues supplied in *queues* to the destination
  * event port designated by its *port_id* with associated service priority
diff --git a/lib/librte_eventdev/rte_eventdev_pmd.h b/lib/librte_eventdev/rte_eventdev_pmd.h
index d118b9e5b..327e4a2ac 100644
--- a/lib/librte_eventdev/rte_eventdev_pmd.h
+++ b/lib/librte_eventdev/rte_eventdev_pmd.h
@@ -364,6 +364,20 @@  typedef int (*eventdev_port_unlinks_in_progress_t)(struct rte_eventdev *dev,
 typedef int (*eventdev_dequeue_timeout_ticks_t)(struct rte_eventdev *dev,
 		uint64_t ns, uint64_t *timeout_ticks);
 
+/**
+ * Maintains an event port for RTE_EVENT_DEV_CAP_REQUIRES_MAINT devices.
+ *
+ * @param dev
+ *   Event device pointer
+ * @param port_id
+ *   Event port index
+ *
+ * @return
+ *   Returns 0 on success.
+ *
+ */
+typedef int (*eventdev_maintain_t)(struct rte_eventdev *dev, uint8_t port_id);
+
 /**
  * Dump internal information
  *