[dpdk-dev,v2,15/15] app/test: add unit tests for SW eventdev driver

Message ID 1485879273-86228-16-git-send-email-harry.van.haaren@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Bruce Richardson
Checks

Context                Check     Description
ci/checkpatch          warning   coding style issues
ci/Intel compilation   fail      Compilation issues

Commit Message

Van Haaren, Harry Jan. 31, 2017, 4:14 p.m. UTC
  From: Bruce Richardson <bruce.richardson@intel.com>

Since the sw driver is a standalone lookaside device that has no HW
requirements, we can provide a set of unit tests that test its
functionality across the different queue types and with different input
scenarios.

This also adds the tests to be automatically run by autotest.py

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
---
 app/test/Makefile           |    5 +-
 app/test/autotest_data.py   |   26 +
 app/test/test_sw_eventdev.c | 2071 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 2101 insertions(+), 1 deletion(-)
 create mode 100644 app/test/test_sw_eventdev.c
  

Comments

Jerin Jacob Feb. 8, 2017, 10:23 a.m. UTC | #1
On Tue, Jan 31, 2017 at 04:14:33PM +0000, Harry van Haaren wrote:
> From: Bruce Richardson <bruce.richardson@intel.com>
> 
> Since the sw driver is a standalone lookaside device that has no HW
> requirements, we can provide a set of unit tests that test its
> functionality across the different queue types and with different input
> scenarios.
> 

Thanks for the SW driver specific test cases. They gave me good insight
into the expected application behavior from the SW driver's perspective, and in turn
highlighted some challenges for portable applications.

I would like to highlight a main difference between the implementations and get
consensus on how to abstract it.

Based on the existing header file, we can do event pipelining in two different ways:
a) flow-based event pipelining
b) queue_id-based event pipelining

I will provide an example to showcase the application flow in both modes.
Based on my understanding of the SW driver source code, it supports only
queue_id-based event pipelining. I guess flow-based event pipelining will
work semantically with the SW driver, but it will be very slow.

I think the reason for the difference is the capability of the context definition.
In the SW model the context is: queue_id
In the Cavium HW model the context is: queue_id + flow_id + sub_event_type +
event_type

AFAIK, queue_id-based event pipelining will work with NXP HW, but I am not
sure about the flow-based event pipelining model with NXP HW. I'd appreciate any
input on this.

In Cavium HW, we support both modes.

As an open question: should we add a capability flag to advertise the supported
models and let the application choose the model based on the implementation's
capability? The downside is that a small portion of the stage-advance code will be
different, but we can reuse the STAGE-specific application code (I think it is a fair
trade-off).
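
As a rough sketch of what I mean (RTE_EVENT_DEV_CAP_FLOW_PIPELINING below is only a
placeholder name for the proposed capability bit, not an existing define), the
stage-advance difference could be isolated in one small helper while the STAGE
specific code stays common:

#include <rte_eventdev.h>

static int use_flow_pipelining; /* chosen once at init from the device caps */

static void
detect_pipelining_model(uint8_t dev_id)
{
	struct rte_event_dev_info info;

	rte_event_dev_info_get(dev_id, &info);
	use_flow_pipelining =
		!!(info.event_dev_cap & RTE_EVENT_DEV_CAP_FLOW_PIPELINING);
}

static void
advance_to_stage(struct rte_event *ev, uint8_t next_stage, uint8_t sched_type)
{
	ev->event_type = RTE_EVENT_TYPE_CPU;
	ev->sched_type = sched_type;
	ev->op = RTE_EVENT_OP_FORWARD;
	if (use_flow_pipelining)
		ev->sub_event_type = next_stage; /* stay on the same event queue */
	else
		ev->queue_id = next_stage; /* hop to the next event queue */
}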

Bruce, Harry, Gage, Hemant, Nipun
Thoughts? Or any other proposal?

I will take a non-trivial real-world NW use case to show the difference.
A standard IPsec outbound processing pipeline will have a minimum of 4 to 5 stages:

stage_0:
--------
a) Take the pkts from ethdev and push them to eventdev as
RTE_EVENT_OP_NEW
b) In some HW implementations this will be done by HW; in a SW implementation
it is done by service cores (a sketch of this injection loop follows the stage list)

stage_1:(ORDERED)
------------------
a) Receive pkts from stage_0 in an ORDERED flow and process them in parallel on N
cores
b) Find the SA that the packet belongs to and move to the next stage for SA-specific
outbound operations. Outbound processing starts with updating the sequence
number in a critical section, followed by packet encryption in parallel.

stage_2(ATOMIC) based on SA
----------------------------
a) Update the sequence number and move to ORDERED sched_type for packet
encryption in parallel

stage_3(ORDERED) based on SA
----------------------------
a) Encrypt the packets in parallel
b) Do output route look-up and figure out tx port and queue to transmit
the packet
c) Move to ATOMIC stage based on tx port and tx queue_id to transmit
the packet _without_ losing the ingress ordering

stage_4(ATOMIC) based on tx port/tx queue
-----------------------------------------
a) enqueue the encrypted packet to ethdev tx port/tx_queue
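
For reference, here is a sketch of the stage_0 injection loop in the SW case, run on
a service core (eth_port, evdev and rx_ev_port are illustrative ids, and the 32-deep
burst is arbitrary):

/* stage_0: pull packets from ethdev rx and inject them as NEW events */
while (1) {
	struct rte_mbuf *pkts[32];
	uint16_t i, nb_rx;

	nb_rx = rte_eth_rx_burst(eth_port, 0, pkts, 32);
	for (i = 0; i < nb_rx; i++) {
		struct rte_event ev = {
			.op = RTE_EVENT_OP_NEW,
			.queue_id = 1, /* stage_1 event queue */
			.sched_type = RTE_SCHED_TYPE_ORDERED,
			.event_type = RTE_EVENT_TYPE_ETHDEV,
			.flow_id = pkts[i]->hash.rss, /* any per-flow value */
			.mbuf = pkts[i],
		};
		rte_event_enqueue_burst(evdev, rx_ev_port, &ev, 1);
	}
}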


1) queue_id based event pipelining
=================================

stage_1 work (assigned to event queue 1): N ports/N cores establish a link to
queue 1 through rte_event_port_link()

on_each_cores_linked_to_queue1(stage1)
while(1)
{
                /* STAGE 1 processing */
                nr_events = rte_event_dequeue_burst(ev,..);
                if (!nr_events)
                        continue;

                sa = find_sa_from_packet(ev.mbuf);

                /* move to next stage(ATOMIC) */
                ev.event_type = RTE_EVENT_TYPE_CPU;
                ev.sub_event_type = 2;
                ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
                ev.flow_id =  sa;
                ev.op = RTE_EVENT_OP_FORWARD;
                ev.queue_id = 2;
                /* move to stage 2(event queue 2) */
                rte_event_enqueue_burst(ev,..);
}

on_each_cores_linked_to_queue2(stage2)
while(1)
{
                /* STAGE 2 processing */
                nr_events = rte_event_dequeue_burst(ev,..);
                if (!nr_events)
                        continue;

                sa_specific_atomic_processing(sa /* ev.flow_id */);/* seq number update in critical section */

                /* move to next stage(ORDERED) */
                ev.event_type = RTE_EVENT_TYPE_CPU;
                ev.sub_event_type = 3;
                ev.sched_type = RTE_SCHED_TYPE_ORDERED;
                ev.flow_id =  sa;
                ev.op = RTE_EVENT_OP_FORWARD;
                ev.queue_id = 3;
                /* move to stage 3(event queue 3) */
                rte_event_enqueue_burst(ev,..);
}

on_each_cores_linked_to_queue3(stage3)
while(1)
{
                /* STAGE 3 processing */
                nr_events = rte_event_dequeue_burst(ev,..);
                if (!nr_events)
                        continue;

                sa_specific_ordered_processing(sa /*ev.flow_id */);/* packets encryption in parallel */

                /* move to next stage(ATOMIC) */
                ev.event_type = RTE_EVENT_TYPE_CPU;
                ev.sub_event_type = 4;
                ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
                output_tx_port_queue = find_output_tx_queue_and_tx_port(ev.mbuf);
                ev.flow_id =  output_tx_port_queue;
                ev.op = RTE_EVENT_OP_FORWARD;
                ev.queue_id = 4;
                /* move to stage 4(event queue 4) */
                rte_event_enqueue_burst(ev,...);
}

on_each_cores_linked_to_queue4(stage4)
while(1)
{
                /* STAGE 4 processing */
                nr_events = rte_event_dequeue_burst(ev,..);
                if (!nr_events)
                        continue;

		rte_eth_tx_buffer();
}

2) flow-based event pipelining
=============================

- No need to partition queues for different stages
- All the cores can operate on all the stages, which enables automatic
multicore scaling and true dynamic load balancing
- A fairly large number of SAs (on the order of 2^16 to 2^20) can be processed in
parallel, something the existing IPsec application has constraints on:
http://dpdk.org/doc/guides-16.04/sample_app_ug/ipsec_secgw.html

on_each_worker_cores()
while(1)
{
	nr_events = rte_event_dequeue_burst(ev,..);
	if (!nr_events)
		continue;

	/* STAGE 1 processing */
	if(ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
		sa = find_it_from_packet(ev.mbuf);
		/* move to next stage2(ATOMIC) */
		ev.event_type = RTE_EVENT_TYPE_CPU;
		ev.sub_event_type = 2;
		ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
		ev.flow_id =  sa;
		ev.op = RTE_EVENT_OP_FORWARD;
		rte_event_enqueue_burst(ev..);

	} else if(ev.event_type == RTE_EVENT_TYPE_CPU && ev.sub_event_type == 2) { /* stage 2 */

		sa_specific_atomic_processing(sa /* ev.flow_id */);/* seq number update in critical section */
		/* move to next stage(ORDERED) */
		ev.event_type = RTE_EVENT_TYPE_CPU;
		ev.sub_event_type = 3;
		ev.sched_type = RTE_SCHED_TYPE_ORDERED;
		ev.flow_id =  sa;
		ev.op = RTE_EVENT_OP_FORWARD;
		rte_event_enqueue_burst(ev,..);

	} else if(ev.event_type == RTE_EVENT_TYPE_CPU && ev.sub_event_type == 3) { /* stage 3 */

		sa_specific_ordered_processing(sa /* ev.flow_id */);/* like encrypting packets in parallel */
		/* move to next stage(ATOMIC) */
		ev.event_type = RTE_EVENT_TYPE_CPU;
		ev.sub_event_type = 4;
		ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
		output_tx_port_queue = find_output_tx_queue_and_tx_port(ev.mbuf);
		ev.flow_id =  output_tx_port_queue;
		ev.op = RTE_EVENT_OP_FORWARD;
		rte_event_enqueue_burst(ev,..);

	} else if(ev.event_type == RTE_EVENT_TYPE_CPU && ev.sub_event_type == 4) { /* stage 4 */
		rte_eth_tx_buffer();
	}
}

/Jerin
Cavium
  
Van Haaren, Harry Feb. 8, 2017, 10:44 a.m. UTC | #2
> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Wednesday, February 8, 2017 10:23 AM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>
> Cc: dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>; Hunt, David
> <david.hunt@intel.com>; nipun.gupta@nxp.com; hemant.agrawal@nxp.com; Eads, Gage
> <gage.eads@intel.com>
> Subject: Re: [PATCH v2 15/15] app/test: add unit tests for SW eventdev driver

<snip>
 
> Thanks for SW driver specific test cases. It provided me a good insight
> of expected application behavior from SW driver perspective and in turn it created
> some challenge in portable applications.
> 
> I would like highlight a main difference between the implementation and get a
> consensus on how to abstract it?

Thanks for taking the time to detail your thoughts - the examples certainly help to get a better picture of the whole.


> Based on existing header file, We can do event pipelining in two different ways
> a) Flow-based event pipelining
> b) queue_id based event pipelining
> 
> I will provide an example to showcase application flow in both modes.
> Based on my understanding from SW driver source code, it supports only
> queue_id based event pipelining. I guess, Flow based event pipelining will
> work semantically with SW driver but it will be very slow.
> 
> I think, the reason for the difference is the capability of the context definition.
> SW model the context is - queue_id
> Cavium HW model the context is queue_id + flow_id + sub_event_type +
> event_type
> 
> AFAIK, queue_id based event pipelining will work with NXP HW but I am not
> sure about flow based event pipelining model with NXP HW. Appreciate any
> input this?
> 
> In Cavium HW, We support both modes.
> 
> As an open question, Should we add a capability flag to advertise the supported
> models and let application choose the model based on implementation capability. The
> downside is, a small portion of stage advance code will be different but we
> can reuse the STAGE specific application code(I think it a fair
> trade off)
> 
> Bruce, Harry, Gage, Hemant, Nipun
> Thoughts? Or any other proposal?


[HvH] Comments inline.

 
> I will take an non trivial realworld NW use case show the difference.
> A standard IPSec outbound processing will have minimum 4 to 5 stages
> 
> stage_0:
> --------
> a) Takes the pkts from ethdev and push to eventdev as
> RTE_EVENT_OP_NEW
> b) Some HW implementation, This will be done by HW. In SW implementation
> it done by service cores
> 
> stage_1:(ORDERED)
> ------------------
> a) Receive pkts from stage_0 in ORDERED flow and it process in parallel on N
> of cores
> b) Find a SA belongs that packet move to next stage for SA specific
> outbound operations.Outbound processing starts with updating the
> sequence number in the critical section and followed by packet encryption in
> parallel.
> 
> stage_2(ATOMIC) based on SA
> ----------------------------
> a) Update the sequence number and move to ORDERED sched_type for packet
> encryption in parallel
> 
> stage_3(ORDERED) based on SA
> ----------------------------
> a) Encrypt the packets in parallel
> b) Do output route look-up and figure out tx port and queue to transmit
> the packet
> c) Move to ATOMIC stage based on tx port and tx queue_id to transmit
> the packet _without_ losing the ingress ordering
> 
> stage_4(ATOMIC) based on tx port/tx queue
> -----------------------------------------
> a) enqueue the encrypted packet to ethdev tx port/tx_queue
>
>
> 1) queue_id based event pipelining
> =================================
> 
> stage_1_work(assigned to event queue 1)# N ports/N cores establish
> link to queue 1 through rte_event_port_link()
> 
> on_each_cores_linked_to_queue1(stage1)


[HvH] All worker cores can be linked to all stages - we look up what stage the work belongs to based on event->queue_id.


> while(1)
> {
>                 /* STAGE 1 processing */
>                 nr_events = rte_event_dequeue_burst(ev,..);
>                 if (!nr_events);
>                                 continue;
> 
>                 sa = find_sa_from_packet(ev.mbuf);
> 
>                 /* move to next stage(ATOMIC) */
>                 ev.event_type = RTE_EVENT_TYPE_CPU;
>                 ev.sub_event_type = 2;
>                 ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
>                 ev.flow_id =  sa;
>                 ev.op = RTE_EVENT_OP_FORWARD;
>                 ev.queue_id = 2;
>                 /* move to stage 2(event queue 2) */
>                 rte_event_enqueue_burst(ev,..);
> }
> 
> on_each_cores_linked_to_queue2(stage2)
> while(1)
> {
>                 /* STAGE 2 processing */
>                 nr_events = rte_event_dequeue_burst(ev,..);
>                 if (!nr_events);
> 			continue;
> 
>                 sa_specific_atomic_processing(sa /* ev.flow_id */);/* seq number update in
> critical section */
> 
>                 /* move to next stage(ORDERED) */
>                 ev.event_type = RTE_EVENT_TYPE_CPU;
>                 ev.sub_event_type = 3;
>                 ev.sched_type = RTE_SCHED_TYPE_ORDERED;
>                 ev.flow_id =  sa;
>                 ev.op = RTE_EVENT_OP_FORWARD;
>                 ev.queue_id = 3;
>                 /* move to stage 3(event queue 3) */
>                 rte_event_enqueue_burst(ev,..);
> }
> 
> on_each_cores_linked_to_queue3(stage3)
> while(1)
> {
>                 /* STAGE 3 processing */
>                 nr_events = rte_event_dequeue_burst(ev,..);
>                 if (!nr_events);
> 			continue;
> 
>                 sa_specific_ordered_processing(sa /*ev.flow_id */);/* packets encryption in
> parallel */
> 
>                 /* move to next stage(ATOMIC) */
>                 ev.event_type = RTE_EVENT_TYPE_CPU;
>                 ev.sub_event_type = 4;
>                 ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
> 		output_tx_port_queue = find_output_tx_queue_and_tx_port(ev.mbuff);
>                 ev.flow_id =  output_tx_port_queue;
>                 ev.op = RTE_EVENT_OP_FORWARD;
>                 ev.queue_id = 4;
>                 /* move to stage 4(event queue 4) */
>                 rte_event_enqueue_burst(ev,...);
> }
> 
> on_each_cores_linked_to_queue4(stage4)
> while(1)
> {
>                 /* STAGE 4 processing */
>                 nr_events = rte_event_dequeue_burst(ev,..);
>                 if (!nr_events);
> 			continue;
> 
> 		rte_eth_tx_buffer();
> }
> 
> 2) flow-based event pipelining
> =============================
> 
> - No need to partition queues for different stages
> - All the cores can operate on all the stages, Thus enables
> automatic multicore scaling, true dynamic load balancing,


[HvH] The sw case is the same - all cores can map to all stages; the lookup for the stage of the work is the queue_id.


> - Fairly large number of SA(kind of 2^16 to 2^20) can be processed in parallel
> Something existing IPSec application has constraints on
> http://dpdk.org/doc/guides-16.04/sample_app_ug/ipsec_secgw.html
> 
> on_each_worker_cores()
> while(1)
> {
> 	rte_event_dequeue_burst(ev,..)
> 	if (!nr_events);
> 		continue;
> 
> 	/* STAGE 1 processing */
> 	if(ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
> 		sa = find_it_from_packet(ev.mbuf);
> 		/* move to next stage2(ATOMIC) */
> 		ev.event_type = RTE_EVENT_TYPE_CPU;
> 		ev.sub_event_type = 2;
> 		ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
> 		ev.flow_id =  sa;
> 		ev.op = RTE_EVENT_OP_FORWARD;
> 		rte_event_enqueue_burst(ev..);
> 
> 	} else if(ev.event_type == RTE_EVENT_TYPE_CPU && ev.sub_event_type == 2) { /* stage 2 */


[HvH] In the case of the software eventdev, ev.queue_id is used instead of ev.sub_event_type - but this is the same lookup operation as mentioned above. I don't see a fundamental difference between these approaches.
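
To make the equivalence concrete, a single-loop worker in the queue_id model needs only
the stage selector changed - a sketch below, with placeholder stage handlers and
illustrative tx arguments:

on_each_worker_core()
while (1) {
	nr_events = rte_event_dequeue_burst(evdev, port, &ev, 1, 0);
	if (!nr_events)
		continue;

	switch (ev.queue_id) { /* stage lookup on queue_id, not sub_event_type */
	case 1:
		/* placeholder handlers: each sets the next stage's
		 * queue_id, sched_type and flow_id on the event
		 */
		stage1_find_sa(&ev);
		break;
	case 2:
		stage2_update_seq_number(&ev);
		break;
	case 3:
		stage3_encrypt_and_route(&ev);
		break;
	case 4:
		rte_eth_tx_buffer(tx_port, tx_queue, tx_buf, ev.mbuf);
		continue; /* nothing to forward */
	}
	ev.op = RTE_EVENT_OP_FORWARD;
	rte_event_enqueue_burst(evdev, port, &ev, 1);
}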

> 
> 		sa_specific_atomic_processing(sa /* ev.flow_id */);/* seq number update in critical
> section */
> 		/* move to next stage(ORDERED) */
> 		ev.event_type = RTE_EVENT_TYPE_CPU;
> 		ev.sub_event_type = 3;
> 		ev.sched_type = RTE_SCHED_TYPE_ORDERED;
> 		ev.flow_id =  sa;
> 		ev.op = RTE_EVENT_OP_FORWARD;
> 		rte_event_enqueue_burst(ev,..);
> 
> 	} else if(ev.event_type == RTE_EVENT_TYPE_CPU && ev.sub_event_type == 3) { /* stage 3 */
> 
> 		sa_specific_ordered_processing(sa /* ev.flow_id */);/* like encrypting packets in
> parallel */
> 		/* move to next stage(ATOMIC) */
> 		ev.event_type = RTE_EVENT_TYPE_CPU;
> 		ev.sub_event_type = 4;
> 		ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
> 		output_tx_port_queue = find_output_tx_queue_and_tx_port(ev.mbuff);
> 		ev.flow_id =  output_tx_port_queue;
> 		ev.op = RTE_EVENT_OP_FORWARD;
> 		rte_event_enqueue_burst(ev,..);
> 
> 	} else if(ev.event_type == RTE_EVENT_TYPE_CPU && ev.sub_event_type == 4) { /* stage 4 */
> 		rte_eth_tx_buffer();
> 	}
> }
> 
> /Jerin
> Cavium
  
Nipun Gupta Feb. 8, 2017, 6:02 p.m. UTC | #3
> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Wednesday, February 08, 2017 15:53
> To: Harry van Haaren <harry.van.haaren@intel.com>
> Cc: dev@dpdk.org; Bruce Richardson <bruce.richardson@intel.com>; David
> Hunt <david.hunt@intel.com>; Nipun Gupta <nipun.gupta@nxp.com>; Hemant
> Agrawal <hemant.agrawal@nxp.com>; gage.eads@intel.com
> Subject: Re: [PATCH v2 15/15] app/test: add unit tests for SW eventdev driver
> 
> On Tue, Jan 31, 2017 at 04:14:33PM +0000, Harry van Haaren wrote:
> > From: Bruce Richardson <bruce.richardson@intel.com>
> >
> > Since the sw driver is a standalone lookaside device that has no HW
> > requirements, we can provide a set of unit tests that test its
> > functionality across the different queue types and with different input
> > scenarios.
> >
> 
> Thanks for SW driver specific test cases. It provided me a good insight
> of expected application behavior from SW driver perspective and in turn it
> created
> some challenge in portable applications.
> 
> I would like highlight a main difference between the implementation and get a
> consensus on how to abstract it?
> 
> Based on existing header file, We can do event pipelining in two different ways
> a) Flow-based event pipelining
> b) queue_id based event pipelining
> 
> I will provide an example to showcase application flow in both modes.
> Based on my understanding from SW driver source code, it supports only
> queue_id based event pipelining. I guess, Flow based event pipelining will
> work semantically with SW driver but it will be very slow.
> 
> I think, the reason for the difference is the capability of the context definition.
> SW model the context is - queue_id
> Cavium HW model the context is queue_id + flow_id + sub_event_type +
> event_type
> 
> AFAIK, queue_id based event pipelining will work with NXP HW but I am not
> sure about flow based event pipelining model with NXP HW. Appreciate any
> input this?

[Nipun] Yes Jerin, that's right. NXP HW will not be suitable for flow based event pipelining.

> 
> In Cavium HW, We support both modes.
> 
> As an open question, Should we add a capability flag to advertise the supported
> models and let application choose the model based on implementation
> capability. The
> downside is, a small portion of stage advance code will be different but we
> can reuse the STAGE specific application code(I think it a fair
> trade off)
> 
> Bruce, Harry, Gage, Hemant, Nipun
> Thoughts? Or any other proposal?
> 
> I will take an non trivial realworld NW use case show the difference.
> A standard IPSec outbound processing will have minimum 4 to 5 stages
> 
> stage_0:
> --------
> a) Takes the pkts from ethdev and push to eventdev as
> RTE_EVENT_OP_NEW
> b) Some HW implementation, This will be done by HW. In SW implementation
> it done by service cores
> 
> stage_1:(ORDERED)
> ------------------
> a) Receive pkts from stage_0 in ORDERED flow and it process in parallel on N
> of cores
> b) Find a SA belongs that packet move to next stage for SA specific
> outbound operations.Outbound processing starts with updating the
> sequence number in the critical section and followed by packet encryption in
> parallel.
> 
> stage_2(ATOMIC) based on SA
> ----------------------------
> a) Update the sequence number and move to ORDERED sched_type for packet
> encryption in parallel
> 
> stage_3(ORDERED) based on SA
> ----------------------------
> a) Encrypt the packets in parallel
> b) Do output route look-up and figure out tx port and queue to transmit
> the packet
> c) Move to ATOMIC stage based on tx port and tx queue_id to transmit
> the packet _without_ losing the ingress ordering
> 
> stage_4(ATOMIC) based on tx port/tx queue
> -----------------------------------------
> a) enqueue the encrypted packet to ethdev tx port/tx_queue
> 
> 
> 1) queue_id based event pipelining
> =================================
> 
> stage_1_work(assigned to event queue 1)# N ports/N cores establish
> link to queue 1 through rte_event_port_link()
> 
> on_each_cores_linked_to_queue1(stage1)
> while(1)
> {
>                 /* STAGE 1 processing */
>                 nr_events = rte_event_dequeue_burst(ev,..);
>                 if (!nr_events);
>                                 continue;
> 
>                 sa = find_sa_from_packet(ev.mbuf);
> 
>                 /* move to next stage(ATOMIC) */
>                 ev.event_type = RTE_EVENT_TYPE_CPU;
>                 ev.sub_event_type = 2;
>                 ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
>                 ev.flow_id =  sa;
>                 ev.op = RTE_EVENT_OP_FORWARD;
>                 ev.queue_id = 2;
>                 /* move to stage 2(event queue 2) */
>                 rte_event_enqueue_burst(ev,..);
> }
> 
> on_each_cores_linked_to_queue2(stage2)
> while(1)
> {
>                 /* STAGE 2 processing */
>                 nr_events = rte_event_dequeue_burst(ev,..);
>                 if (!nr_events);
> 			continue;
> 
>                 sa_specific_atomic_processing(sa /* ev.flow_id */);/* seq number
> update in critical section */
> 
>                 /* move to next stage(ORDERED) */
>                 ev.event_type = RTE_EVENT_TYPE_CPU;
>                 ev.sub_event_type = 3;
>                 ev.sched_type = RTE_SCHED_TYPE_ORDERED;
>                 ev.flow_id =  sa;

[Nipun] Queue1 has flow_id set to an 'sa' with sched_type RTE_SCHED_TYPE_ATOMIC, and
Queue2 has the same flow_id but with sched_type RTE_SCHED_TYPE_ORDERED.
Does this mean that the same flow_id can be associated with different RTE_SCHED_TYPE_* sched_types?

My understanding is that one flow can either be parallel or atomic or ordered.
rte_eventdev.h states that the sched_type is associated with the flow_id, which also seems legitimate:
		uint8_t sched_type:2;
		/**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
		 * associated with flow id on a given event queue
		 * for the enqueue and dequeue operation.
		 */

>                 ev.op = RTE_EVENT_OP_FORWARD;
>                 ev.queue_id = 3;
>                 /* move to stage 3(event queue 3) */
>                 rte_event_enqueue_burst(ev,..);
> }
> 
> on_each_cores_linked_to_queue3(stage3)
> while(1)
> {
>                 /* STAGE 3 processing */
>                 nr_events = rte_event_dequeue_burst(ev,..);
>                 if (!nr_events);
> 			continue;
> 
>                 sa_specific_ordered_processing(sa /*ev.flow_id */);/* packets
> encryption in parallel */
> 
>                 /* move to next stage(ATOMIC) */
>                 ev.event_type = RTE_EVENT_TYPE_CPU;
>                 ev.sub_event_type = 4;
>                 ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
> 		output_tx_port_queue =
> find_output_tx_queue_and_tx_port(ev.mbuff);
>                 ev.flow_id =  output_tx_port_queue;
>                 ev.op = RTE_EVENT_OP_FORWARD;
>                 ev.queue_id = 4;
>                 /* move to stage 4(event queue 4) */
>                 rte_event_enqueue_burst(ev,...);
> }
> 
> on_each_cores_linked_to_queue4(stage4)
> while(1)
> {
>                 /* STAGE 4 processing */
>                 nr_events = rte_event_dequeue_burst(ev,..);
>                 if (!nr_events);
> 			continue;
> 
> 		rte_eth_tx_buffer();
> }
> 
> 2) flow-based event pipelining
> =============================
> 
> - No need to partition queues for different stages
> - All the cores can operate on all the stages, Thus enables
> automatic multicore scaling, true dynamic load balancing,
> - Fairly large number of SA(kind of 2^16 to 2^20) can be processed in parallel
> Something existing IPSec application has constraints on
> http://dpdk.org/doc/guides-16.04/sample_app_ug/ipsec_secgw.html
> 
> on_each_worker_cores()
> while(1)
> {
> 	rte_event_dequeue_burst(ev,..)
> 	if (!nr_events);
> 		continue;
> 
> 	/* STAGE 1 processing */
> 	if(ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
> 		sa = find_it_from_packet(ev.mbuf);
> 		/* move to next stage2(ATOMIC) */
> 		ev.event_type = RTE_EVENT_TYPE_CPU;
> 		ev.sub_event_type = 2;
> 		ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
> 		ev.flow_id =  sa;
> 		ev.op = RTE_EVENT_OP_FORWARD;
> 		rte_event_enqueue_burst(ev..);
> 
> 	} else if(ev.event_type == RTE_EVENT_TYPE_CPU &&
> ev.sub_event_type == 2) { /* stage 2 */

[Nipun] I didn't get, in this case, on which event queue (and eventually which of
its associated event ports) the RTE_EVENT_TYPE_CPU type events will be received.

Adding on to what Harry also mentions in the other mail: if the same code is run in the case you
mentioned in '#1 - queue_id based event pipelining', after setting ev.queue_id
to the appropriate value, then #1 would also be fine. Isn't it?

> 
> 		sa_specific_atomic_processing(sa /* ev.flow_id */);/* seq
> number update in critical section */
> 		/* move to next stage(ORDERED) */
> 		ev.event_type = RTE_EVENT_TYPE_CPU;
> 		ev.sub_event_type = 3;
> 		ev.sched_type = RTE_SCHED_TYPE_ORDERED;
> 		ev.flow_id =  sa;
> 		ev.op = RTE_EVENT_OP_FORWARD;
> 		rte_event_enqueue_burst(ev,..);
> 
> 	} else if(ev.event_type == RTE_EVENT_TYPE_CPU &&
> ev.sub_event_type == 3) { /* stage 3 */
> 
> 		sa_specific_ordered_processing(sa /* ev.flow_id */);/* like
> encrypting packets in parallel */
> 		/* move to next stage(ATOMIC) */
> 		ev.event_type = RTE_EVENT_TYPE_CPU;
> 		ev.sub_event_type = 4;
> 		ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
> 		output_tx_port_queue =
> find_output_tx_queue_and_tx_port(ev.mbuff);
> 		ev.flow_id =  output_tx_port_queue;
> 		ev.op = RTE_EVENT_OP_FORWARD;
> 		rte_event_enqueue_burst(ev,..);
> 
> 	} else if(ev.event_type == RTE_EVENT_TYPE_CPU &&
> ev.sub_event_type == 4) { /* stage 4 */
> 		rte_eth_tx_buffer();
> 	}
> }
> 
> /Jerin
> Cavium
  
Jerin Jacob Feb. 11, 2017, 9:13 a.m. UTC | #4
On Tue, Jan 31, 2017 at 04:14:33PM +0000, Harry van Haaren wrote:
> From: Bruce Richardson <bruce.richardson@intel.com>
> 
> Since the sw driver is a standalone lookaside device that has no HW
> requirements, we can provide a set of unit tests that test its
> functionality across the different queue types and with different input
> scenarios.
> 
> This also adds the tests to be automatically run by autotest.py
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: David Hunt <david.hunt@intel.com>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> ---
>  app/test/Makefile           |    5 +-
>  app/test/autotest_data.py   |   26 +
>  app/test/test_sw_eventdev.c | 2071 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 2101 insertions(+), 1 deletion(-)
>  create mode 100644 app/test/test_sw_eventdev.c
> 
> diff --git a/app/test/Makefile b/app/test/Makefile
> index e28c079..1770c09 100644
> --- a/app/test/Makefile
> +++ b/app/test/Makefile
> @@ -197,7 +197,10 @@ SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_blockcipher.c
>  SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_perf.c
>  SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev.c
>  
> -SRCS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += test_eventdev.c
> +ifeq ($(CONFIG_RTE_LIBRTE_EVENTDEV),y)
> +SRCS-y += test_eventdev.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += test_sw_eventdev.c

IMO, in order to be symmetrical with the cryptodev test cases, we need to change
the name to test_eventdev_sw.c.

Another open question is: should we allow driver-specific test cases?
I think it is OK until we have mature generic unit test cases. Once we
have driver-specific test cases from multiple implementations, then we can think of
converging them all. What do you think?

> +endif
>  
>  SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c
  
Jerin Jacob Feb. 13, 2017, 11:37 a.m. UTC | #5
On Wed, Feb 08, 2017 at 06:02:26PM +0000, Nipun Gupta wrote:
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Wednesday, February 08, 2017 15:53
> > To: Harry van Haaren <harry.van.haaren@intel.com>
> > Cc: dev@dpdk.org; Bruce Richardson <bruce.richardson@intel.com>; David
> > Hunt <david.hunt@intel.com>; Nipun Gupta <nipun.gupta@nxp.com>; Hemant
> > }
> > 
> > on_each_cores_linked_to_queue2(stage2)
> > while(1)
> > {
> >                 /* STAGE 2 processing */
> >                 nr_events = rte_event_dequeue_burst(ev,..);
> >                 if (!nr_events);
> > 			continue;
> > 
> >                 sa_specific_atomic_processing(sa /* ev.flow_id */);/* seq number
> > update in critical section */
> > 
> >                 /* move to next stage(ORDERED) */
> >                 ev.event_type = RTE_EVENT_TYPE_CPU;
> >                 ev.sub_event_type = 3;
> >                 ev.sched_type = RTE_SCHED_TYPE_ORDERED;
> >                 ev.flow_id =  sa;
> 
> [Nipun] Queue1 has flow_id as an 'sa' with sched_type as RTE_SCHED_TYPE_ATOMIC and
> Queue2 has same flow_id but with sched_type as RTE_SCHED_TYPE_ORDERED.
> Does this mean that same flow_id be associated with separate RTE_SCHED_TYPE_* as sched_type?
> 
> My understanding is that one flow can either be parallel or atomic or ordered.
> The rte_eventdev.h states that sched_type is associated with flow_id, which also seems legitimate:

Yes. The sched_type is associated with the flow_id per _event queue_, so the same
flow_id can legitimately have a different sched_type on a different event queue.
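
In other words, the same flow_id value effectively names a different flow on each event
queue, so a sketch like the one below (queue/stage numbers as in the example above) is
legal:

/* stage 2: SA used as flow_id, atomic context on event queue 2 */
ev.flow_id = sa;
ev.queue_id = 2;
ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
ev.op = RTE_EVENT_OP_FORWARD;
rte_event_enqueue_burst(evdev, port, &ev, 1);

/* stage 3: the very same flow_id, but ordered context on event queue 3 */
ev.flow_id = sa;
ev.queue_id = 3;
ev.sched_type = RTE_SCHED_TYPE_ORDERED;
ev.op = RTE_EVENT_OP_FORWARD;
rte_event_enqueue_burst(evdev, port, &ev, 1);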

> 		uint8_t sched_type:2;
> 		/**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
> 		 * associated with flow id on a given event queue
> 		 * for the enqueue and dequeue operation.
> 		 */
> 
> >                 ev.op = RTE_EVENT_OP_FORWARD;
> >                 ev.queue_id = 3;
> >                 /* move to stage 3(event queue 3) */
> >                 rte_event_enqueue_burst(ev,..);
> > }
> > 
> > on_each_cores_linked_to_queue3(stage3)
> > while(1)
> > {
> >                 /* STAGE 3 processing */
> >                 nr_events = rte_event_dequeue_burst(ev,..);
> >                 if (!nr_events);
> > 			continue;
> > 
> >                 sa_specific_ordered_processing(sa /*ev.flow_id */);/* packets
> > encryption in parallel */
> > 
> >                 /* move to next stage(ATOMIC) */
> >                 ev.event_type = RTE_EVENT_TYPE_CPU;
> >                 ev.sub_event_type = 4;
> >                 ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
> > 		output_tx_port_queue =
> > find_output_tx_queue_and_tx_port(ev.mbuff);
> >                 ev.flow_id =  output_tx_port_queue;
> >                 ev.op = RTE_EVENT_OP_FORWARD;
> >                 ev.queue_id = 4;
> >                 /* move to stage 4(event queue 4) */
> >                 rte_event_enqueue_burst(ev,...);
> > }
> > 
> > on_each_cores_linked_to_queue4(stage4)
> > while(1)
> > {
> >                 /* STAGE 4 processing */
> >                 nr_events = rte_event_dequeue_burst(ev,..);
> >                 if (!nr_events);
> > 			continue;
> > 
> > 		rte_eth_tx_buffer();
> > }
> > 
> > 2) flow-based event pipelining
> > =============================
> > 
> > - No need to partition queues for different stages
> > - All the cores can operate on all the stages, Thus enables
> > automatic multicore scaling, true dynamic load balancing,
> > - Fairly large number of SA(kind of 2^16 to 2^20) can be processed in parallel
> > Something existing IPSec application has constraints on
> > http://dpdk.org/doc/guides-16.04/sample_app_ug/ipsec_secgw.html
> > 
> > on_each_worker_cores()
> > while(1)
> > {
> > 	rte_event_dequeue_burst(ev,..)
> > 	if (!nr_events);
> > 		continue;
> > 
> > 	/* STAGE 1 processing */
> > 	if(ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
> > 		sa = find_it_from_packet(ev.mbuf);
> > 		/* move to next stage2(ATOMIC) */
> > 		ev.event_type = RTE_EVENT_TYPE_CPU;
> > 		ev.sub_event_type = 2;
> > 		ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
> > 		ev.flow_id =  sa;
> > 		ev.op = RTE_EVENT_OP_FORWARD;
> > 		rte_event_enqueue_burst(ev..);
> > 
> > 	} else if(ev.event_type == RTE_EVENT_TYPE_CPU &&
> > ev.sub_event_type == 2) { /* stage 2 */
> 
> [Nipun] I didn't got that in this case on which event queue (and eventually
> its associated event ports) will the RTE_EVENT_TYPE_CPU type events be received on?

Yes. On the same queue which received the event.

> 
> Adding on to what Harry also mentions in other mail, If same code is run in the case you
> mentioned in '#1 - queue_id based event pipelining', after specifying the ev.queue_id
> with appropriate value then also #1 would be good. Isn't it?

See my earlier email

> 
> > 
> > 		sa_specific_atomic_processing(sa /* ev.flow_id */);/* seq
> > number update in critical section */
> > 		/* move to next stage(ORDERED) */
> > 		ev.event_type = RTE_EVENT_TYPE_CPU;
> > 		ev.sub_event_type = 3;
> > 		ev.sched_type = RTE_SCHED_TYPE_ORDERED;
> > 		ev.flow_id =  sa;
> > 		ev.op = RTE_EVENT_OP_FORWARD;
> > 		rte_event_enqueue_burst(ev,..);
> > 
> > 	} else if(ev.event_type == RTE_EVENT_TYPE_CPU &&
> > ev.sub_event_type == 3) { /* stage 3 */
> > 
> > 		sa_specific_ordered_processing(sa /* ev.flow_id */);/* like
> > encrypting packets in parallel */
> > 		/* move to next stage(ATOMIC) */
> > 		ev.event_type = RTE_EVENT_TYPE_CPU;
> > 		ev.sub_event_type = 4;
> > 		ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
> > 		output_tx_port_queue =
> > find_output_tx_queue_and_tx_port(ev.mbuff);
> > 		ev.flow_id =  output_tx_port_queue;
> > 		ev.op = RTE_EVENT_OP_FORWARD;
> > 		rte_event_enqueue_burst(ev,..);
> > 
> > 	} else if(ev.event_type == RTE_EVENT_TYPE_CPU &&
> > ev.sub_event_type == 4) { /* stage 4 */
> > 		rte_eth_tx_buffer();
> > 	}
> > }
> > 
> > /Jerin
> > Cavium
>
  

Patch

diff --git a/app/test/Makefile b/app/test/Makefile
index e28c079..1770c09 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -197,7 +197,10 @@  SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_blockcipher.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev.c
 
-SRCS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += test_eventdev.c
+ifeq ($(CONFIG_RTE_LIBRTE_EVENTDEV),y)
+SRCS-y += test_eventdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += test_sw_eventdev.c
+endif
 
 SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c
 
diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 0cd598b..165ed6c 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -346,6 +346,32 @@  def per_sockets(num):
 non_parallel_test_group_list = [
 
     {
+        "Prefix":    "eventdev",
+        "Memory":    "512",
+        "Tests":
+        [
+            {
+                "Name":    "Eventdev common autotest",
+                "Command": "eventdev_common_autotest",
+                "Func":    default_autotest,
+                "Report":  None,
+            },
+        ]
+    },
+    {
+        "Prefix":    "eventdev_sw",
+        "Memory":    "512",
+        "Tests":
+        [
+            {
+                "Name":    "Eventdev sw autotest",
+                "Command": "eventdev_sw_autotest",
+                "Func":    default_autotest,
+                "Report":  None,
+            },
+        ]
+    },
+    {
         "Prefix":    "kni",
         "Memory":    "512",
         "Tests":
diff --git a/app/test/test_sw_eventdev.c b/app/test/test_sw_eventdev.c
new file mode 100644
index 0000000..6322f36
--- /dev/null
+++ b/app/test/test_sw_eventdev.c
@@ -0,0 +1,2071 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sys/queue.h>
+
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_launch.h>
+#include <rte_eal.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_debug.h>
+#include <rte_ethdev.h>
+#include <rte_cycles.h>
+
+#include <rte_eventdev.h>
+#include "test.h"
+
+#define MAX_PORTS 16
+#define MAX_QIDS 16
+#define NUM_PACKETS (1<<18)
+
+static int evdev;
+
+struct test {
+	struct rte_mempool *mbuf_pool;
+	uint8_t port[MAX_PORTS];
+	uint8_t qid[MAX_QIDS];
+	int nb_qids;
+};
+
+static struct rte_event release_ev = {.op = RTE_EVENT_OP_RELEASE };
+
+static inline struct rte_mbuf *
+rte_gen_arp(int portid, struct rte_mempool *mp)
+{
+	/*
+	* len = 14 + 46
+	* ARP, Request who-has 10.0.0.1 tell 10.0.0.2, length 46
+	*/
+	static const uint8_t arp_request[] = {
+		/*0x0000:*/ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xec, 0xa8,
+		0x6b, 0xfd, 0x02, 0x29, 0x08, 0x06, 0x00, 0x01,
+		/*0x0010:*/ 0x08, 0x00, 0x06, 0x04, 0x00, 0x01, 0xec, 0xa8,
+		0x6b, 0xfd, 0x02, 0x29, 0x0a, 0x00, 0x00, 0x01,
+		/*0x0020:*/ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0a, 0x00,
+		0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		/*0x0030:*/ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		0x00, 0x00, 0x00, 0x00
+	};
+	struct rte_mbuf *m;
+	int pkt_len = sizeof(arp_request) - 1;
+
+	m = rte_pktmbuf_alloc(mp);
+	if (!m)
+		return 0;
+
+	memcpy((void *)((uintptr_t)m->buf_addr + m->data_off),
+		arp_request, pkt_len);
+	rte_pktmbuf_pkt_len(m) = pkt_len;
+	rte_pktmbuf_data_len(m) = pkt_len;
+
+	RTE_SET_USED(portid);
+
+	return m;
+}
+
+/* initialization and config */
+static inline int
+init(struct test *t, int nb_queues, int nb_ports)
+{
+	struct rte_event_dev_config config = {
+			.nb_event_queues = nb_queues,
+			.nb_event_ports = nb_ports,
+			.nb_event_queue_flows = 1024,
+			.nb_events_limit = 4096,
+			.nb_event_port_dequeue_depth = 128,
+			.nb_event_port_enqueue_depth = 128,
+	};
+	int ret;
+
+	void *temp = t->mbuf_pool; /* save and restore mbuf pool */
+
+	memset(t, 0, sizeof(*t));
+	t->mbuf_pool = temp;
+
+	ret = rte_event_dev_configure(evdev, &config);
+	if (ret < 0)
+		printf("%d: Error configuring device\n", __LINE__);
+	return ret;
+};
+
+static inline int
+create_ports(struct test *t, int num_ports)
+{
+	int i;
+	static const struct rte_event_port_conf conf = {
+			.new_event_threshold = 1024,
+			.dequeue_depth = 32,
+			.enqueue_depth = 64,
+	};
+	if (num_ports > MAX_PORTS)
+		return -1;
+
+	for (i = 0; i < num_ports; i++) {
+		if (rte_event_port_setup(evdev, i, &conf) < 0) {
+			printf("Error setting up port %d\n", i);
+			return -1;
+		}
+		t->port[i] = i;
+	}
+
+	return 0;
+}
+
+static inline int
+create_lb_qids(struct test *t, int num_qids, uint32_t flags)
+{
+	int i;
+
+	/* Q creation */
+	const struct rte_event_queue_conf conf = {
+			.event_queue_cfg = flags,
+			.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
+			.nb_atomic_flows = 1024,
+			.nb_atomic_order_sequences = 1024,
+	};
+
+	for (i = t->nb_qids; i < t->nb_qids + num_qids; i++) {
+		if (rte_event_queue_setup(evdev, i, &conf) < 0) {
+			printf("%d: error creating qid %d\n", __LINE__, i);
+			return -1;
+		}
+		t->qid[i] = i;
+	}
+	t->nb_qids += num_qids;
+	if (t->nb_qids > MAX_QIDS)
+		return -1;
+
+	return 0;
+}
+
+static inline int
+create_atomic_qids(struct test *t, int num_qids)
+{
+	return create_lb_qids(t, num_qids, RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY);
+}
+
+static inline int
+create_ordered_qids(struct test *t, int num_qids)
+{
+	return create_lb_qids(t, num_qids, RTE_EVENT_QUEUE_CFG_ORDERED_ONLY);
+}
+
+
+static inline int
+create_unordered_qids(struct test *t, int num_qids)
+{
+	return create_lb_qids(t, num_qids, RTE_EVENT_QUEUE_CFG_PARALLEL_ONLY);
+}
+
+static inline int
+create_directed_qids(struct test *t, int num_qids, const uint8_t ports[])
+{
+	int i;
+
+	/* Q creation */
+	static const struct rte_event_queue_conf conf = {
+			.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
+			.event_queue_cfg = RTE_EVENT_QUEUE_CFG_SINGLE_LINK,
+			.nb_atomic_flows = 1024,
+			.nb_atomic_order_sequences = 1024,
+	};
+
+	for (i = t->nb_qids; i < t->nb_qids + num_qids; i++) {
+		if (rte_event_queue_setup(evdev, i, &conf) < 0) {
+			printf("%d: error creating qid %d\n", __LINE__, i);
+			return -1;
+		}
+		t->qid[i] = i;
+
+		if (rte_event_port_link(evdev, ports[i - t->nb_qids],
+				&t->qid[i], NULL, 1) != 1) {
+			printf("%d: error creating link for qid %d\n",
+					__LINE__, i);
+			return -1;
+		}
+	}
+	t->nb_qids += num_qids;
+	if (t->nb_qids > MAX_QIDS)
+		return -1;
+
+	return 0;
+}
+
+/* destruction */
+static inline int
+cleanup(struct test *t __rte_unused)
+{
+	rte_event_dev_stop(evdev);
+	rte_event_dev_close(evdev);
+	return 0;
+};
+
+struct test_event_dev_stats {
+	uint64_t rx_pkts;       /**< Total packets received */
+	uint64_t rx_dropped;    /**< Total packets dropped (Eg Invalid QID) */
+	uint64_t tx_pkts;       /**< Total packets transmitted */
+
+	/** Packets received on this port */
+	uint64_t port_rx_pkts[MAX_PORTS];
+	/** Packets dropped on this port */
+	uint64_t port_rx_dropped[MAX_PORTS];
+	/** Packets inflight on this port */
+	uint64_t port_inflight[MAX_PORTS];
+	/** Packets transmitted on this port */
+	uint64_t port_tx_pkts[MAX_PORTS];
+	/** Packets received on this qid */
+	uint64_t qid_rx_pkts[MAX_QIDS];
+	/** Packets dropped on this qid */
+	uint64_t qid_rx_dropped[MAX_QIDS];
+	/** Packets transmitted on this qid */
+	uint64_t qid_tx_pkts[MAX_QIDS];
+};
+
+static inline int
+test_event_dev_stats_get(int dev_id, struct test_event_dev_stats *stats)
+{
+	static uint32_t i;
+	static uint32_t total_ids[3]; /* rx, tx and drop */
+	static uint32_t port_rx_pkts_ids[MAX_PORTS];
+	static uint32_t port_rx_dropped_ids[MAX_PORTS];
+	static uint32_t port_inflight_ids[MAX_PORTS];
+	static uint32_t port_tx_pkts_ids[MAX_PORTS];
+	static uint32_t qid_rx_pkts_ids[MAX_QIDS];
+	static uint32_t qid_rx_dropped_ids[MAX_QIDS];
+	static uint32_t qid_tx_pkts_ids[MAX_QIDS];
+
+
+	stats->rx_pkts = rte_event_dev_xstats_by_name_get(dev_id,
+			"dev_rx", &total_ids[0]);
+	stats->rx_dropped = rte_event_dev_xstats_by_name_get(dev_id,
+			"dev_drop", &total_ids[1]);
+	stats->tx_pkts = rte_event_dev_xstats_by_name_get(dev_id,
+			"dev_tx", &total_ids[2]);
+	for (i = 0; i < MAX_PORTS; i++) {
+		char name[32];
+		snprintf(name, sizeof(name), "port_%u_rx", i);
+		stats->port_rx_pkts[i] = rte_event_dev_xstats_by_name_get(
+				dev_id, name, &port_rx_pkts_ids[i]);
+		snprintf(name, sizeof(name), "port_%u_drop", i);
+		stats->port_rx_dropped[i] = rte_event_dev_xstats_by_name_get(
+				dev_id, name, &port_rx_dropped_ids[i]);
+		snprintf(name, sizeof(name), "port_%u_inflight", i);
+		stats->port_inflight[i] = rte_event_dev_xstats_by_name_get(
+				dev_id, name, &port_inflight_ids[i]);
+		snprintf(name, sizeof(name), "port_%u_tx", i);
+		stats->port_tx_pkts[i] = rte_event_dev_xstats_by_name_get(
+				dev_id, name, &port_tx_pkts_ids[i]);
+	}
+	for (i = 0; i < MAX_QIDS; i++) {
+		char name[32];
+		snprintf(name, sizeof(name), "qid_%u_rx", i);
+		stats->qid_rx_pkts[i] = rte_event_dev_xstats_by_name_get(
+				dev_id, name, &qid_rx_pkts_ids[i]);
+		snprintf(name, sizeof(name), "qid_%u_drop", i);
+		stats->qid_rx_dropped[i] = rte_event_dev_xstats_by_name_get(
+				dev_id, name, &qid_rx_dropped_ids[i]);
+		snprintf(name, sizeof(name), "qid_%u_tx", i);
+		stats->qid_tx_pkts[i] = rte_event_dev_xstats_by_name_get(
+				dev_id, name, &qid_tx_pkts_ids[i]);
+	}
+
+	return 0;
+}
+
+/* run_prio_packet_test
+ * This performs a basic packet priority check on the test instance passed in.
+ * It is factored out of the main priority tests as the same tests must be
+ * performed to ensure prioritization of each type of QID.
+ *
+ * Requirements:
+ *  - An initialized test structure, including mempool
+ *  - t->port[0] is initialized for both Enq / Deq of packets to the QID
+ *  - t->qid[0] is the QID to be tested
+ *  - if LB QID, the CQ must be mapped to the QID.
+ */
+static int
+run_prio_packet_test(struct test *t)
+{
+	int err;
+	const uint32_t MAGIC_SEQN[] = {4711, 1234};
+	const uint32_t PRIORITY[] = {3, 0};
+	unsigned int i;
+	for (i = 0; i < RTE_DIM(MAGIC_SEQN); i++) {
+		/* generate pkt and enqueue */
+		struct rte_event ev;
+		struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
+		if (!arp) {
+			printf("%d: gen of pkt failed\n", __LINE__);
+			return -1;
+		}
+		arp->seqn = MAGIC_SEQN[i];
+
+		ev = (struct rte_event){
+			.priority = PRIORITY[i],
+			.op = RTE_EVENT_OP_NEW,
+			.queue_id = t->qid[0],
+			.mbuf = arp
+		};
+		err = rte_event_enqueue_burst(evdev, t->port[0], &ev, 1);
+		if (err < 0) {
+			printf("%d: error failed to enqueue\n", __LINE__);
+			return -1;
+		}
+	}
+
+	rte_event_schedule(evdev);
+
+	struct test_event_dev_stats stats;
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (err) {
+		printf("%d: error failed to get stats\n", __LINE__);
+		return -1;
+	}
+
+	if (stats.port_rx_pkts[t->port[0]] != 2) {
+		printf("%d: error stats incorrect for directed port\n",
+				__LINE__);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+
+	struct rte_event ev, ev2;
+	uint32_t deq_pkts;
+	deq_pkts = rte_event_dequeue_burst(evdev, t->port[0], &ev, 1, 0);
+	if (deq_pkts != 1) {
+		printf("%d: error failed to deq\n", __LINE__);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+	if (ev.mbuf->seqn != MAGIC_SEQN[1]) {
+		printf("%d: first packet out not highest priority\n",
+				__LINE__);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+	rte_pktmbuf_free(ev.mbuf);
+
+	deq_pkts = rte_event_dequeue_burst(evdev, t->port[0], &ev2, 1, 0);
+	if (deq_pkts != 1) {
+		printf("%d: error failed to deq\n", __LINE__);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+	if (ev2.mbuf->seqn != MAGIC_SEQN[0]) {
+		printf("%d: second packet out not lower priority\n",
+				__LINE__);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+	rte_pktmbuf_free(ev2.mbuf);
+
+	cleanup(t);
+	return 0;
+}
+
+static int
+test_single_directed_packet(struct test *t)
+{
+	const int rx_enq = 0;
+	const int wrk_enq = 2;
+	int err;
+
+	/* Create instance with 3 directed QIDs going to 3 ports */
+	if (init(t, 3, 3) < 0 ||
+			create_ports(t, 3) < 0 ||
+			create_directed_qids(t, 3, t->port) < 0)
+		return -1;
+
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	/************** FORWARD ****************/
+	struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
+	struct rte_event ev = {
+			.op = RTE_EVENT_OP_NEW,
+			.queue_id = wrk_enq,
+			.mbuf = arp,
+	};
+
+	if (!arp) {
+		printf("%d: gen of pkt failed\n", __LINE__);
+		return -1;
+	}
+
+	const uint32_t MAGIC_SEQN = 4711;
+	arp->seqn = MAGIC_SEQN;
+
+	/* generate pkt and enqueue */
+	err = rte_event_enqueue_burst(evdev, rx_enq, &ev, 1);
+	if (err < 0) {
+		printf("%d: error failed to enqueue\n", __LINE__);
+		return -1;
+	}
+
+	/* Run schedule() as dir packets may need to be re-ordered */
+	rte_event_schedule(evdev);
+
+	struct test_event_dev_stats stats;
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (err) {
+		printf("%d: error failed to get stats\n", __LINE__);
+		return -1;
+	}
+
+	if (stats.port_rx_pkts[rx_enq] != 1) {
+		printf("%d: error stats incorrect for directed port\n",
+				__LINE__);
+		return -1;
+	}
+
+	uint32_t deq_pkts;
+	deq_pkts = rte_event_dequeue_burst(evdev, wrk_enq, &ev, 1, 0);
+	if (deq_pkts != 1) {
+		printf("%d: error failed to deq\n", __LINE__);
+		return -1;
+	}
+
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (stats.port_rx_pkts[wrk_enq] != 0 &&
+			stats.port_rx_pkts[wrk_enq] != 1) {
+		printf("%d: error directed stats post-dequeue\n", __LINE__);
+		return -1;
+	}
+
+	if (ev.mbuf->seqn != MAGIC_SEQN) {
+		printf("%d: error magic sequence number not dequeued\n",
+				__LINE__);
+		return -1;
+	}
+
+	rte_pktmbuf_free(ev.mbuf);
+	cleanup(t);
+	return 0;
+}
+
+
+static int
+test_priority_directed(struct test *t)
+{
+	if (init(t, 1, 1) < 0 ||
+			create_ports(t, 1) < 0 ||
+			create_directed_qids(t, 1, t->port) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	return run_prio_packet_test(t);
+}
+
+static int
+test_priority_atomic(struct test *t)
+{
+	if (init(t, 1, 1) < 0 ||
+			create_ports(t, 1) < 0 ||
+			create_atomic_qids(t, 1) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	/* map the QID */
+	if (rte_event_port_link(evdev, t->port[0], &t->qid[0], NULL, 1) != 1) {
+		printf("%d: error mapping qid to port\n", __LINE__);
+		return -1;
+	}
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	return run_prio_packet_test(t);
+}
+
+static int
+test_priority_ordered(struct test *t)
+{
+	if (init(t, 1, 1) < 0 ||
+			create_ports(t, 1) < 0 ||
+			create_ordered_qids(t, 1) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	/* map the QID */
+	if (rte_event_port_link(evdev, t->port[0], &t->qid[0], NULL, 1) != 1) {
+		printf("%d: error mapping qid to port\n", __LINE__);
+		return -1;
+	}
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	return run_prio_packet_test(t);
+}
+
+static int
+test_priority_unordered(struct test *t)
+{
+	if (init(t, 1, 1) < 0 ||
+			create_ports(t, 1) < 0 ||
+			create_unordered_qids(t, 1) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	/* map the QID */
+	if (rte_event_port_link(evdev, t->port[0], &t->qid[0], NULL, 1) != 1) {
+		printf("%d: error mapping qid to port\n", __LINE__);
+		return -1;
+	}
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	return run_prio_packet_test(t);
+}
+
+static int
+burst_packets(struct test *t)
+{
+	/************** CONFIG ****************/
+	uint32_t i;
+	int err;
+	int ret;
+
+	/* Create instance with 2 ports and 2 queues */
+	if (init(t, 2, 2) < 0 ||
+			create_ports(t, 2) < 0 ||
+			create_atomic_qids(t, 2) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	/* CQ mapping to QID */
+	ret = rte_event_port_link(evdev, t->port[0], &t->qid[0], NULL, 1);
+	if (ret != 1) {
+		printf("%d: error mapping lb qid0\n", __LINE__);
+		return -1;
+	}
+	ret = rte_event_port_link(evdev, t->port[1], &t->qid[1], NULL, 1);
+	if (ret != 1) {
+		printf("%d: error mapping lb qid1\n", __LINE__);
+		return -1;
+	}
+
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	/************** FORWARD ****************/
+	const uint32_t rx_port = 0;
+	const uint32_t NUM_PKTS = 2;
+
+	for (i = 0; i < NUM_PKTS; i++) {
+		struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
+		if (!arp) {
+			printf("%d: error generating pkt\n", __LINE__);
+			return -1;
+		}
+
+		struct rte_event ev = {
+				.op = RTE_EVENT_OP_NEW,
+				.queue_id = i % 2,
+				.flow_id = i % 3,
+				.mbuf = arp,
+		};
+		/* generate pkt and enqueue */
+		err = rte_event_enqueue_burst(evdev, t->port[rx_port], &ev, 1);
+		if (err < 0) {
+			printf("%d: Failed to enqueue\n", __LINE__);
+			return -1;
+		}
+	}
+	rte_event_schedule(evdev);
+
+	/* Check stats for all NUM_PKTS arrived to sched core */
+	struct test_event_dev_stats stats;
+
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (err) {
+		printf("%d: failed to get stats\n", __LINE__);
+		return -1;
+	}
+	if (stats.rx_pkts != NUM_PKTS || stats.tx_pkts != NUM_PKTS) {
+		printf("%d: Sched core didn't receive all %d pkts\n",
+				__LINE__, NUM_PKTS);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+
+	uint32_t deq_pkts;
+	int p;
+
+	deq_pkts = 0;
+	/******** DEQ QID 1 *******/
+	do {
+		struct rte_event ev;
+		p = rte_event_dequeue_burst(evdev, t->port[0], &ev, 1, 0);
+		deq_pkts += p;
+		rte_pktmbuf_free(ev.mbuf);
+	} while (p);
+
+	if (deq_pkts != NUM_PKTS/2) {
+		printf("%d: Half of NUM_PKTS didn't arrive at port 1\n",
+				__LINE__);
+		return -1;
+	}
+
+	/******** DEQ QID 2 *******/
+	deq_pkts = 0;
+	do {
+		struct rte_event ev;
+		p = rte_event_dequeue_burst(evdev, t->port[1], &ev, 1, 0);
+		deq_pkts += p;
+		rte_pktmbuf_free(ev.mbuf);
+	} while (p);
+	if (deq_pkts != NUM_PKTS/2) {
+		printf("%d: Half of NUM_PKTS didn't arrive at port 2\n",
+				__LINE__);
+		return -1;
+	}
+
+	cleanup(t);
+	return 0;
+}
+
+static int
+abuse_inflights(struct test *t)
+{
+	const int rx_enq = 0;
+	const int wrk_enq = 2;
+	int err;
+
+	/* Create instance with 4 ports */
+	if (init(t, 1, 4) < 0 ||
+			create_ports(t, 4) < 0 ||
+			create_atomic_qids(t, 1) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	/* CQ mapping to QID */
+	err = rte_event_port_link(evdev, t->port[wrk_enq], NULL, NULL, 0);
+	if (err != 1) {
+		printf("%d: error mapping lb qid\n", __LINE__);
+		cleanup(t);
+		return -1;
+	}
+
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	/* Enqueue op only */
+	err = rte_event_enqueue_burst(evdev, t->port[rx_enq], &release_ev, 1);
+	if (err < 0) {
+		printf("%d: Failed to enqueue\n", __LINE__);
+		return -1;
+	}
+
+	/* schedule */
+	rte_event_schedule(evdev);
+
+	struct test_event_dev_stats stats;
+
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (err) {
+		printf("%d: failed to get stats\n", __LINE__);
+		return -1;
+	}
+
+	if (stats.rx_pkts != 0 ||
+			stats.tx_pkts != 0 ||
+			stats.port_inflight[wrk_enq] != 0) {
+		printf("%d: Sched core didn't handle pkt as expected\n",
+				__LINE__);
+		return -1;
+	}
+
+	cleanup(t);
+	return 0;
+}
+
+static int
+qid_priorities(struct test *t)
+{
+	/* Test works by having a CQ with enough empty space for all packets,
+	 * and enqueueing 3 packets to 3 QIDs. They must return based on the
+	 * priority of the QID, not the ingress order, to pass the test
+	 */
+	unsigned int i;
+	/* Create instance with 1 ports, and 3 qids */
+	if (init(t, 3, 1) < 0 ||
+			create_ports(t, 1) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	for (i = 0; i < 3; i++) {
+		/* Create QID */
+		const struct rte_event_queue_conf conf = {
+			.event_queue_cfg = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY,
+			/* increase priority (0 == highest), as we go */
+			.priority = RTE_EVENT_DEV_PRIORITY_NORMAL - i,
+			.nb_atomic_flows = 1024,
+			.nb_atomic_order_sequences = 1024,
+		};
+
+		if (rte_event_queue_setup(evdev, i, &conf) < 0) {
+			printf("%d: error creating qid %d\n", __LINE__, i);
+			return -1;
+		}
+		t->qid[i] = i;
+	}
+	t->nb_qids = i;
+	/* map all QIDs to port */
+	rte_event_port_link(evdev, t->port[0], NULL, NULL, 0);
+
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
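+	/*
+	 * qid[2] was created with the numerically lowest, and therefore
+	 * highest, priority, so its packet (seqn 2) should be dequeued
+	 * first, then seqn 1, then seqn 0.
+	 */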
+	/* enqueue 3 packets, setting seqn and QID to check priority */
+	for (i = 0; i < 3; i++) {
+		struct rte_event ev;
+		struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
+		if (!arp) {
+			printf("%d: gen of pkt failed\n", __LINE__);
+			return -1;
+		}
+		ev.queue_id = t->qid[i];
+		ev.op = RTE_EVENT_OP_NEW;
+		ev.mbuf = arp;
+		arp->seqn = i;
+
+		int err = rte_event_enqueue_burst(evdev, t->port[0], &ev, 1);
+		if (err != 1) {
+			printf("%d: Failed to enqueue\n", __LINE__);
+			return -1;
+		}
+	}
+
+	rte_event_schedule(evdev);
+
+	/* dequeue packets, verify priority was upheld */
+	struct rte_event ev[32];
+	uint32_t deq_pkts =
+		rte_event_dequeue_burst(evdev, t->port[0], ev, 32, 0);
+	if (deq_pkts != 3) {
+		printf("%d: failed to deq packets\n", __LINE__);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+	for (i = 0; i < 3; i++) {
+		if (ev[i].mbuf->seqn != 2-i) {
+			printf(
+				"%d: qid priority test: seqn %u incorrectly prioritized\n",
+					__LINE__, i);
+			return -1;
+		}
+	}
+
+	cleanup(t);
+	return 0;
+}
+
+static int
+load_balancing(struct test *t)
+{
+	const int rx_enq = 0;
+	int err;
+	uint32_t i;
+
+	if (init(t, 1, 4) < 0 ||
+			create_ports(t, 4) < 0 ||
+			create_atomic_qids(t, 1) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	for (i = 0; i < 3; i++) {
+		/* map port 1 - 3 inclusive */
+		if (rte_event_port_link(evdev, t->port[i+1], &t->qid[0],
+				NULL, 1) != 1) {
+			printf("%d: error mapping qid to port %d\n",
+					__LINE__, i);
+			return -1;
+		}
+	}
+
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	/************** FORWARD ****************/
+	/*
+	 * Create a set of flows that test the load-balancing operation of the
+	 * implementation. Fill CQ 0 and 1 with flows 0 and 1, and test
+	 * with a new flow, which should be sent to the 3rd mapped CQ
+	 */
+	static uint32_t flows[] = {0, 1, 1, 0, 0, 2, 2, 0, 2};
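+	/*
+	 * Flow 0 appears four times, flow 1 twice and flow 2 three times,
+	 * so the per-port inflight counts checked below should be 4, 2 and
+	 * 3 for ports 1, 2 and 3 respectively.
+	 */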
+
+	for (i = 0; i < RTE_DIM(flows); i++) {
+		struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
+		if (!arp) {
+			printf("%d: gen of pkt failed\n", __LINE__);
+			return -1;
+		}
+
+		struct rte_event ev = {
+				.op = RTE_EVENT_OP_NEW,
+				.queue_id = t->qid[0],
+				.flow_id = flows[i],
+				.mbuf = arp,
+		};
+		/* generate pkt and enqueue */
+		err = rte_event_enqueue_burst(evdev, t->port[rx_enq], &ev, 1);
+		if (err < 0) {
+			printf("%d: Failed to enqueue\n", __LINE__);
+			return -1;
+		}
+	}
+
+	rte_event_schedule(evdev);
+
+	struct test_event_dev_stats stats;
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (err) {
+		printf("%d: failed to get stats\n", __LINE__);
+		return -1;
+	}
+
+	if (stats.port_inflight[1] != 4) {
+		printf("%d:%s: port 1 inflight not correct\n", __LINE__,
+				__func__);
+		return -1;
+	}
+	if (stats.port_inflight[2] != 2) {
+		printf("%d:%s: port 2 inflight not correct\n", __LINE__,
+				__func__);
+		return -1;
+	}
+	if (stats.port_inflight[3] != 3) {
+		printf("%d:%s: port 3 inflight not correct\n", __LINE__,
+				__func__);
+		return -1;
+	}
+
+	cleanup(t);
+	return 0;
+}
+
+static int
+load_balancing_history(struct test *t)
+{
+	struct test_event_dev_stats stats = {0};
+	const int rx_enq = 0;
+	int err;
+	uint32_t i;
+
+	/* Create instance with 1 atomic QID going to 3 ports + 1 prod port */
+	if (init(t, 1, 4) < 0 ||
+			create_ports(t, 4) < 0 ||
+			create_atomic_qids(t, 1) < 0)
+		return -1;
+
+	/* CQ mapping to QID */
+	if (rte_event_port_link(evdev, t->port[1], &t->qid[0], NULL, 1) != 1) {
+		printf("%d: error mapping port 1 qid\n", __LINE__);
+		return -1;
+	}
+	if (rte_event_port_link(evdev, t->port[2], &t->qid[0], NULL, 1) != 1) {
+		printf("%d: error mapping port 2 qid\n", __LINE__);
+		return -1;
+	}
+	if (rte_event_port_link(evdev, t->port[3], &t->qid[0], NULL, 1) != 1) {
+		printf("%d: error mapping port 3 qid\n", __LINE__);
+		return -1;
+	}
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	/*
+	 * Create a set of flows that test the load-balancing operation of the
+	 * implementation. Fill CQ 0, 1 and 2 with flows 0, 1 and 2, drop
+	 * the packet from CQ 0, send in a new set of flows. Ensure that:
+	 *  1. The new flow 3 gets into the empty CQ0
+	 *  2. packets for existing flow gets added into CQ1
+	 *  3. Next flow 0 pkt is now onto CQ2, since CQ0 and CQ1 now contain
+	 *     more outstanding pkts
+	 *
+	 *  This test makes sure that when a flow ends (i.e. all packets
+	 *  have been completed for that flow), that the flow can be moved
+	 *  to a different CQ when new packets come in for that flow.
+	 */
+	static uint32_t flows1[] = {0, 1, 1, 2};
+
+	for (i = 0; i < RTE_DIM(flows1); i++) {
+		struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
+		struct rte_event ev = {
+				.flow_id = flows1[i],
+				.op = RTE_EVENT_OP_NEW,
+				.queue_id = t->qid[0],
+				.event_type = RTE_EVENT_TYPE_CPU,
+				.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
+				.mbuf = arp
+		};
+
+		if (!arp) {
+			printf("%d: gen of pkt failed\n", __LINE__);
+			return -1;
+		}
+		arp->hash.rss = flows1[i];
+		err = rte_event_enqueue_burst(evdev, t->port[rx_enq], &ev, 1);
+		if (err < 0) {
+			printf("%d: Failed to enqueue\n", __LINE__);
+			return -1;
+		}
+	}
+
+	/* call the scheduler */
+	rte_event_schedule(evdev);
+
+	/* Dequeue the flow 0 packet from port 1, so that we can then drop */
+	struct rte_event ev;
+	if (!rte_event_dequeue_burst(evdev, t->port[1], &ev, 1, 0)) {
+		printf("%d: failed to dequeue\n", __LINE__);
+		return -1;
+	}
+	if (ev.mbuf->hash.rss != flows1[0]) {
+		printf("%d: unexpected flow received\n", __LINE__);
+		return -1;
+	}
+
+	/* drop the flow 0 packet from port 1 */
+	rte_event_enqueue_burst(evdev, t->port[1], &release_ev, 1);
+
+	/* call the scheduler */
+	rte_event_schedule(evdev);
+
+	/*
+	 * Set up the next set of flows, first a new flow to fill up
+	 * CQ 0, so that the next flow 0 packet should go to CQ2
+	 */
+	static uint32_t flows2[] = { 3, 3, 3, 1, 1, 0 };
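+	/*
+	 * Expected outcome: flow 3 (3 pkts) fills the now-empty CQ of
+	 * port 1, flow 1 (2 pkts) sticks with port 2 for 4 inflights in
+	 * total, and the flow 0 pkt moves to port 3 (2 inflights), the
+	 * least loaded port.
+	 */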
+
+	for (i = 0; i < RTE_DIM(flows2); i++) {
+		struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
+		struct rte_event ev = {
+				.flow_id = flows2[i],
+				.op = RTE_EVENT_OP_NEW,
+				.queue_id = t->qid[0],
+				.event_type = RTE_EVENT_TYPE_CPU,
+				.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
+				.mbuf = arp
+		};
+
+		if (!arp) {
+			printf("%d: gen of pkt failed\n", __LINE__);
+			return -1;
+		}
+		arp->hash.rss = flows2[i];
+
+		err = rte_event_enqueue_burst(evdev, t->port[rx_enq], &ev, 1);
+		if (err < 0) {
+			printf("%d: Failed to enqueue\n", __LINE__);
+			return -1;
+		}
+	}
+
+	/* schedule */
+	rte_event_schedule(evdev);
+
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (err) {
+		printf("%d:failed to get stats\n", __LINE__);
+		return -1;
+	}
+
+	/*
+	 * Now check the resulting inflights on each port.
+	 */
+	if (stats.port_inflight[1] != 3) {
+		printf("%d:%s: port 1 inflight not correct\n", __LINE__,
+				__func__);
+		printf("Inflights, ports 1, 2, 3: %u, %u, %u\n",
+				(unsigned int)stats.port_inflight[1],
+				(unsigned int)stats.port_inflight[2],
+				(unsigned int)stats.port_inflight[3]);
+		return -1;
+	}
+	if (stats.port_inflight[2] != 4) {
+		printf("%d:%s: port 2 inflight not correct\n", __LINE__,
+				__func__);
+		printf("Inflights, ports 1, 2, 3: %u, %u, %u\n",
+				(unsigned int)stats.port_inflight[1],
+				(unsigned int)stats.port_inflight[2],
+				(unsigned int)stats.port_inflight[3]);
+		return -1;
+	}
+	if (stats.port_inflight[3] != 2) {
+		printf("%d:%s: port 3 inflight not correct\n", __LINE__,
+				__func__);
+		printf("Inflights, ports 1, 2, 3: %u, %u, %u\n",
+				(unsigned int)stats.port_inflight[1],
+				(unsigned int)stats.port_inflight[2],
+				(unsigned int)stats.port_inflight[3]);
+		return -1;
+	}
+
+	for (i = 1; i <= 3; i++) {
+		struct rte_event ev;
+		while (rte_event_dequeue_burst(evdev, i, &ev, 1, 0))
+			rte_event_enqueue_burst(evdev, i, &release_ev, 1);
+	}
+	rte_event_schedule(evdev);
+
+	cleanup(t);
+	return 0;
+}
+
+static int
+invalid_qid(struct test *t)
+{
+	struct test_event_dev_stats stats;
+	const int rx_enq = 0;
+	int err;
+	uint32_t i;
+
+	if (init(t, 1, 4) < 0 ||
+			create_ports(t, 4) < 0 ||
+			create_atomic_qids(t, 1) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	/* CQ mapping to QID */
+	for (i = 0; i < 4; i++) {
+		err = rte_event_port_link(evdev, t->port[i], &t->qid[0],
+				NULL, 1);
+		if (err != 1) {
+			printf("%d: error mapping port %u qid\n", __LINE__, i);
+			return -1;
+		}
+	}
+
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	/*
+	 * Send in a packet with an invalid qid to the scheduler.
+	 * We should see the packet enqueued OK, but the inflights for
+	 * that packet should not be incremented, and the rx_dropped
+	 * should be incremented.
+	 */
+	static uint32_t flows1[] = {20};
+
+	for (i = 0; i < RTE_DIM(flows1); i++) {
+		struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
+		if (!arp) {
+			printf("%d: gen of pkt failed\n", __LINE__);
+			return -1;
+		}
+
+		struct rte_event ev = {
+				.op = RTE_EVENT_OP_NEW,
+				.queue_id = t->qid[0] + flows1[i],
+				.flow_id = i,
+				.mbuf = arp,
+		};
+		/* generate pkt and enqueue */
+		err = rte_event_enqueue_burst(evdev, t->port[rx_enq], &ev, 1);
+		if (err < 0) {
+			printf("%d: Failed to enqueue\n", __LINE__);
+			return -1;
+		}
+	}
+
+	/* call the scheduler */
+	rte_event_schedule(evdev);
+
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (err) {
+		printf("%d: failed to get stats\n", __LINE__);
+		return -1;
+	}
+
+	/*
+	 * Now check the resulting inflights on the port, and the rx_dropped.
+	 */
+	if (stats.port_inflight[0] != 0) {
+		printf("%d:%s: port 0 inflight count not correct\n", __LINE__,
+				__func__);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+	if (stats.port_rx_dropped[0] != 1) {
+		printf("%d:%s: port 0 rx_dropped not incremented\n",
+				__LINE__, __func__);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+	/* each packet drop should only be counted once: port or dev */
+	if (stats.rx_dropped != 0) {
+		printf("%d:%s: device rx_dropped count not correct\n",
+				__LINE__, __func__);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+
+	cleanup(t);
+	return 0;
+}
+
+static int
+single_packet(struct test *t)
+{
+	const uint32_t MAGIC_SEQN = 7321;
+	struct rte_event ev;
+	struct test_event_dev_stats stats;
+	const int rx_enq = 0;
+	const int wrk_enq = 2;
+	int err;
+
+	/* Create instance with 4 ports */
+	if (init(t, 1, 4) < 0 ||
+			create_ports(t, 4) < 0 ||
+			create_atomic_qids(t, 1) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	/* CQ mapping to QID */
+	err = rte_event_port_link(evdev, t->port[wrk_enq], NULL, NULL, 0);
+	if (err != 1) {
+		printf("%d: error mapping lb qid\n", __LINE__);
+		cleanup(t);
+		return -1;
+	}
+
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	/************** Gen pkt and enqueue ****************/
+	struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
+	if (!arp) {
+		printf("%d: gen of pkt failed\n", __LINE__);
+		return -1;
+	}
+
+	ev.op = RTE_EVENT_OP_NEW;
+	ev.priority = RTE_EVENT_DEV_PRIORITY_NORMAL;
+	ev.mbuf = arp;
+	ev.queue_id = 0;
+	ev.flow_id = 3;
+	arp->seqn = MAGIC_SEQN;
+
+	err = rte_event_enqueue_burst(evdev, t->port[rx_enq], &ev, 1);
+	if (err < 0) {
+		printf("%d: Failed to enqueue\n", __LINE__);
+		return -1;
+	}
+
+	rte_event_schedule(evdev);
+
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (err) {
+		printf("%d: failed to get stats\n", __LINE__);
+		return -1;
+	}
+
+	if (stats.rx_pkts != 1 ||
+			stats.tx_pkts != 1 ||
+			stats.port_inflight[wrk_enq] != 1) {
+		printf("%d: Sched core didn't handle pkt as expected\n",
+				__LINE__);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+
+	uint32_t deq_pkts;
+
+	deq_pkts = rte_event_dequeue_burst(evdev, t->port[wrk_enq], &ev, 1, 0);
+	if (deq_pkts < 1) {
+		printf("%d: Failed to deq\n", __LINE__);
+		return -1;
+	}
+
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (err) {
+		printf("%d: failed to get stats\n", __LINE__);
+		return -1;
+	}
+
+	if (ev.mbuf->seqn != MAGIC_SEQN) {
+		printf("%d: magic sequence number not dequeued\n", __LINE__);
+		return -1;
+	}
+
+	rte_pktmbuf_free(ev.mbuf);
+	err = rte_event_enqueue_burst(evdev, t->port[wrk_enq], &release_ev, 1);
+	if (err < 0) {
+		printf("%d: Failed to enqueue\n", __LINE__);
+		return -1;
+	}
+	rte_event_schedule(evdev);
+
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (stats.port_inflight[wrk_enq] != 0) {
+		printf("%d: port inflight not correct\n", __LINE__);
+		return -1;
+	}
+
+	cleanup(t);
+	return 0;
+}
+
+static int
+inflight_counts(struct test *t)
+{
+	struct rte_event ev;
+	struct test_event_dev_stats stats;
+	const int rx_enq = 0;
+	const int p1 = 1;
+	const int p2 = 2;
+	int err;
+	int i;
+
+	/* Create instance with 3 ports and 2 qids */
+	if (init(t, 2, 3) < 0 ||
+			create_ports(t, 3) < 0 ||
+			create_atomic_qids(t, 2) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	/* CQ mapping to QID */
+	err = rte_event_port_link(evdev, t->port[p1], &t->qid[0], NULL, 1);
+	if (err != 1) {
+		printf("%d: error mapping lb qid\n", __LINE__);
+		cleanup(t);
+		return -1;
+	}
+	err = rte_event_port_link(evdev, t->port[p2], &t->qid[1], NULL, 1);
+	if (err != 1) {
+		printf("%d: error mapping lb qid\n", __LINE__);
+		cleanup(t);
+		return -1;
+	}
+
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	/************** FORWARD ****************/
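+	/*
+	 * Send QID1_NUM events to qid[0] (linked to port 1) and QID2_NUM
+	 * to qid[1] (linked to port 2); each port's inflight counter
+	 * should match its event count until the events are released.
+	 */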
+#define QID1_NUM 5
+	for (i = 0; i < QID1_NUM; i++) {
+		struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
+
+		if (!arp) {
+			printf("%d: gen of pkt failed\n", __LINE__);
+			goto err;
+		}
+
+		ev.queue_id =  t->qid[0];
+		ev.op = RTE_EVENT_OP_NEW;
+		ev.mbuf = arp;
+		err = rte_event_enqueue_burst(evdev, t->port[rx_enq], &ev, 1);
+		if (err != 1) {
+			printf("%d: Failed to enqueue\n", __LINE__);
+			goto err;
+		}
+	}
+#define QID2_NUM 3
+	for (i = 0; i < QID2_NUM; i++) {
+		struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
+
+		if (!arp) {
+			printf("%d: gen of pkt failed\n", __LINE__);
+			goto err;
+		}
+		ev.queue_id =  t->qid[1];
+		ev.op = RTE_EVENT_OP_NEW;
+		ev.mbuf = arp;
+		err = rte_event_enqueue_burst(evdev, t->port[rx_enq], &ev, 1);
+		if (err != 1) {
+			printf("%d: Failed to enqueue\n", __LINE__);
+			goto err;
+		}
+	}
+
+	/* schedule */
+	rte_event_schedule(evdev);
+
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (err) {
+		printf("%d: failed to get stats\n", __LINE__);
+		goto err;
+	}
+
+	if (stats.rx_pkts != QID1_NUM + QID2_NUM ||
+			stats.tx_pkts != QID1_NUM + QID2_NUM) {
+		printf("%d: Sched core didn't handle pkt as expected\n",
+				__LINE__);
+		goto err;
+	}
+
+	if (stats.port_inflight[p1] != QID1_NUM) {
+		printf("%d: %s port 1 inflight not correct\n", __LINE__,
+				__func__);
+		goto err;
+	}
+	if (stats.port_inflight[p2] != QID2_NUM) {
+		printf("%d: %s port 2 inflight not correct\n", __LINE__,
+				__func__);
+		goto err;
+	}
+
+	/************** DEQUEUE INFLIGHT COUNT CHECKS  ****************/
+	/* port 1 */
+	struct rte_event events[QID1_NUM + QID2_NUM];
+	uint32_t deq_pkts = rte_event_dequeue_burst(evdev, t->port[p1], events,
+			RTE_DIM(events), 0);
+
+	if (deq_pkts != QID1_NUM) {
+		printf("%d: Port 1: DEQUEUE inflight failed\n", __LINE__);
+		goto err;
+	}
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (stats.port_inflight[p1] != QID1_NUM) {
+		printf("%d: port 1 inflight changed after DEQ\n", __LINE__);
+		goto err;
+	}
+	for (i = 0; i < QID1_NUM; i++) {
+		err = rte_event_enqueue_burst(evdev, t->port[p1], &release_ev,
+				1);
+		if (err != 1) {
+			printf("%d: %s rte enqueue of inf release failed\n",
+				__LINE__, __func__);
+			goto err;
+		}
+	}
+
+	/*
+	 * As the scheduler core decrements inflights, it needs to run to
+	 * process packets to act on the drop messages
+	 */
+	rte_event_schedule(evdev);
+
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (stats.port_inflight[p1] != 0) {
+		printf("%d: port 1 inflight not zero after DROP\n", __LINE__);
+		goto err;
+	}
+
+	/* port2 */
+	deq_pkts = rte_event_dequeue_burst(evdev, t->port[p2], events,
+			RTE_DIM(events), 0);
+	if (deq_pkts != QID2_NUM) {
+		printf("%d: Port 2: DEQUEUE inflight failed\n", __LINE__);
+		goto err;
+	}
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (stats.port_inflight[p2] != QID2_NUM) {
+		printf("%d: port 2 inflight changed after DEQ\n", __LINE__);
+		goto err;
+	}
+	for (i = 0; i < QID2_NUM; i++) {
+		err = rte_event_enqueue_burst(evdev, t->port[p2], &release_ev,
+				1);
+		if (err != 1) {
+			printf("%d: %s rte enqueue of inf release failed\n",
+				__LINE__, __func__);
+			goto err;
+		}
+	}
+
+	/*
+	 * As the scheduler core decrements inflights, it needs to run to
+	 * process packets to act on the drop messages
+	 */
+	rte_event_schedule(evdev);
+
+	err = test_event_dev_stats_get(evdev, &stats);
+	if (stats.port_inflight[p2] != 0) {
+		printf("%d: port 2 inflight not zero after DROP\n", __LINE__);
+		goto err;
+	}
+	cleanup(t);
+	return 0;
+
+err:
+	rte_event_dev_dump(evdev, stdout);
+	cleanup(t);
+	return -1;
+}
+
+static int
+parallel_basic(struct test *t, int check_order)
+{
+	const uint8_t rx_port = 0;
+	const uint8_t w1_port = 1;
+	const uint8_t w3_port = 3;
+	const uint8_t tx_port = 4;
+	int err;
+	int i;
+	uint32_t deq_pkts, j;
+	struct rte_mbuf *mbufs[3];
+	const uint32_t MAGIC_SEQN = 1234;
+
+	/* Create instance with 5 ports: rx, 3 workers, and a directed tx */
+	if (init(t, 2, tx_port + 1) < 0 ||
+			create_ports(t, tx_port + 1) < 0 ||
+			(check_order ?  create_ordered_qids(t, 1) :
+				create_unordered_qids(t, 1)) < 0 ||
+			create_directed_qids(t, 1, &tx_port)) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
+	/*
+	 * CQ mapping to QID
+	 * We need three ports, all mapped to the same ordered qid0. Then we'll
+	 * take a packet out to each port, re-enqueue in reverse order,
+	 * then make sure the reordering has taken place properly when we
+	 * dequeue from the tx_port.
+	 *
+	 * Simplified test setup diagram:
+	 *
+	 * rx_port        w1_port
+	 *        \     /         \
+	 *         qid0 - w2_port - qid1
+	 *              \         /     \
+	 *                w3_port        tx_port
+	 */
+	/* CQ mapping to QID for LB ports (directed mapped on create) */
+	for (i = w1_port; i <= w3_port; i++) {
+		err = rte_event_port_link(evdev, t->port[i], &t->qid[0], NULL,
+				1);
+		if (err != 1) {
+			printf("%d: error mapping lb qid\n", __LINE__);
+			cleanup(t);
+			return -1;
+		}
+	}
+
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	/* Enqueue 3 packets to the rx port */
+	for (i = 0; i < 3; i++) {
+		struct rte_event ev;
+		mbufs[i] = rte_gen_arp(0, t->mbuf_pool);
+		if (!mbufs[i]) {
+			printf("%d: gen of pkt failed\n", __LINE__);
+			return -1;
+		}
+
+		ev.queue_id = t->qid[0];
+		ev.op = RTE_EVENT_OP_NEW;
+		ev.mbuf = mbufs[i];
+		mbufs[i]->seqn = MAGIC_SEQN + i;
+
+		/* generate pkt and enqueue */
+		err = rte_event_enqueue_burst(evdev, t->port[rx_port], &ev, 1);
+		if (err != 1) {
+			printf("%d: Failed to enqueue pkt %u, retval = %u\n",
+					__LINE__, i, err);
+			return -1;
+		}
+	}
+
+	rte_event_schedule(evdev);
+
+	/* use extra slot to make logic in loops easier */
+	struct rte_event deq_ev[w3_port + 1];
+
+	/* Dequeue the 3 packets, one from each worker port */
+	for (i = w1_port; i <= w3_port; i++) {
+		deq_pkts = rte_event_dequeue_burst(evdev, t->port[i],
+				&deq_ev[i], 1, 0);
+		if (deq_pkts != 1) {
+			printf("%d: Failed to deq\n", __LINE__);
+			rte_event_dev_dump(evdev, stdout);
+			return -1;
+		}
+	}
+
+	/* Enqueue each packet in reverse order, flushing after each one */
+	for (i = w3_port; i >= w1_port; i--) {
+
+		deq_ev[i].op = RTE_EVENT_OP_FORWARD;
+		deq_ev[i].queue_id = t->qid[1];
+		err = rte_event_enqueue_burst(evdev, t->port[i], &deq_ev[i], 1);
+		if (err != 1) {
+			printf("%d: Failed to enqueue\n", __LINE__);
+			return -1;
+		}
+	}
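+	/*
+	 * For an ordered QID the scheduler must restore the original
+	 * sequence before delivering to qid1/tx_port, despite the reversed
+	 * re-enqueue order above; an unordered QID makes no such guarantee,
+	 * so sequence numbers are only verified when check_order is set.
+	 */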
+	rte_event_schedule(evdev);
+
+	/* dequeue from the tx ports, we should get 3 packets */
+	deq_pkts = rte_event_dequeue_burst(evdev, t->port[tx_port], deq_ev,
+			3, 0);
+
+	/* Check to see if we've got all 3 packets */
+	if (deq_pkts != 3) {
+		printf("%d: expected 3 packets at tx port got %d from port %d\n",
+			__LINE__, deq_pkts, tx_port);
+		rte_event_dev_dump(evdev, stdout);
+		return -1;
+	}
+
+	/* Check to see if the sequence numbers are in expected order */
+	if (check_order) {
+		for (j = 0 ; j < deq_pkts ; j++) {
+			if (deq_ev[j].mbuf->seqn != MAGIC_SEQN + j) {
+				printf(
+					"%d: Incorrect sequence number(%d) from port %d\n",
+					__LINE__, deq_ev[j].mbuf->seqn, tx_port);
+				return -1;
+			}
+		}
+	}
+
+	/* Destroy the instance */
+	cleanup(t);
+	return 0;
+}
+
+static int
+ordered_basic(struct test *t)
+{
+	return parallel_basic(t, 1);
+}
+
+static int
+unordered_basic(struct test *t)
+{
+	return parallel_basic(t, 0);
+}
+
+static int
+holb(struct test *t) /* test to check we avoid basic head-of-line blocking */
+{
+	const struct rte_event new_ev = {
+			.op = RTE_EVENT_OP_NEW
+			/* all other fields zero */
+	};
+	struct rte_event ev = new_ev;
+	unsigned int rx_port = 0; /* port we get the first flow on */
+	char rx_port_used_stat[64];
+	char rx_port_free_stat[64];
+	char other_port_used_stat[64];
+
+	if (init(t, 1, 2) < 0 ||
+			create_ports(t, 2) < 0 ||
+			create_atomic_qids(t, 1) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+	int nb_links = rte_event_port_link(evdev, t->port[1], NULL, NULL, 0);
+	if (rte_event_port_link(evdev, t->port[0], NULL, NULL, 0) != 1 ||
+			nb_links != 1) {
+		printf("%d: Error linking queue to ports\n", __LINE__);
+		goto err;
+	}
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		goto err;
+	}
+
+	/* send one packet and see where it goes, port 0 or 1 */
+	if (rte_event_enqueue_burst(evdev, t->port[0], &ev, 1) != 1) {
+		printf("%d: Error doing first enqueue\n", __LINE__);
+		goto err;
+	}
+	rte_event_schedule(evdev);
+
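+	/*
+	 * The single atomic flow is pinned to whichever port the scheduler
+	 * picked; use the CQ ring xstats to work out which one, then build
+	 * the stat names for that port and for the other, idle, port.
+	 */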
+	if (rte_event_dev_xstats_by_name_get(evdev, "port_0_cq_ring_used",
+			NULL) != 1)
+		rx_port = 1;
+	snprintf(rx_port_used_stat, sizeof(rx_port_used_stat),
+			"port_%u_cq_ring_used", rx_port);
+	snprintf(rx_port_free_stat, sizeof(rx_port_free_stat),
+			"port_%u_cq_ring_free", rx_port);
+	snprintf(other_port_used_stat, sizeof(other_port_used_stat),
+			"port_%u_cq_ring_used", rx_port ^ 1);
+	if (rte_event_dev_xstats_by_name_get(evdev, rx_port_used_stat,
+			NULL) != 1) {
+		printf("%d: Error, first event not scheduled\n", __LINE__);
+		goto err;
+	}
+
+	/* now fill up the rx port's queue with one flow to cause HOLB */
+	do {
+		ev = new_ev;
+		if (rte_event_enqueue_burst(evdev, t->port[0], &ev, 1) != 1) {
+			printf("%d: Error with enqueue\n", __LINE__);
+			goto err;
+		}
+		rte_event_schedule(evdev);
+	} while (rte_event_dev_xstats_by_name_get(evdev, rx_port_free_stat,
+			NULL) != 0);
+
+	/* one more packet, which needs to stay in IQ - i.e. HOLB */
+	ev = new_ev;
+	if (rte_event_enqueue_burst(evdev, t->port[0], &ev, 1) != 1) {
+		printf("%d: Error with enqueue\n", __LINE__);
+		goto err;
+	}
+	rte_event_schedule(evdev);
+
+	/* check that the other port still has an empty CQ */
+	if (rte_event_dev_xstats_by_name_get(evdev, other_port_used_stat,
+			NULL) != 0) {
+		printf("%d: Error, second port CQ is not empty\n", __LINE__);
+		goto err;
+	}
+	/* check IQ now has one packet */
+	if (rte_event_dev_xstats_by_name_get(evdev, "qid_0_iq_0_used",
+			NULL) != 1) {
+		printf("%d: Error, QID does not have exactly 1 packet\n",
+				__LINE__);
+		goto err;
+	}
+
+	/* send another flow, which should pass the other IQ entry */
+	ev = new_ev;
+	ev.flow_id = 1;
+	if (rte_event_enqueue_burst(evdev, t->port[0], &ev, 1) != 1) {
+		printf("%d: Error with enqueue\n", __LINE__);
+		goto err;
+	}
+	rte_event_schedule(evdev);
+
+	if (rte_event_dev_xstats_by_name_get(evdev, other_port_used_stat,
+			NULL) != 1) {
+		printf("%d: Error, second flow did not pass out first\n",
+				__LINE__);
+		goto err;
+	}
+
+	if (rte_event_dev_xstats_by_name_get(evdev, "qid_0_iq_0_used",
+			NULL) != 1) {
+		printf("%d: Error, QID does not have exactly 1 packet\n",
+				__LINE__);
+		goto err;
+	}
+	cleanup(t);
+	return 0;
+err:
+	rte_event_dev_dump(evdev, stdout);
+	cleanup(t);
+	return -1;
+}
+
+static int
+worker_loopback_worker_fn(void *arg)
+{
+	struct test *t = arg;
+	uint8_t port = t->port[1];
+	int count = 0;
+	int enqd;
+
+	/*
+	 * Takes packets from the input port and then loops them back through
+	 * the Queue Manager. Each packet gets looped through QIDs 0-7, 16
+	 * times, so each packet is scheduled 8*16 = 128 times in total.
+	 */
+	printf("%d: \tWorker function started\n", __LINE__);
+	while (count < NUM_PACKETS) {
+#define BURST_SIZE 32
+		struct rte_event ev[BURST_SIZE];
+		uint16_t i, nb_rx = rte_event_dequeue_burst(evdev, port, ev,
+				BURST_SIZE, 0);
+		if (nb_rx == 0) {
+			rte_pause();
+			continue;
+		}
+
+		for (i = 0; i < nb_rx; i++) {
+			ev[i].queue_id++;
+			if (ev[i].queue_id != 8) {
+				ev[i].op = RTE_EVENT_OP_FORWARD;
+				enqd = rte_event_enqueue_burst(evdev, port,
+						&ev[i], 1);
+				if (enqd != 1) {
+					printf("%d: Can't enqueue FWD!!\n",
+							__LINE__);
+					return -1;
+				}
+				continue;
+			}
+
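+			/*
+			 * queue_id has reached 8: wrap back to qid 0 and
+			 * count one full pass through all 8 QIDs in the
+			 * mbuf udata64 field.
+			 */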
+			ev[i].queue_id = 0;
+			ev[i].mbuf->udata64++;
+			if (ev[i].mbuf->udata64 != 16) {
+				ev[i].op = RTE_EVENT_OP_FORWARD;
+				enqd = rte_event_enqueue_burst(evdev, port,
+						&ev[i], 1);
+				if (enqd != 1) {
+					printf("%d: Can't enqueue FWD!!\n",
+							__LINE__);
+					return -1;
+				}
+				continue;
+			}
+			/* we have hit 16 iterations through system - drop */
+			rte_pktmbuf_free(ev[i].mbuf);
+			count++;
+			ev[i].op = RTE_EVENT_OP_RELEASE;
+			enqd = rte_event_enqueue_burst(evdev, port, &ev[i], 1);
+			if (enqd != 1) {
+				printf("%d drop enqueue failed\n", __LINE__);
+				return -1;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int
+worker_loopback_producer_fn(void *arg)
+{
+	struct test *t = arg;
+	uint8_t port = t->port[0];
+	uint64_t count = 0;
+
+	printf("%d: \tProducer function started\n", __LINE__);
+	while (count < NUM_PACKETS) {
+		struct rte_mbuf *m = NULL;
+		do {
+			m = rte_pktmbuf_alloc(t->mbuf_pool);
+		} while (m == NULL);
+
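+		/*
+		 * udata64 counts loop iterations in the worker; the flow id
+		 * is derived from the mbuf address so mbufs are spread
+		 * across the atomic flows.
+		 */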
+		m->udata64 = 0;
+
+		struct rte_event ev = {
+				.op = RTE_EVENT_OP_NEW,
+				.queue_id = t->qid[0],
+				.flow_id = (uintptr_t)m & 0xFFFF,
+				.mbuf = m,
+		};
+
+		if (rte_event_enqueue_burst(evdev, port, &ev, 1) != 1) {
+			while (rte_event_enqueue_burst(evdev, port, &ev, 1) !=
+					1)
+				rte_pause();
+		}
+
+		count++;
+	}
+
+	return 0;
+}
+
+static int
+worker_loopback(struct test *t)
+{
+	/* use a single producer core, and a worker core to see what happens
+	 * if the worker loops packets back multiple times
+	 */
+	struct test_event_dev_stats stats;
+	uint64_t print_cycles = 0, cycles = 0;
+	uint64_t tx_pkts = 0;
+	int err;
+	int w_lcore, p_lcore;
+
+	if (init(t, 8, 2) < 0 ||
+			create_atomic_qids(t, 8) < 0) {
+		printf("%d: Error initializing device\n", __LINE__);
+		return -1;
+	}
+
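+	/*
+	 * The producer (RX) port gets a low new_event_threshold to limit
+	 * how many new events it can inject (back-pressure), while the
+	 * worker (TX) port gets a much larger threshold.
+	 */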
+	/* RX with low max events */
+	static struct rte_event_port_conf conf = {
+			.new_event_threshold = 512,
+			.dequeue_depth = 32,
+			.enqueue_depth = 64,
+	};
+	if (rte_event_port_setup(evdev, 0, &conf) < 0) {
+		printf("Error setting up RX port\n");
+		return -1;
+	}
+	t->port[0] = 0;
+	/* TX with higher max events */
+	conf.new_event_threshold = 4096;
+	if (rte_event_port_setup(evdev, 1, &conf) < 0) {
+		printf("Error setting up TX port\n");
+		return -1;
+	}
+	t->port[1] = 1;
+
+	/* CQ mapping to QID */
+	err = rte_event_port_link(evdev, t->port[1], NULL, NULL, 0);
+	if (err != 8) { /* should have mapped all queues */
+		printf("%d: error mapping port 2 to all qids\n", __LINE__);
+		return -1;
+	}
+
+	if (rte_event_dev_start(evdev) < 0) {
+		printf("%d: Error with start call\n", __LINE__);
+		return -1;
+	}
+
+	p_lcore = rte_get_next_lcore(
+			/* start core */ -1,
+			/* skip master */ 1,
+			/* wrap */ 0);
+	w_lcore = rte_get_next_lcore(p_lcore, 1, 0);
+
+	rte_eal_remote_launch(worker_loopback_producer_fn, t, p_lcore);
+	rte_eal_remote_launch(worker_loopback_worker_fn, t, w_lcore);
+
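+	/*
+	 * The master core drives the scheduler, printing Rx/Tx stats once
+	 * per second and flagging a deadlock if tx_pkts makes no progress
+	 * for three seconds.
+	 */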
+	print_cycles = cycles = rte_get_timer_cycles();
+	while (rte_eal_get_lcore_state(p_lcore) != FINISHED ||
+			rte_eal_get_lcore_state(w_lcore) != FINISHED) {
+
+		rte_event_schedule(evdev);
+
+		uint64_t new_cycles = rte_get_timer_cycles();
+
+		if (new_cycles - print_cycles > rte_get_timer_hz()) {
+			test_event_dev_stats_get(evdev, &stats);
+			printf(
+				"%d: \tSched Rx = %" PRIu64 ", Tx = %" PRIu64 "\n",
+				__LINE__, stats.rx_pkts, stats.tx_pkts);
+
+			print_cycles = new_cycles;
+		}
+		if (new_cycles - cycles > rte_get_timer_hz() * 3) {
+			test_event_dev_stats_get(evdev, &stats);
+			if (stats.tx_pkts == tx_pkts) {
+				rte_event_dev_dump(evdev, stdout);
+				printf(
+					"%d: No schedules for 3 seconds, deadlock\n",
+					__LINE__);
+				return -1;
+			}
+			tx_pkts = stats.tx_pkts;
+			cycles = new_cycles;
+		}
+	}
+	rte_event_schedule(evdev); /* ensure all completions are flushed */
+
+	rte_eal_mp_wait_lcore();
+
+	cleanup(t);
+	return 0;
+}
+
+static struct rte_mempool *eventdev_func_mempool;
+
+static int
+test_sw_eventdev(void)
+{
+	struct test *t = malloc(sizeof(struct test));
+	int ret;
+
+	if (t == NULL) {
+		printf("%d: failed to allocate test structure\n", __LINE__);
+		return -1;
+	}
+
+	const char *eventdev_name = "event_sw0";
+	evdev = rte_event_dev_get_dev_id(eventdev_name);
+	if (evdev < 0) {
+		printf("%d: Eventdev %s not found - creating.\n",
+				__LINE__, eventdev_name);
+		if (rte_eal_vdev_init(eventdev_name, NULL) < 0) {
+			printf("Error creating eventdev\n");
+			return -1;
+		}
+		evdev = rte_event_dev_get_dev_id(eventdev_name);
+		if (evdev < 0) {
+			printf("Error finding newly created eventdev\n");
+			return -1;
+		}
+	}
+
+	/* Only create mbuf pool once, reuse for each test run */
+	if (!eventdev_func_mempool) {
+		eventdev_func_mempool = rte_pktmbuf_pool_create(
+				"EVENTDEV_SW_SA_MBUF_POOL",
+				(1<<12), /* 4k buffers */
+				32 /* MBUF_CACHE_SIZE */,
+				0,
+				512, /* use very small mbufs */
+				rte_socket_id());
+		if (!eventdev_func_mempool) {
+			printf("ERROR creating mempool\n");
+			return -1;
+		}
+	}
+	t->mbuf_pool = eventdev_func_mempool;
+
+	printf("*** Running Single Directed Packet test...\n");
+	ret = test_single_directed_packet(t);
+	if (ret != 0) {
+		printf("ERROR - Single Directed Packet test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Single Load Balanced Packet test...\n");
+	ret = single_packet(t);
+	if (ret != 0) {
+		printf("ERROR - Single Packet test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Unordered Basic test...\n");
+	ret = unordered_basic(t);
+	if (ret != 0) {
+		printf("ERROR -  Unordered Basic test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Ordered Basic test...\n");
+	ret = ordered_basic(t);
+	if (ret != 0) {
+		printf("ERROR -  Ordered Basic test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Burst Packets test...\n");
+	ret = burst_packets(t);
+	if (ret != 0) {
+		printf("ERROR - Burst Packets test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Load Balancing test...\n");
+	ret = load_balancing(t);
+	if (ret != 0) {
+		printf("ERROR - Load Balancing test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Prioritized Directed test...\n");
+	ret = test_priority_directed(t);
+	if (ret != 0) {
+		printf("ERROR - Prioritized Directed test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Prioritized Atomic test...\n");
+	ret = test_priority_atomic(t);
+	if (ret != 0) {
+		printf("ERROR - Prioritized Atomic test FAILED.\n");
+		return ret;
+	}
+
+	printf("*** Running Prioritized Ordered test...\n");
+	ret = test_priority_ordered(t);
+	if (ret != 0) {
+		printf("ERROR - Prioritized Ordered test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Prioritized Unordered test...\n");
+	ret = test_priority_unordered(t);
+	if (ret != 0) {
+		printf("ERROR - Prioritized Unordered test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Invalid QID test...\n");
+	ret = invalid_qid(t);
+	if (ret != 0) {
+		printf("ERROR - Invalid QID test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Load Balancing History test...\n");
+	ret = load_balancing_history(t);
+	if (ret != 0) {
+		printf("ERROR - Load Balancing History test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Inflight Count test...\n");
+	ret = inflight_counts(t);
+	if (ret != 0) {
+		printf("ERROR - Inflight Count test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Abuse Inflights test...\n");
+	ret = abuse_inflights(t);
+	if (ret != 0) {
+		printf("ERROR - Abuse Inflights test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running QID Priority test...\n");
+	ret = qid_priorities(t);
+	if (ret != 0) {
+		printf("ERROR - QID Priority test FAILED.\n");
+		return ret;
+	}
+	printf("*** Running Head-of-line-blocking test...\n");
+	ret = holb(t);
+	if (ret != 0) {
+		printf("ERROR - Head-of-line-blocking test FAILED.\n");
+		return ret;
+	}
+	if (rte_lcore_count() >= 3) {
+		printf("*** Running Worker loopback test...\n");
+		ret = worker_loopback(t);
+		if (ret != 0) {
+			printf("ERROR - Worker loopback test FAILED.\n");
+			return ret;
+		}
+	} else {
+		printf("### Not enough cores for worker loopback test.\n");
+		printf("### Need at least 3 cores for test.\n");
+	}
+	/*
+	 * Free test instance, leaving mempool initialized, and a pointer to it
+	 * in static eventdev_func_mempool, as it is re-used on re-runs
+	 */
+	free(t);
+
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(eventdev_sw_autotest, test_sw_eventdev);