[RFC] app/test-flow-perf: add rte_flow perf app

Message ID 1584452772-31147-1-git-send-email-wisamm@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series [RFC] app/test-flow-perf: add rte_flow perf app

Checks

Context | Check | Description
ci/checkpatch | warning | coding style issues
ci/Intel-compilation | success | Compilation OK

Commit Message

Wisam Jaddo March 17, 2020, 1:46 p.m. UTC
  Introducing a new application for rte_flow performance
testing. The application provides the ability to test the
insertion rate of a specific rte_flow rule, by stressing
the NIC with it, and calculates the insertion rate.

It also provides packets-per-second measurements
after the insertion operation is done.

The application offers some command-line options
to configure which rule to apply.

After that the application will start producing rules
with the same pattern but increasing the outer IP source
address by 1 each time, thus giving a different
flow each time, while all other items have open masks.

The current design measures single-core insertion rate.
Multi-core insertion rate measurement may be added
to the app in the future.

The app supports single- and multi-core performance
measurements.

Signed-off-by: Wisam Jaddo <wisamm@mellanox.com>
---
 app/Makefile                     |   1 +
 app/meson.build                  |   1 +
 app/test-flow-perf/Makefile      |  28 ++
 app/test-flow-perf/actions_gen.c |  26 ++
 app/test-flow-perf/actions_gen.h |  15 +
 app/test-flow-perf/flow_gen.c    |  97 ++++++
 app/test-flow-perf/flow_gen.h    |  47 +++
 app/test-flow-perf/items_gen.c   |  37 +++
 app/test-flow-perf/items_gen.h   |  16 +
 app/test-flow-perf/main.c        | 656 +++++++++++++++++++++++++++++++++++++++
 app/test-flow-perf/meson.build   |  14 +
 config/common_base               |   5 +
 12 files changed, 943 insertions(+)
 create mode 100644 app/test-flow-perf/Makefile
 create mode 100644 app/test-flow-perf/actions_gen.c
 create mode 100644 app/test-flow-perf/actions_gen.h
 create mode 100644 app/test-flow-perf/flow_gen.c
 create mode 100644 app/test-flow-perf/flow_gen.h
 create mode 100644 app/test-flow-perf/items_gen.c
 create mode 100644 app/test-flow-perf/items_gen.h
 create mode 100644 app/test-flow-perf/main.c
 create mode 100644 app/test-flow-perf/meson.build
  

Comments

Jerin Jacob March 20, 2020, 6:49 a.m. UTC | #1
On Tue, Mar 17, 2020 at 7:16 PM Wisam Jaddo <wisamm@mellanox.com> wrote:

Thanks for this application. Useful stuff.

>
> Introducing new application for rte_flow performance
> testing. The application provide the ability to test
> insertion rate of specific rte_flow rule, by stressing
> it to the NIC, and calculate the insertion rate.
>
> It also provides packet per second measurements
> after the insertion operation is done.
>
> The application offers some options in the command
> line, to configure which rule to apply.
>
> After that the application will start producing rules
> with same pattern but increasing the outer IP source
> address by 1 each time, thus it will give different
> flow each time, and all other items will have open masks.
>
> The current design have single core insertion rate.
> In the future we may have a multi core insertion rate
> measurement support in the app.

If I understand correctly,
# On the main thread, this application first checks the flow insertion
performance,
# and then starts the worker threads for packet forwarding.
Why is this application testing packet forwarding? We already have
testpmd for that.

IMO, this application needs to focus only on:
- Insertion performance
- Deletion performance
- IMO, it is better to add a framework for profiles, where the first
version of this application can define a common set of ITEMS and a set
of ACTIONS, and later others can extend it. The framework can then run
over all the profiles and spit out the insertion and deletion
performance (see the sketch below).
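
As an illustration only (the struct and table below are hypothetical, not part
of this RFC), a profile could be little more than a table of item/action/attr
masks, reusing the flags already defined in flow_gen.h, that the app iterates
over:

/* Hypothetical profile framework sketch (not part of this RFC). */
struct flow_profile {
	const char *name;
	uint16_t items;    /* e.g. ETH_ITEM | IPV4_ITEM */
	uint16_t actions;  /* e.g. QUEUE_ACTION | MARK_ACTION */
	uint8_t attrs;     /* e.g. INGRESS */
};

static const struct flow_profile profiles[] = {
	{ "eth-ipv4-queue", ETH_ITEM | IPV4_ITEM, QUEUE_ACTION, INGRESS },
	{ "eth-ipv4-mark-queue", ETH_ITEM | IPV4_ITEM,
	  MARK_ACTION | QUEUE_ACTION, INGRESS },
};

/* The framework would then loop over profiles[], insert/delete the rules
 * of each profile and report the per-profile rates.
 */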


>
> The app supports single and multi core performance
> measurements.
>
> Signed-off-by: Wisam Jaddo <wisamm@mellanox.com>
> ---
>  app/Makefile                     |   1 +
>  app/meson.build                  |   1 +

# Update MAINTAINERS file

# Add doc for this test under doc/guides/tools/

# Please update release notes
  
Thomas Monjalon March 20, 2020, 11:51 a.m. UTC | #2
20/03/2020 07:49, Jerin Jacob:
> On Tue, Mar 17, 2020 at 7:16 PM Wisam Jaddo <wisamm@mellanox.com> wrote:
> 
> Thanks for this application. Useful stuff.
> 
> >
> > Introducing new application for rte_flow performance
> > testing. The application provide the ability to test
> > insertion rate of specific rte_flow rule, by stressing
> > it to the NIC, and calculate the insertion rate.
> >
> > It also provides packet per second measurements
> > after the insertion operation is done.
> >
> > The application offers some options in the command
> > line, to configure which rule to apply.
> >
> > After that the application will start producing rules
> > with same pattern but increasing the outer IP source
> > address by 1 each time, thus it will give different
> > flow each time, and all other items will have open masks.
> >
> > The current design have single core insertion rate.
> > In the future we may have a multi core insertion rate
> > measurement support in the app.
> 
> If I understand correctly,
> # On the main thread, this  application first check the flow insertion
> performance
> # and then start the worker thread for packet forwarding.
> Why this application testing the packet forwarding?, We already have
> testpmd for that.

I think it is interesting to measure forwarding performance
when millions of flow rules are in effect.

> IMO, This application needs to focus only on
> - Insertion performance
> - Deletion performance
> - IMO, it is better to add a framework for the profile where the first
> version of this application can
> define common a set of ITEMS and set of ACTION and later others can extend it.
> And the framework can run over all the profiles and spit out the
> insertion and deletion
> performance.

What do you call a profile? Is it a set of rules?
I think this first version is proposing rules customization with parameters.
Note: I prefer a non-interactive application for performance testing.

> > The app supports single and multi core performance
> > measurements.
  
Jerin Jacob March 20, 2020, 12:18 p.m. UTC | #3
On Fri, Mar 20, 2020 at 5:21 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 20/03/2020 07:49, Jerin Jacob:
> > On Tue, Mar 17, 2020 at 7:16 PM Wisam Jaddo <wisamm@mellanox.com> wrote:
> >
> > Thanks for this application. Useful stuff.
> >
> > >
> > > Introducing new application for rte_flow performance
> > > testing. The application provide the ability to test
> > > insertion rate of specific rte_flow rule, by stressing
> > > it to the NIC, and calculate the insertion rate.
> > >
> > > It also provides packet per second measurements
> > > after the insertion operation is done.
> > >
> > > The application offers some options in the command
> > > line, to configure which rule to apply.
> > >
> > > After that the application will start producing rules
> > > with same pattern but increasing the outer IP source
> > > address by 1 each time, thus it will give different
> > > flow each time, and all other items will have open masks.
> > >
> > > The current design have single core insertion rate.
> > > In the future we may have a multi core insertion rate
> > > measurement support in the app.
> >
> > If I understand correctly,
> > # On the main thread, this  application first check the flow insertion
> > performance
> > # and then start the worker thread for packet forwarding.
> > Why this application testing the packet forwarding?, We already have
> > testpmd for that.
>
> I think it is interesting to measure forwarding performance
> when million of flow rules are in effect.

The rules are applied to the HW CAM, Right?
Do you see any performance difference?

>
> > IMO, This application needs to focus only on
> > - Insertion performance
> > - Deletion performance
> > - IMO, it is better to add a framework for the profile where the first
> > version of this application can
> > define common a set of ITEMS and set of ACTION and later others can extend it.
> > And the framework can run over all the profiles and spit out the
> > insertion and deletion
> > performance.
>
> What do you call a profile? Is it a set of rules?

set of rules and/or actions.

> I think this first version is proposing rules customization with parameters.

Just that it is better to have a framework where one can easily add
new profiles and test various combos. IMO, cascaded rules take more
insertion time.

> Note: I prefer a non-interactive application for performance testing.

Me too. Command-line is fine.

>
> > > The app supports single and multi core performance
> > > measurements.
>
>
>
  
Wisam Jaddo March 23, 2020, 9:53 a.m. UTC | #4
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, March 20, 2020 2:18 PM
> To: Thomas Monjalon <thomas@monjalon.net>
> Cc: Wisam Monther <wisamm@mellanox.com>; dpdk-dev <dev@dpdk.org>;
> Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>
> Subject: Re: [dpdk-dev] [RFC] app/test-flow-perf: add rte_flow perf app
> 
> On Fri, Mar 20, 2020 at 5:21 PM Thomas Monjalon <thomas@monjalon.net>
> wrote:
> >
> > 20/03/2020 07:49, Jerin Jacob:
> > > On Tue, Mar 17, 2020 at 7:16 PM Wisam Jaddo <wisamm@mellanox.com>
> wrote:
> > >
> > > Thanks for this application. Useful stuff.
> > >

😊

> > > >
> > > > Introducing new application for rte_flow performance testing. The
> > > > application provide the ability to test insertion rate of specific
> > > > rte_flow rule, by stressing it to the NIC, and calculate the
> > > > insertion rate.
> > > >
> > > > It also provides packet per second measurements after the
> > > > insertion operation is done.
> > > >
> > > > The application offers some options in the command line, to
> > > > configure which rule to apply.
> > > >
> > > > After that the application will start producing rules with same
> > > > pattern but increasing the outer IP source address by 1 each time,
> > > > thus it will give different flow each time, and all other items
> > > > will have open masks.
> > > >
> > > > The current design have single core insertion rate.
> > > > In the future we may have a multi core insertion rate measurement
> > > > support in the app.
> > >
> > > If I understand correctly,
> > > # On the main thread, this  application first check the flow
> > > insertion performance # and then start the worker thread for packet
> > > forwarding.
> > > Why this application testing the packet forwarding?, We already have
> > > testpmd for that.
> >
> > I think it is interesting to measure forwarding performance when
> > million of flow rules are in effect.
> 
> The rules are applied to the HW CAM, Right?
> Do you see any performance difference?
> 

Yes, they are applied to HW.
No, not really; I haven't tested the performance impact yet.
Moreover, it's interesting to see such results and the impact on performance,
and also to see that the rules are still matching after millions of insertions
and millions of packets sent/received.

> >
> > > IMO, This application needs to focus only on
> > > - Insertion performance
> > > - Deletion performance
> > > - IMO, it is better to add a framework for the profile where the
> > > first version of this application can define common a set of ITEMS
> > > and set of ACTION and later others can extend it.
> > > And the framework can run over all the profiles and spit out the
> > > insertion and deletion performance.
> >
> > What do you call a profile? Is it a set of rules?
> 
> set of rules and/or actions.
> 
> > I think this first version is proposing rules customization with parameters.
> 
> Just that it better to have a framework where one can easily add new
> profiles and test various combos. IMO, Cascade rules take more insertion
> time.
> 
> > Note: I prefer a non-interactive application for performance testing.
> 
> Me too. Command-line is fine.
> 

For this version I'm aiming to have the command-line options decide the profile.
For example:
./flow-perf -n 4 -w 0000:03:00.1,dv_flow_en=1 -- --ingress --ether --ipv4 --udp --vxlan-gpe --queue --mark
will mean 4 million rules of:
flow create 0 ingress pattern eth / ipv4 src is <X> / udp / vxlan-gpe / end actions mark id 1 / queue <QUEUE_ID> / end
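
In terms of this RFC's code, such options would simply set the bits that
generate_flow() consumes. The udp/vxlan-gpe items and the --mark option are
not in this first version yet, so the commented-out flags below are
hypothetical:

/* Sketch: how the example command line maps onto generate_flow(). */
flow_attrs   = INGRESS;
flow_items   = ETH_ITEM | IPV4_ITEM; /* | UDP_ITEM | VXLAN_GPE_ITEM (hypothetical) */
flow_actions = MARK_ACTION | QUEUE_ACTION;

for (i = 0; i < FLOWS_COUNT; i++)
	flow = generate_flow(port_id, flow_items, flow_actions, flow_attrs,
			FLOW_TABLE, RXQs, i /* outer IPv4 src, +1 per rule */,
			&error);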

> >
> > > > The app supports single and multi core performance measurements.
> >
> >
> >
  
Jerin Jacob March 23, 2020, 11:15 a.m. UTC | #5
On Mon, Mar 23, 2020 at 3:23 PM Wisam Monther <wisamm@mellanox.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, March 20, 2020 2:18 PM
> > To: Thomas Monjalon <thomas@monjalon.net>
> > Cc: Wisam Monther <wisamm@mellanox.com>; dpdk-dev <dev@dpdk.org>;
> > Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> > <rasland@mellanox.com>
> > Subject: Re: [dpdk-dev] [RFC] app/test-flow-perf: add rte_flow perf app
> >
> > On Fri, Mar 20, 2020 at 5:21 PM Thomas Monjalon <thomas@monjalon.net>
> > wrote:
> > >
> > > 20/03/2020 07:49, Jerin Jacob:
> > > > On Tue, Mar 17, 2020 at 7:16 PM Wisam Jaddo <wisamm@mellanox.com>
> > wrote:
> > > >
> > > > Thanks for this application. Useful stuff.
> > > >
>
>
>
> > > > >
> > > > > Introducing new application for rte_flow performance testing. The
> > > > > application provide the ability to test insertion rate of specific
> > > > > rte_flow rule, by stressing it to the NIC, and calculate the
> > > > > insertion rate.
> > > > >
> > > > > It also provides packet per second measurements after the
> > > > > insertion operation is done.
> > > > >
> > > > > The application offers some options in the command line, to
> > > > > configure which rule to apply.
> > > > >
> > > > > After that the application will start producing rules with same
> > > > > pattern but increasing the outer IP source address by 1 each time,
> > > > > thus it will give different flow each time, and all other items
> > > > > will have open masks.
> > > > >
> > > > > The current design have single core insertion rate.
> > > > > In the future we may have a multi core insertion rate measurement
> > > > > support in the app.
> > > >
> > > > If I understand correctly,
> > > > # On the main thread, this  application first check the flow
> > > > insertion performance # and then start the worker thread for packet
> > > > forwarding.
> > > > Why this application testing the packet forwarding?, We already have
> > > > testpmd for that.
> > >
> > > I think it is interesting to measure forwarding performance when
> > > million of flow rules are in effect.
> >
> > The rules are applied to the HW CAM, Right?
> > Do you see any performance difference?
> >
>
> Yes, there are applied to HW,


OK. IMO, it is better to introduce a command-line argument to
disable/enable packet forwarding.
That will help if someone needs to test only flow insertion
performance and wants to avoid the IO setup.

>
> No not really, I still didn't test the impact of performance yet.
> Moreover it's interesting to see such results and the impact on performance,
> Also to see the rules are still matching after all Millions of insertion and millions of packets
> Sending/receiving.


>
>
> > >
> > > > IMO, This application needs to focus only on
> > > > - Insertion performance
> > > > - Deletion performance
> > > > - IMO, it is better to add a framework for the profile where the
> > > > first version of this application can define common a set of ITEMS
> > > > and set of ACTION and later others can extend it.
> > > > And the framework can run over all the profiles and spit out the
> > > > insertion and deletion performance.
> > >
> > > What do you call a profile? Is it a set of rules?
> >
> > set of rules and/or actions.
> >
> > > I think this first version is proposing rules customization with parameters.
> >
> > Just that it better to have a framework where one can easily add new
> > profiles and test various combos. IMO, Cascade rules take more insertion
> > time.
> >
> > > Note: I prefer a non-interactive application for performance testing.
> >
> > Me too. Command-line is fine.
> >
>
> For this version I'm aiming to have the command line options to decide the profile.
> For example:
> . /flow-perf -n 4 -w 0000:03:00.1,dv_flow_en=1 -- --ingress --ether --ipv4 --udp --vxlan-gpe --queue --mark
> Will mean 4 Million rules of:
> Flow create 0 ingress pattern eth / ipv4 src is <X> / udp / vxlan-gpe / end actions mark id 1 / queue < QUEUE _ID> / end

OK. The syntax looks good. I think we can add the number of rules in
the command line as well, instead of hardcoding it to 4 million.

And what about the flow deletion performance case?


>
>
> > >
> > > > > The app supports single and multi core performance measurements.
> > >
> > >
> > >
  
Wisam Jaddo March 23, 2020, 11:41 a.m. UTC | #6
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, March 23, 2020 1:16 PM
> To: Wisam Monther <wisamm@mellanox.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>; dpdk-dev
> <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>
> Subject: Re: [dpdk-dev] [RFC] app/test-flow-perf: add rte_flow perf app
> 
> On Mon, Mar 23, 2020 at 3:23 PM Wisam Monther
> <wisamm@mellanox.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Friday, March 20, 2020 2:18 PM
> > > To: Thomas Monjalon <thomas@monjalon.net>
> > > Cc: Wisam Monther <wisamm@mellanox.com>; dpdk-dev
> <dev@dpdk.org>;
> > > Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> > > <rasland@mellanox.com>
> > > Subject: Re: [dpdk-dev] [RFC] app/test-flow-perf: add rte_flow perf
> > > app
> > >
> > > On Fri, Mar 20, 2020 at 5:21 PM Thomas Monjalon
> > > <thomas@monjalon.net>
> > > wrote:
> > > >
> > > > 20/03/2020 07:49, Jerin Jacob:
> > > > > On Tue, Mar 17, 2020 at 7:16 PM Wisam Jaddo
> > > > > <wisamm@mellanox.com>
> > > wrote:
> > > > >
> > > > > Thanks for this application. Useful stuff.
> > > > >
> >
> >
> >
> > > > > >
> > > > > > Introducing new application for rte_flow performance testing.
> > > > > > The application provide the ability to test insertion rate of
> > > > > > specific rte_flow rule, by stressing it to the NIC, and
> > > > > > calculate the insertion rate.
> > > > > >
> > > > > > It also provides packet per second measurements after the
> > > > > > insertion operation is done.
> > > > > >
> > > > > > The application offers some options in the command line, to
> > > > > > configure which rule to apply.
> > > > > >
> > > > > > After that the application will start producing rules with
> > > > > > same pattern but increasing the outer IP source address by 1
> > > > > > each time, thus it will give different flow each time, and all
> > > > > > other items will have open masks.
> > > > > >
> > > > > > The current design have single core insertion rate.
> > > > > > In the future we may have a multi core insertion rate
> > > > > > measurement support in the app.
> > > > >
> > > > > If I understand correctly,
> > > > > # On the main thread, this  application first check the flow
> > > > > insertion performance # and then start the worker thread for
> > > > > packet forwarding.
> > > > > Why this application testing the packet forwarding?, We already
> > > > > have testpmd for that.
> > > >
> > > > I think it is interesting to measure forwarding performance when
> > > > million of flow rules are in effect.
> > >
> > > The rules are applied to the HW CAM, Right?
> > > Do you see any performance difference?
> > >
> >
> > Yes, there are applied to HW,
> 
> 
> OK.IMO, it is better to introduce the command-line argument to
> disable/enable packet forwarding.
> That will enable if someone needs to test only flow insertion performance to
> avoid the IO setup.
> 

Sure, we can have the forwarding enabled by default, and I'll add --disable-fwd
to the command-line options. It looks reasonable to have it, I agree.

> >
> > No not really, I still didn't test the impact of performance yet.
> > Moreover it's interesting to see such results and the impact on
> > performance, Also to see the rules are still matching after all
> > Millions of insertion and millions of packets Sending/receiving.
> 
> 
> >
> >
> > > >
> > > > > IMO, This application needs to focus only on
> > > > > - Insertion performance
> > > > > - Deletion performance
> > > > > - IMO, it is better to add a framework for the profile where the
> > > > > first version of this application can define common a set of
> > > > > ITEMS and set of ACTION and later others can extend it.
> > > > > And the framework can run over all the profiles and spit out the
> > > > > insertion and deletion performance.
> > > >
> > > > What do you call a profile? Is it a set of rules?
> > >
> > > set of rules and/or actions.
> > >
> > > > I think this first version is proposing rules customization with
> parameters.
> > >
> > > Just that it better to have a framework where one can easily add new
> > > profiles and test various combos. IMO, Cascade rules take more
> > > insertion time.
> > >
> > > > Note: I prefer a non-interactive application for performance testing.
> > >
> > > Me too. Command-line is fine.
> > >
> >
> > For this version I'm aiming to have the command line options to decide the
> profile.
> > For example:
> > . /flow-perf -n 4 -w 0000:03:00.1,dv_flow_en=1 -- --ingress --ether
> > --ipv4 --udp --vxlan-gpe --queue --mark Will mean 4 Million rules of:
> > Flow create 0 ingress pattern eth / ipv4 src is <X> / udp / vxlan-gpe
> > / end actions mark id 1 / queue < QUEUE _ID> / end
> 
> Ok. The syntax looks good. I think we can add a number of rules as well in
> command like instead of hardcoding to 4Millon.
> 

Sure, we can have it also.
BTW, I'm planning to add a file "user_parameters.h"
for other specific fields such as:
/** Flows count & iteration size **/
#define FLOWS_COUNT      4000000
#define ITERATION_SIZE  100000

/** Configuration **/
#define RXQs 4
#define TXQs 4
#define HAIRPIN_QUEUES 4
#define TOTAL_MBUF_NUM 32000
#define MBUF_SIZE 2048
#define MBUF_CACHE_SIZE 512
#define NR_RXD  256
#define NR_TXD  256

/** Items/Actions parameters **/
#define FLOW_TABLE 1
#define JUMP_ACTION_TABLE 2
#define VLAN_VALUE 1
#define VNI_VALUE 1
#define GRE_PROTO  0x6558
#define META_DATA 1
#define TAG_INDEX 0
#define PORT_ID_DST 1
#define MARK_ID 1
#define TEID_VALUE 1

> And what about the flow deletion performance case?

I agree we should have it as well in this application;
I plan to do it too.
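
A deletion pass could then reuse the handles kept from insertion, e.g.
(rough sketch only, assuming the created flows are stored in an array and
reusing flows_creator()'s local variables):

/* Rough sketch: keep the handles during insertion, then time deletion. */
static struct rte_flow *flows[FLOWS_COUNT];

start = clock();
for (i = 0; i < FLOWS_COUNT; i++)
	if (rte_flow_destroy(port_id, flows[i], &error))
		rte_exit(EXIT_FAILURE, "error in deleting flow\n");
end = clock();
flows_rate = (FLOWS_COUNT / ((double)(end - start) / CLOCKS_PER_SEC)) / 1000;
printf(":: Total flow deletion rate -> %f K/Sec\n", flows_rate);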

> 
> 
> >
> >
> > > >
> > > > > > The app supports single and multi core performance
> measurements.
> > > >
> > > >
> > > >
  
Thomas Monjalon March 23, 2020, 1 p.m. UTC | #7
23/03/2020 12:41, Wisam Monther:
> From: Jerin Jacob <jerinjacobk@gmail.com>
> > On Mon, Mar 23, 2020 at 3:23 PM Wisam Monther wrote:
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > On Fri, Mar 20, 2020 at 5:21 PM Thomas Monjalon wrote:
> > > > > 20/03/2020 07:49, Jerin Jacob:
> > > > > > On Tue, Mar 17, 2020 at 7:16 PM Wisam Jaddo wrote:
> > > > > >
> > > > > > Thanks for this application. Useful stuff.
> > > > > >
> > > > > > >
> > > > > > > Introducing new application for rte_flow performance testing.
> > > > > > > The application provide the ability to test insertion rate of
> > > > > > > specific rte_flow rule, by stressing it to the NIC, and
> > > > > > > calculate the insertion rate.
> > > > > > >
> > > > > > > It also provides packet per second measurements after the
> > > > > > > insertion operation is done.
> > > > > > >
> > > > > > > The application offers some options in the command line, to
> > > > > > > configure which rule to apply.
> > > > > > >
> > > > > > > After that the application will start producing rules with
> > > > > > > same pattern but increasing the outer IP source address by 1
> > > > > > > each time, thus it will give different flow each time, and all
> > > > > > > other items will have open masks.
> > > > > > >
> > > > > > > The current design have single core insertion rate.
> > > > > > > In the future we may have a multi core insertion rate
> > > > > > > measurement support in the app.
> > > > > >
> > > > > > If I understand correctly,
> > > > > > # On the main thread, this  application first check the flow
> > > > > > insertion performance # and then start the worker thread for
> > > > > > packet forwarding.
> > > > > > Why this application testing the packet forwarding?, We already
> > > > > > have testpmd for that.
> > > > >
> > > > > I think it is interesting to measure forwarding performance when
> > > > > million of flow rules are in effect.
> > > >
> > > > The rules are applied to the HW CAM, Right?
> > > > Do you see any performance difference?
> > > >
> > >
> > > Yes, there are applied to HW,
> > 
> > 
> > OK.IMO, it is better to introduce the command-line argument to
> > disable/enable packet forwarding.
> > That will enable if someone needs to test only flow insertion performance to
> > avoid the IO setup.
> > 
> 
> Sure, we can have the forwarding enabled by default, and I'll add --disable-fwd
> To command line options, it looks reasonable to have it, I agree

In general I prefer things disabled by default.
Option --test-fwd makes more sense and can accept some forwarding options.
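
Something along these lines would be enough for a first step (a sketch only,
reusing the existing args_parse()/main() pieces; the option name is just the
one discussed here):

static bool test_fwd; /* forwarding disabled by default */

/* in args_parse(): lgopts[] gains { "test-fwd", 0, 0, 0 }, and the
 * option loop gains:
 */
if (!strcmp(lgopts[opt_idx].name, "test-fwd"))
	test_fwd = true;

/* in main(): only start the forwarding lcores when requested */
if (test_fwd) {
	init_lcore_info();
	rte_eal_mp_remote_launch(start_forwarding, NULL, CALL_MASTER);
}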


> > > No not really, I still didn't test the impact of performance yet.
> > > Moreover it's interesting to see such results and the impact on
> > > performance, Also to see the rules are still matching after all
> > > Millions of insertion and millions of packets Sending/receiving.
> > 
> > 
> > > > > > IMO, This application needs to focus only on
> > > > > > - Insertion performance
> > > > > > - Deletion performance
> > > > > > - IMO, it is better to add a framework for the profile where the
> > > > > > first version of this application can define common a set of
> > > > > > ITEMS and set of ACTION and later others can extend it.
> > > > > > And the framework can run over all the profiles and spit out the
> > > > > > insertion and deletion performance.
> > > > >
> > > > > What do you call a profile? Is it a set of rules?
> > > >
> > > > set of rules and/or actions.
> > > >
> > > > > I think this first version is proposing rules customization with
> > parameters.
> > > >
> > > > Just that it better to have a framework where one can easily add new
> > > > profiles and test various combos. IMO, Cascade rules take more
> > > > insertion time.
> > > >
> > > > > Note: I prefer a non-interactive application for performance testing.
> > > >
> > > > Me too. Command-line is fine.
> > > >
> > >
> > > For this version I'm aiming to have the command line options to decide the
> > profile.
> > > For example:
> > > . /flow-perf -n 4 -w 0000:03:00.1,dv_flow_en=1 -- --ingress --ether
> > > --ipv4 --udp --vxlan-gpe --queue --mark Will mean 4 Million rules of:
> > > Flow create 0 ingress pattern eth / ipv4 src is <X> / udp / vxlan-gpe
> > > / end actions mark id 1 / queue < QUEUE _ID> / end
> > 
> > Ok. The syntax looks good. I think we can add a number of rules as well in
> > command like instead of hardcoding to 4Millon.
> 
> Sure we can have it also
> BTW, I'm planning to have a file under "user_paramters.h"
> This file for other specific fields such as:
> /** Flows count & iteration size **/
> #define FLOWS_COUNT      4000000
> #define ITERATION_SIZE  100000

Please make the flows count a variable which can be changed with an option.
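
For example (a rough sketch; the option name and default are only placeholders):

static uint32_t flows_count = 4000000; /* default, replaces FLOWS_COUNT */

/* lgopts[] gains { "flows-count", 1, 0, 0 }, and args_parse() then does: */
if (!strcmp(lgopts[opt_idx].name, "flows-count"))
	flows_count = (uint32_t)strtoul(optarg, NULL, 10);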


> > And what about the flow deletion performance case?
> 
> I agree we should have it as well in this application,
> I plan it to do it as well

Great, thanks
  
Wisam Jaddo March 23, 2020, 1:09 p.m. UTC | #8
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, March 23, 2020 3:00 PM
> To: Jerin Jacob <jerinjacobk@gmail.com>; Wisam Monther
> <wisamm@mellanox.com>
> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@mellanox.com>;
> Raslan Darawsheh <rasland@mellanox.com>
> Subject: Re: [dpdk-dev] [RFC] app/test-flow-perf: add rte_flow perf app
> 
> 23/03/2020 12:41, Wisam Monther:
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > On Mon, Mar 23, 2020 at 3:23 PM Wisam Monther wrote:
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > On Fri, Mar 20, 2020 at 5:21 PM Thomas Monjalon wrote:
> > > > > > 20/03/2020 07:49, Jerin Jacob:
> > > > > > > On Tue, Mar 17, 2020 at 7:16 PM Wisam Jaddo wrote:
> > > > > > >
> > > > > > > Thanks for this application. Useful stuff.
> > > > > > >
> > > > > > > >
> > > > > > > > Introducing new application for rte_flow performance testing.
> > > > > > > > The application provide the ability to test insertion rate
> > > > > > > > of specific rte_flow rule, by stressing it to the NIC, and
> > > > > > > > calculate the insertion rate.
> > > > > > > >
> > > > > > > > It also provides packet per second measurements after the
> > > > > > > > insertion operation is done.
> > > > > > > >
> > > > > > > > The application offers some options in the command line,
> > > > > > > > to configure which rule to apply.
> > > > > > > >
> > > > > > > > After that the application will start producing rules with
> > > > > > > > same pattern but increasing the outer IP source address by
> > > > > > > > 1 each time, thus it will give different flow each time,
> > > > > > > > and all other items will have open masks.
> > > > > > > >
> > > > > > > > The current design have single core insertion rate.
> > > > > > > > In the future we may have a multi core insertion rate
> > > > > > > > measurement support in the app.
> > > > > > >
> > > > > > > If I understand correctly,
> > > > > > > # On the main thread, this  application first check the flow
> > > > > > > insertion performance # and then start the worker thread for
> > > > > > > packet forwarding.
> > > > > > > Why this application testing the packet forwarding?, We
> > > > > > > already have testpmd for that.
> > > > > >
> > > > > > I think it is interesting to measure forwarding performance
> > > > > > when million of flow rules are in effect.
> > > > >
> > > > > The rules are applied to the HW CAM, Right?
> > > > > Do you see any performance difference?
> > > > >
> > > >
> > > > Yes, there are applied to HW,
> > >
> > >
> > > OK.IMO, it is better to introduce the command-line argument to
> > > disable/enable packet forwarding.
> > > That will enable if someone needs to test only flow insertion
> > > performance to avoid the IO setup.
> > >
> >
> > Sure, we can have the forwarding enabled by default, and I'll add
> > --disable-fwd To command line options, it looks reasonable to have it,
> > I agree
> 
> In general I prefer things disabled by default.
> Option --test-fwd makes more sense and can accept some forwarding
> options.

sure

> 
> 
> > > > No not really, I still didn't test the impact of performance yet.
> > > > Moreover it's interesting to see such results and the impact on
> > > > performance, Also to see the rules are still matching after all
> > > > Millions of insertion and millions of packets Sending/receiving.
> > >
> > >
> > > > > > > IMO, This application needs to focus only on
> > > > > > > - Insertion performance
> > > > > > > - Deletion performance
> > > > > > > - IMO, it is better to add a framework for the profile where
> > > > > > > the first version of this application can define common a
> > > > > > > set of ITEMS and set of ACTION and later others can extend it.
> > > > > > > And the framework can run over all the profiles and spit out
> > > > > > > the insertion and deletion performance.
> > > > > >
> > > > > > What do you call a profile? Is it a set of rules?
> > > > >
> > > > > set of rules and/or actions.
> > > > >
> > > > > > I think this first version is proposing rules customization
> > > > > > with
> > > parameters.
> > > > >
> > > > > Just that it better to have a framework where one can easily add
> > > > > new profiles and test various combos. IMO, Cascade rules take
> > > > > more insertion time.
> > > > >
> > > > > > Note: I prefer a non-interactive application for performance testing.
> > > > >
> > > > > Me too. Command-line is fine.
> > > > >
> > > >
> > > > For this version I'm aiming to have the command line options to
> > > > decide the
> > > profile.
> > > > For example:
> > > > . /flow-perf -n 4 -w 0000:03:00.1,dv_flow_en=1 -- --ingress
> > > > --ether
> > > > --ipv4 --udp --vxlan-gpe --queue --mark Will mean 4 Million rules of:
> > > > Flow create 0 ingress pattern eth / ipv4 src is <X> / udp /
> > > > vxlan-gpe / end actions mark id 1 / queue < QUEUE _ID> / end
> > >
> > > Ok. The syntax looks good. I think we can add a number of rules as
> > > well in command like instead of hardcoding to 4Millon.
> >
> > Sure we can have it also
> > BTW, I'm planning to have a file under "user_paramters.h"
> > This file for other specific fields such as:
> > /** Flows count & iteration size **/
> > #define FLOWS_COUNT      4000000
> > #define ITERATION_SIZE  100000
> 
> Please make flows count a variable which can be changed with option.

Sure

> 
> 
> > > And what about the flow deletion performance case?
> >
> > I agree we should have it as well in this application, I plan it to do
> > it as well
> 
> Great, thanks
> 

Thanks,
  

Patch

diff --git a/app/Makefile b/app/Makefile
index db9d2d5..694df67 100644
--- a/app/Makefile
+++ b/app/Makefile
@@ -9,6 +9,7 @@  DIRS-$(CONFIG_RTE_PROC_INFO) += proc-info
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += pdump
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += test-acl
 DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test-cmdline
+DIRS-$(CONFIG_RTE_TEST_FLOW_PERF) += test-flow-perf
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += test-pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_IPSEC) += test-sad
 
diff --git a/app/meson.build b/app/meson.build
index 71109cc..20d77b0 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -14,6 +14,7 @@  apps = [
 	'test-compress-perf',
 	'test-crypto-perf',
 	'test-eventdev',
+	'test-flow-perf',
 	'test-pipeline',
 	'test-pmd',
 	'test-sad']
diff --git a/app/test-flow-perf/Makefile b/app/test-flow-perf/Makefile
new file mode 100644
index 0000000..d633725
--- /dev/null
+++ b/app/test-flow-perf/Makefile
@@ -0,0 +1,28 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2020 Mellanox Technologies, Ltd
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+ifeq ($(CONFIG_RTE_TEST_FLOW_PERF),y)
+
+
+#
+# application name
+#
+APP = flow_perf
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-deprecated-declarations
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-y += actions_gen.c
+SRCS-y += flow_gen.c
+SRCS-y += items_gen.c
+SRCS-y += main.c
+
+include $(RTE_SDK)/mk/rte.app.mk
+
+endif
diff --git a/app/test-flow-perf/actions_gen.c b/app/test-flow-perf/actions_gen.c
new file mode 100644
index 0000000..a40ec0e
--- /dev/null
+++ b/app/test-flow-perf/actions_gen.c
@@ -0,0 +1,26 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * This file contains the implementations of the actions generators.
+ * Each generator is responsible for preparing its action instance
+ * and initializing it with the needed data.
+ *
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+#include "actions_gen.h"
+
+static struct rte_flow_action_queue queue_action;
+static struct rte_flow_action_mark mark_action;
+
+
+static void
+gen_queue(uint16_t queue)
+{
+	queue_action.index = queue;
+}
+
+static void
+gen_mark(uint32_t mark_id)
+{
+	mark_action.id = mark_id;
+}
diff --git a/app/test-flow-perf/actions_gen.h b/app/test-flow-perf/actions_gen.h
new file mode 100644
index 0000000..a690a1a
--- /dev/null
+++ b/app/test-flow-perf/actions_gen.h
@@ -0,0 +1,15 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * This file contains the function definitions to
+ * generate each supported action.
+ *
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+#define MAX_ACTIONS_NUM   4
+
+static void
+gen_queue(uint16_t queue);
+
+static void
+gen_mark(uint32_t mark_id);
diff --git a/app/test-flow-perf/flow_gen.c b/app/test-flow-perf/flow_gen.c
new file mode 100644
index 0000000..74d2908
--- /dev/null
+++ b/app/test-flow-perf/flow_gen.c
@@ -0,0 +1,97 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * This file contains the implementation of the methods that
+ * fill items, actions & attributes in their corresponding
+ * arrays, and then generate the rte_flow rule.
+ *
+ * After the generation, the rule goes through validation then
+ * creation, and the result is returned.
+ *
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+#include "flow_gen.h"
+#include "items_gen.c"
+#include "actions_gen.c"
+
+struct rte_flow *
+generate_flow(uint16_t port_id,
+		uint16_t flow_items,
+		uint16_t flow_actions,
+		uint8_t flow_attrs,
+		uint16_t group_id,
+		uint16_t nr_queues,
+		uint32_t outer_ip_src,
+		struct rte_flow_error *error)
+{
+	struct rte_flow_attr attr;
+	struct rte_flow_item items[MAX_ITEMS_NUM];
+	struct rte_flow_action actions[MAX_ACTIONS_NUM];
+	struct rte_flow *flow = NULL;
+	int res;
+
+	memset(items, 0, sizeof(items));
+	memset(actions, 0, sizeof(actions));
+	memset(&attr, 0, sizeof(struct rte_flow_attr));
+
+	fill_attributes(&attr, flow_attrs, group_id);
+
+	fill_actions(actions, flow_actions, nr_queues, outer_ip_src);
+
+	fill_items(items, flow_items, outer_ip_src);
+
+	res = rte_flow_validate(port_id, &attr, items, actions, error);
+	if (!res)
+		flow = rte_flow_create(port_id, &attr, items, actions, error);
+	return flow;
+}
+
+static void
+fill_attributes(struct rte_flow_attr *attr,
+	uint8_t flow_attrs, uint16_t group_id)
+{
+	if (flow_attrs & INGRESS)
+		attr->ingress = 1;
+	if (flow_attrs & EGRESS)
+		attr->egress = 1;
+	if (flow_attrs & TRANSFER)
+		attr->transfer = 1;
+	attr->group = group_id;
+}
+
+static void
+fill_items(struct rte_flow_item items[MAX_ITEMS_NUM],
+	uint16_t flow_items, uint32_t outer_ip_src)
+{
+	uint8_t items_counter = 0;
+
+	if (flow_items & ETH_ITEM)
+		add_ether(items, items_counter++);
+	if (flow_items & IPV4_ITEM)
+		add_ipv4(items, items_counter++, outer_ip_src);
+
+	items[items_counter].type = RTE_FLOW_ITEM_TYPE_END;
+}
+
+static void
+fill_actions(struct rte_flow_action actions[MAX_ACTIONS_NUM],
+	uint16_t flow_actions, uint16_t nr_queues, uint32_t counter)
+{
+	uint8_t actions_counter = 0;
+
+	/* None-fate actions */
+	if (flow_actions & MARK_ACTION) {
+		gen_mark(1);
+		actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_MARK;
+		actions[actions_counter++].conf = &mark_action;
+	}
+
+	/* Fate actions */
+	if (flow_actions & QUEUE_ACTION) {
+		gen_queue(counter % nr_queues);
+		actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_QUEUE;
+		actions[actions_counter++].conf = &queue_action;
+	}
+
+	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_END;
+}
diff --git a/app/test-flow-perf/flow_gen.h b/app/test-flow-perf/flow_gen.h
new file mode 100644
index 0000000..b006d10
--- /dev/null
+++ b/app/test-flow-perf/flow_gen.h
@@ -0,0 +1,47 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * This file contains the items, actions and attributes
+ * definition. And the methods to prepare and fill items,
+ * actions and attributes to generate rte_flow rule.
+ *
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+#define MAX_ACTIONS_NUM   4
+#define MAX_ITEMS_NUM     8
+
+/* Items */
+#define ETH_ITEM  0x0001
+#define IPV4_ITEM 0x0002
+
+/* Actions */
+#define QUEUE_ACTION 0x0001
+#define MARK_ACTION  0x0002
+#define DROP_ACTION  0x0004
+
+/* Attributes */
+#define INGRESS  0x0001
+#define EGRESS   0x0002
+#define TRANSFER 0x0004
+
+struct rte_flow *
+generate_flow(uint16_t port_id,
+		uint16_t flow_items,
+		uint16_t flow_actions,
+		uint8_t flow_attrs,
+		uint16_t group_id,
+		uint16_t nr_queues,
+		uint32_t outer_ip_src,
+		struct rte_flow_error *error);
+
+static void
+fill_attributes(struct rte_flow_attr *attr,
+	uint8_t flow_attrs, uint16_t group_id);
+
+static void
+fill_items(struct rte_flow_item items[MAX_ITEMS_NUM],
+	uint16_t flow_items, uint32_t outer_ip_src);
+
+static void
+fill_actions(struct rte_flow_action actions[MAX_ACTIONS_NUM],
+	uint16_t flow_actions, uint16_t nr_queues, uint32_t counter);
diff --git a/app/test-flow-perf/items_gen.c b/app/test-flow-perf/items_gen.c
new file mode 100644
index 0000000..029d8c6
--- /dev/null
+++ b/app/test-flow-perf/items_gen.c
@@ -0,0 +1,37 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * This file contains the implementations of the items-related
+ * methods. Each item has a method that prepares the item and
+ * adds it into the items array at a given index.
+ *
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+#include "items_gen.h"
+
+static struct rte_flow_item_eth eth_spec;
+static struct rte_flow_item_eth eth_mask;
+
+static struct rte_flow_item_ipv4 ipv4_spec;
+static struct rte_flow_item_ipv4 ipv4_mask;
+
+static inline void
+add_ether(struct rte_flow_item items[MAX_ITEMS_NUM],
+	uint8_t items_counter)
+{
+	RTE_SET_USED(eth_spec);
+	RTE_SET_USED(eth_mask);
+	RTE_SET_USED(items);
+	RTE_SET_USED(items_counter);
+}
+
+static inline void
+add_ipv4(struct rte_flow_item items[MAX_ITEMS_NUM],
+	uint8_t items_counter, uint32_t src_ipv4)
+{
+	RTE_SET_USED(ipv4_spec);
+	RTE_SET_USED(ipv4_mask);
+	RTE_SET_USED(items);
+	RTE_SET_USED(items_counter);
+	RTE_SET_USED(src_ipv4);
+}
diff --git a/app/test-flow-perf/items_gen.h b/app/test-flow-perf/items_gen.h
new file mode 100644
index 0000000..65ef410
--- /dev/null
+++ b/app/test-flow-perf/items_gen.h
@@ -0,0 +1,16 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * This file contains the items related methods
+ *
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+#define MAX_ITEMS_NUM	8
+
+static inline void
+add_ether(struct rte_flow_item items[MAX_ITEMS_NUM],
+	uint8_t items_counter);
+
+static inline void
+add_ipv4(struct rte_flow_item items[MAX_ITEMS_NUM],
+	uint8_t items_counter, uint32_t src_ipv4);
diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
new file mode 100644
index 0000000..201870f
--- /dev/null
+++ b/app/test-flow-perf/main.c
@@ -0,0 +1,656 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * This file contains the application's main code.
+ * The application provides the user the ability to test the
+ * insertion rate of a specific rte_flow rule under stress (~4M rules).
+ *
+ * It also provides a packets-per-second measurement after installing
+ * all rules; the user may send traffic that matches the rules once
+ * they are all installed, to check performance or functionality
+ * after the stress.
+ *
+ * The flow insertion is done for all ports first, then the results
+ * are printed; after that the application goes into packet forwarding
+ * mode, where it receives traffic (if any), forwards it back and
+ * reports a packets-per-second measurement.
+ *
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <sys/types.h>
+#include <sys/queue.h>
+#include <netinet/in.h>
+#include <setjmp.h>
+#include <stdarg.h>
+#include <ctype.h>
+#include <errno.h>
+#include <getopt.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <assert.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/time.h>
+
+
+#include <rte_eal.h>
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_mempool.h>
+#include <rte_mbuf.h>
+#include <rte_net.h>
+#include <rte_flow.h>
+#include <rte_cycles.h>
+#include <rte_memory.h>
+
+#include "flow_gen.h"
+#include "flow_gen.c"
+
+#define MAX_PKT_BURST 32
+#define LCORE_MODE_PKT 1
+#define LCORE_MODE_STATS 2
+#define MAX_STREAMS 64
+#define MAX_LCORES 64
+
+/* User Parameters */
+#define FLOWS_COUNT 4000000
+#define ITER_COUNT  100000
+#define MAX_ITER    100
+#define RXQs 8
+#define TXQs 8
+#define FLOW_TABLE 1
+#define TOTAL_MBUF_NUM 32000
+#define MBUF_SIZE 2048
+#define MBUF_CACHE_SIZE 512
+#define NR_RXD	256
+#define NR_TXD	256
+
+
+struct rte_flow *flow;
+
+static uint16_t flow_items;
+static uint16_t flow_actions;
+static uint8_t flow_attrs;
+static volatile bool force_quit;
+static struct rte_mempool *mbuf_mp;
+static uint32_t nb_lcores;
+
+struct stream {
+	int tx_port;
+	int tx_queue;
+	int rx_port;
+	int rx_queue;
+};
+
+struct lcore_info {
+	int mode;
+	int streams_nb;
+	struct stream streams[MAX_STREAMS];
+	/* stats */
+	uint64_t tx_pkts;
+	uint64_t tx_drops;
+	uint64_t rx_pkts;
+	struct rte_mbuf *pkts[MAX_PKT_BURST];
+} __attribute__((__aligned__(64))); /* let it be cacheline aligned */
+
+
+static struct lcore_info lcore_infos[MAX_LCORES];
+static void usage(char *progname)
+{
+	RTE_SET_USED(progname);
+	printf("usage: Help will be implemented here :)");
+}
+
+static void
+args_parse(int argc, char **argv)
+{
+	char **argvopt;
+	int opt;
+	int opt_idx;
+	static struct option lgopts[] = {
+		{ "help",                       0, 0, 0 },
+		{ "ingress",                    0, 0, 0 },
+		{ "egress",                     0, 0, 0 },
+		{ "transfer",                   0, 0, 0 },
+		{ "ether",                      0, 0, 0 },
+		{ "ipv4",                       0, 0, 0 },
+		{ "queue",                      0, 0, 0 },
+		{ 0, 0, 0, 0 }, /* getopt_long() needs a zeroed last entry */
+	};
+
+	flow_items = 0;
+	flow_actions = 0;
+	flow_attrs = 0;
+
+	printf(":: Flow -> ");
+	argvopt = argv;
+	while ((opt = getopt_long(argc, argvopt, "",
+				lgopts, &opt_idx)) != EOF) {
+		switch (opt) {
+		case 0:
+			if (!strcmp(lgopts[opt_idx].name, "help")) {
+				usage(argv[0]);
+				rte_exit(EXIT_SUCCESS, "Displayed help\n");
+			}
+			if (!strcmp(lgopts[opt_idx].name, "ingress")) {
+				flow_attrs |= INGRESS;
+				printf("ingress ");
+			}
+			if (!strcmp(lgopts[opt_idx].name, "egress")) {
+				flow_attrs |= EGRESS;
+				printf("egress ");
+			}
+			if (!strcmp(lgopts[opt_idx].name, "transfer")) {
+				flow_attrs |= TRANSFER;
+				printf("transfer ");
+			}
+			if (!strcmp(lgopts[opt_idx].name, "ether")) {
+				flow_items |= ETH_ITEM;
+				printf("ether / ");
+			}
+			if (!strcmp(lgopts[opt_idx].name, "ipv4")) {
+				flow_items |= IPV4_ITEM;
+				printf("ipv4 / ");
+			}
+			if (!strcmp(lgopts[opt_idx].name, "queue")) {
+				flow_actions |= QUEUE_ACTION;
+				printf("queue / ");
+			}
+			break;
+		default:
+			usage(argv[0]);
+			printf("Invalid option: %s\n", argv[optind]);
+			rte_exit(EXIT_FAILURE, "Invalid option\n");
+			break;
+		}
+	}
+	printf("end_flow\n");
+}
+
+static void
+print_flow_error(struct rte_flow_error error)
+{
+	printf("Flow can't be created %d message: %s\n",
+			error.type,
+			error.message ? error.message : "(no stated reason)");
+}
+
+static inline void
+flows_creator(void)
+{
+	struct rte_flow_error error;
+	clock_t start, end, start_iter, end_iter;
+	double cpu_time_used, flows_rate;
+	double cpu_time_per_iter[MAX_ITER];
+	double delta;
+	uint16_t nr_ports;
+	uint32_t i;
+	uint32_t eagain_counter = 0;
+	int port_id;
+
+	nr_ports = rte_eth_dev_count_avail();
+
+	for (i = 0; i < MAX_ITER; i++)
+		cpu_time_per_iter[i] = -1;
+
+	printf(":: Flows Count per port: %d\n", FLOWS_COUNT);
+
+	for (port_id = 0; port_id < nr_ports; port_id++) {
+		/* Insertion Rate */
+		printf("Flows insertion on port = %d\n", port_id);
+		start = clock();
+		start_iter = clock();
+		for (i = 0; i < FLOWS_COUNT; i++) {
+			do {
+				rte_errno = 0;
+				flow = generate_flow(port_id, flow_items,
+					flow_actions, flow_attrs, FLOW_TABLE,
+					RXQs, i, &error);
+				if (!flow)
+					eagain_counter++;
+			} while (rte_errno == EAGAIN);
+
+			if (force_quit)
+				i = FLOWS_COUNT;
+
+			if (!flow) {
+				print_flow_error(error);
+				rte_exit(EXIT_FAILURE, "error in creating flow\n");
+			}
+
+			if (i && !((i + 1) % ITER_COUNT)) {
+				/* Save the insertion rate of each iter */
+				end_iter = clock();
+				delta = (double) (end_iter - start_iter);
+				cpu_time_per_iter[((i + 1) / ITER_COUNT) - 1] =
+					delta / CLOCKS_PER_SEC;
+				start_iter = clock();
+			}
+		}
+		end = clock();
+		cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
+
+		/* Iteration rate per iteration */
+		for (i = 0; i < MAX_ITER; i++) {
+			if (cpu_time_per_iter[i] == -1)
+				continue;
+			delta = (double) (ITER_COUNT / cpu_time_per_iter[i]);
+			flows_rate = delta / 1000;
+			printf(":: Iteration #%d: %d flows in %f sec [ Rate = %f K/Sec ]\n",
+			i, ITER_COUNT, cpu_time_per_iter[i], flows_rate);
+		}
+
+		/* Insertion rate for all flows */
+		flows_rate = ((double) (FLOWS_COUNT / cpu_time_used) / 1000);
+		printf("\n:: Total flow insertion rate -> %f K/Sec\n",
+						flows_rate);
+		printf(":: The time for creating %d flows is %f seconds\n",
+						FLOWS_COUNT, cpu_time_used);
+		printf(":: EAGAIN counter = %u\n", eagain_counter);
+	}
+}
+
+static void
+signal_handler(int signum)
+{
+	if (signum == SIGINT || signum == SIGTERM) {
+		printf("\n\nSignal %d received, preparing to exit...\n",
+					signum);
+		printf("Error: Stats are wrong due to sudden signal!\n\n");
+		force_quit = true;
+	}
+}
+
+static inline uint16_t
+do_rx(struct lcore_info *li, uint16_t rx_port, uint16_t rx_queue)
+{
+	uint16_t cnt = 0;
+	cnt = rte_eth_rx_burst(rx_port, rx_queue, li->pkts, MAX_PKT_BURST);
+	li->rx_pkts += cnt;
+	return cnt;
+}
+
+static inline void
+do_tx(struct lcore_info *li, uint16_t cnt, uint16_t tx_port,
+			uint16_t tx_queue)
+{
+	uint16_t nr_tx = 0;
+	uint16_t i;
+
+	nr_tx = rte_eth_tx_burst(tx_port, tx_queue, li->pkts, cnt);
+	li->tx_pkts  += nr_tx;
+	li->tx_drops += cnt - nr_tx;
+
+	for (i = nr_tx; i < cnt; i++)
+		rte_pktmbuf_free(li->pkts[i]);
+}
+
+/*
+ * Here we convert numbers into pretty numbers that are easy to
+ * read. The design is to add a comma after every three digits
+ * and write the result into the given buffer.
+ *
+ * For example if n = 1799321, the output will be
+ * 1,799,321 after this method which is easier to read.
+ */
+static char *
+pretty_number(uint64_t n, char *buf)
+{
+	char p[6][4];
+	int i = 0;
+	int off = 0;
+
+	while (n > 1000) {
+		sprintf(p[i], "%03d", (int)(n % 1000));
+		n /= 1000;
+		i += 1;
+	}
+
+	sprintf(p[i++], "%d", (int)n);
+
+	while (i--)
+		off += sprintf(buf + off, "%s,", p[i]);
+	buf[strlen(buf) - 1] = '\0';
+
+	return buf;
+}
+
+static void
+packet_per_second_stats(void)
+{
+	struct lcore_info old[MAX_LCORES];
+	struct lcore_info *li, *oli;
+	int nr_lines = 0;
+	int i;
+
+	memcpy(old, lcore_infos,
+		sizeof(struct lcore_info) * MAX_LCORES);
+
+	while (!force_quit) {
+		uint64_t total_tx_pkts = 0;
+		uint64_t total_rx_pkts = 0;
+		uint64_t total_tx_drops = 0;
+		uint64_t tx_delta, rx_delta, drops_delta;
+		char buf[3][32];
+		int nr_valid_core = 0;
+
+		sleep(1);
+
+		if (nr_lines) {
+			char go_up_nr_lines[16];
+
+			sprintf(go_up_nr_lines, "%c[%dA\r", 27, nr_lines);
+			printf("%s\r", go_up_nr_lines);
+		}
+
+		printf("\n%16s %16s %16s %16s\n", "core", "tx", "tx drops", "rx");
+		printf("%16s %16s %16s %16s\n", "------",
+			"----------------", "----------------", "----------------");
+		nr_lines = 3;
+		for (i = 0; i < MAX_LCORES; i++) {
+			li  = &lcore_infos[i];
+			oli = &old[i];
+			if (li->mode != LCORE_MODE_PKT)
+				continue;
+
+			tx_delta    = li->tx_pkts  - oli->tx_pkts;
+			rx_delta    = li->rx_pkts  - oli->rx_pkts;
+			drops_delta = li->tx_drops - oli->tx_drops;
+			printf("%6d %16s %16s %16s\n", i,
+				pretty_number(tx_delta,    buf[0]),
+				pretty_number(drops_delta, buf[1]),
+				pretty_number(rx_delta,    buf[2]));
+
+			total_tx_pkts  += tx_delta;
+			total_rx_pkts  += rx_delta;
+			total_tx_drops += drops_delta;
+
+			nr_valid_core++;
+			nr_lines += 1;
+		}
+
+		if (nr_valid_core > 1) {
+			printf("%6s %16s %16s %16s\n", "total",
+				pretty_number(total_tx_pkts,  buf[0]),
+				pretty_number(total_tx_drops, buf[1]),
+				pretty_number(total_rx_pkts,  buf[2]));
+			nr_lines += 1;
+		}
+
+		memcpy(old, lcore_infos,
+			sizeof(struct lcore_info) * MAX_LCORES);
+	}
+}
+
+static int
+start_forwarding(void *data __rte_unused)
+{
+	int lcore = rte_lcore_id();
+	int stream_id;
+	uint16_t cnt;
+	struct lcore_info *li = &lcore_infos[lcore];
+
+	if (!li->mode)
+		return 0;
+
+	if (li->mode == LCORE_MODE_STATS) {
+		printf(":: started stats on lcore %u\n", lcore);
+		packet_per_second_stats();
+		return 0;
+	}
+
+	while (!force_quit)
+		for (stream_id = 0; stream_id < MAX_STREAMS; stream_id++) {
+			if (li->streams[stream_id].rx_port == -1)
+				continue;
+
+			cnt = do_rx(li,
+					li->streams[stream_id].rx_port,
+					li->streams[stream_id].rx_queue);
+			if (cnt)
+				do_tx(li, cnt,
+					li->streams[stream_id].tx_port,
+					li->streams[stream_id].tx_queue);
+		}
+	return 0;
+}
+
+static void
+init_lcore_info(void)
+{
+	int i, j;
+	unsigned int lcore;
+	uint16_t nr_port;
+	uint16_t queue;
+	int port;
+	int stream_id = 0;
+	int streams_per_core;
+	int unassigned_streams;
+	int nb_fwd_streams;
+	nr_port = rte_eth_dev_count_avail();
+
+	/** First logical core is reserved for stats printing **/
+	lcore = rte_get_next_lcore(-1, 0, 0);
+	lcore_infos[lcore].mode = LCORE_MODE_STATS;
+
+	/*
+	 * Initialize all cores
+	 * All cores at first must have -1 value in all streams
+	 * This means that this stream is not used, or not set
+	 * yet.
+	 */
+	for (i = 0; i < MAX_LCORES; i++)
+		for (j = 0; j < MAX_STREAMS; j++) {
+			lcore_infos[i].streams[j].tx_port = -1;
+			lcore_infos[i].streams[j].rx_port = -1;
+			lcore_infos[i].streams[j].tx_queue = -1;
+			lcore_infos[i].streams[j].rx_queue = -1;
+			lcore_infos[i].streams_nb = 0;
+		}
+
+	/*
+	 * Calculate the total streams count.
+	 * Also distribute that stream count among the available
+	 * logical cores, except the first core, since it is reserved
+	 * for stats printing.
+	 */
+	nb_fwd_streams = nr_port * RXQs;
+	if ((int)(nb_lcores - 1) >= nb_fwd_streams)
+		for (i = 0; i < (int)(nb_lcores - 1); i++) {
+			lcore = rte_get_next_lcore(lcore, 0, 0);
+			lcore_infos[lcore].streams_nb = 1;
+		}
+	else {
+		streams_per_core = nb_fwd_streams / (nb_lcores - 1);
+		unassigned_streams = nb_fwd_streams % (nb_lcores - 1);
+		for (i = 0; i < (int)(nb_lcores - 1); i++) {
+			lcore = rte_get_next_lcore(lcore, 0, 0);
+			lcore_infos[lcore].streams_nb = streams_per_core;
+			if (unassigned_streams) {
+				lcore_infos[lcore].streams_nb++;
+				unassigned_streams--;
+			}
+		}
+	}
+
+	/*
+	 * Set the streams for the cores according to each logical
+	 * core stream count.
+ * The streams are built on the principle that whatever is received
+ * should be forwarded back as well; this means that if packets are
+ * received on port 0 queue 0, then the same queue should forward
+ * them, using the same logical core.
+	 */
+	lcore = rte_get_next_lcore(-1, 0, 0);
+	for (port = 0; port < nr_port; port++) {
+		for (queue = 0; queue < RXQs; queue++) {
+			if (!lcore_infos[lcore].streams_nb ||
+				!(stream_id % lcore_infos[lcore].streams_nb)) {
+				lcore = rte_get_next_lcore(lcore, 0, 0);
+				lcore_infos[lcore].mode = LCORE_MODE_PKT;
+				stream_id = 0;
+			}
+			lcore_infos[lcore].streams[stream_id].rx_queue = queue;
+			lcore_infos[lcore].streams[stream_id].tx_queue = queue;
+			lcore_infos[lcore].streams[stream_id].rx_port = port;
+			lcore_infos[lcore].streams[stream_id].tx_port = port;
+			stream_id++;
+		}
+	}
+
+	/** Print all streams **/
+	printf(":: Stream -> core id[N]: (rx_port, rx_queue)->(tx_port, tx_queue)\n");
+	for (i = 0; i < MAX_LCORES; i++)
+		for (j = 0; j < MAX_STREAMS; j++) {
+			/** No streams for this core **/
+			if (lcore_infos[i].streams[j].tx_port == -1)
+				break;
+			printf("Stream -> core id[%d]: (%d,%d)->(%d,%d)\n",
+				i,
+				lcore_infos[i].streams[j].rx_port,
+				lcore_infos[i].streams[j].rx_queue,
+				lcore_infos[i].streams[j].tx_port,
+				lcore_infos[i].streams[j].tx_queue);
+		}
+}
+
+static void
+init_port(void)
+{
+	int ret;
+	uint16_t i;
+	uint16_t port_id;
+	uint16_t nr_ports = rte_eth_dev_count_avail();
+	struct rte_eth_conf port_conf = {
+		.rxmode = {
+			.split_hdr_size = 0,
+		},
+		.rx_adv_conf = {
+			.rss_conf.rss_hf =
+					ETH_RSS_IP  |
+					ETH_RSS_UDP |
+					ETH_RSS_TCP,
+		},
+	};
+	struct rte_eth_txconf txq_conf;
+	struct rte_eth_rxconf rxq_conf;
+	struct rte_eth_dev_info dev_info;
+
+	if (nr_ports == 0)
+		rte_exit(EXIT_FAILURE, "Error: no port detected\n");
+	mbuf_mp = rte_pktmbuf_pool_create("mbuf_pool",
+					TOTAL_MBUF_NUM, MBUF_CACHE_SIZE,
+					0, MBUF_SIZE,
+					rte_socket_id());
+
+	if (mbuf_mp == NULL)
+		rte_exit(EXIT_FAILURE, "Error: can't init mbuf pool\n");
+
+	for (port_id = 0; port_id < nr_ports; port_id++) {
+		ret = rte_eth_dev_info_get(port_id, &dev_info);
+		if (ret != 0)
+			rte_exit(EXIT_FAILURE,
+					"Error during getting device (port %u) info: %s\n",
+					port_id, strerror(-ret));
+
+		printf(":: initializing port: %d\n", port_id);
+		ret = rte_eth_dev_configure(port_id, RXQs, TXQs, &port_conf);
+		if (ret < 0) {
+			rte_exit(EXIT_FAILURE,
+					":: cannot configure device: err=%d, port=%u\n",
+					ret, port_id);
+		}
+
+		rxq_conf = dev_info.default_rxconf;
+		rxq_conf.offloads = port_conf.rxmode.offloads;
+		for (i = 0; i < RXQs; i++) {
+			ret = rte_eth_rx_queue_setup(port_id, i, NR_RXD,
+						rte_eth_dev_socket_id(port_id),
+						&rxq_conf,
+						mbuf_mp);
+			if (ret < 0) {
+				rte_exit(EXIT_FAILURE,
+						":: Rx queue setup failed: err=%d, port=%u\n",
+						ret, port_id);
+			}
+		}
+
+		txq_conf = dev_info.default_txconf;
+		txq_conf.offloads = port_conf.txmode.offloads;
+
+		for (i = 0; i < TXQs; i++) {
+			ret = rte_eth_tx_queue_setup(port_id, i, NR_TXD,
+						rte_eth_dev_socket_id(port_id),
+						&txq_conf);
+			if (ret < 0) {
+				rte_exit(EXIT_FAILURE,
+						":: Tx queue setup failed: err=%d, port=%u\n",
+						ret, port_id);
+			}
+		}
+
+		ret = rte_eth_dev_start(port_id);
+		if (ret < 0) {
+			rte_exit(EXIT_FAILURE,
+					"rte_eth_dev_start:err=%d, port=%u\n",
+					ret, port_id);
+		}
+
+		printf(":: initializing port: %d done\n", port_id);
+	}
+}
+
+int
+main(int argc, char **argv)
+{
+	uint16_t lcore_id;
+	uint16_t port;
+	uint16_t nr_ports;
+	int ret;
+	struct rte_flow_error error;
+
+	ret = rte_eal_init(argc, argv);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed\n");
+	nr_ports = rte_eth_dev_count_avail();
+
+	force_quit = false;
+	signal(SIGINT, signal_handler);
+	signal(SIGTERM, signal_handler);
+
+	argc -= ret;
+	argv += ret;
+
+	if (argc > 1)
+		args_parse(argc, argv);
+
+	init_port();
+
+	nb_lcores = rte_lcore_count();
+
+	if (nb_lcores <= 1)
+		rte_exit(EXIT_FAILURE, "This app needs at least two cores\n");
+
+	flows_creator();
+
+	init_lcore_info();
+
+	rte_eal_mp_remote_launch(start_forwarding, NULL, CALL_MASTER);
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id)
+		if (rte_eal_wait_lcore(lcore_id) < 0)
+			break;
+
+
+	for (port = 0; port < nr_ports; port++) {
+		rte_flow_flush(port, &error);
+		rte_eth_dev_stop(port);
+		rte_eth_dev_close(port);
+	}
+	return 0;
+}
diff --git a/app/test-flow-perf/meson.build b/app/test-flow-perf/meson.build
new file mode 100644
index 0000000..2326bec
--- /dev/null
+++ b/app/test-flow-perf/meson.build
@@ -0,0 +1,14 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2020 Mellanox Technologies, Ltd
+
+# meson file, for building this example as part of a main DPDK build.
+#
+# To build this example as a standalone application with an already-installed
+# DPDK instance, use 'make'
+
+sources = files(
+	'actions_gen.c',
+	'flow_gen.c',
+	'items_gen.c',
+	'main.c',
+)
diff --git a/config/common_base b/config/common_base
index c31175f..79455bf 100644
--- a/config/common_base
+++ b/config/common_base
@@ -1111,3 +1111,8 @@  CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile the rte flow perf application
+#
+CONFIG_RTE_TEST_FLOW_PERF=y