app/flow-perf: configurable rule batches

Message ID: 20201011100326.9074-1-katsikas.gp@gmail.com
State: Accepted, archived
Delegated to: Thomas Monjalon
Series: app/flow-perf: configurable rule batches

Checks

Context                       Check    Description
ci/checkpatch                 success  coding style OK
ci/Intel-compilation          success  Compilation OK
ci/iol-broadcom-Functional    success  Functional Testing PASS
ci/iol-broadcom-Performance   success  Performance Testing PASS
ci/iol-testing                success  Testing PASS
ci/iol-intel-Performance      success  Performance Testing PASS
ci/iol-mellanox-Performance   success  Performance Testing PASS
ci/travis-robot               success  Travis build: passed

Commit Message

Georgios Katsikas Oct. 11, 2020, 10:03 a.m. UTC
Currently, flow-perf measures the performance of
rule installation/deletion operations by breaking
down the entire number of operations into windows
of fixed size (i.e., 100000 operations per window).
Then, flow-perf measures the total time per window
and computes an average time across all windows.
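
The windowed measurement described above can be sketched roughly as
follows. This is an illustrative sketch, not the code from main.c: the
names measure_windows, window_stats, noop_op and MAX_WINDOWS are
hypothetical, and restarting the clock after each window is an
assumption. Only the use of clock() and CLOCKS_PER_SEC is taken from
the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define MAX_WINDOWS 100 /* assumed cap, similar in spirit to MAX_ITERATIONS */

struct window_stats {
	uint32_t windows;     /* number of complete windows measured */
	double total_seconds; /* CPU time summed over those windows */
};

/* Hypothetical stand-in for one rule installation/deletion. */
static void
noop_op(uint32_t i)
{
	(void)i;
}

/* Run ops_count operations and time each complete window of window_size. */
static struct window_stats
measure_windows(uint32_t ops_count, uint32_t window_size,
		void (*op)(uint32_t))
{
	struct window_stats st = { 0, 0.0 };
	clock_t start = clock();
	uint32_t i;

	assert(window_size > 0);
	for (i = 0; i < ops_count && st.windows < MAX_WINDOWS; i++) {
		op(i);
		if ((i + 1) % window_size == 0) {
			clock_t end = clock();

			st.total_seconds +=
				(double)(end - start) / CLOCKS_PER_SEC;
			st.windows++;
			start = clock(); /* assumed: restart per window */
		}
	}
	return st;
}

/* Average CPU time per measured window, or 0 if none completed. */
static double
average_window_seconds(const struct window_stats *st)
{
	return st->windows ? st->total_seconds / st->windows : 0.0;
}
```

A trailing partial window is not timed in this sketch, so 250
operations with a window of 100 yield two measurements.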

This commit allows flow-perf users to configure
the number of rules per window instead of using
a fixed pre-compiled value. To do so, users must
pass --rules-batch=N, where N is the number of
rules per window (or batch).
For consistency, the flows_count variable is
renamed to rules_count, and the --flows-count
option to --rules-count. This variable is the
total number of rules to be installed/deleted.

For example, if a user wants to measure how much
time it takes to install 1M rules in a certain NIC,
they can pass:
--rules-count=1000000
This way flow-perf will break down 1M flow rules into
10 batches of 100k flow rules each (this is the default
batch size) and compute an average across the 10
measurements.
Now, if the user modifies the number of rules per
batch as follows:
--rules-count=1000000 --rules-batch=500000
then flow-perf will break down 1M flow rules into
2 batches of 500k flow rules each and compute the
average across the 2 measurements.
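
The batch arithmetic in the two examples above can be checked with a
small C helper. This is a sketch under stated assumptions, not code
from the patch: batch_count is a hypothetical name, and the helper
assumes the clamping visible in the diff (a batch larger than the
total is reduced to the total) and that only complete batches count
as measurements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of flow-perf): number of complete
 * measurement batches for a given total and batch size.
 */
static uint32_t
batch_count(uint32_t rules_count, uint32_t rules_batch)
{
	assert(rules_count > 0);
	assert(rules_batch > 0);
	/* Mirror the patch: a batch larger than the total is clamped. */
	if (rules_batch > rules_count)
		rules_batch = rules_count;
	return rules_count / rules_batch;
}
```

With these inputs, batch_count(1000000, 100000) gives 10 and
batch_count(1000000, 500000) gives 2, matching the two cases above.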

Finally, this commit makes the usage function print
the default values from their macros (e.g.,
DEFAULT_RULES_COUNT) instead of hardcoded values.

Signed-off-by: Georgios Katsikas <katsikas.gp@gmail.com>
---
 app/test-flow-perf/main.c      | 86 ++++++++++++++++++++--------------
 doc/guides/tools/flow-perf.rst | 42 ++++++++++++-----
 2 files changed, 79 insertions(+), 49 deletions(-)
  

Comments

Georgios Katsikas Nov. 3, 2020, 11:26 a.m. UTC | #1
Hi,

Any news on this patch?
Is there anything else I could do?

Thanks,
Georgios

  
Wisam Monther Nov. 4, 2020, 8:04 a.m. UTC | #2
Hi,

You can add my ack:

Acked-by: Wisam Jaddo <wisamm@nvidia.com>

Thomas,
Do you have comments left here?

BRs,
Wisam Jaddo

  
Georgios Katsikas Nov. 4, 2020, 11:11 a.m. UTC | #3
Hi,

Is what you are asking possible with a simple git commit --amend?

Thanks,
Georgios

  
Wisam Monther Nov. 4, 2020, 11:25 a.m. UTC | #4
I think Thomas can add it if he wants to merge it in this version.

Moreover, I think you need to keep the ack between versions (using git commit --amend). The person who acked an earlier version should comment if a newer version is not OK with them; otherwise they are taken to be OK with it.

Thomas, correct me if I'm wrong; at least this is how I understood the procedure here.

I also noticed that you send each version as a new patch every time, but for history and comment tracking you should send them as versions (v2, v3, etc.) in reply to the older version of the patch.

BRs,
Wisam Jaddo

  
Thomas Monjalon Nov. 4, 2020, 8:46 p.m. UTC | #5
04/11/2020 09:04, Wisam Monther:
> Hi,
> 
> You can add my ack:
> 
> Acked-by: Wisam Jaddo <wisamm@nvidia.com>
> 
> Thomas,
> Do you have comments left here?

No comment, it looks very good.

Applied, thanks
  

Patch

diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
index c420da6a5..4cdab2c93 100644
--- a/app/test-flow-perf/main.c
+++ b/app/test-flow-perf/main.c
@@ -40,7 +40,8 @@ 
 
 #define MAX_ITERATIONS             100
 #define DEFAULT_RULES_COUNT    4000000
-#define DEFAULT_ITERATION       100000
+#define DEFAULT_RULES_BATCH     100000
+#define DEFAULT_GROUP                0
 
 struct rte_flow *flow;
 static uint8_t flow_group;
@@ -62,8 +63,8 @@  static bool enable_fwd;
 
 static struct rte_mempool *mbuf_mp;
 static uint32_t nb_lcores;
-static uint32_t flows_count;
-static uint32_t iterations_number;
+static uint32_t rules_count;
+static uint32_t rules_batch;
 static uint32_t hairpin_queues_num; /* total hairpin q number - default: 0 */
 static uint32_t nb_lcores;
 
@@ -98,8 +99,10 @@  usage(char *progname)
 {
 	printf("\nusage: %s\n", progname);
 	printf("\nControl configurations:\n");
-	printf("  --flows-count=N: to set the number of needed"
-		" flows to insert, default is 4,000,000\n");
+	printf("  --rules-count=N: to set the number of needed"
+		" rules to insert, default is %d\n", DEFAULT_RULES_COUNT);
+	printf("  --rules-batch=N: set number of batched rules,"
+		" default is %d\n", DEFAULT_RULES_BATCH);
 	printf("  --dump-iterations: To print rates for each"
 		" iteration\n");
 	printf("  --deletion-rate: Enable deletion rate"
@@ -114,7 +117,7 @@  usage(char *progname)
 	printf("  --egress: set egress attribute in flows\n");
 	printf("  --transfer: set transfer attribute in flows\n");
 	printf("  --group=N: set group for all flows,"
-		" default is 0\n");
+		" default is %d\n", DEFAULT_GROUP);
 
 	printf("To set flow items:\n");
 	printf("  --ether: add ether layer in flow items\n");
@@ -527,7 +530,8 @@  args_parse(int argc, char **argv)
 	static const struct option lgopts[] = {
 		/* Control */
 		{ "help",                       0, 0, 0 },
-		{ "flows-count",                1, 0, 0 },
+		{ "rules-count",                1, 0, 0 },
+		{ "rules-batch",                1, 0, 0 },
 		{ "dump-iterations",            0, 0, 0 },
 		{ "deletion-rate",              0, 0, 0 },
 		{ "dump-socket-mem",            0, 0, 0 },
@@ -705,16 +709,26 @@  args_parse(int argc, char **argv)
 			}
 			/* Control */
 			if (strcmp(lgopts[opt_idx].name,
-					"flows-count") == 0) {
+					"rules-batch") == 0) {
 				n = atoi(optarg);
-				if (n > (int) iterations_number)
-					flows_count = n;
+				if (n >= DEFAULT_RULES_BATCH)
+					rules_batch = n;
 				else {
-					printf("\n\nflows_count should be > %d\n",
-						iterations_number);
+					printf("\n\nrules_batch should be >= %d\n",
+						DEFAULT_RULES_BATCH);
 					rte_exit(EXIT_SUCCESS, " ");
 				}
 			}
+			if (strcmp(lgopts[opt_idx].name,
+					"rules-count") == 0) {
+				n = atoi(optarg);
+				if (n >= (int) rules_batch)
+					rules_count = n;
+				else {
+					printf("\n\nrules_count should be >= %d\n",
+						rules_batch);
+				}
+			}
 			if (strcmp(lgopts[opt_idx].name,
 					"dump-iterations") == 0)
 				dump_iterations = true;
@@ -826,13 +840,13 @@  destroy_flows(int port_id, struct rte_flow **flow_list)
 	for (i = 0; i < MAX_ITERATIONS; i++)
 		cpu_time_per_iter[i] = -1;
 
-	if (iterations_number > flows_count)
-		iterations_number = flows_count;
+	if (rules_batch > rules_count)
+		rules_batch = rules_count;
 
 	/* Deletion Rate */
 	printf("Flows Deletion on port = %d\n", port_id);
 	start_iter = clock();
-	for (i = 0; i < flows_count; i++) {
+	for (i = 0; i < rules_count; i++) {
 		if (flow_list[i] == 0)
 			break;
 
@@ -842,11 +856,11 @@  destroy_flows(int port_id, struct rte_flow **flow_list)
 			rte_exit(EXIT_FAILURE, "Error in deleting flow");
 		}
 
-		if (i && !((i + 1) % iterations_number)) {
+		if (i && !((i + 1) % rules_batch)) {
 			/* Save the deletion rate of each iter */
 			end_iter = clock();
 			delta = (double) (end_iter - start_iter);
-			iter_id = ((i + 1) / iterations_number) - 1;
+			iter_id = ((i + 1) / rules_batch) - 1;
 			cpu_time_per_iter[iter_id] =
 				delta / CLOCKS_PER_SEC;
 			cpu_time_used += cpu_time_per_iter[iter_id];
@@ -859,21 +873,21 @@  destroy_flows(int port_id, struct rte_flow **flow_list)
 		for (i = 0; i < MAX_ITERATIONS; i++) {
 			if (cpu_time_per_iter[i] == -1)
 				continue;
-			delta = (double)(iterations_number /
+			delta = (double)(rules_batch /
 				cpu_time_per_iter[i]);
 			flows_rate = delta / 1000;
 			printf(":: Iteration #%d: %d flows "
 				"in %f sec[ Rate = %f K/Sec ]\n",
-				i, iterations_number,
+				i, rules_batch,
 				cpu_time_per_iter[i], flows_rate);
 		}
 
 	/* Deletion rate for all flows */
-	flows_rate = ((double) (flows_count / cpu_time_used) / 1000);
+	flows_rate = ((double) (rules_count / cpu_time_used) / 1000);
 	printf("\n:: Total flow deletion rate -> %f K/Sec\n",
 		flows_rate);
 	printf(":: The time for deleting %d in flows %f seconds\n",
-		flows_count, cpu_time_used);
+		rules_count, cpu_time_used);
 }
 
 static inline void
@@ -902,13 +916,13 @@  flows_handler(void)
 	for (i = 0; i < MAX_ITERATIONS; i++)
 		cpu_time_per_iter[i] = -1;
 
-	if (iterations_number > flows_count)
-		iterations_number = flows_count;
+	if (rules_batch > rules_count)
+		rules_batch = rules_count;
 
-	printf(":: Flows Count per port: %d\n", flows_count);
+	printf(":: Flows Count per port: %d\n", rules_count);
 
 	flow_list = rte_zmalloc("flow_list",
-		(sizeof(struct rte_flow *) * flows_count) + 1, 0);
+		(sizeof(struct rte_flow *) * rules_count) + 1, 0);
 	if (flow_list == NULL)
 		rte_exit(EXIT_FAILURE, "No Memory available!");
 
@@ -941,7 +955,7 @@  flows_handler(void)
 		/* Insertion Rate */
 		printf("Flows insertion on port = %d\n", port_id);
 		start_iter = clock();
-		for (i = 0; i < flows_count; i++) {
+		for (i = 0; i < rules_count; i++) {
 			flow = generate_flow(port_id, flow_group,
 				flow_attrs, flow_items, flow_actions,
 				JUMP_ACTION_TABLE, i,
@@ -950,7 +964,7 @@  flows_handler(void)
 				&error);
 
 			if (force_quit)
-				i = flows_count;
+				i = rules_count;
 
 			if (!flow) {
 				print_flow_error(error);
@@ -959,11 +973,11 @@  flows_handler(void)
 
 			flow_list[flow_index++] = flow;
 
-			if (i && !((i + 1) % iterations_number)) {
+			if (i && !((i + 1) % rules_batch)) {
 				/* Save the insertion rate of each iter */
 				end_iter = clock();
 				delta = (double) (end_iter - start_iter);
-				iter_id = ((i + 1) / iterations_number) - 1;
+				iter_id = ((i + 1) / rules_batch) - 1;
 				cpu_time_per_iter[iter_id] =
 					delta / CLOCKS_PER_SEC;
 				cpu_time_used += cpu_time_per_iter[iter_id];
@@ -976,21 +990,21 @@  flows_handler(void)
 			for (i = 0; i < MAX_ITERATIONS; i++) {
 				if (cpu_time_per_iter[i] == -1)
 					continue;
-				delta = (double)(iterations_number /
+				delta = (double)(rules_batch /
 					cpu_time_per_iter[i]);
 				flows_rate = delta / 1000;
 				printf(":: Iteration #%d: %d flows "
 					"in %f sec[ Rate = %f K/Sec ]\n",
-					i, iterations_number,
+					i, rules_batch,
 					cpu_time_per_iter[i], flows_rate);
 			}
 
 		/* Insertion rate for all flows */
-		flows_rate = ((double) (flows_count / cpu_time_used) / 1000);
+		flows_rate = ((double) (rules_count / cpu_time_used) / 1000);
 		printf("\n:: Total flow insertion rate -> %f K/Sec\n",
 						flows_rate);
 		printf(":: The time for creating %d in flows %f seconds\n",
-						flows_count, cpu_time_used);
+						rules_count, cpu_time_used);
 
 		if (delete_flag)
 			destroy_flows(port_id, flow_list);
@@ -1415,11 +1429,11 @@  main(int argc, char **argv)
 
 	force_quit = false;
 	dump_iterations = false;
-	flows_count = DEFAULT_RULES_COUNT;
-	iterations_number = DEFAULT_ITERATION;
+	rules_count = DEFAULT_RULES_COUNT;
+	rules_batch = DEFAULT_RULES_BATCH;
 	delete_flag = false;
 	dump_socket_mem_flag = false;
-	flow_group = 0;
+	flow_group = DEFAULT_GROUP;
 
 	signal(SIGINT, signal_handler);
 	signal(SIGTERM, signal_handler);
diff --git a/doc/guides/tools/flow-perf.rst b/doc/guides/tools/flow-perf.rst
index 7e5dc0c54..018358ac1 100644
--- a/doc/guides/tools/flow-perf.rst
+++ b/doc/guides/tools/flow-perf.rst
@@ -5,19 +5,25 @@  Flow Performance Tool
 =====================
 
 Application for rte_flow performance testing.
-The application provide the ability to test insertion rate of specific
-rte_flow rule, by stressing it to the NIC, and calculate the insertion
-rate.
+The application provides the ability to test insertion rate of specific
+rte_flow rule, by stressing it to the NIC, and calculates the insertion
+and deletion rates.
 
-The application offers some options in the command line, to configure
-which rule to apply.
+The application allows to configure which rule to apply through several
+options of the command line.
 
 After that the application will start producing rules with same pattern
 but increasing the outer IP source address by 1 each time, thus it will
 give different flow each time, and all other items will have open masks.
 
-The application also provide the ability to measure rte flow deletion rate,
-in addition to memory consumption before and after the flows creation.
+To assess the rule insertion rate, the flow performance tool breaks
+down the entire number of flow rule operations into windows of fixed size
+(defaults to 100000 flow rule operations per window, but can be configured).
+Then, the flow performance tool measures the total time per window and
+computes an average time across all windows.
+
+The application also provides the ability to measure rte flow deletion rate,
+in addition to memory consumption before and after the flow rules' creation.
 
 The app supports single and multi core performance measurements.
 
@@ -59,21 +65,31 @@  with a ``--`` separator:
 
 .. code-block:: console
 
-	sudo ./dpdk-test-flow_perf -n 4 -w 08:00.0 -- --ingress --ether --ipv4 --queue --flows-count=1000000
+	sudo ./dpdk-test-flow_perf -n 4 -w 08:00.0 -- --ingress --ether --ipv4 --queue --rules-count=1000000
 
 The command line options are:
 
 *	``--help``
 	Display a help message and quit.
 
-*	``--flows-count=N``
-	Set the number of needed flows to insert,
-	where 1 <= N <= "number of flows".
+*	``--rules-count=N``
+	Set the total number of flow rules to insert,
+	where 1 <= N <= "number of flow rules".
 	The default value is 4,000,000.
 
+*	``--rules-batch=N``
+	Set the number of flow rules to insert per iteration window,
+	where 1 <= N <= "number of flow rules per iteration window".
+	The default value is 100,000 flow rules per iteration window.
+	For a total of --rules-count=1000000 flow rules to be inserted
+	and an iteration window size of --rules-batch=100000 flow rules,
+	the application will measure the insertion rate 10 times
+	(i.e., once every 100000 flow rules) and then report an average
+	insertion rate across the 10 measurements.
+
 *	``--dump-iterations``
-	Print rates for each iteration of flows.
-	Default iteration is 1,00,000.
+	Print rates for each iteration window.
+	Default iteration window equals to the rules-batch size (i.e., 100,000).
 
 *	``--deletion-rate``
 	Enable deletion rate calculations.