From patchwork Thu Nov 26 11:15:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wisam Monther X-Patchwork-Id: 84571 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id E10FAA052A; Thu, 26 Nov 2020 12:16:23 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 4BFA2C970; Thu, 26 Nov 2020 12:16:06 +0100 (CET) Received: from hqnvemgate24.nvidia.com (hqnvemgate24.nvidia.com [216.228.121.143]) by dpdk.org (Postfix) with ESMTP id 7814FC950 for ; Thu, 26 Nov 2020 12:16:04 +0100 (CET) Received: from hqmail.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate24.nvidia.com (using TLS: TLSv1.2, AES256-SHA) id ; Thu, 26 Nov 2020 03:16:09 -0800 Received: from nvidia.com (10.124.1.5) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 26 Nov 2020 11:16:00 +0000 From: Wisam Jaddo To: , , , CC: Date: Thu, 26 Nov 2020 13:15:40 +0200 Message-ID: <20201126111543.16928-2-wisamm@nvidia.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201126111543.16928-1-wisamm@nvidia.com> References: <20201126111543.16928-1-wisamm@nvidia.com> MIME-Version: 1.0 X-Originating-IP: [10.124.1.5] X-ClientProxiedBy: HQMAIL101.nvidia.com (172.20.187.10) To HQMAIL107.nvidia.com (172.20.187.13) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1606389369; bh=TrD9KEX1yfTgQ6RDM6fA5bD0Tp1PWdKumFzSkekrwdc=; h=From:To:CC:Subject:Date:Message-ID:X-Mailer:In-Reply-To: References:MIME-Version:Content-Type:X-Originating-IP: X-ClientProxiedBy; b=RrH8zltHZ6mewn5ZSqtD8wov8EvUgfXU9AVygbh5kQYChjrOGFNH2UVOxsDtuJQRY Go/ucJ4+DfYPPEOVFye1l2+OEU3sqnJN8I8hGiOhwTteHCX+KSAJELkhUpmqElO/9I D+fxXzSxkrYb2IHOBt/3qXleUXzhbW1pkItl5gR0pbcXRvevdBELsKpTsdoXYtOt6F 
Zxmp+5qnRYCmtBauia6jtpOMo+Nq5TGAJprlFJtxD87k5eWO75DJYcRcYT5/xziT5b WtR/2XcHi7lEF3pH65SB8ZLqr8CKRog57NEndbLeTVSN3IZcWYMzQg+AShTlNq/pbh A16tRa1LtK8Fw== Subject: [dpdk-dev] [PATCH 1/4] app/flow-perf: refactor flows handler X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev"

Give the flows_handler() function the ability to control the flow performance processes. This is made possible by the introduction of the insert_flows() function.

Also give flows_handler() the ability to print the DPDK-layer memory consumption of an rte_flow rule, regardless of whether the deletion feature is enabled. The previous solution printed the total memory change after flows_handler() returned, so when deletion was enabled the reported value did not represent the rte_flow rule size.

The new design is also easier to read and understand.

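The per-rule size report in this commit message comes down to sampling the allocated memory before and after insertion and dividing the difference by the number of rules. A minimal standalone sketch of that arithmetic follows; rule_size_in_bytes() is a hypothetical helper written for illustration only, not a function from the patch, and in the application the two samples would come from dump_socket_mem().

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): bytes consumed per rule,
 * given memory samples taken before (last_alloc) and after (alloc)
 * inserting rules_count rules. Returns 0 when there are no rules,
 * to avoid dividing by zero.
 */
static int64_t
rule_size_in_bytes(int64_t last_alloc, int64_t alloc, uint32_t rules_count)
{
	if (rules_count == 0)
		return 0;
	return (alloc - last_alloc) / (int64_t)rules_count;
}
```

For example, with made-up samples of 1000000 bytes before and 1512000 bytes after inserting 4000 rules, the helper returns 128. Taking both samples around the insertion only, before any deletion runs, is what keeps the result meaningful when deletion is enabled.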
Signed-off-by: Wisam Jaddo Reviewed-by: Alexander Kozyrev Reviewed-by: Suanming Mou --- app/test-flow-perf/main.c | 300 ++++++++++++++++++++------------------ 1 file changed, 158 insertions(+), 142 deletions(-) diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c index e2fc5b7f65..5ec9a15c61 100644 --- a/app/test-flow-perf/main.c +++ b/app/test-flow-perf/main.c @@ -38,7 +38,7 @@ #include "config.h" #include "flow_gen.h" -#define MAX_ITERATIONS 100 +#define MAX_BATCHES_COUNT 100 #define DEFAULT_RULES_COUNT 4000000 #define DEFAULT_RULES_BATCH 100000 #define DEFAULT_GROUP 0 @@ -826,188 +826,210 @@ print_flow_error(struct rte_flow_error error) } static inline void -destroy_flows(int port_id, struct rte_flow **flow_list) +print_rules_batches(double *cpu_time_per_batch) +{ + uint8_t idx; + double delta; + double rate; + + for (idx = 0; idx < MAX_BATCHES_COUNT; idx++) { + if (!cpu_time_per_batch[idx]) + break; + delta = (double)(rules_batch / cpu_time_per_batch[idx]); + rate = delta / 1000; /* Save rate in K unit. 
*/ + printf(":: Rules batch #%d: %d rules " + "in %f sec[ Rate = %f K Rule/Sec ]\n", + idx, rules_batch, + cpu_time_per_batch[idx], rate); + } +} + +static inline void +destroy_flows(int port_id, struct rte_flow **flows_list) { struct rte_flow_error error; - clock_t start_iter, end_iter; + clock_t start_batch, end_batch; double cpu_time_used = 0; - double flows_rate; - double cpu_time_per_iter[MAX_ITERATIONS]; + double deletion_rate; + double cpu_time_per_batch[MAX_BATCHES_COUNT] = { 0 }; double delta; uint32_t i; - int iter_id; - - for (i = 0; i < MAX_ITERATIONS; i++) - cpu_time_per_iter[i] = -1; - - if (rules_batch > rules_count) - rules_batch = rules_count; + int rules_batch_idx; /* Deletion Rate */ - printf("Flows Deletion on port = %d\n", port_id); - start_iter = clock(); + printf("\nRules Deletion on port = %d\n", port_id); + + start_batch = clock(); for (i = 0; i < rules_count; i++) { - if (flow_list[i] == 0) + if (flows_list[i] == 0) break; memset(&error, 0x33, sizeof(error)); - if (rte_flow_destroy(port_id, flow_list[i], &error)) { + if (rte_flow_destroy(port_id, flows_list[i], &error)) { print_flow_error(error); rte_exit(EXIT_FAILURE, "Error in deleting flow"); } - if (i && !((i + 1) % rules_batch)) { - /* Save the deletion rate of each iter */ - end_iter = clock(); - delta = (double) (end_iter - start_iter); - iter_id = ((i + 1) / rules_batch) - 1; - cpu_time_per_iter[iter_id] = - delta / CLOCKS_PER_SEC; - cpu_time_used += cpu_time_per_iter[iter_id]; - start_iter = clock(); + /* + * Save the deletion rate for rules batch. + * Check if the deletion reached the rules + * patch counter, then save the deletion rate + * for this batch. 
+ */ + if (!((i + 1) % rules_batch)) { + end_batch = clock(); + delta = (double) (end_batch - start_batch); + rules_batch_idx = ((i + 1) / rules_batch) - 1; + cpu_time_per_batch[rules_batch_idx] = delta / CLOCKS_PER_SEC; + cpu_time_used += cpu_time_per_batch[rules_batch_idx]; + start_batch = clock(); } } - /* Deletion rate per iteration */ + /* Print deletion rates for all batches */ if (dump_iterations) - for (i = 0; i < MAX_ITERATIONS; i++) { - if (cpu_time_per_iter[i] == -1) - continue; - delta = (double)(rules_batch / - cpu_time_per_iter[i]); - flows_rate = delta / 1000; - printf(":: Iteration #%d: %d flows " - "in %f sec[ Rate = %f K/Sec ]\n", - i, rules_batch, - cpu_time_per_iter[i], flows_rate); - } + print_rules_batches(cpu_time_per_batch); - /* Deletion rate for all flows */ - flows_rate = ((double) (rules_count / cpu_time_used) / 1000); - printf("\n:: Total flow deletion rate -> %f K/Sec\n", - flows_rate); - printf(":: The time for deleting %d in flows %f seconds\n", + /* Deletion rate for all rules */ + deletion_rate = ((double) (rules_count / cpu_time_used) / 1000); + printf(":: Total rules deletion rate -> %f K Rule/Sec\n", + deletion_rate); + printf(":: The time for deleting %d in rules %f seconds\n", rules_count, cpu_time_used); } -static inline void -flows_handler(void) +static struct rte_flow ** +insert_flows(int port_id) { - struct rte_flow **flow_list; + struct rte_flow **flows_list; struct rte_flow_error error; - clock_t start_iter, end_iter; + clock_t start_batch, end_batch; double cpu_time_used; - double flows_rate; - double cpu_time_per_iter[MAX_ITERATIONS]; + double insertion_rate; + double cpu_time_per_batch[MAX_BATCHES_COUNT] = { 0 }; double delta; - uint16_t nr_ports; - uint32_t i; - int port_id; - int iter_id; uint32_t flow_index; + uint32_t counter; uint64_t global_items[MAX_ITEMS_NUM] = { 0 }; uint64_t global_actions[MAX_ACTIONS_NUM] = { 0 }; + int rules_batch_idx; global_items[0] = FLOW_ITEM_MASK(RTE_FLOW_ITEM_TYPE_ETH); 
global_actions[0] = FLOW_ITEM_MASK(RTE_FLOW_ACTION_TYPE_JUMP); - nr_ports = rte_eth_dev_count_avail(); + flows_list = rte_zmalloc("flows_list", + (sizeof(struct rte_flow *) * rules_count) + 1, 0); + if (flows_list == NULL) + rte_exit(EXIT_FAILURE, "No Memory available!"); + + cpu_time_used = 0; + flow_index = 0; + if (flow_group > 0) { + /* + * Create global rule to jump into flow_group, + * this way the app will avoid the default rules. + * + * Global rule: + * group 0 eth / end actions jump group + */ + flow = generate_flow(port_id, 0, flow_attrs, + global_items, global_actions, + flow_group, 0, 0, 0, 0, &error); + + if (flow == NULL) { + print_flow_error(error); + rte_exit(EXIT_FAILURE, "error in creating flow"); + } + flows_list[flow_index++] = flow; + } + + /* Insertion Rate */ + printf("Rules insertion on port = %d\n", port_id); + start_batch = clock(); + for (counter = 0; counter < rules_count; counter++) { + flow = generate_flow(port_id, flow_group, + flow_attrs, flow_items, flow_actions, + JUMP_ACTION_TABLE, counter, + hairpin_queues_num, + encap_data, decap_data, + &error); + + if (force_quit) + counter = rules_count; + + if (!flow) { + print_flow_error(error); + rte_exit(EXIT_FAILURE, "error in creating flow"); + } - for (i = 0; i < MAX_ITERATIONS; i++) - cpu_time_per_iter[i] = -1; + flows_list[flow_index++] = flow; + + /* + * Save the insertion rate for rules batch. + * Check if the insertion reached the rules + * patch counter, then save the insertion rate + * for this batch. 
+ */ + if (!((counter + 1) % rules_batch)) { + end_batch = clock(); + delta = (double) (end_batch - start_batch); + rules_batch_idx = ((counter + 1) / rules_batch) - 1; + cpu_time_per_batch[rules_batch_idx] = delta / CLOCKS_PER_SEC; + cpu_time_used += cpu_time_per_batch[rules_batch_idx]; + start_batch = clock(); + } + } + + /* Print insertion rates for all batches */ + if (dump_iterations) + print_rules_batches(cpu_time_per_batch); + + /* Insertion rate for all rules */ + insertion_rate = ((double) (rules_count / cpu_time_used) / 1000); + printf(":: Total flow insertion rate -> %f K Rule/Sec\n", + insertion_rate); + printf(":: The time for creating %d in flows %f seconds\n", + rules_count, cpu_time_used); + + return flows_list; +} + +static inline void +flows_handler(void) +{ + struct rte_flow **flows_list; + uint16_t nr_ports; + int64_t alloc, last_alloc; + int flow_size_in_bytes; + int port_id; + + nr_ports = rte_eth_dev_count_avail(); if (rules_batch > rules_count) rules_batch = rules_count; - printf(":: Flows Count per port: %d\n", rules_count); - - flow_list = rte_zmalloc("flow_list", - (sizeof(struct rte_flow *) * rules_count) + 1, 0); - if (flow_list == NULL) - rte_exit(EXIT_FAILURE, "No Memory available!"); + printf(":: Rules Count per port: %d\n\n", rules_count); for (port_id = 0; port_id < nr_ports; port_id++) { /* If port outside portmask */ if (!((ports_mask >> port_id) & 0x1)) continue; - cpu_time_used = 0; - flow_index = 0; - if (flow_group > 0) { - /* - * Create global rule to jump into flow_group, - * this way the app will avoid the default rules. - * - * Global rule: - * group 0 eth / end actions jump group - * - */ - flow = generate_flow(port_id, 0, flow_attrs, - global_items, global_actions, - flow_group, 0, 0, 0, 0, &error); - if (flow == NULL) { - print_flow_error(error); - rte_exit(EXIT_FAILURE, "error in creating flow"); - } - flow_list[flow_index++] = flow; - } + /* Insertion part. 
*/ + last_alloc = (int64_t)dump_socket_mem(stdout); + flows_list = insert_flows(port_id); + alloc = (int64_t)dump_socket_mem(stdout); - /* Insertion Rate */ - printf("Flows insertion on port = %d\n", port_id); - start_iter = clock(); - for (i = 0; i < rules_count; i++) { - flow = generate_flow(port_id, flow_group, - flow_attrs, flow_items, flow_actions, - JUMP_ACTION_TABLE, i, - hairpin_queues_num, - encap_data, decap_data, - &error); - - if (force_quit) - i = rules_count; - - if (!flow) { - print_flow_error(error); - rte_exit(EXIT_FAILURE, "error in creating flow"); - } + /* Deletion part. */ + if (delete_flag) + destroy_flows(port_id, flows_list); - flow_list[flow_index++] = flow; - - if (i && !((i + 1) % rules_batch)) { - /* Save the insertion rate of each iter */ - end_iter = clock(); - delta = (double) (end_iter - start_iter); - iter_id = ((i + 1) / rules_batch) - 1; - cpu_time_per_iter[iter_id] = - delta / CLOCKS_PER_SEC; - cpu_time_used += cpu_time_per_iter[iter_id]; - start_iter = clock(); - } + /* Report rte_flow size in huge pages. 
*/ + if (last_alloc) { + flow_size_in_bytes = (alloc - last_alloc) / rules_count; + printf("\n:: rte_flow size in DPDK layer: %d Bytes", + flow_size_in_bytes); } - - /* Iteration rate per iteration */ - if (dump_iterations) - for (i = 0; i < MAX_ITERATIONS; i++) { - if (cpu_time_per_iter[i] == -1) - continue; - delta = (double)(rules_batch / - cpu_time_per_iter[i]); - flows_rate = delta / 1000; - printf(":: Iteration #%d: %d flows " - "in %f sec[ Rate = %f K/Sec ]\n", - i, rules_batch, - cpu_time_per_iter[i], flows_rate); - } - - /* Insertion rate for all flows */ - flows_rate = ((double) (rules_count / cpu_time_used) / 1000); - printf("\n:: Total flow insertion rate -> %f K/Sec\n", - flows_rate); - printf(":: The time for creating %d in flows %f seconds\n", - rules_count, cpu_time_used); - - if (delete_flag) - destroy_flows(port_id, flow_list); } } @@ -1421,7 +1443,6 @@ main(int argc, char **argv) int ret; uint16_t port; struct rte_flow_error error; - int64_t alloc, last_alloc; ret = rte_eal_init(argc, argv); if (ret < 0) @@ -1449,13 +1470,7 @@ main(int argc, char **argv) if (nb_lcores <= 1) rte_exit(EXIT_FAILURE, "This app needs at least two cores\n"); - last_alloc = (int64_t)dump_socket_mem(stdout); flows_handler(); - alloc = (int64_t)dump_socket_mem(stdout); - - if (last_alloc) - fprintf(stdout, ":: Memory allocation change(M): %.6lf\n", - (alloc - last_alloc) / 1.0e6); if (enable_fwd) { init_lcore_info(); @@ -1468,5 +1483,6 @@ main(int argc, char **argv) printf("Failed to stop device on port %u\n", port); rte_eth_dev_close(port); } + printf("\nBye ...\n"); return 0; } From patchwork Thu Nov 26 11:15:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wisam Monther X-Patchwork-Id: 84572 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org 
(Postfix) with ESMTP id B1BB0A052A; Thu, 26 Nov 2020 12:16:47 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 07F40C99C; Thu, 26 Nov 2020 12:16:09 +0100 (CET) Received: from hqnvemgate25.nvidia.com (hqnvemgate25.nvidia.com [216.228.121.64]) by dpdk.org (Postfix) with ESMTP id 3E1E1C96C for ; Thu, 26 Nov 2020 12:16:06 +0100 (CET) Received: from hqmail.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate25.nvidia.com (using TLS: TLSv1.2, AES256-SHA) id ; Thu, 26 Nov 2020 03:16:03 -0800 Received: from nvidia.com (10.124.1.5) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 26 Nov 2020 11:16:02 +0000 From: Wisam Jaddo To: , , , CC: Date: Thu, 26 Nov 2020 13:15:41 +0200 Message-ID: <20201126111543.16928-3-wisamm@nvidia.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201126111543.16928-1-wisamm@nvidia.com> References: <20201126111543.16928-1-wisamm@nvidia.com> MIME-Version: 1.0 X-Originating-IP: [10.124.1.5] X-ClientProxiedBy: HQMAIL101.nvidia.com (172.20.187.10) To HQMAIL107.nvidia.com (172.20.187.13) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1606389363; bh=wW00TgXxNUV4L/5wvAttlT2ZcsNA+b3dHMydlgqBprs=; h=From:To:CC:Subject:Date:Message-ID:X-Mailer:In-Reply-To: References:MIME-Version:Content-Type:X-Originating-IP: X-ClientProxiedBy; b=MfOVPKlZo1SlG3lps0T5j+MbLsoM/uJD3fG4o545k/qxVWJ6SFy6h5utBaAMPX4Ys X5KiJ9iiDS/gIDKdSnXYS9gu3wOfdliCDRA6ZMbV956ci2PYB7jAHfAoSHdAsrwfXV R3HgXTu3WanZgS4EFHakZu32kxVARO15lARLIFU+WJFycRunM/eoahwByTqpjne/Gb H0g+rZSs1Ev2tL2zN7wfW3DkKteZ+5VfZBhOhYbR/ZOUw1I03U8VJQQXRzW+2JM4hW 0tqzweExctjvaBs8U4e4xcVVFrA0d974DZhzNtyWSf+SAQ/1i16UMZDiIYO6OJ1K5F jEDOcdgNQxiaw== Subject: [dpdk-dev] [PATCH 2/4] app/flow-perf: add multiple cores insertion and deletion X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev"

One way to increase the insertion/deletion rate is to insert and delete rules from multiple threads. The flow-perf application therefore needs to support testing and measuring those rates.

The app now uses multiple cores: it distributes all flows among them and inserts/deletes in parallel.

The app receives the number of cores to use from a command line option, distributes the rte_flow rules evenly between the cores, and starts inserting/deleting. Each worker reports its own results, and at the end the MAIN worker reports the total results for all cores. The total results are calculated as RULES_COUNT divided by the maximum time used by any core.

This also affects the memory measurement. When multiple cores insert at the same time, the previous way of measuring memory is no longer valid, so memory is now saved before and after the allocation on every core. At the end we pick the minimum pre-allocation memory and the maximum post-allocation memory across all cores. The difference between those values represents the total memory consumed by the rte_flow rules from all cores, and from it the size in bytes of a single rte_flow is reported for each port.

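The aggregation described above can be sketched in plain C. This is an illustration under assumptions, not the code from the patch: struct core_result and the three helper functions are hypothetical names, and the sketch only shows the arithmetic (even split, total rate from the slowest core, size from the minimum pre-sample and maximum post-sample).

```c
#include <stdint.h>

/* Hypothetical per-core record; field names are illustrative only. */
struct core_result {
	double cpu_time_used;  /* seconds this core spent inserting */
	int64_t mem_before;    /* memory sample before this core inserted */
	int64_t mem_after;     /* memory sample after this core inserted */
};

/* Rules handled by each core when rules_count is split evenly. */
static uint32_t
rules_per_core(uint32_t rules_count, uint32_t cores_count)
{
	return cores_count ? rules_count / cores_count : 0;
}

/* Total rate in K rules/sec: rules_count over the slowest core's time. */
static double
total_rate_k(const struct core_result *res, uint32_t cores_count,
	uint32_t rules_count)
{
	double max_time = 0;
	uint32_t i;

	for (i = 0; i < cores_count; i++)
		if (res[i].cpu_time_used > max_time)
			max_time = res[i].cpu_time_used;
	if (max_time == 0)
		return 0;
	return ((double)rules_count / max_time) / 1000;
}

/* Bytes per rule: (max post-sample - min pre-sample) / rules_count. */
static int64_t
rule_size_across_cores(const struct core_result *res, uint32_t cores_count,
	uint32_t rules_count)
{
	int64_t min_before, max_after;
	uint32_t i;

	if (cores_count == 0 || rules_count == 0)
		return 0;
	min_before = res[0].mem_before;
	max_after = res[0].mem_after;
	for (i = 1; i < cores_count; i++) {
		if (res[i].mem_before < min_before)
			min_before = res[i].mem_before;
		if (res[i].mem_after > max_after)
			max_after = res[i].mem_after;
	}
	return (max_after - min_before) / (int64_t)rules_count;
}
```

Using the slowest core's time for the total rate reflects that the parallel run is only finished when the last worker finishes. Likewise, the minimum pre-sample and maximum post-sample bracket the whole parallel insertion, so their difference covers the rules created by every core.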
How to use this feature: --cores=N Where 1 <= N <= RTE_MAX_LCORE Signed-off-by: Wisam Jaddo Reviewed-by: Alexander Kozyrev Reviewed-by: Suanming Mou --- app/test-flow-perf/actions_gen.c | 175 ++++++++++---------- app/test-flow-perf/actions_gen.h | 2 +- app/test-flow-perf/config.h | 1 + app/test-flow-perf/flow_gen.c | 5 +- app/test-flow-perf/flow_gen.h | 1 + app/test-flow-perf/items_gen.c | 103 ++++++------ app/test-flow-perf/items_gen.h | 2 +- app/test-flow-perf/main.c | 266 +++++++++++++++++++++++++------ doc/guides/tools/flow-perf.rst | 14 +- 9 files changed, 372 insertions(+), 197 deletions(-) diff --git a/app/test-flow-perf/actions_gen.c b/app/test-flow-perf/actions_gen.c index ac525f6fdb..1364407056 100644 --- a/app/test-flow-perf/actions_gen.c +++ b/app/test-flow-perf/actions_gen.c @@ -29,6 +29,7 @@ struct additional_para { uint32_t counter; uint64_t encap_data; uint64_t decap_data; + uint8_t core_idx; }; /* Storage for struct rte_flow_action_raw_encap including external data. */ @@ -58,16 +59,16 @@ add_mark(struct rte_flow_action *actions, uint8_t actions_counter, struct additional_para para) { - static struct rte_flow_action_mark mark_action; + static struct rte_flow_action_mark mark_actions[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t counter = para.counter; do { /* Random values from 1 to 256 */ - mark_action.id = (counter % 255) + 1; + mark_actions[para.core_idx].id = (counter % 255) + 1; } while (0); actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_MARK; - actions[actions_counter].conf = &mark_action; + actions[actions_counter].conf = &mark_actions[para.core_idx]; } static void @@ -75,14 +76,14 @@ add_queue(struct rte_flow_action *actions, uint8_t actions_counter, struct additional_para para) { - static struct rte_flow_action_queue queue_action; + static struct rte_flow_action_queue queue_actions[RTE_MAX_LCORE] __rte_cache_aligned; do { - queue_action.index = para.queue; + queue_actions[para.core_idx].index = para.queue; } while (0); 
actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_QUEUE; - actions[actions_counter].conf = &queue_action; + actions[actions_counter].conf = &queue_actions[para.core_idx]; } static void @@ -105,39 +106,36 @@ add_rss(struct rte_flow_action *actions, uint8_t actions_counter, struct additional_para para) { - static struct rte_flow_action_rss *rss_action; - static struct action_rss_data *rss_data; + static struct action_rss_data *rss_data[RTE_MAX_LCORE] __rte_cache_aligned; uint16_t queue; - if (rss_data == NULL) - rss_data = rte_malloc("rss_data", + if (rss_data[para.core_idx] == NULL) + rss_data[para.core_idx] = rte_malloc("rss_data", sizeof(struct action_rss_data), 0); - if (rss_data == NULL) + if (rss_data[para.core_idx] == NULL) rte_exit(EXIT_FAILURE, "No Memory available!"); - *rss_data = (struct action_rss_data){ + *rss_data[para.core_idx] = (struct action_rss_data){ .conf = (struct rte_flow_action_rss){ .func = RTE_ETH_HASH_FUNCTION_DEFAULT, .level = 0, .types = GET_RSS_HF(), - .key_len = sizeof(rss_data->key), + .key_len = sizeof(rss_data[para.core_idx]->key), .queue_num = para.queues_number, - .key = rss_data->key, - .queue = rss_data->queue, + .key = rss_data[para.core_idx]->key, + .queue = rss_data[para.core_idx]->queue, }, .key = { 1 }, .queue = { 0 }, }; for (queue = 0; queue < para.queues_number; queue++) - rss_data->queue[queue] = para.queues[queue]; - - rss_action = &rss_data->conf; + rss_data[para.core_idx]->queue[queue] = para.queues[queue]; actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_RSS; - actions[actions_counter].conf = rss_action; + actions[actions_counter].conf = &rss_data[para.core_idx]->conf; } static void @@ -212,7 +210,7 @@ add_set_src_mac(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_mac set_mac; + static struct rte_flow_action_set_mac set_macs[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t mac = para.counter; uint16_t i; @@ -222,12 +220,12 
@@ add_set_src_mac(struct rte_flow_action *actions, /* Mac address to be set is random each time */ for (i = 0; i < RTE_ETHER_ADDR_LEN; i++) { - set_mac.mac_addr[i] = mac & 0xff; + set_macs[para.core_idx].mac_addr[i] = mac & 0xff; mac = mac >> 8; } actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_MAC_SRC; - actions[actions_counter].conf = &set_mac; + actions[actions_counter].conf = &set_macs[para.core_idx]; } static void @@ -235,7 +233,7 @@ add_set_dst_mac(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_mac set_mac; + static struct rte_flow_action_set_mac set_macs[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t mac = para.counter; uint16_t i; @@ -245,12 +243,12 @@ add_set_dst_mac(struct rte_flow_action *actions, /* Mac address to be set is random each time */ for (i = 0; i < RTE_ETHER_ADDR_LEN; i++) { - set_mac.mac_addr[i] = mac & 0xff; + set_macs[para.core_idx].mac_addr[i] = mac & 0xff; mac = mac >> 8; } actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_MAC_DST; - actions[actions_counter].conf = &set_mac; + actions[actions_counter].conf = &set_macs[para.core_idx]; } static void @@ -258,7 +256,7 @@ add_set_src_ipv4(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_ipv4 set_ipv4; + static struct rte_flow_action_set_ipv4 set_ipv4[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t ip = para.counter; /* Fixed value */ @@ -266,10 +264,10 @@ add_set_src_ipv4(struct rte_flow_action *actions, ip = 1; /* IPv4 value to be set is random each time */ - set_ipv4.ipv4_addr = RTE_BE32(ip + 1); + set_ipv4[para.core_idx].ipv4_addr = RTE_BE32(ip + 1); actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC; - actions[actions_counter].conf = &set_ipv4; + actions[actions_counter].conf = &set_ipv4[para.core_idx]; } static void @@ -277,7 +275,7 @@ add_set_dst_ipv4(struct rte_flow_action *actions, 
uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_ipv4 set_ipv4; + static struct rte_flow_action_set_ipv4 set_ipv4[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t ip = para.counter; /* Fixed value */ @@ -285,10 +283,10 @@ add_set_dst_ipv4(struct rte_flow_action *actions, ip = 1; /* IPv4 value to be set is random each time */ - set_ipv4.ipv4_addr = RTE_BE32(ip + 1); + set_ipv4[para.core_idx].ipv4_addr = RTE_BE32(ip + 1); actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV4_DST; - actions[actions_counter].conf = &set_ipv4; + actions[actions_counter].conf = &set_ipv4[para.core_idx]; } static void @@ -296,7 +294,7 @@ add_set_src_ipv6(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_ipv6 set_ipv6; + static struct rte_flow_action_set_ipv6 set_ipv6[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t ipv6 = para.counter; uint8_t i; @@ -306,12 +304,12 @@ add_set_src_ipv6(struct rte_flow_action *actions, /* IPv6 value to set is random each time */ for (i = 0; i < 16; i++) { - set_ipv6.ipv6_addr[i] = ipv6 & 0xff; + set_ipv6[para.core_idx].ipv6_addr[i] = ipv6 & 0xff; ipv6 = ipv6 >> 8; } actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC; - actions[actions_counter].conf = &set_ipv6; + actions[actions_counter].conf = &set_ipv6[para.core_idx]; } static void @@ -319,7 +317,7 @@ add_set_dst_ipv6(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_ipv6 set_ipv6; + static struct rte_flow_action_set_ipv6 set_ipv6[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t ipv6 = para.counter; uint8_t i; @@ -329,12 +327,12 @@ add_set_dst_ipv6(struct rte_flow_action *actions, /* IPv6 value to set is random each time */ for (i = 0; i < 16; i++) { - set_ipv6.ipv6_addr[i] = ipv6 & 0xff; + set_ipv6[para.core_idx].ipv6_addr[i] = ipv6 & 0xff; ipv6 = ipv6 >> 8; } 
actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV6_DST; - actions[actions_counter].conf = &set_ipv6; + actions[actions_counter].conf = &set_ipv6[para.core_idx]; } static void @@ -342,7 +340,7 @@ add_set_src_tp(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_tp set_tp; + static struct rte_flow_action_set_tp set_tp[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t tp = para.counter; /* Fixed value */ @@ -352,10 +350,10 @@ add_set_src_tp(struct rte_flow_action *actions, /* TP src port is random each time */ tp = tp % 0xffff; - set_tp.port = RTE_BE16(tp & 0xffff); + set_tp[para.core_idx].port = RTE_BE16(tp & 0xffff); actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_TP_SRC; - actions[actions_counter].conf = &set_tp; + actions[actions_counter].conf = &set_tp[para.core_idx]; } static void @@ -363,7 +361,7 @@ add_set_dst_tp(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_tp set_tp; + static struct rte_flow_action_set_tp set_tp[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t tp = para.counter; /* Fixed value */ @@ -374,10 +372,10 @@ add_set_dst_tp(struct rte_flow_action *actions, if (tp > 0xffff) tp = tp >> 16; - set_tp.port = RTE_BE16(tp & 0xffff); + set_tp[para.core_idx].port = RTE_BE16(tp & 0xffff); actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_TP_DST; - actions[actions_counter].conf = &set_tp; + actions[actions_counter].conf = &set_tp[para.core_idx]; } static void @@ -385,17 +383,17 @@ add_inc_tcp_ack(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static rte_be32_t value; + static rte_be32_t value[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t ack_value = para.counter; /* Fixed value */ if (FIXED_VALUES) ack_value = 1; - value = RTE_BE32(ack_value); + value[para.core_idx] = RTE_BE32(ack_value); 
actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_INC_TCP_ACK; - actions[actions_counter].conf = &value; + actions[actions_counter].conf = &value[para.core_idx]; } static void @@ -403,17 +401,17 @@ add_dec_tcp_ack(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static rte_be32_t value; + static rte_be32_t value[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t ack_value = para.counter; /* Fixed value */ if (FIXED_VALUES) ack_value = 1; - value = RTE_BE32(ack_value); + value[para.core_idx] = RTE_BE32(ack_value); actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_DEC_TCP_ACK; - actions[actions_counter].conf = &value; + actions[actions_counter].conf = &value[para.core_idx]; } static void @@ -421,17 +419,17 @@ add_inc_tcp_seq(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static rte_be32_t value; + static rte_be32_t value[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t seq_value = para.counter; /* Fixed value */ if (FIXED_VALUES) seq_value = 1; - value = RTE_BE32(seq_value); + value[para.core_idx] = RTE_BE32(seq_value); actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_INC_TCP_SEQ; - actions[actions_counter].conf = &value; + actions[actions_counter].conf = &value[para.core_idx]; } static void @@ -439,17 +437,17 @@ add_dec_tcp_seq(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static rte_be32_t value; + static rte_be32_t value[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t seq_value = para.counter; /* Fixed value */ if (FIXED_VALUES) seq_value = 1; - value = RTE_BE32(seq_value); + value[para.core_idx] = RTE_BE32(seq_value); actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_DEC_TCP_SEQ; - actions[actions_counter].conf = &value; + actions[actions_counter].conf = &value[para.core_idx]; } static void @@ -457,7 +455,7 @@ add_set_ttl(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused 
struct additional_para para) { - static struct rte_flow_action_set_ttl set_ttl; + static struct rte_flow_action_set_ttl set_ttl[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t ttl_value = para.counter; /* Fixed value */ @@ -467,10 +465,10 @@ add_set_ttl(struct rte_flow_action *actions, /* Set ttl to random value each time */ ttl_value = ttl_value % 0xff; - set_ttl.ttl_value = ttl_value; + set_ttl[para.core_idx].ttl_value = ttl_value; actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_TTL; - actions[actions_counter].conf = &set_ttl; + actions[actions_counter].conf = &set_ttl[para.core_idx]; } static void @@ -486,7 +484,7 @@ add_set_ipv4_dscp(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_dscp set_dscp; + static struct rte_flow_action_set_dscp set_dscp[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t dscp_value = para.counter; /* Fixed value */ @@ -496,10 +494,10 @@ add_set_ipv4_dscp(struct rte_flow_action *actions, /* Set dscp to random value each time */ dscp_value = dscp_value % 0xff; - set_dscp.dscp = dscp_value; + set_dscp[para.core_idx].dscp = dscp_value; actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP; - actions[actions_counter].conf = &set_dscp; + actions[actions_counter].conf = &set_dscp[para.core_idx]; } static void @@ -507,7 +505,7 @@ add_set_ipv6_dscp(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_dscp set_dscp; + static struct rte_flow_action_set_dscp set_dscp[RTE_MAX_LCORE] __rte_cache_aligned; uint32_t dscp_value = para.counter; /* Fixed value */ @@ -517,10 +515,10 @@ add_set_ipv6_dscp(struct rte_flow_action *actions, /* Set dscp to random value each time */ dscp_value = dscp_value % 0xff; - set_dscp.dscp = dscp_value; + set_dscp[para.core_idx].dscp = dscp_value; actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP; - 
actions[actions_counter].conf = &set_dscp; + actions[actions_counter].conf = &set_dscp[para.core_idx]; } static void @@ -774,36 +772,36 @@ add_raw_encap(struct rte_flow_action *actions, uint8_t actions_counter, struct additional_para para) { - static struct action_raw_encap_data *action_encap_data; + static struct action_raw_encap_data *action_encap_data[RTE_MAX_LCORE] __rte_cache_aligned; uint64_t encap_data = para.encap_data; uint8_t *header; uint8_t i; /* Avoid double allocation. */ - if (action_encap_data == NULL) - action_encap_data = rte_malloc("encap_data", + if (action_encap_data[para.core_idx] == NULL) + action_encap_data[para.core_idx] = rte_malloc("encap_data", sizeof(struct action_raw_encap_data), 0); /* Check if allocation failed. */ - if (action_encap_data == NULL) + if (action_encap_data[para.core_idx] == NULL) rte_exit(EXIT_FAILURE, "No Memory available!"); - *action_encap_data = (struct action_raw_encap_data) { + *action_encap_data[para.core_idx] = (struct action_raw_encap_data) { .conf = (struct rte_flow_action_raw_encap) { - .data = action_encap_data->data, + .data = action_encap_data[para.core_idx]->data, }, .data = {}, }; - header = action_encap_data->data; + header = action_encap_data[para.core_idx]->data; for (i = 0; i < RTE_DIM(headers); i++) headers[i].funct(&header, encap_data, para); - action_encap_data->conf.size = header - - action_encap_data->data; + action_encap_data[para.core_idx]->conf.size = header - + action_encap_data[para.core_idx]->data; actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP; - actions[actions_counter].conf = &action_encap_data->conf; + actions[actions_counter].conf = &action_encap_data[para.core_idx]->conf; } static void @@ -811,36 +809,36 @@ add_raw_decap(struct rte_flow_action *actions, uint8_t actions_counter, struct additional_para para) { - static struct action_raw_decap_data *action_decap_data; + static struct action_raw_decap_data *action_decap_data[RTE_MAX_LCORE] __rte_cache_aligned; uint64_t 
decap_data = para.decap_data; uint8_t *header; uint8_t i; /* Avoid double allocation. */ - if (action_decap_data == NULL) - action_decap_data = rte_malloc("decap_data", + if (action_decap_data[para.core_idx] == NULL) + action_decap_data[para.core_idx] = rte_malloc("decap_data", sizeof(struct action_raw_decap_data), 0); /* Check if allocation failed. */ - if (action_decap_data == NULL) + if (action_decap_data[para.core_idx] == NULL) rte_exit(EXIT_FAILURE, "No Memory available!"); - *action_decap_data = (struct action_raw_decap_data) { + *action_decap_data[para.core_idx] = (struct action_raw_decap_data) { .conf = (struct rte_flow_action_raw_decap) { - .data = action_decap_data->data, + .data = action_decap_data[para.core_idx]->data, }, .data = {}, }; - header = action_decap_data->data; + header = action_decap_data[para.core_idx]->data; for (i = 0; i < RTE_DIM(headers); i++) headers[i].funct(&header, decap_data, para); - action_decap_data->conf.size = header - - action_decap_data->data; + action_decap_data[para.core_idx]->conf.size = header - + action_decap_data[para.core_idx]->data; actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_RAW_DECAP; - actions[actions_counter].conf = &action_decap_data->conf; + actions[actions_counter].conf = &action_decap_data[para.core_idx]->conf; } static void @@ -848,7 +846,7 @@ add_vxlan_encap(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_vxlan_encap vxlan_encap; + static struct rte_flow_action_vxlan_encap vxlan_encap[RTE_MAX_LCORE] __rte_cache_aligned; static struct rte_flow_item items[5]; static struct rte_flow_item_eth item_eth; static struct rte_flow_item_ipv4 item_ipv4; @@ -885,10 +883,10 @@ add_vxlan_encap(struct rte_flow_action *actions, items[4].type = RTE_FLOW_ITEM_TYPE_END; - vxlan_encap.definition = items; + vxlan_encap[para.core_idx].definition = items; actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP; - 
actions[actions_counter].conf = &vxlan_encap; + actions[actions_counter].conf = &vxlan_encap[para.core_idx]; } static void @@ -902,7 +900,7 @@ add_vxlan_decap(struct rte_flow_action *actions, void fill_actions(struct rte_flow_action *actions, uint64_t *flow_actions, uint32_t counter, uint16_t next_table, uint16_t hairpinq, - uint64_t encap_data, uint64_t decap_data) + uint64_t encap_data, uint64_t decap_data, uint8_t core_idx) { struct additional_para additional_para_data; uint8_t actions_counter = 0; @@ -924,6 +922,7 @@ fill_actions(struct rte_flow_action *actions, uint64_t *flow_actions, .counter = counter, .encap_data = encap_data, .decap_data = decap_data, + .core_idx = core_idx, }; if (hairpinq != 0) { diff --git a/app/test-flow-perf/actions_gen.h b/app/test-flow-perf/actions_gen.h index 85e3176b09..77353cfe09 100644 --- a/app/test-flow-perf/actions_gen.h +++ b/app/test-flow-perf/actions_gen.h @@ -19,6 +19,6 @@ void fill_actions(struct rte_flow_action *actions, uint64_t *flow_actions, uint32_t counter, uint16_t next_table, uint16_t hairpinq, - uint64_t encap_data, uint64_t decap_data); + uint64_t encap_data, uint64_t decap_data, uint8_t core_idx); #endif /* FLOW_PERF_ACTION_GEN */ diff --git a/app/test-flow-perf/config.h b/app/test-flow-perf/config.h index 8f42bc589c..94e83c9abc 100644 --- a/app/test-flow-perf/config.h +++ b/app/test-flow-perf/config.h @@ -15,6 +15,7 @@ #define MBUF_CACHE_SIZE 512 #define NR_RXD 256 #define NR_TXD 256 +#define MAX_PORTS 64 /* This is used for encap/decap & header modify actions. * When it's 1: it means all actions have fixed values. 
diff --git a/app/test-flow-perf/flow_gen.c b/app/test-flow-perf/flow_gen.c index a979b3856d..df4af16de8 100644 --- a/app/test-flow-perf/flow_gen.c +++ b/app/test-flow-perf/flow_gen.c @@ -45,6 +45,7 @@ generate_flow(uint16_t port_id, uint16_t hairpinq, uint64_t encap_data, uint64_t decap_data, + uint8_t core_idx, struct rte_flow_error *error) { struct rte_flow_attr attr; @@ -60,9 +61,9 @@ generate_flow(uint16_t port_id, fill_actions(actions, flow_actions, outer_ip_src, next_table, hairpinq, - encap_data, decap_data); + encap_data, decap_data, core_idx); - fill_items(items, flow_items, outer_ip_src); + fill_items(items, flow_items, outer_ip_src, core_idx); flow = rte_flow_create(port_id, &attr, items, actions, error); return flow; diff --git a/app/test-flow-perf/flow_gen.h b/app/test-flow-perf/flow_gen.h index 3d13737d65..f1d0999af1 100644 --- a/app/test-flow-perf/flow_gen.h +++ b/app/test-flow-perf/flow_gen.h @@ -34,6 +34,7 @@ generate_flow(uint16_t port_id, uint16_t hairpinq, uint64_t encap_data, uint64_t decap_data, + uint8_t core_idx, struct rte_flow_error *error); #endif /* FLOW_PERF_FLOW_GEN */ diff --git a/app/test-flow-perf/items_gen.c b/app/test-flow-perf/items_gen.c index 2b1ab41467..0950023608 100644 --- a/app/test-flow-perf/items_gen.c +++ b/app/test-flow-perf/items_gen.c @@ -15,6 +15,7 @@ /* Storage for additional parameters for items */ struct additional_para { rte_be32_t src_ip; + uint8_t core_idx; }; static void @@ -58,18 +59,19 @@ static void add_ipv4(struct rte_flow_item *items, uint8_t items_counter, struct additional_para para) { - static struct rte_flow_item_ipv4 ipv4_spec; - static struct rte_flow_item_ipv4 ipv4_mask; + static struct rte_flow_item_ipv4 ipv4_specs[RTE_MAX_LCORE] __rte_cache_aligned; + static struct rte_flow_item_ipv4 ipv4_masks[RTE_MAX_LCORE] __rte_cache_aligned; + uint8_t ti = para.core_idx; - memset(&ipv4_spec, 0, sizeof(struct rte_flow_item_ipv4)); - memset(&ipv4_mask, 0, sizeof(struct rte_flow_item_ipv4)); + 
memset(&ipv4_specs[ti], 0, sizeof(struct rte_flow_item_ipv4)); + memset(&ipv4_masks[ti], 0, sizeof(struct rte_flow_item_ipv4)); - ipv4_spec.hdr.src_addr = RTE_BE32(para.src_ip); - ipv4_mask.hdr.src_addr = RTE_BE32(0xffffffff); + ipv4_specs[ti].hdr.src_addr = RTE_BE32(para.src_ip); + ipv4_masks[ti].hdr.src_addr = RTE_BE32(0xffffffff); items[items_counter].type = RTE_FLOW_ITEM_TYPE_IPV4; - items[items_counter].spec = &ipv4_spec; - items[items_counter].mask = &ipv4_mask; + items[items_counter].spec = &ipv4_specs[ti]; + items[items_counter].mask = &ipv4_masks[ti]; } @@ -77,23 +79,24 @@ static void add_ipv6(struct rte_flow_item *items, uint8_t items_counter, struct additional_para para) { - static struct rte_flow_item_ipv6 ipv6_spec; - static struct rte_flow_item_ipv6 ipv6_mask; + static struct rte_flow_item_ipv6 ipv6_specs[RTE_MAX_LCORE] __rte_cache_aligned; + static struct rte_flow_item_ipv6 ipv6_masks[RTE_MAX_LCORE] __rte_cache_aligned; + uint8_t ti = para.core_idx; - memset(&ipv6_spec, 0, sizeof(struct rte_flow_item_ipv6)); - memset(&ipv6_mask, 0, sizeof(struct rte_flow_item_ipv6)); + memset(&ipv6_specs[ti], 0, sizeof(struct rte_flow_item_ipv6)); + memset(&ipv6_masks[ti], 0, sizeof(struct rte_flow_item_ipv6)); /** Set ipv6 src **/ - memset(&ipv6_spec.hdr.src_addr, para.src_ip, - sizeof(ipv6_spec.hdr.src_addr) / 2); + memset(&ipv6_specs[ti].hdr.src_addr, para.src_ip, + sizeof(ipv6_specs->hdr.src_addr) / 2); /** Full mask **/ - memset(&ipv6_mask.hdr.src_addr, 0xff, - sizeof(ipv6_spec.hdr.src_addr)); + memset(&ipv6_masks[ti].hdr.src_addr, 0xff, + sizeof(ipv6_specs->hdr.src_addr)); items[items_counter].type = RTE_FLOW_ITEM_TYPE_IPV6; - items[items_counter].spec = &ipv6_spec; - items[items_counter].mask = &ipv6_mask; + items[items_counter].spec = &ipv6_specs[ti]; + items[items_counter].mask = &ipv6_masks[ti]; } static void @@ -131,31 +134,31 @@ add_udp(struct rte_flow_item *items, static void add_vxlan(struct rte_flow_item *items, uint8_t items_counter, - __rte_unused 
struct additional_para para) + struct additional_para para) { - static struct rte_flow_item_vxlan vxlan_spec; - static struct rte_flow_item_vxlan vxlan_mask; - + static struct rte_flow_item_vxlan vxlan_specs[RTE_MAX_LCORE] __rte_cache_aligned; + static struct rte_flow_item_vxlan vxlan_masks[RTE_MAX_LCORE] __rte_cache_aligned; + uint8_t ti = para.core_idx; uint32_t vni_value; uint8_t i; vni_value = VNI_VALUE; - memset(&vxlan_spec, 0, sizeof(struct rte_flow_item_vxlan)); - memset(&vxlan_mask, 0, sizeof(struct rte_flow_item_vxlan)); + memset(&vxlan_specs[ti], 0, sizeof(struct rte_flow_item_vxlan)); + memset(&vxlan_masks[ti], 0, sizeof(struct rte_flow_item_vxlan)); /* Set standard vxlan vni */ for (i = 0; i < 3; i++) { - vxlan_spec.vni[2 - i] = vni_value >> (i * 8); - vxlan_mask.vni[2 - i] = 0xff; + vxlan_specs[ti].vni[2 - i] = vni_value >> (i * 8); + vxlan_masks[ti].vni[2 - i] = 0xff; } /* Standard vxlan flags */ - vxlan_spec.flags = 0x8; + vxlan_specs[ti].flags = 0x8; items[items_counter].type = RTE_FLOW_ITEM_TYPE_VXLAN; - items[items_counter].spec = &vxlan_spec; - items[items_counter].mask = &vxlan_mask; + items[items_counter].spec = &vxlan_specs[ti]; + items[items_counter].mask = &vxlan_masks[ti]; } static void @@ -163,29 +166,29 @@ add_vxlan_gpe(struct rte_flow_item *items, uint8_t items_counter, __rte_unused struct additional_para para) { - static struct rte_flow_item_vxlan_gpe vxlan_gpe_spec; - static struct rte_flow_item_vxlan_gpe vxlan_gpe_mask; - + static struct rte_flow_item_vxlan_gpe vxlan_gpe_specs[RTE_MAX_LCORE] __rte_cache_aligned; + static struct rte_flow_item_vxlan_gpe vxlan_gpe_masks[RTE_MAX_LCORE] __rte_cache_aligned; + uint8_t ti = para.core_idx; uint32_t vni_value; uint8_t i; vni_value = VNI_VALUE; - memset(&vxlan_gpe_spec, 0, sizeof(struct rte_flow_item_vxlan_gpe)); - memset(&vxlan_gpe_mask, 0, sizeof(struct rte_flow_item_vxlan_gpe)); + memset(&vxlan_gpe_specs[ti], 0, sizeof(struct rte_flow_item_vxlan_gpe)); + memset(&vxlan_gpe_masks[ti], 0, 
sizeof(struct rte_flow_item_vxlan_gpe)); /* Set vxlan-gpe vni */ for (i = 0; i < 3; i++) { - vxlan_gpe_spec.vni[2 - i] = vni_value >> (i * 8); - vxlan_gpe_mask.vni[2 - i] = 0xff; + vxlan_gpe_specs[ti].vni[2 - i] = vni_value >> (i * 8); + vxlan_gpe_masks[ti].vni[2 - i] = 0xff; } /* vxlan-gpe flags */ - vxlan_gpe_spec.flags = 0x0c; + vxlan_gpe_specs[ti].flags = 0x0c; items[items_counter].type = RTE_FLOW_ITEM_TYPE_VXLAN_GPE; - items[items_counter].spec = &vxlan_gpe_spec; - items[items_counter].mask = &vxlan_gpe_mask; + items[items_counter].spec = &vxlan_gpe_specs[ti]; + items[items_counter].mask = &vxlan_gpe_masks[ti]; } static void @@ -216,25 +219,25 @@ add_geneve(struct rte_flow_item *items, uint8_t items_counter, __rte_unused struct additional_para para) { - static struct rte_flow_item_geneve geneve_spec; - static struct rte_flow_item_geneve geneve_mask; - + static struct rte_flow_item_geneve geneve_specs[RTE_MAX_LCORE] __rte_cache_aligned; + static struct rte_flow_item_geneve geneve_masks[RTE_MAX_LCORE] __rte_cache_aligned; + uint8_t ti = para.core_idx; uint32_t vni_value; uint8_t i; vni_value = VNI_VALUE; - memset(&geneve_spec, 0, sizeof(struct rte_flow_item_geneve)); - memset(&geneve_mask, 0, sizeof(struct rte_flow_item_geneve)); + memset(&geneve_specs[ti], 0, sizeof(struct rte_flow_item_geneve)); + memset(&geneve_masks[ti], 0, sizeof(struct rte_flow_item_geneve)); for (i = 0; i < 3; i++) { - geneve_spec.vni[2 - i] = vni_value >> (i * 8); - geneve_mask.vni[2 - i] = 0xff; + geneve_specs[ti].vni[2 - i] = vni_value >> (i * 8); + geneve_masks[ti].vni[2 - i] = 0xff; } items[items_counter].type = RTE_FLOW_ITEM_TYPE_GENEVE; - items[items_counter].spec = &geneve_spec; - items[items_counter].mask = &geneve_mask; + items[items_counter].spec = &geneve_specs[ti]; + items[items_counter].mask = &geneve_masks[ti]; } static void @@ -344,12 +347,14 @@ add_icmpv6(struct rte_flow_item *items, void fill_items(struct rte_flow_item *items, - uint64_t *flow_items, uint32_t 
outer_ip_src) + uint64_t *flow_items, uint32_t outer_ip_src, + uint8_t core_idx) { uint8_t items_counter = 0; uint8_t i, j; struct additional_para additional_para_data = { .src_ip = outer_ip_src, + .core_idx = core_idx, }; /* Support outer items up to tunnel layer only. */ diff --git a/app/test-flow-perf/items_gen.h b/app/test-flow-perf/items_gen.h index d68958e4d3..f4b0e9a981 100644 --- a/app/test-flow-perf/items_gen.h +++ b/app/test-flow-perf/items_gen.h @@ -13,6 +13,6 @@ #include "config.h" void fill_items(struct rte_flow_item *items, uint64_t *flow_items, - uint32_t outer_ip_src); + uint32_t outer_ip_src, uint8_t core_idx); #endif /* FLOW_PERF_ITEMS_GEN */ diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c index 5ec9a15c61..663b2e9bae 100644 --- a/app/test-flow-perf/main.c +++ b/app/test-flow-perf/main.c @@ -72,7 +72,6 @@ static uint32_t nb_lcores; #define LCORE_MODE_PKT 1 #define LCORE_MODE_STATS 2 #define MAX_STREAMS 64 -#define MAX_LCORES 64 struct stream { int tx_port; @@ -92,7 +91,20 @@ struct lcore_info { struct rte_mbuf *pkts[MAX_PKT_BURST]; } __rte_cache_aligned; -static struct lcore_info lcore_infos[MAX_LCORES]; +static struct lcore_info lcore_infos[RTE_MAX_LCORE]; + +struct multi_cores_pool { + uint32_t cores_count; + uint32_t rules_count; + double cpu_time_used_insertion[MAX_PORTS][RTE_MAX_LCORE]; + double cpu_time_used_deletion[MAX_PORTS][RTE_MAX_LCORE]; + int64_t last_alloc[RTE_MAX_LCORE]; + int64_t current_alloc[RTE_MAX_LCORE]; +} __rte_cache_aligned; + +static struct multi_cores_pool mc_pool = { + .cores_count = 1, +}; static void usage(char *progname) @@ -118,6 +130,8 @@ usage(char *progname) printf(" --transfer: set transfer attribute in flows\n"); printf(" --group=N: set group for all flows," " default is %d\n", DEFAULT_GROUP); + printf(" --cores=N: to set the number of needed " + "cores to insert rte_flow rules, default is 1\n"); printf("To set flow items:\n"); printf(" --ether: add ether layer in flow items\n"); @@ -537,6 
+551,7 @@ args_parse(int argc, char **argv) { "dump-socket-mem", 0, 0, 0 }, { "enable-fwd", 0, 0, 0 }, { "portmask", 1, 0, 0 }, + { "cores", 1, 0, 0 }, /* Attributes */ { "ingress", 0, 0, 0 }, { "egress", 0, 0, 0 }, @@ -750,6 +765,21 @@ args_parse(int argc, char **argv) rte_exit(EXIT_FAILURE, "Invalid fwd port mask\n"); ports_mask = pm; } + if (strcmp(lgopts[opt_idx].name, "cores") == 0) { + n = atoi(optarg); + if ((int) rte_lcore_count() <= n) { + printf("\nError: you need %d cores to run on multi-cores\n" + "Existing cores are: %d\n", n, rte_lcore_count()); + rte_exit(EXIT_FAILURE, " "); + } + if (n <= RTE_MAX_LCORE && n > 0) + mc_pool.cores_count = n; + else { + printf("Error: cores count must be > 0 " + " and < %d\n", RTE_MAX_LCORE); + rte_exit(EXIT_FAILURE, " "); + } + } break; default: fprintf(stderr, "Invalid option: %s\n", argv[optind]); @@ -845,7 +875,7 @@ print_rules_batches(double *cpu_time_per_batch) } static inline void -destroy_flows(int port_id, struct rte_flow **flows_list) +destroy_flows(int port_id, uint8_t core_id, struct rte_flow **flows_list) { struct rte_flow_error error; clock_t start_batch, end_batch; @@ -855,12 +885,12 @@ destroy_flows(int port_id, struct rte_flow **flows_list) double delta; uint32_t i; int rules_batch_idx; + int rules_count_per_core; - /* Deletion Rate */ - printf("\nRules Deletion on port = %d\n", port_id); + rules_count_per_core = rules_count / mc_pool.cores_count; start_batch = clock(); - for (i = 0; i < rules_count; i++) { + for (i = 0; i < (uint32_t) rules_count_per_core; i++) { if (flows_list[i] == 0) break; @@ -891,15 +921,17 @@ destroy_flows(int port_id, struct rte_flow **flows_list) print_rules_batches(cpu_time_per_batch); /* Deletion rate for all rules */ - deletion_rate = ((double) (rules_count / cpu_time_used) / 1000); - printf(":: Total rules deletion rate -> %f K Rule/Sec\n", - deletion_rate); - printf(":: The time for deleting %d in rules %f seconds\n", - rules_count, cpu_time_used); + deletion_rate = 
((double) (rules_count_per_core / cpu_time_used) / 1000); + printf(":: Port %d :: Core %d :: Rules deletion rate -> %f K Rule/Sec\n", + port_id, core_id, deletion_rate); + printf(":: Port %d :: Core %d :: The time for deleting %d rules is %f seconds\n", + port_id, core_id, rules_count_per_core, cpu_time_used); + + mc_pool.cpu_time_used_deletion[port_id][core_id] = cpu_time_used; } static struct rte_flow ** -insert_flows(int port_id) +insert_flows(int port_id, uint8_t core_id) { struct rte_flow **flows_list; struct rte_flow_error error; @@ -909,32 +941,42 @@ insert_flows(int port_id) double cpu_time_per_batch[MAX_BATCHES_COUNT] = { 0 }; double delta; uint32_t flow_index; - uint32_t counter; + uint32_t counter, start_counter = 0, end_counter; uint64_t global_items[MAX_ITEMS_NUM] = { 0 }; uint64_t global_actions[MAX_ACTIONS_NUM] = { 0 }; int rules_batch_idx; + int rules_count_per_core; + + rules_count_per_core = rules_count / mc_pool.cores_count; + + /* Set boundaries of rules for each core. */ + if (core_id) + start_counter = core_id * rules_count_per_core; + end_counter = (core_id + 1) * rules_count_per_core; global_items[0] = FLOW_ITEM_MASK(RTE_FLOW_ITEM_TYPE_ETH); global_actions[0] = FLOW_ITEM_MASK(RTE_FLOW_ACTION_TYPE_JUMP); flows_list = rte_zmalloc("flows_list", - (sizeof(struct rte_flow *) * rules_count) + 1, 0); + (sizeof(struct rte_flow *) * rules_count_per_core) + 1, 0); if (flows_list == NULL) rte_exit(EXIT_FAILURE, "No Memory available!"); cpu_time_used = 0; flow_index = 0; - if (flow_group > 0) { + if (flow_group > 0 && core_id == 0) { /* * Create global rule to jump into flow_group, * this way the app will avoid the default rules. * + * This rule will be created only once. 
+ * * Global rule: * group 0 eth / end actions jump group */ flow = generate_flow(port_id, 0, flow_attrs, global_items, global_actions, - flow_group, 0, 0, 0, 0, &error); + flow_group, 0, 0, 0, 0, core_id, &error); if (flow == NULL) { print_flow_error(error); @@ -943,19 +985,17 @@ insert_flows(int port_id) flows_list[flow_index++] = flow; } - /* Insertion Rate */ - printf("Rules insertion on port = %d\n", port_id); start_batch = clock(); - for (counter = 0; counter < rules_count; counter++) { + for (counter = start_counter; counter < end_counter; counter++) { flow = generate_flow(port_id, flow_group, flow_attrs, flow_items, flow_actions, JUMP_ACTION_TABLE, counter, hairpin_queues_num, encap_data, decap_data, - &error); + core_id, &error); if (force_quit) - counter = rules_count; + counter = end_counter; if (!flow) { print_flow_error(error); @@ -984,23 +1024,25 @@ insert_flows(int port_id) if (dump_iterations) print_rules_batches(cpu_time_per_batch); - /* Insertion rate for all rules */ - insertion_rate = ((double) (rules_count / cpu_time_used) / 1000); - printf(":: Total flow insertion rate -> %f K Rule/Sec\n", - insertion_rate); - printf(":: The time for creating %d in flows %f seconds\n", - rules_count, cpu_time_used); + printf(":: Port %d :: Core %d boundaries :: start @[%d] - end @[%d]\n", + port_id, core_id, start_counter, end_counter - 1); + + /* Insertion rate for all rules in one core */ + insertion_rate = ((double) (rules_count_per_core / cpu_time_used) / 1000); + printf(":: Port %d :: Core %d :: Rules insertion rate -> %f K Rule/Sec\n", + port_id, core_id, insertion_rate); + printf(":: Port %d :: Core %d :: The time for creating %d in rules %f seconds\n", + port_id, core_id, rules_count_per_core, cpu_time_used); + mc_pool.cpu_time_used_insertion[port_id][core_id] = cpu_time_used; return flows_list; } -static inline void -flows_handler(void) +static void +flows_handler(uint8_t core_id) { struct rte_flow **flows_list; uint16_t nr_ports; - int64_t alloc, 
last_alloc; - int flow_size_in_bytes; int port_id; nr_ports = rte_eth_dev_count_avail(); @@ -1016,21 +1058,148 @@ flows_handler(void) continue; /* Insertion part. */ - last_alloc = (int64_t)dump_socket_mem(stdout); - flows_list = insert_flows(port_id); - alloc = (int64_t)dump_socket_mem(stdout); + mc_pool.last_alloc[core_id] = (int64_t)dump_socket_mem(stdout); + flows_list = insert_flows(port_id, core_id); + if (flows_list == NULL) + rte_exit(EXIT_FAILURE, "Error: Insertion Failed!\n"); + mc_pool.current_alloc[core_id] = (int64_t)dump_socket_mem(stdout); /* Deletion part. */ if (delete_flag) - destroy_flows(port_id, flows_list); + destroy_flows(port_id, core_id, flows_list); + } +} + +static int +run_rte_flow_handler_cores(void *data __rte_unused) +{ + uint16_t port; + /* Latency: total count of rte rules divided + * over max time used by thread between all + * threads time. + * + * Throughput: total count of rte rules divided + * over the average of the time cosumed by all + * threads time. + */ + double insertion_latency_time; + double insertion_throughput_time; + double deletion_latency_time; + double deletion_throughput_time; + double insertion_latency, insertion_throughput; + double deletion_latency, deletion_throughput; + int64_t last_alloc, current_alloc; + int flow_size_in_bytes; + int lcore_counter = 0; + int lcore_id = rte_lcore_id(); + int i; + + RTE_LCORE_FOREACH(i) { + /* If core not needed return. */ + if (lcore_id == i) { + printf(":: lcore %d mapped with index %d\n", lcore_id, lcore_counter); + if (lcore_counter >= (int) mc_pool.cores_count) + return 0; + break; + } + lcore_counter++; + } + lcore_id = lcore_counter; + + if (lcore_id >= (int) mc_pool.cores_count) + return 0; + + mc_pool.rules_count = rules_count; - /* Report rte_flow size in huge pages. 
*/ - if (last_alloc) { - flow_size_in_bytes = (alloc - last_alloc) / rules_count; - printf("\n:: rte_flow size in DPDK layer: %d Bytes", - flow_size_in_bytes); + flows_handler(lcore_id); + + /* Only main core to print total results. */ + if (lcore_id != 0) + return 0; + + /* Make sure all cores finished insertion/deletion process. */ + rte_eal_mp_wait_lcore(); + + /* Save first insertion/deletion rates from first thread. + * Start comparing with all threads, if any thread used + * time more than current saved, replace it. + * + * Thus in the end we will have the max time used for + * insertion/deletion by one thread. + * + * As for memory consumption, save the min of all threads + * of last alloc, and save the max for all threads for + * current alloc. + */ + RTE_ETH_FOREACH_DEV(port) { + last_alloc = mc_pool.last_alloc[0]; + current_alloc = mc_pool.current_alloc[0]; + + insertion_latency_time = mc_pool.cpu_time_used_insertion[port][0]; + deletion_latency_time = mc_pool.cpu_time_used_deletion[port][0]; + insertion_throughput_time = mc_pool.cpu_time_used_insertion[port][0]; + deletion_throughput_time = mc_pool.cpu_time_used_deletion[port][0]; + i = mc_pool.cores_count; + while (i-- > 1) { + insertion_throughput_time += mc_pool.cpu_time_used_insertion[port][i]; + deletion_throughput_time += mc_pool.cpu_time_used_deletion[port][i]; + if (insertion_latency_time < mc_pool.cpu_time_used_insertion[port][i]) + insertion_latency_time = mc_pool.cpu_time_used_insertion[port][i]; + if (deletion_latency_time < mc_pool.cpu_time_used_deletion[port][i]) + deletion_latency_time = mc_pool.cpu_time_used_deletion[port][i]; + if (last_alloc > mc_pool.last_alloc[i]) + last_alloc = mc_pool.last_alloc[i]; + if (current_alloc < mc_pool.current_alloc[i]) + current_alloc = mc_pool.current_alloc[i]; } + + flow_size_in_bytes = (current_alloc - last_alloc) / mc_pool.rules_count; + + insertion_latency = ((double) (mc_pool.rules_count / insertion_latency_time) / 1000); + deletion_latency = 
((double) (mc_pool.rules_count / deletion_latency_time) / 1000); + + insertion_throughput_time /= mc_pool.cores_count; + deletion_throughput_time /= mc_pool.cores_count; + insertion_throughput = ((double) (mc_pool.rules_count / insertion_throughput_time) / 1000); + deletion_throughput = ((double) (mc_pool.rules_count / deletion_throughput_time) / 1000); + + /* Latency stats */ + printf("\n:: [Latency | Insertion] All Cores :: Port %d :: ", port); + printf("Total flows insertion rate -> %f K Rules/Sec\n", + insertion_latency); + printf(":: [Latency | Insertion] All Cores :: Port %d :: ", port); + printf("The time for creating %d rules is %f seconds\n", + mc_pool.rules_count, insertion_latency_time); + + /* Throughput stats */ + printf(":: [Throughput | Insertion] All Cores :: Port %d :: ", port); + printf("Total flows insertion rate -> %f K Rules/Sec\n", + insertion_throughput); + printf(":: [Throughput | Insertion] All Cores :: Port %d :: ", port); + printf("The average time for creating %d rules is %f seconds\n", + mc_pool.rules_count, insertion_throughput_time); + + if (delete_flag) { + /* Latency stats */ + printf(":: [Latency | Deletion] All Cores :: Port %d :: Total flows " + "deletion rate -> %f K Rules/Sec\n", + port, deletion_latency); + printf(":: [Latency | Deletion] All Cores :: Port %d :: ", port); + printf("The time for deleting %d rules is %f seconds\n", + mc_pool.rules_count, deletion_latency_time); + + /* Throughput stats */ + printf(":: [Throughput | Deletion] All Cores :: Port %d :: Total flows " + "deletion rate -> %f K Rules/Sec\n", port, deletion_throughput); + printf(":: [Throughput | Deletion] All Cores :: Port %d :: ", port); + printf("The average time for deleting %d rules is %f seconds\n", + mc_pool.rules_count, deletion_throughput_time); + } + printf("\n:: Port %d :: rte_flow size in DPDK layer: %d Bytes\n", + port, flow_size_in_bytes); } + + return 0; } static void @@ -1107,12 +1276,12 @@ packet_per_second_stats(void) int i; old = 
rte_zmalloc("old", - sizeof(struct lcore_info) * MAX_LCORES, 0); + sizeof(struct lcore_info) * RTE_MAX_LCORE, 0); if (old == NULL) rte_exit(EXIT_FAILURE, "No Memory available!"); memcpy(old, lcore_infos, - sizeof(struct lcore_info) * MAX_LCORES); + sizeof(struct lcore_info) * RTE_MAX_LCORE); while (!force_quit) { uint64_t total_tx_pkts = 0; @@ -1135,7 +1304,7 @@ packet_per_second_stats(void) printf("%6s %16s %16s %16s\n", "------", "----------------", "----------------", "----------------"); nr_lines = 3; - for (i = 0; i < MAX_LCORES; i++) { + for (i = 0; i < RTE_MAX_LCORE; i++) { li = &lcore_infos[i]; oli = &old[i]; if (li->mode != LCORE_MODE_PKT) @@ -1166,7 +1335,7 @@ packet_per_second_stats(void) } memcpy(old, lcore_infos, - sizeof(struct lcore_info) * MAX_LCORES); + sizeof(struct lcore_info) * RTE_MAX_LCORE); } } @@ -1227,7 +1396,7 @@ init_lcore_info(void) * This means that this stream is not used, or not set * yet. */ - for (i = 0; i < MAX_LCORES; i++) + for (i = 0; i < RTE_MAX_LCORE; i++) for (j = 0; j < MAX_STREAMS; j++) { lcore_infos[i].streams[j].tx_port = -1; lcore_infos[i].streams[j].rx_port = -1; @@ -1289,7 +1458,7 @@ init_lcore_info(void) /* Print all streams */ printf(":: Stream -> core id[N]: (rx_port, rx_queue)->(tx_port, tx_queue)\n"); - for (i = 0; i < MAX_LCORES; i++) + for (i = 0; i < RTE_MAX_LCORE; i++) for (j = 0; j < MAX_STREAMS; j++) { /* No streams for this core */ if (lcore_infos[i].streams[j].tx_port == -1) @@ -1470,7 +1639,10 @@ main(int argc, char **argv) if (nb_lcores <= 1) rte_exit(EXIT_FAILURE, "This app needs at least two cores\n"); - flows_handler(); + + printf(":: Flows Count per port: %d\n\n", rules_count); + + rte_eal_mp_remote_launch(run_rte_flow_handler_cores, NULL, CALL_MAIN); if (enable_fwd) { init_lcore_info(); diff --git a/doc/guides/tools/flow-perf.rst b/doc/guides/tools/flow-perf.rst index 634009ccee..40d157e8cb 100644 --- a/doc/guides/tools/flow-perf.rst +++ b/doc/guides/tools/flow-perf.rst @@ -25,15 +25,8 @@ computes 
an average time across all windows.
 The application also provides the ability to measure rte flow deletion rate,
 in addition to memory consumption before and after the flow rules' creation.
 
-The app supports single and multi core performance measurements.
-
-
-Known Limitations
------------------
-
-The current version has limitations which can be removed in future:
-
-* Single core insertion only.
+The app supports single and multiple core performance measurements, and
+support multiple cores insertion/deletion as well.
 
 
 Compiling the Application
@@ -103,6 +96,9 @@ The command line options are:
 *	``--portmask=N``
 	hexadecimal bitmask of ports to be used.
 
+*	``--cores=N``
+	Set the number of needed cores to insert/delete rte_flow rules.
+	Default cores count is 1.
 
 Attributes:
 

From patchwork Thu Nov 26 11:15:42 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wisam Monther
X-Patchwork-Id: 84573
X-Patchwork-Delegate: thomas@monjalon.net
Return-Path:
X-Original-To: patchwork@inbox.dpdk.org
Delivered-To: patchwork@inbox.dpdk.org
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id F301CA052A;
	Thu, 26 Nov 2020 12:17:11 +0100 (CET)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 78D75C9B4;
	Thu, 26 Nov 2020 12:16:11 +0100 (CET)
Received: from hqnvemgate24.nvidia.com (hqnvemgate24.nvidia.com [216.228.121.143])
	by dpdk.org (Postfix) with ESMTP id EC389C97A
	for ; Thu, 26 Nov 2020 12:16:07 +0100 (CET)
Received: from hqmail.nvidia.com (Not Verified[216.228.121.13])
	by hqnvemgate24.nvidia.com (using TLS: TLSv1.2, AES256-SHA)
	id ; Thu, 26 Nov 2020 03:16:13 -0800
Received: from nvidia.com (10.124.1.5) by HQMAIL107.nvidia.com (172.20.187.13)
	with Microsoft SMTP Server (TLS) id 15.0.1473.3;
	Thu, 26 Nov 2020 11:16:04 +0000
From: Wisam Jaddo
To: , , ,
CC:
Date: Thu, 26 Nov 2020 13:15:42 +0200
Message-ID: <20201126111543.16928-4-wisamm@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20201126111543.16928-1-wisamm@nvidia.com>
References: <20201126111543.16928-1-wisamm@nvidia.com>
MIME-Version: 1.0
X-Originating-IP: [10.124.1.5]
X-ClientProxiedBy: HQMAIL101.nvidia.com (172.20.187.10) To HQMAIL107.nvidia.com (172.20.187.13)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1;
	t=1606389373; bh=8ULVCLgAPqyghXGLebWw6zC1jVXHiDvPDPT4aug3S6w=;
	h=From:To:CC:Subject:Date:Message-ID:X-Mailer:In-Reply-To:
	References:MIME-Version:Content-Type:X-Originating-IP:
	X-ClientProxiedBy;
	b=nlokN1LV8EhTsAQiiDkKXHedAWtBNscXYOPhcgeTYcOXmwITNi54CTTR2FGDEPboz
	NJwONpKgD9lXQoVHwUxkQE8dgZ6GLz5IO1C17BYyxlI+dwG5ug8BMVbRfx9eY4DzFa
	uY7dlTSI/7dChETr9gl3pLJyVUhAdO0YcKJmtIdULNfd//g4GxMhsy0SOMoiyyz7Z8
	A7Vtth1s1L7KwLLN3AuXNeFPexhTNzJ7NIgura9X4WVocY8uGW5caOTF5Otbr+9Mtr
	t9lWOY+E1/DP73GRFpPgL35burdoxQR6n8XWaCtDi/tUPZY63XMCAsv8jzOoSnZuVr
	dYpvjFm3aMZaA==
Subject: [dpdk-dev] [PATCH 3/4] app/flow-perf: change clock measurement functions
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

The clock() function is not suitable for measurements across multiple
cores/threads: it reports the CPU time used by the whole process rather
than wall clock time, and when several cores/threads run simultaneously
the process burns through CPU time much faster than wall clock time
advances.

As a result, this commit changes the measurement to use rte_rdtsc(),
and divides the resulting cycle count by the TSC frequency reported by
rte_get_tsc_hz().
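
The conversion described above (a tick delta divided by the tick
frequency) can be sketched outside DPDK as follows. This is only an
illustration under stated assumptions: wall_ticks() and WALL_TICKS_HZ are
hypothetical stand-ins for rte_rdtsc() and rte_get_tsc_hz(), built on the
POSIX clock_gettime(CLOCK_MONOTONIC) call, and ticks_to_seconds() is a
made-up helper name that is not part of this patch.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Assumed stand-in for rte_get_tsc_hz(): wall_ticks() counts nanoseconds. */
#define WALL_TICKS_HZ UINT64_C(1000000000)

/* Assumed stand-in for rte_rdtsc(): a monotonic wall-clock tick counter. */
static uint64_t
wall_ticks(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * WALL_TICKS_HZ + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: convert a tick delta into seconds. */
static double
ticks_to_seconds(uint64_t start, uint64_t end, uint64_t hz)
{
	assert(hz != 0 && end >= start);
	return (double)(end - start) / (double)hz;
}
```

Unlike clock(), a counter of this kind does not advance faster when more
threads of the same process are busy, so a per-batch delta taken on one
core reflects elapsed time on that core only.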

Signed-off-by: Wisam Jaddo
Reviewed-by: Alexander Kozyrev
Reviewed-by: Suanming Mou
---
 app/test-flow-perf/main.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
index 663b2e9bae..3a0e4c1951 100644
--- a/app/test-flow-perf/main.c
+++ b/app/test-flow-perf/main.c
@@ -889,7 +889,7 @@ destroy_flows(int port_id, uint8_t core_id, struct rte_flow **flows_list)
 
 	rules_count_per_core = rules_count / mc_pool.cores_count;
 
-	start_batch = clock();
+	start_batch = rte_rdtsc();
 	for (i = 0; i < (uint32_t) rules_count_per_core; i++) {
 		if (flows_list[i] == 0)
 			break;
@@ -907,12 +907,12 @@ destroy_flows(int port_id, uint8_t core_id, struct rte_flow **flows_list)
 		 * for this batch.
 		 */
 		if (!((i + 1) % rules_batch)) {
-			end_batch = clock();
+			end_batch = rte_rdtsc();
 			delta = (double) (end_batch - start_batch);
 			rules_batch_idx = ((i + 1) / rules_batch) - 1;
-			cpu_time_per_batch[rules_batch_idx] = delta / CLOCKS_PER_SEC;
+			cpu_time_per_batch[rules_batch_idx] = delta / rte_get_tsc_hz();
 			cpu_time_used += cpu_time_per_batch[rules_batch_idx];
-			start_batch = clock();
+			start_batch = rte_rdtsc();
 		}
 	}
 
@@ -985,7 +985,7 @@ insert_flows(int port_id, uint8_t core_id)
 		flows_list[flow_index++] = flow;
 	}
 
-	start_batch = clock();
+	start_batch = rte_rdtsc();
 	for (counter = start_counter; counter < end_counter; counter++) {
 		flow = generate_flow(port_id, flow_group,
 			flow_attrs, flow_items, flow_actions,
@@ -1011,12 +1011,12 @@ insert_flows(int port_id, uint8_t core_id)
 		 * for this batch.
 		 */
 		if (!((counter + 1) % rules_batch)) {
-			end_batch = clock();
+			end_batch = rte_rdtsc();
 			delta = (double) (end_batch - start_batch);
 			rules_batch_idx = ((counter + 1) / rules_batch) - 1;
-			cpu_time_per_batch[rules_batch_idx] = delta / CLOCKS_PER_SEC;
+			cpu_time_per_batch[rules_batch_idx] = delta / rte_get_tsc_hz();
 			cpu_time_used += cpu_time_per_batch[rules_batch_idx];
-			start_batch = clock();
+			start_batch = rte_rdtsc();
 		}
 	}
 

From patchwork Thu Nov 26 11:15:43 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wisam Monther
X-Patchwork-Id: 84574
X-Patchwork-Delegate: thomas@monjalon.net
Return-Path:
X-Original-To: patchwork@inbox.dpdk.org
Delivered-To: patchwork@inbox.dpdk.org
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id C0581A052A;
	Thu, 26 Nov 2020 12:17:31 +0100 (CET)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 36B5DC9A6;
	Thu, 26 Nov 2020 12:16:20 +0100 (CET)
Received: from hqnvemgate26.nvidia.com (hqnvemgate26.nvidia.com [216.228.121.65])
	by dpdk.org (Postfix) with ESMTP id EEC4FC93C;
	Thu, 26 Nov 2020 12:16:16 +0100 (CET)
Received: from hqmail.nvidia.com (Not Verified[216.228.121.13])
	by hqnvemgate26.nvidia.com (using TLS: TLSv1.2, AES256-SHA)
	id ; Thu, 26 Nov 2020 03:16:18 -0800
Received: from nvidia.com (10.124.1.5) by HQMAIL107.nvidia.com (172.20.187.13)
	with Microsoft SMTP Server (TLS) id 15.0.1473.3;
	Thu, 26 Nov 2020 11:16:06 +0000
From: Wisam Jaddo
To: , , ,
CC: , ,
Date: Thu, 26 Nov 2020 13:15:43 +0200
Message-ID: <20201126111543.16928-5-wisamm@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20201126111543.16928-1-wisamm@nvidia.com>
References: <20201126111543.16928-1-wisamm@nvidia.com>
MIME-Version: 1.0
X-Originating-IP: [10.124.1.5]
X-ClientProxiedBy: HQMAIL101.nvidia.com (172.20.187.10) To HQMAIL107.nvidia.com (172.20.187.13)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=nvidia.com; s=n1; t=1606389378; bh=yZBO+RlNh8AREJ8hU8jqHuZiiZqLXxLajN3iUjk5S9g=; h=From:To:CC:Subject:Date:Message-ID:X-Mailer:In-Reply-To: References:MIME-Version:Content-Type:X-Originating-IP: X-ClientProxiedBy; b=TW2QoVa7Nm82Za9aj80NcV6R3QlhDiko7ijg1XxO+fzsvnrVRpVy1giPWTPNGoRdB PZ4cbz2o0jHCYGYafz0/YyHUD9bvuaHzidrIPBdC9GUH2Imr05lO+954CyL8c4Li/t dva7KfvOSaAcu6eCZMNBzqG6rDqCDffB6BotvG+ynlhAafZfxTBSKll4ek3PFXUhQF /HhJSpVGQPuunk6UBMhkXhBm2CZMcq1150ezMZDrzOZ5oo7nhE6xOg/QuUE8aKbl58 ezh1zumO+2kQHygiNu86lBzXMdr3YSZJWtBiS7sutDb37LXGfY5ebO8SMkUy6x9YIf sc/DobVE834Xw== Subject: [dpdk-dev] [PATCH 4/4] app/flow-perf: remove redundant items memset and vars X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Since the items are static, their default values are zero, so the memset to zero is redundant code. Also remove all unneeded variables, which can be replaced by setting the values directly in the structure itself.
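The reasoning can be checked with a small standalone sketch. struct demo_item and both functions are hypothetical and only loosely modeled on the flow item structs; they are not part of flow-perf. C guarantees that objects with static storage duration are zero-initialized, and that members omitted from a designated initializer are zeroed as well.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical item layout, used for illustration only. */
struct demo_item {
	uint16_t tci;
	uint16_t reserved;
};

/*
 * A function-local static with a designated initializer:
 * .tci is set once at program start, .reserved is implicitly zero.
 */
static const struct demo_item *demo_spec(void)
{
	static struct demo_item spec = {
		.tci = 0x0123,
	};

	return &spec;
}

/*
 * A static object without any initializer is zero-initialized,
 * so an explicit memset(&untouched, 0, ...) would change nothing.
 * Returns 1 when every member is zero.
 */
static int demo_is_zeroed(void)
{
	static struct demo_item untouched;

	return untouched.tci == 0 && untouched.reserved == 0;
}
```

Note that this zeroing happens once, before the program starts. A memset on every call is only redundant when each call rewrites the same fields, which appears to be the case for the per-core arrays touched by this patch.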
Fixes: bf3688f1e816 ("app/flow-perf: add insertion rate calculation") Cc: wisamm@mellanox.com Cc: stable@dpdk.org Signed-off-by: Wisam Jaddo Reviewed-by: Alexander Kozyrev Reviewed-by: Suanming Mou --- app/test-flow-perf/actions_gen.c | 30 +++----- app/test-flow-perf/items_gen.c | 123 ++++++++----------------------- 2 files changed, 44 insertions(+), 109 deletions(-) diff --git a/app/test-flow-perf/actions_gen.c b/app/test-flow-perf/actions_gen.c index 1364407056..c3545ba32f 100644 --- a/app/test-flow-perf/actions_gen.c +++ b/app/test-flow-perf/actions_gen.c @@ -143,12 +143,10 @@ add_set_meta(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_meta meta_action; - - do { - meta_action.data = RTE_BE32(META_DATA); - meta_action.mask = RTE_BE32(0xffffffff); - } while (0); + static struct rte_flow_action_set_meta meta_action = { + .data = RTE_BE32(META_DATA), + .mask = RTE_BE32(0xffffffff), + }; actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_META; actions[actions_counter].conf = &meta_action; @@ -159,13 +157,11 @@ add_set_tag(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_set_tag tag_action; - - do { - tag_action.data = RTE_BE32(META_DATA); - tag_action.mask = RTE_BE32(0xffffffff); - tag_action.index = TAG_INDEX; - } while (0); + static struct rte_flow_action_set_tag tag_action = { + .data = RTE_BE32(META_DATA), + .mask = RTE_BE32(0xffffffff), + .index = TAG_INDEX, + }; actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_TAG; actions[actions_counter].conf = &tag_action; @@ -176,11 +172,9 @@ add_port_id(struct rte_flow_action *actions, uint8_t actions_counter, __rte_unused struct additional_para para) { - static struct rte_flow_action_port_id port_id; - - do { - port_id.id = PORT_ID_DST; - } while (0); + static struct rte_flow_action_port_id port_id = { + .id = PORT_ID_DST, + }; 
actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_PORT_ID; actions[actions_counter].conf = &port_id; diff --git a/app/test-flow-perf/items_gen.c b/app/test-flow-perf/items_gen.c index 0950023608..ccebc08b39 100644 --- a/app/test-flow-perf/items_gen.c +++ b/app/test-flow-perf/items_gen.c @@ -26,9 +26,6 @@ add_ether(struct rte_flow_item *items, static struct rte_flow_item_eth eth_spec; static struct rte_flow_item_eth eth_mask; - memset(&eth_spec, 0, sizeof(struct rte_flow_item_eth)); - memset(&eth_mask, 0, sizeof(struct rte_flow_item_eth)); - items[items_counter].type = RTE_FLOW_ITEM_TYPE_ETH; items[items_counter].spec = &eth_spec; items[items_counter].mask = &eth_mask; @@ -39,16 +36,12 @@ add_vlan(struct rte_flow_item *items, uint8_t items_counter, __rte_unused struct additional_para para) { - static struct rte_flow_item_vlan vlan_spec; - static struct rte_flow_item_vlan vlan_mask; - - uint16_t vlan_value = VLAN_VALUE; - - memset(&vlan_spec, 0, sizeof(struct rte_flow_item_vlan)); - memset(&vlan_mask, 0, sizeof(struct rte_flow_item_vlan)); - - vlan_spec.tci = RTE_BE16(vlan_value); - vlan_mask.tci = RTE_BE16(0xffff); + static struct rte_flow_item_vlan vlan_spec = { + .tci = RTE_BE16(VLAN_VALUE), + }; + static struct rte_flow_item_vlan vlan_mask = { + .tci = RTE_BE16(0xffff), + }; items[items_counter].type = RTE_FLOW_ITEM_TYPE_VLAN; items[items_counter].spec = &vlan_spec; @@ -63,9 +56,6 @@ add_ipv4(struct rte_flow_item *items, static struct rte_flow_item_ipv4 ipv4_masks[RTE_MAX_LCORE] __rte_cache_aligned; uint8_t ti = para.core_idx; - memset(&ipv4_specs[ti], 0, sizeof(struct rte_flow_item_ipv4)); - memset(&ipv4_masks[ti], 0, sizeof(struct rte_flow_item_ipv4)); - ipv4_specs[ti].hdr.src_addr = RTE_BE32(para.src_ip); ipv4_masks[ti].hdr.src_addr = RTE_BE32(0xffffffff); @@ -83,9 +73,6 @@ add_ipv6(struct rte_flow_item *items, static struct rte_flow_item_ipv6 ipv6_masks[RTE_MAX_LCORE] __rte_cache_aligned; uint8_t ti = para.core_idx; - memset(&ipv6_specs[ti], 0, sizeof(struct 
rte_flow_item_ipv6)); - memset(&ipv6_masks[ti], 0, sizeof(struct rte_flow_item_ipv6)); - /** Set ipv6 src **/ memset(&ipv6_specs[ti].hdr.src_addr, para.src_ip, sizeof(ipv6_specs->hdr.src_addr) / 2); @@ -107,9 +94,6 @@ add_tcp(struct rte_flow_item *items, static struct rte_flow_item_tcp tcp_spec; static struct rte_flow_item_tcp tcp_mask; - memset(&tcp_spec, 0, sizeof(struct rte_flow_item_tcp)); - memset(&tcp_mask, 0, sizeof(struct rte_flow_item_tcp)); - items[items_counter].type = RTE_FLOW_ITEM_TYPE_TCP; items[items_counter].spec = &tcp_spec; items[items_counter].mask = &tcp_mask; @@ -123,9 +107,6 @@ add_udp(struct rte_flow_item *items, static struct rte_flow_item_udp udp_spec; static struct rte_flow_item_udp udp_mask; - memset(&udp_spec, 0, sizeof(struct rte_flow_item_udp)); - memset(&udp_mask, 0, sizeof(struct rte_flow_item_udp)); - items[items_counter].type = RTE_FLOW_ITEM_TYPE_UDP; items[items_counter].spec = &udp_spec; items[items_counter].mask = &udp_mask; @@ -144,9 +125,6 @@ add_vxlan(struct rte_flow_item *items, vni_value = VNI_VALUE; - memset(&vxlan_specs[ti], 0, sizeof(struct rte_flow_item_vxlan)); - memset(&vxlan_masks[ti], 0, sizeof(struct rte_flow_item_vxlan)); - /* Set standard vxlan vni */ for (i = 0; i < 3; i++) { vxlan_specs[ti].vni[2 - i] = vni_value >> (i * 8); @@ -174,9 +152,6 @@ add_vxlan_gpe(struct rte_flow_item *items, vni_value = VNI_VALUE; - memset(&vxlan_gpe_specs[ti], 0, sizeof(struct rte_flow_item_vxlan_gpe)); - memset(&vxlan_gpe_masks[ti], 0, sizeof(struct rte_flow_item_vxlan_gpe)); - /* Set vxlan-gpe vni */ for (i = 0; i < 3; i++) { vxlan_gpe_specs[ti].vni[2 - i] = vni_value >> (i * 8); @@ -196,18 +171,12 @@ add_gre(struct rte_flow_item *items, uint8_t items_counter, __rte_unused struct additional_para para) { - static struct rte_flow_item_gre gre_spec; - static struct rte_flow_item_gre gre_mask; - - uint16_t proto; - - proto = RTE_ETHER_TYPE_TEB; - - memset(&gre_spec, 0, sizeof(struct rte_flow_item_gre)); - memset(&gre_mask, 0, 
sizeof(struct rte_flow_item_gre)); - - gre_spec.protocol = RTE_BE16(proto); - gre_mask.protocol = RTE_BE16(0xffff); + static struct rte_flow_item_gre gre_spec = { + .protocol = RTE_BE16(RTE_ETHER_TYPE_TEB), + }; + static struct rte_flow_item_gre gre_mask = { + .protocol = RTE_BE16(0xffff), + }; items[items_counter].type = RTE_FLOW_ITEM_TYPE_GRE; items[items_counter].spec = &gre_spec; @@ -227,9 +196,6 @@ add_geneve(struct rte_flow_item *items, vni_value = VNI_VALUE; - memset(&geneve_specs[ti], 0, sizeof(struct rte_flow_item_geneve)); - memset(&geneve_masks[ti], 0, sizeof(struct rte_flow_item_geneve)); - for (i = 0; i < 3; i++) { geneve_specs[ti].vni[2 - i] = vni_value >> (i * 8); geneve_masks[ti].vni[2 - i] = 0xff; @@ -245,18 +211,12 @@ add_gtp(struct rte_flow_item *items, uint8_t items_counter, __rte_unused struct additional_para para) { - static struct rte_flow_item_gtp gtp_spec; - static struct rte_flow_item_gtp gtp_mask; - - uint32_t teid_value; - - teid_value = TEID_VALUE; - - memset(&gtp_spec, 0, sizeof(struct rte_flow_item_gtp)); - memset(&gtp_mask, 0, sizeof(struct rte_flow_item_gtp)); - - gtp_spec.teid = RTE_BE32(teid_value); - gtp_mask.teid = RTE_BE32(0xffffffff); + static struct rte_flow_item_gtp gtp_spec = { + .teid = RTE_BE32(TEID_VALUE), + }; + static struct rte_flow_item_gtp gtp_mask = { + .teid = RTE_BE32(0xffffffff), + }; items[items_counter].type = RTE_FLOW_ITEM_TYPE_GTP; items[items_counter].spec = &gtp_spec; @@ -268,18 +228,12 @@ add_meta_data(struct rte_flow_item *items, uint8_t items_counter, __rte_unused struct additional_para para) { - static struct rte_flow_item_meta meta_spec; - static struct rte_flow_item_meta meta_mask; - - uint32_t data; - - data = META_DATA; - - memset(&meta_spec, 0, sizeof(struct rte_flow_item_meta)); - memset(&meta_mask, 0, sizeof(struct rte_flow_item_meta)); - - meta_spec.data = RTE_BE32(data); - meta_mask.data = RTE_BE32(0xffffffff); + static struct rte_flow_item_meta meta_spec = { + .data = RTE_BE32(META_DATA), + }; + 
static struct rte_flow_item_meta meta_mask = { + .data = RTE_BE32(0xffffffff), + }; items[items_counter].type = RTE_FLOW_ITEM_TYPE_META; items[items_counter].spec = &meta_spec; @@ -292,21 +246,14 @@ add_meta_tag(struct rte_flow_item *items, uint8_t items_counter, __rte_unused struct additional_para para) { - static struct rte_flow_item_tag tag_spec; - static struct rte_flow_item_tag tag_mask; - uint32_t data; - uint8_t index; - - data = META_DATA; - index = TAG_INDEX; - - memset(&tag_spec, 0, sizeof(struct rte_flow_item_tag)); - memset(&tag_mask, 0, sizeof(struct rte_flow_item_tag)); - - tag_spec.data = RTE_BE32(data); - tag_mask.data = RTE_BE32(0xffffffff); - tag_spec.index = index; - tag_mask.index = 0xff; + static struct rte_flow_item_tag tag_spec = { + .data = RTE_BE32(META_DATA), + .index = TAG_INDEX, + }; + static struct rte_flow_item_tag tag_mask = { + .data = RTE_BE32(0xffffffff), + .index = 0xff, + }; items[items_counter].type = RTE_FLOW_ITEM_TYPE_TAG; items[items_counter].spec = &tag_spec; @@ -321,9 +268,6 @@ add_icmpv4(struct rte_flow_item *items, static struct rte_flow_item_icmp icmpv4_spec; static struct rte_flow_item_icmp icmpv4_mask; - memset(&icmpv4_spec, 0, sizeof(struct rte_flow_item_icmp)); - memset(&icmpv4_mask, 0, sizeof(struct rte_flow_item_icmp)); - items[items_counter].type = RTE_FLOW_ITEM_TYPE_ICMP; items[items_counter].spec = &icmpv4_spec; items[items_counter].mask = &icmpv4_mask; @@ -337,9 +281,6 @@ add_icmpv6(struct rte_flow_item *items, static struct rte_flow_item_icmp6 icmpv6_spec; static struct rte_flow_item_icmp6 icmpv6_mask; - memset(&icmpv6_spec, 0, sizeof(struct rte_flow_item_icmp6)); - memset(&icmpv6_mask, 0, sizeof(struct rte_flow_item_icmp6)); - items[items_counter].type = RTE_FLOW_ITEM_TYPE_ICMP6; items[items_counter].spec = &icmpv6_spec; items[items_counter].mask = &icmpv6_mask;