From patchwork Fri Sep 6 09:45:29 2019
X-Patchwork-Submitter: Ruifeng Wang
X-Patchwork-Id: 58812
X-Patchwork-Delegate: thomas@monjalon.net
From: Ruifeng Wang
To: bruce.richardson@intel.com, vladimir.medvedkin@intel.com,
 olivier.matz@6wind.com
Cc: dev@dpdk.org, stephen@networkplumber.org, konstantin.ananyev@intel.com,
 gavin.hu@arm.com, honnappa.nagarahalli@arm.com, dharmik.thakkar@arm.com,
 nd@arm.com
Date: Fri, 6 Sep 2019 17:45:29 +0800
Message-Id: <20190906094534.36060-2-ruifeng.wang@arm.com>
In-Reply-To: <20190906094534.36060-1-ruifeng.wang@arm.com>
References: <20190822063457.41596-1-ruifeng.wang@arm.com>
 <20190906094534.36060-1-ruifeng.wang@arm.com>
Subject: [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details

From: Honnappa Nagarahalli

Add a section describing a design for integrating the QSBR RCU library
with other libraries in DPDK.

Signed-off-by: Honnappa Nagarahalli
---
 doc/guides/prog_guide/rcu_lib.rst | 52 +++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst
index 8fe5b1f73..211948530 100644
--- a/doc/guides/prog_guide/rcu_lib.rst
+++ b/doc/guides/prog_guide/rcu_lib.rst
@@ -186,3 +186,55 @@ However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid in
 debugging issues. One can mark the access to shared data structures on the
 reader side using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if
 all the locks are unlocked.
+
+Integrating QSBR RCU with other libraries
+-----------------------------------------
+
+Lock-free algorithms place an additional burden on the application to
+reclaim memory. Integrating memory reclamation mechanisms in the libraries
+helps remove some of that burden. Though the QSBR method provides the
+flexibility to achieve good performance, it poses challenges when
+integrating with libraries.
+
+The memory reclamation process using QSBR can be split into 4 parts:
+
+#. Initialization
+#. Quiescent State Reporting
+#. Reclaiming Resources
+#. Shutdown
+
+The design proposed here assigns different parts of this process to client
+libraries and applications. The term 'client library' refers to data
+structure libraries such as rte_hash, rte_lpm etc. in DPDK or similar
+libraries outside of DPDK. The term 'application' refers to the packet
+processing application that makes use of DPDK, such as the L3 Forwarding
+example application, OVS, VPP etc.
+
+The application has to handle 'Initialization' and 'Quiescent State
+Reporting'. So,
+
+* the application has to create the RCU variable and register the reader
+  threads to report their quiescent state.
+* the application has to register the same RCU variable with the client
+  library.
+* reader threads in the application have to report the quiescent state.
+  This allows the application to control the length of the critical section
+  and how frequently it wants to report the quiescent state.
+
+The client library will handle the 'Reclaiming Resources' part of the
+process. The client libraries will make use of the writer thread context
+to execute the memory reclamation algorithm. So,
+
+* the client library should provide an API to register an RCU variable
+  that it will use.
+* the client library should trigger the readers to report their quiescent
+  state status upon deleting resources, by calling ``rte_rcu_qsbr_start``.
+
+* the client library should store the token and the deleted resources for
+  later use, to free them after the readers have reported their quiescent
+  state. Since the readers will report the quiescent state status in the
+  order of deletion, the library must store the tokens/resources in the
+  order in which the resources were deleted. A FIFO data structure would
+  achieve the desired results. The length of the FIFO would depend on the
+  rate of deletion and the rate at which the readers report their quiescent
+  state. In the worst case the length of the FIFO would be equal to the
+  maximum number of resources the data structure supports. However, in most
+  cases, the length will be much smaller. But, the client library should
+  not take the length of the FIFO as an input from the application.
+  Instead, it should implement a data structure which is able to
+  grow/shrink dynamically. The overhead introduced by such a data structure
+  on delete operations should be considered as well.
+
+* the client library should query the quiescent state and free the
+  resources, as illustrated in the sketch after this section. It should
+  make use of the non-blocking ``rte_rcu_qsbr_check`` API to query the
+  quiescent state. This allows the application to do useful work while the
+  readers report their quiescent state. If there are tokens/resources
+  present in the FIFO already, the delete API should peek the head of the
+  FIFO and check the quiescent state status. If the status is success, the
+  token/resource should be dequeued and the resource freed. This process
+  can be repeated until the quiescent state status query for a token
+  returns failure, indicating that subsequent tokens will also fail the
+  quiescent state status query. The same process can be incorporated while
+  adding new entries in the data structure if the client library runs out
+  of resources.
+
+The 'Shutdown' process needs to be shared between the application and the
+client library.
+
+* the application should make sure that the reader threads are not using
+  the shared data structure, and unregister the reader threads from the
+  QSBR variable, before calling the client library's shutdown function.
+
+* the client library should check the quiescent state status of all the
+  tokens that may be present in the FIFO and free the resources. It should
+  make use of the non-blocking ``rte_rcu_qsbr_check`` API to query the
+  quiescent state. If any of the tokens do not pass the quiescent state
+  check, the client library should log an error and stop the memory
+  reclamation process.
+
+Integrating the resource reclamation with client libraries removes this
+burden from the application and makes it easier to use lock-free algorithms.
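+
+A minimal sketch of the writer-side reclamation loop described above
+(illustrative only: the two-slot token/resource FIFO layout, an
+``rte_ring_peek``-style peek operation and the ``free_resource()`` helper
+are assumptions of this sketch, not an established DPDK API):
+
+.. code-block:: c
+
+   /* Writer context: free resources whose readers have all quiesced.
+    * 'v' is the registered RCU QSBR variable; 'fifo' stores
+    * (token, resource) pairs in deletion order.
+    */
+   void *token, *resource;
+
+   while (rte_ring_peek(fifo, &token) == 0 &&
+          rte_rcu_qsbr_check(v, (uint64_t)(uintptr_t)token, false) == 1) {
+       (void)rte_ring_sc_dequeue(fifo, &token);    /* consume the token */
+       (void)rte_ring_sc_dequeue(fifo, &resource); /* matching resource */
+       free_resource(resource);                    /* client-specific free */
+   }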
+
+This design has several advantages over currently known methods.
+
+#. The application does not need a dedicated thread to reclaim resources.
+   Memory reclamation happens as part of the writer thread, with little
+   impact on performance.
+#. The client library has better control over the resources. For example,
+   the client library can attempt to reclaim when it has run out of
+   resources.

From patchwork Fri Sep 6 09:45:30 2019
X-Patchwork-Submitter: Ruifeng Wang
X-Patchwork-Id: 58813
X-Patchwork-Delegate: thomas@monjalon.net
From: Ruifeng Wang
To: bruce.richardson@intel.com, vladimir.medvedkin@intel.com,
 olivier.matz@6wind.com
Cc: dev@dpdk.org, stephen@networkplumber.org, konstantin.ananyev@intel.com,
 gavin.hu@arm.com, honnappa.nagarahalli@arm.com, dharmik.thakkar@arm.com,
 nd@arm.com, Ruifeng Wang
Date: Fri, 6 Sep 2019 17:45:30 +0800
Message-Id: <20190906094534.36060-3-ruifeng.wang@arm.com>
In-Reply-To: <20190906094534.36060-1-ruifeng.wang@arm.com>
References: <20190822063457.41596-1-ruifeng.wang@arm.com>
 <20190906094534.36060-1-ruifeng.wang@arm.com>
Subject: [dpdk-dev] [PATCH v2 2/6] lib/ring: add peek API

The peek API allows fetching the next available object in the ring
without dequeuing it. This helps in scenarios where dequeuing of
objects depends on their value.

Signed-off-by: Dharmik Thakkar
Signed-off-by: Ruifeng Wang
Reviewed-by: Honnappa Nagarahalli
Reviewed-by: Gavin Hu
---
 lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..d3d0d5e18 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 				r->cons.single, available);
 }
 
+/**
+ * Peek one object from a ring.
+ *
+ * The peek API allows fetching the next available object in the ring
+ * without dequeuing it. This API is not multi-thread safe with respect
+ * to other consumer threads.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, object available
+ *   - -ENOENT: Not enough entries in the ring.
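+ *
+ * A minimal usage sketch (single consumer assumed; object_is_ready() is
+ * a placeholder predicate for this sketch, not a DPDK API):
+ *
+ * @code
+ * void *obj;
+ *
+ * if (rte_ring_peek(r, &obj) == 0 && object_is_ready(obj))
+ *	(void)rte_ring_sc_dequeue(r, &obj);
+ * @endcode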
+ */
+__rte_experimental
+static __rte_always_inline int
+rte_ring_peek(struct rte_ring *r, void **obj_p)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_head = r->cons.head;
+	uint32_t count = (prod_tail - cons_head) & r->mask;
+	unsigned int n = 1;
+	if (count) {
+		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
+		return 0;
+	}
+	return -ENOENT;
+}
+
 #ifdef __cplusplus
 }
 #endif

From patchwork Fri Sep 6 09:45:31 2019
X-Patchwork-Submitter: Ruifeng Wang
X-Patchwork-Id: 58814
X-Patchwork-Delegate: thomas@monjalon.net
From: Ruifeng Wang
To: bruce.richardson@intel.com, vladimir.medvedkin@intel.com,
 olivier.matz@6wind.com
Cc: dev@dpdk.org, stephen@networkplumber.org, konstantin.ananyev@intel.com,
 gavin.hu@arm.com, honnappa.nagarahalli@arm.com, dharmik.thakkar@arm.com,
 nd@arm.com, Ruifeng Wang
Date: Fri, 6 Sep 2019 17:45:31 +0800
Message-Id: <20190906094534.36060-4-ruifeng.wang@arm.com>
In-Reply-To: <20190906094534.36060-1-ruifeng.wang@arm.com>
References: <20190822063457.41596-1-ruifeng.wang@arm.com>
 <20190906094534.36060-1-ruifeng.wang@arm.com>
Subject: [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

The RCU QSBR process is integrated for safe tbl8 group reclamation.
Refer to the RCU documentation to understand various aspects of
integrating the RCU library into other libraries.
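
A minimal sketch of the intended writer-side setup (illustrative only;
handle_error() is a placeholder, and a single RCU variable shared by up
to RTE_MAX_LCORE reader threads is assumed, mirroring the unit test in
this series):

    struct rte_rcu_qsbr *v;
    size_t sz;

    /* Create and initialize the QSBR variable. */
    sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
    v = rte_zmalloc_socket(NULL, sz, RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
    rte_rcu_qsbr_init(v, RTE_MAX_LCORE);

    /* Attach it to the LPM object; freed tbl8 groups are then kept in
     * the reclamation FIFO until the registered readers quiesce.
     */
    if (rte_lpm_rcu_qsbr_add(lpm, v) != 0)
        handle_error(); /* rte_errno is EINVAL, EEXIST or ENOMEM */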
Signed-off-by: Ruifeng Wang Reviewed-by: Honnappa Nagarahalli --- lib/librte_lpm/Makefile | 3 +- lib/librte_lpm/meson.build | 2 + lib/librte_lpm/rte_lpm.c | 223 +++++++++++++++++++++++++++-- lib/librte_lpm/rte_lpm.h | 22 +++ lib/librte_lpm/rte_lpm_version.map | 6 + lib/meson.build | 3 +- 6 files changed, 244 insertions(+), 15 deletions(-) diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index a7946a1c5..ca9e16312 100644 --- a/lib/librte_lpm/Makefile +++ b/lib/librte_lpm/Makefile @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk # library name LIB = librte_lpm.a +CFLAGS += -DALLOW_EXPERIMENTAL_API CFLAGS += -O3 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -LDLIBS += -lrte_eal -lrte_hash +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu EXPORT_MAP := rte_lpm_version.map diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build index a5176d8ae..19a35107f 100644 --- a/lib/librte_lpm/meson.build +++ b/lib/librte_lpm/meson.build @@ -2,9 +2,11 @@ # Copyright(c) 2017 Intel Corporation version = 2 +allow_experimental_apis = true sources = files('rte_lpm.c', 'rte_lpm6.c') headers = files('rte_lpm.h', 'rte_lpm6.h') # since header files have different names, we can install all vector headers # without worrying about which architecture we actually need headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h') deps += ['hash'] +deps += ['rcu'] diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index 3a929a1b1..9764b8de6 100644 --- a/lib/librte_lpm/rte_lpm.c +++ b/lib/librte_lpm/rte_lpm.c @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2010-2014 Intel Corporation + * Copyright(c) 2019 Arm Limited */ #include @@ -22,6 +23,7 @@ #include #include #include +#include #include "rte_lpm.h" @@ -39,6 +41,11 @@ enum valid_flag { VALID }; +struct __rte_lpm_qs_item { + uint64_t token; /**< QSBR token.*/ + uint32_t index; /**< tbl8 group index.*/ +}; + /* Macro to enable/disable run-time checks. */ #if defined(RTE_LIBRTE_LPM_DEBUG) #include @@ -381,6 +388,7 @@ rte_lpm_free_v1604(struct rte_lpm *lpm) rte_mcfg_tailq_write_unlock(); + rte_ring_free(lpm->qs_fifo); rte_free(lpm->tbl8); rte_free(lpm->rules_tbl); rte_free(lpm); @@ -390,6 +398,147 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04); MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm), rte_lpm_free_v1604); +/* Add an item into FIFO. + * return: 0 - success + */ +static int +__rte_lpm_rcu_qsbr_fifo_push(struct rte_ring *fifo, + struct __rte_lpm_qs_item *item) +{ + if (rte_ring_free_count(fifo) < 2) { + RTE_LOG(ERR, LPM, "QS FIFO full\n"); + rte_errno = ENOSPC; + return 1; + } + + (void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->token); + (void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->index); + + return 0; +} + +/* Remove item from FIFO. + * Used when data observed by rte_ring_peek. + */ +static void +__rte_lpm_rcu_qsbr_fifo_pop(struct rte_ring *fifo, + struct __rte_lpm_qs_item *item) +{ + void *obj_token = NULL; + void *obj_index = NULL; + + (void)rte_ring_sc_dequeue(fifo, &obj_token); + (void)rte_ring_sc_dequeue(fifo, &obj_index); + + if (item) { + item->token = (uint64_t)((uintptr_t)obj_token); + item->index = (uint32_t)((uintptr_t)obj_index); + } +} + +/* Max number of tbl8 groups to reclaim at one time. */ +#define RCU_QSBR_RECLAIM_SIZE 8 + +/* When RCU QSBR FIFO usage is above 1/(2^RCU_QSBR_RECLAIM_LEVEL), + * reclaim will be triggered by tbl8_free. + */ +#define RCU_QSBR_RECLAIM_LEVEL 3 + +/* Reclaim some tbl8 groups based on quiescent state check. 
+ * RCU_QSBR_RECLAIM_SIZE groups will be reclaimed at max.
+ * Params: lpm - lpm object handle
+ *         index - (output) one of the successfully reclaimed tbl8 groups
+ * return: 0 - success, 1 - no group reclaimed.
+ */
+static uint32_t
+__rte_lpm_rcu_qsbr_reclaim_chunk(struct rte_lpm *lpm, uint32_t *index)
+{
+	struct __rte_lpm_qs_item qs_item;
+	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
+	void *obj_token;
+	uint32_t cnt = 0;
+
+	RTE_LOG(DEBUG, LPM, "RCU QSBR reclamation triggered.\n");
+	/* Check the reader threads' quiescent state and
+	 * reclaim as many tbl8 groups as possible.
+	 */
+	while ((cnt < RCU_QSBR_RECLAIM_SIZE) &&
+		(rte_ring_peek(lpm->qs_fifo, &obj_token) == 0) &&
+		(rte_rcu_qsbr_check(lpm->qsv, (uint64_t)((uintptr_t)obj_token),
+					false) == 1)) {
+		__rte_lpm_rcu_qsbr_fifo_pop(lpm->qs_fifo, &qs_item);
+
+		tbl8_entry = &lpm->tbl8[qs_item.index *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+		memset(&tbl8_entry[0], 0,
+				RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
+				sizeof(tbl8_entry[0]));
+		cnt++;
+	}
+
+	RTE_LOG(DEBUG, LPM, "RCU QSBR reclaimed %u groups.\n", cnt);
+	if (cnt) {
+		if (index)
+			*index = qs_item.index;
+		return 0;
+	}
+	return 1;
+}
+
+/* Trigger tbl8 group reclaim when necessary.
+ * Reclaim happens when RCU QSBR queue usage
+ * is over 1/(2^RCU_QSBR_RECLAIM_LEVEL).
+ */
+static void
+__rte_lpm_rcu_qsbr_try_reclaim(struct rte_lpm *lpm)
+{
+	if (lpm->qsv == NULL)
+		return;
+
+	if (rte_ring_count(lpm->qs_fifo) <
+		(rte_ring_get_capacity(lpm->qs_fifo) >> RCU_QSBR_RECLAIM_LEVEL))
+		return;
+
+	(void)__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, NULL);
+}
+
+/* Associate a QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
+{
+	uint32_t qs_fifo_size;
+	char rcu_ring_name[RTE_RING_NAMESIZE];
+
+	if ((lpm == NULL) || (v == NULL)) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	if (lpm->qsv) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	/* Round up qs_fifo_size to the next power of two that is not less
+	 * than number_tbl8s. The FIFO will store 'token' and 'index'.
+	 */
+	qs_fifo_size = rte_align32pow2((2 * lpm->number_tbl8s) + 1);
+
+	/* Init the QSBR reclaiming FIFO. */
+	snprintf(rcu_ring_name, sizeof(rcu_ring_name), "LPM_RCU_%s", lpm->name);
+	lpm->qs_fifo = rte_ring_create(rcu_ring_name, qs_fifo_size,
+			SOCKET_ID_ANY, 0);
+	if (lpm->qs_fifo == NULL) {
+		RTE_LOG(ERR, LPM, "LPM QS FIFO memory allocation failed\n");
+		rte_errno = ENOMEM;
+		return 1;
+	}
+	lpm->qsv = v;
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -640,6 +789,35 @@ rule_find_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
 	return -EINVAL;
 }
 
+static int32_t
+tbl8_alloc_reclaimed(struct rte_lpm *lpm)
+{
+	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
+	uint32_t index;
+
+	if (lpm->qsv != NULL) {
+		if (__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, &index) == 0) {
+			/* Set the last reclaimed tbl8 group as VALID. */
+			struct rte_lpm_tbl_entry new_tbl8_entry = {
+				.next_hop = 0,
+				.valid = INVALID,
+				.depth = 0,
+				.valid_group = VALID,
+			};
+
+			tbl8_entry = &lpm->tbl8[index *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+			__atomic_store(tbl8_entry, &new_tbl8_entry,
+					__ATOMIC_RELAXED);
+
+			/* Return the group index of the reclaimed tbl8 group. */
+			return index;
+		}
+	}
+
+	return -ENOSPC;
+}
+
 /*
  * Find, clean and allocate a tbl8.
  */
@@ -679,14 +857,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
 }
 
 static int32_t
-tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+tbl8_alloc_v1604(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index.
*/ struct rte_lpm_tbl_entry *tbl8_entry; /* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */ - for (group_idx = 0; group_idx < number_tbl8s; group_idx++) { - tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES]; + for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) { + tbl8_entry = &lpm->tbl8[group_idx * + RTE_LPM_TBL8_GROUP_NUM_ENTRIES]; /* If a free tbl8 group is found clean it and set as VALID. */ if (!tbl8_entry->valid_group) { struct rte_lpm_tbl_entry new_tbl8_entry = { @@ -708,8 +887,8 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s) } } - /* If there are no tbl8 groups free then return error. */ - return -ENOSPC; + /* If there are no tbl8 groups free then check reclaim queue. */ + return tbl8_alloc_reclaimed(lpm); } static void @@ -728,13 +907,31 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start) } static void -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start) +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start) { - /* Set tbl8 group invalid*/ + struct __rte_lpm_qs_item qs_item; struct rte_lpm_tbl_entry zero_tbl8_entry = {0}; - __atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry, - __ATOMIC_RELAXED); + if (lpm->qsv != NULL) { + /* Push into QSBR FIFO. */ + qs_item.token = rte_rcu_qsbr_start(lpm->qsv); + qs_item.index = + tbl8_group_start / RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + if (__rte_lpm_rcu_qsbr_fifo_push(lpm->qs_fifo, &qs_item) != 0) + /* This should never happen as FIFO size is big enough + * to hold all tbl8 groups. + */ + RTE_LOG(ERR, LPM, "Failed to push QSBR FIFO\n"); + + /* Speculatively reclaim tbl8 groups. + * Help spread the reclaim work load across multiple calls. + */ + __rte_lpm_rcu_qsbr_try_reclaim(lpm); + } else { + /* Set tbl8 group invalid*/ + __atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry, + __ATOMIC_RELAXED); + } } static __rte_noinline int32_t @@ -1037,7 +1234,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth, if (!lpm->tbl24[tbl24_index].valid) { /* Search for a free tbl8 group. */ - tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s); + tbl8_group_index = tbl8_alloc_v1604(lpm); /* Check tbl8 allocation was successful. */ if (tbl8_group_index < 0) { @@ -1083,7 +1280,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth, } /* If valid entry but not extended calculate the index into Table8. */ else if (lpm->tbl24[tbl24_index].valid_group == 0) { /* Search for free tbl8 group. */ - tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s); + tbl8_group_index = tbl8_alloc_v1604(lpm); if (tbl8_group_index < 0) { return tbl8_group_index; @@ -1818,7 +2015,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, */ lpm->tbl24[tbl24_index].valid = 0; __atomic_thread_fence(__ATOMIC_RELEASE); - tbl8_free_v1604(lpm->tbl8, tbl8_group_start); + tbl8_free_v1604(lpm, tbl8_group_start); } else if (tbl8_recycle_index > -1) { /* Update tbl24 entry. 
*/ struct rte_lpm_tbl_entry new_tbl24_entry = { @@ -1834,7 +2031,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, __atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry, __ATOMIC_RELAXED); __atomic_thread_fence(__ATOMIC_RELEASE); - tbl8_free_v1604(lpm->tbl8, tbl8_group_start); + tbl8_free_v1604(lpm, tbl8_group_start); } #undef group_idx return 0; diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index 906ec4483..5079fb262 100644 --- a/lib/librte_lpm/rte_lpm.h +++ b/lib/librte_lpm/rte_lpm.h @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2010-2014 Intel Corporation + * Copyright(c) 2019 Arm Limited */ #ifndef _RTE_LPM_H_ @@ -21,6 +22,7 @@ #include #include #include +#include #ifdef __cplusplus extern "C" { @@ -186,6 +188,8 @@ struct rte_lpm { __rte_cache_aligned; /**< LPM tbl24 table. */ struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */ struct rte_lpm_rule *rules_tbl; /**< LPM rules. */ + struct rte_rcu_qsbr *qsv; /**< RCU QSBR variable for tbl8 group.*/ + struct rte_ring *qs_fifo; /**< RCU QSBR reclaiming queue. */ }; /** @@ -248,6 +252,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm); void rte_lpm_free_v1604(struct rte_lpm *lpm); +/** + * Associate RCU QSBR variable with an LPM object. + * + * @param lpm + * the lpm object to add RCU QSBR + * @param v + * RCU QSBR variable + * @return + * On success - 0 + * On error - 1 with error code set in rte_errno. + * Possible rte_errno codes are: + * - EINVAL - invalid pointer + * - EEXIST - already added QSBR + * - ENOMEM - memory allocation failure + */ +__rte_experimental +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v); + /** * Add a rule to the LPM table. * diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map index 90beac853..b353aabd2 100644 --- a/lib/librte_lpm/rte_lpm_version.map +++ b/lib/librte_lpm/rte_lpm_version.map @@ -44,3 +44,9 @@ DPDK_17.05 { rte_lpm6_lookup_bulk_func; } DPDK_16.04; + +EXPERIMENTAL { + global: + + rte_lpm_rcu_qsbr_add; +}; diff --git a/lib/meson.build b/lib/meson.build index e5ff83893..3a96f005d 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -11,6 +11,7 @@ libraries = [ 'kvargs', # eal depends on kvargs 'eal', # everything depends on eal + 'rcu', # hash and lpm depends on this 'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core 'cmdline', 'metrics', # bitrate/latency stats depends on this @@ -22,7 +23,7 @@ libraries = [ 'gro', 'gso', 'ip_frag', 'jobstats', 'kni', 'latencystats', 'lpm', 'member', 'power', 'pdump', 'rawdev', - 'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost', + 'reorder', 'sched', 'security', 'stack', 'vhost', # ipsec lib depends on net, crypto and security 'ipsec', # add pkt framework libs which use other libs from above From patchwork Fri Sep 6 09:45:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruifeng Wang X-Patchwork-Id: 58815 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id D18BB1F2D9; Fri, 6 Sep 2019 11:46:21 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by dpdk.org (Postfix) with ESMTP id CFA061F13E for ; Fri, 6 Sep 2019 11:46:20 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 64A5A1570; 
Fri, 6 Sep 2019 02:46:20 -0700 (PDT) Received: from net-arm-c2400-02.shanghai.arm.com (net-arm-c2400-02.shanghai.arm.com [10.169.40.42]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id F16093F59C; Fri, 6 Sep 2019 02:46:17 -0700 (PDT) From: Ruifeng Wang To: bruce.richardson@intel.com, vladimir.medvedkin@intel.com, olivier.matz@6wind.com Cc: dev@dpdk.org, stephen@networkplumber.org, konstantin.ananyev@intel.com, gavin.hu@arm.com, honnappa.nagarahalli@arm.com, dharmik.thakkar@arm.com, nd@arm.com, Ruifeng Wang Date: Fri, 6 Sep 2019 17:45:32 +0800 Message-Id: <20190906094534.36060-5-ruifeng.wang@arm.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190906094534.36060-1-ruifeng.wang@arm.com> References: <20190822063457.41596-1-ruifeng.wang@arm.com> <20190906094534.36060-1-ruifeng.wang@arm.com> Subject: [dpdk-dev] [PATCH v2 4/6] app/test: add test case for LPM RCU integration X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add positive and negative tests for API rte_lpm_rcu_qsbr_add. Also test LPM library behavior when RCU QSBR is enabled. Signed-off-by: Ruifeng Wang Reviewed-by: Gavin Hu Reviewed-by: Honnappa Nagarahalli --- app/test/test_lpm.c | 153 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 152 insertions(+), 1 deletion(-) diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c index e969fe051..cfd372395 100644 --- a/app/test/test_lpm.c +++ b/app/test/test_lpm.c @@ -8,6 +8,7 @@ #include #include +#include #include "test.h" #include "test_xmmt_ops.h" @@ -40,6 +41,8 @@ static int32_t test15(void); static int32_t test16(void); static int32_t test17(void); static int32_t test18(void); +static int32_t test19(void); +static int32_t test20(void); rte_lpm_test tests[] = { /* Test Cases */ @@ -61,7 +64,9 @@ rte_lpm_test tests[] = { test15, test16, test17, - test18 + test18, + test19, + test20 }; #define NUM_LPM_TESTS (sizeof(tests)/sizeof(tests[0])) @@ -1266,6 +1271,152 @@ test18(void) return PASS; } +/* + * rte_lpm_rcu_qsbr_add positive and negative tests. 
+ * - Add RCU QSBR variable to LPM + * - Add another RCU QSBR variable to LPM + * - Check LPM attached RCU QSBR variable and FIFO queue + */ +int32_t +test19(void) +{ + struct rte_lpm *lpm = NULL; + struct rte_lpm_config config; + size_t sz; + struct rte_rcu_qsbr *qsv; + struct rte_rcu_qsbr *qsv2; + int32_t status; + + config.max_rules = MAX_RULES; + config.number_tbl8s = NUMBER_TBL8S; + config.flags = 0; + + lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config); + TEST_LPM_ASSERT(lpm != NULL); + + /* Create RCU QSBR variable */ + sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE); + qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz, + RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY); + TEST_LPM_ASSERT(qsv != NULL); + + status = rte_rcu_qsbr_init(qsv, RTE_MAX_LCORE); + TEST_LPM_ASSERT(status == 0); + + /* Attach RCU QSBR to LPM table */ + status = rte_lpm_rcu_qsbr_add(lpm, qsv); + TEST_LPM_ASSERT(status == 0); + + /* Create and attach another RCU QSBR to LPM table */ + qsv2 = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz, + RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY); + TEST_LPM_ASSERT(qsv2 != NULL); + + status = rte_lpm_rcu_qsbr_add(lpm, qsv2); + TEST_LPM_ASSERT(status != 0); + + TEST_LPM_ASSERT(lpm->qsv == qsv); + TEST_LPM_ASSERT(lpm->qs_fifo != NULL); + + rte_lpm_free(lpm); + rte_free(qsv); + rte_free(qsv2); + + return PASS; +} + +/* + * rte_lpm_rcu_qsbr_add functional test. + * - Create LPM which supports 1 tbl8 group at max + * - Add RCU QSBR variable to LPM + * - Add a rule with depth=28 (> 24) + * - Register a reader thread (not a real thread) + * - Reader lookup existing rule + * - Writer delete the rule + * - Reader lookup the rule + * - Writer re-add the rule (no available tbl8 group) + * - Reader report quiescent state and unregister + * - Writer re-add the rule + * - Reader lookup the rule + */ +int32_t +test20(void) +{ + struct rte_lpm *lpm = NULL; + struct rte_lpm_config config; + size_t sz; + struct rte_rcu_qsbr *qsv; + int32_t status; + uint32_t ip, next_hop, next_hop_return; + uint8_t depth; + + config.max_rules = MAX_RULES; + config.number_tbl8s = 1; + config.flags = 0; + + lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config); + TEST_LPM_ASSERT(lpm != NULL); + + /* Create RCU QSBR variable */ + sz = rte_rcu_qsbr_get_memsize(1); + qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz, + RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY); + TEST_LPM_ASSERT(qsv != NULL); + + status = rte_rcu_qsbr_init(qsv, 1); + TEST_LPM_ASSERT(status == 0); + + /* Attach RCU QSBR to LPM table */ + status = rte_lpm_rcu_qsbr_add(lpm, qsv); + TEST_LPM_ASSERT(status == 0); + + ip = RTE_IPV4(192, 18, 100, 100); + depth = 28; + next_hop = 1; + status = rte_lpm_add(lpm, ip, depth, next_hop); + TEST_LPM_ASSERT(status == 0); + TEST_LPM_ASSERT(lpm->tbl24[ip>>8].valid_group); + + /* Register pseudo reader */ + status = rte_rcu_qsbr_thread_register(qsv, 0); + TEST_LPM_ASSERT(status == 0); + rte_rcu_qsbr_thread_online(qsv, 0); + + status = rte_lpm_lookup(lpm, ip, &next_hop_return); + TEST_LPM_ASSERT(status == 0); + TEST_LPM_ASSERT(next_hop_return == next_hop); + + /* Writer update */ + status = rte_lpm_delete(lpm, ip, depth); + TEST_LPM_ASSERT(status == 0); + TEST_LPM_ASSERT(!lpm->tbl24[ip>>8].valid); + + status = rte_lpm_lookup(lpm, ip, &next_hop_return); + TEST_LPM_ASSERT(status != 0); + + status = rte_lpm_add(lpm, ip, depth, next_hop); + TEST_LPM_ASSERT(status != 0); + + /* Reader quiescent */ + rte_rcu_qsbr_quiescent(qsv, 0); + + status = rte_lpm_add(lpm, ip, depth, next_hop); + TEST_LPM_ASSERT(status == 0); + + 
+	rte_rcu_qsbr_thread_offline(qsv, 0);
+	status = rte_rcu_qsbr_thread_unregister(qsv, 0);
+	TEST_LPM_ASSERT(status == 0);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(next_hop_return == next_hop);
+
+	rte_lpm_free(lpm);
+	rte_free(qsv);
+
+	return PASS;
+}
+
 /*
  * Do all unit tests.
  */

From patchwork Fri Sep 6 09:45:33 2019
X-Patchwork-Submitter: Ruifeng Wang
X-Patchwork-Id: 58816
X-Patchwork-Delegate: thomas@monjalon.net
From: Ruifeng Wang
To: bruce.richardson@intel.com, vladimir.medvedkin@intel.com,
 olivier.matz@6wind.com
Cc: dev@dpdk.org, stephen@networkplumber.org, konstantin.ananyev@intel.com,
 gavin.hu@arm.com, honnappa.nagarahalli@arm.com, dharmik.thakkar@arm.com,
 nd@arm.com, stable@dpdk.org
Date: Fri, 6 Sep 2019 17:45:33 +0800
Message-Id: <20190906094534.36060-6-ruifeng.wang@arm.com>
In-Reply-To: <20190906094534.36060-1-ruifeng.wang@arm.com>
References: <20190822063457.41596-1-ruifeng.wang@arm.com>
 <20190906094534.36060-1-ruifeng.wang@arm.com>
Subject: [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time

From: Honnappa Nagarahalli

total_time needs to be reset to measure the cycles for the delete API.
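Without the reset, the delete measurement also carries the cycles
accumulated during the add measurement. A simplified illustration
(variable names as in the test):

    begin = rte_rdtsc();
    /* ... rte_lpm_add() loop ... */
    total_time += rte_rdtsc() - begin; /* add cycles */

    begin = rte_rdtsc();
    /* ... rte_lpm_delete() loop ... */
    /* '+=' here would fold the add cycles into the delete average;
     * assigning with '=' measures the delete loop alone.
     */
    total_time = rte_rdtsc() - begin;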
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli
Reviewed-by: Gavin Hu
Reviewed-by: Ruifeng Wang
---
 app/test/test_lpm_perf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
index 77eea66ad..a2578fe90 100644
--- a/app/test/test_lpm_perf.c
+++ b/app/test/test_lpm_perf.c
@@ -460,7 +460,7 @@ test_lpm_perf(void)
 			(double)total_time / ((double)ITERATIONS * BATCH_SIZE),
 			(count * 100.0) / (double)(ITERATIONS * BATCH_SIZE));
 
-	/* Delete */
+	/* Measure Delete */
 	status = 0;
 	begin = rte_rdtsc();
 
@@ -470,7 +470,7 @@ test_lpm_perf(void)
 				large_route_table[i].depth);
 	}
 
-	total_time += rte_rdtsc() - begin;
+	total_time = rte_rdtsc() - begin;
 
 	printf("Average LPM Delete: %g cycles\n",
 			(double)total_time / NUM_ROUTE_ENTRIES);

From patchwork Fri Sep 6 09:45:34 2019
X-Patchwork-Submitter: Ruifeng Wang
X-Patchwork-Id: 58817
X-Patchwork-Delegate: thomas@monjalon.net
From: Ruifeng Wang
To: bruce.richardson@intel.com, vladimir.medvedkin@intel.com,
 olivier.matz@6wind.com
Cc: dev@dpdk.org, stephen@networkplumber.org, konstantin.ananyev@intel.com,
 gavin.hu@arm.com, honnappa.nagarahalli@arm.com, dharmik.thakkar@arm.com,
 nd@arm.com
Date: Fri, 6 Sep 2019 17:45:34 +0800
Message-Id: <20190906094534.36060-7-ruifeng.wang@arm.com>
In-Reply-To: <20190906094534.36060-1-ruifeng.wang@arm.com>
References: <20190822063457.41596-1-ruifeng.wang@arm.com>
 <20190906094534.36060-1-ruifeng.wang@arm.com>
Subject: [dpdk-dev] [PATCH v2 6/6] test/lpm: add RCU integration performance tests

From: Honnappa Nagarahalli

Add performance tests for RCU integration. The performance difference
with and without RCU integration is very small (~1% to ~2%) on both
Arm and x86 platforms.
Signed-off-by: Honnappa Nagarahalli Reviewed-by: Gavin Hu Reviewed-by: Ruifeng Wang --- app/test/test_lpm_perf.c | 274 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 271 insertions(+), 3 deletions(-) diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c index a2578fe90..475e5d488 100644 --- a/app/test/test_lpm_perf.c +++ b/app/test/test_lpm_perf.c @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2010-2014 Intel Corporation + * Copyright(c) 2019 Arm Limited */ #include @@ -10,12 +11,23 @@ #include #include #include +#include #include #include +#include #include "test.h" #include "test_xmmt_ops.h" +struct rte_lpm *lpm; +static struct rte_rcu_qsbr *rv; +static volatile uint8_t writer_done; +static volatile uint32_t thr_id; +/* Report quiescent state interval every 8192 lookups. Larger critical + * sections in reader will result in writer polling multiple times. + */ +#define QSBR_REPORTING_INTERVAL 8192 + #define TEST_LPM_ASSERT(cond) do { \ if (!(cond)) { \ printf("Error at line %d: \n", __LINE__); \ @@ -24,6 +36,7 @@ } while(0) #define ITERATIONS (1 << 10) +#define RCU_ITERATIONS 10 #define BATCH_SIZE (1 << 12) #define BULK_SIZE 32 @@ -35,9 +48,13 @@ struct route_rule { }; struct route_rule large_route_table[MAX_RULE_NUM]; +/* Route table for routes with depth > 24 */ +struct route_rule large_ldepth_route_table[MAX_RULE_NUM]; static uint32_t num_route_entries; +static uint32_t num_ldepth_route_entries; #define NUM_ROUTE_ENTRIES num_route_entries +#define NUM_LDEPTH_ROUTE_ENTRIES num_ldepth_route_entries enum { IP_CLASS_A, @@ -191,7 +208,7 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth) uint32_t ip_head_mask; uint32_t rule_num; uint32_t k; - struct route_rule *ptr_rule; + struct route_rule *ptr_rule, *ptr_ldepth_rule; if (ip_class == IP_CLASS_A) { /* IP Address class A */ fixed_bit_num = IP_HEAD_BIT_NUM_A; @@ -236,10 +253,20 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth) */ start = lrand48() & mask; ptr_rule = &large_route_table[num_route_entries]; + ptr_ldepth_rule = &large_ldepth_route_table[num_ldepth_route_entries]; for (k = 0; k < rule_num; k++) { ptr_rule->ip = (start << (RTE_LPM_MAX_DEPTH - depth)) | ip_head_mask; ptr_rule->depth = depth; + /* If the depth of the route is more than 24, store it + * in another table as well. + */ + if (depth > 24) { + ptr_ldepth_rule->ip = ptr_rule->ip; + ptr_ldepth_rule->depth = ptr_rule->depth; + ptr_ldepth_rule++; + num_ldepth_route_entries++; + } ptr_rule++; start = (start + step) & mask; } @@ -273,6 +300,7 @@ static void generate_large_route_rule_table(void) uint8_t depth; num_route_entries = 0; + num_ldepth_route_entries = 0; memset(large_route_table, 0, sizeof(large_route_table)); for (ip_class = IP_CLASS_A; ip_class <= IP_CLASS_C; ip_class++) { @@ -316,10 +344,248 @@ print_route_distribution(const struct route_rule *table, uint32_t n) printf("\n"); } +/* Check condition and return an error if true. */ +static uint16_t enabled_core_ids[RTE_MAX_LCORE]; +static unsigned int num_cores; + +/* Simple way to allocate thread ids in 0 to RTE_MAX_LCORE space */ +static inline uint32_t +alloc_thread_id(void) +{ + uint32_t tmp_thr_id; + + tmp_thr_id = __atomic_fetch_add(&thr_id, 1, __ATOMIC_RELAXED); + if (tmp_thr_id >= RTE_MAX_LCORE) + printf("Invalid thread id %u\n", tmp_thr_id); + + return tmp_thr_id; +} + +/* + * Reader thread using rte_lpm data structure without RCU. 
+ */ +static int +test_lpm_reader(__attribute__((unused)) void *arg) +{ + int i; + uint32_t ip_batch[QSBR_REPORTING_INTERVAL]; + uint32_t next_hop_return = 0; + + do { + for (i = 0; i < QSBR_REPORTING_INTERVAL; i++) + ip_batch[i] = rte_rand(); + + for (i = 0; i < QSBR_REPORTING_INTERVAL; i++) + rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return); + + } while (!writer_done); + + return 0; +} + +/* + * Reader thread using rte_lpm data structure with RCU. + */ +static int +test_lpm_rcu_qsbr_reader(__attribute__((unused)) void *arg) +{ + int i; + uint32_t thread_id = alloc_thread_id(); + uint32_t ip_batch[QSBR_REPORTING_INTERVAL]; + uint32_t next_hop_return = 0; + + /* Register this thread to report quiescent state */ + rte_rcu_qsbr_thread_register(rv, thread_id); + rte_rcu_qsbr_thread_online(rv, thread_id); + + do { + for (i = 0; i < QSBR_REPORTING_INTERVAL; i++) + ip_batch[i] = rte_rand(); + + for (i = 0; i < QSBR_REPORTING_INTERVAL; i++) + rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return); + + /* Update quiescent state */ + rte_rcu_qsbr_quiescent(rv, thread_id); + } while (!writer_done); + + rte_rcu_qsbr_thread_offline(rv, thread_id); + rte_rcu_qsbr_thread_unregister(rv, thread_id); + + return 0; +} + +/* + * Functional test: + * Single writer, Single QS variable, Single QSBR query, + * Non-blocking rcu_qsbr_check + */ +static int +test_lpm_rcu_perf(void) +{ + struct rte_lpm_config config; + uint64_t begin, total_cycles; + size_t sz; + unsigned int i, j; + uint16_t core_id; + uint32_t next_hop_add = 0xAA; + + if (rte_lcore_count() < 2) { + printf("Not enough cores for lpm_rcu_perf_autotest, expecting at least 2\n"); + return TEST_SKIPPED; + } + + num_cores = 0; + RTE_LCORE_FOREACH_SLAVE(core_id) { + enabled_core_ids[num_cores] = core_id; + num_cores++; + } + + printf("\nPerf test: 1 writer, %d readers, RCU integration enabled\n", + num_cores); + + /* Create LPM table */ + config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES; + config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES; + config.flags = 0; + lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config); + TEST_LPM_ASSERT(lpm != NULL); + + /* Init RCU variable */ + sz = rte_rcu_qsbr_get_memsize(num_cores); + rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz, + RTE_CACHE_LINE_SIZE); + rte_rcu_qsbr_init(rv, num_cores); + + /* Assign the RCU variable to LPM */ + if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) { + printf("RCU variable assignment failed\n"); + goto error; + } + + writer_done = 0; + __atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST); + + /* Launch reader threads */ + for (i = 0; i < num_cores; i++) + rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL, + enabled_core_ids[i]); + + /* Measure add/delete. 
*/ + begin = rte_rdtsc_precise(); + for (i = 0; i < RCU_ITERATIONS; i++) { + /* Add all the entries */ + for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++) + if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip, + large_ldepth_route_table[j].depth, + next_hop_add) != 0) { + printf("Failed to add iteration %d, route# %d\n", + i, j); + goto error; + } + + /* Delete all the entries */ + for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++) + if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip, + large_ldepth_route_table[j].depth) != 0) { + printf("Failed to delete iteration %d, route# %d\n", + i, j); + goto error; + } + } + total_cycles = rte_rdtsc_precise() - begin; + + printf("Total LPM Adds: %d\n", ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES); + printf("Total LPM Deletes: %d\n", + ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES); + printf("Average LPM Add/Del: %g cycles\n", + (double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES * ITERATIONS)); + + writer_done = 1; + /* Wait and check return value from reader threads */ + for (i = 0; i < num_cores; i++) + if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0) + goto error; + + rte_lpm_free(lpm); + rte_free(rv); + lpm = NULL; + rv = NULL; + + /* Test without RCU integration */ + printf("\nPerf test: 1 writer, %d readers, RCU integration disabled\n", + num_cores); + + /* Create LPM table */ + config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES; + config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES; + config.flags = 0; + lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config); + TEST_LPM_ASSERT(lpm != NULL); + + writer_done = 0; + __atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST); + + /* Launch reader threads */ + for (i = 0; i < num_cores; i++) + rte_eal_remote_launch(test_lpm_reader, NULL, + enabled_core_ids[i]); + + /* Measure add/delete. */ + begin = rte_rdtsc_precise(); + for (i = 0; i < RCU_ITERATIONS; i++) { + /* Add all the entries */ + for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++) + if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip, + large_ldepth_route_table[j].depth, + next_hop_add) != 0) { + printf("Failed to add iteration %d, route# %d\n", + i, j); + goto error; + } + + /* Delete all the entries */ + for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++) + if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip, + large_ldepth_route_table[j].depth) != 0) { + printf("Failed to delete iteration %d, route# %d\n", + i, j); + goto error; + } + } + total_cycles = rte_rdtsc_precise() - begin; + + printf("Total LPM Adds: %d\n", ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES); + printf("Total LPM Deletes: %d\n", + ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES); + printf("Average LPM Add/Del: %g cycles\n", + (double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES * ITERATIONS)); + + writer_done = 1; + /* Wait and check return value from reader threads */ + for (i = 0; i < num_cores; i++) + if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0) + printf("Warning: lcore %u not finished.\n", + enabled_core_ids[i]); + + rte_lpm_free(lpm); + + return 0; + +error: + writer_done = 1; + /* Wait until all readers have exited */ + rte_eal_mp_wait_lcore(); + + rte_lpm_free(lpm); + rte_free(rv); + + return -1; +} + static int test_lpm_perf(void) { - struct rte_lpm *lpm = NULL; struct rte_lpm_config config; config.max_rules = 2000000; @@ -343,7 +609,7 @@ test_lpm_perf(void) lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config); TEST_LPM_ASSERT(lpm != NULL); - /* Measue add. */ + /* Measure add. 
*/ begin = rte_rdtsc(); for (i = 0; i < NUM_ROUTE_ENTRIES; i++) { @@ -478,6 +744,8 @@ test_lpm_perf(void) rte_lpm_delete_all(lpm); rte_lpm_free(lpm); + test_lpm_rcu_perf(); + return 0; }