List cover comments

GET /api/covers/53087/comments/?format=api
HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Link: <http://patches.dpdk.org/api/covers/53087/comments/?format=api&page=1>; rel="first", <http://patches.dpdk.org/api/covers/53087/comments/?format=api&page=1>; rel="last"
Vary: Accept
[ { "id": 95575, "web_url": "http://patches.dpdk.org/comment/95575/", "msgid": "<2601191342CEEE43887BDE71AB9772580148A9C434@irsmsx105.ger.corp.intel.com>", "list_archive_url": "https://inbox.dpdk.org/dev/2601191342CEEE43887BDE71AB9772580148A9C434@irsmsx105.ger.corp.intel.com", "date": "2019-04-26T12:04:58", "subject": "Re: [dpdk-dev] [PATCH v8 0/4] lib/rcu: add RCU library supporting\n\tQSBR mechanism", "submitter": { "id": 33, "url": "http://patches.dpdk.org/api/people/33/?format=api", "name": "Ananyev, Konstantin", "email": "konstantin.ananyev@intel.com" }, "content": "> -----Original Message-----\n> From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]\n> Sent: Friday, April 26, 2019 5:40 AM\n> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; paulmck@linux.ibm.com; Kovacevic, Marko\n> <marko.kovacevic@intel.com>; dev@dpdk.org\n> Cc: honnappa.nagarahalli@arm.com; gavin.hu@arm.com; dharmik.thakkar@arm.com; malvika.gupta@arm.com\n> Subject: [PATCH v8 0/4] lib/rcu: add RCU library supporting QSBR mechanism\n> \n> Lock-less data structures provide scalability and determinism.\n> They enable use cases where locking may not be allowed\n> (for ex: real-time applications).\n> \n> In the following paras, the term 'memory' refers to memory allocated\n> by typical APIs like malloc or anything that is representative of\n> memory, for ex: an index of a free element array.\n> \n> Since these data structures are lock less, the writers and readers\n> are accessing the data structures concurrently. Hence, while removing\n> an element from a data structure, the writers cannot return the memory\n> to the allocator, without knowing that the readers are not\n> referencing that element/memory anymore. 
Hence, it is required to\n> separate the operation of removing an element into 2 steps:\n> \n> Delete: in this step, the writer removes the reference to the element from\n> the data structure but does not return the associated memory to the\n> allocator. This will ensure that new readers will not get a reference to\n> the removed element. Removing the reference is an atomic operation.\n> \n> Free(Reclaim): in this step, the writer returns the memory to the\n> memory allocator, only after knowing that all the readers have stopped\n> referencing the deleted element.\n> \n> This library helps the writer determine when it is safe to free the\n> memory.\n> \n> This library makes use of thread Quiescent State (QS). QS can be\n> defined as 'any point in the thread execution where the thread does\n> not hold a reference to shared memory'. It is upto the application to\n> determine its quiescent state. Let us consider the following diagram:\n> \n> Time -------------------------------------------------->\n> \n> | |\n> RT1 $++++****D1****+++***D2*|**+++|+++**D3*****++++$\n> | |\n> RT2 $++++****D1****++|+**D2|***++++++**D3*****++++$\n> | |\n> RT3 $++++****D1****+++***|D2***|++++++**D2*****++++$\n> | |\n> |<--->|\n> Del | Free\n> |\n> Cannot free memory\n> during this period\n> (Grace Period)\n> \n> RTx - Reader thread\n> < and > - Start and end of while(1) loop\n> ***Dx*** - Reader thread is accessing the shared data structure Dx.\n> i.e. critical section.\n> +++ - Reader thread is not accessing any shared data structure.\n> i.e. non critical section or quiescent state.\n> Del - Point in time when the reference to the entry is removed using\n> atomic operation.\n> Free - Point in time when the writer can free the entry.\n> Grace Period - Time duration between Del and Free, during which memory cannot\n> be freed.\n> \n> As shown, thread RT1 accesses data structures D1, D2 and D3. 
When it is\n> accessing D2, if the writer has to remove an element from D2, the\n> writer cannot free the memory associated with that element immediately.\n> The writer can return the memory to the allocator only after the reader\n> stops referencing D2. In other words, reader thread RT1 has to enter\n> a quiescent state.\n> \n> Similarly, since thread RT3 is also accessing D2, writer has to wait till\n> RT3 enters quiescent state as well.\n> \n> However, the writer does not need to wait for RT2 to enter quiescent state.\n> Thread RT2 was not accessing D2 when the delete operation happened.\n> So, RT2 will not get a reference to the deleted entry.\n> \n> It can be noted that, the critical sections for D2 and D3 are quiescent states\n> for D1. i.e. for a given data structure Dx, any point in the thread execution\n> that does not reference Dx is a quiescent state.\n> \n> Since memory is not freed immediately, there might be a need for\n> provisioning of additional memory, depending on the application requirements.\n> \n> It is important to make sure that this library keeps the overhead of\n> identifying the end of grace period and subsequent freeing of memory,\n> to a minimum. The following paras explain how grace period and critical\n> section affect this overhead.\n> \n> The writer has to poll the readers to identify the end of grace period.\n> Polling introduces memory accesses and wastes CPU cycles. The memory\n> is not available for reuse during grace period. Longer grace periods\n> exasperate these conditions.\n> \n> The length of the critical section and the number of reader threads\n> is proportional to the duration of the grace period. Keeping the critical\n> sections smaller will keep the grace period smaller. However, keeping the\n> critical sections smaller requires additional CPU cycles(due to additional\n> reporting) in the readers.\n> \n> Hence, we need the characteristics of small grace period and large critical\n> section. 
This library addresses this by allowing the writer to do\n> other work without having to block till the readers report their quiescent\n> state.\n> \n> For DPDK applications, the start and end of while(1) loop (where no\n> references to shared data structures are kept) act as perfect quiescent\n> states. This will combine all the shared data structure accesses into a\n> single, large critical section which helps keep the overhead on the\n> reader side to a minimum.\n> \n> DPDK supports pipeline model of packet processing and service cores.\n> In these use cases, a given data structure may not be used by all the\n> workers in the application. The writer does not have to wait for all\n> the workers to report their quiescent state. To provide the required\n> flexibility, this library has a concept of QS variable. The application\n> can create one QS variable per data structure to help it track the\n> end of grace period for each data structure. This helps keep the grace\n> period to a minimum.\n> \n> The application has to allocate memory and initialize a QS variable.\n> \n> Application can call rte_rcu_qsbr_get_memsize to calculate the size\n> of memory to allocate. This API takes maximum number of reader threads,\n> using this variable, as a parameter. Currently, a maximum of 1024 threads\n> are supported.\n> \n> Further, the application can initialize a QS variable using the API\n> rte_rcu_qsbr_init.\n> \n> Each reader thread is assumed to have a unique thread ID. Currently, the\n> management of the thread ID (for ex: allocation/free) is left to the\n> application. The thread ID should be in the range of 0 to\n> maximum number of threads provided while creating the QS variable.\n> The application could also use lcore_id as the thread ID where applicable.\n> \n> rte_rcu_qsbr_thread_register API will register a reader thread\n> to report its quiescent state. 
This can be called from a reader thread.\n> A control plane thread can also call this on behalf of a reader thread.\n> The reader thread must call rte_rcu_qsbr_thread_online API to start reporting\n> its quiescent state.\n> \n> Some of the use cases might require the reader threads to make\n> blocking API calls (for ex: while using eventdev APIs). The writer thread\n> should not wait for such reader threads to enter quiescent state.\n> The reader thread must call rte_rcu_qsbr_thread_offline API, before calling\n> blocking APIs. It can call rte_rcu_qsbr_thread_online API once the blocking\n> API call returns.\n> \n> The writer thread can trigger the reader threads to report their quiescent\n> state by calling the API rte_rcu_qsbr_start. It is possible for multiple\n> writer threads to query the quiescent state status simultaneously. Hence,\n> rte_rcu_qsbr_start returns a token to each caller.\n> \n> The writer thread has to call rte_rcu_qsbr_check API with the token to get the\n> current quiescent state status. Option to block till all the reader threads\n> enter the quiescent state is provided. If this API indicates that all the\n> reader threads have entered the quiescent state, the application can free the\n> deleted entry.\n> \n> The APIs rte_rcu_qsbr_start and rte_rcu_qsbr_check are lock free. Hence, they\n> can be called concurrently from multiple writers even while running\n> as worker threads.\n> \n> The separation of triggering the reporting from querying the status provides\n> the writer threads flexibility to do useful work instead of blocking for the\n> reader threads to enter the quiescent state or go offline. This reduces the\n> memory accesses due to continuous polling for the status.\n> \n> rte_rcu_qsbr_synchronize API combines the functionality of rte_rcu_qsbr_start\n> and blocking rte_rcu_qsbr_check into a single API. 
This API triggers the reader\n> threads to report their quiescent state and polls till all the readers enter\n> the quiescent state or go offline. This API does not allow the writer to\n> do useful work while waiting and also introduces additional memory accesses\n> due to continuous polling.\n> \n> The reader thread must call rte_rcu_qsbr_thread_offline and\n> rte_rcu_qsbr_thread_unregister APIs to remove itself from reporting its\n> quiescent state. The rte_rcu_qsbr_check API will not wait for this reader\n> thread to report the quiescent state status anymore.\n> \n> The reader threads should call rte_rcu_qsbr_update API to indicate that they\n> entered a quiescent state. This API checks if a writer has triggered a\n> quiescent state query and update the state accordingly.\n> \n> Patch v8:\n> 1) Library changes\n> a) Symbols prefixed with '__RTE' or 'rte_' as required (Thomas)\n> b) Used PRI?64 macros to support 32b compilation (Thomas)\n> c) Fixed shared library compilation (Thomas)\n> 2) Test cases\n> a) Fixed segmentation fault when more than 20 cores are used for testing (Jerin)\n> b) Used PRI?64 macros to support 32b compilation (Thomas)\n> c) Testing done on x86, ThunderX2, Octeon TX, BlueField for 32b(x86 only)/64b,\n> debug/non-debug, shared/static linking, meson/makefile with various\n> number of cores\n> \n> Patch v7:\n> 1) Library changes\n> a) Added macro RCU_IS_LOCK_CNT_ZERO\n> b) Added lock counter validation to rte_rcu_qsbr_thread_online/\n> rte_rcu_qsbr_thread_offline/rte_rcu_qsbr_thread_register/\n> rte_rcu_qsbr_thread_unregister APIs (Paul)\n> \n> Patch v6:\n> 1) Library changes\n> a) Fixed and tested meson build on Arm and x86 (Konstantin)\n> b) Moved rte_rcu_qsbr_synchronize API to rte_rcu_qsbr.c\n> \n> Patch v5:\n> 1) Library changes\n> a) Removed extra alignment in rte_rcu_qsbr_get_memsize API (Paul)\n> b) Added rte_rcu_qsbr_lock/rte_rcu_qsbr_unlock APIs (Paul)\n> c) Clarified the need for 64b counters (Paul)\n> 2) Test cases\n> a) Added 
additional performance test cases to benchmark\n> __rcu_qsbr_check_all\n> b) Added rte_rcu_qsbr_lock/rte_rcu_qsbr_unlock calls in various test cases\n> 3) Documentation\n> a) Added rte_rcu_qsbr_lock/rte_rcu_qsbr_unlock usage description\n> \n> Patch v4:\n> 1) Library changes\n> a) Fixed the compilation issue on x86 (Konstantin)\n> b) Rebased with latest master\n> \n> Patch v3:\n> 1) Library changes\n> a) Moved the registered thread ID array to the end of the\n> structure (Konstantin)\n> b) Removed the compile time constant RTE_RCU_MAX_THREADS\n> c) Added code to keep track of registered number of threads\n> \n> Patch v2:\n> 1) Library changes\n> a) Corrected the RTE_ASSERT checks (Konstantin)\n> b) Replaced RTE_ASSERT with 'if' checks for non-datapath APIs (Konstantin)\n> c) Made rte_rcu_qsbr_thread_register/unregister non-datapath critical APIs\n> d) Renamed rte_rcu_qsbr_update to rte_rcu_qsbr_quiescent (Ola)\n> e) Used rte_smp_mb() in rte_rcu_qsbr_thread_online API for x86 (Konstantin)\n> f) Removed the macro to access the thread QS counters (Konstantin)\n> 2) Test cases\n> a) Added additional test cases for removing RTE_ASSERT\n> 3) Documentation\n> a) Changed the figure to make it bigger (Marko)\n> b) Spelling and format corrections (Marko)\n> \n> Patch v1:\n> 1) Library changes\n> a) Changed the maximum number of reader threads to 1024\n> b) Renamed rte_rcu_qsbr_register/unregister_thread to\n> rte_rcu_qsbr_thread_register/unregister\n> c) Added rte_rcu_qsbr_thread_online/offline API. These are optimized\n> version of rte_rcu_qsbr_thread_register/unregister API. 
These\n> also provide the flexibility for performance when the requested\n> maximum number of threads is higher than the current number of\n> threads.\n> d) Corrected memory orderings in rte_rcu_qsbr_update\n> e) Changed the signature of rte_rcu_qsbr_start API to return the token\n> f) Changed the signature of rte_rcu_qsbr_start API to not take the\n> expected number of QS states to wait.\n> g) Added debug logs\n> h) Added API and programmer guide documentation.\n> \n> RFC v3:\n> 1) Library changes\n> a) Rebased to latest master\n> b) Added new API rte_rcu_qsbr_get_memsize\n> c) Add support for memory allocation for QSBR variable (Konstantin)\n> d) Fixed a bug in rte_rcu_qsbr_check (Konstantin)\n> 2) Testcase changes\n> a) Separated stress tests into a performance test case file\n> b) Added performance statistics\n> \n> RFC v2:\n> 1) Cover letter changes\n> a) Explian the parameters that affect the overhead of using RCU\n> and their effect\n> b) Explain how this library addresses these effects to keep\n> the overhead to minimum\n> 2) Library changes\n> a) Rename the library to avoid confusion (Matias, Bruce, Konstantin)\n> b) Simplify the code/remove APIs to keep this library inline with\n> other synchronisation mechanisms like locks (Konstantin)\n> c) Change the design to support more than 64 threads (Konstantin)\n> d) Fixed version map to remove static inline functions\n> 3) Testcase changes\n> a) Add boundary and additional functional test cases\n> b) Add stress test cases (Paul E. 
McKenney)\n> \n> Dharmik Thakkar (1):\n> test/rcu_qsbr: add API and functional tests\n> \n> Honnappa Nagarahalli (3):\n> rcu: add RCU library supporting QSBR mechanism\n> doc/rcu: add lib_rcu documentation\n> doc: added RCU to the release notes\n> \n> MAINTAINERS | 5 +\n> app/test/Makefile | 2 +\n> app/test/autotest_data.py | 12 +\n> app/test/meson.build | 7 +-\n> app/test/test_rcu_qsbr.c | 1014 +++++++++++++++++\n> app/test/test_rcu_qsbr_perf.c | 704 ++++++++++++\n> config/common_base | 6 +\n> doc/api/doxy-api-index.md | 3 +-\n> doc/api/doxy-api.conf.in | 1 +\n> .../prog_guide/img/rcu_general_info.svg | 509 +++++++++\n> doc/guides/prog_guide/index.rst | 1 +\n> doc/guides/prog_guide/rcu_lib.rst | 185 +++\n> doc/guides/rel_notes/release_19_05.rst | 8 +\n> lib/Makefile | 2 +\n> lib/librte_rcu/Makefile | 23 +\n> lib/librte_rcu/meson.build | 7 +\n> lib/librte_rcu/rte_rcu_qsbr.c | 277 +++++\n> lib/librte_rcu/rte_rcu_qsbr.h | 641 +++++++++++\n> lib/librte_rcu/rte_rcu_version.map | 13 +\n> lib/meson.build | 2 +-\n> mk/rte.app.mk | 1 +\n> 21 files changed, 3420 insertions(+), 3 deletions(-)\n> create mode 100644 app/test/test_rcu_qsbr.c\n> create mode 100644 app/test/test_rcu_qsbr_perf.c\n> create mode 100644 doc/guides/prog_guide/img/rcu_general_info.svg\n> create mode 100644 doc/guides/prog_guide/rcu_lib.rst\n> create mode 100644 lib/librte_rcu/Makefile\n> create mode 100644 lib/librte_rcu/meson.build\n> create mode 100644 lib/librte_rcu/rte_rcu_qsbr.c\n> create mode 100644 lib/librte_rcu/rte_rcu_qsbr.h\n> create mode 100644 lib/librte_rcu/rte_rcu_version.map\n> \n> --\n\nRun UT on my box (SKX) for both x86_64 and i686 over 96 cores.\nAll passed.\nTested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>\n\n> 2.17.1", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [127.0.0.1])\n\tby dpdk.org (Postfix) with ESMTP id C48C81B122;\n\tFri, 26 
Apr 2019 14:05:05 +0200 (CEST)", "from mga05.intel.com (mga05.intel.com [192.55.52.43])\n\tby dpdk.org (Postfix) with ESMTP id 239D569D4\n\tfor <dev@dpdk.org>; Fri, 26 Apr 2019 14:05:02 +0200 (CEST)", "from orsmga003.jf.intel.com ([10.7.209.27])\n\tby fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t26 Apr 2019 05:05:02 -0700", "from irsmsx154.ger.corp.intel.com ([163.33.192.96])\n\tby orsmga003.jf.intel.com with ESMTP; 26 Apr 2019 05:04:59 -0700", "from irsmsx105.ger.corp.intel.com ([169.254.7.31]) by\n\tIRSMSX154.ger.corp.intel.com ([169.254.12.101]) with mapi id\n\t14.03.0415.000; Fri, 26 Apr 2019 13:04:58 +0100" ], "X-Amp-Result": "SKIPPED(no attachment in message)", "X-Amp-File-Uploaded": "False", "X-ExtLoop1": "1", "X-IronPort-AV": "E=Sophos;i=\"5.60,397,1549958400\"; d=\"scan'208\";a=\"145942326\"", "From": "\"Ananyev, Konstantin\" <konstantin.ananyev@intel.com>", "To": "Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>,\n\t\"stephen@networkplumber.org\" <stephen@networkplumber.org>,\n\t\"paulmck@linux.ibm.com\" <paulmck@linux.ibm.com>, \"Kovacevic, Marko\"\n\t<marko.kovacevic@intel.com>, \"dev@dpdk.org\" <dev@dpdk.org>", "CC": "\"gavin.hu@arm.com\" <gavin.hu@arm.com>, \"dharmik.thakkar@arm.com\"\n\t<dharmik.thakkar@arm.com>,\n\t\"malvika.gupta@arm.com\" <malvika.gupta@arm.com>", "Thread-Topic": "[PATCH v8 0/4] lib/rcu: add RCU library supporting QSBR\n\tmechanism", "Thread-Index": "AQHU++otMwUDBj7tgk6l9944m0u286ZOV9+w", "Date": "Fri, 26 Apr 2019 12:04:58 +0000", "Message-ID": "<2601191342CEEE43887BDE71AB9772580148A9C434@irsmsx105.ger.corp.intel.com>", "References": "<20181122033055.3431-1-honnappa.nagarahalli@arm.com>\n\t<20190426044000.32670-1-honnappa.nagarahalli@arm.com>", "In-Reply-To": "<20190426044000.32670-1-honnappa.nagarahalli@arm.com>", "Accept-Language": "en-IE, en-US", "Content-Language": "en-US", "X-MS-Has-Attach": "", "X-MS-TNEF-Correlator": "", "x-titus-metadata-40": 
"eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiNGUxZGEwZjgtNDNjNS00MzIwLWFmZDAtOWFiM2I5ZWYwYzY0IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiSElaK0pldmFubjYxY3VsU1ZxcGRZOFwvQlluRXhGRmZrSzl5dHhWRDRDWGJvTUw5NlEwRG95eVUrb2NWS1dSQmcifQ==", "x-ctpclassification": "CTP_NT", "dlp-product": "dlpe-windows", "dlp-version": "11.0.600.7", "dlp-reaction": "no-action", "x-originating-ip": "[163.33.239.181]", "Content-Type": "text/plain; charset=\"us-ascii\"", "Content-Transfer-Encoding": "quoted-printable", "MIME-Version": "1.0", "Subject": "Re: [dpdk-dev] [PATCH v8 0/4] lib/rcu: add RCU library supporting\n\tQSBR mechanism", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "DPDK patches and discussions <dev.dpdk.org>", "List-Unsubscribe": "<https://mails.dpdk.org/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://mails.dpdk.org/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<https://mails.dpdk.org/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null } ]