get:
Show a patch.

patch:
Update a patch (partial update).

put:
Update a patch (full update).
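The example below shows the JSON body returned by a GET on this endpoint. As an illustration only, here is a minimal Python sketch for fetching the same resource; it assumes the API also honours the standard Django REST Framework "format=json" query parameter, and the field names it reads are taken from the response shown below.

    import json
    from urllib.request import urlopen

    # Retrieve the patch detail resource shown below. GET is read-only;
    # the PUT/PATCH methods listed in the Allow header require authentication.
    url = "http://patches.dpdk.org/api/patches/9289/?format=json"
    with urlopen(url) as resp:
        patch = json.load(resp)

    # A few of the fields present in the example response.
    print(patch["name"])    # patch subject
    print(patch["state"])   # e.g. "superseded"
    print(patch["mbox"])    # mbox URL, suitable for applying with "git am"
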

GET /api/patches/9289/?format=api
HTTP 200 OK
Allow: GET, PUT, PATCH, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept

{
    "id": 9289,
    "url": "http://patches.dpdk.org/api/patches/9289/?format=api",
    "web_url": "http://patches.dpdk.org/project/dpdk/patch/1449134905-28261-2-git-send-email-ian.betts@intel.com/",
    "project": {
        "id": 1,
        "url": "http://patches.dpdk.org/api/projects/1/?format=api",
        "name": "DPDK",
        "link_name": "dpdk",
        "list_id": "dev.dpdk.org",
        "list_email": "dev@dpdk.org",
        "web_url": "http://core.dpdk.org",
        "scm_url": "git://dpdk.org/dpdk",
        "webscm_url": "http://git.dpdk.org/dpdk",
        "list_archive_url": "https://inbox.dpdk.org/dev",
        "list_archive_url_format": "https://inbox.dpdk.org/dev/{}",
        "commit_url_format": ""
    },
    "msgid": "<1449134905-28261-2-git-send-email-ian.betts@intel.com>",
    "list_archive_url": "https://inbox.dpdk.org/dev/1449134905-28261-2-git-send-email-ian.betts@intel.com",
    "date": "2015-12-03T09:28:22",
    "name": "[dpdk-dev,v5,1/4] doc: add sample application guide for performance-thread",
    "commit_ref": null,
    "pull_url": null,
    "state": "superseded",
    "archived": true,
    "hash": "17c20201a7d9fd23db4359357dbbe6bb7696d45d",
    "submitter": {
        "id": 340,
        "url": "http://patches.dpdk.org/api/people/340/?format=api",
        "name": "ibetts",
        "email": "ian.betts@intel.com"
    },
    "delegate": null,
    "mbox": "http://patches.dpdk.org/project/dpdk/patch/1449134905-28261-2-git-send-email-ian.betts@intel.com/mbox/",
    "series": [],
    "comments": "http://patches.dpdk.org/api/patches/9289/comments/",
    "check": "pending",
    "checks": "http://patches.dpdk.org/api/patches/9289/checks/",
    "tags": {},
    "related": [],
    "headers": {
        "Return-Path": "<dev-bounces@dpdk.org>",
        "X-Original-To": "patchwork@dpdk.org",
        "Delivered-To": "patchwork@dpdk.org",
        "Received": [
            "from [92.243.14.124] (localhost [IPv6:::1])\n\tby dpdk.org (Postfix) with ESMTP id 45EDE5913;\n\tThu,  3 Dec 2015 10:31:05 +0100 (CET)",
            "from mga14.intel.com (mga14.intel.com [192.55.52.115])\n\tby dpdk.org (Postfix) with ESMTP id C85AC3787\n\tfor <dev@dpdk.org>; Thu,  3 Dec 2015 10:31:03 +0100 (CET)",
            "from fmsmga002.fm.intel.com ([10.253.24.26])\n\tby fmsmga103.fm.intel.com with ESMTP; 03 Dec 2015 01:31:02 -0800",
            "from irvmail001.ir.intel.com ([163.33.26.43])\n\tby fmsmga002.fm.intel.com with ESMTP; 03 Dec 2015 01:28:29 -0800",
            "from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com\n\t[10.237.217.45])\n\tby irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id\n\ttB39SS1M029043; Thu, 3 Dec 2015 09:28:28 GMT",
            "from sivswdev01.ir.intel.com (localhost [127.0.0.1])\n\tby sivswdev01.ir.intel.com with ESMTP id tB39SSJL028401;\n\tThu, 3 Dec 2015 09:28:28 GMT",
            "(from ibetts@localhost)\n\tby sivswdev01.ir.intel.com with  id tB39SSkP028397;\n\tThu, 3 Dec 2015 09:28:28 GMT"
        ],
        "X-ExtLoop1": "1",
        "X-IronPort-AV": "E=Sophos;i=\"5.20,377,1444719600\"; d=\"scan'208\";a=\"865693746\"",
        "From": "ibetts <ian.betts@intel.com>",
        "To": "dev@dpdk.org",
        "Date": "Thu,  3 Dec 2015 09:28:22 +0000",
        "Message-Id": "<1449134905-28261-2-git-send-email-ian.betts@intel.com>",
        "X-Mailer": "git-send-email 1.7.4.1",
        "In-Reply-To": "<1449134905-28261-1-git-send-email-ian.betts@intel.com>",
        "References": "<1449134905-28261-1-git-send-email-ian.betts@intel.com>",
        "Cc": "Ian Betts <ian.betts@intel.com>",
        "Subject": "[dpdk-dev] [PATCH v5 1/4] doc: add sample application guide for\n\tperformance-thread",
        "X-BeenThere": "dev@dpdk.org",
        "X-Mailman-Version": "2.1.15",
        "Precedence": "list",
        "List-Id": "patches and discussions about DPDK <dev.dpdk.org>",
        "List-Unsubscribe": "<http://dpdk.org/ml/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>",
        "List-Archive": "<http://dpdk.org/ml/archives/dev/>",
        "List-Post": "<mailto:dev@dpdk.org>",
        "List-Help": "<mailto:dev-request@dpdk.org?subject=help>",
        "List-Subscribe": "<http://dpdk.org/ml/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>",
        "Errors-To": "dev-bounces@dpdk.org",
        "Sender": "\"dev\" <dev-bounces@dpdk.org>"
    },
    "content": "From: Ian Betts <ian.betts@intel.com>\n\nThis commit adds the sample application user guide for the\nperformance thread sample application.\n\nSigned-off-by: Ian Betts <ian.betts@intel.com>\n---\n doc/guides/sample_app_ug/performance_thread.rst | 1263 +++++++++++++++++++++++\n 1 file changed, 1263 insertions(+)\n create mode 100644 doc/guides/sample_app_ug/performance_thread.rst",
    "diff": "diff --git a/doc/guides/sample_app_ug/performance_thread.rst b/doc/guides/sample_app_ug/performance_thread.rst\nnew file mode 100644\nindex 0000000..d71bb84\n--- /dev/null\n+++ b/doc/guides/sample_app_ug/performance_thread.rst\n@@ -0,0 +1,1263 @@\n+..  BSD LICENSE\n+    Copyright(c) 2015 Intel Corporation. All rights reserved.\n+    All rights reserved.\n+\n+    Redistribution and use in source and binary forms, with or without\n+    modification, are permitted provided that the following conditions\n+    are met:\n+\n+    * Re-distributions of source code must retain the above copyright\n+    notice, this list of conditions and the following disclaimer.\n+    * Redistributions in binary form must reproduce the above copyright\n+    notice, this list of conditions and the following disclaimer in\n+    the documentation and/or other materials provided with the\n+    distribution.\n+    * Neither the name of Intel Corporation nor the names of its\n+    contributors may be used to endorse or promote products derived\n+    from this software without specific prior written permission.\n+\n+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS\n+    \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT\n+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR\n+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT\n+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,\n+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT\n+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,\n+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY\n+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\n+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n+\n+\n+Performance Thread Sample Application\n+=====================================\n+\n+The performance thread sample application is a derivative of the standard L3\n+forwarding application that demonstrates different threading models.\n+\n+Overview\n+--------\n+For a general description of the L3 forwarding applications capabilities\n+please refer to the documentation of the standard application in\n+:doc:`l3_forward`.\n+\n+The performance thread sample application differs from the standard L3\n+forwarding example in that it divides the TX and RX processing between\n+different threads, and makes it possible to assign individual threads to\n+different cores.\n+\n+Three threading models are considered:\n+\n+#. When there is one EAL thread per physical core.\n+#. When there are multiple EAL threads per physical core.\n+#. When there are multiple lightweight threads per EAL thread.\n+\n+Since DPDK release 2.0 it is possible to launch applications using the\n+``--lcores`` EAL parameter, specifying cpu-sets for a physical core. With the\n+performance thread sample application its is now also possible to assign\n+individual RX and TX functions to different cores.\n+\n+As an alternative to dividing the L3 forwarding work between different EAL\n+threads the performance thread sample introduces the possibility to run the\n+application threads as lightweight threads (L-threads) within one or\n+more EAL threads.\n+\n+In order to facilitate this threading model the example includes a primitive\n+cooperative scheduler (L-thread) subsystem. 
More details of the L-thread\n+subsystem can be found in :ref:`lthread_subsystem`.\n+\n+**Note:** Whilst theoretically possible it is not anticipated that multiple\n+L-thread schedulers would be run on the same physical core, this mode of\n+operation should not be expected to yield useful performance and is considered\n+invalid.\n+\n+Compiling the Application\n+-------------------------\n+The application is located in the sample application folder in the\n+``performance-thread`` folder.\n+\n+#.  Go to the example applications folder\n+\n+    .. code-block:: console\n+\n+       export RTE_SDK=/path/to/rte_sdk\n+       cd ${RTE_SDK}/examples/performance-thread/l3fwd-thread\n+\n+#.  Set the target (a default target is used if not specified). For example:\n+\n+    .. code-block:: console\n+\n+       export RTE_TARGET=x86_64-native-linuxapp-gcc\n+\n+    See the *DPDK Linux Getting Started Guide* for possible RTE_TARGET values.\n+\n+#.  Build the application:\n+\n+        make\n+\n+\n+Running the Application\n+-----------------------\n+\n+The application has a number of command line options::\n+\n+    ./build/l3fwd-thread [EAL options] --\n+        -p PORTMASK [-P]\n+        --rx(port,queue,lcore,thread)[,(port,queue,lcore,thread)]\n+        --tx(lcore,thread)[,(lcore,thread)]\n+        [--enable-jumbo] [--max-pkt-len PKTLEN]]  [--no-numa]\n+        [--hash-entry-num] [--ipv6] [--no-lthreads] [--stat-lcore lcore]\n+\n+Where:\n+\n+* ``-p PORTMASK``: Hexadecimal bitmask of ports to configure.\n+\n+* ``-P``: optional, sets all ports to promiscuous mode so that packets are\n+  accepted regardless of the packet's Ethernet MAC destination address.\n+  Without this option, only packets with the Ethernet MAC destination address\n+  set to the Ethernet address of the port are accepted.\n+\n+* ``--rx (port,queue,lcore,thread)[,(port,queue,lcore,thread)]``: the list of\n+  NIC RX ports and queues handled by the RX lcores and threads. The parameters\n+  are explained below.\n+\n+* ``--tx (lcore,thread)[,(lcore,thread)]``: the list of TX threads identifying\n+  the lcore the thread runs on, and the id of RX thread with which it is\n+  associated. The parameters are explained below.\n+\n+* ``--enable-jumbo``: optional, enables jumbo frames.\n+\n+* ``--max-pkt-len``: optional, maximum packet length in decimal (64-9600).\n+\n+* ``--no-numa``: optional, disables numa awareness.\n+\n+* ``--hash-entry-num``: optional, specifies the hash entry number in hex to be\n+  setup.\n+\n+* ``--ipv6``: optional, set it if running ipv6 packets.\n+\n+* ``--no-lthreads``: optional, disables l-thread model and uses EAL threading\n+  model. See below.\n+\n+* ``--stat-lcore``: optional, run CPU load stats collector on the specified\n+  lcore.\n+\n+The parameters of the ``--rx`` and ``--tx`` options are:\n+\n+* ``--rx`` parameters\n+\n+   .. 
_table_l3fwd_rx_parameters:\n+\n+   +--------+------------------------------------------------------+\n+   | port   | RX port                                              |\n+   +--------+------------------------------------------------------+\n+   | queue  | RX queue that will be read on the specified RX port  |\n+   +--------+------------------------------------------------------+\n+   | lcore  | Core to use for the thread                           |\n+   +--------+------------------------------------------------------+\n+   | thread | Thread id (continuously from 0 to N)                 |\n+   +--------+------------------------------------------------------+\n+\n+\n+* ``--tx`` parameters\n+\n+   .. _table_l3fwd_tx_parameters:\n+\n+   +--------+------------------------------------------------------+\n+   | lcore  | Core to use for L3 route match and transmit          |\n+   +--------+------------------------------------------------------+\n+   | thread | Id of RX thread to be associated with this TX thread |\n+   +--------+------------------------------------------------------+\n+\n+The ``l3fwd-thread`` application allows you to start packet processing in two\n+threading models: L-Threads (default) and EAL Threads (when the\n+``--no-lthreads`` parameter is used). For consistency all parameters are used\n+in the same way for both models.\n+\n+\n+Running with L-threads\n+~~~~~~~~~~~~~~~~~~~~~~\n+\n+When the L-thread model is used (default option), lcore and thread parameters\n+in ``--rx/--tx`` are used to affinitize threads to the selected scheduler.\n+\n+For example, the following places every l-thread on different lcores::\n+\n+   l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+                --rx=\"(0,0,0,0)(1,0,1,1)\" \\\n+                --tx=\"(2,0)(3,1)\"\n+\n+The following places RX l-threads on lcore 0 and TX l-threads on lcore 1 and 2\n+and so on::\n+\n+   l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+                --rx=\"(0,0,0,0)(1,0,0,1)\" \\\n+                --tx=\"(1,0)(2,1)\"\n+\n+\n+Running with EAL threads\n+~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+When the ``--no-lthreads`` parameter is used, the L-threading model is turned\n+off and EAL threads are used for all processing. EAL threads are enumerated in\n+the same way as L-threads, but the ``--lcores`` EAL parameter is used to\n+affinitize threads to the selected cpu-set (scheduler). 
Thus it is possible to\n+place every RX and TX thread on different lcores.\n+\n+For example, the following places every EAL thread on different lcores::\n+\n+   l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+                --rx=\"(0,0,0,0)(1,0,1,1)\" \\\n+                --tx=\"(2,0)(3,1)\" \\\n+                --no-lthreads\n+\n+\n+To affinitize two or more EAL threads to one cpu-set, the EAL ``--lcores``\n+parameter is used.\n+\n+The following places RX EAL threads on lcore 0 and TX EAL threads on lcore 1\n+and 2 and so on::\n+\n+   l3fwd-thread -c ff -n 2 --lcores=\"(0,1)@0,(2,3)@1\" -- -P -p 3 \\\n+                --rx=\"(0,0,0,0)(1,0,1,1)\" \\\n+                --tx=\"(2,0)(3,1)\" \\\n+                --no-lthreads\n+\n+\n+Examples\n+~~~~~~~~\n+\n+For selected scenarios the command line configuration of the application for L-threads\n+and its corresponding EAL threads command line can be realized as follows:\n+\n+a) Start every thread on different scheduler (1:1)::\n+\n+      l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+                   --rx=\"(0,0,0,0)(1,0,1,1)\" \\\n+                   --tx=\"(2,0)(3,1)\"\n+\n+   EAL thread equivalent::\n+\n+      l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+                   --rx=\"(0,0,0,0)(1,0,1,1)\" \\\n+                   --tx=\"(2,0)(3,1)\" \\\n+                   --no-lthreads\n+\n+b) Start all threads on one core (N:1).\n+\n+   Start 4 L-threads on lcore 0::\n+\n+      l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+                   --rx=\"(0,0,0,0)(1,0,0,1)\" \\\n+                   --tx=\"(0,0)(0,1)\"\n+\n+   Start 4 EAL threads on cpu-set 0::\n+\n+      l3fwd-thread -c ff -n 2 --lcores=\"(0-3)@0\" -- -P -p 3 \\\n+                   --rx=\"(0,0,0,0)(1,0,0,1)\" \\\n+                   --tx=\"(2,0)(3,1)\" \\\n+                   --no-lthreads\n+\n+c) Start threads on different cores (N:M).\n+\n+   Start 2 L-threads for RX on lcore 0, and 2 L-threads for TX on lcore 1::\n+\n+      l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+                   --rx=\"(0,0,0,0)(1,0,0,1)\" \\\n+                   --tx=\"(1,0)(1,1)\"\n+\n+   Start 2 EAL threads for RX on cpu-set 0, and 2 EAL threads for TX on\n+   cpu-set 1::\n+\n+      l3fwd-thread -c ff -n 2 --lcores=\"(0-1)@0,(2-3)@1\" -- -P -p 3 \\\n+                   --rx=\"(0,0,0,0)(1,0,1,1)\" \\\n+                   --tx=\"(2,0)(3,1)\" \\\n+                   --no-lthreads\n+\n+Explanation\n+-----------\n+\n+To a great extent the sample application differs little from the standard L3\n+forwarding application, and readers are advised to familiarize themselves with\n+the material covered in the :doc:`l3_forward` documentation before proceeding.\n+\n+The following explanation is focused on the way threading is handled in the\n+performance thread example.\n+\n+\n+Mode of operation with EAL threads\n+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+The performance thread sample application has split the RX and TX functionality\n+into two different threads, and the RX and TX threads are\n+interconnected via software rings. 
With respect to these rings the RX threads\n+are producers and the TX threads are consumers.\n+\n+On initialization the TX and RX threads are started according to the command\n+line parameters.\n+\n+The RX threads poll the network interface queues and post received packets to a\n+TX thread via a corresponding software ring.\n+\n+The TX threads poll software rings, perform the L3 forwarding hash/LPM match,\n+and assemble packet bursts before performing burst transmit on the network\n+interface.\n+\n+As with the standard L3 forward application, burst draining of residual packets\n+is performed periodically with the period calculated from elapsed time using\n+the timestamps counter.\n+\n+The diagram below illustrates a case with two RX threads and three TX threads.\n+\n+.. _figure_performance_thread_1:\n+\n+.. figure:: img/performance_thread_1.*\n+\n+\n+Mode of operation with L-threads\n+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+Like the EAL thread configuration the application has split the RX and TX\n+functionality into different threads, and the pairs of RX and TX threads are\n+interconnected via software rings.\n+\n+On initialization an L-thread scheduler is started on every EAL thread. On all\n+but the master EAL thread only a a dummy L-thread is initially started.\n+The L-thread started on the master EAL thread then spawns other L-threads on\n+different L-thread schedulers according the the command line parameters.\n+\n+The RX threads poll the network interface queues and post received packets\n+to a TX thread via the corresponding software ring.\n+\n+The ring interface is augmented by means of an L-thread condition variable that\n+enables the TX thread to be suspended when the TX ring is empty. The RX thread\n+signals the condition whenever it posts to the TX ring, causing the TX thread\n+to be resumed.\n+\n+Additionally the TX L-thread spawns a worker L-thread to take care of\n+polling the software rings, whilst it handles burst draining of the transmit\n+buffer.\n+\n+The worker threads poll the software rings, perform L3 route lookup and\n+assemble packet bursts. If the TX ring is empty the worker thread suspends\n+itself by waiting on the condition variable associated with the ring.\n+\n+Burst draining of residual packets, less than the burst size, is performed by\n+the TX thread which sleeps (using an L-thread sleep function) and resumes\n+periodically to flush the TX buffer.\n+\n+This design means that L-threads that have no work, can yield the CPU to other\n+L-threads and avoid having to constantly poll the software rings.\n+\n+The diagram below illustrates a case with two RX threads and three TX functions\n+(each comprising a thread that processes forwarding and a thread that\n+periodically drains the output buffer of residual packets).\n+\n+.. _figure_performance_thread_2:\n+\n+.. figure:: img/performance_thread_2.*\n+\n+\n+CPU load statistics\n+~~~~~~~~~~~~~~~~~~~\n+\n+It is possible to display statistics showing estimated CPU load on each core.\n+The statistics indicate the percentage of CPU time spent: processing\n+received packets (forwarding), polling queues/rings (waiting for work),\n+and doing any other processing (context switch and other overhead).\n+\n+When enabled statistics are gathered by having the application threads set and\n+clear flags when they enter and exit pertinent code sections. The flags are\n+then sampled in real time by a statistics collector thread running on another\n+core. 
This thread displays the data in real time on the console.\n+\n+This feature is enabled by designating a statistics collector core, using the\n+``--stat-lcore`` parameter.\n+\n+\n+.. _lthread_subsystem:\n+\n+The L-thread subsystem\n+----------------------\n+\n+The L-thread subsystem resides in the examples/performance-thread/common\n+directory and is built and linked automatically when building the\n+``l3fwd-thread`` example.\n+\n+The subsystem provides a simple cooperative scheduler to enable arbitrary\n+functions to run as cooperative threads within a single EAL thread.\n+The subsystem provides a pthread like API that is intended to assist in\n+reuse of legacy code written for POSIX pthreads.\n+\n+The following sections provide some detail on the features, constraints,\n+performance and porting considerations when using L-threads.\n+\n+\n+.. _comparison_between_lthreads_and_pthreads:\n+\n+Comparison between L-threads and POSIX pthreads\n+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+The fundamental difference between the L-thread and pthread models is the\n+way in which threads are scheduled. The simplest way to think about this is to\n+consider the case of a processor with a single CPU. To run multiple threads\n+on a single CPU, the scheduler must frequently switch between the threads,\n+in order that each thread is able to make timely progress.\n+This is the basis of any multitasking operating system.\n+\n+This section explores the differences between the pthread model and the\n+L-thread model as implemented in the provided L-thread subsystem. If needed a\n+theoretical discussion of preemptive vs cooperative multi-threading can be\n+found in any good text on operating system design.\n+\n+\n+Scheduling and context switching\n+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+\n+The POSIX pthread library provides an application programming interface to\n+create and synchronize threads. Scheduling policy is determined by the host OS,\n+and may be configurable. The OS may use sophisticated rules to determine which\n+thread should be run next, threads may suspend themselves or make other threads\n+ready, and the scheduler may employ a time slice giving each thread a maximum\n+time quantum after which it will be preempted in favor of another thread that\n+is ready to run. To complicate matters further threads may be assigned\n+different scheduling priorities.\n+\n+By contrast the L-thread subsystem is considerably simpler. Logically the\n+L-thread scheduler performs the same multiplexing function for L-threads\n+within a single pthread as the OS scheduler does for pthreads within an\n+application process. The L-thread scheduler is simply the main loop of a\n+pthread, and in so far as the host OS is concerned it is a regular pthread\n+just like any other. The host OS is oblivious about the existence of and\n+not at all involved in the scheduling of L-threads.\n+\n+The other and most significant difference between the two models is that\n+L-threads are scheduled cooperatively. L-threads cannot not preempt each\n+other, nor can the L-thread scheduler preempt a running L-thread (i.e.\n+there is no time slicing). 
The consequence is that programs implemented with\n+L-threads must possess frequent rescheduling points, meaning that they must\n+explicitly and of their own volition return to the scheduler at frequent\n+intervals, in order to allow other L-threads an opportunity to proceed.\n+\n+In both models switching between threads requires that the current CPU\n+context is saved and a new context (belonging to the next thread ready to run)\n+is restored. With pthreads this context switching is handled transparently\n+and the set of CPU registers that must be preserved between context switches\n+is as per an interrupt handler.\n+\n+An L-thread context switch is achieved by the thread itself making a function\n+call to the L-thread scheduler. Thus it is only necessary to preserve the\n+callee registers. The caller is responsible to save and restore any other\n+registers it is using before a function call, and restore them on return,\n+and this is handled by the compiler. For ``X86_64`` on both Linux and BSD the\n+System V calling convention is used, this defines registers RSP, RBP, and\n+R12-R15 as callee-save registers (for more detailed discussion a good reference\n+is `X86 Calling Conventions <https://en.wikipedia.org/wiki/X86_calling_conventions>`_).\n+\n+Taking advantage of this, and due to the absence of preemption, an L-thread\n+context switch is achieved with less than 20 load/store instructions.\n+\n+The scheduling policy for L-threads is fixed, there is no prioritization of\n+L-threads, all L-threads are equal and scheduling is based on a FIFO\n+ready queue.\n+\n+An L-thread is a struct containing the CPU context of the thread\n+(saved on context switch) and other useful items. The ready queue contains\n+pointers to threads that are ready to run. The L-thread scheduler is a simple\n+loop that polls the ready queue, reads from it the next thread ready to run,\n+which it resumes by saving the current context (the current position in the\n+scheduler loop) and restoring the context of the next thread from its thread\n+struct. Thus an L-thread is always resumed at the last place it yielded.\n+\n+A well behaved L-thread will call the context switch regularly (at least once\n+in its main loop) thus returning to the scheduler's own main loop. Yielding\n+inserts the current thread at the back of the ready queue, and the process of\n+servicing the ready queue is repeated, thus the system runs by flipping back\n+and forth the between L-threads and scheduler loop.\n+\n+In the case of pthreads, the preemptive scheduling, time slicing, and support\n+for thread prioritization means that progress is normally possible for any\n+thread that is ready to run. This comes at the price of a relatively heavier\n+context switch and scheduling overhead.\n+\n+With L-threads the progress of any particular thread is determined by the\n+frequency of rescheduling opportunities in the other L-threads. This means that\n+an errant L-thread monopolizing the CPU might cause scheduling of other threads\n+to be stalled. 
Due to the lower cost of context switching, however, voluntary\n+rescheduling to ensure progress of other threads, if managed sensibly, is not\n+a prohibitive overhead, and overall performance can exceed that of an\n+application using pthreads.\n+\n+\n+Mutual exclusion\n+^^^^^^^^^^^^^^^^\n+\n+With pthreads preemption means that threads that share data must observe\n+some form of mutual exclusion protocol.\n+\n+The fact that L-threads cannot preempt each other means that in many cases\n+mutual exclusion devices can be completely avoided.\n+\n+Locking to protect shared data can be a significant bottleneck in\n+multi-threaded applications so a carefully designed cooperatively scheduled\n+program can enjoy significant performance advantages.\n+\n+So far we have considered only the simplistic case of a single core CPU,\n+when multiple CPUs are considered things are somewhat more complex.\n+\n+First of all it is inevitable that there must be multiple L-thread schedulers,\n+one running on each EAL thread. So long as these schedulers remain isolated\n+from each other the above assertions about the potential advantages of\n+cooperative scheduling hold true.\n+\n+A configuration with isolated cooperative schedulers is less flexible than the\n+pthread model where threads can be affinitized to run on any CPU. With isolated\n+schedulers scaling of applications to utilize fewer or more CPUs according to\n+system demand is very difficult to achieve.\n+\n+The L-thread subsystem makes it possible for L-threads to migrate between\n+schedulers running on different CPUs. Needless to say if the migration means\n+that threads that share data end up running on different CPUs then this will\n+introduce the need for some kind of mutual exclusion system.\n+\n+Of course ``rte_ring`` software rings can always be used to interconnect\n+threads running on different cores, however to protect other kinds of shared\n+data structures, lock free constructs or else explicit locking will be\n+required. This is a consideration for the application design.\n+\n+In support of this extended functionality, the L-thread subsystem implements\n+thread safe mutexes and condition variables.\n+\n+The cost of affinitizing and of condition variable signaling is significantly\n+lower than the equivalent pthread operations, and so applications using these\n+features will see a performance benefit.\n+\n+\n+Thread local storage\n+^^^^^^^^^^^^^^^^^^^^\n+\n+As with applications written for pthreads an application written for L-threads\n+can take advantage of thread local storage, in this case local to an L-thread.\n+An application may save and retrieve a single pointer to application data in\n+the L-thread struct.\n+\n+For legacy and backward compatibility reasons two alternative methods are also\n+offered, the first is modelled directly on the pthread get/set specific APIs,\n+the second approach is modelled on the ``RTE_PER_LCORE`` macros, whereby\n+``PER_LTHREAD`` macros are introduced, in both cases the storage is local to\n+the L-thread.\n+\n+\n+.. _constraints_and_performance_implications:\n+\n+Constraints and performance implications when using L-threads\n+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+\n+.. 
_API_compatibility:\n+\n+API compatibility\n+^^^^^^^^^^^^^^^^^\n+\n+The L-thread subsystem provides a set of functions that are logically equivalent\n+to the corresponding functions offered by the POSIX pthread library, however not\n+all pthread functions have a corresponding L-thread equivalent, and not all\n+features available to pthreads are implemented for L-threads.\n+\n+The pthread library offers considerable flexibility via programmable attributes\n+that can be associated with threads, mutexes, and condition variables.\n+\n+By contrast the L-thread subsystem has fixed functionality, the scheduler policy\n+cannot be varied, and L-threads cannot be prioritized. There are no variable\n+attributes associated with any L-thread objects. L-threads, mutexes and\n+conditional variables, all have fixed functionality. (Note: reserved parameters\n+are included in the APIs to facilitate possible future support for attributes).\n+\n+The table below lists the pthread and equivalent L-thread APIs with notes on\n+differences and/or constraints. Where there is no L-thread entry in the table,\n+then the L-thread subsystem provides no equivalent function.\n+\n+.. _table_lthread_pthread:\n+\n+.. table:: Pthread and equivalent L-thread APIs.\n+\n+   +----------------------------+------------------------+-------------------+\n+   | **Pthread function**       | **L-thread function**  | **Notes**         |\n+   +============================+========================+===================+\n+   | pthread_barrier_destroy    |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_barrier_init       |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_barrier_wait       |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_cond_broadcast     | lthread_cond_broadcast | See note 1        |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_cond_destroy       | lthread_cond_destroy   |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_cond_init          | lthread_cond_init      |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_cond_signal        | lthread_cond_signal    | See note 1        |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_cond_timedwait     |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_cond_wait          | lthread_cond_wait      | See note 5        |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_create             | lthread_create         | See notes 2, 3    |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_detach             | lthread_detach         | See note 4        |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_equal              |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_exit               | lthread_exit           |                   |\n+   
+----------------------------+------------------------+-------------------+\n+   | pthread_getspecific        | lthread_getspecific    |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_getcpuclockid      |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_join               | lthread_join           |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_key_create         | lthread_key_create     |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_key_delete         | lthread_key_delete     |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_mutex_destroy      | lthread_mutex_destroy  |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_mutex_init         | lthread_mutex_init     |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_mutex_lock         | lthread_mutex_lock     | See note 6        |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_mutex_trylock      | lthread_mutex_trylock  | See note 6        |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_mutex_timedlock    |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_mutex_unlock       | lthread_mutex_unlock   |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_once               |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_rwlock_destroy     |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_rwlock_init        |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_rwlock_rdlock      |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_rwlock_timedrdlock |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_rwlock_timedwrlock |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_rwlock_tryrdlock   |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_rwlock_trywrlock   |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_rwlock_unlock      |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_rwlock_wrlock      |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_self               | lthread_current        |                 
  |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_setspecific        | lthread_setspecific    |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_spin_init          |                        | See note 10       |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_spin_destroy       |                        | See note 10       |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_spin_lock          |                        | See note 10       |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_spin_trylock       |                        | See note 10       |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_spin_unlock        |                        | See note 10       |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_cancel             | lthread_cancel         |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_setcancelstate     |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_setcanceltype      |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_testcancel         |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_getschedparam      |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_setschedparam      |                        |                   |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_yield              | lthread_yield          | See note 7        |\n+   +----------------------------+------------------------+-------------------+\n+   | pthread_setaffinity_np     | lthread_set_affinity   | See notes 2, 3, 8 |\n+   +----------------------------+------------------------+-------------------+\n+   |                            | lthread_sleep          | See note 9        |\n+   +----------------------------+------------------------+-------------------+\n+   |                            | lthread_sleep_clks     | See note 9        |\n+   +----------------------------+------------------------+-------------------+\n+\n+\n+**Note 1**:\n+\n+Neither lthread signal nor broadcast may be called concurrently by L-threads\n+running on different schedulers, although multiple L-threads running in the\n+same scheduler may freely perform signal or broadcast operations. L-threads\n+running on the same or different schedulers may always safely wait on a\n+condition variable.\n+\n+\n+**Note 2**:\n+\n+Pthread attributes may be used to affinitize a pthread with a cpu-set. The\n+L-thread subsystem does not support a cpu-set. An L-thread may be affinitized\n+only with a single CPU at any time.\n+\n+\n+**Note 3**:\n+\n+If an L-thread is intended to run on a different NUMA node than the node that\n+creates the thread then, when calling ``lthread_create()`` it is advantageous\n+to specify the destination core as a parameter of ``lthread_create()``. 
See\n+:ref:`memory_allocation_and_NUMA_awareness` for details.\n+\n+\n+**Note 4**:\n+\n+An L-thread can only detach itself, and cannot detach other L-threads.\n+\n+\n+**Note 5**:\n+\n+A wait operation on a pthread condition variable is always associated with and\n+protected by a mutex which must be owned by the thread at the time it invokes\n+``pthread_wait()``. By contrast L-thread condition variables are thread safe\n+(for waiters) and do not use an associated mutex. Multiple L-threads (including\n+L-threads running on other schedulers) can safely wait on a L-thread condition\n+variable. As a consequence the performance of an L-thread condition variables\n+is typically an order of magnitude faster than its pthread counterpart.\n+\n+\n+**Note 6**:\n+\n+Recursive locking is not supported with L-threads, attempts to take a lock\n+recursively will be detected and rejected.\n+\n+\n+**Note 7**:\n+\n+``lthread_yield()`` will save the current context, insert the current thread\n+to the back of the ready queue, and resume the next ready thread. Yielding\n+increases ready queue backlog, see :ref:`ready_queue_backlog` for more details\n+about the implications of this.\n+\n+\n+N.B. The context switch time as measured from immediately before the call to\n+``lthread_yield()`` to the point at which the next ready thread is resumed,\n+can be an order of magnitude faster that the same measurement for\n+pthread_yield.\n+\n+\n+**Note 8**:\n+\n+``lthread_set_affinity()`` is similar to a yield apart from the fact that the\n+yielding thread is inserted into a peer ready queue of another scheduler.\n+The peer ready queue is actually a separate thread safe queue, which means that\n+threads appearing in the peer ready queue can jump any backlog in the local\n+ready queue on the destination scheduler.\n+\n+The context switch time as measured from the time just before the call to\n+``lthread_set_affinity()`` to just after the same thread is resumed on the new\n+scheduler can be orders of magnitude faster than the same measurement for\n+``pthread_setaffinity_np()``.\n+\n+\n+**Note 9**:\n+\n+Although there is no ``pthread_sleep()`` function, ``lthread_sleep()`` and\n+``lthread_sleep_clks()`` can be used wherever ``sleep()``, ``usleep()`` or\n+``nanosleep()`` might ordinarily be used. The L-thread sleep functions suspend\n+the current thread, start an ``rte_timer`` and resume the thread when the\n+timer matures. The ``rte_timer_manage()`` entry point is called on every pass\n+of the scheduler loop. This means that the worst case jitter on timer expiry\n+is determined by the longest period between context switches of any running\n+L-threads.\n+\n+In a synthetic test with many threads sleeping and resuming then the measured\n+jitter is typically orders of magnitude lower than the same measurement made\n+for ``nanosleep()``.\n+\n+\n+**Note 10**:\n+\n+Spin locks are not provided because they are problematical in a cooperative\n+environment, see :ref:`porting_locks_and_spinlocks` for a more detailed\n+discussion on how to avoid spin locks.\n+\n+\n+.. _Thread_local_storage_performance:\n+\n+Thread local storage\n+^^^^^^^^^^^^^^^^^^^^\n+\n+Of the three L-thread local storage options the simplest and most efficient is\n+storing a single application data pointer in the L-thread struct.\n+\n+The ``PER_LTHREAD`` macros involve a run time computation to obtain the address\n+of the variable being saved/retrieved and also require that the accesses are\n+de-referenced  via a pointer. 
This means that code that has used\n+``RTE_PER_LCORE`` macros being ported to L-threads might need some slight\n+adjustment (see :ref:`porting_thread_local_storage` for hints about porting\n+code that makes use of thread local storage).\n+\n+The get/set specific APIs are consistent with their pthread counterparts both\n+in use and in performance.\n+\n+\n+.. _memory_allocation_and_NUMA_awareness:\n+\n+Memory allocation and NUMA awareness\n+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+\n+All memory allocation is from DPDK huge pages, and is NUMA aware. Each\n+scheduler maintains its own caches of objects: lthreads, their stacks, TLS,\n+mutexes and condition variables. These caches are implemented as unbounded lock\n+free MPSC queues. When objects are created they are always allocated from the\n+caches on the local core (current EAL thread).\n+\n+If an L-thread has been affinitized to a different scheduler, then it can\n+always safely free resources to the caches from which they originated (because\n+the caches are MPSC queues).\n+\n+If the L-thread has been affinitized to a different NUMA node then the memory\n+resources associated with it may incur longer access latency.\n+\n+The commonly used pattern of setting affinity on entry to a thread after it has\n+started, means that memory allocation for both the stack and TLS will have been\n+made from caches on the NUMA node on which the threads creator is running.\n+This has the side effect that access latency will be sub-optimal after\n+affinitizing.\n+\n+This side effect can be mitigated to some extent (although not completely) by\n+specifying the destination CPU as a parameter of ``lthread_create()`` this\n+causes the L-thread's stack and TLS to be allocated when it is first scheduled\n+on the destination scheduler, if the destination is a on another NUMA node it\n+results in a more optimal memory allocation.\n+\n+Note that the lthread struct itself remains allocated from memory on the\n+creating node, this is unavoidable because an L-thread is known everywhere by\n+the address of this struct.\n+\n+\n+.. _object_cache_sizing:\n+\n+Object cache sizing\n+^^^^^^^^^^^^^^^^^^^\n+\n+The per lcore object caches pre-allocate objects in bulk whenever a request to\n+allocate an object finds a cache empty. By default 100 objects are\n+pre-allocated, this is defined by ``LTHREAD_PREALLOC`` in the public API\n+header file lthread_api.h. This means that the caches constantly grow to meet\n+system demand.\n+\n+In the present implementation there is no mechanism to reduce the cache sizes\n+if system demand reduces. Thus the caches will remain at their maximum extent\n+indefinitely.\n+\n+A consequence of the bulk pre-allocation of objects is that every 100 (default\n+value) additional new object create operations results in a call to\n+``rte_malloc()``. For creation of objects such as L-threads, which trigger the\n+allocation of even more objects (i.e. their stacks and TLS) then this can\n+cause outliers in scheduling performance.\n+\n+If this is a problem the simplest mitigation strategy is to dimension the\n+system, by setting the bulk object pre-allocation size to some large number\n+that you do not expect to be exceeded. This means the caches will be populated\n+once only, the very first time a thread is created.\n+\n+\n+.. _Ready_queue_backlog:\n+\n+Ready queue backlog\n+^^^^^^^^^^^^^^^^^^^\n+\n+One of the more subtle performance considerations is managing the ready queue\n+backlog. 
The fewer threads that are waiting in the ready queue then the faster\n+any particular thread will get serviced.\n+\n+In a naive L-thread application with N L-threads simply looping and yielding,\n+this backlog will always be equal to the number of L-threads, thus the cost of\n+a yield to a particular L-thread will be N times the context switch time.\n+\n+This side effect can be mitigated by arranging for threads to be suspended and\n+wait to be resumed, rather than polling for work by constantly yielding.\n+Blocking on a mutex or condition variable or even more obviously having a\n+thread sleep if it has a low frequency workload are all mechanisms by which a\n+thread can be excluded from the ready queue until it really does need to be\n+run. This can have a significant positive impact on performance.\n+\n+\n+.. _Initialization_and_shutdown_dependencies:\n+\n+Initialization, shutdown and dependencies\n+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+\n+The L-thread subsystem depends on DPDK for huge page allocation and depends on\n+the ``rte_timer subsystem``. The DPDK EAL initialization and\n+``rte_timer_subsystem_init()`` **MUST** be completed before the L-thread sub\n+system can be used.\n+\n+Thereafter initialization of the L-thread subsystem is largely transparent to\n+the application. Constructor functions ensure that global variables are properly\n+initialized. Other than global variables each scheduler is initialized\n+independently the first time that an L-thread is created by a particular EAL\n+thread.\n+\n+If the schedulers are to be run as isolated and independent schedulers, with\n+no intention that L-threads running on different schedulers will migrate between\n+schedulers or synchronize with L-threads running on other schedulers, then\n+initialization consists simply of creating an L-thread, and then running the\n+L-thread scheduler.\n+\n+If there will be interaction between L-threads running on different schedulers,\n+then it is important that the starting of schedulers on different EAL threads\n+is synchronized.\n+\n+To achieve this an additional initialization step is necessary, this is simply\n+to set the number of schedulers by calling the API function\n+``lthread_num_schedulers_set(n)``, where ``n`` is the number of EAL threads\n+that will run L-thread schedulers. Setting the number of schedulers to a\n+number greater than 0 will cause all schedulers to wait until the others have\n+started before beginning to schedule L-threads.\n+\n+The L-thread scheduler is started by calling the function ``lthread_run()``\n+and should be called from the EAL thread and thus become the main loop of the\n+EAL thread.\n+\n+The function ``lthread_run()``, will not return until all threads running on\n+the scheduler have exited, and the scheduler has been explicitly stopped by\n+calling ``lthread_scheduler_shutdown(lcore)`` or\n+``lthread_scheduler_shutdown_all()``.\n+\n+All these function do is tell the scheduler that it can exit when there are no\n+longer any running L-threads, neither function forces any running L-thread to\n+terminate. Any desired application shutdown behavior must be designed and\n+built into the application to ensure that L-threads complete in a timely\n+manner.\n+\n+**Important Note:** It is assumed when the scheduler exits that the application\n+is terminating for good, the scheduler does not free resources before exiting\n+and running the scheduler a subsequent time will result in undefined behavior.\n+\n+\n+.. 
_porting_legacy_code_to_run_on_lthreads:\n+\n+Porting legacy code to run on L-threads\n+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+Legacy code originally written for a pthread environment may be ported to\n+L-threads if the considerations about differences in scheduling policy, and\n+constraints discussed in the previous sections can be accommodated.\n+\n+This section looks in more detail at some of the issues that may have to be\n+resolved when porting code.\n+\n+\n+.. _pthread_API_compatibility:\n+\n+pthread API compatibility\n+^^^^^^^^^^^^^^^^^^^^^^^^^\n+\n+The first step is to establish exactly which pthread APIs the legacy\n+application uses, and to understand the requirements of those APIs. If there\n+are corresponding L-lthread APIs, and where the default pthread functionality\n+is used by the application then, notwithstanding the other issues discussed\n+here, it should be feasible to run the application with L-threads. If the\n+legacy code modifies the default behavior using attributes then if may be\n+necessary to make some adjustments to eliminate those requirements.\n+\n+\n+.. _blocking_system_calls:\n+\n+Blocking system API calls\n+^^^^^^^^^^^^^^^^^^^^^^^^^\n+\n+It is important to understand what other system services the application may be\n+using, bearing in mind that in a cooperatively scheduled environment a thread\n+cannot block without stalling the scheduler and with it all other cooperative\n+threads. Any kind of blocking system call, for example file or socket IO, is a\n+potential problem, a good tool to analyze the application for this purpose is\n+the ``strace`` utility.\n+\n+There are many strategies to resolve these kind of issues, each with it\n+merits. Possible solutions include:\n+\n+* Adopting a polled mode of the system API concerned (if available).\n+\n+* Arranging for another core to perform the function and synchronizing with\n+  that core via constructs that will not block the L-thread.\n+\n+* Affinitizing the thread to another scheduler devoted (as a matter of policy)\n+  to handling threads wishing to make blocking calls, and then back again when\n+  finished.\n+\n+\n+.. _porting_locks_and_spinlocks:\n+\n+Locks and spinlocks\n+^^^^^^^^^^^^^^^^^^^\n+\n+Locks and spinlocks are another source of blocking behavior that for the same\n+reasons as system calls will need to be addressed.\n+\n+If the application design ensures that the contending L-threads will always\n+run on the same scheduler then it its probably safe to remove locks and spin\n+locks completely.\n+\n+The only exception to the above rule is if for some reason the\n+code performs any kind of context switch whilst holding the lock\n+(e.g. yield, sleep, or block on a different lock, or on a condition variable).\n+This will need to determined before deciding to eliminate a lock.\n+\n+If a lock cannot be eliminated then an L-thread mutex can be substituted for\n+either kind of lock.\n+\n+An L-thread blocking on an L-thread mutex will be suspended and will cause\n+another ready L-thread to be resumed, thus not blocking the scheduler. 
When\n+default behavior is required, it can be used as a direct replacement for a\n+pthread mutex lock.\n+\n+Spin locks are typically used when lock contention is likely to be rare and\n+where the period during which the lock may be held is relatively short.\n+When the contending L-threads are running on the same scheduler then an\n+L-thread blocking on a spin lock will enter an infinite loop stopping the\n+scheduler completely (see :ref:`porting_infinite_loops` below).\n+\n+If the application design ensures that contending L-threads will always run\n+on different schedulers then it might be reasonable to leave a short spin lock\n+that rarely experiences contention in place.\n+\n+If after all considerations it appears that a spin lock can neither be\n+eliminated completely, replaced with an L-thread mutex, or left in place as\n+is, then an alternative is to loop on a flag, with a call to\n+``lthread_yield()`` inside the loop (n.b. if the contending L-threads might\n+ever run on different schedulers the flag will need to be manipulated\n+atomically).\n+\n+Spinning and yielding is the least preferred solution since it introduces\n+ready queue backlog (see also :ref:`ready_queue_backlog`).\n+\n+\n+.. _porting_sleeps_and_delays:\n+\n+Sleeps and delays\n+^^^^^^^^^^^^^^^^^\n+\n+Yet another kind of blocking behavior (albeit momentary) are delay functions\n+like ``sleep()``, ``usleep()``, ``nanosleep()`` etc. All will have the\n+consequence of stalling the L-thread scheduler and unless the delay is very\n+short (e.g. a very short nanosleep) calls to these functions will need to be\n+eliminated.\n+\n+The simplest mitigation strategy is to use the L-thread sleep API functions,\n+of which two variants exist, ``lthread_sleep()`` and ``lthread_sleep_clks()``.\n+These functions start an rte_timer against the L-thread, suspend the L-thread\n+and cause another ready L-thread to be resumed. The suspended L-thread is\n+resumed when the rte_timer matures.\n+\n+\n+.. _porting_infinite_loops:\n+\n+Infinite loops\n+^^^^^^^^^^^^^^\n+\n+Some applications have threads with loops that contain no inherent\n+rescheduling opportunity, and rely solely on the OS time slicing to share\n+the CPU. In a cooperative environment this will stop everything dead. These\n+kind of loops are not hard to identify, in a debug session you will find the\n+debugger is always stopping in the same loop.\n+\n+The simplest solution to this kind of problem is to insert an explicit\n+``lthread_yield()`` or ``lthread_sleep()`` into the loop. Another solution\n+might be to include the function performed by the loop into the execution path\n+of some other loop that does in fact yield, if this is possible.\n+\n+\n+.. 
_porting_thread_local_storage:\n+\n+Thread local storage\n+^^^^^^^^^^^^^^^^^^^^\n+\n+If the application uses thread local storage, the use case should be\n+studied carefully.\n+\n+In a legacy pthread application either or both the ``__thread`` prefix, or the\n+pthread set/get specific APIs may have been used to define storage local to a\n+pthread.\n+\n+In some applications it may be a reasonable assumption that the data could\n+or in fact most likely should be placed in L-thread local storage.\n+\n+If the application (like many DPDK applications) has assumed a certain\n+relationship between a pthread and the CPU to which it is affinitized, there\n+is a risk that thread local storage may have been used to save some data items\n+that are correctly logically associated with the CPU, and others items which\n+relate to application context for the thread. Only a good understanding of the\n+application will reveal such cases.\n+\n+If the application requires an that an L-thread is to be able to move between\n+schedulers then care should be taken to separate these kinds of data, into per\n+lcore, and per L-thread storage. In this way a migrating thread will bring with\n+it the local data it needs, and pick up the new logical core specific values\n+from pthread local storage at its new home.\n+\n+\n+.. _pthread_shim:\n+\n+Pthread shim\n+~~~~~~~~~~~~\n+\n+A convenient way to get something working with legacy code can be to use a\n+shim that adapts pthread API calls to the corresponding L-thread ones.\n+This approach will not mitigate any of the porting considerations mentioned\n+in the previous sections, but it will reduce the amount of code churn that\n+would otherwise been involved. It is a reasonable approach to evaluate\n+L-threads, before investing effort in porting to the native L-thread APIs.\n+\n+\n+Overview\n+^^^^^^^^\n+The L-thread subsystem includes an example pthread shim. This is a partial\n+implementation but does contain the API stubs needed to get basic applications\n+running. There is a simple \"hello world\" application that demonstrates the\n+use of the pthread shim.\n+\n+A subtlety of working with a shim is that the application will still need\n+to make use of the genuine pthread library functions, at the very least in\n+order to create the EAL threads in which the L-thread schedulers will run.\n+This is the case with DPDK initialization, and exit.\n+\n+To deal with the initialization and shutdown scenarios, the shim is capable of\n+switching on or off its adaptor functionality, an application can control this\n+behavior by the calling the function ``pt_override_set()``. The default state\n+is disabled.\n+\n+The pthread shim uses the dynamic linker loader and saves the loaded addresses\n+of the genuine pthread API functions in an internal table, when the shim\n+functionality is enabled it performs the adaptor function, when disabled it\n+invokes the genuine pthread function.\n+\n+The function ``pthread_exit()`` has additional special handling. The standard\n+system header file pthread.h declares ``pthread_exit()`` with\n+``__attribute__((noreturn))`` this is an optimization that is possible because\n+the pthread is terminating and this enables the compiler to omit the normal\n+handling of stack and protection of registers since the function is not\n+expected to return, and in fact the thread is being destroyed. 
\n+\n+\n+.. _pthread_shim:\n+\n+Pthread shim\n+~~~~~~~~~~~~\n+\n+A convenient way to get something working with legacy code can be to use a\n+shim that adapts pthread API calls to the corresponding L-thread ones.\n+This approach will not mitigate any of the porting considerations mentioned\n+in the previous sections, but it will reduce the amount of code churn that\n+would otherwise be involved. It is a reasonable way to evaluate L-threads\n+before investing effort in porting to the native L-thread APIs.\n+\n+\n+Overview\n+^^^^^^^^\n+\n+The L-thread subsystem includes an example pthread shim. This is a partial\n+implementation but does contain the API stubs needed to get basic applications\n+running. There is a simple \"hello world\" application that demonstrates the\n+use of the pthread shim.\n+\n+A subtlety of working with a shim is that the application will still need\n+to make use of the genuine pthread library functions, at the very least in\n+order to create the EAL threads in which the L-thread schedulers will run.\n+This is the case during DPDK initialization and exit.\n+\n+To deal with the initialization and shutdown scenarios, the shim can switch\n+its adaptor functionality on or off; an application controls this behavior by\n+calling the function ``pt_override_set()``. The default state is disabled\n+(a usage sketch is shown at the end of this section).\n+\n+The pthread shim uses the dynamic linker loader and saves the loaded addresses\n+of the genuine pthread API functions in an internal table; when the shim\n+functionality is enabled it performs the adaptor function, and when disabled\n+it invokes the genuine pthread function.\n+\n+The function ``pthread_exit()`` has additional special handling. The standard\n+system header file pthread.h declares ``pthread_exit()`` with\n+``__attribute__((noreturn))``. This is an optimization that is possible\n+because the pthread is terminating, and it enables the compiler to omit the\n+normal handling of the stack and protection of registers since the function\n+is not expected to return, and in fact the thread is being destroyed. These\n+optimizations are applied in both the callee and the caller of the\n+``pthread_exit()`` function.\n+\n+In our cooperative scheduling environment this behavior is inadmissible. The\n+pthread is the L-thread scheduler thread, and, although an L-thread is\n+terminating, there must be a return to the scheduler in order that the system\n+can continue to run. Further, returning from a function with attribute\n+``noreturn`` is invalid and may result in undefined behavior.\n+\n+The solution is to redefine the ``pthread_exit`` function with a macro,\n+causing it to be mapped to a stub function in the shim that does not have the\n+``noreturn`` attribute. This macro is defined in the file\n+``pthread_shim.h``. The stub function is otherwise no different than any of\n+the other stub functions in the shim, and will switch between the real\n+``pthread_exit()`` function and the ``lthread_exit()`` function as\n+required. The only difference is that the mapping to the stub is done by\n+macro substitution.\n+\n+A consequence of this is that the file ``pthread_shim.h`` must be included in\n+legacy code wishing to make use of the shim. It also means that dynamic\n+linkage of a pre-compiled binary that did not include ``pthread_shim.h`` is\n+not supported.\n+\n+Given the requirements for porting legacy code outlined in\n+:ref:`porting_legacy_code_to_run_on_lthreads`, most applications will require\n+at least some minimal adjustment and recompilation to run on L-threads, so\n+pre-compiled binaries are unlikely to be encountered in practice.\n+\n+In summary, the shim approach adds some overhead but can be a useful tool to\n+help establish the feasibility of a code reuse project. It is also a fairly\n+straightforward task to extend the shim if necessary.\n+\n+**Note:** Bearing in mind the preceding discussion about the impact of making\n+blocking calls, switching the shim in and out on the fly in order to invoke a\n+pthread API that might block is something that should typically be avoided.\n+\n+\n+Building and running the pthread shim\n+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+\n+The shim example application is located in the pthread_shim folder of the\n+performance-thread sample application.\n+\n+To build and run the pthread shim example:\n+\n+#. Go to the example application folder:\n+\n+   .. code-block:: console\n+\n+       export RTE_SDK=/path/to/rte_sdk\n+       cd ${RTE_SDK}/examples/performance-thread/pthread_shim\n+\n+\n+#. Set the target (a default target is used if not specified). For example:\n+\n+   .. code-block:: console\n+\n+       export RTE_TARGET=x86_64-native-linuxapp-gcc\n+\n+   See the DPDK Getting Started Guide for possible RTE_TARGET values.\n+\n+#. Build the application:\n+\n+   .. code-block:: console\n+\n+       make\n+\n+#. To run the pthread_shim example:\n+\n+   .. code-block:: console\n+\n+       lthread-pthread-shim -c core_mask -n number_of_channels
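\n+\n+The following minimal sketch shows how a legacy function, run as an L-thread,\n+might drive the shim. The prototype of ``pt_override_set()`` and the pthread\n+calls shown are assumptions that should be checked against ``pthread_shim.h``\n+and the \"hello world\" example; only the pattern of enabling and disabling the\n+adaptor is significant here.\n+\n+.. code-block:: c\n+\n+    #include <pthread.h>\n+    #include \"pthread_shim.h\"  /* remaps pthread_exit() via a macro */\n+\n+    /* Assumed to be launched as an L-thread, after the EAL threads and\n+     * L-thread schedulers have been started with the genuine pthread\n+     * and EAL calls. */\n+    static void legacy_entry(void *arg)\n+    {\n+        (void)arg;\n+\n+        /* Route pthread_* calls to their L-thread equivalents. An int\n+         * on/off flag is assumed here; check pthread_shim.h. */\n+        pt_override_set(1);\n+\n+        pthread_mutex_t m;\n+        pthread_mutex_init(&m, NULL);   /* serviced by the L-thread mutex */\n+        pthread_mutex_lock(&m);\n+        /* ... legacy code ... */\n+        pthread_mutex_unlock(&m);\n+\n+        /* Restore the genuine pthread behavior before returning to code\n+         * that must use real pthreads. */\n+        pt_override_set(0);\n+    }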
\n+\n+.. _lthread_diagnostics:\n+\n+L-thread Diagnostics\n+~~~~~~~~~~~~~~~~~~~~\n+\n+When debugging you must take account of the fact that the L-threads are run in\n+a single pthread. The current scheduler is defined by\n+``RTE_PER_LCORE(this_sched)``, and the current lthread is stored at\n+``RTE_PER_LCORE(this_sched)->current_lthread``. Thus, at a breakpoint in a GDB\n+session, the current lthread can be obtained by displaying the pthread local\n+variable ``per_lcore_this_sched->current_lthread``.\n+\n+Another useful diagnostic feature is the ability to trace significant events\n+in the life of an L-thread; this feature is enabled by changing the value of\n+``LTHREAD_DIAG`` from 0 to 1 in the file ``lthread_diag_api.h``.\n+\n+Tracing of events can be individually masked, and the mask may be programmed\n+at run time. An unmasked event results in a callback that provides information\n+about the event. The default callback simply prints trace information. The\n+default mask is 0 (all events off); the mask can be modified by calling the\n+function ``lthread_diagnostic_set_mask()``.\n+\n+It is possible to register a user callback function to implement more\n+sophisticated diagnostic functions.\n+Object creation events (lthread, mutex, and condition variable) accept, and\n+store in the created object, a user supplied reference value returned by the\n+callback function.\n+\n+The lthread reference value is passed back in all subsequent event callbacks,\n+and APIs are provided to retrieve the reference value from mutexes and\n+condition variables. This enables a user to monitor, count, or filter for\n+specific events on specific objects, for example to monitor for a specific\n+thread signalling a specific condition variable, or to monitor all timer\n+events. The possibilities and combinations are endless.\n+\n+The callback function can be set by calling the function\n+``lthread_diagnostic_enable()``, supplying a callback function pointer and an\n+event mask.\n+\n+Setting ``LTHREAD_DIAG`` also enables counting of statistics about cache and\n+queue usage, and these statistics can be displayed by calling the function\n+``lthread_diag_stats_display()``. This function also performs a consistency\n+check on the caches and queues. The function should only be called from the\n+master EAL thread after all slave threads have stopped and returned to the C\n+main program; otherwise the consistency check will fail.\n",
    "prefixes": [
        "dpdk-dev",
        "v5",
        "1/4"
    ]
}