get:
Show a patch.

patch:
Partially update a patch (only the fields supplied in the request are changed).

put:
Update a patch.
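
For example, the patch record shown below can be retrieved and updated with any
HTTP client. The following sketch uses Python with the third-party "requests"
library; the token value is a placeholder, and updating a patch via PUT/PATCH is
assumed to require an account with maintainer rights on the project. The
"Authorization: Token" header follows the token authentication scheme exposed by
the Patchwork REST API; treat the exact header format as an assumption.

    import requests

    url = "https://patches.dpdk.org/api/patches/7324/"
    token = "0123456789abcdef"  # placeholder, not a real API token

    # GET: show a patch (no authentication needed for public projects).
    patch = requests.get(url).json()
    print(patch["name"], patch["state"])

    # PATCH: partial update; only the fields present in the body are changed.
    resp = requests.patch(
        url,
        headers={"Authorization": "Token " + token},
        json={"state": "superseded", "archived": True},
    )
    resp.raise_for_status()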

GET /api/patches/7324/?format=api
HTTP 200 OK
Allow: GET, PUT, PATCH, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept

{
    "id": 7324,
    "url": "https://patches.dpdk.org/api/patches/7324/?format=api",
    "web_url": "https://patches.dpdk.org/project/dpdk/patch/1443623388-29104-2-git-send-email-ian.betts@intel.com/",
    "project": {
        "id": 1,
        "url": "https://patches.dpdk.org/api/projects/1/?format=api",
        "name": "DPDK",
        "link_name": "dpdk",
        "list_id": "dev.dpdk.org",
        "list_email": "dev@dpdk.org",
        "web_url": "http://core.dpdk.org",
        "scm_url": "git://dpdk.org/dpdk",
        "webscm_url": "http://git.dpdk.org/dpdk",
        "list_archive_url": "https://inbox.dpdk.org/dev",
        "list_archive_url_format": "https://inbox.dpdk.org/dev/{}",
        "commit_url_format": ""
    },
    "msgid": "<1443623388-29104-2-git-send-email-ian.betts@intel.com>",
    "list_archive_url": "https://inbox.dpdk.org/dev/1443623388-29104-2-git-send-email-ian.betts@intel.com",
    "date": "2015-09-30T14:29:44",
    "name": "[dpdk-dev,v1,1/5] doc: add performance-thread sample application guide",
    "commit_ref": null,
    "pull_url": null,
    "state": "superseded",
    "archived": true,
    "hash": "df7a900e4f14ad12e39563df9ef02730cd016d71",
    "submitter": {
        "id": 340,
        "url": "https://patches.dpdk.org/api/people/340/?format=api",
        "name": "ibetts",
        "email": "ian.betts@intel.com"
    },
    "delegate": null,
    "mbox": "https://patches.dpdk.org/project/dpdk/patch/1443623388-29104-2-git-send-email-ian.betts@intel.com/mbox/",
    "series": [],
    "comments": "https://patches.dpdk.org/api/patches/7324/comments/",
    "check": "pending",
    "checks": "https://patches.dpdk.org/api/patches/7324/checks/",
    "tags": {},
    "related": [],
    "headers": {
        "Return-Path": "<dev-bounces@dpdk.org>",
        "X-Original-To": "patchwork@dpdk.org",
        "Delivered-To": "patchwork@dpdk.org",
        "Received": [
            "from [92.243.14.124] (localhost [IPv6:::1])\n\tby dpdk.org (Postfix) with ESMTP id 9F61F8DAA;\n\tWed, 30 Sep 2015 16:30:09 +0200 (CEST)",
            "from mga14.intel.com (mga14.intel.com [192.55.52.115])\n\tby dpdk.org (Postfix) with ESMTP id 537BC8D9E\n\tfor <dev@dpdk.org>; Wed, 30 Sep 2015 16:30:07 +0200 (CEST)",
            "from fmsmga002.fm.intel.com ([10.253.24.26])\n\tby fmsmga103.fm.intel.com with ESMTP; 30 Sep 2015 07:29:53 -0700",
            "from irvmail001.ir.intel.com ([163.33.26.43])\n\tby fmsmga002.fm.intel.com with ESMTP; 30 Sep 2015 07:29:52 -0700",
            "from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com\n\t[10.237.217.45])\n\tby irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id\n\tt8UETpvH003355; Wed, 30 Sep 2015 15:29:51 +0100",
            "from sivswdev01.ir.intel.com (localhost [127.0.0.1])\n\tby sivswdev01.ir.intel.com with ESMTP id t8UETpFK029149;\n\tWed, 30 Sep 2015 15:29:51 +0100",
            "(from ibetts@localhost)\n\tby sivswdev01.ir.intel.com with  id t8UETp0g029145;\n\tWed, 30 Sep 2015 15:29:51 +0100"
        ],
        "X-ExtLoop1": "1",
        "X-IronPort-AV": "E=Sophos;i=\"5.17,612,1437462000\"; d=\"scan'208\";a=\"816265024\"",
        "From": "ibetts <ian.betts@intel.com>",
        "To": "dev@dpdk.org",
        "Date": "Wed, 30 Sep 2015 15:29:44 +0100",
        "Message-Id": "<1443623388-29104-2-git-send-email-ian.betts@intel.com>",
        "X-Mailer": "git-send-email 1.7.4.1",
        "In-Reply-To": "<1443623388-29104-1-git-send-email-ian.betts@intel.com>",
        "References": "<1443623388-29104-1-git-send-email-ian.betts@intel.com>",
        "MIME-Version": "1.0",
        "Content-Type": "text/plain; charset=UTF-8",
        "Content-Transfer-Encoding": "8bit",
        "Cc": "Ian Betts <ian.betts@intel.com>",
        "Subject": "[dpdk-dev] =?utf-8?q?=5BPATCH_v1_1/5=5D_doc=3A_add_performance-th?=\n\t=?utf-8?q?read_sample_application_guide?=",
        "X-BeenThere": "dev@dpdk.org",
        "X-Mailman-Version": "2.1.15",
        "Precedence": "list",
        "List-Id": "patches and discussions about DPDK <dev.dpdk.org>",
        "List-Unsubscribe": "<http://dpdk.org/ml/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>",
        "List-Archive": "<http://dpdk.org/ml/archives/dev/>",
        "List-Post": "<mailto:dev@dpdk.org>",
        "List-Help": "<mailto:dev-request@dpdk.org?subject=help>",
        "List-Subscribe": "<http://dpdk.org/ml/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>",
        "Errors-To": "dev-bounces@dpdk.org",
        "Sender": "\"dev\" <dev-bounces@dpdk.org>"
    },
    "content": "From: Ian Betts <ian.betts@intel.com>\n\nThis commit adds documentation for the performance-thread\nsample application.\n\nSigned-off-by: Ian Betts <ian.betts@intel.com>\n---\n doc/guides/rel_notes/release_2_2.rst            |    6 +\n doc/guides/sample_app_ug/index.rst              |    1 +\n doc/guides/sample_app_ug/performance_thread.rst | 1221 +++++++++++++++++++++++\n 3 files changed, 1228 insertions(+)\n create mode 100644 doc/guides/sample_app_ug/performance_thread.rst",
    "diff": "diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst\nindex 5687676..e9772d3 100644\n--- a/doc/guides/rel_notes/release_2_2.rst\n+++ b/doc/guides/rel_notes/release_2_2.rst\n@@ -52,6 +52,12 @@ Libraries\n Examples\n ~~~~~~~~\n \n+* **examples: Introducing a performance thread example**\n+\n+  This an l3fwd derivative focused to enable characterization of performance\n+  with different threading models, including multiple EAL threads per physical\n+  core, and multiple Lightweight threads running in an EAL thread.\n+  The examples includes a simple cooperative scheduler.\n \n Other\n ~~~~~\ndiff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst\nindex 9beedd9..70d4a5c 100644\n--- a/doc/guides/sample_app_ug/index.rst\n+++ b/doc/guides/sample_app_ug/index.rst\n@@ -73,6 +73,7 @@ Sample Applications User Guide\n     vm_power_management\n     tep_termination\n     proc_info\n+    performance_thread\n \n **Figures**\n \ndiff --git a/doc/guides/sample_app_ug/performance_thread.rst b/doc/guides/sample_app_ug/performance_thread.rst\nnew file mode 100644\nindex 0000000..497d729\n--- /dev/null\n+++ b/doc/guides/sample_app_ug/performance_thread.rst\n@@ -0,0 +1,1220 @@\n+..  BSD LICENSE\n+    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.\n+    All rights reserved.\n+\n+    Redistribution and use in source and binary forms, with or without\n+    modification, are permitted provided that the following conditions\n+    are met:\n+\n+    * Re-distributions of source code must retain the above copyright\n+    notice, this list of conditions and the following disclaimer.\n+    * Redistributions in binary form must reproduce the above copyright\n+    notice, this list of conditions and the following disclaimer in\n+    the documentation and/or other materials provided with the\n+    distribution.\n+    * Neither the name of Intel Corporation nor the names of its\n+    contributors may be used to endorse or promote products derived\n+    from this software without specific prior written permission.\n+\n+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS\n+    \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT\n+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR\n+    A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT\n+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,\n+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT\n+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,\n+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY\n+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\n+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n+\n+\n+Performance Thread Sample Application\n+=====================================\n+\n+The performance thread sample application is a derivative of the standard L3\n+forwarding application that demonstrates different threading models.\n+\n+Overview\n+--------\n+For a general description of the L3 forwarding applications capabilities\n+please refer to the documentation of the standard application in\n+:doc:`l3_forward`.\n+\n+The performance thread sample application differs from the standard L3 forward\n+example in that it divides the TX and Rx processing between different threads,\n+and makes it possible to assign individual threads to different cores.\n+\n+Three threading models are considered:-\n+\n+#.  When there is EAL thread per physical core\n+#.  When there are multiple EAL threads per physical core\n+#.  When there are multiple lightweight threads per EAL thread\n+\n+Since DPDK release 2.0 it is possible to launch applications using the –lcores\n+EAL parameter, specifying CPU sets for a physical core. With the  performance\n+thread sample application its is now also possible to assign individual Rx\n+and TX functions to different cores.\n+\n+As an alternative to dividing the L3 forwarding work between different EAL\n+threads the performance thread sample introduces the possibility to run the\n+application threads as lightweight threads (L-threads) within one or\n+more EAL threads.\n+\n+In order to facilitate this threading model the example includes a primitive\n+cooperative scheduler (L-thread) subsystem. More details of the L-thread\n+subsystem can be found in :ref:`lthread_subsystem`\n+\n+**Note:** Whilst theoretcially possible it is not anticipated that multiple\n+L-thread schedulers would be run on the same physical core, this mode of\n+operataion should not be expected to yield useful performance and is considered\n+invalid.\n+\n+Compiling the Application\n+-------------------------\n+The application is located in the sample application in the\n+performance-thread folder.\n+\n+#.  Go to the example applications folder\n+\n+    .. code-block:: console\n+\n+       export RTE_SDK=/path/to/rte_sdk cd ${RTE_SDK}/examples/performance-thread/l3fwd-thread\n+\n+#.  Set the target (a default target is used if not specified). For example:\n+\n+    .. code-block:: console\n+\n+       export RTE_TARGET=x86_64-native-linuxapp-gcc\n+\n+    See the DPDK Getting Started Guide for possible RTE_TARGET values.\n+\n+#.  Build the application:\n+\n+\tmake\n+\n+\n+\n+Running the Application\n+-----------------------\n+\n+The application has a number of command line options:\n+\n+.. 
code-block:: console\n+\n+    ./build/l3fwd-thread [EAL options] -- -p PORTMASK [-P] --rx(port,queue,lcore,thread)[,(port,queue,lcore,thread)] --tx(port,lcore,thread)[,(port,lcore,thread)] [--enable-jumbo [--max-pkt-len PKTLEN]]  [--no-numa][--hash-entry-num][--ipv6] [--no-lthreads]\n+\n+where,\n+\n+*   -p PORTMASK: Hexadecimal bitmask of ports to configure\n+\n+*   -P: optional, sets all ports to promiscuous mode so that packets are\n+     accepted regardless of the packet's Ethernet MAC destination address.\n+     Without this option, only packets with the Ethernet MAC destination\n+     address set to the Ethernet address of the port are accepted.\n+\n+*   --rx (port,queue,lcore,thread)[,(port,queue,lcore,thread)]:\n+\tthe list of NIC RX ports and queues handled by the RX lcores and threads\n+\n+*   --tx (port,lcore,thread)[,(port,lcore,thread)]:\n+\tthe list of NIC TX ports handled by the I/O TX lcores and threads.\n+\n+*   --enable-jumbo: optional, enables jumbo frames\n+\n+*   --max-pkt-len: optional, maximum packet length in decimal (64-9600)\n+\n+*   --no-numa: optional, disables numa awareness\n+\n+*   --hash-entry-num: optional, specifies the hash entry number in hex to be setup\n+\n+*   --ipv6: optional, set it if running ipv6 packets\n+\n+*   --no-lthreads: optional, disables lthread model and uses EAL threading model\n+\n+The l3fwd-threads application allows you to start packet processing in two threading\n+models: L-Threads (default) and EAL Threads (when \"--no-lthreads\" parameter is used).\n+For consistency all parameters are used the same way for both models.\n+\n+* rx  parameters\n+\n+.. _table_l3fwd_rx_parameters:\n+\n++--------+------------------------------------------------------+\n+| port   | rx port                                              |\n++--------+------------------------------------------------------+\n+| queue  | rx queue that will be read on the specified rx port  |\n++--------+------------------------------------------------------+\n+| lcore  | core to use for the thread                           |\n++--------+------------------------------------------------------+\n+| thread | thread id (continuously from 0 to N)                 |\n++--------+------------------------------------------------------+\n+\n+\n+* tx parameters\n+\n+.. _table_l3fwd_tx_parameters:\n+\n++--------+------------------------------------------------------+\n+| port   | default port to transmit (if lookup fails nor found) |\n++--------+------------------------------------------------------+\n+| lcore  | core to use for L3 route match and transmit          |\n++--------+------------------------------------------------------+\n+| thread | thread id (continuously from 0 to N)                 |\n++--------+------------------------------------------------------+\n+\n+\n+\n+Running with L-threads\n+~~~~~~~~~~~~~~~~~~~~~~\n+\n+When the L-thread model is used (default option), lcore and thread parameters in\n+--rx/--tx are used to affine threads to the selected scheduler using the rules:\n+\n+**If lcores are the same, l-threads are placed on the same scheduler**\n+\n+**If both lcore and l-thread id are the same, only one l-thread is used and\n+queues / rings are polled inside it**\n+\n+e.g.\n+\n+\n+    .. code-block:: console\n+\n+l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+        --rx=\"(0,0,0,0)(1,0,1,1)\" \\\n+        --tx=\"(0,2,0)(1,3,1)(0,4,2)(1,5,3)(0,6,4)(1,7,5)\"\n+\n+Places every l-thread on different lcore\n+\n+    .. 
code-block:: console\n+\n+l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+        --rx=\"(0,0,0,0)(1,0,0,1)\" \\\n+        --tx=\"(0,1,0)(1,1,1)(0,2,2)(1,2,3)\"\n+\n+Places rx lthreads on lcore 0 and tx l-threads on lcore 1 and 2\n+\n+and so on.\n+\n+Running with EAL threads\n+~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+When --no-lthreads parameter is used, L-threading model is turned off and EAL\n+threads are used for all processing. EAL Threads are enumerated in the same way as L-threads,\n+but --lcores EAL parameter is used to affine thread to the selected cpu-set (scheduler).\n+\n+Thus it is possible to place every Rx and TX thread on different lcores\n+a) If lcore id is the same, only one EAL thread is used and queues / rings are\n+polled inside it.\n+\n+e.g.\n+\n+    .. code-block:: console\n+\n+l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+        --rx=\"(0,0,0,0)(1,0,1,1)\" \\\n+        --tx=\"(0,2,0)(1,3,1)(0,4,2)(1,5,3)(0,6,4)(1,7,5)\" \\\n+\t--no-lthreads\n+\n+Places every EAL thread on different lcore.\n+\n+To affine two ore more EAL threads to one cpu-set, eal --lcores parameter is used\n+\n+    .. code-block:: console\n+\n+l3fwd-thread -c ff -n 2 --lcores=\"(0,1)@0,(2,3)@1,(4,5)@2\" -- -P -p 3 \\\n+        --rx=\"(0,0,0,0)(1,0,1,1)\" \\\n+        --tx=\"(0,2,0)(1,3,1)(0,4,2)(1,5,3)\" \\\n+\t--no-lthreads\n+\n+Places rx EAL threads on lcore 0 and tx eal threads on lcore 1 and 2 and so on.\n+\n+\n+Examples\n+~~~~~~~~\n+\n+For selected scenarios the command line configuration of the application for L-Threads\n+and its corresponding EAL Threads command line can be realized as follows:\n+\n+a) Start every thread on different scheduler (1:1)\n+\n+    .. code-block:: console\n+\n+l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+\t--rx=\"(0,0,0,0)(0,1,1,1)(1,0,2,2)(0,1,3,3)\" \\\n+\t--tx=\"(0,4,0)(1,5,1)\"\n+\n+    .. code-block:: console\n+\n+l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+\t\t--rx=\"(0,0,0,0)(0,1,1,1)(1,0,2,2)(0,1,3,3)\" \\\n+\t\t--tx=\"(0,4,0)(1,5,1)\" \\\n+\t\t--no-lthreads\n+\n+b) Start all threads on one scheduler (N:1)\n+\n+    .. code-block:: console\n+\n+l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+\t\t--rx=\"(0,0,0,0)(0,1,0,1)(1,0,0,2)(0,1,0,3)\" \\\n+\t\t--tx=\"(0,0,0)(1,0,1)\"\n+\n+Example above, starts 6 L-threads on lcore 0.\n+\n+    .. code-block:: console\n+\n+l3fwd-thread -c ff -n 2 --lcores=\"(0-5)@0\" -- -P -p 3 \\\n+\t\t--rx=\"(0,0,0,0)(0,1,1,1)(1,0,2,2)(0,1,3,3)\" \\\n+\t\t--tx=\"(0,4,0)(1,5,1)\" \\\n+\t\t--no-lthreads\n+\n+Example above, starts 6 EAL threads on cpu-set 0.\n+\n+\n+c) Start threads on different schedulers (N:M)\n+\n+    .. code-block:: console\n+\n+l3fwd-thread -c ff -n 2 -- -P -p 3 \\\n+\t\t--rx=\"(0,0,0,0)(0,1,0,1)(1,0,0,2)(0,1,0,3)\" \\\n+\t\t--tx=\"(0,1,0)(1,1,1)\"\n+\n+Example above, starts 4 L-threads (0,1,2,3) for rx on lcore 0, and 2 L-threads\n+for tx on lcore 1.\n+\n+    .. 
code-block:: console\n+\n+l3fwd-thread -c ff -n 2 --lcores=\"(0-3)@0,(4,5)@1\" -- -P -p 3 \\\n+\t\t--rx=\"(0,0,0,0)(0,1,1,1)(1,0,2,2)(0,1,3,3)\" \\\n+\t\t--tx=\"(0,4,0)(1,5,1)\" \\\n+\t\t--no-lthreads\n+\n+Example above, starts 4 EAL threads (0,1,2,3) for rx on cpu-set 0, and\n+2 EAL threads for tx on cpu-set 1.\n+\n+\n+Explanation\n+-----------\n+\n+To a great extent the sample application differs little from the standard L3\n+forwarding application, and readers are advised to familiarize themselves with the\n+material covered in the :doc:`l3_forward` documentation before proceeding.\n+\n+The following explanation is focused on the way threading is handled in the\n+performance thread example.\n+\n+\n+Mode of operation with EAL threads\n+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+The performance thread sample application has split the Rx and TX functionality\n+into two different threads, and the pairs of Rx and TX threads are\n+interconnected via software rings. With respect to these rings the Rx threads\n+are producers and the TX threads are consumers.\n+\n+On initialization the tx and rx threads are started according to the command\n+line parameters.\n+\n+The Rx threads poll the network interface queues and post received packets to a\n+TX thread via the corresponding software ring.\n+\n+The TX threads poll software rings, perform the L3 forwarding hash/LPM match,\n+and assemble packet bursts before performing burst transmit on the network\n+interface.\n+\n+As with the standard L3 forward application, burst draining of residual packets\n+is performed periodically with the period calculated from elapsed time using\n+the timestamps counter.\n+\n+\n+Mode of operation with L-threads\n+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+Like the EAL thread configuration the application has split the Rx and TX\n+functionality into different threads, and the pairs of Rx and TX threads are\n+interconnected via software rings.\n+\n+On initialization an L-thread scheduler is started on every EAL thread. On all\n+but the master EAL thread only a a dummy L-thread is initially started.\n+The L-thread started on the master EAL thread then spawns other L-threads on\n+different L-thread shedulers according the the command line parameters.\n+\n+The Rx threads poll the network interface queues and post received packets\n+to a TX thread via the corresponding software ring.\n+\n+The ring interface is augmented by means of an L-thread condition variable that\n+enables the TX thread to be suspended when the TX ring is empty. The Rx thread\n+signals the condition whenever it posts to the TX ring, causing the TX thread\n+to be resumed.\n+\n+Additionally the TX L-thread spawns a worker L-thread to take care of\n+polling the software rings, whilst it handles burst draining of the transmit\n+buffer.\n+\n+The worker threads poll the software rings, perform L3 route lookup and\n+assemble packet bursts. If the TX ring is empty the worker thread suspends\n+itself by waiting on the condition variable associated with the ring.\n+\n+Burst draining of residual packets is performed by the  TX thread which sleeps\n+(using an L-thread sleep function) and resumes periodically to flush the TX\n+buffer.\n+\n+This design means that L-threads that have no work, can yield the CPU to other\n+L-threads and avoid having to constantly poll the software rings.\n+\n+\n+.. 
_lthread_subsystem:\n+\n+The L-thread subsystem\n+----------------------\n+The L-thread subsystem resides in the examples/performance-thread/common\n+directory and is built and linked automatically when building the l3fwd-lthread\n+example.\n+\n+The subsystem provides a simple cooperative scheduler to enable arbitrary\n+functions to run as cooperative threads within a single EAL thread.\n+The subsystem provides a pthread like API that is intended to assist in\n+reuse of legacy code written for POSIX pthreads.\n+\n+The following sections provide some detail on the features, constraints,\n+performance and porting considerations when using L-threads.\n+\n+.. _comparison_between_lthreads_and_pthreads:\n+\n+Comparison between L-threads and POSIX pthreads\n+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+The fundamental difference between the L-thread and pthread models is the\n+way which threads are scheduled. The simplest way to think about this is to\n+consider the case of a processor with a single CPU.  To run multiple threads\n+on a single CPU, then the scheduler must frequently switch between the threads,\n+in order that each thread is able to make timely progress.\n+This is the basis of any multitasking operating system.\n+\n+This section explores the differences between the pthread model and the\n+L-thread model as implemented in the provided L-thread subsystem. If needed a\n+theoretical discussion of preemptive vs cooperative multithreading can be\n+found in any good text on operating system design.\n+\n+Sceduling and context switching\n+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+The POSIX pthread library provides an application programming interface to\n+create and synchronize threads. Scheduling policy is determined by the host OS,\n+and may be configurable. The OS may use sophisticated rules to determine which\n+thread should be run next, threads may suspend themselves or make other threads\n+ready, and the scheduler may employ a time slice giving each thread a maximum\n+time quantum after which it will be preempted in favor of another thread that\n+is ready to run. To complicate matters further threads may be assigned\n+different scheduling priorities.\n+\n+By contrast the L-thread subsystem is considerably simpler. Logically the\n+L-thread scheduler performs the same multiplexing function for L-threads\n+within a single pthread as the OS scheduler does for pthreads within an\n+application process. The L-thread scheduler is simply the main loop of a\n+pthread, and in so far as the host OS is concerned it is just a regular\n+pthread like any other.  The host OS is oblivious about the existence of and\n+not at all involved in the scheduling of L-threads.\n+\n+The other and most significant difference between the two models is that\n+L-threads are scheduled cooperatively. L-threads cannot not preempt each\n+other, nor can the L-thread scheduler preempt a running L-thread ( i.e.\n+there is no time slicing). The consequence is that programs implemented with\n+L-threads must possess frequent rescheduling points, meaning that they must\n+explicitly and of their own volition return to the scheduler at frequent\n+intervals, in order to allow other L-threads an opportunity to proceed.\n+\n+In both models switching between threads requires that the current CPU\n+context is saved and a new context (belonging to the next thread ready to run)\n+is restored. 
With pthreads this context switching is handled transparently\n+and the set of CPU registers that must be preserved between context switches\n+is as per an interrupt handler.\n+\n+An L-thread context switch is achieved by the thread itself making a function\n+call to the L-thread scheduler. Thus it is only necessary to preserve the\n+callee registers. The caller is responsible to save and restore any other\n+registers it is using before a function call, and restore them on return,\n+and this is handled by the compiler. For X86_64 on both Linux and BSD the\n+System V calling convention is used, this defines registers RSP,RBP,and R12-R15\n+as callee-save registers (for more detailed discussion a good reference\n+can be found here https://en.wikipedia.org/wiki/X86_calling_conventions).Taking\n+advantage of this, and due to the absence of preemption, an L-thread context\n+switch is acheived with less than 20 load/store instructions.\n+\n+The scheduling policy for L-threads is fixed, there is no prioritization of\n+L-threads, all L-threads are equal and scheduling is based on a FIFO\n+ready queue.\n+\n+An L-thread is a struct containing the CPU context of the thread\n+(saved on context switch) and other useful items. The ready queue contains\n+pointers to threads that are ready to run. The L-thread scheduler is a simple\n+loop that polls the ready queue, reads from it the next thread ready to run,\n+which it resumes by saving the current context (the current position in the\n+scheduler loop) and restoring the context of the next thread from its thread\n+struct. Thus an L-thread is always resumed at the last place it yielded.\n+\n+A well behaved L-thread will call the context switch regularly (at least once\n+in its main loop) thus returning to the schedulers own main loop. Yielding\n+inserts the current thread at the back of the ready queue, and the process of\n+servicing the ready queue is repeated, thus the system runs by flipping back\n+and forth the between L-threads and scheduler loop.\n+\n+In the case of pthreads, the preemptive scheduling, time slicing, and support\n+for thread prioritization means that progress is normally possible for any\n+thread that is ready to run. This comes at the price of a relatively heavier\n+context switch and scheduling overhead.\n+\n+With L-threads the progress of any particular thread is determined by the\n+frequency of rescheduling opportunities in the other L-threads. This means that\n+an errant L-thread monopolizing the CPU might cause scheduling of other threads\n+to be stalled. Due to the lower cost of context switching, however, voluntary\n+rescheduling to ensure progress of other threads, if managed sensibly, is not\n+a prohibitive overhead, and overall performance can exceed that of an\n+application using pthreads.\n+\n+Mutual exclusion\n+^^^^^^^^^^^^^^^^\n+With pthreads preemption means that threads which share data must observe\n+some form of mutual exclusion protocol.\n+\n+The fact that L-threads cannot preempt each other means that mutual exclusion\n+devices can be completely avoided.\n+\n+Locking to protect shared data can be a significant bottleneck in\n+multi-threaded applications so a carefully designed cooperatively scheduled\n+program can enjoy significant performance advantages.\n+\n+So far we have considered only the simplistic case of a single core CPU,\n+when multiple CPUs are considered things are somewhat more complex.\n+\n+First of all it is inevitable that there must be multiple L-thread schedulers,\n+one on each EAL thread. 
So long as these schedulers remain isolated from each\n+other the above assertions about the potential advantages of cooperative\n+scheduling hold true.\n+\n+A configuration with isolated cooperative schedulers is less flexible than the\n+pthread model where threads can be affined to run on any CPU. With isolated\n+schedulers scaling of applications to utilize fewer or more CPUs accorindg to\n+system demand is very difficult to achieve.\n+\n+The L-thread subsystem makes it possible for L-threads to migrate between\n+schedulers running on different CPUs. Needless to say if the migration means\n+that threads that share data end up running on different CPUs then this will\n+introduce the need for some kind mutual exclusion device.\n+\n+Of course rte_ring s/w rings can always be used to interconnect threads running\n+on different cores, however to protect other kinds of shared data structures,\n+lock free constructs or else explicit locking will be required. This is a\n+consideration for the application design.\n+\n+In support of this extended functionality, the L-thread subsystem implements\n+thread safe mutexes and condition variables.\n+\n+The cost of affining and of condition variable signaling is significantly\n+lower than the equivalent pthread operations, and so applications using\n+these features will see a performance benefit.\n+\n+\n+Thread local storage\n+^^^^^^^^^^^^^^^^^^^^\n+\n+As with applications written for pthreads an application written for L-threads\n+can take advantage of thread local storage, in this case local to an L-thread.\n+An application may save and retrieve a single pointer to application data in\n+the L-thread struct.\n+\n+For legacy and backward compatibility reasons two alternative methods are also\n+offered, the first is modelled directly on the pthread get/set specific APIs,\n+the second approach is modelled on the RTE_PER_LCORE macros, whereby PER_LTHREAD\n+macros are introduced, in both cases the storage is local to the L-thread.\n+\n+\n+.. _constraints_and_performance_implications:\n+\n+Constraints and performance implications when using L-threads\n+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+\n+.. _API_compatibility:\n+\n+API compatibility\n+^^^^^^^^^^^^^^^^^\n+\n+The L-thread subsystem provides a set of functions that are logically equivalent\n+to the corresponding functions offered by the POSIX pthread library, however not\n+all pthread functions have a corresponding L-thread equivalent, and not all\n+features available to pthreads are implemented for L-threads.\n+\n+The pthread library offers considerable flexibility via programmable attributes\n+that can be associated with threads, mutexes, and condition variables.\n+\n+By contrast the L-thread subsystem has fixed functionality, the scheduler policy\n+cannot be varied, and L-threads cannot be prioritized. There are no variable\n+attributes associated with any L-thread objects. L-threads, mutexs and\n+conditional variables, all have fixed functionality. (Note: reserved parameters\n+are included in the APIs to facilitate possible future support for attributes).\n+\n+The table below lists the pthread and equivalent L-thread APIs with notes on\n+differences and/or constraints. Where there is no L-thread entry in the table,\n+then the L-thread subsystem provides no equivalent function.\n+\n+.. 
_table_lthread_pthread:\n+\n++-----------------------------+-----------------------------+--------------------+\n+| **Pthread function**        | **L-thread function**       | **Notes**          |\n++=============================+=============================+====================+\n+| pthread_barrier_destroy     |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_barrier_init        |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_barrier_wait        |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_cond_broadcast      | lthread_cond_broadcast      | See note 1         |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_cond_destroy        | lthread_cond_destroy        |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_cond_init           | lthread_cond_init           |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_cond_signal         | lthread_cond_signal         | See note 1         |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_cond_timedwait      |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_cond_wait           | lthread_cond_wait           | See note 5         |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_create              | lthread_create              | See notes 2, 3     |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_detach              | lthread_detach              | See note 4         |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_equal               |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_exit                | lthread_exit                |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_getspecific         | lthread_getspecific         |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_getcpuclockid       |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_join                | lthread_join                |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_key_create          | lthread_key_create          |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_key_delete          | lthread_key_delete          |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_mutex_destroy       | lthread_mutex_destroy       |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_mutex_init          | 
lthread_mutex_init          |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_mutex_lock          | lthread_mutex_lock          | See note 6         |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_mutex_trylock       | lthread_mutex_trylock       | See note 6         |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_mutex_timedlock     |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_mutex_unlock        | lthread_mutex_unlock        |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_once                |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_rwlock_destroy      |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_rwlock_init         |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_rwlock_rdlock       |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_rwlock_timedrdlock  |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_rwlock_timedwrlock  |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_rwlock_tryrdlock    |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_rwlock_trywrlock    |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_rwlock_unlock       |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_rwlock_wrlock       |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_self                | lthread_current             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_setspecific         | lthread_setspecific         |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_spin_init           |                             | See note 10        |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_spin_destroy        |                             | See note 10        |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_spin_lock           |                             | See note 10        |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_spin_trylock        |                             | See note 10        |\n++-----------------------------+-----------------------------+--------------------+\n+| 
pthread_spin_unlock         |                             | See note 10        |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_cancel              | lthread_cancel              |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_setcancelstate      |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_setcanceltype       |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_testcancel          |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_getschedparam       |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_setschedparam       |                             |                    |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_yield               | lthread_yield               | See note 7         |\n++-----------------------------+-----------------------------+--------------------+\n+| pthread_setaffinity_np      | lthread_set_affinity        | See notes 2, 3, 8  |\n++-----------------------------+-----------------------------+--------------------+\n+|                             | lthread_sleep               | See note 9         |\n++-----------------------------+-----------------------------+--------------------+\n+|                             | lthread_sleep_clks          | See note 9         |\n++-----------------------------+-----------------------------+--------------------+\n+\n+\n+Note 1:\n+\n+neither lthread_signal nor broadcast may be called concurrently by L-threads\n+running on different schedulers, although multiple L-threads running in the\n+same scheduler may freely perform signal or broadcast operations. L-threads\n+running on the same or different schedulers may always safely wait on a condition\n+variable.\n+\n+\n+Note 2:\n+\n+pthread attributes may be used to affine a pthread with a cpu-set. The L-thread\n+subsystem does not support a cpu-set. An L-thread may be affined only with a\n+single CPU at any time.\n+\n+\n+Note 3:\n+\n+if an L-thread is intended to run on a different NUMA node than the node that\n+creates it then, when calling lthread_create() it is advantageous to specify\n+the destination core as a parameter of lthread_create()\n+See :ref:`memory_allocation_and_NUMA_awareness` for details.\n+\n+\n+Note 4:\n+\n+an L-thread can only detach itself, and cannot detach other L-threads.\n+\n+\n+Note 5:\n+\n+a wait operation on a pthread condition variable is always associated with and\n+protected by a mutex which must be owned by the thread at the time it invokes\n+pthread_wait(). By contrast L-thread condition variables are thread safe\n+(for waiters) and do not use an associated mutex. Multiple L-threads (including\n+L-threads running on other schedulers) can safely wait on a L-thread condition\n+variable. 
As a consequence the performance of an L-thread condition variable is\n+typically an order of magnitude faster than its pthread counterpart.\n+\n+\n+Note 6:\n+\n+recursive locking is not supported with L-threads, attempts to take a lock\n+recursively will be detected and rejected.\n+\n+\n+Note 7:\n+\n+lthread_yield() will save the current context, insert the current thread to the\n+back of the ready queue, and resume the next ready thread. Yielding increases\n+ready queue backlog, see :ref:`ready_queue_backlog` for more details about the\n+implications of this.\n+\n+\n+N.B. The context switch time as measured from immediately before the call to\n+lthread_yield() to the point at which the next ready thread is resumed, can be\n+an order of magnitude faster that the same measurement for pthread_yield.\n+\n+\n+Note 8:\n+\n+lthread_set_affinity() is similar to a yield apart from the fact that the\n+yielding thread is inserted into a peer ready queue of another scheduler.\n+The peer ready queue is actually a separate thread safe queue, which means that\n+threads appearing in the peer ready queue can jump any backlog in the local\n+ready queue on the destination scheduler.\n+\n+The context switch time as measured from the time just before the call to\n+lthread_set_affinity() to just after the same thread is resumed on the new\n+scheduler can be orders of magnitude faster than the same measurement for\n+pthread_setaffinity_np().\n+\n+\n+Note 9:\n+\n+although there is no pthread_sleep() function, lthread_sleep() and\n+lthread_sleep_clks() can be used wherever sleep(), usleep() or  nanosleep()\n+might ordinarily be used. The L-thread sleep functions suspend the current\n+thread, start an rte_timer and resume the thread when the timer matures.\n+The rte_timer_manage() entry point is called on every pass of the scheduler\n+loop. This means that the worst case jitter on timer expiry is determined by\n+the longest period between context switches of any running L-threads.\n+\n+In a synthetic test with many threads sleeping and resuming then the measured\n+jitter is typically orders of magnitude lower than the same measurement made\n+for nanosleep().\n+\n+\n+Note 10:\n+\n+spin locks are not provided because they are problematical in a cooperative\n+environment, see :ref:`porting_locks_and_spinlocks` for a more detailed\n+discussion on how to avoid spin locks.\n+\n+.. _Thread_local_storage_performance:\n+\n+Thread local storage\n+^^^^^^^^^^^^^^^^^^^^\n+\n+Of the three L-thread local storage options the simplest and most efficient is\n+storing a single application data pointer in the L-thread struct.\n+\n+The PER_LTHREAD macros involve a run time computation to obtain the address\n+of the variable being saved/retrieved and also require that the accesses are\n+de-referenced  via a pointer. This means that code that has used\n+RTE_PER_LCORE macros being ported to L-threads might need some slight\n+adjustment (see :ref:`porting_thread_local_storage` for hints about porting\n+code that makes use of thread local storage).\n+\n+The get/set specific APIs are consistent with their pthread counterparts both\n+in use and in performance.\n+\n+.. _memory_allocation_and_NUMA_awareness:\n+\n+Memory allocation and NUMA awareness\n+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+\n+All memory allocation is from DPDK huge pages, and is NUMA aware. Each\n+scheduler maintains its own caches of objects: lthreads, their stacks, TLS,\n+mutexes and condition variables. These caches are implemented as unbounded lock\n+free MPSC queues.  
When objects are created they are always allocated from the\n+caches on the local core (current EAL thread).\n+\n+If an L-thread has affined to a different sheduler, then it can always safely\n+free resources to the caches from which they originated (because the caches are\n+MPSC queues).\n+\n+If the L-thread has affined to a different NUMA node then the memory resources\n+associated with it may incur longer access latency.\n+\n+The commonly used pattern of setting affinity on entry to a thread after it has\n+started, means that memory allocation for both the stack and TLS will have been\n+made from caches on the NUMA node on which the threads creator is running.\n+This has the side effect that access latency will be sub-optimal after\n+affining.\n+\n+This side effect can be mitigated to some extent (although not completely) by\n+specifying the destination CPU as a parameter of lthread_create() this causes\n+the L-thread’s stack and TLS to be allocated when it is first scheduled on the\n+destination scheduler, if the destination is a on another NUMA node it results\n+in a more optimal memory allocation.\n+\n+Note that the lthread struct itself remains allocated from memory on the node\n+creating node, this is unavoidable because an L-thread is known everywhere by\n+the address of this struct.\n+\n+.. _object_cache_sizing:\n+\n+Object cache sizing\n+^^^^^^^^^^^^^^^^^^^\n+\n+The per lcore object caches pre-allocate objects in bulk whenever a request to\n+allocate an object finds a cache empty.  By default 100 objects are\n+pre-allocated, this is defined by LTHREAD_PREALLOC in the public API header\n+file lthread_api.h. This means that the caches constantly grow to meet system\n+demand.\n+\n+In the present implementation there is no mechanism to reduce the cache sizes\n+if system demand reduces. Thus the caches will remain at their maximum extent\n+indefinitely.\n+\n+A consequence of the bulk pre-allocation of objects is that every 100\n+(default value) additional new object create operations results in a call to\n+rte_malloc. For creation of objects such as L-threads, which trigger the\n+allocation of even more objects ( i.e. their stacks and TLS) then this can\n+cause outliers in scheduling performance.\n+\n+If this is a problem the simplest mitigation strategy is to dimension the\n+system, by setting the bulk object pre-allocation size to some large number\n+that you do not expect to be exceeded. This means the caches will be populated\n+once only, the very first time a thread is created.\n+\n+.. _Ready_queue_backlog:\n+\n+Ready queue backlog\n+^^^^^^^^^^^^^^^^^^^\n+\n+One of the more subtle performance considerations is managing the ready queue\n+backlog. The fewer threads that are waiting in the ready queue then the faster\n+any particular thread will get serviced.\n+\n+In a naive L-thread application with N L-threads simply looping and yielding,\n+this backlog will always be equal to the number of L-threads, thus the cost of\n+a yield to a particular L-thread will be N times the context switch time.\n+\n+This side effect can be mitigated by arranging for threads to be suspended and\n+waiting to be resumed, rather than polling for work by constantly yielding.\n+Blocking on a mutex or condition variable or even more obviously having a\n+thread sleep if it has a low frequency workload are all mechanisms by which a\n+thread can be excluded from the ready queue until it really does need to be\n+running.  This can have a significant positive impact on performance.\n+\n+.. 
_Initialization_and_shutdown_dependencies:\n+\n+Initialization, shutdown and dependencies\n+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+\n+The L-thread subsystem depends on DPDK for huge page allocation and depends on\n+the rte_timer subsystem. The DPDK EAL initialization and\n+rte_timer_subsystem_init()  MUST be completed before the L-thread sub system\n+can be used.\n+\n+Thereafter initialization of the L-thread subsystem is largely transparent to\n+the application. Constructor functions ensure that global variables are properly\n+initialized. Other than global variables each scheduler is initialized\n+independently the first time that an L-thread is created by a particular EAL\n+thread.\n+\n+If the schedulers are to be run as isolated and independent schedulers, with\n+no intention that L-threads running on different schedulers will migrate between\n+schedulers or synchronize with L-threads running on other schedulers, then\n+initialization consists simply of creating an L-thread, and then running the\n+L-thread scheduler.\n+\n+If there will be interaction between L-threads running on different schedulers,\n+then it is important that the starting of schedulers on different EAL threads\n+is synchronized.\n+\n+To achieve this an additional initialization step is necessary, this is simply\n+to set the number of schedulers by calling the API function\n+lthread_num_schedulers_set(n), where n = the number of EAL threads that will\n+run L-thread schedulers. Setting the number of schedulers to a number greater\n+than 0 will cause all schedulers to wait until the others have started before\n+beginning to schedule L-threads.\n+\n+The L-thread scheduler is started by calling the function\n+lthread_scheduler_run() and should be called from the EAL thread and thus\n+become the main loop of the EAL thread.\n+\n+The function lthread_scheduler run(), will not return until all threads running\n+on the scheduler have exited, and the scheduler has been explicitly stopped by\n+calling lthread_scheduler_shutdown(lcore) or lthread_scheduler_shutdown_all().\n+\n+All these function do is tell the scheduler that it can exit when there are no\n+longer any running L-threads, neither function forces any running L-thread to\n+terminate.  Any desired application shutdown behavior must be designed and\n+built into the application to ensure that L-threads complete in a timely\n+manner.\n+\n+**Important Note:** It is assumed when the scheduler exits that the application\n+is terminating for good, the scheduler does not free resources before exiting\n+and running the scheduler a subsequent time will result in undefined behavior.\n+\n+.. _porting_legacy_code_to_run_on_lthreads:\n+\n+Porting legacy code to run on L-threads\n+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+Legacy code originally written for a pthread environment may be ported to\n+L-threads if the considerations about differences in scheduling policy, and\n+constraints discussed in the previous sections can be accommodated.\n+\n+This section looks in more detail at some of the issues that may have to be\n+resolved when porting code.\n+\n+.. _pthread_API_compatibility:\n+\n+pthread API compatibility\n+^^^^^^^^^^^^^^^^^^^^^^^^^\n+\n+The first step is to establish exactly which pthread APIs the legacy\n+application uses, and to understand the requirements of those APIs.  
If there\n+are corresponding L-lthread APIs, and where the default pthread functionality\n+is used by the application then, notwithstanding the other issues discussed\n+here, it should be feasible to run the application with L-threads. If the\n+legacy code modifies the default behavior using attributes then if may be\n+necessary to make some adjustments to eliminate those requirements.\n+\n+.. _blocking_system_calls:\n+\n+Blocking system API calls\n+^^^^^^^^^^^^^^^^^^^^^^^^^\n+\n+It is important to understand what other system services the application may be\n+using, bearing in mind that in a cooperatively scheduled environment a thread\n+cannot block without stalling the scheduler and with it all other cooperative\n+threads. Any kind of blocking system call, for example file or socket IO, is a\n+potential problem, a good tool to analyze the application for this purpose is\n+the “strace” utility.\n+\n+There are many strategies to resolve these kind of issues, each with it\n+merits. Possible solutions include:-\n+\n+Adopting a polled mode of the system API concerned (if available).\n+\n+Arranging for another core to perform the function and synchronizing with that\n+core via constructs that will not block the L-thread.\n+\n+Affining the thread to another scheduler devoted (as a matter of policy) to\n+handling threads wishing to make blocking calls, and then back again when\n+finished.\n+\n+\n+.. _porting_locks_and_spinlocks:\n+\n+Locks and spinlocks\n+^^^^^^^^^^^^^^^^^^^\n+\n+Locks and spinlocks are another source of blocking behavior that for the same\n+reasons as system calls will need to be addressed.\n+\n+If the application design ensures that the contending L-threads will always\n+run on the same scheduler then it its probably safe to remove locks and spin\n+locks completely, the only exception to this rule is if for some reason the\n+code performs any kind of context switch whilst holding the lock, this will\n+need to determined before deciding to eliminate a lock.\n+\n+If a lock cannot be eliminated then an L-thread mutex can be substituted for\n+either kind of lock.\n+\n+An L-thread blocking on an L-thread mutex will be suspended and will cause\n+another ready L-thread to be resumed, thus not blocking the scheduler. When\n+default behaviour is required, it can be used as a direct replacement for a\n+pthread mutex lock.\n+\n+Spin locks are typically used when lock contention is likely to be rare and\n+where the period during which the lock is held is relatively short.  When the\n+contending L-threads are running on the same scheduler then an L-thread\n+blocking on a spin lock will enter an infinite loop stopping the scheduler\n+completely (see :ref:`porting_infinite_loops` below ).\n+\n+If the application design ensures that contending L-threads will always run\n+on different schedulers then it might be reasonable to leave a short spin lock\n+that rarely experiences contention in place.\n+\n+If after all considerations it appears that a spin lock can neither be\n+eliminated completely, replaced with an L-thread mutex, or left in place as\n+is, then an alternative is to loop on a flag, with a call to lthread_yield()\n+inside the loop ( n.b. if the contending L-threads might ever run on different\n+schedulers the flag will need to be manipulated atomically ).\n+\n+Spinning and yielding is the least preferred solution since it introduces\n+ready queue backlog ( see also :ref:`ready_queue_backlog`).\n+\n+.. 
_porting_sleeps_and_delays:\n+\n+Sleeps and delays\n+^^^^^^^^^^^^^^^^^\n+\n+Yet another kind of blocking behavior (albeit momentary) are delay functions\n+like sleep(), usleep(), nanosleep() etc. All will have the consequence of\n+stalling the L-thread scheduler and unless the delay is very short ( e.g. a\n+very short nanosleep) calls to these functions will need to be eliminated.\n+\n+The simplest mitigation strategy is to use the L-thread sleep API functions,\n+of which two variants exist, lthread_sleep()  and lthread_sleep_clks().\n+These functions start an rte_timer against the L-thread, suspend the L-thread\n+and cause another ready L-thread to be resumed. The suspended L-thread is\n+resumed when the rte_timer matures.\n+\n+.. _porting_infinite_loops:\n+\n+Infinite loops\n+^^^^^^^^^^^^^^\n+\n+Some applications have threads with loops that contain no inherent\n+rescheduling opportunity, and rely solely on the OS time slicing to share\n+the CPU.  In a cooperative environment this will stop everything dead. These\n+kind of loops are not hard to identify, in a debug session you will find the\n+debugger is always stopping in the same loop.\n+\n+The simplest solution to this kind of problem is to insert an explicit\n+lthread_yield() or lthread_sleep()  into the loop. Another solution might be\n+to include the function performed by the loop into the execution path of some\n+other loop that does in fact yield, if this is possible.\n+\n+.. _porting_thread_local_storage:\n+\n+Thread local storage\n+^^^^^^^^^^^^^^^^^^^^\n+\n+If the application uses thread local storage, the use case should be\n+studied carefully.\n+\n+In a legacy pthread application either or both the __thread  prefix, or the\n+pthread set/get specific APIs may have been used to define storage local\n+to a pthread.\n+\n+In some applications it may be a reasonable assumption that the data could\n+or in fact most likely should be placed in L-thread local storage.\n+\n+If the application (like many DPDK applications) has assumed a certain\n+relationship between a pthread and the CPU to which it is affined, there is\n+a risk that thread local storage may have been used to save some data items\n+that are correctly logically associated with the CPU, and others items which\n+relate to application context for the thread.  Only a good understanding of\n+the application will reveal such cases.\n+\n+If the application requires an that an L-thread is to be able to move between\n+schedulers then care should be taken to separate these kinds of data, into per\n+lcore, and per L-thread storage. In this way a migrating thread will bring with\n+it the local data it needs, and pick up the new logical core specific values\n+from pthread local storage at its new home.\n+\n+.. _pthread_shim:\n+\n+Pthread shim\n+~~~~~~~~~~~~\n+\n+A convenient way to get something working with legacy code can be to use a\n+shim that adapts pthread API calls to the corresponding L-thread ones.\n+This approach will not mitigate any of the porting considerations mentioned\n+in the previous sections, but it will reduce the amount of code churn that\n+would otherwise been involved. It is a reasonable approach to evaluate\n+L-threads, before investing effort in porting to the native L-thread APIs.\n+\n+Overview\n+^^^^^^^^\n+The L-thread subsystem includes an example pthread shim. This is a partial\n+implementation but does contain the API stubs needed to get basic applications\n+running.  
\n+\n+.. _pthread_shim:\n+\n+Pthread shim\n+~~~~~~~~~~~~\n+\n+A convenient way to get something working with legacy code can be to use a\n+shim that adapts pthread API calls to the corresponding L-thread ones.\n+This approach will not mitigate any of the porting considerations mentioned\n+in the previous sections, but it will reduce the amount of code churn that\n+would otherwise have been involved. It is a reasonable way to evaluate\n+L-threads before investing effort in porting to the native L-thread APIs.\n+\n+Overview\n+^^^^^^^^\n+\n+The L-thread subsystem includes an example pthread shim. This is a partial\n+implementation but does contain the API stubs needed to get basic applications\n+running. There is a simple “hello world” application that demonstrates the\n+use of the pthread shim.\n+\n+A subtlety of working with a shim is that the application will still need\n+to make use of the genuine pthread library functions, at the very least in\n+order to create the EAL threads in which the L-thread schedulers will run.\n+This is the case with DPDK initialization and exit.\n+\n+To deal with the initialization and shutdown scenarios, the shim is capable\n+of switching its adaptor functionality on or off; an application can control\n+this behavior by calling the function pt_override_set(). The default state\n+is disabled.\n+\n+(Note: bearing in mind the preceding discussions about the impact of making\n+blocking system API calls in a cooperative environment, switching the shim in\n+and out on the fly in order to invoke a pthread API that might block is\n+something that should typically be avoided.)\n+\n+The pthread shim uses the dynamic linker/loader and saves the loaded addresses\n+of the genuine pthread API functions in an internal table. When the shim\n+functionality is enabled it performs the adaptor function; when disabled it\n+invokes the genuine pthread function.\n+\n+The function pthread_exit() has additional special handling. The standard\n+system header file pthread.h declares pthread_exit()\n+with __attribute__((noreturn)). This is an optimization that is possible\n+because the pthread is terminating, and it enables the compiler to omit the\n+normal handling of the stack and protection of registers since the function is\n+not expected to return, and in fact the thread is being destroyed.\n+These optimizations are applied in both the callee and the caller of the\n+pthread_exit() function.\n+\n+In our cooperative scheduling environment this behavior is inadmissible.\n+The pthread is the L-thread scheduler thread, and, although an L-thread is\n+terminating, there must be a return to the scheduler in order that the system\n+can continue to run. Further, returning from a function with attribute noreturn\n+is invalid and may result in undefined behavior.\n+\n+The solution is to redefine the pthread_exit function with a macro, causing it\n+to be mapped to a stub function in the shim that does not have the (noreturn)\n+attribute. This macro is defined in the file pthread_shim.h. The stub function\n+is otherwise no different from any of the other stub functions in the shim,\n+and will switch between the real pthread_exit() function and the lthread_exit()\n+function as required. The only difference is that the mapping to the stub is\n+done by macro substitution.\n+\n+A consequence of this is that the file pthread_shim.h must be included in\n+legacy code wishing to make use of the shim. It also means that dynamic linkage\n+of a pre-compiled binary that did not include pthread_shim.h is not supported.\n+\n+Given the requirements for porting legacy code outlined in\n+:ref:`porting_legacy_code_to_run_on_lthreads`, most applications will require\n+at least some minimal adjustment and recompilation to run on L-threads, so\n+pre-compiled binaries are unlikely to be encountered in practice.\n+\n+In summary, the shim approach adds some overhead but can be a useful tool to\n+help establish the feasibility of a code reuse project. It is also a fairly\n+straightforward task to extend the shim if necessary.
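\n+\n+The following is a hedged illustration of how the shim might be used; the\n+worker function is hypothetical and the exact prototype of pt_override_set()\n+should be checked against pthread_shim.h in the sample. The caller is assumed\n+to be running as an L-thread on a scheduler:\n+\n+.. code-block:: c\n+\n+    #include <pthread.h>\n+    #include \"pthread_shim.h\"\n+\n+    static void *worker(void *arg)\n+    {\n+        /* Created through the shim, so this runs as an L-thread. */\n+        (void)arg;\n+        return NULL;\n+    }\n+\n+    /* Assumed to be invoked from within an L-thread. */\n+    static void shim_user(void)\n+    {\n+        pthread_t tid;\n+\n+        pt_override_set(1);   /* route pthread_* calls to L-threads */\n+\n+        pthread_create(&tid, NULL, worker, NULL);\n+        pthread_join(tid, NULL);\n+\n+        pt_override_set(0);   /* restore the genuine pthread API */\n+    }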
\n+\n+Building and running the pthread shim\n+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+\n+The pthread shim example application is located in the performance-thread\n+folder of the DPDK sample applications.\n+\n+To build and run the pthread shim example:\n+\n+#.   Go to the example application folder:\n+\n+    .. code-block:: console\n+\n+\texport RTE_SDK=/path/to/rte_sdk\n+\tcd ${RTE_SDK}/examples/performance-thread/pthread_shim\n+\n+#.   Set the target (a default target is used if not specified). For example:\n+\n+    .. code-block:: console\n+\n+\texport RTE_TARGET=x86_64-native-linuxapp-gcc\n+\n+    See the DPDK Getting Started Guide for possible RTE_TARGET values.\n+\n+#.   Build the application:\n+\n+    .. code-block:: console\n+\n+\tmake\n+\n+#.   To run the pthread_shim example:\n+\n+    .. code-block:: console\n+\n+\tlthread-pthread-shim -c <core mask> -n <number of channels>\n+\n+.. _lthread_diagnostics:\n+\n+L-thread Diagnostics\n+~~~~~~~~~~~~~~~~~~~~\n+\n+When debugging, you must take account of the fact that the L-threads are run\n+in a single pthread. The current scheduler is defined by\n+RTE_PER_LCORE(this_sched), and the current lthread is stored at\n+RTE_PER_LCORE(this_sched)->current_lthread.\n+Thus on a breakpoint in a GDB session the current lthread can be obtained by\n+displaying the pthread local variable \"per_lcore_this_sched->current_lthread\".\n+\n+Another useful diagnostic feature is the ability to trace significant events\n+in the life of an L-thread; this feature is enabled by changing the value of\n+LTHREAD_DIAG from 0 to 1 in the file lthread_diag_api.h.\n+\n+Tracing of events can be individually masked, and the mask may be programmed\n+at run time. An unmasked event results in a callback that provides information\n+about the event. The default callback simply prints trace information.\n+The default mask is 0 (all events off); the mask can be modified by calling\n+the function lthread_diagnostic_set_mask().\n+\n+It is possible to register a user callback to implement more sophisticated\n+diagnostic functions.\n+Object creation events (lthread, mutex, and condition variable) accept, and\n+store in the created object, a user supplied reference value from the callback\n+function.\n+\n+The reference value is passed back in all subsequent event callbacks\n+pertaining to the object, enabling a user to monitor or count specific events\n+on specific objects, for example to watch for a specific thread signalling a\n+specific condition variable, or to monitor all timer events; the possibilities\n+and combinations are endless.\n+\n+The callback can be set by calling the function lthread_diagnostic_enable(),\n+supplying a callback and an event mask.\n+\n+Setting LTHREAD_DIAG also enables counting of statistics about cache and\n+queue usage, and these statistics can be displayed by calling the function\n+lthread_diag_stats_display(). This function also performs a consistency check\n+on the caches and queues. The function should only be called from the master\n+EAL thread after all slave threads have stopped and returned to the C main\n+program, otherwise the consistency check will fail.\n",
    "prefixes": [
        "dpdk-dev",
        "v1",
        "1/5"
    ]
}