From patchwork Tue Mar 10 17:49:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Phil Yang X-Patchwork-Id: 66515 Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 414E9A0566; Tue, 10 Mar 2020 18:50:03 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 5BF811C021; Tue, 10 Mar 2020 18:49:59 +0100 (CET) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by dpdk.org (Postfix) with ESMTP id 578151C01E for ; Tue, 10 Mar 2020 18:49:58 +0100 (CET) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CDEAD1FB; Tue, 10 Mar 2020 10:49:57 -0700 (PDT) Received: from phil-VirtualBox.arm.com (A010647.Arm.com [10.170.243.28]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 413583F534; Tue, 10 Mar 2020 10:49:54 -0700 (PDT) From: Phil Yang To: thomas@monjalon.net, harry.van.haaren@intel.com, konstantin.ananyev@intel.com, stephen@networkplumber.org, maxime.coquelin@redhat.com, dev@dpdk.org Cc: david.marchand@redhat.com, jerinj@marvell.com, hemant.agrawal@nxp.com, Honnappa.Nagarahalli@arm.com, gavin.hu@arm.com, ruifeng.wang@arm.com, joyce.kong@arm.com, nd@arm.com Date: Wed, 11 Mar 2020 01:49:02 +0800 Message-Id: <1583862551-2049-2-git-send-email-phil.yang@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1583862551-2049-1-git-send-email-phil.yang@arm.com> References: <1583862551-2049-1-git-send-email-phil.yang@arm.com> Subject: [dpdk-dev] [PATCH 01/10] doc: add generic atomic deprecation section X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add deprecating the generic rte_atomic_xx APIs to c11 atomic built-ins guide and examples. Suggested-by: Honnappa Nagarahalli Signed-off-by: Phil Yang Reviewed-by: Gavin Hu --- doc/guides/prog_guide/writing_efficient_code.rst | 60 +++++++++++++++++++++++- 1 file changed, 59 insertions(+), 1 deletion(-) diff --git a/doc/guides/prog_guide/writing_efficient_code.rst b/doc/guides/prog_guide/writing_efficient_code.rst index 849f63e..b278bc6 100644 --- a/doc/guides/prog_guide/writing_efficient_code.rst +++ b/doc/guides/prog_guide/writing_efficient_code.rst @@ -167,7 +167,13 @@ but with the added cost of lower throughput. Locks and Atomic Operations --------------------------- -Atomic operations imply a lock prefix before the instruction, +This section describes some key considerations when using locks and atomic +operations in the DPDK environment. + +Locks +~~~~~ + +On x86, atomic operations imply a lock prefix before the instruction, causing the processor's LOCK# signal to be asserted during execution of the following instruction. This has a big impact on performance in a multicore environment. @@ -176,6 +182,58 @@ It can often be replaced by other solutions like per-lcore variables. Also, some locking techniques are more efficient than others. For instance, the Read-Copy-Update (RCU) algorithm can frequently replace simple rwlocks. +Atomic Operations: Use C11 Atomic Built-ins +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +DPDK `generic rte_atomic `_ operations are +implemented by `__sync built-ins `_. +These __sync built-ins result in full barriers on aarch64, which are unnecessary +in many use cases. They can be replaced by `__atomic built-ins `_ that +conform to the C11 memory model and provide finer memory order control. + +So replacing the rte_atomic operations with __atomic built-ins might improve +performance for aarch64 machines. `More details `_. + +Some typical optimization cases are listed below: + +Atomicity +^^^^^^^^^ + +Some use cases require atomicity alone, the ordering of the memory operations +does not matter. For example the packets statistics in the `vhost `_ example application. + +It just updates the number of transmitted packets, no subsequent logic depends +on these counters. So the RELAXED memory ordering is sufficient: + +.. code-block:: c + + static __rte_always_inline void + virtio_xmit(struct vhost_dev *dst_vdev, struct vhost_dev *src_vdev, + struct rte_mbuf *m) + { + ... + ... + if (enable_stats) { + __atomic_add_fetch(&dst_vdev->stats.rx_total_atomic, 1, __ATOMIC_RELAXED); + __atomic_add_fetch(&dst_vdev->stats.rx_atomic, ret, __ATOMIC_RELAXED); + ... + } + } + +One-way Barrier +^^^^^^^^^^^^^^^ + +Some use cases allow for memory reordering in one way while requiring memory +ordering in the other direction. + +For example, the memory operations before the `lock `_ can move to the +critical section, but the memory operations in the critical section cannot move +above the lock. In this case, the full memory barrier in the CAS operation can +be replaced to ACQUIRE. On the other hand, the memory operations after the +`unlock `_ can move to the critical section, but the memory operations in the +critical section cannot move below the unlock. So the full barrier in the STORE +operation can be replaced with RELEASE. + Coding Considerations ---------------------