[v2] ring: fix c11 memory ordering issue

From: He, Jia <jia.he@hxt-semitech.com>

  This patch includes two bug fixes(#1 and 2) and two optimisations(#3
and #4).
1) In update_tail, read ht->tail using __atomic_load.Although the
   compiler currently seems to be doing the right thing even without
   _atomic_load, we don't want to give the compiler freedom to optimise
   what should be an atomic load, it should not be arbitarily moved
   around.
2) Synchronize the load-acquire of the tail and the store-release
   within update_tail, the store release ensures all the ring operations,
   engqueu or dequeue are seen by the observers as soon as they see
   the updated tail. The load-acquire is required for correctly compu-
   tate the free_entries or avail_entries, respectively for enqueue and
   dequeue operations, the data dependency is not reliable for ordering
   as the compiler might break it by saving to temporary values to boost
   performance.
3) In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
   the do {} while loop as upon failure the old_head will be updated,
   another load is costy and not necessary.
4) When calling __atomic_compare_exchange_n, relaxed ordering for both
   success and failure, as multiple threads can work independently on
   the same end of the ring (either enqueue or dequeue) without
   synchronization, not as operating on tail, which has to be finished
   in sequence.

The patch was benchmarked with test/ring_perf_autotest and it decreases
the enqueue/dequeue latency by 5% ~ 24.6% with two lcores, the real gains
are dependent on the number of lcores, depth of the ring, SPSC or MPMC.
For 1 lcore, it also improves a little, about 3 ~ 4%.
It is a big improvement, in case of MPMC, with rings size of 32, it saves
latency up to (6.90-5.20)/6.90 = 24.6%.

Test result data:

SP/SC bulk enq/dequeue (size: 8): 13.19
MP/MC bulk enq/dequeue (size: 8): 25.79
SP/SC bulk enq/dequeue (size: 32): 3.85
MP/MC bulk enq/dequeue (size: 32): 6.90

SP/SC bulk enq/dequeue (size: 8): 12.05
MP/MC bulk enq/dequeue (size: 8): 23.06
SP/SC bulk enq/dequeue (size: 32): 3.62
MP/MC bulk enq/dequeue (size: 32): 5.20

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
---
 lib/librte_ring/rte_ring_c11_mem.h | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

Message ID	20180807031943.5331-1-gavin.hu@arm.com (mailing list archive)
State	Superseded, archived
Delegated to:	Thomas Monjalon
Headers	From: Gavin Hu <gavin.hu@arm.com> To: dev@dpdk.org Cc: gavin.hu@arm.com, Honnappa.Nagarahalli@arm.com, steve.capper@arm.com, Ola.Liljedahl@arm.com, jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com, jia.he@hxt-semitech.com, stable@dpdk.org Date: Tue, 7 Aug 2018 11:19:43 +0800 Message-Id: <20180807031943.5331-1-gavin.hu@arm.com> In-Reply-To: <20180806011805.7857-1-gavin.hu@arm.com> References: <20180806011805.7857-1-gavin.hu@arm.com> Subject: [dpdk-dev] [PATCH v2] ring: fix c11 memory ordering issue Precedence: list Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org>
Series	[v2] ring: fix c11 memory ordering issue \| [v2] ring: fix c11 memory ordering issue

Context	Check	Description
ci/checkpatch	success	coding style OK
ci/Intel-compilation	success	Compilation OK

[v2] ring: fix c11 memory ordering issue

Checks

Commit Message

Comments

Patch