
[v8,0/3] generic spinlock optimization and test case enhancements

Message ID 1552031797-146710-1-git-send-email-gavin.hu@arm.com (mailing list archive)

Message

Gavin Hu March 8, 2019, 7:56 a.m. UTC
  V8: Remove internal ChangeId

V7: Update the 1/3 patch headline and commit message

V6: Rebase and drop the first patch as a similar fix was already merged.

V5: Remove ChangeId (sorry for that)

V4:
1. Drop the test-case patch for getting time precisely, as the overhead
   of getting time is already amortized in another patch.
2. Drop the ticket lock patch from this series as there is no dependency
   between them; the ticket lock patch was submitted separately:
   http://patchwork.dpdk.org/patch/49770/
3. Define a volatile variable in patch #3 to be more realistic for spinlock
   protection (avoid optimization by the compiler).
4. Fix typos.

V3:
1. Implemented the ticket lock to improve fairness and predictability.
   Locks are obtained in the order in which they are requested.
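
To make the ordering property concrete, here is a minimal ticket lock sketch in C using the GCC/Clang __atomic builtins. It is not the ticket lock patch referenced in this thread; the type and function names (sketch_ticketlock_t, sketch_ticket_take and so on) are hypothetical and chosen only for illustration. Each caller takes a ticket number and waits until that number is served, which is what gives first-come, first-served ordering.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the DPDK ticket lock API. */
typedef struct {
    uint16_t next;    /* next ticket number to hand out */
    uint16_t current; /* ticket number currently being served */
} sketch_ticketlock_t;

/* Take a ticket. The atomic increment gives every caller a unique,
 * monotonically increasing number (wrapping at 16 bits). */
static inline uint16_t sketch_ticket_take(sketch_ticketlock_t *tl)
{
    return __atomic_fetch_add(&tl->next, 1, __ATOMIC_RELAXED);
}

/* Spin until our ticket is served. The acquire load keeps accesses in
 * the critical section from moving before the lock is obtained. */
static inline void sketch_ticket_wait(sketch_ticketlock_t *tl, uint16_t me)
{
    while (__atomic_load_n(&tl->current, __ATOMIC_ACQUIRE) != me)
        ;
}

/* Lock = take a ticket, then wait for it to come up. */
static inline void sketch_ticket_lock(sketch_ticketlock_t *tl)
{
    sketch_ticket_wait(tl, sketch_ticket_take(tl));
}

/* Unlock = serve the next ticket. Only the lock holder writes
 * `current`, so a relaxed load followed by a release store suffices. */
static inline void sketch_ticket_unlock(sketch_ticketlock_t *tl)
{
    uint16_t cur = __atomic_load_n(&tl->current, __ATOMIC_RELAXED);
    __atomic_store_n(&tl->current, (uint16_t)(cur + 1), __ATOMIC_RELEASE);
}
```

Splitting the lock into "take" and "wait" is only done here to make the ordering visible: whoever takes the lower ticket number is always served first.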

V2:
1. FORCE_INTRINSICS is still an option for ppc/x86, although not in use
   by default, so don't remove it from the generic file.
2. Fix the clang compiler error on x86 when the above FORCE_INTRINSICS
   is enabled.

V1:
1. Remove the 1us delay outside of the locked region to really benchmark
   the spinlock acquire/release performance, not the delay API.
2. Use the precise version of getting timestamps for more precise
   benchmarking results.
3. Amortize the overhead of getting the timestamp over 10000 loops.
4. Move the arm-specific implementation to the arm folder to remove the
   hardcoded implementation.
5. Use atomic primitives, which translate to one-way barriers, instead of
   two-way sync primitives, to optimize for performance.
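
The ideas in the list above can be sketched in C as follows. This is an illustration under stated assumptions, not the code from the patches: the names (sketch_spinlock_t, sketch_bench_ns_per_iter, sketch_counter) are hypothetical, and POSIX clock_gettime() stands in for whatever timestamp source the test case really uses. The lock uses the GCC/Clang __atomic builtins with acquire ordering on lock and release ordering on unlock (one-way barriers), and the benchmark reads the clock only twice around the whole loop so that the cost of getting the time is amortized over all iterations.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() in strict C modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical names for illustration; this is not the rte_spinlock API. */
typedef struct {
    int locked; /* 0 = free, 1 = held */
} sketch_spinlock_t;

static inline void sketch_spin_lock(sketch_spinlock_t *sl)
{
    int expected = 0;

    /* Acquire ordering on success: accesses in the critical section
     * cannot be reordered before the lock is taken. */
    while (!__atomic_compare_exchange_n(&sl->locked, &expected, 1, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
        /* Wait with relaxed loads until the lock looks free. */
        while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
            ;
        expected = 0; /* the failed exchange overwrote it */
    }
}

static inline void sketch_spin_unlock(sketch_spinlock_t *sl)
{
    /* Release ordering: critical-section accesses complete before the
     * lock is seen as free. */
    __atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}

/* volatile keeps the compiler from collapsing the increments below. */
static volatile uint64_t sketch_counter;

/* Time `iterations` lock/unlock pairs with only two clock reads and no
 * delay inside or outside the locked region.
 * Returns the average number of nanoseconds per iteration. */
static double sketch_bench_ns_per_iter(sketch_spinlock_t *sl,
                                       uint64_t iterations)
{
    struct timespec begin, end;
    uint64_t i;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &begin);
    for (i = 0; i < iterations; i++) {
        sketch_spin_lock(sl);
        sketch_counter++;
        sketch_spin_unlock(sl);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (double)(end.tv_sec - begin.tv_sec) * 1e9 +
                (double)(end.tv_nsec - begin.tv_nsec);
    return ns / (double)iterations;
}
```

Calling sketch_bench_ns_per_iter(&sl, 10000) from a single thread only shows the structure of the measurement; a contended benchmark would run the same loop on several cores at once.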

Gavin Hu (3):
  test/spinlock: remove 1us delay for correct benchmarking
  test/spinlock: amortize the cost of getting time
  spinlock: reimplement with atomic one-way barrier builtins

 app/test/test_spinlock.c                           | 31 +++++++++++-----------
 .../common/include/generic/rte_spinlock.h          | 18 +++++++++----
 2 files changed, 29 insertions(+), 20 deletions(-)
  

Comments

Nipun Gupta March 11, 2019, 12:21 p.m. UTC | #1
> -----Original Message-----
> From: Gavin Hu [mailto:gavin.hu@arm.com]
> Sent: Friday, March 8, 2019 1:27 PM
> To: dev@dpdk.org
> Cc: nd@arm.com; thomas@monjalon.net; jerinj@marvell.com; Hemant
> Agrawal <hemant.agrawal@nxp.com>; Nipun Gupta
> <nipun.gupta@nxp.com>; Honnappa.Nagarahalli@arm.com;
> gavin.hu@arm.com; i.maximets@samsung.com;
> chaozhu@linux.vnet.ibm.com
> Subject: [PATCH v8 0/3] generic spinlock optimization and test case
> enhancements
> 
...

> 
> Gavin Hu (3):
>   test/spinlock: remove 1us delay for correct benchmarking
>   test/spinlock: amortize the cost of getting time
>   spinlock: reimplement with atomic one-way barrier builtins
> 
>  app/test/test_spinlock.c                           | 31 +++++++++++-----------
>  .../common/include/generic/rte_spinlock.h          | 18 +++++++++----
>  2 files changed, 29 insertions(+), 20 deletions(-)
> 
> --

Seems good.

Series-Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
  
Ananyev, Konstantin March 15, 2019, 12:21 p.m. UTC | #2
> 
> V8: Remove internal ChangeId
> 
> V7: Update the 1/3 patch headline and commit message
> 
> V6: Rebase and drop the first patch as a similar fix was already merged.
> 
> V5: Remove ChangeId (sorry for that)
> 
> V4:
> 1. Drop the test-case patch for getting time precisely, as the overhead
>    of getting time is already amortized in another patch.
> 2. Drop the ticket lock patch from this series as there is no dependency
>    between them; the ticket lock patch was submitted separately:
>    http://patchwork.dpdk.org/patch/49770/
> 3. Define a volatile variable in patch #3 to be more realistic for spinlock
>    protection (avoid optimization by the compiler).
> 4. Fix typos.
> 
> V3:
> 1. Implemented the ticket lock to improve fairness and predictability.
>    Locks are obtained in the order in which they are requested.
> 
> V2:
> 1. FORCE_INTRINSICS is still an option for ppc/x86, although not in use
>    by default, so don't remove it from the generic file.
> 2. Fix the clang compiler error on x86 when the above FORCE_INTRINSICS
>    is enabled.
> 
> V1:
> 1. Remove the 1us delay outside of the locked region to really benchmark
>    the spinlock acquire/release performance, not the delay API.
> 2. Use the precise version of getting timestamps for more precise
>    benchmarking results.
> 3. Amortize the overhead of getting the timestamp over 10000 loops.
> 4. Move the arm-specific implementation to the arm folder to remove the
>    hardcoded implementation.
> 5. Use atomic primitives, which translate to one-way barriers, instead of
>    two-way sync primitives, to optimize for performance.
> 
> Gavin Hu (3):
>   test/spinlock: remove 1us delay for correct benchmarking
>   test/spinlock: amortize the cost of getting time
>   spinlock: reimplement with atomic one-way barrier builtins
> 
>  app/test/test_spinlock.c                           | 31 +++++++++++-----------
>  .../common/include/generic/rte_spinlock.h          | 18 +++++++++----
>  2 files changed, 29 insertions(+), 20 deletions(-)
> 
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.7.4
  
Thomas Monjalon March 28, 2019, 7:47 a.m. UTC | #3
> Gavin Hu (3):
>   test/spinlock: remove 1us delay for correct benchmarking
>   test/spinlock: amortize the cost of getting time
>   spinlock: reimplement with atomic one-way barrier builtins

Applied, thanks