[dpdk-dev,1/5] bonding: replace spinlock with read/write lock

Message ID 1462461300-9962-2-git-send-email-bernard.iremonger@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Bruce Richardson

Commit Message

Iremonger, Bernard May 5, 2016, 3:14 p.m. UTC
  Fixes: a45b288ef21a ("bond: support link status polling")
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
---
 drivers/net/bonding/rte_eth_bond_api.c     | 10 +++---
 drivers/net/bonding/rte_eth_bond_pmd.c     | 57 +++++++++++++++---------------
 drivers/net/bonding/rte_eth_bond_private.h |  6 ++--
 3 files changed, 36 insertions(+), 37 deletions(-)
  

Comments

Stephen Hemminger May 5, 2016, 5:12 p.m. UTC | #1
On Thu,  5 May 2016 16:14:56 +0100
Bernard Iremonger <bernard.iremonger@intel.com> wrote:

> Fixes: a45b288ef21a ("bond: support link status polling")
> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>

You know an uncontested reader/writer lock is significantly slower
than a spinlock.
  
Doherty, Declan May 6, 2016, 10:32 a.m. UTC | #2
On 05/05/16 18:12, Stephen Hemminger wrote:
> On Thu,  5 May 2016 16:14:56 +0100
> Bernard Iremonger <bernard.iremonger@intel.com> wrote:
>
>> Fixes: a45b288ef21a ("bond: support link status polling")
>> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
>
> You know an uncontested reader/writer lock is significantly slower
> than a spinlock.
>

As we can have multiple readers of the active slave list / primary 
slave, basically any tx/rx burst call needs to protect against a device 
being removed/closed during its operation now that we support 
hotplugging. In the worst case this could mean we have 2 (rx+tx) * queues 
possibly using the active slave list simultaneously; in that case I 
would have thought that a spinlock would have a much more significant 
effect on performance?
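
To make the contrast concrete, a minimal sketch of the locking pattern being
discussed (the struct and function names are simplified stand-ins, not code
from this series): the per-queue rx/tx bursts take the read lock around their
use of the active slave list, while hotplug add/remove takes the write lock.

#include <rte_rwlock.h>
#include <rte_ethdev.h>

struct bond_ctx {                   /* simplified stand-in for bond_dev_private */
	rte_rwlock_t rwlock;
	uint8_t active_slaves[RTE_MAX_ETHPORTS];
	uint8_t active_slave_count;
};

/* Data path: many concurrent readers, e.g. every rx/tx burst on every queue. */
static uint16_t
bond_rx_sketch(struct bond_ctx *ctx, uint16_t queue_id,
		struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t nb_rx = 0;
	uint8_t i;

	rte_rwlock_read_lock(&ctx->rwlock);
	for (i = 0; i < ctx->active_slave_count && nb_rx < nb_pkts; i++)
		nb_rx += rte_eth_rx_burst(ctx->active_slaves[i], queue_id,
				pkts + nb_rx, nb_pkts - nb_rx);
	rte_rwlock_read_unlock(&ctx->rwlock);

	return nb_rx;
}

/* Control path: exclusive writer, e.g. slave remove on hotplug. */
static void
bond_remove_slave_sketch(struct bond_ctx *ctx, uint8_t slave_port)
{
	uint8_t i, j;

	rte_rwlock_write_lock(&ctx->rwlock);
	for (i = 0, j = 0; i < ctx->active_slave_count; i++)
		if (ctx->active_slaves[i] != slave_port)
			ctx->active_slaves[j++] = ctx->active_slaves[i];
	ctx->active_slave_count = j;
	rte_rwlock_write_unlock(&ctx->rwlock);
}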
  
Stephen Hemminger May 6, 2016, 3:55 p.m. UTC | #3
On Fri, 6 May 2016 11:32:19 +0100
Declan Doherty <declan.doherty@intel.com> wrote:

> On 05/05/16 18:12, Stephen Hemminger wrote:
> > On Thu,  5 May 2016 16:14:56 +0100
> > Bernard Iremonger <bernard.iremonger@intel.com> wrote:
> >
> >> Fixes: a45b288ef21a ("bond: support link status polling")
> >> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> >
> > You know an uncontested reader/writer lock is significantly slower
> > than a spinlock.
> >
> 
> As we can have multiple readers of the active slave list / primary 
> slave, basically any tx/rx burst call needs to protect against a device 
> being removed/closed during it's operation now that we support 
> hotplugging, in the worst case this could mean we have 2(rx+tx) * queues 
> possibly using the active slave list simultaneously, in that case I 
> would have thought that a spinlock would have a much more significant 
> affect on performance?

Right, but the window where the shared variable is accessed is very small,
and it is actually faster to use a spinlock for that.
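
Whether the uncontended fast paths actually differ here is easy to measure
directly; a rough standalone micro-benchmark sketch (not part of this
patchset, run from an EAL thread) could look like:

#include <stdio.h>
#include <rte_cycles.h>
#include <rte_spinlock.h>
#include <rte_rwlock.h>

#define ITERS 1000000

/* Time ITERS uncontended lock/unlock pairs for both lock types. */
static void
lock_cost_sketch(void)
{
	rte_spinlock_t sl;
	rte_rwlock_t rwl;
	uint64_t start, sl_cycles, rwl_cycles;
	int i;

	rte_spinlock_init(&sl);
	rte_rwlock_init(&rwl);

	start = rte_rdtsc();
	for (i = 0; i < ITERS; i++) {
		rte_spinlock_lock(&sl);
		rte_spinlock_unlock(&sl);
	}
	sl_cycles = rte_rdtsc() - start;

	start = rte_rdtsc();
	for (i = 0; i < ITERS; i++) {
		rte_rwlock_read_lock(&rwl);
		rte_rwlock_read_unlock(&rwl);
	}
	rwl_cycles = rte_rdtsc() - start;

	printf("spinlock: %.1f cycles/pair, rwlock(read): %.1f cycles/pair\n",
			(double)sl_cycles / ITERS, (double)rwl_cycles / ITERS);
}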
  
Ananyev, Konstantin May 13, 2016, 5:10 p.m. UTC | #4
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Friday, May 06, 2016 4:56 PM
> To: Doherty, Declan
> Cc: Iremonger, Bernard; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/5] bonding: replace spinlock with read/write lock
> 
> On Fri, 6 May 2016 11:32:19 +0100
> Declan Doherty <declan.doherty@intel.com> wrote:
> 
> > On 05/05/16 18:12, Stephen Hemminger wrote:
> > > On Thu,  5 May 2016 16:14:56 +0100
> > > Bernard Iremonger <bernard.iremonger@intel.com> wrote:
> > >
> > >> Fixes: a45b288ef21a ("bond: support link status polling")
> > >> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> > >
> > > You know an uncontested reader/writer lock is significantly slower
> > > than a spinlock.
> > >
> >
> > As we can have multiple readers of the active slave list / primary
> > slave, basically any tx/rx burst call needs to protect against a device
> > being removed/closed during it's operation now that we support
> > hotplugging, in the worst case this could mean we have 2(rx+tx) * queues
> > possibly using the active slave list simultaneously, in that case I
> > would have thought that a spinlock would have a much more significant
> > affect on performance?
> 
> Right, but the window where the shared variable is accessed is very small,
> and it is actually faster to use spinlock for that.

I don't think the window we hold the lock for is that small: say we have
a burst of 32 packets * (say) 50 cycles/pkt = ~1600 cycles during which each IO thread would stall.
For me that's long enough to justify rwlock usage here, especially as the
DPDK rwlock price is not much bigger (as I remember) than a spinlock's -
it is basically 1 CAS operation.

Konstantin
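
The window in question is the whole slave burst, not just the reads of the
slave list, as in this simplified sketch of a bonded tx path (illustrative
only, not the driver's actual tx_burst code):

#include <rte_rwlock.h>
#include <rte_ethdev.h>

/* The read lock is held across the entire slave tx burst, so the
 * critical section scales with burst size (roughly 32 pkts * ~50
 * cycles/pkt), not with the handful of loads that read the list. */
static uint16_t
bond_tx_sketch(rte_rwlock_t *lock, const uint8_t *active_slaves,
		uint8_t active_count, uint16_t queue_id,
		struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t nb_tx = 0;

	rte_rwlock_read_lock(lock);
	if (active_count > 0)
		nb_tx = rte_eth_tx_burst(active_slaves[0], queue_id,
				pkts, nb_pkts);
	rte_rwlock_read_unlock(lock);

	return nb_tx;
}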
  
Ananyev, Konstantin May 13, 2016, 5:18 p.m. UTC | #5
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Friday, May 13, 2016 6:11 PM
> To: Stephen Hemminger; Doherty, Declan
> Cc: Iremonger, Bernard; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/5] bonding: replace spinlock with read/write lock
> 
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> > Sent: Friday, May 06, 2016 4:56 PM
> > To: Doherty, Declan
> > Cc: Iremonger, Bernard; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 1/5] bonding: replace spinlock with read/write lock
> >
> > On Fri, 6 May 2016 11:32:19 +0100
> > Declan Doherty <declan.doherty@intel.com> wrote:
> >
> > > On 05/05/16 18:12, Stephen Hemminger wrote:
> > > > On Thu,  5 May 2016 16:14:56 +0100
> > > > Bernard Iremonger <bernard.iremonger@intel.com> wrote:
> > > >
> > > >> Fixes: a45b288ef21a ("bond: support link status polling")
> > > >> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> > > >
> > > > You know an uncontested reader/writer lock is significantly slower
> > > > than a spinlock.
> > > >
> > >
> > > As we can have multiple readers of the active slave list / primary
> > > slave, basically any tx/rx burst call needs to protect against a device
> > > being removed/closed during it's operation now that we support
> > > hotplugging, in the worst case this could mean we have 2(rx+tx) * queues
> > > possibly using the active slave list simultaneously, in that case I
> > > would have thought that a spinlock would have a much more significant
> > > affect on performance?
> >
> > Right, but the window where the shared variable is accessed is very small,
> > and it is actually faster to use spinlock for that.
> 
> I don't think that window we hold the lock is that small, let say if we have
> a burst of 32 packets * (let say) 50 cycles/pkt = ~1500 cycles - each IO thread would stall.
> For me that's long enough to justify rwlock usage here, especially that
> DPDK rwlock price is not much bigger (as I remember) then spinlock -
> it is basically 1 CAS operation.

As another alternative we can have a spinlock per queue; then different IO threads
doing RX/TX over different queues will not contend at all.
Though the control thread would need to grab the locks for all configured queues :)

Konstantin

> 
> Konstantin
>
  
Iremonger, Bernard May 26, 2016, 4:24 p.m. UTC | #6
Hi Konstantin, 

<snip>
> > > > On 05/05/16 18:12, Stephen Hemminger wrote:
> > > > > On Thu,  5 May 2016 16:14:56 +0100 Bernard Iremonger
> > > > > <bernard.iremonger@intel.com> wrote:
> > > > >
> > > > >> Fixes: a45b288ef21a ("bond: support link status polling")
> > > > >> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> > > > >
> > > > > You know an uncontested reader/writer lock is significantly
> > > > > slower than a spinlock.
> > > > >
> > > >
> > > > As we can have multiple readers of the active slave list / primary
> > > > slave, basically any tx/rx burst call needs to protect against a
> > > > device being removed/closed during it's operation now that we
> > > > support hotplugging, in the worst case this could mean we have
> > > > 2(rx+tx) * queues possibly using the active slave list
> > > > simultaneously, in that case I would have thought that a spinlock
> > > > would have a much more significant affect on performance?
> > >
> > > Right, but the window where the shared variable is accessed is very
> > > small, and it is actually faster to use spinlock for that.
> >
> > I don't think that window we hold the lock is that small, let say if
> > we have a burst of 32 packets * (let say) 50 cycles/pkt = ~1500 cycles - each
> IO thread would stall.
> > For me that's long enough to justify rwlock usage here, especially
> > that DPDK rwlock price is not much bigger (as I remember) then
> > spinlock - it is basically 1 CAS operation.
> 
> As another alternative we can have a spinlock per queue, then different IO
> threads doing RX/XTX over different queues will be uncontended at all.
> Though control thread would need to grab locks for all configured queues :)
> 
> Konstantin
> 

I am preparing a v2 patchset which uses a spinlock per queue.

Regards,

Bernard.
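
For reference, a minimal sketch of the per-queue spinlock arrangement described
above (field and function names are illustrative and may differ from the v2
patchset):

#include <rte_spinlock.h>
#include <rte_ethdev.h>

struct bond_queue {                 /* illustrative per-queue state */
	rte_spinlock_t lock;
	uint16_t queue_id;
};

/* Data path: each IO thread takes only its own queue's lock, so bursts
 * on different queues never contend with each other. */
static uint16_t
queue_rx_sketch(struct bond_queue *q, uint8_t slave_port,
		struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t nb_rx;

	rte_spinlock_lock(&q->lock);
	nb_rx = rte_eth_rx_burst(slave_port, q->queue_id, pkts, nb_pkts);
	rte_spinlock_unlock(&q->lock);

	return nb_rx;
}

/* Control path: slave add/remove grabs every queue's lock so that no
 * burst is in flight while the slave list changes. */
static void
quiesce_all_queues_sketch(struct bond_queue *queues, uint16_t nb_queues)
{
	uint16_t i;

	for (i = 0; i < nb_queues; i++)
		rte_spinlock_lock(&queues[i].lock);

	/* ... safe to update the active slave list here ... */

	for (i = nb_queues; i > 0; i--)
		rte_spinlock_unlock(&queues[i - 1].lock);
}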
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index d3bda35..c77626d 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -227,7 +227,7 @@  rte_eth_bond_create(const char *name, uint8_t mode, uint8_t socket_id)
 	eth_dev->data->drv_name = pmd_bond_driver_name;
 	eth_dev->data->numa_node =  socket_id;
 
-	rte_spinlock_init(&internals->lock);
+	rte_rwlock_init(&internals->rwlock);
 
 	internals->port_id = eth_dev->data->port_id;
 	internals->mode = BONDING_MODE_INVALID;
@@ -451,11 +451,11 @@  rte_eth_bond_slave_add(uint8_t bonded_port_id, uint8_t slave_port_id)
 	bonded_eth_dev = &rte_eth_devices[bonded_port_id];
 	internals = bonded_eth_dev->data->dev_private;
 
-	rte_spinlock_lock(&internals->lock);
+	rte_rwlock_write_lock(&internals->rwlock);
 
 	retval = __eth_bond_slave_add_lock_free(bonded_port_id, slave_port_id);
 
-	rte_spinlock_unlock(&internals->lock);
+	rte_rwlock_write_unlock(&internals->rwlock);
 
 	return retval;
 }
@@ -553,11 +553,11 @@  rte_eth_bond_slave_remove(uint8_t bonded_port_id, uint8_t slave_port_id)
 	bonded_eth_dev = &rte_eth_devices[bonded_port_id];
 	internals = bonded_eth_dev->data->dev_private;
 
-	rte_spinlock_lock(&internals->lock);
+	rte_rwlock_write_lock(&internals->rwlock);
 
 	retval = __eth_bond_slave_remove_lock_free(bonded_port_id, slave_port_id);
 
-	rte_spinlock_unlock(&internals->lock);
+	rte_rwlock_write_unlock(&internals->rwlock);
 
 	return retval;
 }
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index 129f04b..ed6245b 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1,7 +1,7 @@ 
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -1750,37 +1750,36 @@  bond_ethdev_slave_link_status_change_monitor(void *cb_arg)
 		!internals->link_status_polling_enabled)
 		return;
 
-	/* If device is currently being configured then don't check slaves link
-	 * status, wait until next period */
-	if (rte_spinlock_trylock(&internals->lock)) {
-		if (internals->slave_count > 0)
-			polling_slave_found = 0;
+	rte_rwlock_read_lock(&internals->rwlock);
+	if (internals->slave_count > 0)
+		polling_slave_found = 0;
 
-		for (i = 0; i < internals->slave_count; i++) {
-			if (!internals->slaves[i].link_status_poll_enabled)
-				continue;
-
-			slave_ethdev = &rte_eth_devices[internals->slaves[i].port_id];
-			polling_slave_found = 1;
-
-			/* Update slave link status */
-			(*slave_ethdev->dev_ops->link_update)(slave_ethdev,
-					internals->slaves[i].link_status_wait_to_complete);
-
-			/* if link status has changed since last checked then call lsc
-			 * event callback */
-			if (slave_ethdev->data->dev_link.link_status !=
-					internals->slaves[i].last_link_status) {
-				internals->slaves[i].last_link_status =
-						slave_ethdev->data->dev_link.link_status;
-
-				bond_ethdev_lsc_event_callback(internals->slaves[i].port_id,
-						RTE_ETH_EVENT_INTR_LSC,
-						&bonded_ethdev->data->port_id);
-			}
+	for (i = 0; i < internals->slave_count; i++) {
+		if (!internals->slaves[i].link_status_poll_enabled)
+			continue;
+
+		slave_ethdev = &rte_eth_devices[internals->slaves[i].port_id];
+		polling_slave_found = 1;
+
+		/* Update slave link status */
+		(*slave_ethdev->dev_ops->link_update)(slave_ethdev,
+			internals->slaves[i].link_status_wait_to_complete);
+
+		/* if link status has changed since last checked then call lsc
+		 * event callback
+		 */
+		if (slave_ethdev->data->dev_link.link_status !=
+			internals->slaves[i].last_link_status) {
+			internals->slaves[i].last_link_status =
+				slave_ethdev->data->dev_link.link_status;
+
+			bond_ethdev_lsc_event_callback(
+					internals->slaves[i].port_id,
+					RTE_ETH_EVENT_INTR_LSC,
+					&bonded_ethdev->data->port_id);
 		}
-		rte_spinlock_unlock(&internals->lock);
 	}
+	rte_rwlock_read_unlock(&internals->rwlock);
 
 	if (polling_slave_found)
 		/* Set alarm to continue monitoring link status of slave ethdev's */
diff --git a/drivers/net/bonding/rte_eth_bond_private.h b/drivers/net/bonding/rte_eth_bond_private.h
index 8312397..72ea679 100644
--- a/drivers/net/bonding/rte_eth_bond_private.h
+++ b/drivers/net/bonding/rte_eth_bond_private.h
@@ -1,7 +1,7 @@ 
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,7 +35,7 @@ 
 #define _RTE_ETH_BOND_PRIVATE_H_
 
 #include <rte_ethdev.h>
-#include <rte_spinlock.h>
+#include <rte_rwlock.h>
 
 #include "rte_eth_bond.h"
 #include "rte_eth_bond_8023ad_private.h"
@@ -115,7 +115,7 @@  struct bond_dev_private {
 	uint8_t port_id;					/**< Port Id of Bonded Port */
 	uint8_t mode;						/**< Link Bonding Mode */
 
-	rte_spinlock_t lock;
+	rte_rwlock_t rwlock;
 
 	uint8_t primary_port;				/**< Primary Slave Port */
 	uint8_t current_primary_port;		/**< Primary Slave Port */