[dpdk-dev] vhost: fix virtio_net cache sharing of broadcast_rarp
Commit Message
The virtio_net structure is used in both enqueue and dequeue datapaths.
broadcast_rarp is checked with cmpset in the dequeue datapath regardless
of whether descriptors are available or not.
It is observed in some cases where dequeue and enqueue are performed by
different cores and no packets are available on the dequeue datapath
(i.e. uni-directional traffic), the frequent checking of broadcast_rarp
in dequeue causes performance degradation for the enqueue datapath.
In OVS the issue can cause a uni-directional performance drop of up to 15%.
Fix that by moving broadcast_rarp to a different cache line in
virtio_net struct.
Fixes: a66bcad32240 ("vhost: arrange struct fields for better cache sharing")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
lib/librte_vhost/vhost.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Comments
On Wed, Mar 15, 2017 at 07:10:49PM +0000, Kevin Traynor wrote:
> The virtio_net structure is used in both enqueue and dequeue datapaths.
> broadcast_rarp is checked with cmpset in the dequeue datapath regardless
> of whether descriptors are available or not.
>
> It is observed in some cases where dequeue and enqueue are performed by
> different cores and no packets are available on the dequeue datapath
> (i.e. uni-directional traffic), the frequent checking of broadcast_rarp
> in dequeue causes performance degradation for the enqueue datapath.
>
> In OVS the issue can cause a uni-directional performance drop of up to 15%.
>
> Fix that by moving broadcast_rarp to a different cache line in
> virtio_net struct.
Thanks, but I'm a bit confused. The drop looks like it's caused by
cache false sharing, but I don't see anything that would lead to false
sharing: there is no write in the same cache line that broadcast_rarp
belongs to. Or is the "volatile" type the culprit here?
Talking about that, I had actually considered turning "broadcast_rarp"
into a simple "int" or "uint16_t" type, to make it more lightweight.
The reason I used an atomic type is to send exactly one broadcast RARP
packet once a SEND_RARP request is received. Otherwise, we may send more
than one RARP packet when MQ is involved. But I think we don't have
to be that accurate: it's tolerable if more RARPs are sent. After all,
I saw 4 SEND_RARP requests (aka 4 RARP packets) the last time I tried
vhost-user live migration. I don't quite remember why it was 4 though.
That said, I think it would also resolve the performance issue if you
changed "rte_atomic16_t" to "uint16_t", without moving the field?
--yliu
@@ -156,6 +156,4 @@ struct virtio_net {
 	uint32_t	flags;
 	uint16_t	vhost_hlen;
-	/* to tell if we need broadcast rarp packet */
-	rte_atomic16_t	broadcast_rarp;
 	uint32_t	virt_qp_nb;
 	int		dequeue_zero_copy;
@@ -167,4 +165,6 @@ struct virtio_net {
 	uint64_t	log_addr;
 	struct ether_addr	mac;
+	/* to tell if we need broadcast rarp packet */
+	rte_atomic16_t	broadcast_rarp;
 	uint32_t	nr_guest_pages;