Subject: Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Tan, Jianfeng", Victor Kaplansky, dev@dpdk.org, yliu@fridaylinux.org, "Bie, Tiwei"
Cc: stable@dpdk.org, jfreiman@redhat.com
Date: Fri, 8 Dec 2017 11:11:40 +0100

Hi Jianfeng,

On 12/08/2017 09:51 AM, Tan, Jianfeng wrote:
>
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Friday, December 8, 2017 4:36 PM
>> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei
>> Cc: stable@dpdk.org; jfreiman@redhat.com
>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring changes
>>
>> On 12/08/2017 03:14 AM, Tan, Jianfeng wrote:
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>> Sent: Thursday, December 7, 2017 6:02 PM
>>>> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei
>>>> Cc: stable@dpdk.org; jfreiman@redhat.com
>>>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring changes
>>>>
>>>> On 12/07/2017 10:33 AM, Tan, Jianfeng wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Victor Kaplansky [mailto:vkaplans@redhat.com]
>>>>>> Sent: Wednesday, December 6, 2017 9:56 PM
>>>>>> To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng; vkaplans@redhat.com
>>>>>> Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin
>>>>>> Subject: [PATCH] vhost_user: protect active rings from async ring changes
>>>>>>
>>>>>> When performing live migration or memory hot-plugging,
>>>>>> the changes to the device and vrings made by the message handler
>>>>>> are done independently from vring usage by the PMD threads.
>>>>>>
>>>>>> This causes for example segfauls during live-migration
>>>>>
>>>>> segfauls -> segfaults?
>>>>>
>>>>>> with MQ enabled, but in general virtually any request
>>>>>> sent by qemu changing the state of the device can cause
>>>>>> problems.
>>>>>>
>>>>>> These patches fix all the above issues by adding a spinlock
>>>>>> to every vring and requiring the message handler to start operating
>>>>>> only after ensuring that all PMD threads related to the divece
>>>>>
>>>>> Another typo: divece.
>>>>>
>>>>>> are out of the critical section accessing the vring data.
>>>>>>
>>>>>> Each vring has its own lock in order to not create contention
>>>>>> between PMD threads of different vrings, and to prevent
>>>>>> performance degradation when scaling up the queue pair number.
>>>>>
>>>>> I also wonder how much overhead it brings.
>>>>>
>>>>> Instead of locking each vring, can we just wait a while (10us for
>>>>> example) after calling the destroy_device() callback, so that every
>>>>> PMD thread has enough time to get out of the critical area?
>>>>
>>>> No, because we are not destroying the device when it is needed.
>>>> Actually, once destroy_device() is called, it is likely that the
>>>> application has made sure the rings aren't being processed anymore
>>>> before returning from the callback (this is at least the case with
>>>> the Vhost PMD).
>>>
>>> OK, I did not put it the right way, as there are multiple cases above:
>>> migration and memory hot-plug. Let me try again:
>>>
>>> Whenever a vhost thread handles a message affecting PMD threads (like
>>> SET_MEM_TABLE, GET_VRING_BASE, etc.), we can clear the dev flag
>>> VIRTIO_DEV_RUNNING and wait for a while so that PMD threads get out
>>> of those critical areas. After message handling, set the
>>> VIRTIO_DEV_RUNNING flag again.
>>
>> I think you mean clearing the vq's enabled flag, because PMD threads
>> never check the VIRTIO_DEV_RUNNING flag.
>
> Ah, yes.
>
>>
>>> I suppose it can work, based on the assumption that PMD threads work
>>> in polling mode and can get out of the critical area quickly and
>>> reliably.
>>
>> That sounds very fragile, because if the CPUs aren't perfectly
>> isolated, your PMD thread can be preempted, for interrupt handling for
>> example.
>>
>> Or what if for some reason the PMD thread's CPU stalls for a short
>> while?
>>
>> The latter is unlikely, but if it happens, it will be hard to debug.
>>
>> Let's first see the performance impact of using the spinlock. It might
>> not be that important because 99.9999% of the time, it will not even
>> spin.
>
> Fair enough.

I did some benchmarks on my Broadwell test bench (see patch below), and it
seems that depending on the benchmark, performance with the spinlock is on
par with, or even better than, plain v17.11! My guess is that the spinlock
results in better batching and fewer concurrent accesses on the rings, but
I'm not sure.
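A note on the scope of the benchmark patch below: it only takes the lock on
the datapath side (plus the lock init), which is what matters for measuring
the fast-path overhead. For the complete fix, the vhost-user message handler
would also have to take every vring's access_lock before handling a request
that changes the rings, and release them once done. Roughly something like
this on top of the vhost.h definitions (just a sketch, the helper names are
illustrative, not existing API):

static void
vhost_user_lock_all_vrings(struct virtio_net *dev)
{
	uint32_t i;

	/*
	 * Serializes with the PMD threads: each enqueue/dequeue burst
	 * holds its vring's access_lock for its whole duration.
	 */
	for (i = 0; i < dev->nr_vring; i++)
		rte_spinlock_lock(&dev->virtqueue[i]->access_lock);
}

static void
vhost_user_unlock_all_vrings(struct virtio_net *dev)
{
	uint32_t i;

	for (i = 0; i < dev->nr_vring; i++)
		rte_spinlock_unlock(&dev->virtqueue[i]->access_lock);
}

The handler would call the first helper before touching the device or vring
state (SET_MEM_TABLE, GET_VRING_BASE, ...) and the second one once the
request has been handled.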
Please find my results below (CPU E5-2667 v4 @ 3.20GHz):

Bench             v17.11      v17.11 + spinlock
----------------  ----------  -----------------
PVP Bidir run1    19.29Mpps   19.26Mpps
PVP Bidir run2    19.26Mpps   19.28Mpps
TxOnly            18.47Mpps   18.83Mpps
RxOnly            13.83Mpps   13.83Mpps
IO Loopback       7.94Mpps    7.97Mpps

Patch:
---------------------------------------------------------

From 7e18cefce682235558fed66a1fb87ab937ec9297 Mon Sep 17 00:00:00 2001
From: Maxime Coquelin <maxime.coquelin@redhat.com>
Date: Fri, 8 Dec 2017 04:21:25 -0500
Subject: [PATCH] vhost: spinlock benchmarking

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c      |  2 ++
 lib/librte_vhost/vhost.h      |  2 ++
 lib/librte_vhost/virtio_net.c | 12 ++++++++++++
 3 files changed, 16 insertions(+)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 4f8b73a..d9752cf 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -219,6 +219,8 @@ struct virtio_net *
 	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
 	vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
 
+	rte_spinlock_init(&vq->access_lock);
+
 	vhost_user_iotlb_init(dev, vring_idx);
 	/* Backends are set to -1 indicating an inactive device. */
 	vq->backend = -1;
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 1cc81c1..56242a8 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -131,6 +131,8 @@ struct vhost_virtqueue {
 	struct batch_copy_elem	*batch_copy_elems;
 	uint16_t		batch_copy_nb_elems;
 
+	rte_spinlock_t		access_lock;
+
 	rte_rwlock_t	iotlb_lock;
 	rte_rwlock_t	iotlb_pending_lock;
 	struct rte_mempool *iotlb_pool;
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 6fee16e..50cc3de 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -329,6 +329,8 @@
 	if (unlikely(vq->enabled == 0))
 		return 0;
 
+	rte_spinlock_lock(&vq->access_lock);
+
 	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
 		vhost_user_iotlb_rd_lock(vq);
 
@@ -419,6 +421,8 @@
 	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
 		vhost_user_iotlb_rd_unlock(vq);
 
+	rte_spinlock_unlock(&vq->access_lock);
+
 	return count;
 }
@@ -654,6 +658,8 @@
 	if (unlikely(vq->enabled == 0))
 		return 0;
 
+	rte_spinlock_lock(&vq->access_lock);
+
 	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
 		vhost_user_iotlb_rd_lock(vq);
 
@@ -715,6 +721,8 @@
 	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
 		vhost_user_iotlb_rd_unlock(vq);
 
+	rte_spinlock_unlock(&vq->access_lock);
+
 	return pkt_idx;
 }
@@ -1183,6 +1191,8 @@
 	if (unlikely(vq->enabled == 0))
 		return 0;
 
+	rte_spinlock_lock(&vq->access_lock);
+
 	vq->batch_copy_nb_elems = 0;
 
 	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
@@ -1356,6 +1366,8 @@
 	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
 		vhost_user_iotlb_rd_unlock(vq);
 
+	rte_spinlock_unlock(&vq->access_lock);
+
 	if (unlikely(rarp_mbuf != NULL)) {
 		/*
 		 * Inject it to the head of "pkts" array, so that switch's mac
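One caveat with the quick patch above: it only covers the straight-line
paths. Any early return taken between the lock and the unlock (for example
bailing out when the vring addresses still need translating and
vring_translate() fails) would leave the vring locked and deadlock the next
request taking the lock. A proper patch would have those paths branch to a
label that releases the lock first; something along these lines (sketch
only, the label name is illustrative):

	if (unlikely(vq->access_ok == 0))
		if (unlikely(vring_translate(dev, vq) < 0))
			goto out_access_unlock;

	/* ... enqueue/dequeue work ... */

out_access_unlock:
	rte_spinlock_unlock(&vq->access_lock);

	return count;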