[dpdk-dev,v2,2/3] vhost: protect dirty logging against logging base change

Message ID 20171124180826.18439-3-maxime.coquelin@redhat.com
State Rejected, archived
Headers show

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch warning coding style issues

Commit Message

Maxime Coquelin Nov. 24, 2017, 6:08 p.m.
When performing live-migration with multiple queue pairs,
VHOST_USER_SET_LOG_BASE request is sent multiple times.

If packets are being processed by the PMD threads, it is
possible that they are setting bits in the dirty log map while
its region is being unmapped by the vhost-user protocol thread.
It results in the following crash:
Thread 3 "lcore-slave-2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f71ca495700 (LWP 32451)]
0x00000000004bfc8a in vhost_set_bit (addr=0x7f71cbe18432 <error: Cannot access memory at address 0x7f71cbe18432>, nr=1) at /home/max/projects/src/mainline/dpdk/lib/librte_vhost/vhost.h:267
267        __sync_fetch_and_or_8(addr, (1U << nr));

We can see the vhost-user protocol thread just did the unmap of the
dirty log region when it happens.

This patch prevents this by introducing a RW lock to protect
the log base.

Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c      |  2 ++
 lib/librte_vhost/vhost.h      | 14 +++++++++++---
 lib/librte_vhost/vhost_user.c |  4 ++++
 3 files changed, 17 insertions(+), 3 deletions(-)

Comments

Victor Kaplansky Nov. 27, 2017, 8:16 a.m. | #1
Hi,

While I agree that taking a full-fledged lock with rte_rwlock_read_lock() solves the race condition,
I'm afraid it would be too expensive when logging is off, since it introduces
lock acquisition and release into the main flow of ring updates.

It is OK for now, as it fixes the bug, but we need to perform more careful performance measurements
and see whether the performance degradation is acceptable.

As an alternative, we may consider lighter-weight busy looping.

Also, let's fix in this series the __sync_fetch_and_or_8 -> __sync_fetch_and_or,
as it may improve the performance slightly.
Maxime Coquelin Nov. 27, 2017, 8:27 a.m. | #2
Hi Victor,

On 11/27/2017 09:16 AM, Victor Kaplansky wrote:
> Hi,
> 
> While I agree that taking full fledged lock by rte_rwlock_read_lock() solves the race condition,
> I'm afraid that it would be too expensive in case when logging is off, since it introduces
> acquiring and releasing lock into the main flow of ring updates.

Actually my v2 fixes the performance penalty when logging is off. The 
lock is now taken after the logging feature check.

But still, I agree the logging-on case will suffer a performance
penalty.

> It is OK for now, as it fixes the bug, but we need to perform more careful performance measurements,
> and see whether the performance degradation is not too prohibitive.
> 
> As alternative, we may consider using more light weighted busy looping.

I think it will end up being almost the same, as both threads will need
to busy loop: the PMD thread to make sure the protocol thread isn't
unmapping the region before doing the logging, and the protocol thread
to make sure the PMD thread is not logging before handling the set log
base.

Maybe you have something else in mind?

> Also, lets fix by this series the __sync_fetch_and_or_8 -> __sync_fetch_and_or,
> as it may improve the performance slightly.

Sure, this can be done, but it would need to be benchmarked first.

Regards,
Maxime
Victor Kaplansky Nov. 27, 2017, 8:42 a.m. | #3
----- Original Message -----
> From: "Maxime Coquelin" <maxime.coquelin@redhat.com>
> To: "Victor Kaplansky" <vkaplans@redhat.com>
> Cc: dev@dpdk.org, yliu@fridaylinux.org, "tiwei bie" <tiwei.bie@intel.com>, "jianfeng tan" <jianfeng.tan@intel.com>,
> stable@dpdk.org, jfreiman@redhat.com
> Sent: Monday, November 27, 2017 10:27:22 AM
> Subject: Re: [PATCH v2 2/3] vhost: protect dirty logging against logging base change
> 
> Hi Victor,
> 
> On 11/27/2017 09:16 AM, Victor Kaplansky wrote:
> > Hi,
> > 
> > While I agree that taking full fledged lock by rte_rwlock_read_lock()
> > solves the race condition,
> > I'm afraid that it would be too expensive in case when logging is off,
> > since it introduces
> > acquiring and releasing lock into the main flow of ring updates.
> 
> Actually my v2 fixes the performance penalty when logging is off. The
> lock is now taken after the logging feature check.
> 
> But still, I agree logging on case will suffer from a performance
> penalty.

Yes, checking the logging feature is better than nothing, but VHOST_F_LOG_ALL
only marks whether logging is supported by the device, not whether
logging is actually in progress. Thus, any guest will hit the performance
degradation even when not migrating.


> 
> > It is OK for now, as it fixes the bug, but we need to perform more careful
> > performance measurements,
> > and see whether the performance degradation is not too prohibitive.
> > 
> > As alternative, we may consider using more light weighted busy looping.
> 
> I think it will end up almost being the same, as both threads will need
> to busy loop. PMD thread to be sure the protocol thread isn't being
> unmapping the region before doing the logging, and protocol thread to be
> sure the PMD thread is not doing logging before handling the set log
> base.
> 

I'm not fully aware of how rte_rwlock_read_lock() is implemented, but
theoretically busy looping should be much cheaper in cases where one
side takes the lock very rarely.

> Maybe you have something else in mind?
> 
> > Also, lets fix by this series the __sync_fetch_and_or_8 ->
> > __sync_fetch_and_or,
> > as it may improve the performance slightly.
> 
> Sure, this can be done, but it would need to be benchmarked first.

Agree.
> 
> Regards,
> Maxime
>
Maxime Coquelin Nov. 27, 2017, 9 a.m. | #4
On 11/27/2017 09:42 AM, Victor Kaplansky wrote:
> 
> 
> ----- Original Message -----
>> From: "Maxime Coquelin" <maxime.coquelin@redhat.com>
>> To: "Victor Kaplansky" <vkaplans@redhat.com>
>> Cc: dev@dpdk.org, yliu@fridaylinux.org, "tiwei bie" <tiwei.bie@intel.com>, "jianfeng tan" <jianfeng.tan@intel.com>,
>> stable@dpdk.org, jfreiman@redhat.com
>> Sent: Monday, November 27, 2017 10:27:22 AM
>> Subject: Re: [PATCH v2 2/3] vhost: protect dirty logging against logging base change
>>
>> Hi Victor,
>>
>> On 11/27/2017 09:16 AM, Victor Kaplansky wrote:
>>> Hi,
>>>
>>> While I agree that taking full fledged lock by rte_rwlock_read_lock()
>>> solves the race condition,
>>> I'm afraid that it would be too expensive in case when logging is off,
>>> since it introduces
>>> acquiring and releasing lock into the main flow of ring updates.
>>
>> Actually my v2 fixes the performance penalty when logging is off. The
>> lock is now taken after the logging feature check.
>>
>> But still, I agree logging on case will suffer from a performance
>> penalty.
> 
> Yes, checking of logging feature is better than nothing, but VHOST_F_LOG_ALL
> marks only whether logging is supported by the device and not if
> logging is in the action. Thus, any guest will hit the performance
> degradation even not during migration.

My understanding is that VHOST_USER_SET_FEATURES is called again with
VHOST_F_LOG_ALL set on migration start and with VHOST_F_LOG_ALL cleared
on migration stop.

> 
> 
>>
>>> It is OK for now, as it fixes the bug, but we need to perform more careful
>>> performance measurements,
>>> and see whether the performance degradation is not too prohibitive.
>>>
>>> As alternative, we may consider using more light weighted busy looping.
>>
>> I think it will end up almost being the same, as both threads will need
>> to busy loop. PMD thread to be sure the protocol thread isn't being
>> unmapping the region before doing the logging, and protocol thread to be
>> sure the PMD thread is not doing logging before handling the set log
>> base.
>>
> 
> I'm not fully aware how rte_rwlock_read_lock() is implemented, but
> theoretically busy looping should be much cheaper in cases when
> taking lock by one side is very rare.

We could improve it by taking the lock only once per burst instead of
once per logged page, as we don't mind the protocol thread waiting a bit
longer when it wants to remap the area.

>> Maybe you have something else in mind?
>>
>>> Also, lets fix by this series the __sync_fetch_and_or_8 ->
>>> __sync_fetch_and_or,
>>> as it may improve the performance slightly.
>>
>> Sure, this can be done, but it would need to be benchmarked first.
> 
> Agree.
>>
>> Regards,
>> Maxime
>>
Maxime Coquelin Nov. 28, 2017, 10:06 a.m. | #5
On 11/24/2017 07:08 PM, Maxime Coquelin wrote:
> When performing live-migration with multiple queue pairs,
> VHOST_USER_SET_LOG_BASE request is sent multiple times.
> 
> If packets are being processed by the PMD threads, it is
> possible that they are setting bits in the dirty log map while
> its region is being unmapped by the vhost-user protocol thread.
> It results in the following crash:
> Thread 3 "lcore-slave-2" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f71ca495700 (LWP 32451)]
> 0x00000000004bfc8a in vhost_set_bit (addr=0x7f71cbe18432 <error: Cannot access memory at address 0x7f71cbe18432>, nr=1) at /home/max/projects/src/mainline/dpdk/lib/librte_vhost/vhost.h:267
> 267        __sync_fetch_and_or_8(addr, (1U << nr));
> 
> We can see the vhost-user protocol thread just did the unmap of the
> dirty log region when it happens.
> 
> This patch prevents this by introducing a RW lock to protect
> the log base.
> 
> Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>   lib/librte_vhost/vhost.c      |  2 ++
>   lib/librte_vhost/vhost.h      | 14 +++++++++++---
>   lib/librte_vhost/vhost_user.c |  4 ++++
>   3 files changed, 17 insertions(+), 3 deletions(-)
> 

By clarifying the vhost-user spec, we may be able to avoid this lock and
just ignore subsequent SET_LOG_BASE requests once the
VHOST_F_LOG_ALL feature bit is set.

So let's just discard this series for now.

Maxime
Jianfeng Tan Feb. 14, 2018, 2:03 a.m. | #6
Hi Maxime,


On 11/28/2017 6:06 PM, Maxime Coquelin wrote:
>
>
> On 11/24/2017 07:08 PM, Maxime Coquelin wrote:
>> When performing live-migration with multiple queue pairs,
>> VHOST_USER_SET_LOG_BASE request is sent multiple times.
>>
>> If packets are being processed by the PMD threads, it is
>> possible that they are setting bits in the dirty log map while
>> its region is being unmapped by the vhost-user protocol thread.
>> It results in the following crash:
>> Thread 3 "lcore-slave-2" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7f71ca495700 (LWP 32451)]
>> 0x00000000004bfc8a in vhost_set_bit (addr=0x7f71cbe18432 <error: 
>> Cannot access memory at address 0x7f71cbe18432>, nr=1) at 
>> /home/max/projects/src/mainline/dpdk/lib/librte_vhost/vhost.h:267
>> 267        __sync_fetch_and_or_8(addr, (1U << nr));
>>
>> We can see the vhost-user protocol thread just did the unmap of the
>> dirty log region when it happens.
>>
>> This patch prevents this by introducing a RW lock to protect
>> the log base.
>>
>> Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/librte_vhost/vhost.c      |  2 ++
>>   lib/librte_vhost/vhost.h      | 14 +++++++++++---
>>   lib/librte_vhost/vhost_user.c |  4 ++++
>>   3 files changed, 17 insertions(+), 3 deletions(-)
>>
>
> By clarifying the vhost-user spec, we may be able to avoid this lock and
> just ignore the subsequent SET_LOG_BASE requests once
> VHOST_F_LOG_ALL feature bit is set.
>
> So let's just discard this series for now.

I would assume this issue has been addressed by the per-queue lock patch 
from Victor, correct?

Besides, we really don't need multiple unmap/map cycles for each vq. Do you 
think this should be fixed in QEMU?

Thanks,
Jianfeng
Maxime Coquelin Feb. 14, 2018, 7:52 a.m. | #7
Hi Jianfeng,

On 02/14/2018 03:03 AM, Tan, Jianfeng wrote:
> Hi Maxime,
> 
> 
> On 11/28/2017 6:06 PM, Maxime Coquelin wrote:
>>
>>
>> On 11/24/2017 07:08 PM, Maxime Coquelin wrote:
>>> When performing live-migration with multiple queue pairs,
>>> VHOST_USER_SET_LOG_BASE request is sent multiple times.
>>>
>>> If packets are being processed by the PMD threads, it is
>>> possible that they are setting bits in the dirty log map while
>>> its region is being unmapped by the vhost-user protocol thread.
>>> It results in the following crash:
>>> Thread 3 "lcore-slave-2" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7f71ca495700 (LWP 32451)]
>>> 0x00000000004bfc8a in vhost_set_bit (addr=0x7f71cbe18432 <error: 
>>> Cannot access memory at address 0x7f71cbe18432>, nr=1) at 
>>> /home/max/projects/src/mainline/dpdk/lib/librte_vhost/vhost.h:267
>>> 267        __sync_fetch_and_or_8(addr, (1U << nr));
>>>
>>> We can see the vhost-user protocol thread just did the unmap of the
>>> dirty log region when it happens.
>>>
>>> This patch prevents this by introducing a RW lock to protect
>>> the log base.
>>>
>>> Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> ---
>>>   lib/librte_vhost/vhost.c      |  2 ++
>>>   lib/librte_vhost/vhost.h      | 14 +++++++++++---
>>>   lib/librte_vhost/vhost_user.c |  4 ++++
>>>   3 files changed, 17 insertions(+), 3 deletions(-)
>>>
>>
>> By clarifying the vhost-user spec, we may be able to avoid this lock and
>> just ignore the subsequent SET_LOG_BASE requests once
>> VHOST_F_LOG_ALL feature bit is set.
>>
>> So let's just discard this series for now.
> 
> I would assume this issue has been addressed by the per-queue lock patch 
> from Victor, correct?

Correct.

> Besides, we really don't need multiple unmap/map for each vq. Would you 
> think this shall be fixed in QEMU?

Yes, I think you are right that it should be fixed in QEMU, so that the
request is sent only for the first queue pair.

But I didn't have time to work on it, TBH.


Cheers,
Maxime
> Thanks,
> Jianfeng
Jianfeng Tan Feb. 22, 2018, 2:54 a.m. | #8
> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Wednesday, February 14, 2018 3:53 PM
> To: Tan, Jianfeng; dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei;
> vkaplans@redhat.com
> Cc: stable@dpdk.org; jfreiman@redhat.com
> Subject: Re: [PATCH v2 2/3] vhost: protect dirty logging against logging base
> change
> 
> Hi Jianfeng,
> 
> On 02/14/2018 03:03 AM, Tan, Jianfeng wrote:
> > Hi Maxime,
> >
> >
> > On 11/28/2017 6:06 PM, Maxime Coquelin wrote:
> >>
> >>
> >> On 11/24/2017 07:08 PM, Maxime Coquelin wrote:
> >>> When performing live-migration with multiple queue pairs,
> >>> VHOST_USER_SET_LOG_BASE request is sent multiple times.
> >>>
> >>> If packets are being processed by the PMD threads, it is
> >>> possible that they are setting bits in the dirty log map while
> >>> its region is being unmapped by the vhost-user protocol thread.
> >>> It results in the following crash:
> >>> Thread 3 "lcore-slave-2" received signal SIGSEGV, Segmentation fault.
> >>> [Switching to Thread 0x7f71ca495700 (LWP 32451)]
> >>> 0x00000000004bfc8a in vhost_set_bit (addr=0x7f71cbe18432 <error:
> >>> Cannot access memory at address 0x7f71cbe18432>, nr=1) at
> >>> /home/max/projects/src/mainline/dpdk/lib/librte_vhost/vhost.h:267
> >>> 267        __sync_fetch_and_or_8(addr, (1U << nr));
> >>>
> >>> We can see the vhost-user protocol thread just did the unmap of the
> >>> dirty log region when it happens.
> >>>
> >>> This patch prevents this by introducing a RW lock to protect
> >>> the log base.
> >>>
> >>> Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>> ---
> >>>   lib/librte_vhost/vhost.c      |  2 ++
> >>>   lib/librte_vhost/vhost.h      | 14 +++++++++++---
> >>>   lib/librte_vhost/vhost_user.c |  4 ++++
> >>>   3 files changed, 17 insertions(+), 3 deletions(-)
> >>>
> >>
> >> By clarifying the vhost-user spec, we may be able to avoid this lock and
> >> just ignore the subsequent SET_LOG_BASE requests once
> >> VHOST_F_LOG_ALL feature bit is set.
> >>
> >> So let's just discard this series for now.
> >
> > I would assume this issue has been addressed by the per-queue lock patch
> > from Victor, correct?
> 
> Correct.
> 
> > Besides, we really don't need multiple unmap/map for each vq. Would you
> > think this shall be fixed in QEMU?
> 
> Yes, I think you are right it should be fixed in QEMU, so that it is
> sent only for the first queue pair.
> 
> But I didn't have time to work on it, TBH.

Thank you for the confirmation. And it's not an urgent issue to fix anyway.

Patch

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 4f8b73a09..5a7699da0 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -311,6 +311,8 @@  vhost_new_device(void)
 		return -1;
 	}
 
+	rte_rwlock_init(&dev->log_lock);
+
 	vhost_devices[i] = dev;
 	dev->vid = i;
 	dev->slave_req_fd = -1;
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 1cc81c17c..2f36a034e 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -243,6 +243,7 @@  struct virtio_net {
 	uint64_t		log_size;
 	uint64_t		log_base;
 	uint64_t		log_addr;
+	rte_rwlock_t	log_lock;
 	struct ether_addr	mac;
 	uint16_t		mtu;
 
@@ -278,12 +279,16 @@  vhost_log_write(struct virtio_net *dev, uint64_t addr, uint64_t len)
 {
 	uint64_t page;
 
+
 	if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) ||
-		   !dev->log_base || !len))
+		   !len))
 		return;
 
-	if (unlikely(dev->log_size <= ((addr + len - 1) / VHOST_LOG_PAGE / 8)))
-		return;
+	rte_rwlock_read_lock(&dev->log_lock);
+
+	if (unlikely((!dev->log_base) ||
+				(dev->log_size <= ((addr + len - 1) / VHOST_LOG_PAGE / 8))))
+		goto unlock;
 
 	/* To make sure guest memory updates are committed before logging */
 	rte_smp_wmb();
@@ -293,6 +298,9 @@  vhost_log_write(struct virtio_net *dev, uint64_t addr, uint64_t len)
 		vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
 		page += 1;
 	}
+
+unlock:
+	rte_rwlock_read_unlock(&dev->log_lock);
 }
 
 static __rte_always_inline void
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index f06d9bb65..4b03dbbca 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -929,6 +929,8 @@  vhost_user_set_log_base(struct virtio_net *dev, struct VhostUserMsg *msg)
 		goto out;
 	}
 
+	rte_rwlock_write_lock(&dev->log_lock);
+
 	/*
 	 * Free previously mapped log memory on occasionally
 	 * multiple VHOST_USER_SET_LOG_BASE.
@@ -940,6 +942,8 @@  vhost_user_set_log_base(struct virtio_net *dev, struct VhostUserMsg *msg)
 	dev->log_base = dev->log_addr + off;
 	dev->log_size = size;
 
+	rte_rwlock_write_unlock(&dev->log_lock);
+
 out:
 	close(fd);