[v1] vhost: add sanity check for resubmiting reqs in split ring
Checks
Commit Message
When getting reqs from the avail ring, the id may exceed inflight
queue size. Then the dpdk will crash forever.
Signed-off-by: Li Feng <fengli@smartx.com>
---
lib/vhost/vhost_user.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
Comments
ping...
BTW, how could I retrigger the failure CI? The failure is not about this patch.
Thanks,
Feng Li
Li Feng <fengli@smartx.com> 于2021年8月27日周五 下午1:16写道:
>
> When getting reqs from the avail ring, the id may exceed inflight
> queue size. Then the dpdk will crash forever.
>
> Signed-off-by: Li Feng <fengli@smartx.com>
> ---
> lib/vhost/vhost_user.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 29a4c9af60..f09d0f6a48 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
> last_io = inflight_split->last_inflight_io;
>
> if (inflight_split->used_idx != used->idx) {
> - inflight_split->desc[last_io].inflight = 0;
> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> + if (unlikely(last_io >= inflight_split->desc_num)) {
> + VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"' exceeds inflight "
> + "queue size (%"PRIu16").\n", last_io,
> + inflight_split->desc_num);
> + } else {
> + inflight_split->desc[last_io].inflight = 0;
> + rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> + }
> inflight_split->used_idx = used->idx;
> }
>
> --
> 2.31.1
>
Li Feng <fengli@smartx.com> 于2021年8月27日周五 下午1:16写道:
>
> When getting reqs from the avail ring, the id may exceed inflight
> queue size. Then the dpdk will crash forever.
>
> Signed-off-by: Li Feng <fengli@smartx.com>
> ---
> lib/vhost/vhost_user.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 29a4c9af60..f09d0f6a48 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
> last_io = inflight_split->last_inflight_io;
>
> if (inflight_split->used_idx != used->idx) {
> - inflight_split->desc[last_io].inflight = 0;
> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> + if (unlikely(last_io >= inflight_split->desc_num)) {
> + VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"' exceeds inflight "
> + "queue size (%"PRIu16").\n", last_io,
> + inflight_split->desc_num);
> + } else {
> + inflight_split->desc[last_io].inflight = 0;
> + rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> + }
> inflight_split->used_idx = used->idx;
> }
>
> --
> 2.31.1
>
Hi Li,
Adding Jin Yu who introduced this function.
On 8/27/21 07:12, Li Feng wrote:
> When getting reqs from the avail ring, the id may exceed inflight
> queue size. Then the dpdk will crash forever.
You need to add Fixes tag and Cc stable@dpdk.org so that it can be
backported.
> Signed-off-by: Li Feng <fengli@smartx.com>
> ---
> lib/vhost/vhost_user.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 29a4c9af60..f09d0f6a48 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
> last_io = inflight_split->last_inflight_io;
>
> if (inflight_split->used_idx != used->idx) {
> - inflight_split->desc[last_io].inflight = 0;
> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> + if (unlikely(last_io >= inflight_split->desc_num)) {
> + VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"' exceeds inflight "
> + "queue size (%"PRIu16").\n", last_io,
> + inflight_split->desc_num);
If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
instead of just logging an error?
> + } else {
> + inflight_split->desc[last_io].inflight = 0;
> + rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> + }
> inflight_split->used_idx = used->idx;
> }
>
>
Regards,
Maxime
On 10/14/21 10:17, Maxime Coquelin wrote:
> Hi Li,
>
> Adding Jin Yu who introduced this function.
Looks like Jin Yu has left Intel, Chenbo, could you find someone from
the Intel SPDK team to look at it?
> On 8/27/21 07:12, Li Feng wrote:
>> When getting reqs from the avail ring, the id may exceed inflight
>> queue size. Then the dpdk will crash forever.
>
> You need to add Fixes tag and Cc stable@dpdk.org so that it can be
> backported.
>
>> Signed-off-by: Li Feng <fengli@smartx.com>
>> ---
>> lib/vhost/vhost_user.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
>> index 29a4c9af60..f09d0f6a48 100644
>> --- a/lib/vhost/vhost_user.c
>> +++ b/lib/vhost/vhost_user.c
>> @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct
>> virtio_net *dev,
>> last_io = inflight_split->last_inflight_io;
>> if (inflight_split->used_idx != used->idx) {
>> - inflight_split->desc[last_io].inflight = 0;
>> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
>> + if (unlikely(last_io >= inflight_split->desc_num)) {
>> + VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"'
>> exceeds inflight "
>> + "queue size (%"PRIu16").\n", last_io,
>> + inflight_split->desc_num);
>
> If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
> instead of just logging an error?
>
>> + } else {
>> + inflight_split->desc[last_io].inflight = 0;
>> + rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
>> + }
>> inflight_split->used_idx = used->idx;
>> }
>>
>
> Regards,
> Maxime
Hi Changpeng,
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Thursday, October 14, 2021 4:26 PM
> To: Li Feng <fengli@smartx.com>; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v1] vhost: add sanity check for resubmiting reqs in split
> ring
>
>
>
> On 10/14/21 10:17, Maxime Coquelin wrote:
> > Hi Li,
> >
> > Adding Jin Yu who introduced this function.
>
> Looks like Jin Yu has left Intel, Chenbo, could you find someone from
> the Intel SPDK team to look at it?
Could you or your team member help check this?
Thanks,
Chenbo
>
> > On 8/27/21 07:12, Li Feng wrote:
> >> When getting reqs from the avail ring, the id may exceed inflight
> >> queue size. Then the dpdk will crash forever.
> >
> > You need to add Fixes tag and Cc stable@dpdk.org so that it can be
> > backported.
> >
> >> Signed-off-by: Li Feng <fengli@smartx.com>
> >> ---
> >> lib/vhost/vhost_user.c | 10 ++++++++--
> >> 1 file changed, 8 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> >> index 29a4c9af60..f09d0f6a48 100644
> >> --- a/lib/vhost/vhost_user.c
> >> +++ b/lib/vhost/vhost_user.c
> >> @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct
> >> virtio_net *dev,
> >> last_io = inflight_split->last_inflight_io;
> >> if (inflight_split->used_idx != used->idx) {
> >> - inflight_split->desc[last_io].inflight = 0;
> >> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> >> + if (unlikely(last_io >= inflight_split->desc_num)) {
> >> + VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"'
> >> exceeds inflight "
> >> + "queue size (%"PRIu16").\n", last_io,
> >> + inflight_split->desc_num);
> >
> > If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
> > instead of just logging an error?
> >
> >> + } else {
> >> + inflight_split->desc[last_io].inflight = 0;
> >> + rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> >> + }
> >> inflight_split->used_idx = used->idx;
> >> }
> >>
> >
> > Regards,
> > Maxime
Thank you for your response.
On Thu, Oct 14, 2021 at 4:17 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> Hi Li,
>
> Adding Jin Yu who introduced this function.
>
> On 8/27/21 07:12, Li Feng wrote:
> > When getting reqs from the avail ring, the id may exceed inflight
> > queue size. Then the dpdk will crash forever.
>
> You need to add Fixes tag and Cc stable@dpdk.org so that it can be
> backported.
OK, I will send the v2 version.
>
> > Signed-off-by: Li Feng <fengli@smartx.com>
> > ---
> > lib/vhost/vhost_user.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> > index 29a4c9af60..f09d0f6a48 100644
> > --- a/lib/vhost/vhost_user.c
> > +++ b/lib/vhost/vhost_user.c
> > @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
> > last_io = inflight_split->last_inflight_io;
> >
> > if (inflight_split->used_idx != used->idx) {
> > - inflight_split->desc[last_io].inflight = 0;
> > - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> > + if (unlikely(last_io >= inflight_split->desc_num)) {
> > + VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"' exceeds inflight "
> > + "queue size (%"PRIu16").\n", last_io,
> > + inflight_split->desc_num);
>
> If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
> instead of just logging an error?
I think ignoring the error is ok. No one could handle this error correctly.
At this time the guest virtio driver of this virtqueue may be in an
incorrect state.
>
> > + } else {
> > + inflight_split->desc[last_io].inflight = 0;
> > + rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> > + }
> > inflight_split->used_idx = used->idx;
> > }
> >
> >
>
> Regards,
> Maxime
>
On 10/14/21 13:25, Li Feng wrote:
> Thank you for your response.
>
> On Thu, Oct 14, 2021 at 4:17 PM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>>
>> Hi Li,
>>
>> Adding Jin Yu who introduced this function.
>>
>> On 8/27/21 07:12, Li Feng wrote:
>>> When getting reqs from the avail ring, the id may exceed inflight
>>> queue size. Then the dpdk will crash forever.
>>
>> You need to add Fixes tag and Cc stable@dpdk.org so that it can be
>> backported.
> OK, I will send the v2 version.
>
>>
>>> Signed-off-by: Li Feng <fengli@smartx.com>
>>> ---
>>> lib/vhost/vhost_user.c | 10 ++++++++--
>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
>>> index 29a4c9af60..f09d0f6a48 100644
>>> --- a/lib/vhost/vhost_user.c
>>> +++ b/lib/vhost/vhost_user.c
>>> @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
>>> last_io = inflight_split->last_inflight_io;
>>>
>>> if (inflight_split->used_idx != used->idx) {
>>> - inflight_split->desc[last_io].inflight = 0;
>>> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
>>> + if (unlikely(last_io >= inflight_split->desc_num)) {
>>> + VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"' exceeds inflight "
>>> + "queue size (%"PRIu16").\n", last_io,
>>> + inflight_split->desc_num);
>>
>> If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
>> instead of just logging an error?
> I think ignoring the error is ok. No one could handle this error correctly.
> At this time the guest virtio driver of this virtqueue may be in an
> incorrect state.
Not sure to understand how it can happen.
But I see that last_io is actually vq->inflight_split->last_inflight_io,
which is set only by rte_vhost_set_last_inflight_io_split() API.
Shouldn't there be a sanity check there to ensure that last_inflight_io
is smaller than desc_num value set by the frontend?
Returning an error is the right thing to do anyway.
>>
>>> + } else {
>>> + inflight_split->desc[last_io].inflight = 0;
>>> + rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
>>> + }
>>> inflight_split->used_idx = used->idx;
>>> }
>>>
>>>
>>
>> Regards,
>> Maxime
>>
>
On Thu, Oct 14, 2021 at 7:38 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
>
>
> On 10/14/21 13:25, Li Feng wrote:
> > Thank you for your response.
> >
> > On Thu, Oct 14, 2021 at 4:17 PM Maxime Coquelin
> > <maxime.coquelin@redhat.com> wrote:
> >>
> >> Hi Li,
> >>
> >> Adding Jin Yu who introduced this function.
> >>
> >> On 8/27/21 07:12, Li Feng wrote:
> >>> When getting reqs from the avail ring, the id may exceed inflight
> >>> queue size. Then the dpdk will crash forever.
> >>
> >> You need to add Fixes tag and Cc stable@dpdk.org so that it can be
> >> backported.
> > OK, I will send the v2 version.
> >
> >>
> >>> Signed-off-by: Li Feng <fengli@smartx.com>
> >>> ---
> >>> lib/vhost/vhost_user.c | 10 ++++++++--
> >>> 1 file changed, 8 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> >>> index 29a4c9af60..f09d0f6a48 100644
> >>> --- a/lib/vhost/vhost_user.c
> >>> +++ b/lib/vhost/vhost_user.c
> >>> @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
> >>> last_io = inflight_split->last_inflight_io;
> >>>
> >>> if (inflight_split->used_idx != used->idx) {
> >>> - inflight_split->desc[last_io].inflight = 0;
> >>> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> >>> + if (unlikely(last_io >= inflight_split->desc_num)) {
> >>> + VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"' exceeds inflight "
> >>> + "queue size (%"PRIu16").\n", last_io,
> >>> + inflight_split->desc_num);
> >>
> >> If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
> >> instead of just logging an error?
> > I think ignoring the error is ok. No one could handle this error correctly.
> > At this time the guest virtio driver of this virtqueue may be in an
> > incorrect state.
>
> Not sure to understand how it can happen.
> But I see that last_io is actually vq->inflight_split->last_inflight_io,
> which is set only by rte_vhost_set_last_inflight_io_split() API.
The polluted value is from the frontend driver.
My environment occurs this issue, and a VM is hang, so I guess this
bad value comes from it.
>
> Shouldn't there be a sanity check there to ensure that last_inflight_io
> is smaller than desc_num value set by the frontend?
Yes, putting a check in rte_vhost_set_last_inflight_io_split is also ok.
I will send the v2 version that includes this.
Thanks.
>
> Returning an error is the right thing to do anyway.
OK.
>
> >>
> >>> + } else {
> >>> + inflight_split->desc[last_io].inflight = 0;
> >>> + rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> >>> + }
> >>> inflight_split->used_idx = used->idx;
> >>> }
> >>>
> >>>
> >>
> >> Regards,
> >> Maxime
> >>
> >
>
I agree with Maxime, just add an error log doesn't help anything, there might be something wrong in other places,
I don't have the context for this issue, if this can be reproduced in SPDK, I suggest to submit an issue to SPDK first.
> -----Original Message-----
> From: Xia, Chenbo <chenbo.xia@intel.com>
> Sent: Thursday, October 14, 2021 4:28 PM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>; Li Feng
> <fengli@smartx.com>; Liu, Changpeng <changpeng.liu@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v1] vhost: add sanity check for resubmiting reqs in split ring
>
> Hi Changpeng,
>
> > -----Original Message-----
> > From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > Sent: Thursday, October 14, 2021 4:26 PM
> > To: Li Feng <fengli@smartx.com>; Xia, Chenbo <chenbo.xia@intel.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH v1] vhost: add sanity check for resubmiting reqs in split
> > ring
> >
> >
> >
> > On 10/14/21 10:17, Maxime Coquelin wrote:
> > > Hi Li,
> > >
> > > Adding Jin Yu who introduced this function.
> >
> > Looks like Jin Yu has left Intel, Chenbo, could you find someone from
> > the Intel SPDK team to look at it?
>
> Could you or your team member help check this?
>
> Thanks,
> Chenbo
>
> >
> > > On 8/27/21 07:12, Li Feng wrote:
> > >> When getting reqs from the avail ring, the id may exceed inflight
> > >> queue size. Then the dpdk will crash forever.
> > >
> > > You need to add Fixes tag and Cc stable@dpdk.org so that it can be
> > > backported.
> > >
> > >> Signed-off-by: Li Feng <fengli@smartx.com>
> > >> ---
> > >> lib/vhost/vhost_user.c | 10 ++++++++--
> > >> 1 file changed, 8 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> > >> index 29a4c9af60..f09d0f6a48 100644
> > >> --- a/lib/vhost/vhost_user.c
> > >> +++ b/lib/vhost/vhost_user.c
> > >> @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct
> > >> virtio_net *dev,
> > >> last_io = inflight_split->last_inflight_io;
> > >> if (inflight_split->used_idx != used->idx) {
> > >> - inflight_split->desc[last_io].inflight = 0;
> > >> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> > >> + if (unlikely(last_io >= inflight_split->desc_num)) {
> > >> + VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"'
> > >> exceeds inflight "
> > >> + "queue size (%"PRIu16").\n", last_io,
> > >> + inflight_split->desc_num);
> > >
> > > If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
> > > instead of just logging an error?
> > >
> > >> + } else {
> > >> + inflight_split->desc[last_io].inflight = 0;
> > >> + rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> > >> + }
> > >> inflight_split->used_idx = used->idx;
> > >> }
> > >>
> > >
> > > Regards,
> > > Maxime
@@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
last_io = inflight_split->last_inflight_io;
if (inflight_split->used_idx != used->idx) {
- inflight_split->desc[last_io].inflight = 0;
- rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+ if (unlikely(last_io >= inflight_split->desc_num)) {
+ VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"' exceeds inflight "
+ "queue size (%"PRIu16").\n", last_io,
+ inflight_split->desc_num);
+ } else {
+ inflight_split->desc[last_io].inflight = 0;
+ rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+ }
inflight_split->used_idx = used->idx;
}