[dpdk-dev] ethdev: fix wrong memset

Message ID 1484899493-11051-1-git-send-email-yuanhan.liu@linux.intel.com (mailing list archive)
State Accepted, archived
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel compilation success Compilation OK

Commit Message

Yuanhan Liu Jan. 20, 2017, 8:04 a.m. UTC
  Fix an silly error by auto-complete while managing the merge conflicts.
It's the eth_dev_data (but not eth_dev) entry should be memset.

Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")

Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 lib/librte_ether/rte_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Thomas Monjalon Jan. 20, 2017, 10:20 a.m. UTC | #1
2017-01-20 16:04, Yuanhan Liu:
> Fix an silly error by auto-complete while managing the merge conflicts.
> It's the eth_dev_data (but not eth_dev) entry should be memset.
> 
> Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")

You should describe the impact on this bug.
It will be helpful for those testing the RC1.

> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));

The title should be contain the scope of the bug.
I suggest "fix data reset when allocating port".
  
Yuanhan Liu Jan. 20, 2017, 10:34 a.m. UTC | #2
On Fri, Jan 20, 2017 at 11:20:06AM +0100, Thomas Monjalon wrote:
> 2017-01-20 16:04, Yuanhan Liu:
> > Fix an silly error by auto-complete while managing the merge conflicts.
> > It's the eth_dev_data (but not eth_dev) entry should be memset.
> > 
> > Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")
> 
> You should describe the impact on this bug.

Honestly, I don't know. I didn't met any issue while testing vhost.
Maybe Ferruh knows what might be wrong, since it's him spotted this
bug?

> It will be helpful for those testing the RC1.
> 
> > -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> > +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> 
> The title should be contain the scope of the bug.
> I suggest "fix data reset when allocating port".

Yeah, that's better. Thanks.

	--yliu
  
Ferruh Yigit Jan. 20, 2017, 11:09 a.m. UTC | #3
On 1/20/2017 10:34 AM, Yuanhan Liu wrote:
> On Fri, Jan 20, 2017 at 11:20:06AM +0100, Thomas Monjalon wrote:
>> 2017-01-20 16:04, Yuanhan Liu:
>>> Fix an silly error by auto-complete while managing the merge conflicts.
>>> It's the eth_dev_data (but not eth_dev) entry should be memset.
>>>
>>> Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")
>>
>> You should describe the impact on this bug.
> 
> Honestly, I don't know. I didn't met any issue while testing vhost.
> Maybe Ferruh knows what might be wrong, since it's him spotted this
> bug?

I find this while reading the code, not observed any bug.

> 
>> It will be helpful for those testing the RC1.
>>
>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
>>
>> The title should be contain the scope of the bug.
>> I suggest "fix data reset when allocating port".
> 
> Yeah, that's better. Thanks.
> 
> 	--yliu
>
  
Ferruh Yigit Jan. 20, 2017, 11:21 a.m. UTC | #4
On 1/20/2017 8:04 AM, Yuanhan Liu wrote:
> Fix an silly error by auto-complete while managing the merge conflicts.
> It's the eth_dev_data (but not eth_dev) entry should be memset.
> 
> Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")
> 
> Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
>  lib/librte_ether/rte_ethdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 4790faf..61f44e2 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -225,7 +225,7 @@ struct rte_eth_dev *
>  		return NULL;
>  	}
>  
> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));

Not directly related to the this issue, but, after fix, this may have
issues with secondary process.

There were patches sent to fix this.


>  	eth_dev = eth_dev_get(port_id);
>  	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
>  	eth_dev->data->port_id = port_id;
>
  
Ferruh Yigit Jan. 20, 2017, 3:27 p.m. UTC | #5
On 1/20/2017 11:21 AM, Ferruh Yigit wrote:
> On 1/20/2017 8:04 AM, Yuanhan Liu wrote:
>> Fix an silly error by auto-complete while managing the merge conflicts.
>> It's the eth_dev_data (but not eth_dev) entry should be memset.
>>
>> Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")
>>
>> Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>> ---
>>  lib/librte_ether/rte_ethdev.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index 4790faf..61f44e2 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
>>  		return NULL;
>>  	}
>>  
>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> 
> Not directly related to the this issue, but, after fix, this may have
> issues with secondary process.
> 
> There were patches sent to fix this.

I mean this one:
http://dpdk.org/ml/archives/dev/2017-January/054422.html

> 
> 
>>  	eth_dev = eth_dev_get(port_id);
>>  	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
>>  	eth_dev->data->port_id = port_id;
>>
>
  
Thomas Monjalon Jan. 20, 2017, 6:05 p.m. UTC | #6
2017-01-20 18:34, Yuanhan Liu:
> On Fri, Jan 20, 2017 at 11:20:06AM +0100, Thomas Monjalon wrote:
> > 2017-01-20 16:04, Yuanhan Liu:
> > > Fix an silly error by auto-complete while managing the merge conflicts.
> > > It's the eth_dev_data (but not eth_dev) entry should be memset.
> > > 
> > > Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")
> > 
> > You should describe the impact on this bug.
> 
> Honestly, I don't know. I didn't met any issue while testing vhost.
> Maybe Ferruh knows what might be wrong, since it's him spotted this
> bug?
> 
> > It will be helpful for those testing the RC1.
> > 
> > > -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> > > +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> > 
> > The title should be contain the scope of the bug.
> > I suggest "fix data reset when allocating port".
> 
> Yeah, that's better. Thanks.

Applied, thanks
  
Yuanhan Liu Jan. 22, 2017, 2:45 a.m. UTC | #7
On Fri, Jan 20, 2017 at 03:27:43PM +0000, Ferruh Yigit wrote:
> On 1/20/2017 11:21 AM, Ferruh Yigit wrote:
> > On 1/20/2017 8:04 AM, Yuanhan Liu wrote:
> >> Fix an silly error by auto-complete while managing the merge conflicts.
> >> It's the eth_dev_data (but not eth_dev) entry should be memset.
> >>
> >> Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")
> >>
> >> Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
> >> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >> ---
> >>  lib/librte_ether/rte_ethdev.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> >> index 4790faf..61f44e2 100644
> >> --- a/lib/librte_ether/rte_ethdev.c
> >> +++ b/lib/librte_ether/rte_ethdev.c
> >> @@ -225,7 +225,7 @@ struct rte_eth_dev *
> >>  		return NULL;
> >>  	}
> >>  
> >> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> >> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> > 
> > Not directly related to the this issue, but, after fix, this may have
> > issues with secondary process.
> > 
> > There were patches sent to fix this.
> 
> I mean this one:
> http://dpdk.org/ml/archives/dev/2017-January/054422.html

d948f596fee2 ("ethdev: fix port data mismatched in multiple process
model") should have fixed it.

	--yliu
  
Ferruh Yigit Jan. 23, 2017, 9:41 a.m. UTC | #8
On 1/22/2017 2:45 AM, Yuanhan Liu wrote:
> On Fri, Jan 20, 2017 at 03:27:43PM +0000, Ferruh Yigit wrote:
>> On 1/20/2017 11:21 AM, Ferruh Yigit wrote:
>>> On 1/20/2017 8:04 AM, Yuanhan Liu wrote:
>>>> Fix an silly error by auto-complete while managing the merge conflicts.
>>>> It's the eth_dev_data (but not eth_dev) entry should be memset.
>>>>
>>>> Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")
>>>>
>>>> Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>>> ---
>>>>  lib/librte_ether/rte_ethdev.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>>>> index 4790faf..61f44e2 100644
>>>> --- a/lib/librte_ether/rte_ethdev.c
>>>> +++ b/lib/librte_ether/rte_ethdev.c
>>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
>>>>  		return NULL;
>>>>  	}
>>>>  
>>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
>>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
>>>
>>> Not directly related to the this issue, but, after fix, this may have
>>> issues with secondary process.
>>>
>>> There were patches sent to fix this.
>>
>> I mean this one:
>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
> 
> d948f596fee2 ("ethdev: fix port data mismatched in multiple process
> model") should have fixed it.

Think about case, where secondary process uses a virtual PMD, which does
a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
device data?

> 
> 	--yliu
>
  
Yuanhan Liu Jan. 23, 2017, 10:34 a.m. UTC | #9
On Mon, Jan 23, 2017 at 09:41:35AM +0000, Ferruh Yigit wrote:
> On 1/22/2017 2:45 AM, Yuanhan Liu wrote:
> > On Fri, Jan 20, 2017 at 03:27:43PM +0000, Ferruh Yigit wrote:
> >> On 1/20/2017 11:21 AM, Ferruh Yigit wrote:
> >>> On 1/20/2017 8:04 AM, Yuanhan Liu wrote:
> >>>> Fix an silly error by auto-complete while managing the merge conflicts.
> >>>> It's the eth_dev_data (but not eth_dev) entry should be memset.
> >>>>
> >>>> Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")
> >>>>
> >>>> Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
> >>>> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>>> ---
> >>>>  lib/librte_ether/rte_ethdev.c | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> >>>> index 4790faf..61f44e2 100644
> >>>> --- a/lib/librte_ether/rte_ethdev.c
> >>>> +++ b/lib/librte_ether/rte_ethdev.c
> >>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
> >>>>  		return NULL;
> >>>>  	}
> >>>>  
> >>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> >>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> >>>
> >>> Not directly related to the this issue, but, after fix, this may have
> >>> issues with secondary process.
> >>>
> >>> There were patches sent to fix this.
> >>
> >> I mean this one:
> >> http://dpdk.org/ml/archives/dev/2017-January/054422.html
> > 
> > d948f596fee2 ("ethdev: fix port data mismatched in multiple process
> > model") should have fixed it.
> 
> Think about case, where secondary process uses a virtual PMD, which does
> a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
> device data?

Yes, it may. However, I doubt that's the typical usage. Besides that,
most of virtual PMDs don't support Multipleprocess: git grep shows pcap
is the only one that does claim Multipleprocess is supported.

	--yliu
  
Ferruh Yigit Jan. 23, 2017, 11:05 a.m. UTC | #10
On 1/23/2017 10:34 AM, Yuanhan Liu wrote:
> On Mon, Jan 23, 2017 at 09:41:35AM +0000, Ferruh Yigit wrote:
>> On 1/22/2017 2:45 AM, Yuanhan Liu wrote:
>>> On Fri, Jan 20, 2017 at 03:27:43PM +0000, Ferruh Yigit wrote:
>>>> On 1/20/2017 11:21 AM, Ferruh Yigit wrote:
>>>>> On 1/20/2017 8:04 AM, Yuanhan Liu wrote:
>>>>>> Fix an silly error by auto-complete while managing the merge conflicts.
>>>>>> It's the eth_dev_data (but not eth_dev) entry should be memset.
>>>>>>
>>>>>> Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")
>>>>>>
>>>>>> Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>>>> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>>>>> ---
>>>>>>  lib/librte_ether/rte_ethdev.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>>>>>> index 4790faf..61f44e2 100644
>>>>>> --- a/lib/librte_ether/rte_ethdev.c
>>>>>> +++ b/lib/librte_ether/rte_ethdev.c
>>>>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
>>>>>>  		return NULL;
>>>>>>  	}
>>>>>>  
>>>>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
>>>>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
>>>>>
>>>>> Not directly related to the this issue, but, after fix, this may have
>>>>> issues with secondary process.
>>>>>
>>>>> There were patches sent to fix this.
>>>>
>>>> I mean this one:
>>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
>>>
>>> d948f596fee2 ("ethdev: fix port data mismatched in multiple process
>>> model") should have fixed it.
>>
>> Think about case, where secondary process uses a virtual PMD, which does
>> a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
>> device data?
> 
> Yes, it may. However, I doubt that's the typical usage. 

But this is a use case, and broken now, and fix is known. Should be
fixed I think.

> Besides that,
> most of virtual PMDs don't support Multipleprocess: git grep shows pcap
> is the only one that does claim Multipleprocess is supported.

I guess you searched for NIC feature documentation for this. But as far
as I know, all virtual drivers can be used in both primary and secondary
process.

> 
> 	--yliu
>
  
Yuanhan Liu Jan. 23, 2017, 11:24 a.m. UTC | #11
On Mon, Jan 23, 2017 at 11:05:25AM +0000, Ferruh Yigit wrote:
> >>>>>>  lib/librte_ether/rte_ethdev.c | 2 +-
> >>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> >>>>>> index 4790faf..61f44e2 100644
> >>>>>> --- a/lib/librte_ether/rte_ethdev.c
> >>>>>> +++ b/lib/librte_ether/rte_ethdev.c
> >>>>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
> >>>>>>  		return NULL;
> >>>>>>  	}
> >>>>>>  
> >>>>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> >>>>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> >>>>>
> >>>>> Not directly related to the this issue, but, after fix, this may have
> >>>>> issues with secondary process.
> >>>>>
> >>>>> There were patches sent to fix this.
> >>>>
> >>>> I mean this one:
> >>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
> >>>
> >>> d948f596fee2 ("ethdev: fix port data mismatched in multiple process
> >>> model") should have fixed it.
> >>
> >> Think about case, where secondary process uses a virtual PMD, which does
> >> a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
> >> device data?
> > 
> > Yes, it may. However, I doubt that's the typical usage. 
> 
> But this is a use case, and broken now,

I thought it was broken since the beginning?

> and fix is known.

And there is already a fix?

> Should be
> fixed I think.

Sure.

> 
> > Besides that,
> > most of virtual PMDs don't support Multipleprocess: git grep shows pcap
> > is the only one that does claim Multipleprocess is supported.
> 
> I guess you searched for NIC feature documentation for this.

Yes.

> But as far
> as I know, all virtual drivers can be used in both primary and secondary
> process.

Maybe. But it becomes very error-prone to me then when vdev are involved
in both primary and secondary process. I don't think current code is (or
designed to be) strong enough to support that.

I don't know it's allowed to use hotplug or not in the multiple process
model. If yes, I think there would be many ways to break it.

Honestly, the multiple process doesn't look like a good/clean design to
me, especially when some piece of code claim to support it while some
other doesn't.

So my point was, yes, there is a bug, we should fix it. But it seems
that there could be so many bugs if we hugely expand the test coverage
of the multiple process feature.

	--yliu
  
Ferruh Yigit Jan. 23, 2017, 11:32 a.m. UTC | #12
On 1/23/2017 11:24 AM, Yuanhan Liu wrote:
> On Mon, Jan 23, 2017 at 11:05:25AM +0000, Ferruh Yigit wrote:
>>>>>>>>  lib/librte_ether/rte_ethdev.c | 2 +-
>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>>>>>>>> index 4790faf..61f44e2 100644
>>>>>>>> --- a/lib/librte_ether/rte_ethdev.c
>>>>>>>> +++ b/lib/librte_ether/rte_ethdev.c
>>>>>>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
>>>>>>>>  		return NULL;
>>>>>>>>  	}
>>>>>>>>  
>>>>>>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
>>>>>>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
>>>>>>>
>>>>>>> Not directly related to the this issue, but, after fix, this may have
>>>>>>> issues with secondary process.
>>>>>>>
>>>>>>> There were patches sent to fix this.
>>>>>>
>>>>>> I mean this one:
>>>>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
>>>>>
>>>>> d948f596fee2 ("ethdev: fix port data mismatched in multiple process
>>>>> model") should have fixed it.
>>>>
>>>> Think about case, where secondary process uses a virtual PMD, which does
>>>> a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
>>>> device data?
>>>
>>> Yes, it may. However, I doubt that's the typical usage. 
>>
>> But this is a use case, and broken now,
> 
> I thought it was broken since the beginning?

No, memset(&rte_eth_dev_data[port_id], ...) breaks it.

> 
>> and fix is known.
> 
> And there is already a fix?

http://dpdk.org/ml/archives/dev/2017-January/054422.html

> 
>> Should be
>> fixed I think.
> 
> Sure.
> 
>>
>>> Besides that,
>>> most of virtual PMDs don't support Multipleprocess: git grep shows pcap
>>> is the only one that does claim Multipleprocess is supported.
>>
>> I guess you searched for NIC feature documentation for this.
> 
> Yes.
> 
>> But as far
>> as I know, all virtual drivers can be used in both primary and secondary
>> process.
> 
> Maybe. But it becomes very error-prone to me then when vdev are involved
> in both primary and secondary process. I don't think current code is (or
> designed to be) strong enough to support that.
> 
> I don't know it's allowed to use hotplug or not in the multiple process
> model. If yes, I think there would be many ways to break it.
> 
> Honestly, the multiple process doesn't look like a good/clean design to
> me, especially when some piece of code claim to support it while some
> other doesn't.
> 
> So my point was, yes, there is a bug, we should fix it. But it seems
> that there could be so many bugs if we hugely expand the test coverage
> of the multiple process feature.

Agreed.

> 
> 	--yliu
>
  
Yuanhan Liu Jan. 23, 2017, 11:40 a.m. UTC | #13
On Mon, Jan 23, 2017 at 11:32:23AM +0000, Ferruh Yigit wrote:
> On 1/23/2017 11:24 AM, Yuanhan Liu wrote:
> > On Mon, Jan 23, 2017 at 11:05:25AM +0000, Ferruh Yigit wrote:
> >>>>>>>>  lib/librte_ether/rte_ethdev.c | 2 +-
> >>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> >>>>>>>> index 4790faf..61f44e2 100644
> >>>>>>>> --- a/lib/librte_ether/rte_ethdev.c
> >>>>>>>> +++ b/lib/librte_ether/rte_ethdev.c
> >>>>>>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
> >>>>>>>>  		return NULL;
> >>>>>>>>  	}
> >>>>>>>>  
> >>>>>>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> >>>>>>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> >>>>>>>
> >>>>>>> Not directly related to the this issue, but, after fix, this may have
> >>>>>>> issues with secondary process.
> >>>>>>>
> >>>>>>> There were patches sent to fix this.
> >>>>>>
> >>>>>> I mean this one:
> >>>>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
> >>>>>
> >>>>> d948f596fee2 ("ethdev: fix port data mismatched in multiple process
> >>>>> model") should have fixed it.
> >>>>
> >>>> Think about case, where secondary process uses a virtual PMD, which does
> >>>> a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
> >>>> device data?
> >>>
> >>> Yes, it may. However, I doubt that's the typical usage. 
> >>
> >> But this is a use case, and broken now,
> > 
> > I thought it was broken since the beginning?
> 
> No, memset(&rte_eth_dev_data[port_id], ...) breaks it.

Oh, you were talking about that particular case Remy's patch meant to
fix.

> >> and fix is known.
> > 
> > And there is already a fix?
> 
> http://dpdk.org/ml/archives/dev/2017-January/054422.html

Yes, it should fix that issue. One question: do Remy or you regularly
run some multiple process test cases (and with vdev both in primary
and secondary process)?

	--yliu
  
Yuanhan Liu Jan. 23, 2017, 11:56 a.m. UTC | #14
On Mon, Jan 23, 2017 at 07:40:50PM +0800, Yuanhan Liu wrote:
> On Mon, Jan 23, 2017 at 11:32:23AM +0000, Ferruh Yigit wrote:
> > On 1/23/2017 11:24 AM, Yuanhan Liu wrote:
> > > On Mon, Jan 23, 2017 at 11:05:25AM +0000, Ferruh Yigit wrote:
> > >>>>>>>>  lib/librte_ether/rte_ethdev.c | 2 +-
> > >>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>>>>>>
> > >>>>>>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > >>>>>>>> index 4790faf..61f44e2 100644
> > >>>>>>>> --- a/lib/librte_ether/rte_ethdev.c
> > >>>>>>>> +++ b/lib/librte_ether/rte_ethdev.c
> > >>>>>>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
> > >>>>>>>>  		return NULL;
> > >>>>>>>>  	}
> > >>>>>>>>  
> > >>>>>>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> > >>>>>>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> > >>>>>>>
> > >>>>>>> Not directly related to the this issue, but, after fix, this may have
> > >>>>>>> issues with secondary process.
> > >>>>>>>
> > >>>>>>> There were patches sent to fix this.
> > >>>>>>
> > >>>>>> I mean this one:
> > >>>>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
> > >>>>>
> > >>>>> d948f596fee2 ("ethdev: fix port data mismatched in multiple process
> > >>>>> model") should have fixed it.
> > >>>>
> > >>>> Think about case, where secondary process uses a virtual PMD, which does
> > >>>> a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
> > >>>> device data?
> > >>>
> > >>> Yes, it may. However, I doubt that's the typical usage. 
> > >>
> > >> But this is a use case, and broken now,
> > > 
> > > I thought it was broken since the beginning?
> > 
> > No, memset(&rte_eth_dev_data[port_id], ...) breaks it.
> 
> Oh, you were talking about that particular case Remy's patch meant to
> fix.
> 
> > >> and fix is known.
> > > 
> > > And there is already a fix?
> > 
> > http://dpdk.org/ml/archives/dev/2017-January/054422.html
> 
> Yes, it should fix that issue.

Well, few more thoughts: it may fix the crash issue Remy saw, but it
looks like more a workaround to me. Basically, if primary and secondary
shares a same port id, they should point to same device. Otherwise,
primary process may use eth_dev->data for a device A, while the
secondary process may use it for another device, as you said, it
could be a vdev.

In such case, there is no way we could continue safely. That said,
the given patch avoids the total reset of eth_dev->data, while it
continues reset the eth_dev->data->name, which is wrong.

So it's not a proper fix.

Again, I think it's more about the usage. If primary starts with
a nic device A, while the secondary starts with a nic device B,
there is no way they could work well (unless they use different
port id).

	--yliu

> One question: do Remy or you regularly
> run some multiple process test cases (and with vdev both in primary
> and secondary process)?
  
Ananyev, Konstantin Jan. 23, 2017, 12:44 p.m. UTC | #15
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yuanhan Liu
> Sent: Monday, January 23, 2017 11:56 AM
> To: Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas.monjalon@6wind.com>; Horton, Remy <remy.horton@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] ethdev: fix wrong memset
> 
> On Mon, Jan 23, 2017 at 07:40:50PM +0800, Yuanhan Liu wrote:
> > On Mon, Jan 23, 2017 at 11:32:23AM +0000, Ferruh Yigit wrote:
> > > On 1/23/2017 11:24 AM, Yuanhan Liu wrote:
> > > > On Mon, Jan 23, 2017 at 11:05:25AM +0000, Ferruh Yigit wrote:
> > > >>>>>>>>  lib/librte_ether/rte_ethdev.c | 2 +-
> > > >>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >>>>>>>>
> > > >>>>>>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > > >>>>>>>> index 4790faf..61f44e2 100644
> > > >>>>>>>> --- a/lib/librte_ether/rte_ethdev.c
> > > >>>>>>>> +++ b/lib/librte_ether/rte_ethdev.c
> > > >>>>>>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
> > > >>>>>>>>  		return NULL;
> > > >>>>>>>>  	}
> > > >>>>>>>>
> > > >>>>>>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> > > >>>>>>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> > > >>>>>>>
> > > >>>>>>> Not directly related to the this issue, but, after fix, this may have
> > > >>>>>>> issues with secondary process.
> > > >>>>>>>
> > > >>>>>>> There were patches sent to fix this.
> > > >>>>>>
> > > >>>>>> I mean this one:
> > > >>>>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
> > > >>>>>
> > > >>>>> d948f596fee2 ("ethdev: fix port data mismatched in multiple process
> > > >>>>> model") should have fixed it.
> > > >>>>
> > > >>>> Think about case, where secondary process uses a virtual PMD, which does
> > > >>>> a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
> > > >>>> device data?
> > > >>>
> > > >>> Yes, it may. However, I doubt that's the typical usage.
> > > >>
> > > >> But this is a use case, and broken now,
> > > >
> > > > I thought it was broken since the beginning?
> > >
> > > No, memset(&rte_eth_dev_data[port_id], ...) breaks it.
> >
> > Oh, you were talking about that particular case Remy's patch meant to
> > fix.
> >
> > > >> and fix is known.
> > > >
> > > > And there is already a fix?
> > >
> > > http://dpdk.org/ml/archives/dev/2017-January/054422.html
> >
> > Yes, it should fix that issue.
> 
> Well, few more thoughts: it may fix the crash issue Remy saw, but it
> looks like more a workaround to me. Basically, if primary and secondary
> shares a same port id, they should point to same device. Otherwise,
> primary process may use eth_dev->data for a device A, while the
> secondary process may use it for another device, as you said, it
> could be a vdev.
> 
> In such case, there is no way we could continue safely. That said,
> the given patch avoids the total reset of eth_dev->data, while it
> continues reset the eth_dev->data->name, which is wrong.
> 
> So it's not a proper fix.
> 
> Again, I think it's more about the usage. If primary starts with
> a nic device A, while the secondary starts with a nic device B,
> there is no way they could work well (unless they use different
> port id).

Why not?
I think this is possible.
They just need to be initialized properly,
so each rte_eth_devices[port_id]->data, etc. point to the right place.
Konstantin


> 
> 	--yliu
> 
> > One question: do Remy or you regularly
> > run some multiple process test cases (and with vdev both in primary
> > and secondary process)?
  
Yuanhan Liu Jan. 23, 2017, 12:52 p.m. UTC | #16
On Mon, Jan 23, 2017 at 12:44:11PM +0000, Ananyev, Konstantin wrote:
> > On Mon, Jan 23, 2017 at 07:40:50PM +0800, Yuanhan Liu wrote:
> > > On Mon, Jan 23, 2017 at 11:32:23AM +0000, Ferruh Yigit wrote:
> > > > On 1/23/2017 11:24 AM, Yuanhan Liu wrote:
> > > > > On Mon, Jan 23, 2017 at 11:05:25AM +0000, Ferruh Yigit wrote:
> > > > >>>>>>>>  lib/librte_ether/rte_ethdev.c | 2 +-
> > > > >>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >>>>>>>>
> > > > >>>>>>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > > > >>>>>>>> index 4790faf..61f44e2 100644
> > > > >>>>>>>> --- a/lib/librte_ether/rte_ethdev.c
> > > > >>>>>>>> +++ b/lib/librte_ether/rte_ethdev.c
> > > > >>>>>>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
> > > > >>>>>>>>  		return NULL;
> > > > >>>>>>>>  	}
> > > > >>>>>>>>
> > > > >>>>>>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> > > > >>>>>>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> > > > >>>>>>>
> > > > >>>>>>> Not directly related to the this issue, but, after fix, this may have
> > > > >>>>>>> issues with secondary process.
> > > > >>>>>>>
> > > > >>>>>>> There were patches sent to fix this.
> > > > >>>>>>
> > > > >>>>>> I mean this one:
> > > > >>>>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
> > > > >>>>>
> > > > >>>>> d948f596fee2 ("ethdev: fix port data mismatched in multiple process
> > > > >>>>> model") should have fixed it.
> > > > >>>>
> > > > >>>> Think about case, where secondary process uses a virtual PMD, which does
> > > > >>>> a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
> > > > >>>> device data?
> > > > >>>
> > > > >>> Yes, it may. However, I doubt that's the typical usage.
> > > > >>
> > > > >> But this is a use case, and broken now,
> > > > >
> > > > > I thought it was broken since the beginning?
> > > >
> > > > No, memset(&rte_eth_dev_data[port_id], ...) breaks it.
> > >
> > > Oh, you were talking about that particular case Remy's patch meant to
> > > fix.
> > >
> > > > >> and fix is known.
> > > > >
> > > > > And there is already a fix?
> > > >
> > > > http://dpdk.org/ml/archives/dev/2017-January/054422.html
> > >
> > > Yes, it should fix that issue.
> > 
> > Well, few more thoughts: it may fix the crash issue Remy saw, but it
> > looks like more a workaround to me. Basically, if primary and secondary
> > shares a same port id, they should point to same device. Otherwise,
> > primary process may use eth_dev->data for a device A, while the
> > secondary process may use it for another device, as you said, it
> > could be a vdev.
> > 
> > In such case, there is no way we could continue safely. That said,
> > the given patch avoids the total reset of eth_dev->data, while it
> > continues reset the eth_dev->data->name, which is wrong.
> > 
> > So it's not a proper fix.
> > 
> > Again, I think it's more about the usage. If primary starts with
> > a nic device A, while the secondary starts with a nic device B,
> > there is no way they could work well (unless they use different
> > port id).
> 
> Why not?
> I think this is possible.

Yes, it's possible: find another port id if that one is already taken
by primary process (or even by secondary process: think that primary
process might attatch a port later).

> They just need to be initialized properly,
> so each rte_eth_devices[port_id]->data, etc. point to the right place.

My understanding is, as far as they use different port_id, it might
be fine. Just not sure it's enough or not.

	--yliu
  
Ananyev, Konstantin Jan. 23, 2017, 1:06 p.m. UTC | #17
> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> Sent: Monday, January 23, 2017 12:53 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Thomas Monjalon <thomas.monjalon@6wind.com>; Horton, Remy
> <remy.horton@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] ethdev: fix wrong memset
> 
> On Mon, Jan 23, 2017 at 12:44:11PM +0000, Ananyev, Konstantin wrote:
> > > On Mon, Jan 23, 2017 at 07:40:50PM +0800, Yuanhan Liu wrote:
> > > > On Mon, Jan 23, 2017 at 11:32:23AM +0000, Ferruh Yigit wrote:
> > > > > On 1/23/2017 11:24 AM, Yuanhan Liu wrote:
> > > > > > On Mon, Jan 23, 2017 at 11:05:25AM +0000, Ferruh Yigit wrote:
> > > > > >>>>>>>>  lib/librte_ether/rte_ethdev.c | 2 +-
> > > > > >>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >>>>>>>>
> > > > > >>>>>>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > > > > >>>>>>>> index 4790faf..61f44e2 100644
> > > > > >>>>>>>> --- a/lib/librte_ether/rte_ethdev.c
> > > > > >>>>>>>> +++ b/lib/librte_ether/rte_ethdev.c
> > > > > >>>>>>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
> > > > > >>>>>>>>  		return NULL;
> > > > > >>>>>>>>  	}
> > > > > >>>>>>>>
> > > > > >>>>>>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> > > > > >>>>>>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> > > > > >>>>>>>
> > > > > >>>>>>> Not directly related to the this issue, but, after fix, this may have
> > > > > >>>>>>> issues with secondary process.
> > > > > >>>>>>>
> > > > > >>>>>>> There were patches sent to fix this.
> > > > > >>>>>>
> > > > > >>>>>> I mean this one:
> > > > > >>>>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
> > > > > >>>>>
> > > > > >>>>> d948f596fee2 ("ethdev: fix port data mismatched in multiple process
> > > > > >>>>> model") should have fixed it.
> > > > > >>>>
> > > > > >>>> Think about case, where secondary process uses a virtual PMD, which does
> > > > > >>>> a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
> > > > > >>>> device data?
> > > > > >>>
> > > > > >>> Yes, it may. However, I doubt that's the typical usage.
> > > > > >>
> > > > > >> But this is a use case, and broken now,
> > > > > >
> > > > > > I thought it was broken since the beginning?
> > > > >
> > > > > No, memset(&rte_eth_dev_data[port_id], ...) breaks it.
> > > >
> > > > Oh, you were talking about that particular case Remy's patch meant to
> > > > fix.
> > > >
> > > > > >> and fix is known.
> > > > > >
> > > > > > And there is already a fix?
> > > > >
> > > > > http://dpdk.org/ml/archives/dev/2017-January/054422.html
> > > >
> > > > Yes, it should fix that issue.
> > >
> > > Well, few more thoughts: it may fix the crash issue Remy saw, but it
> > > looks like more a workaround to me. Basically, if primary and secondary
> > > shares a same port id, they should point to same device. Otherwise,
> > > primary process may use eth_dev->data for a device A, while the
> > > secondary process may use it for another device, as you said, it
> > > could be a vdev.
> > >
> > > In such case, there is no way we could continue safely. That said,
> > > the given patch avoids the total reset of eth_dev->data, while it
> > > continues reset the eth_dev->data->name, which is wrong.
> > >
> > > So it's not a proper fix.
> > >
> > > Again, I think it's more about the usage. If primary starts with
> > > a nic device A, while the secondary starts with a nic device B,
> > > there is no way they could work well (unless they use different
> > > port id).
> >
> > Why not?
> > I think this is possible.
> 
> Yes, it's possible: find another port id if that one is already taken
> by primary process (or even by secondary process: think that primary
> process might attatch a port later).
> 
> > They just need to be initialized properly,
> > so each rte_eth_devices[port_id]->data, etc. point to the right place.
> 
> My understanding is, as far as they use different port_id, it might
> be fine. Just not sure it's enough or not.

As I understand, the main problem is that  rte_eth_devices[] is local,
while rte_eth_dev_data points to the shared memory array.
And rte_eth_dev_allocate() assumes that if rte_eth_devices[x] is free,
then rte_eth_dev_data[port_id] is also free.
Which is wrong in case when primary/secondary processes have different devices attached.
Another problem is that inside rte_ethdev.c we manipulate rte_eth_dev_data[]
contents without grabbing any lock.
I think it was an attempt to fix that issue in 16.07 timeframe or so,
but I don't remember what happened with that patch.
Konstantin
  
Ferruh Yigit Jan. 23, 2017, 1:09 p.m. UTC | #18
On 1/23/2017 1:06 PM, Ananyev, Konstantin wrote:
> 
> 
>> -----Original Message-----
>> From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
>> Sent: Monday, January 23, 2017 12:53 PM
>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Thomas Monjalon <thomas.monjalon@6wind.com>; Horton, Remy
>> <remy.horton@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH] ethdev: fix wrong memset
>>
>> On Mon, Jan 23, 2017 at 12:44:11PM +0000, Ananyev, Konstantin wrote:
>>>> On Mon, Jan 23, 2017 at 07:40:50PM +0800, Yuanhan Liu wrote:
>>>>> On Mon, Jan 23, 2017 at 11:32:23AM +0000, Ferruh Yigit wrote:
>>>>>> On 1/23/2017 11:24 AM, Yuanhan Liu wrote:
>>>>>>> On Mon, Jan 23, 2017 at 11:05:25AM +0000, Ferruh Yigit wrote:
>>>>>>>>>>>>>>  lib/librte_ether/rte_ethdev.c | 2 +-
>>>>>>>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>>>>>>>>>>>>>> index 4790faf..61f44e2 100644
>>>>>>>>>>>>>> --- a/lib/librte_ether/rte_ethdev.c
>>>>>>>>>>>>>> +++ b/lib/librte_ether/rte_ethdev.c
>>>>>>>>>>>>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
>>>>>>>>>>>>>>  		return NULL;
>>>>>>>>>>>>>>  	}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
>>>>>>>>>>>>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not directly related to the this issue, but, after fix, this may have
>>>>>>>>>>>>> issues with secondary process.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There were patches sent to fix this.
>>>>>>>>>>>>
>>>>>>>>>>>> I mean this one:
>>>>>>>>>>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
>>>>>>>>>>>
>>>>>>>>>>> d948f596fee2 ("ethdev: fix port data mismatched in multiple process
>>>>>>>>>>> model") should have fixed it.
>>>>>>>>>>
>>>>>>>>>> Think about case, where secondary process uses a virtual PMD, which does
>>>>>>>>>> a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
>>>>>>>>>> device data?
>>>>>>>>>
>>>>>>>>> Yes, it may. However, I doubt that's the typical usage.
>>>>>>>>
>>>>>>>> But this is a use case, and broken now,
>>>>>>>
>>>>>>> I thought it was broken since the beginning?
>>>>>>
>>>>>> No, memset(&rte_eth_dev_data[port_id], ...) breaks it.
>>>>>
>>>>> Oh, you were talking about that particular case Remy's patch meant to
>>>>> fix.
>>>>>
>>>>>>>> and fix is known.
>>>>>>>
>>>>>>> And there is already a fix?
>>>>>>
>>>>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
>>>>>
>>>>> Yes, it should fix that issue.
>>>>
>>>> Well, few more thoughts: it may fix the crash issue Remy saw, but it
>>>> looks like more a workaround to me. Basically, if primary and secondary
>>>> shares a same port id, they should point to same device. Otherwise,
>>>> primary process may use eth_dev->data for a device A, while the
>>>> secondary process may use it for another device, as you said, it
>>>> could be a vdev.
>>>>
>>>> In such case, there is no way we could continue safely. That said,
>>>> the given patch avoids the total reset of eth_dev->data, while it
>>>> continues reset the eth_dev->data->name, which is wrong.
>>>>
>>>> So it's not a proper fix.
>>>>
>>>> Again, I think it's more about the usage. If primary starts with
>>>> a nic device A, while the secondary starts with a nic device B,
>>>> there is no way they could work well (unless they use different
>>>> port id).
>>>
>>> Why not?
>>> I think this is possible.
>>
>> Yes, it's possible: find another port id if that one is already taken
>> by primary process (or even by secondary process: think that primary
>> process might attatch a port later).
>>
>>> They just need to be initialized properly,
>>> so each rte_eth_devices[port_id]->data, etc. point to the right place.
>>
>> My understanding is, as far as they use different port_id, it might
>> be fine. Just not sure it's enough or not.
> 
> As I understand, the main problem is that  rte_eth_devices[] is local,
> while rte_eth_dev_data points to the shared memory array.
> And rte_eth_dev_allocate() assumes that if rte_eth_devices[x] is free,
> then rte_eth_dev_data[port_id] is also free.
> Which is wrong in case when primary/secondary processes have different devices attached.
> Another problem is that inside rte_ethdev.c we manipulate rte_eth_dev_data[]
> contents without grabbing any lock.

> I think it was an attempt to fix that issue in 16.07 timeframe or so,
> but I don't remember what happened with that patch.

Same here, I remember this already discussed and even some patches sent,
by Reshma if I remember correctly, but I don't remember latest status.

> Konstantin 
> 
> 
> 
>
  
Remy Horton Jan. 24, 2017, 8:29 a.m. UTC | #19
On 23/01/2017 11:56, Yuanhan Liu wrote:
[..]
>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
>>
>> Yes, it should fix that issue.
>
> Well, few more thoughts: it may fix the crash issue Remy saw, but it
> looks like more a workaround to me. Basically, if primary and secondary
> shares a same port id, they should point to same device. Otherwise,
> primary process may use eth_dev->data for a device A, while the
> secondary process may use it for another device, as you said, it
> could be a vdev.
>
> In such case, there is no way we could continue safely. That said,
> the given patch avoids the total reset of eth_dev->data, while it
> continues reset the eth_dev->data->name, which is wrong.

I did wonder whether 7f95f78a8aea ought to be rolled back rather than 
the memset being made process-conditional. You going to be fixing the 
issue in your own patch?


>> One question: do Remy or you regularly
>> run some multiple process test cases (and with vdev both in primary
>> and secondary process)?

Not aware of there being any multiproc-related unit tests.

..Remy
  
Thomas Monjalon Jan. 25, 2017, 11:16 a.m. UTC | #20
2017-01-23 13:06, Ananyev, Konstantin:
> 
> > -----Original Message-----
> > From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> > Sent: Monday, January 23, 2017 12:53 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Thomas Monjalon <thomas.monjalon@6wind.com>; Horton, Remy
> > <remy.horton@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH] ethdev: fix wrong memset
> > 
> > On Mon, Jan 23, 2017 at 12:44:11PM +0000, Ananyev, Konstantin wrote:
> > > > On Mon, Jan 23, 2017 at 07:40:50PM +0800, Yuanhan Liu wrote:
> > > > > On Mon, Jan 23, 2017 at 11:32:23AM +0000, Ferruh Yigit wrote:
> > > > > > On 1/23/2017 11:24 AM, Yuanhan Liu wrote:
> > > > > > > On Mon, Jan 23, 2017 at 11:05:25AM +0000, Ferruh Yigit wrote:
> > > > > > >>>>>>>>  lib/librte_ether/rte_ethdev.c | 2 +-
> > > > > > >>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > > > > > >>>>>>>> index 4790faf..61f44e2 100644
> > > > > > >>>>>>>> --- a/lib/librte_ether/rte_ethdev.c
> > > > > > >>>>>>>> +++ b/lib/librte_ether/rte_ethdev.c
> > > > > > >>>>>>>> @@ -225,7 +225,7 @@ struct rte_eth_dev *
> > > > > > >>>>>>>>  		return NULL;
> > > > > > >>>>>>>>  	}
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> -	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
> > > > > > >>>>>>>> +	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> > > > > > >>>>>>>
> > > > > > >>>>>>> Not directly related to the this issue, but, after fix, this may have
> > > > > > >>>>>>> issues with secondary process.
> > > > > > >>>>>>>
> > > > > > >>>>>>> There were patches sent to fix this.
> > > > > > >>>>>>
> > > > > > >>>>>> I mean this one:
> > > > > > >>>>>> http://dpdk.org/ml/archives/dev/2017-January/054422.html
> > > > > > >>>>>
> > > > > > >>>>> d948f596fee2 ("ethdev: fix port data mismatched in multiple process
> > > > > > >>>>> model") should have fixed it.
> > > > > > >>>>
> > > > > > >>>> Think about case, where secondary process uses a virtual PMD, which does
> > > > > > >>>> a rte_eth_dev_allocate() call, shouldn't this corrupt primary process
> > > > > > >>>> device data?
> > > > > > >>>
> > > > > > >>> Yes, it may. However, I doubt that's the typical usage.
> > > > > > >>
> > > > > > >> But this is a use case, and broken now,
> > > > > > >
> > > > > > > I thought it was broken since the beginning?
> > > > > >
> > > > > > No, memset(&rte_eth_dev_data[port_id], ...) breaks it.
> > > > >
> > > > > Oh, you were talking about that particular case Remy's patch meant to
> > > > > fix.
> > > > >
> > > > > > >> and fix is known.
> > > > > > >
> > > > > > > And there is already a fix?
> > > > > >
> > > > > > http://dpdk.org/ml/archives/dev/2017-January/054422.html
> > > > >
> > > > > Yes, it should fix that issue.
> > > >
> > > > Well, few more thoughts: it may fix the crash issue Remy saw, but it
> > > > looks like more a workaround to me. Basically, if primary and secondary
> > > > shares a same port id, they should point to same device. Otherwise,
> > > > primary process may use eth_dev->data for a device A, while the
> > > > secondary process may use it for another device, as you said, it
> > > > could be a vdev.
> > > >
> > > > In such case, there is no way we could continue safely. That said,
> > > > the given patch avoids the total reset of eth_dev->data, while it
> > > > continues reset the eth_dev->data->name, which is wrong.
> > > >
> > > > So it's not a proper fix.
> > > >
> > > > Again, I think it's more about the usage. If primary starts with
> > > > a nic device A, while the secondary starts with a nic device B,
> > > > there is no way they could work well (unless they use different
> > > > port id).
> > >
> > > Why not?
> > > I think this is possible.
> > 
> > Yes, it's possible: find another port id if that one is already taken
> > by primary process (or even by secondary process: think that primary
> > process might attatch a port later).
> > 
> > > They just need to be initialized properly,
> > > so each rte_eth_devices[port_id]->data, etc. point to the right place.
> > 
> > My understanding is, as far as they use different port_id, it might
> > be fine. Just not sure it's enough or not.
> 
> As I understand, the main problem is that  rte_eth_devices[] is local,
> while rte_eth_dev_data points to the shared memory array.
> And rte_eth_dev_allocate() assumes that if rte_eth_devices[x] is free,
> then rte_eth_dev_data[port_id] is also free.
> Which is wrong in case when primary/secondary processes have different devices attached.
> Another problem is that inside rte_ethdev.c we manipulate rte_eth_dev_data[]
> contents without grabbing any lock.

Yes there are a lot of problems with the multiproc mode because it has
been implemented as a hack.
We are fixing some cases without figuring the whole picture.

I'll apply the patch from Remy which fixes a case where process creates
vdev (local data) and, hopefully, primary does no hotplug of PCI dev.

I'll restart this discussion with a bigger picture of what multiproc is,
and what are the issues.
  
Yuanhan Liu Jan. 28, 2017, 1:14 p.m. UTC | #21
On Wed, Jan 25, 2017 at 12:16:58PM +0100, Thomas Monjalon wrote:
> > As I understand, the main problem is that  rte_eth_devices[] is local,
> > while rte_eth_dev_data points to the shared memory array.
> > And rte_eth_dev_allocate() assumes that if rte_eth_devices[x] is free,
> > then rte_eth_dev_data[port_id] is also free.
> > Which is wrong in case when primary/secondary processes have different devices attached.
> > Another problem is that inside rte_ethdev.c we manipulate rte_eth_dev_data[]
> > contents without grabbing any lock.
> 
> Yes there are a lot of problems with the multiproc mode because it has
> been implemented as a hack.

Oh, right, "hacky" is the word I intended to say in my last email.

> We are fixing some cases without figuring the whole picture.

Agreed.

> 
> I'll apply the patch from Remy which fixes a case where process creates

I would ask you not to apply that. IMO, his patch may have "fixed" a crash
he saw, but it doesn't looks like to me it really fixes anything: the driver 
may reference rte_eth_data belongs to another driver -- things can't be
right afterwards.

> vdev (local data) and, hopefully, primary does no hotplug of PCI dev.
> 
> I'll restart this discussion with a bigger picture of what multiproc is,
> and what are the issues.

Great! And I'm keen to know how the multiproc is actually used in real
life (and why they don't use multiple thread). Most importantly, does
people really care about it? (I fixed few virtio multiproc issues that
I know there are some people/companies using it, and I want to know if
there are more).

OTOH, an initial idea comes to my mind now is, make the port_id a global
id among primary and secondary process:

- same device will get same port id (my patch acertains that)
- different device will get different port id

	--yliu
  
Remy Horton Jan. 30, 2017, 11:07 a.m. UTC | #22
On 28/01/2017 13:14, Yuanhan Liu wrote:
[..]
>> I'll apply the patch from Remy which fixes a case where process creates
>
> I would ask you not to apply that. IMO, his patch may have "fixed" a crash
> he saw, but it doesn't looks like to me it really fixes anything: the driver
> may reference rte_eth_data belongs to another driver -- things can't be
> right afterwards.

I've self-NAK'd it as the code path it was aimed at no longer occurrs.


>> I'll restart this discussion with a bigger picture of what multiproc is,
>> and what are the issues.
>
> Great! And I'm keen to know how the multiproc is actually used in real
> life (and why they don't use multiple thread). Most importantly, does
> people really care about it? (I fixed few virtio multiproc issues that
> I know there are some people/companies using it, and I want to know if
> there are more).

The use-cases I'm coming across are to allow the attaching/detaching of 
external agents mainly for information-reporting purposes, without 
having to leave hooks everywhere. In this case main advantage is the 
relative ease that processes can be hot-plugged in a way threads cannot. 
I suppose something more heavyweight might be a primary process as a 
host physical interconnect, with secondary processes being virtual 
machines - in this case these might be large semi-independent code-bases 
one does not want in the same process image.

To me the major down-side with DPDK's multiprocess model is how 
address-space layout randomisation can trip things up. I know parts of 
the code seems ASLR-safe, but I very much doubt all of it is.

..Remy
  

Patch

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 4790faf..61f44e2 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -225,7 +225,7 @@  struct rte_eth_dev *
 		return NULL;
 	}
 
-	memset(&rte_eth_devices[port_id], 0, sizeof(*eth_dev->data));
+	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
 	eth_dev = eth_dev_get(port_id);
 	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
 	eth_dev->data->port_id = port_id;