ethdev: fix DMA zone reserve not honoring size

Message ID 20190331162437.13048-1-pbhagavatula@marvell.com (mailing list archive)
State Rejected, archived
Delegated to: Ferruh Yigit
Series ethdev: fix DMA zone reserve not honoring size

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/mellanox-Performance-Testing fail Performance Testing issues
ci/intel-Performance-Testing success Performance Testing PASS
ci/Intel-compilation success Compilation OK

Commit Message

Pavan Nikhilesh Bhagavatula March 31, 2019, 4:25 p.m. UTC
  From: Pavan Nikhilesh <pbhagavatula@marvell.com>

`rte_eth_dma_zone_reserve()` is generally used to create HW rings.
When a driver needs to reconfigure the ring size, the named memzone
already exists, so the function returns the previous memzone without
checking whether a different-sized ring is requested.

Introduce a check to see whether the requested ring size differs from the
length of the previously created memzone and, if it does, free the old
memzone and reserve a new one.

Fixes: 719dbebceb81 ("xen: allow determining DOM0 at runtime")
Cc: stable@dpdk.org

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 lib/librte_ethdev/rte_ethdev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
  

Comments

Andrew Rybchenko April 1, 2019, 7:30 a.m. UTC | #1
On 3/31/19 7:25 PM, Pavan Nikhilesh Bhagavatula wrote:
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> The `rte_eth_dma_zone_reserve()` is generally used to create HW rings.
> In some scenarios when a driver needs to reconfigure the ring size
> since the named memzone already exists it returns the previous memzone
> without checking if a different sized ring is requested.
>
> Introduce a check to see if the ring size requested is different from the
> previously created memzone length.
>
> Fixes: 719dbebceb81 ("xen: allow determining DOM0 at runtime")
> Cc: stable@dpdk.org
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>   lib/librte_ethdev/rte_ethdev.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 12b66b68c..4ae12e43b 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -3604,9 +3604,12 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>   	}
>   
>   	mz = rte_memzone_lookup(z_name);
> -	if (mz)
> +	if (mz && (mz->len == size))
>   		return mz;
>   
> +	if (mz)
> +		rte_memzone_free(mz);

NACK
I really don't like an API that is supposed to reserve doing a free when
the requested size does not match the previously allocated one.
I understand the motivation, but I don't think the solution is correct.

> +
>   	return rte_memzone_reserve_aligned(z_name, size, socket_id,
>   			RTE_MEMZONE_IOVA_CONTIG, align);
>   }
  
Burakov, Anatoly April 1, 2019, 9:28 a.m. UTC | #2
On 01-Apr-19 8:30 AM, Andrew Rybchenko wrote:
> On 3/31/19 7:25 PM, Pavan Nikhilesh Bhagavatula wrote:
>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>
>> The `rte_eth_dma_zone_reserve()` is generally used to create HW rings.
>> In some scenarios when a driver needs to reconfigure the ring size
>> since the named memzone already exists it returns the previous memzone
>> without checking if a different sized ring is requested.
>>
>> Introduce a check to see if the ring size requested is different from the
>> previously created memzone length.
>>
>> Fixes: 719dbebceb81 ("xen: allow determining DOM0 at runtime")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>> ---
>>   lib/librte_ethdev/rte_ethdev.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_ethdev/rte_ethdev.c 
>> b/lib/librte_ethdev/rte_ethdev.c
>> index 12b66b68c..4ae12e43b 100644
>> --- a/lib/librte_ethdev/rte_ethdev.c
>> +++ b/lib/librte_ethdev/rte_ethdev.c
>> @@ -3604,9 +3604,12 @@ rte_eth_dma_zone_reserve(const struct 
>> rte_eth_dev *dev, const char *ring_name,
>>       }
>>       mz = rte_memzone_lookup(z_name);
>> -    if (mz)
>> +    if (mz && (mz->len == size))
>>           return mz;
>> +    if (mz)
>> +        rte_memzone_free(mz);
> 
> NACK
> I really don't like that API which should reserve does free if requested
> size does not match previously allocated.
> I understand the motivation, but I don't think the solution is correct.

Why does size change in the first place?

> 
>> +
>>       return rte_memzone_reserve_aligned(z_name, size, socket_id,
>>               RTE_MEMZONE_IOVA_CONTIG, align);
>>   }
> 
>
  
Burakov, Anatoly April 1, 2019, 9:40 a.m. UTC | #3
On 01-Apr-19 10:28 AM, Burakov, Anatoly wrote:
> On 01-Apr-19 8:30 AM, Andrew Rybchenko wrote:
>> On 3/31/19 7:25 PM, Pavan Nikhilesh Bhagavatula wrote:
>>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>>
>>> The `rte_eth_dma_zone_reserve()` is generally used to create HW rings.
>>> In some scenarios when a driver needs to reconfigure the ring size
>>> since the named memzone already exists it returns the previous memzone
>>> without checking if a different sized ring is requested.
>>>
>>> Introduce a check to see if the ring size requested is different from 
>>> the
>>> previously created memzone length.
>>>
>>> Fixes: 719dbebceb81 ("xen: allow determining DOM0 at runtime")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>> ---
>>>   lib/librte_ethdev/rte_ethdev.c | 5 ++++-
>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.c 
>>> b/lib/librte_ethdev/rte_ethdev.c
>>> index 12b66b68c..4ae12e43b 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>> @@ -3604,9 +3604,12 @@ rte_eth_dma_zone_reserve(const struct 
>>> rte_eth_dev *dev, const char *ring_name,
>>>       }
>>>       mz = rte_memzone_lookup(z_name);
>>> -    if (mz)
>>> +    if (mz && (mz->len == size))
>>>           return mz;
>>> +    if (mz)
>>> +        rte_memzone_free(mz);
>>
>> NACK
>> I really don't like that API which should reserve does free if requested
>> size does not match previously allocated.
>> I understand the motivation, but I don't think the solution is correct.
> 
> Why does size change in the first place?

Never mind, I forgot that NICs can be reconfigured :)

Currently, there is no way to resize memzones, so freeing and 
reallocating is the only option. Since memzones are backed by regular 
malloc elements, we could add a memzone_resize API. That would help, 
because all of the references to the memzone itself will still be valid, 
even if memory ends up being reallocated.

> 
>>
>>> +
>>>       return rte_memzone_reserve_aligned(z_name, size, socket_id,
>>>               RTE_MEMZONE_IOVA_CONTIG, align);
>>>   }
>>
>>
> 
>
  
Pavan Nikhilesh Bhagavatula April 1, 2019, 12:12 p.m. UTC | #4
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Monday, April 1, 2019 3:11 PM
> To: Andrew Rybchenko <arybchenko@solarflare.com>; Pavan Nikhilesh
> Bhagavatula <pbhagavatula@marvell.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; thomas@monjalon.net; ferruh.yigit@intel.com;
> stephen@networkplumber.org
> Cc: dev@dpdk.org; stable@dpdk.org
> Subject: [EXT] Re: [dpdk-dev] [PATCH] ethdev: fix DMA zone reserve not
> honoring size
> 
> On 01-Apr-19 10:28 AM, Burakov, Anatoly wrote:
> > On 01-Apr-19 8:30 AM, Andrew Rybchenko wrote:
> >> On 3/31/19 7:25 PM, Pavan Nikhilesh Bhagavatula wrote:
> >>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >>>
> >>> The `rte_eth_dma_zone_reserve()` is generally used to create HW rings.
> >>> In some scenarios when a driver needs to reconfigure the ring size
> >>> since the named memzone already exists it returns the previous
> >>> memzone without checking if a different sized ring is requested.
> >>>
> >>> Introduce a check to see if the ring size requested is different
> >>> from the previously created memzone length.
> >>>
> >>> Fixes: 719dbebceb81 ("xen: allow determining DOM0 at runtime")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >>> ---
> >>>   lib/librte_ethdev/rte_ethdev.c | 5 ++++-
> >>>   1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/librte_ethdev/rte_ethdev.c
> >>> b/lib/librte_ethdev/rte_ethdev.c index 12b66b68c..4ae12e43b 100644
> >>> --- a/lib/librte_ethdev/rte_ethdev.c
> >>> +++ b/lib/librte_ethdev/rte_ethdev.c
> >>> @@ -3604,9 +3604,12 @@ rte_eth_dma_zone_reserve(const struct
> >>> rte_eth_dev *dev, const char *ring_name,
> >>>       }
> >>>       mz = rte_memzone_lookup(z_name);
> >>> -    if (mz)
> >>> +    if (mz && (mz->len == size))
> >>>           return mz;
> >>> +    if (mz)
> >>> +        rte_memzone_free(mz);
> >>
> >> NACK
> >> I really don't like that API which should reserve does free if
> >> requested size does not match previously allocated.
> >> I understand the motivation, but I don't think the solution is correct.
> >
> > Why does size change in the first place?
> 
> Never mind, i forgot that NICs can be reconfigured :)
> 

😊

> Currently, there is no way to resize memzones, so freeing and reallocating is
> the only option. Since memzones are backed by regular malloc elements, we
> could add a memzone_resize API. That would help, because all of the
> references to the memzone itself will still be valid, even if memory ends up
> being reallocated.
> 

Agreed, but currently the following drivers use the dma_zone_reserve API:

drivers/net/iavf
drivers/net/e1000
drivers/net/bnx2x
drivers/net/nfp
drivers/net/atlantic
drivers/net/vmxnet3
drivers/net/thunderx
drivers/net/liquidio
drivers/net/sfc
drivers/net/ixgbe
drivers/net/axgbe
drivers/net/fm10k
drivers/net/i40e
drivers/net/ice

Most of them assume that dma_zone_reserve allocates the requested size
(although most of the Intel NICs reserve the maximum possible ring size).

Can we have the free and reserve for this release and move to resize in the next?
AFAIK most of the existing drivers don't have a separate path for reconfigure;
since it is the same as configure, they propagate the DMA zone address properly.

> >
> >>
> >>> +
> >>>       return rte_memzone_reserve_aligned(z_name, size, socket_id,
> >>>               RTE_MEMZONE_IOVA_CONTIG, align);
> >>>   }
> >>
> >>
> >
> >
> 
> 
> --
> Thanks,
> Anatoly
  
Jerin Jacob Kollanukkaran April 2, 2019, 12:47 a.m. UTC | #5
On Mon, 2019-04-01 at 10:30 +0300, Andrew Rybchenko wrote:
> On 3/31/19 7:25 PM, Pavan Nikhilesh Bhagavatula wrote:
> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > 
> > The `rte_eth_dma_zone_reserve()` is generally used to create HW
> > rings.
> > In some scenarios when a driver needs to reconfigure the ring size
> > since the named memzone already exists it returns the previous
> > memzone
> > without checking if a different sized ring is requested.
> > 
> > Introduce a check to see if the ring size requested is different
> > from the
> > previously created memzone length.
> > 
> > Fixes: 719dbebceb81 ("xen: allow determining DOM0 at runtime")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > ---
> >  lib/librte_ethdev/rte_ethdev.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > b/lib/librte_ethdev/rte_ethdev.c
> > index 12b66b68c..4ae12e43b 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -3604,9 +3604,12 @@ rte_eth_dma_zone_reserve(const struct
> > rte_eth_dev *dev, const char *ring_name,
> >  	}
> >  
> >  	mz = rte_memzone_lookup(z_name);
> > -	if (mz)
> > +	if (mz && (mz->len == size))
> >  		return mz;
> >  
> > +	if (mz)
> > +		rte_memzone_free(mz);
>  
> NACK
> I really don't like that API which should reserve does free if
> requested
> size does not match previously allocated.

Why? Is it due to the API name? If so,
can we have rte_eth_dma_zone_reserve_with_resize() then,
or any other name you would like to have?

> I understand the motivation, but I don't think the solution is
> correct.

What do you think the correct solution is, then?
Obviously, we cannot allocate the max ring size at init time.
If the NIC supports a 64K HW ring, we would be wasting too much memory, as
it is per queue.


> 
> > +
> >  	return rte_memzone_reserve_aligned(z_name, size, socket_id,
> >  			RTE_MEMZONE_IOVA_CONTIG, align);
> >  }
>
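
The concern about reserving the maximum ring size up front can be made concrete with a short calculation. The sketch below is only illustrative: the 16-byte descriptor size, the queue count and the helper names are assumptions, not values taken from any driver in this thread.

```c
#include <stdint.h>

/* Bytes needed for one descriptor ring of nb_desc entries. */
static uint64_t ring_bytes(uint32_t nb_desc, uint32_t desc_size)
{
	return (uint64_t)nb_desc * desc_size;
}

/*
 * Bytes left unused when every queue reserves a ring of max_desc entries
 * but only uses used_desc of them. Assumes used_desc <= max_desc.
 */
static uint64_t wasted_bytes(uint32_t nb_queues, uint32_t max_desc,
			     uint32_t used_desc, uint32_t desc_size)
{
	return (uint64_t)nb_queues *
	       (ring_bytes(max_desc, desc_size) -
		ring_bytes(used_desc, desc_size));
}
```

Under these assumed numbers, a single 64K-entry ring of 16-byte descriptors takes 1 MiB, and 64 queues that each use only 1024 descriptors would leave roughly 63 MiB reserved but idle.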
  
Andrew Rybchenko April 2, 2019, 7:36 a.m. UTC | #6
On 4/2/19 3:47 AM, Jerin Jacob Kollanukkaran wrote:
> On Mon, 2019-04-01 at 10:30 +0300, Andrew Rybchenko wrote:
>> On 3/31/19 7:25 PM, Pavan Nikhilesh Bhagavatula wrote:
>>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>>
>>> The `rte_eth_dma_zone_reserve()` is generally used to create HW
>>> rings.
>>> In some scenarios when a driver needs to reconfigure the ring size
>>> since the named memzone already exists it returns the previous
>>> memzone
>>> without checking if a different sized ring is requested.
>>>
>>> Introduce a check to see if the ring size requested is different
>>> from the
>>> previously created memzone length.
>>>
>>> Fixes: 719dbebceb81 ("xen: allow determining DOM0 at runtime")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>> ---
>>>   lib/librte_ethdev/rte_ethdev.c | 5 ++++-
>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.c
>>> b/lib/librte_ethdev/rte_ethdev.c
>>> index 12b66b68c..4ae12e43b 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>> @@ -3604,9 +3604,12 @@ rte_eth_dma_zone_reserve(const struct
>>> rte_eth_dev *dev, const char *ring_name,
>>>   	}
>>>   
>>>   	mz = rte_memzone_lookup(z_name);
>>> -	if (mz)
>>> +	if (mz && (mz->len == size))
>>>   		return mz;
>>>   
>>> +	if (mz)
>>> +		rte_memzone_free(mz);
>>   
>> NACK
>> I really don't like that API which should reserve does free if
>> requested
>> size does not match previously allocated.
> Why? Is due to API name?

1. The problem really exists. The problem is bad and it is very good that you
     caught it and came up with a patch. Many thanks.
2. Silently freeing and reallocating memory is bad. The memory could be
     in use, mapped, etc.
3. As an absolute minimum, if we accept the behaviour, it must be documented
     in the function description.

>   If so,
> Can we have rte_eth_dma_zone_reservere_with_resize() then ?
> or any another name, You would like to have?

4. I'd prefer an error if different size (or bigger) memzone is requested,
     but I understand that it can break existing drivers.

Thomas, Ferruh, what do you think?

>> I understand the motivation, but I don't think the solution is
>> correct.
> What you think it has correct solution then?

See above plus handling in drivers or dedicated function with
better name as you suggest above.

> Obviously, We can not allocate max ring size in init time.
> If the NIC has support for 64K HW ring, We will be wasting too much as
> it is per queue.

Yes, I agree that it is an overkill.

net/sfc tries to carefully free/reserve on NIC/queues reconfigure.

Many thanks,
Andrew.
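
The driver-side handling mentioned above (free and reserve explicitly on reconfigure) might look roughly like the following sketch. It deliberately does not use the DPDK API: `struct zone`, `zone_lookup()`, `zone_reserve()`, `zone_free()` and `queue_ring_setup()` are hypothetical stand-ins backed by a small in-memory table, so that the control flow can be read and compiled on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a named memory zone; NOT the DPDK rte_memzone. */
struct zone {
	char name[32];
	size_t len;
	void *addr;
	int in_use;
};

#define MAX_ZONES 8
static struct zone zones[MAX_ZONES];

/* Find a zone by name, or return NULL if none exists. */
static struct zone *zone_lookup(const char *name)
{
	for (int i = 0; i < MAX_ZONES; i++)
		if (zones[i].in_use && strcmp(zones[i].name, name) == 0)
			return &zones[i];
	return NULL;
}

/* Reserve a new zone; returns NULL if the name is taken or no slot is free. */
static struct zone *zone_reserve(const char *name, size_t len)
{
	assert(name != NULL);
	if (zone_lookup(name) != NULL || strlen(name) >= sizeof(zones[0].name))
		return NULL;
	for (int i = 0; i < MAX_ZONES; i++) {
		if (zones[i].in_use)
			continue;
		zones[i].addr = malloc(len);
		if (zones[i].addr == NULL)
			return NULL;
		strcpy(zones[i].name, name);
		zones[i].len = len;
		zones[i].in_use = 1;
		return &zones[i];
	}
	return NULL;
}

/* Release a zone and clear its table slot. */
static void zone_free(struct zone *z)
{
	free(z->addr);
	memset(z, 0, sizeof(*z));
}

/*
 * Driver-side queue setup: the driver, not the reserve helper, decides to
 * drop a ring whose size no longer matches, so the free is explicit.
 */
static struct zone *queue_ring_setup(const char *name, size_t ring_size)
{
	struct zone *z = zone_lookup(name);

	if (z != NULL && z->len == ring_size)
		return z;		/* same size: reuse the old ring */
	if (z != NULL)
		zone_free(z);		/* size changed: free explicitly */
	return zone_reserve(name, ring_size);
}
```

The logic is the same as in the proposed patch, but it lives in the driver's reconfigure path, where the author knows the queue is stopped and the old ring is no longer referenced.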
  
Jerin Jacob Kollanukkaran April 2, 2019, 8:25 a.m. UTC | #7
On Tue, 2019-04-02 at 10:36 +0300, Andrew Rybchenko wrote:
> On 4/2/19 3:47 AM, Jerin Jacob Kollanukkaran wrote:
> > On Mon, 2019-04-01 at 10:30 +0300, Andrew Rybchenko wrote:
> > > On 3/31/19 7:25 PM, Pavan Nikhilesh Bhagavatula wrote:
> > > > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > > > 
> > > > The `rte_eth_dma_zone_reserve()` is generally used to create HW
> > > > rings.
> > > > In some scenarios when a driver needs to reconfigure the ring
> > > > size
> > > > since the named memzone already exists it returns the previous
> > > > memzone
> > > > without checking if a different sized ring is requested.
> > > > 
> > > > Introduce a check to see if the ring size requested is
> > > > different
> > > > from the
> > > > previously created memzone length.
> > > > 
> > > > Fixes: 719dbebceb81 ("xen: allow determining DOM0 at runtime")
> > > > Cc: stable@dpdk.org
> > > > 
> > > > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > > > ---
> > > >  lib/librte_ethdev/rte_ethdev.c | 5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > > > b/lib/librte_ethdev/rte_ethdev.c
> > > > index 12b66b68c..4ae12e43b 100644
> > > > --- a/lib/librte_ethdev/rte_ethdev.c
> > > > +++ b/lib/librte_ethdev/rte_ethdev.c
> > > > @@ -3604,9 +3604,12 @@ rte_eth_dma_zone_reserve(const struct
> > > > rte_eth_dev *dev, const char *ring_name,
> > > >  	}
> > > >  
> > > >  	mz = rte_memzone_lookup(z_name);
> > > > -	if (mz)
> > > > +	if (mz && (mz->len == size))
> > > >  		return mz;
> > > >  
> > > > +	if (mz)
> > > > +		rte_memzone_free(mz);
> > > 
> > >  
> > > NACK
> > > I really don't like that API which should reserve does free if
> > > requested
> > > size does not match previously allocated.
> > 
> > Why? Is due to API name?
>  
> 1. The problem really exists. The problem is bad and it very good
> that you
>     caught it and came up with a patch. Many thanks.
> 2. Silently free and reallocate memory is bad. Memory could be
> used/mapped etc.

If I understand correctly, it is used while configuring
the device and it is per queue. If so, is there any real-world
case with DPDK where the memory is in use in parallel?

> 3. As an absolute minimum if we accept the behaviour it must be
> documented 
>     in the function description. 
> 
> >  If so,
> > Can we have rte_eth_dma_zone_reservere_with_resize() then ?
> > or any another name, You would like to have?
>  
> 4. I'd prefer an error if different size (or bigger) memzone is
> requested,
>     but I understand that it can break existing drivers.
> 
> Thomas, Ferruh, what do you think?
> 
> > > I understand the motivation, but I don't think the solution is
> > > correct.
> > 
> > What you think it has correct solution then?
>  
> See above plus handling in drivers or dedicated function with
> better name as you suggest above.

Does handling in the driver mean returning an error?

Regarding the API: yes, we can add a new API. What do we do with existing
drivers? It is up to driver maintainers to use the new API. I am fine with
either approach, just asking for opinions.

> 
> > Obviously, We can not allocate max ring size in init time. 
> > If the NIC has support for 64K HW ring, We will be wasting too much
> > as
> > it is per queue.
>  
> Yes, I agree that it is an overkill.
> 
> net/sfc tries to carefully free/reserve on NIC/queues reconfigure.
> 
> Many thanks,
> Andrew.
  
Andrew Rybchenko April 2, 2019, 8:44 a.m. UTC | #8
On 4/2/19 11:25 AM, Jerin Jacob Kollanukkaran wrote:
> On Tue, 2019-04-02 at 10:36 +0300, Andrew Rybchenko wrote:
>> On 4/2/19 3:47 AM, Jerin Jacob Kollanukkaran wrote:
>>> On Mon, 2019-04-01 at 10:30 +0300, Andrew Rybchenko wrote:
>>>> On 3/31/19 7:25 PM, Pavan Nikhilesh Bhagavatula wrote:
>>>>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>>>>
>>>>> The `rte_eth_dma_zone_reserve()` is generally used to create HW
>>>>> rings.
>>>>> In some scenarios when a driver needs to reconfigure the ring
>>>>> size
>>>>> since the named memzone already exists it returns the previous
>>>>> memzone
>>>>> without checking if a different sized ring is requested.
>>>>>
>>>>> Introduce a check to see if the ring size requested is
>>>>> different
>>>>> from the
>>>>> previously created memzone length.
>>>>>
>>>>> Fixes: 719dbebceb81 ("xen: allow determining DOM0 at runtime")
>>>>> Cc: stable@dpdk.org
>>>>>
>>>>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>>>> ---
>>>>>   lib/librte_ethdev/rte_ethdev.c | 5 ++++-
>>>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/lib/librte_ethdev/rte_ethdev.c
>>>>> b/lib/librte_ethdev/rte_ethdev.c
>>>>> index 12b66b68c..4ae12e43b 100644
>>>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>>>> @@ -3604,9 +3604,12 @@ rte_eth_dma_zone_reserve(const struct
>>>>> rte_eth_dev *dev, const char *ring_name,
>>>>>   	}
>>>>>   
>>>>>   	mz = rte_memzone_lookup(z_name);
>>>>> -	if (mz)
>>>>> +	if (mz && (mz->len == size))
>>>>>   		return mz;
>>>>>   
>>>>> +	if (mz)
>>>>> +		rte_memzone_free(mz);
>>>>   
>>>> NACK
>>>> I really don't like that API which should reserve does free if
>>>> requested
>>>> size does not match previously allocated.
>>> Why? Is due to API name?
>>   
>> 1. The problem really exists. The problem is bad and it very good
>> that you
>>      caught it and came up with a patch. Many thanks.
>> 2. Silently free and reallocate memory is bad. Memory could be
>> used/mapped etc.
> If I understand it correctly, Its been used while configuring
> the device and it is per queue, If so, Is there any case where
> memory in use in parallel in real world case with DPDK?

"in real world case with DPDK" is a very fragile justification.
I simply don't want to go down this path, since it is very easy to make
a mistake or simply a false assumption.

>> 3. As an absolute minimum if we accept the behaviour it must be
>> documented
>>      in the function description.
>>
>>>   If so,
>>> Can we have rte_eth_dma_zone_reservere_with_resize() then ?
>>> or any another name, You would like to have?
>>   
>> 4. I'd prefer an error if different size (or bigger) memzone is
>> requested,
>>      but I understand that it can break existing drivers.
>>
>> Thomas, Ferruh, what do you think?
>>
>>>> I understand the motivation, but I don't think the solution is
>>>> correct.
>>> What you think it has correct solution then?
>>   
>> See above plus handling in drivers or dedicated function with
>> better name as you suggest above.
> Handling in driver means return error?

Yes.

> Regarding API, Yes, We can add new API. What we will do that exiting
> driver. Is up to driver maintainers to use the new API. I am fine with
> either approach, Just asking the opinion.

You have mine, but I'd like to know what other ethdev maintainers
think about it.

>>> Obviously, We can not allocate max ring size in init time.
>>> If the NIC has support for 64K HW ring, We will be wasting too much
>>> as
>>> it is per queue.
>>   
>> Yes, I agree that it is an overkill.
>>
>> net/sfc tries to carefully free/reserve on NIC/queues reconfigure.
>>
>> Many thanks,
>> Andrew.
  
Thomas Monjalon April 4, 2019, 10:23 p.m. UTC | #9
Hi,

02/04/2019 10:44, Andrew Rybchenko:
> On 4/2/19 11:25 AM, Jerin Jacob Kollanukkaran wrote:
> > On Tue, 2019-04-02 at 10:36 +0300, Andrew Rybchenko wrote:
> >> On 4/2/19 3:47 AM, Jerin Jacob Kollanukkaran wrote:
> >>> On Mon, 2019-04-01 at 10:30 +0300, Andrew Rybchenko wrote:
> >>>> On 3/31/19 7:25 PM, Pavan Nikhilesh Bhagavatula wrote:
> >>>>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >>>>>
> >>>>> The `rte_eth_dma_zone_reserve()` is generally used to create HW
> >>>>> rings.
> >>>>> In some scenarios when a driver needs to reconfigure the ring
> >>>>> size
> >>>>> since the named memzone already exists it returns the previous
> >>>>> memzone
> >>>>> without checking if a different sized ring is requested.
> >>>>>
> >>>>> Introduce a check to see if the ring size requested is
> >>>>> different from the previously created memzone length.
> >>>>>
> >>>>> Fixes: 719dbebceb81 ("xen: allow determining DOM0 at runtime")
> >>>>> Cc: stable@dpdk.org
> >>>>>
> >>>>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
[...]
> >>>>> @@ -3604,9 +3604,12 @@ rte_eth_dma_zone_reserve(const struct
> >>>>>   	mz = rte_memzone_lookup(z_name);
> >>>>> -	if (mz)
> >>>>> +	if (mz && (mz->len == size))
> >>>>>   		return mz;
> >>>>>   
> >>>>> +	if (mz)
> >>>>> +		rte_memzone_free(mz);
> >>>>   
> >>>> NACK
> >>>> I really don't like that API which should reserve does free if
> >>>> requested
> >>>> size does not match previously allocated.
> >>> Why? Is due to API name?
> >>   
> >> 1. The problem really exists. The problem is bad and it very good
> >> that you
> >>      caught it and came up with a patch. Many thanks.

I don't agree that the problem exists.
You are just trying to use a function for a goal which is
documented as not supported.

> >> 2. Silently free and reallocate memory is bad. Memory could be
> >> used/mapped etc.
> > If I understand it correctly, Its been used while configuring
> > the device and it is per queue, If so, Is there any case where
> > memory in use in parallel in real world case with DPDK?
> 
> "in real world case with DPDK" is very fragile justification.
> I simply don't want to dig in this way since it is very easy to make
> a mistake or simply false assumption.

I agree.
A function, with "reserve" in the name, should not do any "free".

> >> 3. As an absolute minimum if we accept the behaviour it must be
> >> documented
> >>      in the function description.
> >>
> >>>   If so,
> >>> Can we have rte_eth_dma_zone_reservere_with_resize() then ?
> >>> or any another name, You would like to have?
> >>   
> >> 4. I'd prefer an error if different size (or bigger) memzone is
> >> requested,
> >>      but I understand that it can break existing drivers.

Yes, some drivers may rely on the current behaviour.
But if you carefully check every driver, you can change
this behaviour and return an error.

> >> Thomas, Ferruh, what do you think?
> >>
> >>>> I understand the motivation, but I don't think the solution is
> >>>> correct.
> >>> What you think it has correct solution then?
> >>   
> >> See above plus handling in drivers or dedicated function with
> >> better name as you suggest above.
> > Handling in driver means return error?
> 
> Yes.
> 
> > Regarding API, Yes, We can add new API. What we will do that exiting
> > driver. Is up to driver maintainers to use the new API. I am fine with
> > either approach, Just asking the opinion.
> 
> You have mine, but I'd like to know what other ethdev maintainers
> think about it.

In such case, I refer to the existing documentation.
For rte_eth_dma_zone_reserve, it says:
"
  If the memzone is already created, then this function returns a ptr
  to the old one.
"

> >>> Obviously, We can not allocate max ring size in init time.
> >>> If the NIC has support for 64K HW ring, We will be wasting too much
> >>> as it is per queue.
> >>   
> >> Yes, I agree that it is an overkill.
> >>
> >> net/sfc tries to carefully free/reserve on NIC/queues reconfigure.

Yes, using rte_memzone_free looks saner.
Is there an API missing?
A function to check the size of the memzone? Is rte_memzone.len enough?
  
Andrew Rybchenko April 5, 2019, 8:03 a.m. UTC | #10
On 4/5/19 1:23 AM, Thomas Monjalon wrote:
> Hi,
>
> 02/04/2019 10:44, Andrew Rybchenko:
>> On 4/2/19 11:25 AM, Jerin Jacob Kollanukkaran wrote:
>>> On Tue, 2019-04-02 at 10:36 +0300, Andrew Rybchenko wrote:
>>>> On 4/2/19 3:47 AM, Jerin Jacob Kollanukkaran wrote:
>>>>> On Mon, 2019-04-01 at 10:30 +0300, Andrew Rybchenko wrote:
>>>>>> On 3/31/19 7:25 PM, Pavan Nikhilesh Bhagavatula wrote:
>>>>>>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>>>>>>
>>>>>>> The `rte_eth_dma_zone_reserve()` is generally used to create HW
>>>>>>> rings.
>>>>>>> In some scenarios when a driver needs to reconfigure the ring
>>>>>>> size
>>>>>>> since the named memzone already exists it returns the previous
>>>>>>> memzone
>>>>>>> without checking if a different sized ring is requested.
>>>>>>>
>>>>>>> Introduce a check to see if the ring size requested is
>>>>>>> different from the previously created memzone length.
>>>>>>>
>>>>>>> Fixes: 719dbebceb81 ("xen: allow determining DOM0 at runtime")
>>>>>>> Cc: stable@dpdk.org
>>>>>>>
>>>>>>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> [...]
>>>>>>> @@ -3604,9 +3604,12 @@ rte_eth_dma_zone_reserve(const struct
>>>>>>>    	mz = rte_memzone_lookup(z_name);
>>>>>>> -	if (mz)
>>>>>>> +	if (mz && (mz->len == size))
>>>>>>>    		return mz;
>>>>>>>    
>>>>>>> +	if (mz)
>>>>>>> +		rte_memzone_free(mz);
>>>>>>    
>>>>>> NACK
>>>>>> I really don't like that API which should reserve does free if
>>>>>> requested
>>>>>> size does not match previously allocated.
>>>>> Why? Is due to API name?
>>>>    
>>>> 1. The problem really exists. The problem is bad and it very good
>>>> that you
>>>>       caught it and came up with a patch. Many thanks.
> I don't agree that the problem exists.
> You are just trying to use a function for a goal which is
> documented as not supported.

The documentation says nothing about size, alignment and different socket.
It is good that the behaviour is documented, but I can't say that it is 
friendly.
Friendly behaviour would guarantee size, alignment and socket_id properties
preserved. Otherwise, it is too error-prone.

>>>> 2. Silently free and reallocate memory is bad. Memory could be
>>>> used/mapped etc.
>>> If I understand it correctly, Its been used while configuring
>>> the device and it is per queue, If so, Is there any case where
>>> memory in use in parallel in real world case with DPDK?
>> "in real world case with DPDK" is very fragile justification.
>> I simply don't want to dig in this way since it is very easy to make
>> a mistake or simply false assumption.
> I agree.
> A function, with "reserve" in the name, should not do any "free".
>
>>>> 3. As an absolute minimum if we accept the behaviour it must be
>>>> documented
>>>>       in the function description.
>>>>
>>>>>    If so,
>>>>> Can we have rte_eth_dma_zone_reservere_with_resize() then ?
>>>>> or any another name, You would like to have?
>>>>    
>>>> 4. I'd prefer an error if different size (or bigger) memzone is
>>>> requested,
>>>>       but I understand that it can break existing drivers.
> Yes some drivers may rely on the current behaviour.
> But if you carefully check every drivers, you can change
> this behaviour and return an error.
>
>>>> Thomas, Ferruh, what do you think?
>>>>
>>>>>> I understand the motivation, but I don't think the solution is
>>>>>> correct.
>>>>> What you think it has correct solution then?
>>>>    
>>>> See above plus handling in drivers or dedicated function with
>>>> better name as you suggest above.
>>> Handling in driver means return error?
>> Yes.
>>
>>> Regarding API, Yes, We can add new API. What we will do that exiting
>>> driver. Is up to driver maintainers to use the new API. I am fine with
>>> either approach, Just asking the opinion.
>> You have mine, but I'd like to know what other ethdev maintainers
>> think about it.
> In such case, I refer to the existing documentation.
> For rte_eth_dma_zone_reserve, it says:
> "
>    If the memzone is already created, then this function returns a ptr
>    to the old one.
> "

Now I'm more confident that an error should be returned if the memzone
already exists but its properties do not match those requested.

>>>>> Obviously, We can not allocate max ring size in init time.
>>>>> If the NIC has support for 64K HW ring, We will be wasting too much
>>>>> as it is per queue.
>>>>    
>>>> Yes, I agree that it is an overkill.
>>>>
>>>> net/sfc tries to carefully free/reserve on NIC/queues reconfigure.
> Yes, using rte_memzone_free looks saner.
> Is there an API missing?
> A function to check the size of the memzone? Is rte_memzone.len enough?
>
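
The "return an error on mismatch" option could be built around a property check like the one sketched below. This is only an illustration under stated assumptions: `struct zone_props`, `enum zone_check` and `zone_check_props()` are hypothetical names, the struct is a simplified stand-in rather than the DPDK memzone structure, `align` is assumed to be zero or a power of two, and a negative `socket_id` is assumed to mean "any socket".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of what an existing zone records; not the DPDK struct. */
struct zone_props {
	uintptr_t addr;		/* start address of the zone */
	size_t len;		/* length in bytes */
	int socket_id;		/* NUMA socket the zone lives on */
};

enum zone_check {
	ZONE_OK = 0,
	ZONE_BAD_SIZE,
	ZONE_BAD_ALIGN,
	ZONE_BAD_SOCKET
};

/*
 * Compare an existing zone against a new request. A caller would return
 * the old zone only on ZONE_OK and report an error otherwise, instead of
 * silently reusing or freeing it.
 */
static enum zone_check zone_check_props(const struct zone_props *z,
					size_t size, size_t align,
					int socket_id)
{
	if (z->len != size)
		return ZONE_BAD_SIZE;
	/* align is assumed to be 0 (don't care) or a power of two */
	if (align != 0 && (z->addr & (align - 1)) != 0)
		return ZONE_BAD_ALIGN;
	/* a negative socket_id is assumed to mean "any socket" */
	if (socket_id >= 0 && z->socket_id != socket_id)
		return ZONE_BAD_SOCKET;
	return ZONE_OK;
}
```

The size test here is an exact match; accepting an existing zone that is larger than requested would be a looser variant of the same check.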
  

Patch

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 12b66b68c..4ae12e43b 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -3604,9 +3604,12 @@  rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
 	}
 
 	mz = rte_memzone_lookup(z_name);
-	if (mz)
+	if (mz && (mz->len == size))
 		return mz;
 
+	if (mz)
+		rte_memzone_free(mz);
+
 	return rte_memzone_reserve_aligned(z_name, size, socket_id,
 			RTE_MEMZONE_IOVA_CONTIG, align);
 }