[v11,2/4] eal: add legacy kni option

Message ID 20191021080324.10659-3-vattunuru@marvell.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series kni: add IOVA=VA mode support |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Vamsi Krishna Attunuru Oct. 21, 2019, 8:03 a.m. UTC
  From: Vamsi Attunuru <vattunuru@marvell.com>

This adds a "--legacy-kni" command-line option. It will
be used to run existing KNI applications with DPDK 19.11
and later.

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 doc/guides/rel_notes/release_19_11.rst     | 4 ++++
 lib/librte_eal/common/eal_common_options.c | 5 +++++
 lib/librte_eal/common/eal_internal_cfg.h   | 2 ++
 lib/librte_eal/common/eal_options.h        | 2 ++
 4 files changed, 13 insertions(+)
  

Comments

Ferruh Yigit Oct. 21, 2019, 11:55 a.m. UTC | #1
On 10/21/2019 9:03 AM, vattunuru@marvell.com wrote:
> From: Vamsi Attunuru <vattunuru@marvell.com>
> 
> This adds a "--legacy-kni" command-line option. It will
> be used to run existing KNI applications with DPDK 19.11
> and later.
> 
> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>

<...>

> diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
> index 9855429..1010ed3 100644
> --- a/lib/librte_eal/common/eal_options.h
> +++ b/lib/librte_eal/common/eal_options.h
> @@ -69,6 +69,8 @@ enum {
>  	OPT_IOVA_MODE_NUM,
>  #define OPT_MATCH_ALLOCATIONS  "match-allocations"
>  	OPT_MATCH_ALLOCATIONS_NUM,
> +#define OPT_LEGACY_KNI      "legacy-kni"
> +	OPT_LEGACY_KNI_NUM,
>  	OPT_LONG_MAX_NUM
>  };

Two concerns,

1- "legacy-kni" doesn't have enough context

2- I prefer to keep the existing behavior as the default, at least for a while
(say until the next LTS); meanwhile this patch can be around for a good amount
of time, and switching later will be easier.

Based on the above, what do you think about renaming the option to
'kni-iova-va'? When not set, the default will be "IOVA=PA"; when set, it will
enable "IOVA=VA" mode.
  
Vamsi Krishna Attunuru Oct. 21, 2019, 1:13 p.m. UTC | #2
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Monday, October 21, 2019 5:25 PM
> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Kiran Kumar Kokkilagadda <kirankumark@marvell.com>;
> olivier.matz@6wind.com; anatoly.burakov@intel.com;
> arybchenko@solarflare.com; stephen@networkplumber.org
> Subject: [EXT] Re: [dpdk-dev] [PATCH v11 2/4] eal: add legacy kni option
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 10/21/2019 9:03 AM, vattunuru@marvell.com wrote:
> > From: Vamsi Attunuru <vattunuru@marvell.com>
> >
> > This adds a "--legacy-kni" command-line option. It will be used to run
> > existing KNI applications with DPDK 19.11 and later.
> >
> > Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> > Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> <...>
> 
> > diff --git a/lib/librte_eal/common/eal_options.h
> > b/lib/librte_eal/common/eal_options.h
> > index 9855429..1010ed3 100644
> > --- a/lib/librte_eal/common/eal_options.h
> > +++ b/lib/librte_eal/common/eal_options.h
> > @@ -69,6 +69,8 @@ enum {
> >  	OPT_IOVA_MODE_NUM,
> >  #define OPT_MATCH_ALLOCATIONS  "match-allocations"
> >  	OPT_MATCH_ALLOCATIONS_NUM,
> > +#define OPT_LEGACY_KNI      "legacy-kni"
> > +	OPT_LEGACY_KNI_NUM,
> >  	OPT_LONG_MAX_NUM
> >  };
> 
> Two concerns,
> 
> 1- "legacy-kni" doesn't have enough context
> 
> 2- I prefer to keep existing behavior default, at least for a while, something
> like next LTS etc, meanwhile this patch can be around for a good time and
> can be good to switch.
> 
> Based on above to, what do you think to rename the option to 'kni-iova-va',
> if not set by default it will be "IOVA=PA", when set it will enable "IOVA=VA"
> mode?

Hi Ferruh,

I think the new EAL flag (legacy-kni) is quite intuitive, and the release notes will carry the required details about its purpose and how it lets users run existing applications on the latest DPDK.

The current EAL already selects IOVA=VA when the bus IOMMU check returns DC (don't care), meaning IOVA=VA is effectively the default mode in most use cases; yet to run KNI in IOVA=VA mode we would have to set the flag explicitly every time. I think having a flag for the legacy usage (PA mode) is more appropriate than a kni-iova-va kind of flag.

The otx2 net PMD, which runs on OCTEON TX2 platforms, supports only IOVA=VA mode; we would like KNI to run by default there without any flags passed.
  
Ferruh Yigit Oct. 21, 2019, 1:32 p.m. UTC | #3
On 10/21/2019 2:13 PM, Vamsi Krishna Attunuru wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Monday, October 21, 2019 5:25 PM
>> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org
>> Cc: thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
>> Kiran Kumar Kokkilagadda <kirankumark@marvell.com>;
>> olivier.matz@6wind.com; anatoly.burakov@intel.com;
>> arybchenko@solarflare.com; stephen@networkplumber.org
>> Subject: [EXT] Re: [dpdk-dev] [PATCH v11 2/4] eal: add legacy kni option
>>
>> External Email
>>
>> ----------------------------------------------------------------------
>> On 10/21/2019 9:03 AM, vattunuru@marvell.com wrote:
>>> From: Vamsi Attunuru <vattunuru@marvell.com>
>>>
>>> This adds a "--legacy-kni" command-line option. It will be used to run
>>> existing KNI applications with DPDK 19.11 and later.
>>>
>>> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
>>> Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>
>> <...>
>>
>>> diff --git a/lib/librte_eal/common/eal_options.h
>>> b/lib/librte_eal/common/eal_options.h
>>> index 9855429..1010ed3 100644
>>> --- a/lib/librte_eal/common/eal_options.h
>>> +++ b/lib/librte_eal/common/eal_options.h
>>> @@ -69,6 +69,8 @@ enum {
>>>  	OPT_IOVA_MODE_NUM,
>>>  #define OPT_MATCH_ALLOCATIONS  "match-allocations"
>>>  	OPT_MATCH_ALLOCATIONS_NUM,
>>> +#define OPT_LEGACY_KNI      "legacy-kni"
>>> +	OPT_LEGACY_KNI_NUM,
>>>  	OPT_LONG_MAX_NUM
>>>  };
>>
>> Two concerns,
>>
>> 1- "legacy-kni" doesn't have enough context
>>
>> 2- I prefer to keep existing behavior default, at least for a while, something
>> like next LTS etc, meanwhile this patch can be around for a good time and
>> can be good to switch.
>>
>> Based on above to, what do you think to rename the option to 'kni-iova-va',
>> if not set by default it will be "IOVA=PA", when set it will enable "IOVA=VA"
>> mode?
> 
> Hi Ferruh,
> 
> I think the new eal flag(legacy-kni) is quite intuitive. Since release notes will be having the required details about it's purpose and how it enables users to use existing applications on latest dpdk.

What exactly does 'legacy' mean here? What has changed, and has the old
implementation been completely replaced/re-written? Whoever is not following
the development won't understand what is "old" in KNI. Digging through release
notes and commits will give this information, but it is hard to get from the
binary, and it only gets harder after a few releases.

> 
> Current EAL does set iova as va if bus iommu returns DC, meaning iova=va is the kind of default mode(in most of use cases) and for running kni, we have to explicitly set the flag to run kni in iova=va mode all the time. I think having a flag for legacy usage(PA mode) is more appropriate than having kni-iova-va kind of flag.

It is about keeping the existing behavior the same: right now, if the kni
module is inserted it forces PA mode. With your update it becomes possible to
run IOVA=VA with the kni module inserted when the flag is set. I suggest giving
this new behavior some time before making it the default.

> 
> Otx2 net pmd that runs on Octeontx2 platforms only supports iova=va mode, we would like to have KNI running by default without any flags passed.   
> 

I see, but the other way around would affect all existing KNI users: they
would either need to add this flag or update their application.

This is a new feature; having those who want to use it add a specific flag
makes more sense to me than making all existing users add one.
  
Vamsi Krishna Attunuru Oct. 21, 2019, 2:38 p.m. UTC | #4
Hi Olivier, Andrew,

> >>> +#define OPT_LEGACY_KNI      "legacy-kni"
> >>> +	OPT_LEGACY_KNI_NUM,
> >>>  	OPT_LONG_MAX_NUM
> >>>  };
> >>
> >> Two concerns,
> >>
> >> 1- "legacy-kni" doesn't have enough context
> >>
> >> 2- I prefer to keep existing behavior default, at least for a while,
> >> something like next LTS etc, meanwhile this patch can be around for a
> >> good time and can be good to switch.
> >>
> >> Based on above to, what do you think to rename the option to
> >> 'kni-iova-va', if not set by default it will be "IOVA=PA", when set it will
> enable "IOVA=VA"
> >> mode?
> >
> > Hi Ferruh,
> >
> > I think the new eal flag(legacy-kni) is quite intuitive. Since release notes will
> be having the required details about it's purpose and how it enables users to
> use existing applications on latest dpdk.
> 
> what exactly 'legacy' means, what has been changed, is the old one
> completely replaced/re-written ????, but whoever not following what is
> happening won't understand what is old in the KNI, digging through
> release notes and commits will give this information but it will be hard to get
> it from binary and get harder by a few releases passed.
> 
> >
> > Current EAL does set iova as va if bus iommu returns DC, meaning iova=va
> is the kind of default mode(in most of use cases) and for running kni, we
> have to explicitly set the flag to run kni in iova=va mode all the time. I think
> having a flag for legacy usage(PA mode) is more appropriate than having kni-
> iova-va kind of flag.
> 
> It is about keeping the existing behavior same, right now if the kni module is
> inserted it will force the PA mode. With your update it will be possible to run
> iova=va with kni module inserted when flag is set. I suggest giving some time
> to this new behavior before making it default.
> 
> >
> > Otx2 net pmd that runs on Octeontx2 platforms only supports iova=va
> mode, we would like to have KNI running by default without any flags
> passed.
> >
> 
> I see, but other way around will affect all existing KNI users, they will either
> need to add this flag or update their application.
> 
> This is new feature, who want to use it adding a specific flag makes more
> sense to me than all old users have to add the flag.


Ferruh suggested having a flag to enable this new feature, and he is also not in favor of adding new mempool alloc APIs for KNI (see the v10 review comments). Before reworking the flag changes, I would like to check with you whether the same flag can be used in the mempool lib to check and fulfill the mempool page-boundary requirement (mempool patch v11 1/4); doing so would avoid new exported APIs in both the mempool and KNI libs. In any case, this mempool requirement can be addressed by Olivier's patches below.

http://patchwork.dpdk.org/project/dpdk/list/?series=5624

Once those patches are merged, the flag check can be removed.

Regards
A Vamsi 






  
Vamsi Krishna Attunuru Oct. 22, 2019, 9:29 a.m. UTC | #5
Hi Olivier, Andrew,

Please share your thoughts/comments on the email below.

> -----Original Message-----
> From: Vamsi Krishna Attunuru
> Sent: Monday, October 21, 2019 8:08 PM
> To: olivier.matz@6wind.com; Andrew Rybchenko
> <arybchenko@solarflare.com>
> Cc: thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Kiran Kumar Kokkilagadda <kirankumark@marvell.com>;
> olivier.matz@6wind.com; anatoly.burakov@intel.com;
> arybchenko@solarflare.com; stephen@networkplumber.org; Ferruh Yigit
> <ferruh.yigit@intel.com>; dev@dpdk.org
> Subject: RE: [EXT] Re: [dpdk-dev] [PATCH v11 2/4] eal: add legacy kni option
> 
> Hi Olivier, Andrew,
> 
> 
> 
> Ferruh suggested to have a flag for enabling these new feature and also not
> interested in having  newer mempool alloc APIs for KNI(see V10 review
> comments). Before reworking on the flag changes, I would like check with you
> whether the same flag can be used in mempool lib for checking and fulfilling the
> mempool  page boundary requirement (mempool patch v11 1/4), by doing so, it
> can avoid newer exported APIs both in mempool and KNI lib. Anyways, these
> mempool requirement can be addressed with Olivier's below patches.
> 
> http://patchwork.dpdk.org/project/dpdk/list/?series=5624
> 
> When those patches are merged,  flag check can be removed.
> 
> Regards
> A Vamsi
> 
> 
> 
> 
> 
> 
  
Andrew Rybchenko Oct. 22, 2019, 12:28 p.m. UTC | #6
Hi Vamsi,

On 10/21/19 5:38 PM, Vamsi Krishna Attunuru wrote:
> Hi Olivier, Andrew,
>
>>>>> +#define OPT_LEGACY_KNI      "legacy-kni"
>>>>> +	OPT_LEGACY_KNI_NUM,
>>>>>   	OPT_LONG_MAX_NUM
>>>>>   };
>>>> Two concerns,
>>>>
>>>> 1- "legacy-kni" doesn't have enough context
>>>>
>>>> 2- I prefer to keep existing behavior default, at least for a while,
>>>> something like next LTS etc, meanwhile this patch can be around for a
>>>> good time and can be good to switch.
>>>>
>>>> Based on above to, what do you think to rename the option to
>>>> 'kni-iova-va', if not set by default it will be "IOVA=PA", when set it will
>> enable "IOVA=VA"
>>>> mode?
>>> Hi Ferruh,
>>>
>>> I think the new eal flag(legacy-kni) is quite intuitive. Since release notes will
>> be having the required details about it's purpose and how it enables users to
>> use existing applications on latest dpdk.
>>
>> what exactly 'legacy' means, what has been changed, is the old one
>> completely replaced/re-written ????, but whoever not following what is
>> happening won't understand what is old in the KNI, digging through
>> release notes and commits will give this information but it will be hard to get
>> it from binary and get harder by a few releases passed.
>>
>>> Current EAL does set iova as va if bus iommu returns DC, meaning iova=va
>> is the kind of default mode(in most of use cases) and for running kni, we
>> have to explicitly set the flag to run kni in iova=va mode all the time. I think
>> having a flag for legacy usage(PA mode) is more appropriate than having kni-
>> iova-va kind of flag.
>>
>> It is about keeping the existing behavior same, right now if the kni module is
>> inserted it will force the PA mode. With your update it will be possible to run
>> iova=va with kni module inserted when flag is set. I suggest giving some time
>> to this new behavior before making it default.
>>
>>> Otx2 net pmd that runs on Octeontx2 platforms only supports iova=va
>> mode, we would like to have KNI running by default without any flags
>> passed.
>> I see, but other way around will affect all existing KNI users, they will either
>> need to add this flag or update their application.
>>
>> This is new feature, who want to use it adding a specific flag makes more
>> sense to me than all old users have to add the flag.
>
> Ferruh suggested to have a flag for enabling these new feature and also not interested in having  newer mempool alloc APIs for KNI(see V10 review comments). Before reworking on the flag changes, I would like check with you whether the same flag can be used in mempool lib for checking and fulfilling the mempool  page boundary requirement (mempool patch v11 1/4), by doing so, it can avoid newer exported APIs both in mempool and KNI lib. Anyways, these mempool requirement can be addressed with Olivier's below patches.
>
> http://patchwork.dpdk.org/project/dpdk/list/?series=5624
>
> When those patches are merged,  flag check can be removed.

It is really hard to follow, sorry.
Is it really a problem to have a KNI switch that enables the new
behaviour, and to document that a dedicated function should be
used to allocate mbufs?

Andrew.

> Regards
> A Vamsi
>
>
>
>
>
>
  
Vamsi Krishna Attunuru Oct. 22, 2019, 1:31 p.m. UTC | #7
Hi Ferruh,

Can you please explain the problems with using KNI-dedicated mbuf alloc routines when enabling KNI IOVA=VA mode? Please see the discussion with Andrew below; he wanted to know what problems new APIs would cause.

I would like us to be clear about the need for the mempool patch (v11 1/4) and where the fix should go (new APIs, or a fix in the mempool lib) before I start reworking the patches.

Regards
A Vamsi

> snipped
> >>
> >> This is new feature, who want to use it adding a specific flag makes
> >> more sense to me than all old users have to add the flag.
> >
> > Ferruh suggested to have a flag for enabling these new feature and also not
> interested in having  newer mempool alloc APIs for KNI(see V10 review
> comments). Before reworking on the flag changes, I would like check with you
> whether the same flag can be used in mempool lib for checking and fulfilling the
> mempool  page boundary requirement (mempool patch v11 1/4), by doing so, it
> can avoid newer exported APIs both in mempool and KNI lib. Anyways, these
> mempool requirement can be addressed with Olivier's below patches.
> >
> > http://patchwork.dpdk.org/project/dpdk/list/?series=5624
> >
> > When those patches are merged,  flag check can be removed.
> 
> It is really hard to follow, sorry.
> Is it really a problem to have KNI switcher to enable new behaviour and
> document it that dedicated function should be used to allocate mbufs?
> 
> Andrew.
> 
> > Regards
> > A Vamsi
  
Jerin Jacob Oct. 23, 2019, 10:12 a.m. UTC | #8
On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru
<vattunuru@marvell.com> wrote:
>
> Hi Ferruh,
>
> Can you please explain the problems in using kni dedicated mbuf alloc routines while enabling kni iova=va mode. Please see the below discussion with Andrew. He wanted to know the problems in having newer APIs.


While waiting for Ferruh's reply, I would like to summarise the
current status.

# In order to make KNI work with IOVA as VA, we need to make sure a
mempool _object_ does not span across two huge pages

# This problem can be fixed by, either of:

a) Introduce a flag in mempool to define this constraint, so that
the constraint is enforced only when needed; this is in line
with the existing semantics of addressing such problems in mempool

b) Instead of creating a flag, make this behavior the default in
mempool for the IOVA as VA case

Upside:
b1) There is no need for specific mempool_create for KNI.

Downside:
b2) Not align with existing mempool API semantics
b3) There will be a trivial amount of memory waste, as we cannot
allocate from the edge. Considering that the normal huge
page size is 1G or 512MB, this is not a real issue.

c) Make IOVA as PA when KNI kernel module is loaded

Upside:
c1) Option (a) would call for a new KNI-specific mempool create
API, i.e. existing KNI applications would need a one-line change
to work with release 19.11 or later; option (c) avoids this.

Downside:
c2) Drivers which need RTE_PCI_DRV_NEED_IOVA_AS_VA cannot work with KNI
c3) Need root privilege to run KNI, as IOVA as PA needs root privilege

For the next year, we expect applications to work with 19.11 without
any code change. My personal opinion is to go with option (a)
and update the release notes to document the change, as it is a simple
one-line change.

The selection of (a) vs (b) is between KNI and Mempool maintainers.
Could we please reach a consensus? Or can we discuss this TB meeting?

We have been going back and forth on this feature for the last 3
releases. Now that we have solved all the technical problems, please
help us to decide (a) vs (b) and make forward progress.







>
> I wanted to clarify ourselves about the need of mem pool patch(V11 1/4) and where to fit the fix(newer APIs or fix in mempool lib) before I start reworking the patches.
>
> Regards
> A Vamsi
>
> > snipped
> > >>
> > >> This is new feature, who want to use it adding a specific flag makes
> > >> more sense to me than all old users have to add the flag.
> > >
> > > Ferruh suggested to have a flag for enabling these new feature and also not
> > interested in having  newer mempool alloc APIs for KNI(see V10 review
> > comments). Before reworking on the flag changes, I would like check with you
> > whether the same flag can be used in mempool lib for checking and fulfilling the
> > mempool  page boundary requirement (mempool patch v11 1/4), by doing so, it
> > can avoid newer exported APIs both in mempool and KNI lib. Anyways, these
> > mempool requirement can be addressed with Olivier's below patches.
> > >
> > > http://patchwork.dpdk.org/project/dpdk/list/?series=5624
> > >
> > > When those patches are merged,  flag check can be removed.
> >
> > It is really hard to follow, sorry.
> > Is it really a problem to have KNI switcher to enable new behaviour and
> > document it that dedicated function should be used to allocate mbufs?
> >
> > Andrew.
> >
> > > Regards
> > > A Vamsi
  
Olivier Matz Oct. 23, 2019, 2:47 p.m. UTC | #9
Hi,

On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote:
> On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru
> <vattunuru@marvell.com> wrote:
> >
> > Hi Ferruh,
> >
> > Can you please explain the problems in using kni dedicated mbuf alloc routines while enabling kni iova=va mode. Please see the below discussion with Andrew. He wanted to know the problems in having newer APIs.
> 
> 
> While waiting  for the Ferruh reply, I would like to summarise the
> current status
> 
> # In order to make KNI work with IOVA as VA, We need to make sure
> mempool pool _object_ should not span across two huge pages
> 
> # This problem can be fixed by, either of:
> 
> a) Introduce a flag in mempool to define this constraint, so that,
> when only needed, this constraint enforced and this is in line
> with existing semantics of addressing such problems in mempool
> 
> b) Instead of creating a flag, Make this behavior by default in
> mempool for IOVA as VA case
> 
> Upside:
> b1) There is no need for specific mempool_create for KNI.
> 
> Downside:
> b2) Not align with existing mempool API semantics
> b3) There will be a trivial amount of memory waste as we can not
> allocate from the edge. Considering the normal huge
> page memory size is 1G or 512MB this not a real issue.
> 
> c) Make IOVA as PA when KNI kernel module is loaded
> 
> Upside:
> c1) Doing option (a) would call for new KNI specific mempool create
> API i.e existing KNI applications need a one-line change in
> application to make it work with release 19.11 or later.
> 
> Downslide:
> c2) Driver which needs RTE_PCI_DRV_NEED_IOVA_AS_VA can not work with KNI
> c3) Need root privilege to run KNI as IOVA as PA need root privilege
> 
> For the next year, we expect applications to work 19.11 without any
> code change. My personal opinion to make go with option (a)
> and update the release notes to document the change any it simple
> one-line change.
> 
> The selection of (a) vs (b) is between KNI and Mempool maintainers.
> Could we please reach a consensus? Or can we discuss this TB meeting?
> 
> We are going back and forth on this feature on for the last 3
> releases. Now that, we solved all the technical problems, please help
> us
> to decide (a) vs (b) to make forward progress.

Thank you for the summary.
What is not clear to me is if (a) or (b) may break an existing
application, and if yes, in which case.
  
Jerin Jacob Oct. 23, 2019, 3:02 p.m. UTC | #10
On Wed, Oct 23, 2019 at 8:17 PM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> Hi,
>
> On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote:
> > On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru
> > <vattunuru@marvell.com> wrote:
> > >
> > > Hi Ferruh,
> > >
> > > Can you please explain the problems in using kni dedicated mbuf alloc routines while enabling kni iova=va mode. Please see the below discussion with Andrew. He wanted to know the problems in having newer APIs.
> >
> >
> > While waiting  for the Ferruh reply, I would like to summarise the
> > current status
> >
> > # In order to make KNI work with IOVA as VA, We need to make sure
> > mempool pool _object_ should not span across two huge pages
> >
> > # This problem can be fixed by, either of:
> >
> > a) Introduce a flag in mempool to define this constraint, so that,
> > when only needed, this constraint enforced and this is in line
> > with existing semantics of addressing such problems in mempool
> >
> > b) Instead of creating a flag, Make this behavior by default in
> > mempool for IOVA as VA case
> >
> > Upside:
> > b1) There is no need for specific mempool_create for KNI.
> >
> > Downside:
> > b2) Not align with existing mempool API semantics
> > b3) There will be a trivial amount of memory waste as we can not
> > allocate from the edge. Considering the normal huge
> > page memory size is 1G or 512MB this not a real issue.
> >
> > c) Make IOVA as PA when KNI kernel module is loaded
> >
> > Upside:
> > c1) Doing option (a) would call for new KNI specific mempool create
> > API i.e existing KNI applications need a one-line change in
> > application to make it work with release 19.11 or later.
> >
> > Downslide:
> > c2) Driver which needs RTE_PCI_DRV_NEED_IOVA_AS_VA can not work with KNI
> > c3) Need root privilege to run KNI as IOVA as PA need root privilege
> >
> > For the next year, we expect applications to work 19.11 without any
> > code change. My personal opinion to make go with option (a)
> > and update the release notes to document the change any it simple
> > one-line change.
> >
> > The selection of (a) vs (b) is between KNI and Mempool maintainers.
> > Could we please reach a consensus? Or can we discuss this TB meeting?
> >
> > We are going back and forth on this feature on for the last 3
> > releases. Now that, we solved all the technical problems, please help
> > us
> > to decide (a) vs (b) to make forward progress.
>
> Thank you for the summary.
> What is not clear to me is if (a) or (b) may break an existing
> application, and if yes, in which case.

Thanks for the reply.

To be clear, we are talking about out-of-tree KNI applications, which
don't want to change rte_pktmbuf_pool_create() to
rte_kni_pktmbuf_pool_create() and rebuild for v19.11.

So in case (b) there is no issue, as it will be using rte_pktmbuf_pool_create().
But in case (a) it will create an issue if an out-of-tree KNI
application is using rte_pktmbuf_pool_create(), which does not use the
NEW flag.
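
For reference, the "one-line change" under discussion would look roughly like this in an application, assuming the rte_kni_pktmbuf_pool_create() helper proposed in the earlier revisions of this series (the pool name and size arguments are illustrative):

```diff
-	pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", NB_MBUF,
-		MEMPOOL_CACHE_SZ, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
+	pktmbuf_pool = rte_kni_pktmbuf_pool_create("mbuf_pool", NB_MBUF,
+		MEMPOOL_CACHE_SZ, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
```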
  
Olivier Matz Oct. 24, 2019, 5:35 p.m. UTC | #11
Hi,

On Wed, Oct 23, 2019 at 08:32:08PM +0530, Jerin Jacob wrote:
> On Wed, Oct 23, 2019 at 8:17 PM Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > Hi,
> >
> > On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote:
> > > On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru
> > > <vattunuru@marvell.com> wrote:
> > > >
> > > > Hi Ferruh,
> > > >
> > > > Can you please explain the problems in using kni dedicated mbuf alloc routines while enabling kni iova=va mode. Please see the below discussion with Andrew. He wanted to know the problems in having newer APIs.
> > >
> > >
> > > While waiting  for the Ferruh reply, I would like to summarise the
> > > current status
> > >
> > > # In order to make KNI work with IOVA as VA, We need to make sure
> > > mempool pool _object_ should not span across two huge pages
> > >
> > > # This problem can be fixed by, either of:
> > >
> > > a) Introduce a flag in mempool to define this constraint, so that,
> > > when only needed, this constraint enforced and this is in line
> > > with existing semantics of addressing such problems in mempool
> > >
> > > b) Instead of creating a flag, Make this behavior by default in
> > > mempool for IOVA as VA case
> > >
> > > Upside:
> > > b1) There is no need for specific mempool_create for KNI.
> > >
> > > Downside:
> > > b2) Not align with existing mempool API semantics
> > > b3) There will be a trivial amount of memory waste as we can not
> > > allocate from the edge. Considering the normal huge
> > > page memory size is 1G or 512MB this not a real issue.
> > >
> > > c) Make IOVA as PA when KNI kernel module is loaded
> > >
> > > Upside:
> > > c1) Doing option (a) would call for new KNI specific mempool create
> > > API i.e existing KNI applications need a one-line change in
> > > application to make it work with release 19.11 or later.
> > >
> > > Downslide:
> > > c2) Driver which needs RTE_PCI_DRV_NEED_IOVA_AS_VA can not work with KNI
> > > c3) Need root privilege to run KNI as IOVA as PA need root privilege
> > >
> > > For the next year, we expect applications to work 19.11 without any
> > > code change. My personal opinion to make go with option (a)
> > > and update the release notes to document the change any it simple
> > > one-line change.
> > >
> > > The selection of (a) vs (b) is between KNI and Mempool maintainers.
> > > Could we please reach a consensus? Or can we discuss this TB meeting?
> > >
> > > We are going back and forth on this feature on for the last 3
> > > releases. Now that, we solved all the technical problems, please help
> > > us
> > > to decide (a) vs (b) to make forward progress.
> >
> > Thank you for the summary.
> > What is not clear to me is if (a) or (b) may break an existing
> > application, and if yes, in which case.
> 
> Thanks for the reply.
> 
> To be clear we are talking about out of tree KNI tree application.
> Which they don't want to
> change rte_pktmbuf_pool_create() to rte_kni_pktmbuf_pool_create() and
> build for v19.11
> 
> So in case (b) there is no issue as It will be using rte_pktmbuf_pool_create ().
> But in case of (a) it will create an issue if out of tree KNI
> application is using rte_pktmbuf_pool_create() which is not using the
> NEW flag.

Following yesterday's discussion at techboard, I looked at the mempool
code and at my previous RFC patch. It took some time to remind me what
my worries were.

Currently, in rte_mempool_populate_default(), when the mempool is
populated, we first try to allocate one iova-contiguous block of (n *
elt_size). On success, we use this memory to fully populate the mempool
without taking care of crossing page boundaries.

If we change the behavior to prevent objects from crossing pages, the
assumption that allocating (n * elt_size) is always enough becomes
wrong.  By luck, there is no real impact, because if the mempool is not
fully populated after this first iteration, it will allocate a new
chunk.

To be rigorous, we need to better calculate the amount of memory to
allocate, according to page size.

Looking at the code, I found another problem in the same area: let's say
we populate a mempool that requires 1.1GB (and we use 1G huge pages):

1/ mempool code first tries to allocate an iova-contiguous zone
   of 1.1G -> fail
2/ it then tries to allocate a page-aligned non iova-contiguous
   zone of 1.1G, which rounds up to 2G. On success, a lot of memory is wasted.
3/ on error, we try to allocate the biggest zone, it can still return
   a zone between 1.1G and 2G, which can also waste memory.

I will rework my mempool patchset to properly address these issues,
hopefully tomorrow.


Also, I thought about another idea to solve your issue, not sure it is
better but it would not imply to change the mempool behavior. If I
understood the problem, when a mbuf is across 2 pages, the copy of the
data can fail in kni because the mbuf is not virtually contiguous in the
kernel. So why not in this case splitting the memcpy() into several,
each of them being on a single page (and calling phys2virt() for each
page)? The same would have to be done when accessing the fields of the
mbuf structure if it crosses a page boundary. Would that work? This
could be a B plan.

Olivier
  
Jerin Jacob Oct. 24, 2019, 7:30 p.m. UTC | #12
On Thu, Oct 24, 2019 at 11:05 PM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> Hi,
>
> On Wed, Oct 23, 2019 at 08:32:08PM +0530, Jerin Jacob wrote:
> > On Wed, Oct 23, 2019 at 8:17 PM Olivier Matz <olivier.matz@6wind.com> wrote:
> > >
> > > Hi,
> > >
> > > On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote:
> > > > On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru
> > > > <vattunuru@marvell.com> wrote:
> > > > >
> > > > > Hi Ferruh,
> > > > >
> > > > > Can you please explain the problems in using kni dedicated mbuf alloc routines while enabling kni iova=va mode. Please see the below discussion with Andrew. He wanted to know the problems in having newer APIs.
> > > >
> > > >
> > > > While waiting  for the Ferruh reply, I would like to summarise the
> > > > current status
> > > >
> > > > # In order to make KNI work with IOVA as VA, We need to make sure
> > > > mempool pool _object_ should not span across two huge pages
> > > >
> > > > # This problem can be fixed by, either of:
> > > >
> > > > a) Introduce a flag in mempool to define this constraint, so that,
> > > > when only needed, this constraint enforced and this is in line
> > > > with existing semantics of addressing such problems in mempool
> > > >
> > > > b) Instead of creating a flag, Make this behavior by default in
> > > > mempool for IOVA as VA case
> > > >
> > > > Upside:
> > > > b1) There is no need for specific mempool_create for KNI.
> > > >
> > > > Downside:
> > > > b2) Not align with existing mempool API semantics
> > > > b3) There will be a trivial amount of memory waste as we can not
> > > > allocate from the edge. Considering the normal huge
> > > > page memory size is 1G or 512MB this not a real issue.
> > > >
> > > > c) Make IOVA as PA when KNI kernel module is loaded
> > > >
> > > > Upside:
> > > > c1) Doing option (a) would call for new KNI specific mempool create
> > > > API i.e existing KNI applications need a one-line change in
> > > > application to make it work with release 19.11 or later.
> > > >
> > > > Downslide:
> > > > c2) Driver which needs RTE_PCI_DRV_NEED_IOVA_AS_VA can not work with KNI
> > > > c3) Need root privilege to run KNI as IOVA as PA need root privilege
> > > >
> > > > For the next year, we expect applications to work 19.11 without any
> > > > code change. My personal opinion to make go with option (a)
> > > > and update the release notes to document the change any it simple
> > > > one-line change.
> > > >
> > > > The selection of (a) vs (b) is between KNI and Mempool maintainers.
> > > > Could we please reach a consensus? Or can we discuss this TB meeting?
> > > >
> > > > We are going back and forth on this feature on for the last 3
> > > > releases. Now that, we solved all the technical problems, please help
> > > > us
> > > > to decide (a) vs (b) to make forward progress.
> > >
> > > Thank you for the summary.
> > > What is not clear to me is if (a) or (b) may break an existing
> > > application, and if yes, in which case.
> >
> > Thanks for the reply.
> >
> > To be clear we are talking about out of tree KNI tree application.
> > Which they don't want to
> > change rte_pktmbuf_pool_create() to rte_kni_pktmbuf_pool_create() and
> > build for v19.11
> >
> > So in case (b) there is no issue as It will be using rte_pktmbuf_pool_create ().
> > But in case of (a) it will create an issue if out of tree KNI
> > application is using rte_pktmbuf_pool_create() which is not using the
> > NEW flag.
>
> Following yesterday's discussion at techboard, I looked at the mempool
> code and at my previous RFC patch. It took some time to remind me what
> was my worries.

Thanks for the review Olivier.

Just to make sure the correct one is reviewed.

1) v7 had similar issue mentioned
http://patches.dpdk.org/patch/56585/

2) v11 addressed the review comments and you have given the Acked-by
for mempool change
http://patches.dpdk.org/patch/61559/

My thought process in the TB meeting was: since
rte_mempool_populate_from_pg_sz_chunks() is already reviewed, replace
rte_pktmbuf_pool_create()'s rte_mempool_populate_default() with
rte_mempool_populate_from_pg_sz_chunks() in the IOVA == VA case, to
avoid a new KNI mempool_create API.

>
> Currently, in rte_mempool_populate_default(), when the mempool is
> populated, we first try to allocate one iova-contiguous block of (n *
> elt_size). On success, we use this memory to fully populate the mempool
> without taking care of crossing page boundaries.
>
> If we change the behavior to prevent objects from crossing pages, the
> assumption that allocating (n * elt_size) is always enough becomes
> wrong.  By luck, there is no real impact, because if the mempool is not
> fully populated after this first iteration, it will allocate a new
> chunk.
>
> To be rigorous, we need to better calculate the amount of memory to
> allocate, according to page size.
>
> Looking at the code, I found another problem in the same area: let's say
> we populate a mempool that requires 1.1GB (and we use 1G huge pages):
>
> 1/ mempool code will first tries to allocate an iova-contiguous zone
>    of 1.1G -> fail
> 2/ it then tries to allocate a page-aligned non iova-contiguous
>    zone of 1.1G, which is 2G. On success, a lot of memory is wasted.
> 3/ on error, we try to allocate the biggest zone, it can still return
>    a zone between 1.1G and 2G, which can also waste memory.
>
> I will rework my mempool patchset to properly address these issues,
> hopefully tomorrow.

OK.


>
> Also, I thought about another idea to solve your issue, not sure it is
> better but it would not imply to change the mempool behavior. If I
> understood the problem, when a mbuf is accross 2 pages, the copy of the
> data can fail in kni because the mbuf is not virtually contiguous in the

For the KNI use case, we would need _physically_ contiguous memory: using
get_user_pages_remote() we then get a physically contiguous memory map,
so that both the KNI kernel thread/kernel context and DPDK userspace can
use the same memory in their different contexts.



> kernel. So why not in this case splitting the memcpy() into several,
> each of them being on a single page (and calling phys2virt() for each
> page)? The same would have to be done when accessing the fields of the
> mbuf structure if it crosses a page boundary. Would that work? This

If the above is the requirement, does this logic need to be in the slow
path or the fast path?

> could be a B plan.

OK.

>
> Olivier
  
Vamsi Krishna Attunuru Oct. 25, 2019, 9:20 a.m. UTC | #13
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, October 25, 2019 1:01 AM
> To: Olivier Matz <olivier.matz@6wind.com>
> Cc: Vamsi Krishna Attunuru <vattunuru@marvell.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Kiran
> Kumar Kokkilagadda <kirankumark@marvell.com>; anatoly.burakov@intel.com;
> stephen@networkplumber.org; dev@dpdk.org
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option
> 
> On Thu, Oct 24, 2019 at 11:05 PM Olivier Matz <olivier.matz@6wind.com>
> wrote:
> >
> > Hi,
> >
> > On Wed, Oct 23, 2019 at 08:32:08PM +0530, Jerin Jacob wrote:
> > > On Wed, Oct 23, 2019 at 8:17 PM Olivier Matz <olivier.matz@6wind.com>
> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote:
> > > > > On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru
> > > > > <vattunuru@marvell.com> wrote:
> > > > > >
> > > > > > Hi Ferruh,
> > > > > >
> > > > > > Can you please explain the problems in using kni dedicated mbuf alloc
> routines while enabling kni iova=va mode. Please see the below discussion with
> Andrew. He wanted to know the problems in having newer APIs.
> > > > >
> > > > >
> > > > > While waiting  for the Ferruh reply, I would like to summarise
> > > > > the current status
> > > > >
> > > > > # In order to make KNI work with IOVA as VA, We need to make
> > > > > sure mempool pool _object_ should not span across two huge pages
> > > > >
> > > > > # This problem can be fixed by, either of:
> > > > >
> > > > > a) Introduce a flag in mempool to define this constraint, so
> > > > > that, when only needed, this constraint enforced and this is in
> > > > > line with existing semantics of addressing such problems in
> > > > > mempool
> > > > >
> > > > > b) Instead of creating a flag, Make this behavior by default in
> > > > > mempool for IOVA as VA case
> > > > >
> > > > > Upside:
> > > > > b1) There is no need for specific mempool_create for KNI.
> > > > >
> > > > > Downside:
> > > > > b2) Not align with existing mempool API semantics
> > > > > b3) There will be a trivial amount of memory waste as we can not
> > > > > allocate from the edge. Considering the normal huge page memory
> > > > > size is 1G or 512MB this not a real issue.
> > > > >
> > > > > c) Make IOVA as PA when KNI kernel module is loaded
> > > > >
> > > > > Upside:
> > > > > c1) Doing option (a) would call for new KNI specific mempool
> > > > > create API i.e existing KNI applications need a one-line change
> > > > > in application to make it work with release 19.11 or later.
> > > > >
> > > > > Downslide:
> > > > > c2) Driver which needs RTE_PCI_DRV_NEED_IOVA_AS_VA can not work
> > > > > with KNI
> > > > > c3) Need root privilege to run KNI as IOVA as PA need root
> > > > > privilege
> > > > >
> > > > > For the next year, we expect applications to work 19.11 without
> > > > > any code change. My personal opinion to make go with option (a)
> > > > > and update the release notes to document the change any it
> > > > > simple one-line change.
> > > > >
> > > > > The selection of (a) vs (b) is between KNI and Mempool maintainers.
> > > > > Could we please reach a consensus? Or can we discuss this TB meeting?
> > > > >
> > > > > We are going back and forth on this feature on for the last 3
> > > > > releases. Now that, we solved all the technical problems, please
> > > > > help us to decide (a) vs (b) to make forward progress.
> > > >
> > > > Thank you for the summary.
> > > > What is not clear to me is if (a) or (b) may break an existing
> > > > application, and if yes, in which case.
> > >
> > > Thanks for the reply.
> > >
> > > To be clear we are talking about out of tree KNI tree application.
> > > Which they don't want to
> > > change rte_pktmbuf_pool_create() to rte_kni_pktmbuf_pool_create()
> > > and build for v19.11
> > >
> > > So in case (b) there is no issue as It will be using rte_pktmbuf_pool_create ().
> > > But in case of (a) it will create an issue if out of tree KNI
> > > application is using rte_pktmbuf_pool_create() which is not using
> > > the NEW flag.
> >
> > Following yesterday's discussion at techboard, I looked at the mempool
> > code and at my previous RFC patch. It took some time to remind me what
> > was my worries.
> 
> Thanks for the review Olivier.
> 
> Just to make sure the correct one is reviewed.
> 
> 1) v7 had similar issue mentioned
> http://patches.dpdk.org/patch/56585/
> 
> 2) v11 addressed the review comments and you have given the Acked-by for
> mempool change http://patches.dpdk.org/patch/61559/
> 
> My thought process in the TB meeting was, since
> rte_mempool_populate_from_pg_sz_chunks() reviwed replace
> rte_pktmbuf_pool_create's  rte_mempool_populate_default() with
> rte_mempool_populate_from_pg_sz_chunks()
> in IOVA == VA case to avoid a new KNI mempool_create API.
> 
> >
> > Currently, in rte_mempool_populate_default(), when the mempool is
> > populated, we first try to allocate one iova-contiguous block of (n *
> > elt_size). On success, we use this memory to fully populate the
> > mempool without taking care of crossing page boundaries.
> >
> > If we change the behavior to prevent objects from crossing pages, the
> > assumption that allocating (n * elt_size) is always enough becomes
> > wrong.  By luck, there is no real impact, because if the mempool is
> > not fully populated after this first iteration, it will allocate a new
> > chunk.
> >
> > To be rigorous, we need to better calculate the amount of memory to
> > allocate, according to page size.

Hi Olivier,

Thanks for the review. The problems mentioned below exist with the
current rte_mempool_populate_default() API; will there be a high chance
of hitting them when we precalculate the memory size (after preventing
objects from crossing page boundaries and fitting the complete mempool
memory in a single mem chunk) and the mempool size goes beyond the page
size, as in the example below?

Regards,
Vamsi

> >
> > Looking at the code, I found another problem in the same area: let's
> > say we populate a mempool that requires 1.1GB (and we use 1G huge pages):
> >	
> > 1/ mempool code will first tries to allocate an iova-contiguous zone
> >    of 1.1G -> fail
> > 2/ it then tries to allocate a page-aligned non iova-contiguous
> >    zone of 1.1G, which is 2G. On success, a lot of memory is wasted.
> > 3/ on error, we try to allocate the biggest zone, it can still return
> >    a zone between 1.1G and 2G, which can also waste memory.
> >
> > I will rework my mempool patchset to properly address these issues,
> > hopefully tomorrow.
> 
> OK.
> 
> 
> >
> > Also, I thought about another idea to solve your issue, not sure it is
> > better but it would not imply to change the mempool behavior. If I
> > understood the problem, when a mbuf is accross 2 pages, the copy of
> > the data can fail in kni because the mbuf is not virtually contiguous
> > in the
> 
> For KNI use case, we would need _physically_ contiguous to make sure that
> using, get_user_pages_remote() we get  physically contiguous memory map, so
> that both KNI kernel thread and KNI kernel context and DPDK userspace can use
> the same memory in different contexts.
> 
> 
> 
> > kernel. So why not in this case splitting the memcpy() into several,
> > each of them being on a single page (and calling phys2virt() for each
> > page)? The same would have to be done when accessing the fields of the
> > mbuf structure if it crosses a page boundary. Would that work? This
> 
> If the above is the requirement, Does this logic need to be in slow path or fast
> path?
> 
> > could be a B plan.
> 
> OK.
> 
> >
> > Olivier
  
Olivier Matz Oct. 26, 2019, 12:25 p.m. UTC | #14
Hi Jerin, Hi Vamsi,

On Fri, Oct 25, 2019 at 09:20:20AM +0000, Vamsi Krishna Attunuru wrote:
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, October 25, 2019 1:01 AM
> > To: Olivier Matz <olivier.matz@6wind.com>
> > Cc: Vamsi Krishna Attunuru <vattunuru@marvell.com>; Andrew Rybchenko
> > <arybchenko@solarflare.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Kiran
> > Kumar Kokkilagadda <kirankumark@marvell.com>; anatoly.burakov@intel.com;
> > stephen@networkplumber.org; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option
> > 
> > On Thu, Oct 24, 2019 at 11:05 PM Olivier Matz <olivier.matz@6wind.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > On Wed, Oct 23, 2019 at 08:32:08PM +0530, Jerin Jacob wrote:
> > > > On Wed, Oct 23, 2019 at 8:17 PM Olivier Matz <olivier.matz@6wind.com>
> > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote:
> > > > > > On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru
> > > > > > <vattunuru@marvell.com> wrote:
> > > > > > >
> > > > > > > Hi Ferruh,
> > > > > > >
> > > > > > > Can you please explain the problems in using kni dedicated mbuf alloc
> > routines while enabling kni iova=va mode. Please see the below discussion with
> > Andrew. He wanted to know the problems in having newer APIs.
> > > > > >
> > > > > >
> > > > > > While waiting  for the Ferruh reply, I would like to summarise
> > > > > > the current status
> > > > > >
> > > > > > # In order to make KNI work with IOVA as VA, We need to make
> > > > > > sure mempool pool _object_ should not span across two huge pages
> > > > > >
> > > > > > # This problem can be fixed by, either of:
> > > > > >
> > > > > > a) Introduce a flag in mempool to define this constraint, so
> > > > > > that the constraint is enforced only when needed; this is in
> > > > > > line with the existing semantics of addressing such problems in
> > > > > > mempool
> > > > > >
> > > > > > b) Instead of creating a flag, make this behavior the default in
> > > > > > mempool for the IOVA as VA case
> > > > > >
> > > > > > Upside:
> > > > > > b1) There is no need for specific mempool_create for KNI.
> > > > > >
> > > > > > Downside:
> > > > > > b2) Not align with existing mempool API semantics
> > > > > > b3) There will be a trivial amount of memory waste as we can not
> > > > > > allocate from the edge. Considering the normal huge page memory
> > > > > > size is 1G or 512MB, this is not a real issue.
> > > > > >
> > > > > > c) Make IOVA as PA when KNI kernel module is loaded
> > > > > >
> > > > > > Upside:
> > > > > > c1) Doing option (a) would call for a new KNI-specific mempool
> > > > > > create API, i.e. existing KNI applications need a one-line change
> > > > > > to make them work with release 19.11 or later.
> > > > > >
> > > > > > Downside:
> > > > > > c2) Drivers which need RTE_PCI_DRV_NEED_IOVA_AS_VA can not work
> > > > > > with KNI
> > > > > > c3) Root privilege is needed to run KNI, as IOVA as PA needs root
> > > > > > privilege
> > > > > >
> > > > > > For the next year, we expect applications to work with 19.11 without
> > > > > > any code change. My personal opinion is to go with option (a)
> > > > > > and update the release notes to document the change; it is a
> > > > > > simple one-line change.
> > > > > >
> > > > > > The selection of (a) vs (b) is between KNI and Mempool maintainers.
> > > > > > Could we please reach a consensus? Or can we discuss this at the TB meeting?
> > > > > >
> > > > > > We have been going back and forth on this feature for the last 3
> > > > > > releases. Now that we have solved all the technical problems, please
> > > > > > help us decide (a) vs (b) to make forward progress.
> > > > >
> > > > > Thank you for the summary.
> > > > > What is not clear to me is if (a) or (b) may break an existing
> > > > > application, and if yes, in which case.
> > > >
> > > > Thanks for the reply.
> > > >
> > > > To be clear, we are talking about out-of-tree KNI applications,
> > > > which don't want to
> > > > change rte_pktmbuf_pool_create() to rte_kni_pktmbuf_pool_create()
> > > > and rebuild for v19.11.
> > > >
> > > > So in case (b) there is no issue, as it will be using rte_pktmbuf_pool_create().
> > > > But in case (a) it will create an issue if an out-of-tree KNI
> > > > application is using rte_pktmbuf_pool_create(), which does not set
> > > > the new flag.
> > >
> > > Following yesterday's discussion at techboard, I looked at the mempool
> > > code and at my previous RFC patch. It took some time to remind me what
> > > my worries were.
> > 
> > Thanks for the review Olivier.
> > 
> > Just to make sure the correct one is reviewed.
> > 
> > 1) v7 had a similar issue, mentioned in
> > http://patches.dpdk.org/patch/56585/

The v7 has the problem I described below: the iova-contiguous allocation
may fail because the calculated size is too small, and remaining objects
will be added in another chunk. This can happen if a fully
iova-contiguous mempool is requested (it happens for octeontx).

Also, the way page size is retrieved in
rte_mempool_op_populate_default() assumes that memzones are used, which
is not correct.

> > 2) v11 addressed the review comments and you have given the Acked-by for
> > the mempool change: http://patches.dpdk.org/patch/61559/

The v11 looked fine to me, because it does not impact any default
behavior, and the plan was to remove this new function when my mempool
patchset was in.


> > 
> > My thought process in the TB meeting was: since
> > rte_mempool_populate_from_pg_sz_chunks() was already reviewed, replace
> > rte_pktmbuf_pool_create's rte_mempool_populate_default() with
> > rte_mempool_populate_from_pg_sz_chunks()
> > in the IOVA == VA case to avoid a new KNI mempool_create API.
> > 
> > >
> > > Currently, in rte_mempool_populate_default(), when the mempool is
> > > populated, we first try to allocate one iova-contiguous block of (n *
> > > elt_size). On success, we use this memory to fully populate the
> > > mempool without taking care of crossing page boundaries.
> > >
> > > If we change the behavior to prevent objects from crossing pages, the
> > > assumption that allocating (n * elt_size) is always enough becomes
> > > wrong.  By luck, there is no real impact, because if the mempool is
> > > not fully populated after this first iteration, it will allocate a new
> > > chunk.
> > >
> > > To be rigorous, we need to better calculate the amount of memory to
> > > allocate, according to page size.
> 
> Hi Olivier,
> 
> Thanks for the review. I think the problems mentioned below exist with the
> current mempool_populate_default() API. Will there be a high chance of
> hitting them when we precalculate the memory size (after preventing objects
> from crossing page boundaries and fitting the complete mempool memory in a
> single mem chunk) and the mempool size goes beyond the page size, as in the
> example below?

Yes, the problem described below (alloc a mempool of 1.1GB resulting in
2GB reserved) exists in the current version. It will be fixed in the new
version of my "mempool: avoid objects allocations across pages"
patchset.

FYI, a reworked patchset is almost ready; I'll send it Monday.
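
[Editor's note: to make the sizing argument above concrete, once objects may not cross page boundaries the required memory is no longer simply (n * elt_size) but must be rounded up page by page. The helper below is a hypothetical sketch, not the actual rte_mempool sizing code.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizing helper, not the actual rte_mempool code: bytes of
 * page-granular memory needed for n objects of elt_size bytes when no
 * object may cross a pg_sz page boundary (assumes elt_size <= pg_sz). */
static uint64_t
mem_size_no_page_cross(uint64_t n, uint64_t elt_size, uint64_t pg_sz)
{
	uint64_t objs_per_page = pg_sz / elt_size;
	/* round up: a partially filled last page still costs a full page */
	uint64_t pages = (n + objs_per_page - 1) / objs_per_page;

	return pages * pg_sz;
}
```

For example, with 2176-byte mbufs on 2MB pages, 963 objects fit per page and 1664 bytes per page are lost at the edge — the trivial (b3) waste mentioned earlier in the thread.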

> 
> Regards,
> Vamsi
> 
> > >
> > > Looking at the code, I found another problem in the same area: let's
> > > say we populate a mempool that requires 1.1GB (and we use 1G huge pages):
> > >	
> > > > 1/ mempool code will first try to allocate an iova-contiguous zone
> > >    of 1.1G -> fail
> > > 2/ it then tries to allocate a page-aligned non iova-contiguous
> > >    zone of 1.1G, which is 2G. On success, a lot of memory is wasted.
> > > 3/ on error, we try to allocate the biggest zone, it can still return
> > >    a zone between 1.1G and 2G, which can also waste memory.
> > >
> > > I will rework my mempool patchset to properly address these issues,
> > > hopefully tomorrow.
> > 
> > OK.
> > 
> > 
> > >
> > > Also, I thought about another idea to solve your issue, not sure it is
> > > better, but it would not imply changing the mempool behavior. If I
> > > understood the problem, when an mbuf is across 2 pages, the copy of
> > > the data can fail in kni because the mbuf is not virtually contiguous
> > > in the
> > 
> > For the KNI use case, we would need _physically_ contiguous memory to make
> > sure that, using get_user_pages_remote(), we get a physically contiguous
> > memory map, so that the KNI kernel thread, the KNI kernel context and DPDK
> > userspace can use the same memory in different contexts.
> > 
> > 
> > 
> > > kernel. So why not, in this case, split the memcpy() into several,
> > > each of them being on a single page (and calling phys2virt() for each
> > > page)? The same would have to be done when accessing the fields of the
> > > mbuf structure if it crosses a page boundary. Would that work? This
> > 
> > If the above is the requirement, does this logic need to be in slow path or fast
> > path?

In the fast path. But I don't think the performance impact would be
significant.

Vamsi, do you confirm this approach could also solve the issue without
changing the mempool?

> > 
> > > could be a B plan.
> > 
> > OK.
> > 
> > >
> > > Olivier
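
[Editor's note: a purely illustrative sketch of the "plan B" discussed above — splitting the copy at page boundaries instead of constraining the mempool. This is not the actual KNI kernel code; the per-page phys-to-virt re-translation it alludes to is only described in the comment.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch, not actual KNI kernel code: split a copy at pg_sz
 * boundaries so each memcpy() stays within a single source page. In the
 * real kernel path, each page would be re-translated (phys2virt-style)
 * before its chunk is copied. */
static void
memcpy_split_by_page(uint8_t *dst, const uint8_t *src, size_t len,
		     size_t pg_sz)
{
	while (len > 0) {
		/* bytes left before src reaches the next page boundary */
		size_t room = pg_sz - ((uintptr_t)src & (pg_sz - 1));
		size_t chunk = len < room ? len : room;

		memcpy(dst, src, chunk);
		dst += chunk;
		src += chunk;
		len -= chunk;
	}
}
```

The same splitting would be needed when accessing mbuf structure fields that straddle a boundary, which is why the thread treats this as a fallback rather than the preferred fix.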
  
Vamsi Krishna Attunuru Oct. 26, 2019, 2:09 p.m. UTC | #15
Hi Olivier,

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Saturday, October 26, 2019 5:55 PM
> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>
> Cc: Jerin Jacob <jerinjacobk@gmail.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Kiran Kumar Kokkilagadda <kirankumark@marvell.com>;
> anatoly.burakov@intel.com; stephen@networkplumber.org; dev@dpdk.org
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option
> 
> Hi Jerin, Hi Vamsi,
> 
<...>
> Yes, the problem described below (alloc a mempool of 1.1GB resulting in 2GB
> reserved) exists in the current version. It will be fixed in the new version of
> my "mempool: avoid objects allocations across pages"
> patchset.
> 
> FYI, a reworked patchset is almost ready; I'll send it Monday.

Thanks a lot Olivier.

<...>
> > > If the above is the requirement, does this logic need to be in slow
> > > path or fast path?
> 
> In the fast path. But I don't think the performance impact would be significant.
> 
> Vamsi, do you confirm this approach could also solve the issue without
> changing the mempool?

So far there is no performance impact observed when this feature (KNI IOVA as VA) is enabled with this patch set. Not sure of the impact when the memcpy split and extra address translations are introduced; that approach might solve the issue, but I feel it is not a clean way of solving it.
> 
> > >
> > > > could be a B plan.
> > >
> > > OK.
> > >
> > > >
> > > > Olivier
  
Olivier Matz Oct. 28, 2019, 2:05 p.m. UTC | #16
On Sat, Oct 26, 2019 at 02:09:51PM +0000, Vamsi Krishna Attunuru wrote:
> Hi Olivier,
> 
<...>
> > Vamsi, do you confirm this approach could also solve the issue without
> > changing the mempool?
> 
> So far there is no performance impact observed when this feature (KNI IOVA as VA) is enabled with this patch set. Not sure of the impact when the memcpy split and extra address translations are introduced; that approach might solve the issue, but I feel it is not a clean way of solving it.

Agreed, it's not the cleanest way, but it is maybe less risky.

As you can see, the reworked patchset is not that short:
http://inbox.dpdk.org/dev/20190719133845.32432-1-olivier.matz@6wind.com/T/#m6249311baae8469f5727f07c32e4e6e6844eae6a


> > 
> > > >
> > > > > could be a B plan.
> > > >
> > > > OK.
> > > >
> > > > >
> > > > > Olivier
  

Patch

diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 85953b9..ab2c381 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -115,6 +115,10 @@  New Features
   Added eBPF JIT support for arm64 architecture to improve the eBPF program
   performance.
 
+* **Added EAL option to operate KNI in legacy mode.**
+
+  Added EAL option ``--legacy-kni`` to make existing KNI applications work
+  with DPDK 19.11 and later.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 05cae5f..8f5174e 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -81,6 +81,7 @@  eal_long_options[] = {
 	{OPT_LEGACY_MEM,        0, NULL, OPT_LEGACY_MEM_NUM       },
 	{OPT_SINGLE_FILE_SEGMENTS, 0, NULL, OPT_SINGLE_FILE_SEGMENTS_NUM},
 	{OPT_MATCH_ALLOCATIONS, 0, NULL, OPT_MATCH_ALLOCATIONS_NUM},
+	{OPT_LEGACY_KNI,        0, NULL, OPT_LEGACY_KNI_NUM       },
 	{0,                     0, NULL, 0                        }
 };
 
@@ -1408,6 +1409,9 @@  eal_parse_common_option(int opt, const char *optarg,
 			return -1;
 		}
 		break;
+	case OPT_LEGACY_KNI_NUM:
+		conf->legacy_kni = 1;
+		break;
 
 	/* don't know what to do, leave this to caller */
 	default:
@@ -1636,6 +1640,7 @@  eal_common_usage(void)
 	       "                      (ex: --vdev=net_pcap0,iface=eth2).\n"
 	       "  --"OPT_IOVA_MODE"   Set IOVA mode. 'pa' for IOVA_PA\n"
 	       "                      'va' for IOVA_VA\n"
+	       "  --"OPT_LEGACY_KNI"  Run KNI in IOVA_PA mode (legacy mode)\n"
 	       "  -d LIB.so|DIR       Add a driver or driver directory\n"
 	       "                      (can be used multiple times)\n"
 	       "  --"OPT_VMWARE_TSC_MAP"    Use VMware TSC map instead of native RDTSC\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index a42f349..eee71ec 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -82,6 +82,8 @@  struct internal_config {
 	rte_cpuset_t ctrl_cpuset;         /**< cpuset for ctrl threads */
 	volatile unsigned int init_complete;
 	/**< indicates whether EAL has completed initialization */
+	volatile unsigned legacy_kni;
+	/**< true to enable legacy kni behavior */
 };
 extern struct internal_config internal_config; /**< Global EAL configuration. */
 
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 9855429..1010ed3 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -69,6 +69,8 @@  enum {
 	OPT_IOVA_MODE_NUM,
 #define OPT_MATCH_ALLOCATIONS  "match-allocations"
 	OPT_MATCH_ALLOCATIONS_NUM,
+#define OPT_LEGACY_KNI      "legacy-kni"
+	OPT_LEGACY_KNI_NUM,
 	OPT_LONG_MAX_NUM
 };
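For context, the `eal_long_options[]` entry and the `OPT_LEGACY_KNI_NUM` case added above follow the standard getopt_long() pattern: a long option taking no argument whose returned value selects a switch case that sets a flag in the configuration struct. A minimal standalone sketch of that pattern (this is not EAL code; only the option name is borrowed from the patch, and the struct and enum value are invented for illustration):

```c
#include <getopt.h>
#include <stddef.h>

/* Stand-in for struct internal_config. */
struct cfg {
	int legacy_kni;
};

/* Long-option value chosen outside the printable-char range, as EAL does. */
enum { OPT_LEGACY_KNI_NUM = 256 };

static const struct option long_opts[] = {
	{"legacy-kni", no_argument, NULL, OPT_LEGACY_KNI_NUM},
	{NULL,         0,           NULL, 0}
};

/* Parse argv; set conf->legacy_kni when --legacy-kni is present.
 * Returns 0 on success, -1 on an unknown option. */
static int parse_args(struct cfg *conf, int argc, char *const argv[])
{
	int opt;

	optind = 1; /* reset getopt state so the function is re-entrant */
	while ((opt = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		switch (opt) {
		case OPT_LEGACY_KNI_NUM:
			conf->legacy_kni = 1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

With this in place, an application launched as `app --legacy-kni` would end up with the flag set, mirroring how `conf->legacy_kni = 1` is reached in `eal_parse_common_option()` in the patch.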