[v1] ethdev: add direction info when creating the transfer table

Message ID 20220907024020.2474860-1-rongweil@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Andrew Rybchenko
Series [v1] ethdev: add direction info when creating the transfer table

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/github-robot: build success github build: passed
ci/iol-x86_64-compile-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS

Commit Message

Rongwei Liu Sept. 7, 2022, 2:40 a.m. UTC
  The transfer domain rule is able to match traffic wire/vf
origin and it means two directions' underlayer resource.

In customer deployments, they usually match only one direction
traffic in single flow table: either from wire or from vf.

Introduce one new member transfer_mode into rte_flow_attr to
indicate the flow table direction property: from wire, from vf
or bi-direction(default).

It helps to save underlayer memory also on insertion rate.

By default, the transfer domain is bi-direction, and no behavior changes.

1. Match wire origin only
   flow template_table 0 create group 0 priority 0 transfer wire_orig...
2. Match vf origin only
   flow template_table 0 create group 0 priority 0 transfer vf_orig...

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
---
 app/test-pmd/cmdline_flow.c                 | 26 +++++++++++++++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
 lib/ethdev/rte_flow.h                       |  9 ++++++-
 3 files changed, 36 insertions(+), 2 deletions(-)
  

Comments

Ori Kam Sept. 11, 2022, 8:22 a.m. UTC | #1
Hi Rongwei,

> -----Original Message-----
> From: Rongwei Liu <rongweil@nvidia.com>
> Sent: Wednesday, 7 September 2022 5:40
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: [PATCH v1] ethdev: add direction info when creating the transfer
> table
> 
> The transfer domain rule is able to match traffic wire/vf
> origin and it means two directions' underlayer resource.
> 
> In customer deployments, they usually match only one direction
> traffic in single flow table: either from wire or from vf.
> 
> Introduce one new member transfer_mode into rte_flow_attr to
> indicate the flow table direction property: from wire, from vf
> or bi-direction(default).
> 
> It helps to save underlayer memory also on insertion rate.
> 
> By default, the transfer domain is bi-direction, and no behavior changes.
> 
> 1. Match wire origin only
>    flow template_table 0 create group 0 priority 0 transfer wire_orig...
> 2. Match vf origin only
>    flow template_table 0 create group 0 priority 0 transfer vf_orig...
> 
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> ---
>  app/test-pmd/cmdline_flow.c                 | 26 +++++++++++++++++++++
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
>  lib/ethdev/rte_flow.h                       |  9 ++++++-
>  3 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
> index 7f50028eb7..b25b595e82 100644
> --- a/app/test-pmd/cmdline_flow.c
> +++ b/app/test-pmd/cmdline_flow.c
> @@ -177,6 +177,8 @@ enum index {
>  	TABLE_INGRESS,
>  	TABLE_EGRESS,
>  	TABLE_TRANSFER,
> +	TABLE_TRANSFER_WIRE_ORIG,
> +	TABLE_TRANSFER_VF_ORIG,
>  	TABLE_RULES_NUMBER,
>  	TABLE_PATTERN_TEMPLATE,
>  	TABLE_ACTIONS_TEMPLATE,
> @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] = {
>  	TABLE_INGRESS,
>  	TABLE_EGRESS,
>  	TABLE_TRANSFER,
> +	TABLE_TRANSFER_WIRE_ORIG,
> +	TABLE_TRANSFER_VF_ORIG,
>  	TABLE_RULES_NUMBER,
>  	TABLE_PATTERN_TEMPLATE,
>  	TABLE_ACTIONS_TEMPLATE,
> @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
>  		.next = NEXT(next_table_attr),
>  		.call = parse_table,
>  	},
> +	[TABLE_TRANSFER_WIRE_ORIG] = {
> +		.name = "wire_orig",
> +		.help = "affect rule direction to transfer",
> +		.next = NEXT(next_table_attr),
> +		.call = parse_table,
> +	},
> +	[TABLE_TRANSFER_VF_ORIG] = {
> +		.name = "vf_orig",
> +		.help = "affect rule direction to transfer",
> +		.next = NEXT(next_table_attr),
> +		.call = parse_table,
> +	},
>  	[TABLE_RULES_NUMBER] = {
>  		.name = "rules_number",
>  		.help = "number of rules in table",
> @@ -8894,6 +8910,16 @@ parse_table(struct context *ctx, const struct token *token,
>  	case TABLE_TRANSFER:
>  		out->args.table.attr.flow_attr.transfer = 1;
>  		return len;
> +	case TABLE_TRANSFER_WIRE_ORIG:
> +		if (!out->args.table.attr.flow_attr.transfer)
> +			return -1;
> +		out->args.table.attr.flow_attr.transfer_mode = 1;
> +		return len;
> +	case TABLE_TRANSFER_VF_ORIG:
> +		if (!out->args.table.attr.flow_attr.transfer)
> +			return -1;
> +		out->args.table.attr.flow_attr.transfer_mode = 2;
> +		return len;
>  	default:
>  		return -1;
>  	}
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index 330e34427d..603b7988dd 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -3332,7 +3332,8 @@ It is bound to ``rte_flow_template_table_create()``::
> 
>     flow template_table {port_id} create
>         [table_id {id}] [group {group_id}]
> -       [priority {level}] [ingress] [egress] [transfer]
> +       [priority {level}] [ingress] [egress]
> +       [transfer [vf_orig] [wire_orig]]
>         rules_number {number}
>         pattern_template {pattern_template_id}
>         actions_template {actions_template_id}
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index a79f1e7ef0..512b08d817 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -130,7 +130,14 @@ struct rte_flow_attr {
>  	 * through a suitable port. @see rte_flow_pick_transfer_proxy().
>  	 */
>  	uint32_t transfer:1;
> -	uint32_t reserved:29; /**< Reserved, must be zero. */
> +	/**
> +	 * 0 means bidirection,
> +	 * 0x1 origin uplink,
> +	 * 0x2 origin vport,
> +	 * N/A both set.
> +	 */
> +	uint32_t transfer_mode:2;
> +	uint32_t reserved:27; /**< Reserved, must be zero. */
>  };
> 
>  /**
> --
> 2.27.0

Acked-by: Ori Kam <orika@nvidia.com>
Thanks,
Ori
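
For readers following the rte_flow.h hunk quoted above, the bit budget of the proposed field can be sketched in isolation. This is a minimal stand-in, not the DPDK header: the struct name, the enum names and the validity helper are hypothetical, and only the field widths and the raw values 0, 1 and 2 come from the patch. The "direction requires transfer" rule is borrowed from the testpmd parser in the same patch, not from any documented ethdev requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the proposed struct rte_flow_attr layout; NOT the real header. */
struct flow_attr_sketch {
	uint32_t group;
	uint32_t priority;
	uint32_t ingress:1;
	uint32_t egress:1;
	uint32_t transfer:1;
	uint32_t transfer_mode:2; /* 0 = bidirectional, 1 = wire origin, 2 = VF origin */
	uint32_t reserved:27;     /* shrunk from 29 bits to make room */
};

/* Hypothetical names for the raw values the testpmd hunk assigns. */
enum transfer_mode_sketch {
	TRANSFER_MODE_BIDIR = 0,
	TRANSFER_MODE_WIRE_ORIG = 1,
	TRANSFER_MODE_VF_ORIG = 2,
};

/* The flag bits must still fill exactly one 32-bit word. */
_Static_assert(1 + 1 + 1 + 2 + 27 == 32, "flag bits must total 32");

/* Hypothetical check: returns 1 if the attribute combination is usable. */
static int
flow_attr_sketch_valid(const struct flow_attr_sketch *attr)
{
	if (attr->reserved != 0)
		return 0; /* reserved bits must be zero */
	if (attr->transfer_mode > TRANSFER_MODE_VF_ORIG)
		return 0; /* 0x3 ("both set") is documented as N/A */
	if (attr->transfer_mode != TRANSFER_MODE_BIDIR && !attr->transfer)
		return 0; /* testpmd only accepts a direction after "transfer" */
	return 1;
}
```

Because the two new bits are carved out of the reserved field, an application that zero-initialises the structure would get the bidirectional default, which is consistent with the "no behavior changes" statement in the commit message.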
  
Ivan Malov Sept. 12, 2022, 4:57 p.m. UTC | #2
Hi,

On Wed, 7 Sep 2022, Rongwei Liu wrote:

> The transfer domain rule is able to match traffic wire/vf
> origin and it means two directions' underlayer resource.

The point of fact is that matching traffic coming from
some entity like wire / VF has been long generalised
in the form of representors. So, a flow rule with
attribute "transfer" is able to match traffic
coming from either a REPRESENTED_PORT or from
a PORT_REPRESENTOR (please find these items).

>
> In customer deployments, they usually match only one direction
> traffic in single flow table: either from wire or from vf.

Which customer deployments? Could you please provide detailed examples?

>
> Introduce one new member transfer_mode into rte_flow_attr to
> indicate the flow table direction property: from wire, from vf
> or bi-direction(default).

AFAIK, 'rte_flow_attr' serves both traditional flow rule
insertion and asynchronous (table) approach. The patch
adds the attributes to generic 'rte_flow_attr' but,
for some reason, ignores non-table rules.

For example, the diff below adds the attributes to "table" commands
in testpmd but does not add them to regular (non-table)
commands like "flow create". Why?

>
> It helps to save underlayer memory also on insertion rate.

Which memory? Host memory? NIC memory? Term "underlayer" is vague.
I suggest that the commit message be revised to first explain how
such memory is spent currently, then explain why this is not
optimal and, finally, which way the patch is supposed to
improve that. I.e. be more specific.

>
> By default, the transfer domain is bi-direction, and no behavior changes.
>
> 1. Match wire origin only
>  flow template_table 0 create group 0 priority 0 transfer wire_orig...
> 2. Match vf origin only
>  flow template_table 0 create group 0 priority 0 transfer vf_orig...
>
> Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
> ---
> app/test-pmd/cmdline_flow.c                 | 26 +++++++++++++++++++++
> doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
> lib/ethdev/rte_flow.h                       |  9 ++++++-
> 3 files changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
> index 7f50028eb7..b25b595e82 100644
> --- a/app/test-pmd/cmdline_flow.c
> +++ b/app/test-pmd/cmdline_flow.c
> @@ -177,6 +177,8 @@ enum index {
> 	TABLE_INGRESS,
> 	TABLE_EGRESS,
> 	TABLE_TRANSFER,
> +	TABLE_TRANSFER_WIRE_ORIG,
> +	TABLE_TRANSFER_VF_ORIG,
> 	TABLE_RULES_NUMBER,
> 	TABLE_PATTERN_TEMPLATE,
> 	TABLE_ACTIONS_TEMPLATE,
> @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] = {
> 	TABLE_INGRESS,
> 	TABLE_EGRESS,
> 	TABLE_TRANSFER,
> +	TABLE_TRANSFER_WIRE_ORIG,
> +	TABLE_TRANSFER_VF_ORIG,
> 	TABLE_RULES_NUMBER,
> 	TABLE_PATTERN_TEMPLATE,
> 	TABLE_ACTIONS_TEMPLATE,
> @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
> 		.next = NEXT(next_table_attr),
> 		.call = parse_table,
> 	},
> +	[TABLE_TRANSFER_WIRE_ORIG] = {
> +		.name = "wire_orig",
> +		.help = "affect rule direction to transfer",

This does not explain the "wire" aspect. It's too broad.

> +		.next = NEXT(next_table_attr),
> +		.call = parse_table,
> +	},
> +	[TABLE_TRANSFER_VF_ORIG] = {
> +		.name = "vf_orig",
> +		.help = "affect rule direction to transfer",

This explanation simply duplicates that of "wire_orig".
It does not explain the "vf" part. It should be more specific.

> +		.next = NEXT(next_table_attr),
> +		.call = parse_table,
> +	},
> 	[TABLE_RULES_NUMBER] = {
> 		.name = "rules_number",
> 		.help = "number of rules in table",
> @@ -8894,6 +8910,16 @@ parse_table(struct context *ctx, const struct token *token,
> 	case TABLE_TRANSFER:
> 		out->args.table.attr.flow_attr.transfer = 1;
> 		return len;
> +	case TABLE_TRANSFER_WIRE_ORIG:
> +		if (!out->args.table.attr.flow_attr.transfer)
> +			return -1;
> +		out->args.table.attr.flow_attr.transfer_mode = 1;
> +		return len;
> +	case TABLE_TRANSFER_VF_ORIG:
> +		if (!out->args.table.attr.flow_attr.transfer)
> +			return -1;
> +		out->args.table.attr.flow_attr.transfer_mode = 2;
> +		return len;
> 	default:
> 		return -1;
> 	}
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index 330e34427d..603b7988dd 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -3332,7 +3332,8 @@ It is bound to ``rte_flow_template_table_create()``::
>
>   flow template_table {port_id} create
>       [table_id {id}] [group {group_id}]
> -       [priority {level}] [ingress] [egress] [transfer]
> +       [priority {level}] [ingress] [egress]
> +       [transfer [vf_orig] [wire_orig]]

Is it correct? Shouldn't it rather be
[transfer] [vf_orig] [wire_orig]
?
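
One way to read the two spellings is to look at what the quoted parse_table() hunk accepts. The sketch below mirrors only those three switch cases with simplified, hypothetical types (string tokens instead of testpmd's token indices, 0 instead of the consumed length); it is an illustration, not the testpmd code.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for the table attributes touched by parse_table(). */
struct table_attr_sketch {
	unsigned int transfer:1;
	unsigned int transfer_mode:2;
};

/*
 * Apply one attribute keyword, mirroring the quoted switch cases.
 * Returns 0 on success and -1 when the keyword is rejected.
 */
static int
apply_table_token(struct table_attr_sketch *attr, const char *token)
{
	if (strcmp(token, "transfer") == 0) {
		attr->transfer = 1;
		return 0;
	}
	if (strcmp(token, "wire_orig") == 0) {
		if (!attr->transfer)
			return -1; /* direction keyword seen before "transfer" */
		attr->transfer_mode = 1;
		return 0;
	}
	if (strcmp(token, "vf_orig") == 0) {
		if (!attr->transfer)
			return -1; /* direction keyword seen before "transfer" */
		attr->transfer_mode = 2;
		return 0;
	}
	return -1; /* unknown keyword */
}
```

Under this reading, a direction keyword is only accepted once "transfer" has been seen, which is what the nested form expresses. If both keywords are given, the later one overwrites the earlier one, so the two are effectively alternatives rather than independent options.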

>       rules_number {number}
>       pattern_template {pattern_template_id}
>       actions_template {actions_template_id}
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index a79f1e7ef0..512b08d817 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -130,7 +130,14 @@ struct rte_flow_attr {
> 	 * through a suitable port. @see rte_flow_pick_transfer_proxy().
> 	 */
> 	uint32_t transfer:1;
> -	uint32_t reserved:29; /**< Reserved, must be zero. */
> +	/**
> +	 * 0 means bidirection,
> +	 * 0x1 origin uplink,

What does "uplink" mean? It's too vague. Hardly a good term.

> +	 * 0x2 origin vport,

What does "origin vport" mean? Hardly a good term as well.

> +	 * N/A both set.

What's this?

> +	 */
> +	uint32_t transfer_mode:2;
> +	uint32_t reserved:27; /**< Reserved, must be zero. */
> };
>
> /**
> -- 
> 2.27.0
>

Since the attributes are added to generic 'struct rte_flow_attr',
non-table (synchronous) flow rules are supposed to support them,
too. If that is indeed the case, then I'm afraid such proposal
does not agree with the existing items PORT_REPRESENTOR and
REPRESENTED_PORT. They do exactly the same thing, but they
are designed to be way more generic. Why not use them?

Ivan
  
Rongwei Liu Sept. 13, 2022, 1:46 p.m. UTC | #3
Hi 

BR
Rongwei

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Sent: Tuesday, September 13, 2022 00:57
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan
> Darawsheh <rasland@nvidia.com>
> Subject: Re: [PATCH v1] ethdev: add direction info when creating the transfer
> table
> 
> 
> Hi,
> 
> On Wed, 7 Sep 2022, Rongwei Liu wrote:
> 
> > The transfer domain rule is able to match traffic wire/vf origin and
> > it means two directions' underlayer resource.
> 
> The point of fact is that matching traffic coming from some entity like wire /
> VF has been long generalised in the form of representors. So, a flow rule with
> attribute "transfer" is able to match traffic coming from either a
> REPRESENTED_PORT or from a PORT_REPRESENTOR (please find these items).
> 
> >
> > In customer deployments, they usually match only one direction traffic
> > in single flow table: either from wire or from vf.
> 
> Which customer deployments? Could you please provide detailed examples?
> 
> > 

We have seen many customer deployments like these:
1. Match overlay traffic from the wire, decap it, then send it to a specific vport.
2. Match specific 5-tuples, encap the traffic, then send it to the wire.
The matching criteria have an obvious direction preference.

> > Introduce one new member transfer_mode into rte_flow_attr to indicate
> > the flow table direction property: from wire, from vf or
> > bi-direction(default).
> 
> AFAIK, 'rte_flow_attr' serves both traditional flow rule insertion and
> asynchronous (table) approach. The patch adds the attributes to generic
> 'rte_flow_attr' but, for some reason, ignores non-table rules.
> 
> > 
The sync API uses one rule to contain everything. It's hard for the PMD to determine whether a rule has a direction preference or not.
Imagine a situation, just as an example:
1. Vport 1: match VxLAN, do decap, send to vport 2. Scale: 1 million rules.
2. Vport 0 (wire): match VxLAN, do decap, send to vport 3. Scale: 1 hundred rules.
1 and 2 share the same matching conditions (eth / ipv4 / udp / vxlan /...), so the sync API considers them to share the same matching determination logic.
It means "2" gets the 1M scale capability too. Obviously, that wastes a lot of resources.

The async API introduces pattern_template. We can mark "1" to use pattern_template 1 and "2" to use pattern_template 2.
They will then be separated from each other and no longer share the matching logic.

> For example, the diff below adds the attributes to "table" commands in
> testpmd but does not add them to regular (non-table) commands like "flow
> create". Why?
> 
> >

The "table" command limits the pattern_template to a single direction or to both directions, per the user-specified attribute.
The "rule" command must be tied to one "table_id", so the rule inherits the "table" direction property; there is no need to specify it again.

> > It helps to save underlayer memory also on insertion rate.
> 
> Which memory? Host memory? NIC memory? Term "underlayer" is vague.
> I suggest that the commit message be revised to first explain how such
> memory is spent currently, then explain why this is not optimal and, finally,
> which way the patch is supposed to improve that. I.e. be more specific.
> 
> > 

For large-scale rule sets, the HW (depending on the implementation) always needs memory to hold the rules' patterns and actions, either from the NIC or from the host.
The memory footprint depends heavily on the complexity of the user's rules and also differs between NICs.
A memory saving of about 50% is expected if one direction is cut.
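
The arithmetic behind that estimate can be sketched with a toy model. It assumes, purely for illustration, that a bidirectional table reserves one full set of entries per direction, so a single-direction table needs half as many; real devices may allocate differently, and all names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical direction values; 0/1/2 follow the patch's transfer_mode. */
enum table_direction_sketch {
	DIR_BIDIR = 0,
	DIR_WIRE_ORIG = 1,
	DIR_VF_ORIG = 2,
};

/* Entries reserved, ASSUMING one full copy per direction served. */
static uint64_t
reserved_entries(uint64_t rules_number, enum table_direction_sketch dir)
{
	unsigned int copies = (dir == DIR_BIDIR) ? 2 : 1;

	return rules_number * copies;
}

/* Saving relative to a bidirectional table of the same size, in percent. */
static unsigned int
saving_percent(uint64_t rules_number, enum table_direction_sketch dir)
{
	uint64_t bidir = reserved_entries(rules_number, DIR_BIDIR);
	uint64_t used = reserved_entries(rules_number, dir);

	if (bidir == 0)
		return 0; /* empty table: nothing to save */
	return (unsigned int)((bidir - used) * 100 / bidir);
}
```

With the 1 million rule table from the example above, this model reserves 2 million entries when bidirectional and 1 million when limited to one origin, which is where a figure of about 50% would come from.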

> > By default, the transfer domain is bi-direction, and no behavior changes.
> >
> > 1. Match wire origin only
> >  flow template_table 0 create group 0 priority 0 transfer wire_orig...
> > 2. Match vf origin only
> >  flow template_table 0 create group 0 priority 0 transfer vf_orig...
> >
> > Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
> > ---
> > app/test-pmd/cmdline_flow.c                 | 26 +++++++++++++++++++++
> > doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
> > lib/ethdev/rte_flow.h                       |  9 ++++++-
> > 3 files changed, 36 insertions(+), 2 deletions(-)
> >
> > diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
> > index 7f50028eb7..b25b595e82 100644
> > --- a/app/test-pmd/cmdline_flow.c
> > +++ b/app/test-pmd/cmdline_flow.c
> > @@ -177,6 +177,8 @@ enum index {
> >       TABLE_INGRESS,
> >       TABLE_EGRESS,
> >       TABLE_TRANSFER,
> > +     TABLE_TRANSFER_WIRE_ORIG,
> > +     TABLE_TRANSFER_VF_ORIG,
> >       TABLE_RULES_NUMBER,
> >       TABLE_PATTERN_TEMPLATE,
> >       TABLE_ACTIONS_TEMPLATE,
> > @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] = {
> >       TABLE_INGRESS,
> >       TABLE_EGRESS,
> >       TABLE_TRANSFER,
> > +     TABLE_TRANSFER_WIRE_ORIG,
> > +     TABLE_TRANSFER_VF_ORIG,
> >       TABLE_RULES_NUMBER,
> >       TABLE_PATTERN_TEMPLATE,
> >       TABLE_ACTIONS_TEMPLATE,
> > @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
> >               .next = NEXT(next_table_attr),
> >               .call = parse_table,
> >       },
> > +     [TABLE_TRANSFER_WIRE_ORIG] = {
> > +             .name = "wire_orig",
> > +             .help = "affect rule direction to transfer",
> 
> This does not explain the "wire" aspect. It's too broad.
> 
> > +             .next = NEXT(next_table_attr),
> > +             .call = parse_table,
> > +     },
> > +     [TABLE_TRANSFER_VF_ORIG] = {
> > +             .name = "vf_orig",
> > +             .help = "affect rule direction to transfer",
> 
> This explanation simply duplicates that of "wire_orig".
> It does not explain the "vf" part. It should be more specific.
> 
> > +             .next = NEXT(next_table_attr),
> > +             .call = parse_table,
> > +     },
> >       [TABLE_RULES_NUMBER] = {
> >               .name = "rules_number",
> >               .help = "number of rules in table",
> > @@ -8894,6 +8910,16 @@ parse_table(struct context *ctx, const struct token *token,
> >       case TABLE_TRANSFER:
> >               out->args.table.attr.flow_attr.transfer = 1;
> >               return len;
> > +     case TABLE_TRANSFER_WIRE_ORIG:
> > +             if (!out->args.table.attr.flow_attr.transfer)
> > +                     return -1;
> > +             out->args.table.attr.flow_attr.transfer_mode = 1;
> > +             return len;
> > +     case TABLE_TRANSFER_VF_ORIG:
> > +             if (!out->args.table.attr.flow_attr.transfer)
> > +                     return -1;
> > +             out->args.table.attr.flow_attr.transfer_mode = 2;
> > +             return len;
> >       default:
> >               return -1;
> >       }
> > diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > index 330e34427d..603b7988dd 100644
> > --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > @@ -3332,7 +3332,8 @@ It is bound to ``rte_flow_template_table_create()``::
> >
> >   flow template_table {port_id} create
> >       [table_id {id}] [group {group_id}]
> > -       [priority {level}] [ingress] [egress] [transfer]
> > +       [priority {level}] [ingress] [egress]
> > +       [transfer [vf_orig] [wire_orig]]
> 
> Is it correct? Shouldn't it rather be
> [transfer] [vf_orig] [wire_orig]
> ?
> 
> >       rules_number {number}
> >       pattern_template {pattern_template_id}
> >       actions_template {actions_template_id}
> > diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> > index a79f1e7ef0..512b08d817 100644
> > --- a/lib/ethdev/rte_flow.h
> > +++ b/lib/ethdev/rte_flow.h
> > @@ -130,7 +130,14 @@ struct rte_flow_attr {
> >        * through a suitable port. @see rte_flow_pick_transfer_proxy().
> >        */
> >       uint32_t transfer:1;
> > -     uint32_t reserved:29; /**< Reserved, must be zero. */
> > +     /**
> > +      * 0 means bidirection,
> > +      * 0x1 origin uplink,
> 
> What does "uplink" mean? It's too vague. Hardly a good term.
> 
> > +      * 0x2 origin vport,
> 
> What does "origin vport" mean? Hardly a good term as well.
> 
> > +      * N/A both set.
> 
> What's this?
> 
> > +      */
> > +     uint32_t transfer_mode:2;
> > +     uint32_t reserved:27; /**< Reserved, must be zero. */
> > };
> >
> > /**
> > --
> > 2.27.0
> >
> 
> Since the attributes are added to generic 'struct rte_flow_attr', non-table
> (synchronous) flow rules are supposed to support them, too. If that is indeed
> the case, then I'm afraid such proposal does not agree with the existing items
> PORT_REPRESENTOR and REPRESENTED_PORT. They do exactly the same
> thing, but they are designed to be way more generic. Why not use them?
> 
> Ivan
  
Ivan Malov Sept. 13, 2022, 2:33 p.m. UTC | #4
Hi Rongwei,

PSB

On Tue, 13 Sep 2022, Rongwei Liu wrote:

> Hi
>
> BR
> Rongwei
>
>> -----Original Message-----
>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Sent: Tuesday, September 13, 2022 00:57
>> To: Rongwei Liu <rongweil@nvidia.com>
>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan
>> Darawsheh <rasland@nvidia.com>
>> Subject: Re: [PATCH v1] ethdev: add direction info when creating the transfer
>> table
>>
>>
>> Hi,
>>
>> On Wed, 7 Sep 2022, Rongwei Liu wrote:
>>
>>> The transfer domain rule is able to match traffic wire/vf origin and
>>> it means two directions' underlayer resource.
>>
>> The point of fact is that matching traffic coming from some entity like wire /
>> VF has been long generalised in the form of representors. So, a flow rule with
>> attribute "transfer" is able to match traffic coming from either a
>> REPRESENTED_PORT or from a PORT_REPRESENTOR (please find these items).
>>
>>>
>>> In customer deployments, they usually match only one direction traffic
>>> in single flow table: either from wire or from vf.
>>
>> Which customer deployments? Could you please provide detailed examples?
>>
>>>
>
> We have seen many customer deployments like these:
> 1. Match overlay traffic from the wire, decap it, then send it to a specific vport.
> 2. Match specific 5-tuples, encap the traffic, then send it to the wire.
> The matching criteria have an obvious direction preference.

Thank you. My questions are as follows:

In (1), when you say "from wire", do you mean the need to match
packets arriving via whatever physical ports rather than
matching packets arriving from some specific physical port?

If, however, matching traffic "from wire" in fact means matching
packets arriving from a *specific* physical port, then for sure
item REPRESENTED_PORT should perfectly do the job, and the
proposed attribute is unneeded.

(BTW, in DPDK, it is customary to use term "physical port", not "wire")

In (1), what are "vport"s? Please explain. Once again, I should remind
that, in DPDK, folks prefer terms "represented entity" / "representor"
over vendor-specific terms like "vport", etc.

As for (2), imagine matching 5-tuple traffic emitted by a VF / guest.
Could you please explain, why not just add a match item REPRESENTED_PORT
pointing to that VF via its representor? Doing so should perfectly
define the exact direction / traffic source. Isn't that sufficient?

Also please mind that, although I appreciate your explanations here,
on the mailing list, they should finally be added to the commit
message, so that readers do not have to look for them elsewhere.

>
>>> Introduce one new member transfer_mode into rte_flow_attr to indicate
>>> the flow table direction property: from wire, from vf or
>>> bi-direction(default).
>>
>> AFAIK, 'rte_flow_attr' serves both traditional flow rule insertion and
>> asynchronous (table) approach. The patch adds the attributes to generic
>> 'rte_flow_attr' but, for some reason, ignores non-table rules.
>>
>>>
> The sync API uses one rule to contain everything. It's hard for the PMD to determine whether a rule has a direction preference or not.
> Imagine a situation, just as an example:
> 1. Vport 1: match VxLAN, do decap, send to vport 2. Scale: 1 million rules.
> 2. Vport 0 (wire): match VxLAN, do decap, send to vport 3. Scale: 1 hundred rules.
> 1 and 2 share the same matching conditions (eth / ipv4 / udp / vxlan /...), so the sync API considers them to share the same matching determination logic.
> It means "2" gets the 1M scale capability too. Obviously, that wastes a lot of resources.

Strictly speaking, they do not share the same match pattern.
Your example clearly shows that, in (1), the pattern should
request packets coming from "vport 1" and, in (2), packets
coming from "vport 0".

My point is simple: the "vport" from which packets enter
the embedded switch is ALSO a match criterion. If you
accept this, you'll see: the matching conditions differ.

>
> The async API introduces pattern_template. We can mark "1" to use pattern_template 1 and "2" to use pattern_template 2.
> They will then be separated from each other and no longer share the matching logic.

Consider an example. "Wire" is a physical port represented by PF0 which,
in turn, is attached to DPDK via ethdev 0. "VF" (vport?) is attached to
guest and is represented by a representor ethdev 1 in DPDK.

So, some rules (template 1) are needed to deliver packets from "wire"
to "VF" and also decapsulate them. And some rules (template 2) are
needed to deliver packets in the opposite direction, from "VF"
to "wire" and also encapsulate them.

My question is, what prevents you from adding match item 
REPRESENTED_PORT[ethdev_id=0] to the pattern template 1
and REPRESENTED_PORT[ethdev_id=1] to the pattern template 2?

As I said previously, if you insert such item before eth / ipv4 / etc
to your match pattern, doing so defines an *exact* direction / source.
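
The two pattern templates described here can be sketched as follows. The types are simplified stand-ins that only echo names from rte_flow.h (an item array terminated by an END item, and a REPRESENTED_PORT item whose spec carries an ethdev port ID); a real application would include <rte_flow.h> and use struct rte_flow_item instead. Ethdev IDs 0 and 1 are taken from the example above, and the helper function is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins echoing rte_flow item types; NOT the real DPDK definitions. */
enum item_type_sketch {
	ITEM_END = 0,
	ITEM_REPRESENTED_PORT,
	ITEM_ETH,
	ITEM_IPV4,
	ITEM_UDP,
	ITEM_VXLAN,
};

/* Spec of a REPRESENTED_PORT item: the ethdev whose represented entity sends the traffic. */
struct item_ethdev_sketch {
	uint16_t port_id;
};

struct item_sketch {
	enum item_type_sketch type;
	const void *spec;
};

static const struct item_ethdev_sketch wire_side = { .port_id = 0 };
static const struct item_ethdev_sketch vf_side = { .port_id = 1 };

/* Template 1: traffic entering from the physical port behind ethdev 0. */
static const struct item_sketch pattern_from_wire[] = {
	{ ITEM_REPRESENTED_PORT, &wire_side },
	{ ITEM_ETH, NULL },
	{ ITEM_IPV4, NULL },
	{ ITEM_UDP, NULL },
	{ ITEM_VXLAN, NULL },
	{ ITEM_END, NULL },
};

/* Template 2: traffic entering from the VF behind representor ethdev 1. */
static const struct item_sketch pattern_from_vf[] = {
	{ ITEM_REPRESENTED_PORT, &vf_side },
	{ ITEM_ETH, NULL },
	{ ITEM_IPV4, NULL },
	{ ITEM_UDP, NULL },
	{ ITEM_END, NULL },
};

/* Returns the ethdev ID pinned by the pattern, or -1 if it names no source. */
static int
pattern_source_port(const struct item_sketch *pattern)
{
	for (; pattern->type != ITEM_END; pattern++) {
		if (pattern->type == ITEM_REPRESENTED_PORT && pattern->spec != NULL)
			return ((const struct item_ethdev_sketch *)pattern->spec)->port_id;
	}
	return -1;
}
```

In this form the traffic source is part of the match pattern itself, so the two templates differ even though their protocol items overlap.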

>
>> For example, the diff below adds the attributes to "table" commands in
>> testpmd but does not add them to regular (non-table) commands like "flow
>> create". Why?
>>
>>>
>
> The "table" command limits the pattern_template to a single direction or to both directions, per the user-specified attribute.

As I say above, the same effect can be achieved by adding item
REPRESENTED_PORT to the corresponding pattern template.

> The "rule" command must be tied to one "table_id", so the rule inherits the "table" direction property; there is no need to specify it again.

You might have misunderstood. I am not talking about the "rule" command
coupled with some "table". What I am talking about is regular, NON-async
flow insertion commands.

Please take a look at section "/* Validate/create attributes. */" in
file "app/test-pmd/cmdline_flow.c". When one adds a new flow attribute,
they should reflect it the same way as VC_INGRESS, VC_TRANSFER, etc.

That's it.

But, as I say, I still believe that the new attributes aren't needed.

>
>>> It helps to save underlayer memory also on insertion rate.
>>
>> Which memory? Host memory? NIC memory? Term "underlayer" is vague.
>> I suggest that the commit message be revised to first explain how such
>> memory is spent currently, then explain why this is not optimal and, finally,
>> which way the patch is supposed to improve that. I.e. be more specific.
>>
>>>
>
> For large-scale rule sets, the HW (depending on the implementation) always needs memory to hold the rules' patterns and actions, either from the NIC or from the host.
> The memory footprint depends heavily on the complexity of the user's rules and also differs between NICs.
> A memory saving of about 50% is expected if one direction is cut.

Regardless of this talk, this explanation should probably be present in
the commit description.

>
>>> By default, the transfer domain is bi-direction, and no behavior changes.
>>>
>>> 1. Match wire origin only
>>>  flow template_table 0 create group 0 priority 0 transfer wire_orig...
>>> 2. Match vf origin only
>>>  flow template_table 0 create group 0 priority 0 transfer vf_orig...
>>>
>>> Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
>>> ---
>>> app/test-pmd/cmdline_flow.c                 | 26 +++++++++++++++++++++
>>> doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
>>> lib/ethdev/rte_flow.h                       |  9 ++++++-
>>> 3 files changed, 36 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
>>> index 7f50028eb7..b25b595e82 100644
>>> --- a/app/test-pmd/cmdline_flow.c
>>> +++ b/app/test-pmd/cmdline_flow.c
>>> @@ -177,6 +177,8 @@ enum index {
>>>       TABLE_INGRESS,
>>>       TABLE_EGRESS,
>>>       TABLE_TRANSFER,
>>> +     TABLE_TRANSFER_WIRE_ORIG,
>>> +     TABLE_TRANSFER_VF_ORIG,
>>>       TABLE_RULES_NUMBER,
>>>       TABLE_PATTERN_TEMPLATE,
>>>       TABLE_ACTIONS_TEMPLATE,
>>> @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] = {
>>>       TABLE_INGRESS,
>>>       TABLE_EGRESS,
>>>       TABLE_TRANSFER,
>>> +     TABLE_TRANSFER_WIRE_ORIG,
>>> +     TABLE_TRANSFER_VF_ORIG,
>>>       TABLE_RULES_NUMBER,
>>>       TABLE_PATTERN_TEMPLATE,
>>>       TABLE_ACTIONS_TEMPLATE,
>>> @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
>>>               .next = NEXT(next_table_attr),
>>>               .call = parse_table,
>>>       },
>>> +     [TABLE_TRANSFER_WIRE_ORIG] = {
>>> +             .name = "wire_orig",
>>> +             .help = "affect rule direction to transfer",
>>
>> This does not explain the "wire" aspect. It's too broad.
>>
>>> +             .next = NEXT(next_table_attr),
>>> +             .call = parse_table,
>>> +     },
>>> +     [TABLE_TRANSFER_VF_ORIG] = {
>>> +             .name = "vf_orig",
>>> +             .help = "affect rule direction to transfer",
>>
>> This explanation simply duplicates that of "wire_orig".
>> It does not explain the "vf" part. It should be more specific.
>>
>>> +             .next = NEXT(next_table_attr),
>>> +             .call = parse_table,
>>> +     },
>>>       [TABLE_RULES_NUMBER] = {
>>>               .name = "rules_number",
>>>               .help = "number of rules in table",
>>> @@ -8894,6 +8910,16 @@ parse_table(struct context *ctx, const struct token *token,
>>>       case TABLE_TRANSFER:
>>>               out->args.table.attr.flow_attr.transfer = 1;
>>>               return len;
>>> +     case TABLE_TRANSFER_WIRE_ORIG:
>>> +             if (!out->args.table.attr.flow_attr.transfer)
>>> +                     return -1;
>>> +             out->args.table.attr.flow_attr.transfer_mode = 1;
>>> +             return len;
>>> +     case TABLE_TRANSFER_VF_ORIG:
>>> +             if (!out->args.table.attr.flow_attr.transfer)
>>> +                     return -1;
>>> +             out->args.table.attr.flow_attr.transfer_mode = 2;
>>> +             return len;
>>>       default:
>>>               return -1;
>>>       }
>>> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>> index 330e34427d..603b7988dd 100644
>>> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>> @@ -3332,7 +3332,8 @@ It is bound to ``rte_flow_template_table_create()``::
>>>
>>>   flow template_table {port_id} create
>>>       [table_id {id}] [group {group_id}]
>>> -       [priority {level}] [ingress] [egress] [transfer]
>>> +       [priority {level}] [ingress] [egress]
>>> +       [transfer [vf_orig] [wire_orig]]
>>
>> Is it correct? Shouldn't it rather be
>> [transfer] [vf_orig] [wire_orig]
>> ?
>>
>>>       rules_number {number}
>>>       pattern_template {pattern_template_id}
>>>       actions_template {actions_template_id}
>>> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
>>> index a79f1e7ef0..512b08d817 100644
>>> --- a/lib/ethdev/rte_flow.h
>>> +++ b/lib/ethdev/rte_flow.h
>>> @@ -130,7 +130,14 @@ struct rte_flow_attr {
>>>        * through a suitable port. @see rte_flow_pick_transfer_proxy().
>>>        */
>>>       uint32_t transfer:1;
>>> -     uint32_t reserved:29; /**< Reserved, must be zero. */
>>> +     /**
>>> +      * 0 means bidirection,
>>> +      * 0x1 origin uplink,
>>
>> What does "uplink" mean? It's too vague. Hardly a good term.
>>
>>> +      * 0x2 origin vport,
>>
>> What does "origin vport" mean? Hardly a good term as well.
>>
>>> +      * N/A both set.
>>
>> What's this?
>>
>>> +      */
>>> +     uint32_t transfer_mode:2;
>>> +     uint32_t reserved:27; /**< Reserved, must be zero. */
>>> };
>>>
>>> /**
>>> --
>>> 2.27.0
>>>
>>
>> Since the attributes are added to generic 'struct rte_flow_attr', non-table
>> (synchronous) flow rules are supposed to support them, too. If that is indeed
>> the case, then I'm afraid such proposal does not agree with the existing items
>> PORT_REPRESENTOR and REPRESENTED_PORT. They do exactly the same
>> thing, but they are designed to be way more generic. Why not use them?

The question stands.

>>
>> Ivan
>

Ivan
  
Rongwei Liu Sept. 14, 2022, 5:16 a.m. UTC | #5
HI

BR
Rongwei

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Sent: Tuesday, September 13, 2022 22:33
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan
> Darawsheh <rasland@nvidia.com>
> Subject: RE: [PATCH v1] ethdev: add direction info when creating the transfer
> table
> 
> Hi Rongwei,
> 
> PSB
> 
> On Tue, 13 Sep 2022, Rongwei Liu wrote:
> 
> > Hi
> >
> > BR
> > Rongwei
> >
> >> -----Original Message-----
> >> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> >> Sent: Tuesday, September 13, 2022 00:57
> >> To: Rongwei Liu <rongweil@nvidia.com>
> >> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> >> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> >> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
> >> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> >> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
> >> Raslan Darawsheh <rasland@nvidia.com>
> >> Subject: Re: [PATCH v1] ethdev: add direction info when creating the
> >> transfer table
> >>
> >> Hi,
> >>
> >> On Wed, 7 Sep 2022, Rongwei Liu wrote:
> >>
> >>> The transfer domain rule is able to match traffic wire/vf origin and
> >>> it means two directions' underlayer resource.
> >>
> >> The point of fact is that matching traffic coming from some entity
> >> like wire / VF has been long generalised in the form of representors.
> >> So, a flow rule with attribute "transfer" is able to match traffic
> >> coming from either a REPRESENTED_PORT or from a PORT_REPRESENTOR
> >> (please find these items).
> >>
> >>>
> >>> In customer deployments, they usually match only one direction
> >>> traffic in single flow table: either from wire or from vf.
> >>
> >> Which customer deployments? Could you please provide detailed examples?
> >>
> >>>
> >
> > We saw a lot of customers' deployment like:
> > 1. Match overlay traffic from wire and do decap, then send to specific vport.
> > 2. Match specific 5-tuples and do encap, then send to wire.
> > The matching criteria has obvious direction preference.
> 
> Thank you. My questions are as follows:
> 
> In (1), when you say "from wire", do you mean the need to match packets
> arriving via whatever physical ports rather than matching packets arriving
> from some specific phys. port?
> 
> If, however, matching traffic "from wire" in fact means matching packets
> arriving from a *specific* physical port, then for sure item
> REPRESENTED_PORT should perfectly do the job, and the proposed attribute is
> unneeded.
> 
> (BTW, in DPDK, it is customary to use term "physical port", not "wire")
> 
> In (1), what are "vport"s? Please explain. Once again, I should remind that, in
> DPDK, folks prefer terms "represented entity" / "representor"
> over vendor-specific terms like "vport", etc.
> 
Vport is short for virtual port, such as a VF.
> As for (2), imagine matching 5-tuple traffic emitted by a VF / guest.
> Could you please explain, why not just add a match item REPRESENTED_PORT
> pointing to that VF via its representor? Doing so should perfectly define the
> exact direction / traffic source. Isn't that sufficient?
> 
In my view, there is a difference between the matching field and the matching value.
Take IPv4 src_addr 1.1.1.1, 1.1.1.2 and 1.1.1.3: would you treat them as the same or as different matching criteria?
I would call them the same, since they can be summarized as 1.1.1.0/30.
REPRESENTED_PORT is just another matching item, with no essential difference, and it can't stand for direction info.
Port id depends on the attach sequence.
> Also please mind that, although I appreciate your explanations here, on the
> mailing list, they should finally be added to the commit message, so that
> readers do not have to look for them elsewhere.
> 
We have explained the high possibility of single-direction matching, right?
It's hard to list all the possibilities of traffic matching preferences.
The underlay is the one we have met for now.
> >
> >>> Introduce one new member transfer_mode into rte_flow_attr to
> >>> indicate the flow table direction property: from wire, from vf or
> >>> bi-direction(default).
> >>
> >> AFAIK, 'rte_flow_attr' serves both traditional flow rule insertion
> >> and asynchronous (table) approach. The patch adds the attributes to
> >> generic 'rte_flow_attr' but, for some reason, ignores non-table rules.
> >>
> >>>
> > Sync API uses one rule to contain everything. It's hard for the PMD to
> > determine whether this rule has a direction preference or not.
> > Imagine a situation, just as an example:
> > 1. Vport 1 VxLAN do decap send to vport 2.     1 million scale
> > 2. Vport 0 (wire) VxLAN do decap send to vport 3.   1 hundred scale.
> > 1 and 2 share the same matching conditions (eth / ipv4 / udp / vxlan /...), so
> > the sync API considers them to share matching determination logic.
> > It means "2" has 1M scale capability too. Obviously, it wastes a lot of
> > resources.
> 
> Strictly speaking, they do not share the same match pattern.
> Your example clearly shows that, in (1), the pattern should request packets
> coming from "vport 1" and, in (2), packets coming from "vport 0".
> 
> My point is simple: the "vport" from which packets enter the embedded switch
> is ALSO a match criterion. If you accept this, you'll see: the matching
> conditions differ.
> 
See above.
In this case, I think the matching fields are both "port_id + ipv4_vxlan". They are the same.
Only the values differ, e.g. VNI 100 versus 200.
> >
> > The async API introduces pattern_template. We can mark "1" to use
> > pattern_template id 1 and "2" to use pattern_template id 2.
> > They will be separated from each other and no longer share anything.
> 
> Consider an example. "Wire" is a physical port represented by PF0 which, in
> turn, is attached to DPDK via ethdev 0. "VF" (vport?) is attached to guest and is
> represented by a representor ethdev 1 in DPDK.
> 
> So, some rules (template 1) are needed to deliver packets from "wire"
> to "VF" and also decapsulate them. And some rules (template 2) are needed to
> deliver packets in the opposite direction, from "VF"
> to "wire" and also encapsulate them.
> 
> My question is, what prevents you from adding match item
> REPRESENTED_PORT[ethdev_id=0] to the pattern template 1 and
> REPRESENTED_PORT[ethdev_id=1] to the pattern template 2?
> 
> As I said previously, if you insert such item before eth / ipv4 / etc to your
> match pattern, doing so defines an *exact* direction / source.
> 
Could you check the async API guidance? I think a pattern template focuses on the matching field (mask).
"REPRESENTED_PORT[ethdev_id=0]" and "REPRESENTED_PORT[ethdev_id=1]" are the same as far as the template is concerned.
1. pattern template: REPRESENTED_PORT mask 0xffff ...
2. action template: action1 / action2 /
3. table create with pattern_template plus action template ...
REPRESENTED_PORT[ethdev_id=0] will be rule1: rule create REPRESENTED_PORT port_id is 0 / actions ....
REPRESENTED_PORT[ethdev_id=1] will be rule2: rule create REPRESENTED_PORT port_id is 1 / actions ....
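
The mask/value split described above can be modelled in a few lines of C.
This is only a toy sketch under assumptions: toy_pattern_template,
toy_table_rule and toy_table_rule_matches() are hypothetical names, not
DPDK API, and the real template table logic lives in the PMD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types: they only model the mask/value split. */
struct toy_pattern_template {
	uint16_t port_id_mask; /* fixed when the template is created */
};

struct toy_table_rule {
	const struct toy_pattern_template *tmpl; /* shared by all table rules */
	uint16_t port_id; /* per-rule value, e.g. 0 for rule1, 1 for rule2 */
};

/* Compare the packet's origin port with the rule value under the mask. */
static bool
toy_table_rule_matches(const struct toy_table_rule *rule, uint16_t pkt_port_id)
{
	assert(rule != NULL && rule->tmpl != NULL);
	uint16_t mask = rule->tmpl->port_id_mask;

	return (pkt_port_id & mask) == (rule->port_id & mask);
}
```

In this model rule1 and rule2 point at the same template and differ only in
the port_id value, which is the distinction between "field" and "value"
argued for here.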

> >
> >> For example, the diff below adds the attributes to "table" commands
> >> in testpmd but does not add them to regular (non-table) commands like
> >> "flow create". Why?
> >>
> >>>
> >
> > "table" command limits pattern_template to single direction or bidirection
> > per user specified attribute.
> 
> As I say above, the same effect can be achieved by adding item
> REPRESENTED_PORT to the corresponding pattern template.
See above.
> 
> > "rule" command must be tied to one "table_id", so the rule will inherit the
> > "table" direction property, no need to specify again.
> 
> You might've misunderstood. I do not talk about "rule" command coupled with
> some "table". What I talk about is regular, NON-async flow insertion
> commands.
> 
> Please take a look at section "/* Validate/create attributes. */" in file
> "app/test-pmd/cmdline_flow.c". When one adds a new flow attribute, they
> should reflect it the same way as VC_INGRESS, VC_TRANSFER, etc.
> 
> That's it.
We don't intend to pass this to sync API. The above code example is for sync API.
> 
> But, as I say, I still believe that the new attributes aren't needed.
I think we are not on the same page for now. Can we reach agreement on the same
matching criteria first?
> >
> >>> It helps to save underlayer memory also on insertion rate.
> >>
> >> Which memory? Host memory? NIC memory? Term "underlayer" is vague.
> >> I suggest that the commit message be revised to first explain how
> >> such memory is spent currently, then explain why this is not optimal
> >> and, finally, which way the patch is supposed to improve that. I.e. be more
> >> specific.
> >>
> >>>
> >
> > For large scalable rules, HW (depends on implementation) always needs
> > memory to hold the rules' patterns and actions, either from NIC or from host.
> > The memory footprint highly depends on "user rules' complexity", and also
> > differs between NICs.
> > ~50% memory saving is expected if one-direction is cut.
> 
> Regardless of this talk, this explanation should probably be present in the
> commit description.
>
This number may differ between NICs or implementations. We can't state it for sure.
> >
> >>> By default, the transfer domain is bi-direction, and no behavior changes.
> >>>
> >>> 1. Match wire origin only
> >>>  flow template_table 0 create group 0 priority 0 transfer wire_orig...
> >>> 2. Match vf origin only
> >>>  flow template_table 0 create group 0 priority 0 transfer vf_orig...
> >>>
> >>> Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
> >>> ---
> >>> app/test-pmd/cmdline_flow.c                 | 26 +++++++++++++++++++++
> >>> doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
> >>> lib/ethdev/rte_flow.h                       |  9 ++++++-
> >>> 3 files changed, 36 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
> >>> index 7f50028eb7..b25b595e82 100644
> >>> --- a/app/test-pmd/cmdline_flow.c
> >>> +++ b/app/test-pmd/cmdline_flow.c
> >>> @@ -177,6 +177,8 @@ enum index {
> >>>       TABLE_INGRESS,
> >>>       TABLE_EGRESS,
> >>>       TABLE_TRANSFER,
> >>> +     TABLE_TRANSFER_WIRE_ORIG,
> >>> +     TABLE_TRANSFER_VF_ORIG,
> >>>       TABLE_RULES_NUMBER,
> >>>       TABLE_PATTERN_TEMPLATE,
> >>>       TABLE_ACTIONS_TEMPLATE,
> >>> @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] = {
> >>>       TABLE_INGRESS,
> >>>       TABLE_EGRESS,
> >>>       TABLE_TRANSFER,
> >>> +     TABLE_TRANSFER_WIRE_ORIG,
> >>> +     TABLE_TRANSFER_VF_ORIG,
> >>>       TABLE_RULES_NUMBER,
> >>>       TABLE_PATTERN_TEMPLATE,
> >>>       TABLE_ACTIONS_TEMPLATE,
> >>> @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
> >>>               .next = NEXT(next_table_attr),
> >>>               .call = parse_table,
> >>>       },
> >>> +     [TABLE_TRANSFER_WIRE_ORIG] = {
> >>> +             .name = "wire_orig",
> >>> +             .help = "affect rule direction to transfer",
> >>
> >> This does not explain the "wire" aspect. It's too broad.
> >>
> >>> +             .next = NEXT(next_table_attr),
> >>> +             .call = parse_table,
> >>> +     },
> >>> +     [TABLE_TRANSFER_VF_ORIG] = {
> >>> +             .name = "vf_orig",
> >>> +             .help = "affect rule direction to transfer",
> >>
> >> This explanation simply duplicates such of the "wire_orig".
> >> It does not explain the "vf" part. Should be more specific.
> >>
> >>> +             .next = NEXT(next_table_attr),
> >>> +             .call = parse_table,
> >>> +     },
> >>>       [TABLE_RULES_NUMBER] = {
> >>>               .name = "rules_number",
> >>>               .help = "number of rules in table",
> >>> @@ -8894,6 +8910,16 @@ parse_table(struct context *ctx, const struct token *token,
> >>>       case TABLE_TRANSFER:
> >>>               out->args.table.attr.flow_attr.transfer = 1;
> >>>               return len;
> >>> +     case TABLE_TRANSFER_WIRE_ORIG:
> >>> +             if (!out->args.table.attr.flow_attr.transfer)
> >>> +                     return -1;
> >>> +             out->args.table.attr.flow_attr.transfer_mode = 1;
> >>> +             return len;
> >>> +     case TABLE_TRANSFER_VF_ORIG:
> >>> +             if (!out->args.table.attr.flow_attr.transfer)
> >>> +                     return -1;
> >>> +             out->args.table.attr.flow_attr.transfer_mode = 2;
> >>> +             return len;
> >>>       default:
> >>>               return -1;
> >>>       }
> >>> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>> index 330e34427d..603b7988dd 100644
> >>> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>> @@ -3332,7 +3332,8 @@ It is bound to ``rte_flow_template_table_create()``::
> >>>
> >>>   flow template_table {port_id} create
> >>>       [table_id {id}] [group {group_id}]
> >>> -       [priority {level}] [ingress] [egress] [transfer]
> >>> +       [priority {level}] [ingress] [egress]
> >>> +       [transfer [vf_orig] [wire_orig]]
> >>
> >> Is it correct? Shouldn't it rather be [transfer] [vf_orig]
> >> [wire_orig] ?
> >>
> >>>       rules_number {number}
> >>>       pattern_template {pattern_template_id}
> >>>       actions_template {actions_template_id}
> >>> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> >>> index a79f1e7ef0..512b08d817 100644
> >>> --- a/lib/ethdev/rte_flow.h
> >>> +++ b/lib/ethdev/rte_flow.h
> >>> @@ -130,7 +130,14 @@ struct rte_flow_attr {
> >>>        * through a suitable port. @see rte_flow_pick_transfer_proxy().
> >>>        */
> >>>       uint32_t transfer:1;
> >>> -     uint32_t reserved:29; /**< Reserved, must be zero. */
> >>> +     /**
> >>> +      * 0 means bidirection,
> >>> +      * 0x1 origin uplink,
> >>
> >> What does "uplink" mean? It's too vague. Hardly a good term.
> >>
> >>> +      * 0x2 origin vport,
> >>
> >> What does "origin vport" mean? Hardly a good term as well.
> >>
> >>> +      * N/A both set.
> >>
> >> What's this?
> >>
> >>> +      */
> >>> +     uint32_t transfer_mode:2;
> >>> +     uint32_t reserved:27; /**< Reserved, must be zero. */
> >>> };
> >>>
> >>> /**
> >>> --
> >>> 2.27.0
> >>>
> >>
> >> Since the attributes are added to generic 'struct rte_flow_attr',
> >> non-table
> >> (synchronous) flow rules are supposed to support them, too. If that
> >> is indeed the case, then I'm afraid such proposal does not agree with
> >> the existing items PORT_REPRESENTOR and REPRESENTED_PORT. They do
> >> exactly the same thing, but they are designed to be way more generic. Why
> >> not use them?
> 
> The question stands.
> 
> >>
> >> Ivan
> >
> 
> Ivan
  
Ivan Malov Sept. 14, 2022, 7:32 a.m. UTC | #6
Hi,

On Wed, 14 Sep 2022, Rongwei Liu wrote:

> HI
>
> BR
> Rongwei
>
>> -----Original Message-----
>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Sent: Tuesday, September 13, 2022 22:33
>> To: Rongwei Liu <rongweil@nvidia.com>
>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan
>> Darawsheh <rasland@nvidia.com>
>> Subject: RE: [PATCH v1] ethdev: add direction info when creating the transfer
>> table
>>
>> Hi Rongwei,
>>
>> PSB
>>
>> On Tue, 13 Sep 2022, Rongwei Liu wrote:
>>
>>> Hi
>>>
>>> BR
>>> Rongwei
>>>
>>>> -----Original Message-----
>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>> Sent: Tuesday, September 13, 2022 00:57
>>>> To: Rongwei Liu <rongweil@nvidia.com>
>>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>>>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>>>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
>>>> Raslan Darawsheh <rasland@nvidia.com>
>>>> Subject: Re: [PATCH v1] ethdev: add direction info when creating the
>>>> transfer table
>>>>
>>>> Hi,
>>>>
>>>> On Wed, 7 Sep 2022, Rongwei Liu wrote:
>>>>
>>>>> The transfer domain rule is able to match traffic wire/vf origin and
>>>>> it means two directions' underlayer resource.
>>>>
>>>> The point of fact is that matching traffic coming from some entity
>>>> like wire / VF has been long generalised in the form of representors.
>>>> So, a flow rule with attribute "transfer" is able to match traffic
>>>> coming from either a REPRESENTED_PORT or from a PORT_REPRESENTOR
>>>> (please find these items).
>>>>
>>>>>
>>>>> In customer deployments, they usually match only one direction
>>>>> traffic in single flow table: either from wire or from vf.
>>>>
>>>> Which customer deployments? Could you please provide detailed examples?
>>>>
>>>>>
>>>
>>> We saw a lot of customers' deployment like:
>>> 1. Match overlay traffic from wire and do decap, then send to specific vport.
>>> 2. Match specific 5-tuples and do encap, then send to wire.
>>> The matching criteria has obvious direction preference.
>>
>> Thank you. My questions are as follows:
>>
>> In (1), when you say "from wire", do you mean the need to match packets
>> arriving via whatever physical ports rather than matching packets arriving
>> from some specific phys. port?

^^

Could you please find my question above? Based on your understanding
of templates in async flow approach, an answer to this question may
help us find the common ground.

--

>>
>> If, however, matching traffic "from wire" in fact means matching packets
>> arriving from a *specific* physical port, then for sure item
>> REPRESENTED_PORT should perfectly do the job, and the proposed attribute is
>> unneeded.
>>
>> (BTW, in DPDK, it is customary to use term "physical port", not "wire")
>>
>> In (1), what are "vport"s? Please explain. Once again, I should remind that, in
>> DPDK, folks prefer terms "represented entity" / "representor"
>> over vendor-specific terms like "vport", etc.
>>
> Vport is short for virtual port, such as a VF.

Thanks. As I say, term "vport" might be confusing to some readers,
so it'd be better to provide this explanation (about VF)
in the commit description next time.

>> As for (2), imagine matching 5-tuple traffic emitted by a VF / guest.
>> Could you please explain, why not just add a match item REPRESENTED_PORT
>> pointing to that VF via its representor? Doing so should perfectly define the
>> exact direction / traffic source. Isn't that sufficient?
>>
> In my view, there is a difference between the matching field and the matching value.
> Take IPv4 src_addr 1.1.1.1, 1.1.1.2 and 1.1.1.3: would you treat them as the same or as different matching criteria?
> I would call them the same, since they can be summarized as 1.1.1.0/30.
> REPRESENTED_PORT is just another matching item, with no essential difference, and it can't stand for direction info.

It looks like we're starting to run into disagreement here.
There's no "direction" at all. There's an embedded switch
inside the NIC, and there're (logical) switch ports that
packets enter the switch from.

When the user submits a "transfer" rule and provides
neither REPRESENTED_PORT nor PORT_REPRESENTOR in the pattern,
the embedded switch is supposed to match packets coming from
ANY ports, be it VFs or physical (wire) ports.

But when the user provides, for example, item REPRESENTED_PORT
to point to the physical (wire) port, the embedded switch
knows exactly which port the packets should enter it from.
In this case, it is supposed to match only packets coming
from that physical port. And this should be sufficient.
This in fact replaces the need to know a "direction".
It's just an exact specification of packet's origin.
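
A toy C model of this behaviour might look as follows. It is only a sketch
under assumptions: toy_transfer_rule, TOY_ANY_PORT and toy_origin_matches()
are made-up names, and a real PMD translates item REPRESENTED_PORT into
hardware-specific match fields instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker: the pattern carries no origin item at all. */
#define TOY_ANY_PORT UINT16_MAX

struct toy_transfer_rule {
	/* Switch port named by the origin item, or TOY_ANY_PORT. */
	uint16_t origin_port;
};

/* Decide whether a packet entering via ingress_port is eligible. */
static bool
toy_origin_matches(const struct toy_transfer_rule *rule, uint16_t ingress_port)
{
	assert(rule != NULL);
	if (rule->origin_port == TOY_ANY_PORT)
		return true; /* no origin item: packets from any port match */
	return rule->origin_port == ingress_port; /* exact origin only */
}
```

A rule without an origin item accepts packets from every switch port, while
a rule naming one port accepts packets from that port alone, so no separate
notion of "direction" is needed in this model.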

> Port id depends on the attach sequence.

Unfortunately, this is hardly a good argument because flow rules
are supposed to be inserted based on the run-time packet
learning. Attach sequence is a don't care here.

>> Also please mind that, although I appreciate your explanations here, on the
>> mailing list, they should finally be added to the commit message, so that
>> readers do not have to look for them elsewhere.
>>
> We have explained the high possibility of single-direction matching, right?

Not quite. As I said, it is not correct to assume any "direction", like in
geographical sense ("north", "south", etc.). Application has ethdevs, and
they are representors of some "virtual ports" (in your terminology)
belonging to the switch, for example, VFs, SFs or physical ports.

The user adds an appropriate item to the pattern (REPRESENTED_PORT),
and doing so specifies the path by which the packet enters the switch.

> It's hard to list all the possibilities of traffic matching preferences.

And let's say more: one need never do this. That's exactly the reason
why DPDK has abandoned the concept of "direction" in *transfer* rules
and switched to the use of precise criteria (REPRESENTED_PORT, etc.).

> The underlay is the one we have met for now.
>>>
>>>>> Introduce one new member transfer_mode into rte_flow_attr to
>>>>> indicate the flow table direction property: from wire, from vf or
>>>>> bi-direction(default).
>>>>
>>>> AFAIK, 'rte_flow_attr' serves both traditional flow rule insertion
>>>> and asynchronous (table) approach. The patch adds the attributes to
>>>> generic 'rte_flow_attr' but, for some reason, ignores non-table rules.
>>>>
>>>>>
>>> Sync API uses one rule to contain everything. It's hard for the PMD to
>>> determine whether this rule has a direction preference or not.
>>> Imagine a situation, just as an example:
>>> 1. Vport 1 VxLAN do decap send to vport 2.     1 million scale
>>> 2. Vport 0 (wire) VxLAN do decap send to vport 3.   1 hundred scale.
>>> 1 and 2 share the same matching conditions (eth / ipv4 / udp / vxlan /...), so
>>> the sync API considers them to share matching determination logic.
>>> It means "2" has 1M scale capability too. Obviously, it wastes a lot of
>>> resources.
>>
>> Strictly speaking, they do not share the same match pattern.
>> Your example clearly shows that, in (1), the pattern should request packets
>> coming from "vport 1" and, in (2), packets coming from "vport 0".
>>
>> My point is simple: the "vport" from which packets enter the embedded switch
>> is ALSO a match criterion. If you accept this, you'll see: the matching
>> conditions differ.
>>
> See above.
> In this case, I think the matching fields are both "port_id + ipv4_vxlan". They are the same.
> Only the values differ, e.g. VNI 100 versus 200.

Not quite. Look closer: you use *different* port IDs for (1) and (2).
The value of "ethdev_id" field in item REPRESENTED_PORT differs.

>>>
>>> The async API introduces pattern_template. We can mark "1" to use
>>> pattern_template id 1 and "2" to use pattern_template id 2.
>>> They will be separated from each other and no longer share anything.
>>
>> Consider an example. "Wire" is a physical port represented by PF0 which, in
>> turn, is attached to DPDK via ethdev 0. "VF" (vport?) is attached to guest and is
>> represented by a representor ethdev 1 in DPDK.
>>
>> So, some rules (template 1) are needed to deliver packets from "wire"
>> to "VF" and also decapsulate them. And some rules (template 2) are needed to
>> deliver packets in the opposite direction, from "VF"
>> to "wire" and also encapsulate them.
>>
>> My question is, what prevents you from adding match item
>> REPRESENTED_PORT[ethdev_id=0] to the pattern template 1 and
>> REPRESENTED_PORT[ethdev_id=1] to the pattern template 2?
>>
>> As I said previously, if you insert such item before eth / ipv4 / etc to your
>> match pattern, doing so defines an *exact* direction / source.
>>
> Could you check the async API guidance? I think a pattern template focuses on the matching field (mask).
> "REPRESENTED_PORT[ethdev_id=0]" and "REPRESENTED_PORT[ethdev_id=1]" are the same as far as the template is concerned.
> 1. pattern template: REPRESENTED_PORT mask 0xffff ...
> 2. action template: action1 / action2 /
> 3. table create with pattern_template plus action template ...
> REPRESENTED_PORT[ethdev_id=0] will be rule1: rule create REPRESENTED_PORT port_id is 0 / actions ....
> REPRESENTED_PORT[ethdev_id=1] will be rule2: rule create REPRESENTED_PORT port_id is 1 / actions ....

OK, so, based on this explanation, it appears that
you might be looking to refer to:
a) a *set* of any physical (wire) ports
b) a *set* of any guest ports (VFs)

You chose to achieve this using an attribute, but:

1) as I explained above, the use of term "direction" is wrong;
    please hear me out: I'm not saying that your use case and
    your optimisation is wrong: I'm saying that naming for it
    is wrong: it has nothing to do with "direction";

2) while naming a *set* of wire ports as "wire_orig" might be OK,
    sticking with term "vf_orig" for a *set* of guest ports is
    clearly not, simply because the user may pass another PF
    to a guest instead of passing a VF; in other words,
    a better term is needed here;

3) since it is possible to plug multiple NICs to a DPDK application,
    even from different vendors, the user may end up having multiple
    physical ports belonging to different physical NICs attached to
    the application; if this is the case, then referring to a *set*
    of wire ports using the new attribute is ambiguous in the
    sense that it's unclear whether this applies only to
    wire ports of some specific physical NIC or to the
    physical ports of *all* NICs managed by the app;

4) adding an attribute instead of yet another pattern item type
    is not quite good because PMDs need to be updated separately
    to detect this attribute and throw an error if it's not
    supported, whilst with a new item type, the PMDs do not
    need to be updated = if a PMD sees an unsupported item
    while traversing the items with switch () { case }, it
    will anyway throw an error;

5) as in (4), a new attribute is not good from documentation
    standpoint; please search for "represented_port = Y" in
    documentation = this way, all supported items are
    easily defined for various NIC vendors, but the
    same isn't true for attributes = there is no
    way to indicate supported attributes in doc.

If points (1 - 5) make sense to you, then, if I may be so bold,
I'd like to suggest that the idea of adding a new attribute be
abandoned. Instead, I'd like to suggest adding new items:

(the names are just a sketch; for sure, they should be discussed)

ANY_PHY_PORTS { switch_domain_id }
  = match packets entering the embedded switch from *whatever*
    physical ports belonging to the given switch domain

ANY_GUEST_PORTS { switch_domain_id }
  = match packets entering the embedded switch from *whatever*
    guest ports (VFs, PFs, etc.) belonging to the given
    switch domain

The field "switch_domain_id" is required to tell one physical
board / vendor from another (as I explained in point (3)).
The application can query this parameter from ethdev's
switch info: please see "struct rte_eth_switch_info".
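
To make the suggestion more tangible, here is a toy C sketch of how such
items could be evaluated. Everything in it is an assumption made for
illustration: the toy_* types and toy_any_ports_match() are not DPDK API,
and the 16-bit width of the domain id is a guess for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a logical switch port. */
enum toy_port_kind { TOY_PORT_PHY, TOY_PORT_GUEST };

struct toy_switch_port {
	uint16_t switch_domain_id; /* which embedded switch owns the port */
	enum toy_port_kind kind;
};

/* Hypothetical counterparts of the two proposed items. */
enum toy_item_type { TOY_ITEM_ANY_PHY_PORTS, TOY_ITEM_ANY_GUEST_PORTS };

struct toy_item_any_ports {
	enum toy_item_type type;
	uint16_t switch_domain_id;
};

/* Match a packet's origin port against an "any ports" style item. */
static bool
toy_any_ports_match(const struct toy_item_any_ports *item,
		    const struct toy_switch_port *origin)
{
	assert(item != NULL && origin != NULL);
	/* Ports of another board or vendor never match. */
	if (item->switch_domain_id != origin->switch_domain_id)
		return false;
	switch (item->type) {
	case TOY_ITEM_ANY_PHY_PORTS:
		return origin->kind == TOY_PORT_PHY;
	case TOY_ITEM_ANY_GUEST_PORTS:
		return origin->kind == TOY_PORT_GUEST;
	}
	return false;
}
```

The domain check comes first on purpose: it is what removes the ambiguity
described in point (3) when several NICs are attached to one application.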

What's your opinion?

>
>>>
>>>> For example, the diff below adds the attributes to "table" commands
>>>> in testpmd but does not add them to regular (non-table) commands like
>>>> "flow create". Why?
>>>>
>>>>>
>>>
>>> "table" command limits pattern_template to single direction or bidirection
>>> per user specified attribute.
>>
>> As I say above, the same effect can be achieved by adding item
>> REPRESENTED_PORT to the corresponding pattern template.
> See above.
>>
>>> "rule" command must be tied to one "table_id", so the rule will inherit the
>>> "table" direction property, no need to specify again.
>>
>> You might've misunderstood. I do not talk about "rule" command coupled with
>> some "table". What I talk about is regular, NON-async flow insertion
>> commands.
>>
>> Please take a look at section "/* Validate/create attributes. */" in file
>> "app/test-pmd/cmdline_flow.c". When one adds a new flow attribute, they
>> should reflect it the same way as VC_INGRESS, VC_TRANSFER, etc.
>>
>> That's it.
> We don't intend to pass this to sync API. The above code example is for sync API.

So I understand. But there's one slight problem: in your patch, you add
the new attributes to the structure which is *shared* between sync and
async use case scenarios. If one adds an attribute to this structure,
they have to provide accessors for it in all sync-related commands
in testpmd, but your patch does not do that.

In other words, it is wrong to assume that "struct rte_flow_attr" only
applies to async approach. It had been introduced long before the
async flow design was added to DPDK. That's it.
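
As a rough illustration of why the shared structure matters, the sketch
below mirrors the bit-field layout from the patch in a stand-alone toy
struct and adds a hypothetical validity check. toy_flow_attr and
toy_attr_validate() are invented names; the rules enforced (mode 3 is
invalid, a mode requires "transfer") are read off the patch comments and
the testpmd parser, not taken from any existing DPDK helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy copy of the attribute layout proposed in the patch. */
struct toy_flow_attr {
	uint32_t group;
	uint32_t priority;
	uint32_t ingress:1;
	uint32_t egress:1;
	uint32_t transfer:1;
	uint32_t transfer_mode:2; /* 0: both, 1: wire origin, 2: VF origin */
	uint32_t reserved:27;     /* must be zero */
};

/* Return 0 if the attributes are consistent, -1 otherwise. */
static int
toy_attr_validate(const struct toy_flow_attr *attr)
{
	assert(attr != NULL);
	if (attr->reserved != 0)
		return -1;
	if (attr->transfer_mode == 3)
		return -1; /* "both set" is documented as N/A */
	if (attr->transfer_mode != 0 && !attr->transfer)
		return -1; /* a mode only makes sense with transfer */
	return 0;
}
```

Because sync and async callers would both fill in the same structure, a
check of this kind would be needed on both paths, not only in the
template table commands.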

>>
>> But, as I say, I still believe that the new attributes aren't needed.
> I think we are not on the same page for now. Can we reach agreement on the same
> matching criteria first?
>>>
>>>>> It helps to save underlayer memory also on insertion rate.
>>>>
>>>> Which memory? Host memory? NIC memory? Term "underlayer" is vague.
>>>> I suggest that the commit message be revised to first explain how
>>>> such memory is spent currently, then explain why this is not optimal
>>>> and, finally, which way the patch is supposed to improve that. I.e. be more
>>>> specific.
>>>>
>>>>>
>>>
>>> For large scalable rules, HW (depends on implementation) always needs
>>> memory to hold the rules' patterns and actions, either from NIC or from host.
>>> The memory footprint highly depends on "user rules' complexity", and also
>>> differs between NICs.
>>> ~50% memory saving is expected if one-direction is cut.
>>
>> Regardless of this talk, this explanation should probably be present in the
>> commit description.
>>
> This number may differ with different NICs or implementation. We can't say it for sure.

Not an exact number, of course, but a brief explanation of:
a) what is wrong / not optimal in the current design;
b) how it is observed in customer deployments;
c) why the proposed patch is a good solution.

>>>
>>>>> By default, the transfer domain is bi-direction, and no behavior changes.
>>>>>
>>>>> 1. Match wire origin only
>>>>>  flow template_table 0 create group 0 priority 0 transfer wire_orig...
>>>>> 2. Match vf origin only
>>>>>  flow template_table 0 create group 0 priority 0 transfer vf_orig...
>>>>>
>>>>> Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
>>>>> ---
>>>>> app/test-pmd/cmdline_flow.c                 | 26 +++++++++++++++++++++
>>>>> doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
>>>>> lib/ethdev/rte_flow.h                       |  9 ++++++-
>>>>> 3 files changed, 36 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/app/test-pmd/cmdline_flow.c
>>>>> b/app/test-pmd/cmdline_flow.c index 7f50028eb7..b25b595e82 100644
>>>>> --- a/app/test-pmd/cmdline_flow.c
>>>>> +++ b/app/test-pmd/cmdline_flow.c
>>>>> @@ -177,6 +177,8 @@ enum index {
>>>>>       TABLE_INGRESS,
>>>>>       TABLE_EGRESS,
>>>>>       TABLE_TRANSFER,
>>>>> +     TABLE_TRANSFER_WIRE_ORIG,
>>>>> +     TABLE_TRANSFER_VF_ORIG,
>>>>>       TABLE_RULES_NUMBER,
>>>>>       TABLE_PATTERN_TEMPLATE,
>>>>>       TABLE_ACTIONS_TEMPLATE,
>>>>> @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] = {
>>>>>       TABLE_INGRESS,
>>>>>       TABLE_EGRESS,
>>>>>       TABLE_TRANSFER,
>>>>> +     TABLE_TRANSFER_WIRE_ORIG,
>>>>> +     TABLE_TRANSFER_VF_ORIG,
>>>>>       TABLE_RULES_NUMBER,
>>>>>       TABLE_PATTERN_TEMPLATE,
>>>>>       TABLE_ACTIONS_TEMPLATE,
>>>>> @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
>>>>>               .next = NEXT(next_table_attr),
>>>>>               .call = parse_table,
>>>>>       },
>>>>> +     [TABLE_TRANSFER_WIRE_ORIG] = {
>>>>> +             .name = "wire_orig",
>>>>> +             .help = "affect rule direction to transfer",
>>>>
>>>> This does not explain the "wire" aspect. It's too broad.
>>>>
>>>>> +             .next = NEXT(next_table_attr),
>>>>> +             .call = parse_table,
>>>>> +     },
>>>>> +     [TABLE_TRANSFER_VF_ORIG] = {
>>>>> +             .name = "vf_orig",
>>>>> +             .help = "affect rule direction to transfer",
>>>>
>>>> This explanation simply duplicates such of the "wire_orig".
>>>> It does not explain the "vf" part. Should be more specific.
>>>>
>>>>> +             .next = NEXT(next_table_attr),
>>>>> +             .call = parse_table,
>>>>> +     },
>>>>>       [TABLE_RULES_NUMBER] = {
>>>>>               .name = "rules_number",
>>>>>               .help = "number of rules in table", @@ -8894,6
>>>>> +8910,16 @@ parse_table(struct context *ctx, const struct token *token,
>>>>>       case TABLE_TRANSFER:
>>>>>               out->args.table.attr.flow_attr.transfer = 1;
>>>>>               return len;
>>>>> +     case TABLE_TRANSFER_WIRE_ORIG:
>>>>> +             if (!out->args.table.attr.flow_attr.transfer)
>>>>> +                     return -1;
>>>>> +             out->args.table.attr.flow_attr.transfer_mode = 1;
>>>>> +             return len;
>>>>> +     case TABLE_TRANSFER_VF_ORIG:
>>>>> +             if (!out->args.table.attr.flow_attr.transfer)
>>>>> +                     return -1;
>>>>> +             out->args.table.attr.flow_attr.transfer_mode = 2;
>>>>> +             return len;
>>>>>       default:
>>>>>               return -1;
>>>>>       }
>>>>> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>> index 330e34427d..603b7988dd 100644
>>>>> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>> @@ -3332,7 +3332,8 @@ It is bound to
>>>> ``rte_flow_template_table_create()``::
>>>>>
>>>>>   flow template_table {port_id} create
>>>>>       [table_id {id}] [group {group_id}]
>>>>> -       [priority {level}] [ingress] [egress] [transfer]
>>>>> +       [priority {level}] [ingress] [egress]
>>>>> +       [transfer [vf_orig] [wire_orig]]
>>>>
>>>> Is it correct? Shouldn't it rather be [transfer] [vf_orig]
>>>> [wire_orig] ?
>>>>
>>>>>       rules_number {number}
>>>>>       pattern_template {pattern_template_id}
>>>>>       actions_template {actions_template_id} diff --git
>>>>> a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
>>>>> a79f1e7ef0..512b08d817 100644
>>>>> --- a/lib/ethdev/rte_flow.h
>>>>> +++ b/lib/ethdev/rte_flow.h
>>>>> @@ -130,7 +130,14 @@ struct rte_flow_attr {
>>>>>        * through a suitable port. @see rte_flow_pick_transfer_proxy().
>>>>>        */
>>>>>       uint32_t transfer:1;
>>>>> -     uint32_t reserved:29; /**< Reserved, must be zero. */
>>>>> +     /**
>>>>> +      * 0 means bidirection,
>>>>> +      * 0x1 origin uplink,
>>>>
>>>> What does "uplink" mean? It's too vague. Hardly a good term.

I believe this comment should be reworked, in case
the idea of having an extra attribute persists.

>>>>
>>>>> +      * 0x2 origin vport,
>>>>
>>>> What does "origin vport" mean? Hardly a good term as well.

I still believe this explanation is way too brief and needs
to be reworked to provide more details, to define the
use case for the attribute more specifically.

>>>>
>>>>> +      * N/A both set.
>>>>
>>>> What's this?

The question stands.

>>>>
>>>>> +      */
>>>>> +     uint32_t transfer_mode:2;
>>>>> +     uint32_t reserved:27; /**< Reserved, must be zero. */
>>>>> };
>>>>>
>>>>> /**
>>>>> --
>>>>> 2.27.0
>>>>>
>>>>
>>>> Since the attributes are added to generic 'struct rte_flow_attr',
>>>> non-table
>>>> (synchronous) flow rules are supposed to support them, too. If that
>>>> is indeed the case, then I'm afraid such proposal does not agree with
>>>> the existing items PORT_REPRESENTOR and REPRESENTED_PORT. They do
>>>> exactly the same thing, but they are designed to be way more generic. Why
>> not use them?
>>
>> The question stands.
>>
>>>>
>>>> Ivan
>>>
>>
>> Ivan
>
  
Rongwei Liu Sept. 14, 2022, 10:17 a.m. UTC | #7
HI

BR
Rongwei

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Sent: Wednesday, September 14, 2022 15:32
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan
> Darawsheh <rasland@nvidia.com>
> Subject: RE: [PATCH v1] ethdev: add direction info when creating the transfer
> table
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi,
> 
> On Wed, 14 Sep 2022, Rongwei Liu wrote:
> 
> > HI
> >
> > BR
> > Rongwei
> >
> >> -----Original Message-----
> >> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> >> Sent: Tuesday, September 13, 2022 22:33
> >> To: Rongwei Liu <rongweil@nvidia.com>
> >> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> >> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> >> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
> >> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> >> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
> >> Raslan Darawsheh <rasland@nvidia.com>
> >> Subject: RE: [PATCH v1] ethdev: add direction info when creating the
> >> transfer table
> >>
> >> External email: Use caution opening links or attachments
> >>
> >>
> >> Hi Rongwei,
> >>
> >> PSB
> >>
> >> On Tue, 13 Sep 2022, Rongwei Liu wrote:
> >>
> >>> Hi
> >>>
> >>> BR
> >>> Rongwei
> >>>
> >>>> -----Original Message-----
> >>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> >>>> Sent: Tuesday, September 13, 2022 00:57
> >>>> To: Rongwei Liu <rongweil@nvidia.com>
> >>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> >>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> >>>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
> >>>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> >>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
> >>>> Raslan Darawsheh <rasland@nvidia.com>
> >>>> Subject: Re: [PATCH v1] ethdev: add direction info when creating
> >>>> the transfer table
> >>>>
> >>>> External email: Use caution opening links or attachments
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> On Wed, 7 Sep 2022, Rongwei Liu wrote:
> >>>>
> >>>>> The transfer domain rule is able to match traffic wire/vf origin
> >>>>> and it means two directions' underlayer resource.
> >>>>
> >>>> The point of fact is that matching traffic coming from some entity
> >>>> like wire / VF has been long generalised in the form of representors.
> >>>> So, a flow rule with attribute "transfer" is able to match traffic
> >>>> coming from either a REPRESENTED_PORT or from a
> PORT_REPRESENTOR
> >> (please find these items).
> >>>>
> >>>>>
> >>>>> In customer deployments, they usually match only one direction
> >>>>> traffic in single flow table: either from wire or from vf.
> >>>>
> >>>> Which customer deployments? Could you please provide detailed
> examples?
> >>>>
> >>>>>
> >>>
> >>> We saw a lot of customers' deployment like:
> >>> 1. Match overlay traffic from wire and do decap, then send to specific
> vport.
> >>> 2. Match specific 5-tuples and do encap, then send to wire.
> >>> The matching criteria has obvious direction preference.
> >>
> >> Thank you. My questions are as follows:
> >>
> >> In (1), when you say "from wire", do you mean the need to match
> >> packets arriving via whatever physical ports rather then matching
> >> packets arriving from some specific phys. port?
> 
> ^^
> 
> Could you please find my question above? Based on your understanding of
> templates in async flow approach, an answer to this question may help us find
> the common ground.
It means traffic arriving from physical ports (the transfer_proxy role), or "southbound" in your terms.
Traffic from a vport (not the transfer_proxy), or "northbound" in your terms, won't hit the rule even if the packets are identical.
> 
> --
> 
> >>
> >> If, however, matching traffic "from wire" in fact means matching
> >> packets arriving from a *specific* physical port, then for sure item
> >> REPRESENTED_PORT should perfectly do the job, and the proposed
> >> attribute is unneeded.
> >>
> >> (BTW, in DPDK, it is customary to use term "physical port", not
> >> "wire")
> >>
> >> In (1), what are "vport"s? Please explain. Once again, I should
> >> remind that, in DPDK, folks prefer terms "represented entity" /
> "representor"
> >> over vendor-specific terms like "vport", etc.
> >>
> > Vport is virtual port for short such as VF.
> 
> Thanks. As I say, term "vport" might be confusing to some readers, so it'd be
> better to provide this explanation (about VF) in the commit description next
> time.
Ack. Will add VF as an example.
> 
> >> As for (2), imagine matching 5-tuple traffic emitted by a VF / guest.
> >> Could you please explain, why not just add a match item
> >> REPRESENTED_PORT pointing to that VF via its representor? Doing so
> >> should perfectly define the exact direction / traffic source. Isn't that
> sufficient?
> >>
> > Per my view, there is matching field and matching value difference.
> > Like IPv4 src_addr 1.1.1.1, 1.1.1.2. 1.1.1.3, will you treat it as same or
> different matching criteria?
> > I would like to call them same since it can be summarized like
> > 1.1.1.0/30 REPRESENTED_PORT is just another matching item, no essential
> differences and it can't stand for direction info.
> 
> It looks like we're starting to run into disagreement here.
> There's no "direction" at all. There's an embedded switch inside the NIC, and
> there're (logical) switch ports that packets enter the switch from.
> 
> When the user submits a "transfer" rule and does not provide neither
> REPRESENTED_PORT nor PORT_REPRESENTOR in the pattern, the embedded
> switch is supposed to match packets coming from ANY ports, be it VFs or
> physical (wire) ports.
> 
> But when the user provides, in example, item REPRESENTED_PORT to point to
> the physical (wire) port, the embedded switch knows exactly which port the
> packets should enter it from.
> In this case, it is supposed to match only packets coming from that physical
> port. And this should be sufficient.
> This in fact replaces the need to know a "direction".
> It's just an exact specification of packet's origin.
> 
There is traffic arriving at or leaving the switch, so there is always a direction, implicit or explicit.
For transfer rules, there is the concept of a transfer_proxy.
It takes ownership of the switch; all switch rules should be configured via the transfer_proxy.

Imagine a logical switch with one PF and two VFs.
The PF is the transfer proxy, and the VFs logically belong to the PF.
When traffic is received from the PF, we can say it comes into the logical switch.
When a packet is sent from a VF (which belongs to the PF), we can say the traffic leaves the switch.

Item REPRESENTED_PORT tells the switch which port the matched traffic was sent from, whether it comes into or leaves the switch.
We can regard it as one kind of packet metadata.
As you said, DPDK always treats transfer rules as matching traffic from any port.
When REPRESENTED_PORT is specified, the rules are limited to certain dedicated ports.
Other ports are ignored because the metadata does not match.
The rules still have the capability to match ANY port if the metadata matches.

This update will allow the user to cut the matching capability for the other ports.
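
To make the proposed semantics concrete, here is a minimal, self-contained C sketch of the attribute as it appears in the quoted diff. The names flow_attr_sketch, flow_attr_sketch_check and the MODE_* constants are hypothetical stand-ins, not DPDK API. Only the bit layout (transfer_mode:2, reserved:27) and the "transfer must be set first" check come from the patch; rejecting the value 3 is an assumption based on the "N/A both set" comment.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rte_flow_attr with the proposed field. */
struct flow_attr_sketch {
	uint32_t group;
	uint32_t priority;
	uint32_t ingress:1;
	uint32_t egress:1;
	uint32_t transfer:1;
	uint32_t transfer_mode:2; /* 0 = any origin, 1 = wire, 2 = VF */
	uint32_t reserved:27;     /* must be zero */
};

/* Illustrative names for the values used by parse_table() in the patch. */
enum {
	MODE_ANY_ORIG = 0,
	MODE_WIRE_ORIG = 1,
	MODE_VF_ORIG = 2,
};

/* Return 0 if the attribute combination is consistent, -1 otherwise. */
static int
flow_attr_sketch_check(const struct flow_attr_sketch *attr)
{
	if (attr->reserved != 0)
		return -1;
	/* Assumption: "both set" (value 3) is not a valid mode. */
	if (attr->transfer_mode == 3)
		return -1;
	/* As in the patch, an origin restriction requires "transfer". */
	if (attr->transfer_mode != MODE_ANY_ORIG && !attr->transfer)
		return -1;
	return 0;
}
```

With this model, "transfer wire_orig" passes the check, while "wire_orig" without "transfer" is rejected, which mirrors the `return -1` paths in the testpmd parser change.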
> > Port id depends on the attach sequence.
> 
> Unfortunately, this is hardly a good argument because flow rules are supposed
> to be inserted based on the run-time packet learning. Attach sequence is a
> don't care here.
> 
> >> Also please mind that, although I appreciate your explanations here,
> >> on the mailing list, they should finally be added to the commit
> >> message, so that readers do not have to look for them elsewhere.
> >>
> > We have explained the high possibility of single-direction matching, right?
> 
> Not quite. As I said, it is not correct to assume any "direction", like in
> geographical sense ("north", "south", etc.). Application has ethdevs, and they
> are representors of some "virtual ports" (in your terminology) belonging to the
> switch, for example, VFs, SFs or physical ports.
> 
> The user adds an appropriate item to the pattern (REPRESENTED_PORT), and
> doing so specifies the packet path which it enters the switch.
> 
> > It' hard to list all the possibilities of traffic matching preferences.
> 
> And let's say more: one need never do this. That's exactly the reason why
> DPDK has abandoned the concept of "direction" in *transfer* rules and
> switched to the use of precise criteria (REPRESENTED_PORT, etc.).
> 
As far as I know, DPDK changed "transfer ingress" to "transfer", so it's clearer that transfer can match both directions (both ingress and egress).
REPRESENTED_PORT is the evolution of "port_id", I think; it's only one kind of matching item.

For large-scale deployments like 10M rules, if we can save resources significantly by introducing direction, why not?

Again, async API:
1. pattern template A
2. action template B
3. table C with pattern template A + action template B.
4. rule D, E, F...
The specific REPRESENTED_PORT is provided in the rules (D, E, F...), not in pattern template A, action template B, or table C.
Resources may be allocated early, at step 3, because of the table's rules_number property.
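
The ordering argument above can be illustrated with a toy model in C. Everything here is hypothetical (table_sketch, origin_sketch, table_sketch_reserved_entries are not DPDK names), and the cost model is an assumption made only for illustration: one reserved entry per rule per origin side. Real drivers may allocate very differently.

```c
#include <stdint.h>

/* Hypothetical origin restriction known at table creation time. */
enum origin_sketch {
	ORIGIN_ANY = 0,  /* default: both sides of the switch */
	ORIGIN_WIRE = 1, /* packets entering from physical ports only */
	ORIGIN_VF = 2,   /* packets entering from VFs only */
};

/* What the driver knows at step 3, before any rule (and port ID) exists. */
struct table_sketch {
	uint32_t rules_number;
	enum origin_sketch origin;
};

/*
 * Toy cost model (assumption): one entry per rule per origin side.
 * The concrete REPRESENTED_PORT value only arrives with each rule at
 * step 4, so it cannot reduce what is reserved here.
 */
static uint64_t
table_sketch_reserved_entries(const struct table_sketch *t)
{
	uint64_t sides = (t->origin == ORIGIN_ANY) ? 2 : 1;

	return sides * t->rules_number;
}
```

Under this assumed model, a table sized for 10M rules reserves 20M entries by default and 10M when restricted to one origin, which is the kind of saving the ~50% estimate refers to.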
> > The underlay is the one we have met for now.
> >>>
> >>>>> Introduce one new member transfer_mode into rte_flow_attr to
> >>>>> indicate the flow table direction property: from wire, from vf or
> >>>>> bi-direction(default).
> >>>>
> >>>> AFAIK, 'rte_flow_attr' serves both traditional flow rule insertion
> >>>> and asynchronous (table) approach. The patch adds the attributes to
> >>>> generic 'rte_flow_attr' but, for some reason, ignores non-table rules.
> >>>>
> >>>>>
> >>> Sync API uses one rule to contain everything. It' hard for PMD to
> >>> determine
> >> if this rule has direction preference or not.
> >>> Image a situation, just for an example:
> >>> 1. Vport 1 VxLAN do decap send to vport 2.     1 million scale
> >>> 2. Vport 0 (wire) VxLAN do decap send to vport 3.   1 hundred scale.
> >>> 1 and 2 share the same matching conditions (eth / ipv4 / udp / vxlan
> >>> /...), so
> >> sync API consider them share matching determination logic.
> >>> It means "2" have 1M scale capability too. Obviously, it wastes a
> >>> lot of
> >> resources.
> >>
> >> Strictly speaking, they do not share the same match pattern.
> >> Your example clearly shows that, in (1), the pattern should request
> >> packets coming from "vport 1" and, in (2), packets coming from "vport 0".
> >>
> >> My point is simple: the "vport" from which packets enter the embedded
> >> switch is ALSO a match criterion. If you accept this, you'll see: the
> >> matching conditions differ.
> >>
> > See above.
> > In this case, I think the matching fields are both "port_id + ipv4_vxlan". They
> are same.
> > Only differs with values like vni 100 or 200 vice versa.
> 
> Not quite. Look closer: you use *different* port IDs for (1) and (2).
> The value of "ethdev_id" field in item REPRESENTED_PORT differs.
> 
> >>>
> >>> In async API, there is pattern_template introduced. We can mark "1"
> >>> to use
> >> pattern_tempate id 1 and "2" to use pattern_template 2.
> >>> They will be separated from each other, don't share anymore.
> >>
> >> Consider an example. "Wire" is a physical port represented by PF0
> >> which, in turn, is attached to DPDK via ethdev 0. "VF" (vport?) is
> >> attached to guest and is represented by a representor ethdev 1 in DPDK.
> >>
> >> So, some rules (template 1) are needed to deliver packets from "wire"
> >> to "VF" and also decapsulate them. And some rules (template 2) are
> >> needed to deliver packets in the opposite direction, from "VF"
> >> to "wire" and also encapsulate them.
> >>
> >> My question is, what prevents you from adding match item
> >> REPRESENTED_PORT[ethdev_id=0] to the pattern template 1 and
> >> REPRESENTED_PORT[ethdev_id=1] to the pattern template 2?
> >>
> >> As I said previously, if you insert such item before eth / ipv4 / etc
> >> to your match pattern, doing so defines an *exact* direction / source.
> >>
> > Could you check the async API guidance? I think pattern template focusing
> on the matching field (mask).
> > "REPRESENTED_PORT[ethdev_id=0] " and
> "REPRESENTED_PORT[ethdev_id=1] "are the same.
> > 1. pattern  template:  REPRESENTED_PORT mask 0xffff ...
> > 2. action template: action1 / actions2. / 3. table create with
> > pattern_template plus action template..
> > REPRESENTED_PORT[ethdev_id=0]  will be rule1:  rule create
> REPRESENTED_PORT port_id is 0 / actions ....
> > REPRESENTED_PORT[ethdev_id=1]  will be rule2:  rule create
> REPRESENTED_PORT port_id is 1 / actions ....
> 
> OK, so, based on this explanation, it appears that you might be looking to refer
> to:
> a) a *set* of any physical (wire) ports
> b) a *set* of any guest ports (VFs)
> 
Great, it looks like we are getting closer to an agreement.
> You chose to achieve this using an attribute, but:
> 
> 1) as I explained above, the use of term "direction" is wrong;
>     please hear me out: I'm not saying that your use case and
>     your optimisation is wrong: I'm saying that naming for it
>     is wrong: it has nothing to do with "direction";
> 
Do you have any better naming proposal?
> 2) while naming a *set* of wire ports as "wire_orig" might be OK,
>     sticking with term "vf_orig" for a *set* of guest ports is
>     clearly not, simply because the user may pass another PF
>     to a guest instead of passing a VF; in other words,
>     a better term is needed here;
> 
As you said, a vport may be a VF, SF, etc. "vport_origin" is named from the logical switch's perspective.
Any proposal is welcome.
> 3) since it is possible to plug multiple NICs to a DPDK application,
>     even from different vendors, the user may end up having multiple
>     physical ports belonging to different physical NICs attached to
>     the application; if this is the case, then referring to a *set*
>     of wire ports using the new attribute is ambiguous in the
>     sense that it's unclear whether this applies only to
>     wire ports of some specific physical NIC or to the
>     physical ports of *all* NICs managed by the app;
> 
No matter how many NICs have been probed by DPDK, there is always the switch/PF/VF/SF... concept.
Each switch must have an owner identified by transfer_proxy(). A vport (VF/SF) can't cross switches in the normal case.
Traffic coming from one NIC can't be offloaded by other NICs unless it is forwarded by the application.
If the user uses the new attribute to cut one side's resources, I think the user is smart enough to manage the rules on different NICs.
No default behavior is changed by this update.

> 4) adding an attribute instead of yet another pattern item type
>     is not quite good because PMDs need to be updated separately
>     to detect this attribute and throw an error if it's not
>     supported, whilst with a new item type, the PMDs do not
>     need to be updated = if a PMD sees an unsupported item
>     while traversing the item with switch () { case }, it
>     will anyway throw an error;
>
A PMD also needs to check whether it supports a new matching item, right?
We can't make assumptions about a NIC vendor's PMD implementation, right?
> 5) as in (4), a new attribute is not good from documentation
>     standpoint; please search for "represented_port = Y" in
>     documentation = this way, all supported items are
>     easily defined for various NIC vendors, but the
>     same isn't true for attributes = there is no
>     way to indicate supported attributes in doc.
>
> If points (1 - 5) make sense to you, then, if I may be so bold, I'd like to suggest
> that the idea of adding a new attribute be abandoned. Instead, I'd like to
> suggest adding new items:
> 
> (the names are just sketch, for sure, it should be discussed)
> 
> ANY_PHY_PORTS { switch_domain_id }
>   = match packets entering the embedded switch from *whatever*
>     physical ports belonging to the given switch domain
> 
How many PHY_PORTS can one switch have, in your view? Can I treat the PHY_PORTS as the owner of { switch_domain_id }, like transfer_proxy()?
> ANY_GUEST_PORTS { switch_domain_id }
>   = match packets entering the embedded switch from *whatever*
>     guest ports (VFs, PFs, etc.) belonging to the given
>     switch domain
> 
> The field "switch_domain_id" is required to tell one physical board / vendor
> from another (as I explained in point (3)).
> The application can query this parameter from ethdev's switch info: please see
> "struct rte_eth_switch_info".
> 
> What's your opinion?
> 
How can we handle the relationship between ANY_PHY_PORTS / ANY_GUEST_PORTS and REPRESENTED_PORT if they conflict?
That needs further tuning.
As I said before, offloaded rules can't cross different NIC vendors' "switch_domain_id".
If the user probes multiple NICs in one application, the application should take care of packet forwarding.
The application should also be aware of which ports belong to which NICs.
> >
> >>>
> >>>> For example, the diff below adds the attributes to "table" commands
> >>>> in testpmd but does not add them to regular (non-table) commands
> >>>> like "flow create". Why?
> >>>>
> >>>>>
> >>>
> >>> "table" command limits pattern_template to single direction or
> >>> bidirection
> >> per user specified attribute.
> >>
> >> As I say above, the same effect can be achieved by adding item
> >> REPRESENTED_PORT to the corresponding pattern template.
> > See above.
> >>
> >>> "rule" command must tight with one "table_id", so the rule will
> >>> inherit the
> >> "table" direction property, no need to specify again.
> >>
> >> You might've misunderstood. I do not talk about "rule" command coupled
> >> with some "table". What I talk about is regular, NON-async flow
> >> insertion commands.
> >>
> >> Please take a look at section "/* Validate/create attributes. */" in
> >> file "app/test-pmd/cmdline_flow.c". When one adds a new flow
> >> attribute, they should reflect it the same way as VC_INGRESS,
> VC_TRANSFER, etc.
> >>
> >> That's it.
> > We don't intend to pass this to sync API. The above code example is for sync
> API.
> 
> So I understand. But there's one slight problem: in your patch, you add the new
> attributes to the structure which is *shared* between sync and async use case
> scenarios. If one adds an attribute to this structure, they have to provide
> accessors for it in all sync-related commands in testpmd, but your patch does
> not do that.
> 
As the title says, "creating the transfer table" is an ASYNC operation.
We have limited the scope of this patch. The sync API will be another story.
Maybe we can add one more sentence to emphasize the async API again.

> In other words, it is wrong to assume that "struct rte_flow_attr" only applies to
> async approach. It had been introduced long before the async flow design was
> added to DPDK. That's it.
> 
> >>
> >> But, as I say, I still believe that the new attributes aren't needed.
> > I think we are not at the same page for now. Can we reach agreement on
> > the same matching criteria first?
> >>>
> >>>>> It helps to save underlayer memory also on insertion rate.
> >>>>
> >>>> Which memory? Host memory? NIC memory? Term "underlayer" is
> vague.
> >>>> I suggest that the commit message be revised to first explain how
> >>>> such memory is spent currently, then explain why this is not
> >>>> optimal and, finally, which way the patch is supposed to improve
> >>>> that. I.e. be more
> >> specific.
> >>>>
> >>>>>
> >>>
> >>> For large scalable rules, HW (depends on implementation) always
> >>> needs
> >> memory to hold the rules' patterns and actions, either from NIC or from
> host.
> >>> The memory footprint highly depends on "user rules' complexity",
> >>> also diff
> >> between NICs.
> >>> ~50% memory saving is expected if one-direction is cut.
> >>
> >> Regardless of this talk, this explanation should probably be present
> >> in the commit description.
> >>
> > This number may differ with different NICs or implementation. We can't say
> it for sure.
> 
> Not an exact number, of course, but a brief explanation of:
> a) what is wrong / not optimal in the current design;
Please check the commit log: transfer has the capability to match bidirectional traffic regardless of the ports involved.
> b) how it is observed in customer deployments;
Customers need to save resources, and their offloaded rules are direction-aware.
> c) why the proposed patch is a good solution.
The new attributes provide a way to remove one direction and save underlayer resources.
All of the above can be found in the commit log.

> 

> >>>
> >>>>> By default, the transfer domain is bi-direction, and no behavior changes.
> >>>>>
> >>>>> 1. Match wire origin only
> >>>>>  flow template_table 0 create group 0 priority 0 transfer wire_orig...
> >>>>> 2. Match vf origin only
> >>>>>  flow template_table 0 create group 0 priority 0 transfer vf_orig...
> >>>>>
> >>>>> Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
> >>>>> ---
> >>>>> app/test-pmd/cmdline_flow.c                 | 26 +++++++++++++++++++++
> >>>>> doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
> >>>>> lib/ethdev/rte_flow.h                       |  9 ++++++-
> >>>>> 3 files changed, 36 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/app/test-pmd/cmdline_flow.c
> >>>>> b/app/test-pmd/cmdline_flow.c index 7f50028eb7..b25b595e82 100644
> >>>>> --- a/app/test-pmd/cmdline_flow.c
> >>>>> +++ b/app/test-pmd/cmdline_flow.c
> >>>>> @@ -177,6 +177,8 @@ enum index {
> >>>>>       TABLE_INGRESS,
> >>>>>       TABLE_EGRESS,
> >>>>>       TABLE_TRANSFER,
> >>>>> +     TABLE_TRANSFER_WIRE_ORIG,
> >>>>> +     TABLE_TRANSFER_VF_ORIG,
> >>>>>       TABLE_RULES_NUMBER,
> >>>>>       TABLE_PATTERN_TEMPLATE,
> >>>>>       TABLE_ACTIONS_TEMPLATE,
> >>>>> @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] = {
> >>>>>       TABLE_INGRESS,
> >>>>>       TABLE_EGRESS,
> >>>>>       TABLE_TRANSFER,
> >>>>> +     TABLE_TRANSFER_WIRE_ORIG,
> >>>>> +     TABLE_TRANSFER_VF_ORIG,
> >>>>>       TABLE_RULES_NUMBER,
> >>>>>       TABLE_PATTERN_TEMPLATE,
> >>>>>       TABLE_ACTIONS_TEMPLATE,
> >>>>> @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
> >>>>>               .next = NEXT(next_table_attr),
> >>>>>               .call = parse_table,
> >>>>>       },
> >>>>> +     [TABLE_TRANSFER_WIRE_ORIG] = {
> >>>>> +             .name = "wire_orig",
> >>>>> +             .help = "affect rule direction to transfer",
> >>>>
> >>>> This does not explain the "wire" aspect. It's too broad.
> >>>>
> >>>>> +             .next = NEXT(next_table_attr),
> >>>>> +             .call = parse_table,
> >>>>> +     },
> >>>>> +     [TABLE_TRANSFER_VF_ORIG] = {
> >>>>> +             .name = "vf_orig",
> >>>>> +             .help = "affect rule direction to transfer",
> >>>>
> >>>> This explanation simply duplicates such of the "wire_orig".
> >>>> It does not explain the "vf" part. Should be more specific.
> >>>>
> >>>>> +             .next = NEXT(next_table_attr),
> >>>>> +             .call = parse_table,
> >>>>> +     },
> >>>>>       [TABLE_RULES_NUMBER] = {
> >>>>>               .name = "rules_number",
> >>>>>               .help = "number of rules in table", @@ -8894,6
> >>>>> +8910,16 @@ parse_table(struct context *ctx, const struct token
> >>>>> +*token,
> >>>>>       case TABLE_TRANSFER:
> >>>>>               out->args.table.attr.flow_attr.transfer = 1;
> >>>>>               return len;
> >>>>> +     case TABLE_TRANSFER_WIRE_ORIG:
> >>>>> +             if (!out->args.table.attr.flow_attr.transfer)
> >>>>> +                     return -1;
> >>>>> +             out->args.table.attr.flow_attr.transfer_mode = 1;
> >>>>> +             return len;
> >>>>> +     case TABLE_TRANSFER_VF_ORIG:
> >>>>> +             if (!out->args.table.attr.flow_attr.transfer)
> >>>>> +                     return -1;
> >>>>> +             out->args.table.attr.flow_attr.transfer_mode = 2;
> >>>>> +             return len;
> >>>>>       default:
> >>>>>               return -1;
> >>>>>       }
> >>>>> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>>>> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>>>> index 330e34427d..603b7988dd 100644
> >>>>> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>>>> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>>>> @@ -3332,7 +3332,8 @@ It is bound to
> >>>> ``rte_flow_template_table_create()``::
> >>>>>
> >>>>>   flow template_table {port_id} create
> >>>>>       [table_id {id}] [group {group_id}]
> >>>>> -       [priority {level}] [ingress] [egress] [transfer]
> >>>>> +       [priority {level}] [ingress] [egress]
> >>>>> +       [transfer [vf_orig] [wire_orig]]
> >>>>
> >>>> Is it correct? Shouldn't it rather be [transfer] [vf_orig]
> >>>> [wire_orig] ?
> >>>>
> >>>>>       rules_number {number}
> >>>>>       pattern_template {pattern_template_id}
> >>>>>       actions_template {actions_template_id} diff --git
> >>>>> a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> >>>>> a79f1e7ef0..512b08d817 100644
> >>>>> --- a/lib/ethdev/rte_flow.h
> >>>>> +++ b/lib/ethdev/rte_flow.h
> >>>>> @@ -130,7 +130,14 @@ struct rte_flow_attr {
> >>>>>        * through a suitable port. @see rte_flow_pick_transfer_proxy().
> >>>>>        */
> >>>>>       uint32_t transfer:1;
> >>>>> -     uint32_t reserved:29; /**< Reserved, must be zero. */
> >>>>> +     /**
> >>>>> +      * 0 means bidirection,
> >>>>> +      * 0x1 origin uplink,
> >>>>
> >>>> What does "uplink" mean? It's too vague. Hardly a good term.
> 
> I believe this comment should be reworked, in case the idea of having an extra
> attribute persists.
> 
> >>>>
> >>>>> +      * 0x2 origin vport,
> >>>>
> >>>> What does "origin vport" mean? Hardly a good term as well.
> 
> I still believe this explanation is way too brief and needs to be reworked to
> provide more details, to define the use case for the attribute more specifically.
> 
> >>>>
> >>>>> +      * N/A both set.
> >>>>
> >>>> What's this?
> 
> The question stands.
> 
> >>>>
> >>>>> +      */
> >>>>> +     uint32_t transfer_mode:2;
> >>>>> +     uint32_t reserved:27; /**< Reserved, must be zero. */
> >>>>> };
> >>>>>
> >>>>> /**
> >>>>> --
> >>>>> 2.27.0
> >>>>>
> >>>>
> >>>> Since the attributes are added to generic 'struct rte_flow_attr',
> >>>> non-table
> >>>> (synchronous) flow rules are supposed to support them, too. If that
> >>>> is indeed the case, then I'm afraid such proposal does not agree
> >>>> with the existing items PORT_REPRESENTOR and REPRESENTED_PORT.
> They
> >>>> do exactly the same thing, but they are designed to be way more
> >>>> generic. Why
> >> not use them?
> >>
> >> The question stands.
> >>
> >>>>
> >>>> Ivan
> >>>
> >>
> >> Ivan
> >
  
Ivan Malov Sept. 14, 2022, 3:18 p.m. UTC | #8
Hi Rongwei,

On Wed, 14 Sep 2022, Rongwei Liu wrote:

> HI
>
> BR
> Rongwei
>
>> -----Original Message-----
>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Sent: Wednesday, September 14, 2022 15:32
>> To: Rongwei Liu <rongweil@nvidia.com>
>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan
>> Darawsheh <rasland@nvidia.com>
>> Subject: RE: [PATCH v1] ethdev: add direction info when creating the transfer
>> table
>>
>> External email: Use caution opening links or attachments
>>
>>
>> Hi,
>>
>> On Wed, 14 Sep 2022, Rongwei Liu wrote:
>>
>>> HI
>>>
>>> BR
>>> Rongwei
>>>
>>>> -----Original Message-----
>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>> Sent: Tuesday, September 13, 2022 22:33
>>>> To: Rongwei Liu <rongweil@nvidia.com>
>>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>>>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>>>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
>>>> Raslan Darawsheh <rasland@nvidia.com>
>>>> Subject: RE: [PATCH v1] ethdev: add direction info when creating the
>>>> transfer table
>>>>
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> Hi Rongwei,
>>>>
>>>> PSB
>>>>
>>>> On Tue, 13 Sep 2022, Rongwei Liu wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> BR
>>>>> Rongwei
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>>>> Sent: Tuesday, September 13, 2022 00:57
>>>>>> To: Rongwei Liu <rongweil@nvidia.com>
>>>>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>>>>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>>>>>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>>>>>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>>>>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
>>>>>> Raslan Darawsheh <rasland@nvidia.com>
>>>>>> Subject: Re: [PATCH v1] ethdev: add direction info when creating
>>>>>> the transfer table
>>>>>>
>>>>>> External email: Use caution opening links or attachments
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Wed, 7 Sep 2022, Rongwei Liu wrote:
>>>>>>
>>>>>>> The transfer domain rule is able to match traffic wire/vf origin
>>>>>>> and it means two directions' underlayer resource.
>>>>>>
>>>>>> The point of fact is that matching traffic coming from some entity
>>>>>> like wire / VF has been long generalised in the form of representors.
>>>>>> So, a flow rule with attribute "transfer" is able to match traffic
>>>>>> coming from either a REPRESENTED_PORT or from a
>> PORT_REPRESENTOR
>>>> (please find these items).
>>>>>>
>>>>>>>
>>>>>>> In customer deployments, they usually match only one direction
>>>>>>> traffic in single flow table: either from wire or from vf.
>>>>>>
>>>>>> Which customer deployments? Could you please provide detailed
>> examples?
>>>>>>
>>>>>>>
>>>>>
>>>>> We saw a lot of customers' deployment like:
>>>>> 1. Match overlay traffic from wire and do decap, then send to specific
>> vport.
>>>>> 2. Match specific 5-tuples and do encap, then send to wire.
>>>>> The matching criteria has obvious direction preference.
>>>>
>>>> Thank you. My questions are as follows:
>>>>
>>>> In (1), when you say "from wire", do you mean the need to match
>>>> packets arriving via whatever physical ports rather then matching
>>>> packets arriving from some specific phys. port?
>>
>> ^^
>>
>> Could you please find my question above? Based on your understanding of
>> templates in async flow approach, an answer to this question may help us find
>> the common ground.
> It means traffic arrived from physical ports (transfer_proxy role) or south band per you concept.

Transfer proxy has nothing to do with physical ports. And I should stress
that "south band" and the like are NOT my concepts. Instead, I think
that direction designations like "south" or "north" aren't applicable
when talking about the embedded switch and its flow (transfer) rules.

> Traffic from vport (not transfer_proxy) or north band per your concept won't hit even if same packets.

Please see above. Transfer proxy is a completely different concept.
And I never used the "north band" concept.

>>
>> --
>>
>>>>
>>>> If, however, matching traffic "from wire" in fact means matching
>>>> packets arriving from a *specific* physical port, then for sure item
>>>> REPRESENTED_PORT should perfectly do the job, and the proposed
>>>> attribute is unneeded.
>>>>
>>>> (BTW, in DPDK, it is customary to use term "physical port", not
>>>> "wire")
>>>>
>>>> In (1), what are "vport"s? Please explain. Once again, I should
>>>> remind that, in DPDK, folks prefer terms "represented entity" /
>> "representor"
>>>> over vendor-specific terms like "vport", etc.
>>>>
>>> Vport is virtual port for short such as VF.
>>
>> Thanks. As I say, term "vport" might be confusing to some readers, so it'd be
>> better to provide this explanation (about VF) in the commit description next
>> time.
> Ack. Will add VF as an example.
>>
>>>> As for (2), imagine matching 5-tuple traffic emitted by a VF / guest.
>>>> Could you please explain, why not just add a match item
>>>> REPRESENTED_PORT pointing to that VF via its representor? Doing so
>>>> should perfectly define the exact direction / traffic source. Isn't that
>> sufficient?
>>>>
>>> Per my view, there is matching field and matching value difference.
>>> Like IPv4 src_addr 1.1.1.1, 1.1.1.2. 1.1.1.3, will you treat it as same or
>> different matching criteria?
>>> I would like to call them same since it can be summarized like
>>> 1.1.1.0/30 REPRESENTED_PORT is just another matching item, no essential
>> differences and it can't stand for direction info.
>>
>> It looks like we're starting to run into disagreement here.
>> There's no "direction" at all. There's an embedded switch inside the NIC, and
>> there're (logical) switch ports that packets enter the switch from.
>>
>> When the user submits a "transfer" rule and does not provide neither
>> REPRESENTED_PORT nor PORT_REPRESENTOR in the pattern, the embedded
>> switch is supposed to match packets coming from ANY ports, be it VFs or
>> physical (wire) ports.
>>
>> But when the user provides, in example, item REPRESENTED_PORT to point to
>> the physical (wire) port, the embedded switch knows exactly which port the
>> packets should enter it from.
>> In this case, it is supposed to match only packets coming from that physical
>> port. And this should be sufficient.
>> This in fact replaces the need to know a "direction".
>> It's just an exact specification of packet's origin.
>>
> There is traffic arriving or leaving the switch, so there is always direction, implicit or explicit.

This does not contradict my thoughts above. "Direction" is *defined* by
two points (like in geometry): an initial point (the switch port through
which a packet enters the switch) and the terminal point (the match engine 
inside the switch). If one knows these two points, no extra hints are
required to specify some "direction". Because direction is already
represented by this "vector" of sorts. That's why presence of the
port match item in the pattern is absolutely sufficient.

However, based on your later explanations, the use of
precise port item is simply inconvenient in your
use case because you are trying to match traffic
from *multiple* ports that have something in
common (i.e. all VFs or all wire ports).

And, instead of adding a new item type which would serve
exactly your needs, you for some reason try to add an
attribute, which has multiple drawbacks which I
described in my previous letter.

> For transfer rules, there is a concept transfer_proxy.
> It takes the switch ownership; all switch rules should be configured via transfer_proxy.

Yes, such a concept exists, but it's a don't care with
regard to the problem that we're discussing, sorry.
Furthermore, unlike "switch domain ID" (which is
the same for all ethdevs belonging to a given
physical NIC board), nobody guarantees that
there's only one transfer proxy port. Some NIC
vendors allow transfer rules to be added
via any ethdev port.

>
> Image a logic switch with one PF and two VFs.
> PF is the transfer proxy and VF belongs to the PF logically.
> When receiving traffic from PF, we can say it comes into the logic switch.

That's correct.

> When packet sent from VF (VF belongs to PF), so we can say traffic leaves the switch.

That's not correct. Traffic sent from VF (for example, a guest VM
is sending packets) also *enters* the switch. PFs and VFs are in
fact *separate* logical ports of the embedded switch.

>
> Item REPRESENTED_PORT indicates switch to match traffic sent from which port, comes into, or leave switch.

That is not correct either. Item REPRESENTED_PORT tells the switch to
match packets which come into the switch FROM the logical port
which is represented by the given DPDK ethdev.

For example, if ethdev="E" is the *main* PF which is bound to
physical port "P", then item REPRESENTED_PORT with ethdev ID
being set to "E" tells the switch that only packets coming
to the NIC from the *wire* via physical port "P" should match.

> We can say it as one kind of packet metadata.

Kind of yes, but might be vendor-specific. No need to delve into this.

> Like you said, DPDK always treat transfer to match any PORTs traffic.

Slight correction: it treats it this way until it sees an exact port item.
If the user provides REPRESENTED_PORT (or PORT_REPRESENTOR), it's no
longer *any* ports traffic, it's an exact port traffic. That's it.

> When REPRESENTED_PORT is specified, the rules are limited to some dedicated PORTs.

These rules match only packets arriving TO the
embedded switch FROM the said dedicated ports.

> Other PORTs are ignored because metadata mismatching.

Kind of yes, correct.

> Rules still have the capability to match ANY PORTS if metadata matched.

This statement is only correct for the cases when the user uses
neither item REPRESENTED_PORT nor item PORT_REPRESENTOR.

>
> This update will allow user to cut the other PORTs matching capabilities.

As I explained, this is exactly what items PORT_REPRESENTOR
and REPRESENTED_PORT do. No need to have an extra attribute.

If the user adds item REPRESENTED_PORT with ethdev_id="E",
like in the above example, to match packets entering NIC
via the physical port "P", then this rule will NOT match
packets entering NIC from other points. For example,
packets transmitted by a virtual machine via a VF
will not match in this case.
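
To make the matching semantics above concrete, here is a minimal,
self-contained sketch. It is only a toy model: "struct pkt",
"struct transfer_rule" and "rule_matches()" are hypothetical names
invented for illustration and are not part of rte_flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a packet as seen by the embedded switch. */
struct pkt {
	uint16_t src_lport; /* logical switch port the packet entered from */
};

/* Hypothetical model of a "transfer" rule with an optional port item. */
struct transfer_rule {
	bool has_port_item; /* true if the pattern carries an exact port item */
	uint16_t src_lport; /* port to match when has_port_item is set */
};

/*
 * Without a port item, the rule matches packets entering from ANY port.
 * With a port item, it matches only packets entering from that port.
 */
bool
rule_matches(const struct transfer_rule *rule, const struct pkt *pkt)
{
	assert(rule != NULL && pkt != NULL);

	if (!rule->has_port_item)
		return true;

	return rule->src_lport == pkt->src_lport;
}
```

In this model, a rule that names the physical port never matches a
packet whose source is a VF, with no extra "direction" hint involved.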

>>> Port id depends on the attach sequence.
>>
>> Unfortunately, this is hardly a good argument because flow rules are supposed
>> to be inserted based on the run-time packet learning. Attach sequence is a
>> don't care here.
>>
>>>> Also please mind that, although I appreciate your explanations here,
>>>> on the mailing list, they should finally be added to the commit
>>>> message, so that readers do not have to look for them elsewhere.
>>>>
>>> We have explained the high possibility of single-direction matching, right?
>>
>> Not quite. As I said, it is not correct to assume any "direction", like in
>> geographical sense ("north", "south", etc.). Application has ethdevs, and they
>> are representors of some "virtual ports" (in your terminology) belonging to the
>> switch, for example, VFs, SFs or physical ports.
>>
>> The user adds an appropriate item to the pattern (REPRESENTED_PORT), and
>> doing so specifies the packet path which it enters the switch.
>>
>>> It' hard to list all the possibilities of traffic matching preferences.
>>
>> And let's say more: one need never do this. That's exactly the reason why
>> DPDK has abandoned the concept of "direction" in *transfer* rules and
>> switched to the use of precise criteria (REPRESENTED_PORT, etc.).
>>
> As far as I know, DPDK changes "transfer ingress" to "transfer", so it' more clear that transfer can match both directions (both ingress and egress).

Not quite. DPDK has abandoned the use of "ingress / egress" in "transfer" 
rules because "ingress" and "egress" are only applicable on the VNIC
level. For example, there is a PF attached to DPDK application:
packets that the application receives through this ethdev, are
ingress, and packets that it transmits (tx_burst) are egress.

I can explain in other words. Imagine yourself standing *inside* a room
which only has one door. When someone enters the room, it's "ingress",
when someone leaves, it's "egress". It's relative to your viewpoint.
In this example, such a room represents a VNIC / ethdev.

And now imagine yourself standing *outside* of another room / auditorium 
which has multiple doors / exits. You're standing near some particular
exit "A" (VNIC / ethdev), but people may enter this room via another
door "B" and then leave it via yet another door "C". In this case,
from your viewpoint, this traffic can be considered neither
ingress nor egress, because these people do not approach you.

Like in this example, embedded switch is like a large auditorium
with many-many doors / exits. And there can be many-many
directions: packet can enter the switch via phys. port "P1"
and then leave it via another phys. port "P2". Or it can
enter the switch via a phys. port and then leave it via
VF's logical port (to be delivered to a guest machine),
or a packet can travel from one VF to another one.

There's no PRE-DEFINED direction like "north to south" or "east to west".
And this explains why it's very undesirable to use term "direction".

> REPRESENTED_PORT is the evolution of "port_id", I think, it' only one kind of matching items.

Yes. But nobody prevents you from defining yet another match item
which will be able to refer to a *group* of ports which have
something in common (i.e. "all guest ports of this switch"
pointing to all logical ports currently attached to
virtual machines / guests, or "all wire ports of this switch").

>
> For large scale deployment like 10M rules, if we can save resources significantly by introducing direction, why not?

I do not deny the fact that you have a use case where resources can
be saved significantly if you give the PMD some extra knowledge
when creating a flow table / pattern template. That's totally
OK. What I object to is the very implementation and the use of
term "direction". If you add new item types (like above),
then, when you create an async table 1 pattern template,
you will have item ANY_WIRE_PORTS, and, for table 2
pattern template, you'll have item ANY_GUEST_PORTS.
As you see, the two pattern templates now differ
because the match criteria use different items.

>
> Again, async API:
> 1. pattern template A
> 2. action template B
> 3. table C with pattern template A + action template B.
> 4. rule D, E, F...
> The specified REPRESENTED_PORT is provided in rules (D, E, F...) not pattern template A or action template B or table C.
> Resources may be allocated early at step 3 since table' rule_nums property.

No, item REPRESENTED_PORT *can* be provided inside pattern template A,
but, as you pointed out earlier, the problem is that you can't
distinguish different pattern templates which have this item,
because pattern templates know nothing about *exact* port IDs
and only know item MASKS. Yes, I agree that in your case
such problem exists, but, as I say above, it can be
solved by adding new item types: one for referring to
all phys. ports of a given NIC and another one for
pointing to a group of current guest users (VFs).
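
The template problem and the proposed way out can be sketched as
follows. This is a toy model under stated assumptions: the enum, the
struct and "templates_distinguishable()" are hypothetical, and
ANY_WIRE_PORTS / ANY_GUEST_PORTS are just the sketch names used in
this discussion, not existing rte_flow items.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for pattern item types. */
enum tmpl_item_type {
	TMPL_ITEM_REPRESENTED_PORT,
	TMPL_ITEM_ANY_WIRE_PORTS,  /* sketch name, does not exist in DPDK */
	TMPL_ITEM_ANY_GUEST_PORTS, /* sketch name, does not exist in DPDK */
	TMPL_ITEM_ETH,
};

/* A template item carries a type and a mask, but no concrete port ID. */
struct tmpl_item {
	enum tmpl_item_type type;
	uint16_t port_id_mask;
};

/*
 * Two templates of equal length can be told apart only if some item
 * differs in type or in mask; exact port IDs arrive later, with rules.
 */
bool
templates_distinguishable(const struct tmpl_item *a,
			  const struct tmpl_item *b, size_t n)
{
	size_t i;

	assert(a != NULL && b != NULL);

	for (i = 0; i < n; i++) {
		if (a[i].type != b[i].type ||
		    a[i].port_id_mask != b[i].port_id_mask)
			return true;
	}

	return false;
}
```

Two templates that both use REPRESENTED_PORT with mask 0xffff look
identical here, whereas templates built on the two group items differ
by item type alone.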

>>> The underlay is the one we have met for now.
>>>>>
>>>>>>> Introduce one new member transfer_mode into rte_flow_attr to
>>>>>>> indicate the flow table direction property: from wire, from vf or
>>>>>>> bi-direction(default).
>>>>>>
>>>>>> AFAIK, 'rte_flow_attr' serves both traditional flow rule insertion
>>>>>> and asynchronous (table) approach. The patch adds the attributes to
>>>>>> generic 'rte_flow_attr' but, for some reason, ignores non-table rules.
>>>>>>
>>>>>>>
>>>>> Sync API uses one rule to contain everything. It' hard for PMD to
>>>>> determine
>>>> if this rule has direction preference or not.
>>>>> Image a situation, just for an example:
>>>>> 1. Vport 1 VxLAN do decap send to vport 2.     1 million scale
>>>>> 2. Vport 0 (wire) VxLAN do decap send to vport 3.   1 hundred scale.
>>>>> 1 and 2 share the same matching conditions (eth / ipv4 / udp / vxlan
>>>>> /...), so
>>>> sync API consider them share matching determination logic.
>>>>> It means "2" have 1M scale capability too. Obviously, it wastes a
>>>>> lot of
>>>> resources.
>>>>
>>>> Strictly speaking, they do not share the same match pattern.
>>>> Your example clearly shows that, in (1), the pattern should request
>>>> packets coming from "vport 1" and, in (2), packets coming from "vport 0".
>>>>
>>>> My point is simple: the "vport" from which packets enter the embedded
>>>> switch is ALSO a match criterion. If you accept this, you'll see: the
>>>> matching conditions differ.
>>>>
>>> See above.
>>> In this case, I think the matching fields are both "port_id + ipv4_vxlan". They
>> are same.
>>> Only differs with values like vni 100 or 200 vice versa.
>>
>> Not quite. Look closer: you use *different* port IDs for (1) and (2).
>> The value of "ethdev_id" field in item REPRESENTED_PORT differs.
>>
>>>>>
>>>>> In async API, there is pattern_template introduced. We can mark "1"
>>>>> to use
>>>> pattern_tempate id 1 and "2" to use pattern_template 2.
>>>>> They will be separated from each other, don't share anymore.
>>>>
>>>> Consider an example. "Wire" is a physical port represented by PF0
>>>> which, in turn, is attached to DPDK via ethdev 0. "VF" (vport?) is
>>>> attached to guest and is represented by a representor ethdev 1 in DPDK.
>>>>
>>>> So, some rules (template 1) are needed to deliver packets from "wire"
>>>> to "VF" and also decapsulate them. And some rules (template 2) are
>>>> needed to deliver packets in the opposite direction, from "VF"
>>>> to "wire" and also encapsulate them.
>>>>
>>>> My question is, what prevents you from adding match item
>>>> REPRESENTED_PORT[ethdev_id=0] to the pattern template 1 and
>>>> REPRESENTED_PORT[ethdev_id=1] to the pattern template 2?
>>>>
>>>> As I said previously, if you insert such item before eth / ipv4 / etc
>>>> to your match pattern, doing so defines an *exact* direction / source.
>>>>
>>> Could you check the async API guidance? I think pattern template focusing
>> on the matching field (mask).
>>> "REPRESENTED_PORT[ethdev_id=0] " and
>> "REPRESENTED_PORT[ethdev_id=1] "are the same.
>>> 1. pattern  template:  REPRESENTED_PORT mask 0xffff ...
>>> 2. action template: action1 / actions2. / 3. table create with
>>> pattern_template plus action template..
>>> REPRESENTED_PORT[ethdev_id=0]  will be rule1:  rule create
>> REPRESENTED_PORT port_id is 0 / actions ....
>>> REPRESENTED_PORT[ethdev_id=1]  will be rule2:  rule create
>> REPRESENTED_PORT port_id is 1 / actions ....
>>
>> OK, so, based on this explanation, it appears that you might be looking to refer
>> to:
>> a) a *set* of any physical (wire) ports
>> b) a *set* of any guest ports (VFs)
>>
> Great, looks we are more and more closer to the agreement.

Looks so.

>> You chose to achieve this using an attribute, but:
>>
>> 1) as I explained above, the use of term "direction" is wrong;
>>     please hear me out: I'm not saying that your use case and
>>     your optimisation is wrong: I'm saying that naming for it
>>     is wrong: it has nothing to do with "direction";
>>
> Do you have any better naming proposal?

As I said, what you are trying to achieve using a new
attribute would be way better to achieve using new
pattern items which can be easily told one from
another in the PMD when pre-allocating resources for
different async flow tables.

So, I don't have any proposal for *attribute* naming.
What I propose is to consider new items instead.

>> 2) while naming a *set* of wire ports as "wire_orig" might be OK,
>>     sticking with term "vf_orig" for a *set* of guest ports is
>>     clearly not, simply because the user may pass another PF
>>     to a guest instead of passing a VF; in other words,
>>     a better term is needed here;
>>
> Like you said, vport may contain VF, SF etc. vport_orgin is on the logic switch perspective.
> Any proposal is welcome.

The problem is, vport can be easily confused with a slightly more
generic "lport" (embedded switch's "logical port"), and, logical
ports, in turn, are not confined to just VFs or PFs. For example,
physical (wire) ports are ALSO logical ports of the switch.

>> 3) since it is possible to plug multiple NICs to a DPDK application,
>>     even from different vendors, the user may end up having multiple
>>     physical ports belonging to different physical NICs attached to
>>     the application; if this is the case, then referring to a *set*
>>     of wire ports using the new attribute is ambiguous in the
>>     sense that it's unclear whether this applies only to
>>     wire ports of some specific physical NIC or to the
>>     physical ports of *all* NICs managed by the app;
>>
> Not matter how many NICs has been probed by the DPDK, there is always switch/PF/VF/SF.. concept.

Correct.

> Each switch must have an owner identified by transfer_proxy(). Vport (VF/SF) can't cross switch in normal case.

No. That is not correct. This is tricky, but please hear me out: an
individual NIC board (that is, a given *switch*) is identified only
by its switch domain ID. As I explained above, "transfer proxy" is
just a technical hint for the application to indicate an ethdev
through which "transfer" rules must be managed. Not all vendors
support this concept (and they are not obliged to support it).

> The traffic comes from one NIC can't be offloaded by other NICs unless forwarded by the application.

Right, but forwarding in software (inside DPDK application) is
out of scope with regard to the problem that we're discussing.

> If user use new attribute to cut one side resource, I think user is smart enough to management the rules in different NICs.

As I explained above, I do not deny the existence of the problem that
your patch is trying to solve. Now it looks like we're on the same
page with regard to understanding the fact that what you're
trying to do is to introduce a match criterion that would
refer to a GROUP of similar ports. In my opinion, this
is not an *attribute*, it's a *match criterion*, and
it should be implemented as two new items.

Having two different item types would perfectly fit the need
to know the difference between such "directions" (as per
your terminology) early enough, when parsing templates.

> No default behavior changed with this update.
>
>> 4) adding an attribute instead of yet another pattern item type
>>     is not quite good because PMDs need to be updated separately
>>     to detect this attribute and throw an error if it's not
>>     supported, whilst with a new item type, the PMDs do not
>>     need to be updated = if a PMD sees an unsupported item
>>     while traversing the item with switch () { case }, it
>>     will anyway throw an error;
>>
> PMD also need to check if it supports new matching item or not, right?
> We can't assume NIC vendor' PMD implementation, right?

No-no-no. Imagine a PMD which does not support "transfer" rules.
In such PMD, in the flow parsing function one would have:

if (!!attr->transfer) {
     print_error("Transfer is not supported");
     return EINVAL;
}

If you add a new attribute, then PMDs which are NOT going
to support it need to be updated to add similar check.
Otherwise, they will simply ignore presence / absence
of the attribute in the rule, and validation result
will be unreliable.

Yes, if this attribute is 0x0, then indeed behaviour
does not change. But what if it's 0x1 or 0x2?
PMDs that do not support these values must
somehow reject such rules on parsing.

However, this problem does not manifest itself when
parsing items. Typically, in a PMD, one would have:

switch (item->type) {
     case RTE_FLOW_ITEM_TYPE_VOID:
         break;

     case RTE_FLOW_ITEM_TYPE_ETH:
         /* blah-blah-blah */
         break;

     default:
         return ENOTSUP;
}

So, if you introduce two new item types to solve your problem,
then you won't have to update existing PMDs. If the vendor
wants to support the new items (say, MLX or SFC), they'll
update their code to accept the items. But other vendors
will not do anything. If the user tries to pass such an
item to a vendor which doesn't support the feature,
the "default" case will just throw an error.

This is what I mean when pointing out such difference
between adding an attribute VS adding new item types.
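
The difference can be reproduced with a small self-contained sketch.
Everything in it is a hypothetical stand-in (it does not use the real
rte_flow types): a "legacy" PMD parser that was written before the
proposed attribute and the suggested items existed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for pattern item types. */
enum item_type {
	ITEM_END,
	ITEM_VOID,
	ITEM_ETH,
	ITEM_ANY_PHY_PORTS,   /* sketch name from this thread */
	ITEM_ANY_GUEST_PORTS, /* sketch name from this thread */
};

/* Hypothetical stand-in for the flow attributes. */
struct flow_attr {
	uint32_t egress:1;
	uint32_t transfer:1;
	uint32_t transfer_mode:2; /* the attribute proposed by the patch */
};

/* Parser of a PMD which predates both the attribute and the new items. */
int
legacy_pmd_validate(const struct flow_attr *attr,
		    const enum item_type *pattern)
{
	assert(attr != NULL && pattern != NULL);

	/* Only the attributes known at the time are checked. */
	if (attr->egress)
		return -ENOTSUP;
	/* attr->transfer_mode is never looked at: silently ignored. */

	for (; *pattern != ITEM_END; pattern++) {
		switch (*pattern) {
		case ITEM_VOID:
		case ITEM_ETH:
			break;
		default:
			/* Unknown items are rejected for free. */
			return -ENOTSUP;
		}
	}

	return 0;
}
```

With transfer_mode set to 0x1, this parser still reports success,
which is the unreliable validation result mentioned above; a pattern
that starts with one of the new items is rejected without any change
to the PMD.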

>> 5) as in (4), a new attribute is not good from documentation
>>     standpoint; plase search for "represented_port = Y" in
>>     documentation = this way, all supported items are
>>     easily defined for various NIC vendors, but the
>>     same isn't true for attributes = there is no
>>     way to indicate supported attributes in doc.
>>
>> If points (1 - 5) make sense to you, then, if I may be so bold, I'd like to suggest
>> that the idea of adding a new attribute be abandoned. Instead, I'd like to
>> suggest adding new items:
>>
>> (the names are just sketch, for sure, it should be discussed)
>>
>> ANY_PHY_PORTS { switch_domain_id }
>>   = match packets entering the embedded switch from *whatever*
>>     physical ports belonging to the given switch domain
>>
> How many PHY_PORTS can one switch have, per your thought? Can I treat the PHY_PORTS as the { switch_domain_id } owner as transfer_proxy()?

A single physical NIC board is supposed to have a single
embedded switch engine. Hence, if the NIC board has, in
example, two or four physical ports, these will be the
physical ports of the switch. That's it.

As for the transfer proxy, please see my explanations above.
It's not *always* a reliable way to tell whether two given ethdevs
belong to the same physical NIC board or not.

Switch domain ID is the right criterion (for applications).

>> ANY_GUEST_PORTS { switch_domain_id }
>>   = match packets entering the embedded switch from *whatever*
>>     guest ports (VFs, PFs, etc.) belonging to the given
>>     switch domain
>>
>> The field "switch_domain_id" is required to tell one physical board / vendor
>> from another (as I explained in point (3)).
>> The application can query this parameter from ethdev's switch info: please see
>> "struct rte_eth_switch_info".
>>
>> What's your opinion?
>>
> How can we handle ANY_PHY_PORTS/ ANY_GUEST_PORTS ' relationship with REPRESENTED_PORT if conflicts?
> Need future tuning.

And if you carry on with "vf_orig" / "wire_orig" approach, you
will inevitably have the very same problem: possible conflict
with items like REPRESENTED_PORT. So does it matter? Yes,
checks need to be done by PMDs when parsing patterns.

> Like I said before,  offloaded rules can't cross different NIC vendor' "switch_domain_id".
> If user probes multiple NICs in one application, application should take care of packet forwarding.
> Also application should be aware which ports belong to which NICs.

Yes, perhaps, domain ID is not needed in the new items.
But the application still must keep track of switch
domain IDs itself so it knows which rules to
manage via which ethdevs.
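
Such bookkeeping could look roughly like the sketch below. It is a
toy model: "struct port_entry" and "same_switch_domain()" are
hypothetical, and the assumption is that the application has already
filled "domain_id" for each ethdev from the switch info it queried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-ethdev record kept by the application. */
struct port_entry {
	uint16_t ethdev_id;
	uint16_t domain_id; /* assumed to be taken from the switch info */
};

/* Find the record for the given ethdev; NULL if it is unknown. */
const struct port_entry *
find_port(const struct port_entry *ports, size_t n, uint16_t ethdev_id)
{
	size_t i;

	assert(ports != NULL);

	for (i = 0; i < n; i++) {
		if (ports[i].ethdev_id == ethdev_id)
			return &ports[i];
	}

	return NULL;
}

/* Two ethdevs share an embedded switch iff their domain IDs are equal. */
bool
same_switch_domain(const struct port_entry *ports, size_t n,
		   uint16_t ethdev_a, uint16_t ethdev_b)
{
	const struct port_entry *a = find_port(ports, n, ethdev_a);
	const struct port_entry *b = find_port(ports, n, ethdev_b);

	if (a == NULL || b == NULL)
		return false;

	return a->domain_id == b->domain_id;
}
```

The application would consult such a table before deciding through
which ethdev a given "transfer" rule is to be managed.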

Any other opinions?

>>>
>>>>>
>>>>>> For example, the diff below adds the attributes to "table" commands
>>>>>> in testpmd but does not add them to regular (non-table) commands
>>>>>> like "flow create". Why?
>>>>>>
>>>>>>>
>>>>>
>>>>> "table" command limits pattern_template to single direction or
>>>>> bidirection
>>>> per user specified attribute.
>>>>
>>>> As I say above, the same effect can be achieved by adding item
>>>> REPRESENTED_PORT to the corresponding pattern template.
>>> See above.
>>>>
>>>>> "rule" command must tight with one "table_id", so the rule will
>>>>> inherit the
>>>> "table" direction property, no need to specify again.
>>>>
>>>> You migh've misunderstood. I do not talk about "rule" command coupled
>>>> with some "table". What I talk about is regular, NON-async flow
>>>> insertion commands.
>>>>
>>>> Please take a look at section "/* Validate/create attributes. */" in
>>>> file "app/test-pmd/cmdline_flow.c". When one adds a new flow
>>>> attribute, they should reflect it the same way as VC_INGRESS,
>> VC_TRANSFER, etc.
>>>>
>>>> That's it.
>>> We don't intend to pass this to sync API. The above code example is for sync
>> API.
>>
>> So I understand. But there's one slight problem: in your patch, you add the new
>> attributes to the structure which is *shared* between sync and async use case
>> scenarios. If one adds an attribute to this structure, they have to provide
>> accessors for it in all sync-related commands in testpmd, but your patch does
>> not do that.
>>
> Like the title said, "creating transfer table" is the ASYNC operation.
> We have limited the scope of this patch. Sync API will be another story.
> Maybe we can add one more sentence to emphasize async API again.

No-no-no. There might be slight misunderstanding. I understand that
you are limiting the scope of your patch by saying this and this.
That's OK. What I'm trying to point out is the fact that your
patch nevertheless touches the COMMON part of the flow API
which is shared between two approaches (sync and async).

Imagine a reader that does not know anything about the async approach.
He just opens the file in vim and goes directly to struct rte_flow_attr.
And, over there, he sees the new attribute "wire_orig". He then
immediately assumes that these attributes can be used in
testpmd. Now the reader opens testpmd and tries to
insert a flow rule using the sync approach:

flow create priority 0 transfer vf_orig pattern / ... / end actions drop

And doing so will be a failure, because your patch does not add the
new attribute keyword to sync flow rule syntax parser. That's it.

Once again, I should emphasize: the reader MAY know nothing about the async
approach. But if the attribute is present in "struct rte_flow_attr", it
immediately means that it is available everywhere. Both sync and async.

So, with this in mind, your attempt to limit the scope of the patch
to async-only rules looks a little bit artificial. It's not
correct from the *formal* standpoint.

>
>> In other words, it is wrong to assume that "struct rte_flow_attr" only applies to
>> async approach. It had been introduced long before the async flow design was
>> added to DPDK. That's it.
>>
>>>>
>>>> But, as I say, I still believe that the new attributes aren't needed.
>>> I think we are not on the same page for now. Can we reach agreement on
>>> the same matching criteria first?
>>>>>
>>>>>>> It helps to save underlayer memory also on insertion rate.
>>>>>>
>>>>>> Which memory? Host memory? NIC memory? Term "underlayer" is vague.
>>>>>> I suggest that the commit message be revised to first explain how
>>>>>> such memory is spent currently, then explain why this is not
>>>>>> optimal and, finally, which way the patch is supposed to improve
>>>>>> that. I.e. be more specific.
>>>>>>
>>>>>>>
>>>>>
>>>>> For large-scale rules, HW (depending on the implementation) always
>>>>> needs memory to hold the rules' patterns and actions, either from
>>>>> the NIC or from the host.
>>>>> The memory footprint depends heavily on the complexity of the user's
>>>>> rules and also differs between NICs.
>>>>> ~50% memory saving is expected if one direction is cut.
>>>>
>>>> Regardless of this talk, this explanation should probably be present
>>>> in the commit description.
>>>>
>>> This number may differ with different NICs or implementations. We can't say
>>> it for sure.
>>
>> Not an exact number, of course, but a brief explanation of:
>> a) what is wrong / not optimal in the current design;
> Please check the commit log: transfer rules can match bi-directional traffic regardless of the ports.
>> b) how it is observed in customer deployments;
> Customers need to save resources, and their offloaded rules are direction-aware.
>> c) why the proposed patch is a good solution.
> New attributes provide the way to remove one direction and save underlayer resource.
> All of the above can be found in the commit log.

I understand all of that, but my point is, the existing commit message is
way too brief. Yes, it mentions that SOME customers have SOME deployments,
but it does not shed light on which specifics these deployments have. For
example, back in the day, when items PORT_REPRESENTOR and REPRESENTED_PORT
were added, the cover letter for that patch series provided details of
deployment specifics (application: OvS, scenario: full offload rules).

So, it's always better to expand on such specifics so that the reader
has full picture in their head and doesn't need to look elsewhere.
Not all readers of the commit message will be happy to delve
into our discussions on the mailing list to get the gist.

>
>>
>
>>>>>
>>>>>>> By default, the transfer domain is bi-direction, and no behavior changes.
>>>>>>>
>>>>>>> 1. Match wire origin only
>>>>>>>  flow template_table 0 create group 0 priority 0 transfer wire_orig...
>>>>>>> 2. Match vf origin only
>>>>>>>  flow template_table 0 create group 0 priority 0 transfer vf_orig...
>>>>>>>
>>>>>>> Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
>>>>>>> ---
>>>>>>> app/test-pmd/cmdline_flow.c                 | 26 +++++++++++++++++++++
>>>>>>> doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
>>>>>>> lib/ethdev/rte_flow.h                       |  9 ++++++-
>>>>>>> 3 files changed, 36 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
>>>>>>> index 7f50028eb7..b25b595e82 100644
>>>>>>> --- a/app/test-pmd/cmdline_flow.c
>>>>>>> +++ b/app/test-pmd/cmdline_flow.c
>>>>>>> @@ -177,6 +177,8 @@ enum index {
>>>>>>>       TABLE_INGRESS,
>>>>>>>       TABLE_EGRESS,
>>>>>>>       TABLE_TRANSFER,
>>>>>>> +     TABLE_TRANSFER_WIRE_ORIG,
>>>>>>> +     TABLE_TRANSFER_VF_ORIG,
>>>>>>>       TABLE_RULES_NUMBER,
>>>>>>>       TABLE_PATTERN_TEMPLATE,
>>>>>>>       TABLE_ACTIONS_TEMPLATE,
>>>>>>> @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] = {
>>>>>>>       TABLE_INGRESS,
>>>>>>>       TABLE_EGRESS,
>>>>>>>       TABLE_TRANSFER,
>>>>>>> +     TABLE_TRANSFER_WIRE_ORIG,
>>>>>>> +     TABLE_TRANSFER_VF_ORIG,
>>>>>>>       TABLE_RULES_NUMBER,
>>>>>>>       TABLE_PATTERN_TEMPLATE,
>>>>>>>       TABLE_ACTIONS_TEMPLATE,
>>>>>>> @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
>>>>>>>               .next = NEXT(next_table_attr),
>>>>>>>               .call = parse_table,
>>>>>>>       },
>>>>>>> +     [TABLE_TRANSFER_WIRE_ORIG] = {
>>>>>>> +             .name = "wire_orig",
>>>>>>> +             .help = "affect rule direction to transfer",
>>>>>>
>>>>>> This does not explain the "wire" aspect. It's too broad.
>>>>>>
>>>>>>> +             .next = NEXT(next_table_attr),
>>>>>>> +             .call = parse_table,
>>>>>>> +     },
>>>>>>> +     [TABLE_TRANSFER_VF_ORIG] = {
>>>>>>> +             .name = "vf_orig",
>>>>>>> +             .help = "affect rule direction to transfer",
>>>>>>
>>>>>> This explanation simply duplicates that of "wire_orig".
>>>>>> It does not explain the "vf" part. Should be more specific.
>>>>>>
>>>>>>> +             .next = NEXT(next_table_attr),
>>>>>>> +             .call = parse_table,
>>>>>>> +     },
>>>>>>>       [TABLE_RULES_NUMBER] = {
>>>>>>>               .name = "rules_number",
>>>>>>>               .help = "number of rules in table",
>>>>>>> @@ -8894,6 +8910,16 @@ parse_table(struct context *ctx, const struct token *token,
>>>>>>>       case TABLE_TRANSFER:
>>>>>>>               out->args.table.attr.flow_attr.transfer = 1;
>>>>>>>               return len;
>>>>>>> +     case TABLE_TRANSFER_WIRE_ORIG:
>>>>>>> +             if (!out->args.table.attr.flow_attr.transfer)
>>>>>>> +                     return -1;
>>>>>>> +             out->args.table.attr.flow_attr.transfer_mode = 1;
>>>>>>> +             return len;
>>>>>>> +     case TABLE_TRANSFER_VF_ORIG:
>>>>>>> +             if (!out->args.table.attr.flow_attr.transfer)
>>>>>>> +                     return -1;
>>>>>>> +             out->args.table.attr.flow_attr.transfer_mode = 2;
>>>>>>> +             return len;
>>>>>>>       default:
>>>>>>>               return -1;
>>>>>>>       }
>>>>>>> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>> index 330e34427d..603b7988dd 100644
>>>>>>> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>> @@ -3332,7 +3332,8 @@ It is bound to ``rte_flow_template_table_create()``::
>>>>>>>
>>>>>>>   flow template_table {port_id} create
>>>>>>>       [table_id {id}] [group {group_id}]
>>>>>>> -       [priority {level}] [ingress] [egress] [transfer]
>>>>>>> +       [priority {level}] [ingress] [egress]
>>>>>>> +       [transfer [vf_orig] [wire_orig]]
>>>>>>
>>>>>> Is it correct? Shouldn't it rather be [transfer] [vf_orig]
>>>>>> [wire_orig] ?
>>>>>>
>>>>>>>       rules_number {number}
>>>>>>>       pattern_template {pattern_template_id}
>>>>>>>       actions_template {actions_template_id}
>>>>>>> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
>>>>>>> index a79f1e7ef0..512b08d817 100644
>>>>>>> --- a/lib/ethdev/rte_flow.h
>>>>>>> +++ b/lib/ethdev/rte_flow.h
>>>>>>> @@ -130,7 +130,14 @@ struct rte_flow_attr {
>>>>>>>        * through a suitable port. @see rte_flow_pick_transfer_proxy().
>>>>>>>        */
>>>>>>>       uint32_t transfer:1;
>>>>>>> -     uint32_t reserved:29; /**< Reserved, must be zero. */
>>>>>>> +     /**
>>>>>>> +      * 0 means bidirection,
>>>>>>> +      * 0x1 origin uplink,
>>>>>>
>>>>>> What does "uplink" mean? It's too vague. Hardly a good term.
>>
>> I believe this comment should be reworked, in case the idea of having an extra
>> attribute persists.
>>
>>>>>>
>>>>>>> +      * 0x2 origin vport,
>>>>>>
>>>>>> What does "origin vport" mean? Hardly a good term as well.
>>
>> I still believe this explanation is way too brief and needs to be reworked to
>> provide more details, to define the use case for the attribute more specifically.
>>
>>>>>>
>>>>>>> +      * N/A both set.
>>>>>>
>>>>>> What's this?
>>
>> The question stands.
>>
>>>>>>
>>>>>>> +      */
>>>>>>> +     uint32_t transfer_mode:2;
>>>>>>> +     uint32_t reserved:27; /**< Reserved, must be zero. */
>>>>>>> };
>>>>>>>
>>>>>>> /**
>>>>>>> --
>>>>>>> 2.27.0
>>>>>>>
>>>>>>
>>>>>> Since the attributes are added to generic 'struct rte_flow_attr',
>>>>>> non-table (synchronous) flow rules are supposed to support them,
>>>>>> too. If that is indeed the case, then I'm afraid such proposal does
>>>>>> not agree with the existing items PORT_REPRESENTOR and
>>>>>> REPRESENTED_PORT. They do exactly the same thing, but they are
>>>>>> designed to be way more generic. Why not use them?
>>>>
>>>> The question stands.
>>>>
>>>>>>
>>>>>> Ivan
>>>>>
>>>>
>>>> Ivan
>>>
>

Thank you.
  
Thomas Monjalon Sept. 14, 2022, 9:02 p.m. UTC | #9
14/09/2022 17:18, Ivan Malov:
> So, it's always better to expand on such specifics so that the reader
> has full picture in their head and doesn't need to look elsewhere.
> Not all readers of the commit message will be happy to delve
> into our discussions on the mailing list to get the gist.

Yes clearly, we'll need a summary of this long discussion :)
  
Rongwei Liu Sept. 15, 2022, 12:58 a.m. UTC | #10
HI Ivan:

BR
Rongwei

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Sent: Wednesday, September 14, 2022 23:18
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan
> Darawsheh <rasland@nvidia.com>
> Subject: RE: [PATCH v1] ethdev: add direction info when creating the transfer
> table
> 
> 
> Hi Rongwei,
> 
> On Wed, 14 Sep 2022, Rongwei Liu wrote:
> 
> > HI
> >
> > BR
> > Rongwei
> >
> >> -----Original Message-----
> >> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> >> Sent: Wednesday, September 14, 2022 15:32
> >> To: Rongwei Liu <rongweil@nvidia.com>
> >> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> >> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> >> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
> >> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> >> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
> >> Raslan Darawsheh <rasland@nvidia.com>
> >> Subject: RE: [PATCH v1] ethdev: add direction info when creating the
> >> transfer table
> >>
> >>
> >> Hi,
> >>
> >> On Wed, 14 Sep 2022, Rongwei Liu wrote:
> >>
> >>> HI
> >>>
> >>> BR
> >>> Rongwei
> >>>
> >>>> -----Original Message-----
> >>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> >>>> Sent: Tuesday, September 13, 2022 22:33
> >>>> To: Rongwei Liu <rongweil@nvidia.com>
> >>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> >>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> >>>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
> >>>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> >>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
> >>>> Raslan Darawsheh <rasland@nvidia.com>
> >>>> Subject: RE: [PATCH v1] ethdev: add direction info when creating
> >>>> the transfer table
> >>>>
> >>>>
> >>>> Hi Rongwei,
> >>>>
> >>>> PSB
> >>>>
> >>>> On Tue, 13 Sep 2022, Rongwei Liu wrote:
> >>>>
> >>>>> Hi
> >>>>>
> >>>>> BR
> >>>>> Rongwei
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> >>>>>> Sent: Tuesday, September 13, 2022 00:57
> >>>>>> To: Rongwei Liu <rongweil@nvidia.com>
> >>>>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> >>>>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> >>>>>> NBU-Contact- Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> >>>>>> Aman Singh <aman.deep.singh@intel.com>; Yuying Zhang
> >>>>>> <yuying.zhang@intel.com>; Andrew Rybchenko
> >>>>>> <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> >>>>>> <rasland@nvidia.com>
> >>>>>> Subject: Re: [PATCH v1] ethdev: add direction info when creating
> >>>>>> the transfer table
> >>>>>>
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> On Wed, 7 Sep 2022, Rongwei Liu wrote:
> >>>>>>
> >>>>>>> The transfer domain rule is able to match traffic wire/vf origin
> >>>>>>> and it means two directions' underlayer resource.
> >>>>>>
> >>>>>> The point of fact is that matching traffic coming from some
> >>>>>> entity like wire / VF has been long generalised in the form of
> >>>>>> representors. So, a flow rule with attribute "transfer" is able
> >>>>>> to match traffic coming from either a REPRESENTED_PORT or from a
> >>>>>> PORT_REPRESENTOR (please find these items).
> >>>>>>
> >>>>>>>
> >>>>>>> In customer deployments, they usually match only one direction
> >>>>>>> traffic in single flow table: either from wire or from vf.
> >>>>>>
> >>>>>> Which customer deployments? Could you please provide detailed
> >> examples?
> >>>>>>
> >>>>>>>
> >>>>>
> >>>>> We have seen a lot of customer deployments like:
> >>>>> 1. Match overlay traffic from wire and do decap, then send to a
> >>>>> specific vport.
> >>>>> 2. Match specific 5-tuples and do encap, then send to wire.
> >>>>> The matching criteria have an obvious direction preference.
> >>>>
> >>>> Thank you. My questions are as follows:
> >>>>
> >>>> In (1), when you say "from wire", do you mean the need to match
> >>>> packets arriving via whatever physical ports rather than matching
> >>>> packets arriving from some specific phys. port?
> >>
> >> ^^
> >>
> >> Could you please find my question above? Based on your understanding
> >> of templates in async flow approach, an answer to this question may
> >> help us find the common ground.
> > It means traffic arriving from physical ports (transfer_proxy role), or south
> > band per your concept.
> 
> Transfer proxy has nothing to do with physical ports. And I should stress
> that "south band" and the likes are NOT my concepts. Instead, I think that
> direction designations like "south" or "north" aren't applicable when talking
> about the embedded switch and its flow (transfer) rules.
> 
> > Traffic from a vport (not transfer_proxy), or north band per your concept, won't
> > hit even if the packets are the same.
> 
> Please see above. Transfer proxy is a completely different concept.
> And I never used "north band" concept.
> 
> >>
> >> --
> >>
> >>>>
> >>>> If, however, matching traffic "from wire" in fact means matching
> >>>> packets arriving from a *specific* physical port, then for sure
> >>>> item REPRESENTED_PORT should perfectly do the job, and the proposed
> >>>> attribute is unneeded.
> >>>>
> >>>> (BTW, in DPDK, it is customary to use term "physical port", not
> >>>> "wire")
> >>>>
> >>>> In (1), what are "vport"s? Please explain. Once again, I should
> >>>> remind that, in DPDK, folks prefer terms "represented entity" /
> >> "representor"
> >>>> over vendor-specific terms like "vport", etc.
> >>>>
> >>> Vport is virtual port for short such as VF.
> >>
> >> Thanks. As I say, term "vport" might be confusing to some readers, so
> >> it'd be better to provide this explanation (about VF) in the commit
> >> description next time.
> > Ack. Will add VF as an example.
> >>
> >>>> As for (2), imagine matching 5-tuple traffic emitted by a VF / guest.
> >>>> Could you please explain, why not just add a match item
> >>>> REPRESENTED_PORT pointing to that VF via its representor? Doing so
> >>>> should perfectly define the exact direction / traffic source. Isn't
> >>>> that
> >> sufficient?
> >>>>
> >>> In my view, there is a difference between a matching field and a
> >>> matching value.
> >>> Take IPv4 src_addr 1.1.1.1, 1.1.1.2, 1.1.1.3: would you treat them as
> >>> the same or as different matching criteria?
> >>> I would call them the same, since they can be summarized as 1.1.1.0/30.
> >>> REPRESENTED_PORT is just another matching item with no essential
> >>> difference, and it can't stand for direction info.
> >>
> >> It looks like we're starting to run into disagreement here.
> >> There's no "direction" at all. There's an embedded switch inside the
> >> NIC, and there're (logical) switch ports that packets enter the switch from.
> >>
> >> When the user submits a "transfer" rule and provides neither
> >> REPRESENTED_PORT nor PORT_REPRESENTOR in the pattern, the
> embedded
> >> switch is supposed to match packets coming from ANY ports, be it VFs
> >> or physical (wire) ports.
> >>
> >> But when the user provides, for example, item REPRESENTED_PORT to
> >> point to the physical (wire) port, the embedded switch knows exactly
> >> which port the packets should enter it from.
> >> In this case, it is supposed to match only packets coming from that
> >> physical port. And this should be sufficient.
> >> This in fact replaces the need to know a "direction".
> >> It's just an exact specification of packet's origin.
> >>
> > There is traffic arriving at or leaving the switch, so there is always a direction,
> > implicit or explicit.
> 
> This does not contradict my thoughts above. "Direction" is *defined* by two
> points (like in geometry): an initial point (the switch port through which a
> packet enters the switch) and the terminal point (the match engine inside the
> switch). If one knows these two points, no extra hints are required to specify
> some "direction". Because direction is already represented by this "vector" of
> sorts. That's why presence of the port match item in the pattern is absolutely
> sufficient.
Good to see this. Thanks for the information.
This update leverages exactly the concept you defined: "an initial point (the switch port through which a
packet enters the switch)".
If you think "direction" is not a good term, we can change it to other words such as "initial port" or "origin port".
> 
> However, based on your later explanations, the use of precise port item is
> simply inconvenient in your use case because you are trying to match traffic
> from *multiple* ports that have something in common (i.e. all VFs or all wire
> ports).
> 
> And, instead of adding a new item type which would serve exactly your needs,
> you for some reason try to add an attribute, which has multiple drawbacks
> which I described in my previous letter.
> 
> > For transfer rules, there is a concept transfer_proxy.
> > It takes the switch ownership; all switch rules should be configured via
> transfer_proxy.
> 
> Yes, such concept exists, but it's a don't care with regard to the problem that
> we're discussing, sorry.
> Furthermore, unlike "switch domain ID" (which is the same for all ethdevs
> belonging to a given physical NIC board), nobody guarantees that it's only one
> transfer proxy port. Some NIC vendors allow transfer rules to be added via
> any ethdev port.
> 
Does any flow rule already leverage the switch ID? Is it too obscure for the end user?
> >
> > Imagine a logic switch with one PF and two VFs.
> > PF is the transfer proxy and VF belongs to the PF logically.
> > When receiving traffic from PF, we can say it comes into the logic switch.
> 
> That's correct.
> 
> > When packet sent from VF (VF belongs to PF), so we can say traffic leaves
> the switch.
> 
> That's not correct. Traffic sent from VF (for example, a guest VM is sending
> packets) also *enters* the switch. PFs and VFs are in fact *separate* logical
> ports of the embedded switch.
> 
> >
> > Item REPRESENTED_PORT indicates switch to match traffic sent from which
> port, comes into, or leave switch.
> 
> That is not correct either. Item REPRESENTED_PORT tells the switch to match
> packets which come into the switch FROM the logical port which is
> represented by the given DPDK ethdev.
> 
> For example, if ethdev="E" is the *main* PF which is bound to physical port "P",
> then item REPRESENTED_PORT with ethdev ID being set to "E" tells the switch
> that only packets coming to the NIC from *wire* via physical port "P" should match.
> 
> > We can say it as one kind of packet metadata.
> 
> Kind of yes, but might be vendor-specific. No need to delve into this.
> 
> > Like you said, DPDK always treat transfer to match any PORTs traffic.
> 
> Slight correction: it treats it this way until it sees an exact port item.
> If the user provides REPRESENTED_PORT (or PORT_REPRESENTOR), it's no
> longer *any* ports traffic, it's an exact port traffic. That's it.
> 
> > When REPRESENTED_PORT is specified, the rules are limited to some
> dedicated PORTs.
> 
> These rules match only packets arriving TO the embedded switch FROM the
> said dedicated ports.
> 
> > Other PORTs are ignored because metadata mismatching.
> 
> Kind of yes, correct.
> 
> > Rules still have the capability to match ANY PORTS if metadata matched.
> 
> This statement is only correct for the cases when the user uses
> neither item REPRESENTED_PORT nor item PORT_REPRESENTOR.
> 
> >
> > This update will allow user to cut the other PORTs matching capabilities.
> 
> As I explained, this is exactly what items PORT_REPRESENTOR and
> REPRESENTED_PORT do. No need to have an extra attribute.
> 
> If the user adds item REPRESENTED_PORT with ethdev_id="E", like in the
> above example, to match packets entering NIC via the physical port "P", then
> this rule will NOT match packets entering NIC from other points. For example,
> packets transmitted by a virtual machine via a VF will not match in this case.
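
The matching behaviour described above can be sketched as a small,
self-contained C toy model. Nothing here is DPDK API: struct toy_packet,
struct toy_rule and toy_rule_matches() are hypothetical names, and the
"port item" stands in for REPRESENTED_PORT / PORT_REPRESENTOR. The only
assumption is the one stated in the thread: without a port item a transfer
rule matches packets entering the switch from any logical port, and with
one it matches packets entering from that port only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy packet: records the logical switch port it entered the switch from. */
struct toy_packet {
	uint16_t ingress_port;
};

/* Toy transfer rule: an optional port item narrows the packet origin. */
struct toy_rule {
	bool has_port_item; /* true if the pattern carries a port item */
	uint16_t port;      /* logical port selected by that item */
};

/* No port item: any origin matches. Port item: only that origin matches. */
static bool
toy_rule_matches(const struct toy_rule *rule, const struct toy_packet *pkt)
{
	if (!rule->has_port_item)
		return true;
	return rule->port == pkt->ingress_port;
}

/* Physical port "P" is logical port 0 here, a VF is logical port 1. */
void
toy_match_selfcheck(void)
{
	struct toy_packet from_wire = { .ingress_port = 0 };
	struct toy_packet from_vf = { .ingress_port = 1 };
	struct toy_rule any_origin = { .has_port_item = false, .port = 0 };
	struct toy_rule wire_only = { .has_port_item = true, .port = 0 };

	assert(toy_rule_matches(&any_origin, &from_wire));
	assert(toy_rule_matches(&any_origin, &from_vf));
	assert(toy_rule_matches(&wire_only, &from_wire));
	assert(!toy_rule_matches(&wire_only, &from_vf));
}
```

In this model the rule with the port item already excludes VF-originated
packets, which is the point being argued: the origin is part of the match.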
> 
> >>> Port id depends on the attach sequence.
> >>
> >> Unfortunately, this is hardly a good argument because flow rules are
> >> supposed to be inserted based on the run-time packet learning. Attach
> >> sequence is a don't care here.
> >>
> >>>> Also please mind that, although I appreciate your explanations
> >>>> here, on the mailing list, they should finally be added to the
> >>>> commit message, so that readers do not have to look for them elsewhere.
> >>>>
> >>> We have explained the high possibility of single-direction matching, right?
> >>
> >> Not quite. As I said, it is not correct to assume any "direction",
> >> like in geographical sense ("north", "south", etc.). Application has
> >> ethdevs, and they are representors of some "virtual ports" (in your
> >> terminology) belonging to the switch, for example, VFs, SFs or physical
> ports.
> >>
> >> The user adds an appropriate item to the pattern (REPRESENTED_PORT),
> >> and doing so specifies the path by which the packet enters the switch.
> >>
> >>> It's hard to list all the possibilities of traffic matching preferences.
> >>
> >> And let's say more: one need never do this. That's exactly the reason
> >> why DPDK has abandoned the concept of "direction" in *transfer* rules
> >> and switched to the use of precise criteria (REPRESENTED_PORT, etc.).
> >>
> > As far as I know, DPDK changed "transfer ingress" to "transfer", so it's clearer
> > that transfer can match both directions (both ingress and egress).
> 
> Not quite. DPDK has abandoned the use of "ingress / egress" in "transfer"
> rules because "ingress" and "egress" are only applicable on the VNIC level. For
> example, there is a PF attached to DPDK application:
> packets that the application receives through this ethdev, are ingress, and
> packets that it transmits (tx_burst) are egress.
> 
> I can explain in other words. Imagine yourself standing *inside* a room which
> only has one door. When someone enters the room, it's "ingress", when
> someone leaves, it's "egress". It's relative to your viewpoint.
> In this example, such a room represents a VNIC / ethdev.
> 
> And now imagine yourself standing *outside* of another room / auditorium
> which has multiple doors / exits. You're standing near some particular exit "A"
> (VNIC / ethdev), but people may enter this room via another door "B" and then
> leave it via yet another door "C". In this case, from your viewpoint, this traffic
> can be considered neither ingress nor egress. Because these people do not
> approach you.
> 
> Like in this example, embedded switch is like a large auditorium with many-
> many doors / exits. And there can be many-many
> directions: packet can enter the switch via phys. port "P1"
> and then leave it via another phys. port "P2". Or it can enter the switch via
> phys. port and then leave it via a VF's logical port (to be delivered to a guest
> machine), or a packet can travel from one VF to another one.
> 
> There's no PRE-DEFINED direction like "north to south" or "east to west".
> And this explains why it's very undesirable to use term "direction".
> 
> > REPRESENTED_PORT is the evolution of "port_id"; I think it's only one kind of
> > matching item.
> 
> Yes. But nobody prevents you from defining yet another match item which will
> be able to refer to a *group* of ports which have something in common (i.e.
> "all guest ports of this switch"
> pointing to all logical ports currently attached to virtual machines / guests, or
> "all wire ports of this switch").
> 
> >
> > For large scale deployment like 10M rules, if we can save resources
> significantly by introducing direction, why not?
> 
> I do not deny the fact that you have a use case where resources can be saved
> significantly if you give the PMD some extra knowledge when creating a flow
> table / pattern template. That's totally OK. What I object is the very
> implementation and the use of term "direction". If you add new item types
> (like above), then, when you create an async table 1 pattern template, you will
> have item ANY_WIRE_PORTS, and, for table 2 pattern template, you'll have
> item ANY_GUEST_PORTS.
> As you see, the two pattern templates now differ because the match criteria
> use different items.
> 
> >
> > Again, async API:
> > 1. pattern template A
> > 2. action template B
> > 3. table C with pattern template A + action template B.
> > 4. rule D, E, F...
> > The specific REPRESENTED_PORT is provided in the rules (D, E, F...), not in pattern
> > template A, action template B or table C.
> > Resources may be allocated early, at step 3, because of the table's rule_nums property.
> 
> No, item REPRESENTED_PORT *can* be provided inside pattern template A,
> but, as you pointed out earlier, the problem is that you can't distinguish
> different pattern templates which have this item, because pattern templates
> know nothing about *exact* port IDs and only know item MASKS. Yes, I agree
> that in your case such problem exists, but, as I say above, it can be solved by
> adding new item types: one for referring to all phys. ports of a given NIC and
> another one for pointing to a group of current guest users (VFs).
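
The template argument above can be illustrated with a short, self-contained
C sketch. It is a toy model under stated assumptions, not DPDK code: a
template entry is reduced to an item type plus a mask, and the item types
ANY_WIRE_PORTS and ANY_GUEST_PORTS are the hypothetical items floated in
this thread (they do not exist in DPDK). The helper name
toy_template_item_equal() is likewise made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy item types; the two ANY_* entries are hypothetical proposals. */
enum toy_item_type {
	TOY_ITEM_REPRESENTED_PORT,
	TOY_ITEM_ANY_WIRE_PORTS,
	TOY_ITEM_ANY_GUEST_PORTS,
};

/* A template entry records the item type and mask, never the exact value. */
struct toy_template_item {
	enum toy_item_type type;
	uint16_t mask;
};

/* Two template entries can be told apart only by type and mask. */
static bool
toy_template_item_equal(const struct toy_template_item *a,
			const struct toy_template_item *b)
{
	return a->type == b->type && a->mask == b->mask;
}

void
toy_template_selfcheck(void)
{
	/* Meant for ethdev 0 and ethdev 1, but the IDs only appear in rules. */
	struct toy_template_item wire_tmpl = { TOY_ITEM_REPRESENTED_PORT, 0xffff };
	struct toy_template_item vf_tmpl = { TOY_ITEM_REPRESENTED_PORT, 0xffff };
	/* With dedicated item types the templates differ by construction. */
	struct toy_template_item any_wire = { TOY_ITEM_ANY_WIRE_PORTS, 0 };
	struct toy_template_item any_guest = { TOY_ITEM_ANY_GUEST_PORTS, 0 };

	assert(toy_template_item_equal(&wire_tmpl, &vf_tmpl));
	assert(!toy_template_item_equal(&any_wire, &any_guest));
}
```

In this model, a PMD sizing resources at table creation time sees the two
REPRESENTED_PORT templates as identical, whereas the two group items give
it a difference it can act on without any extra attribute.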
> 
> >>> The underlay is the one we have met for now.
> >>>>>
> >>>>>>> Introduce one new member transfer_mode into rte_flow_attr to
> >>>>>>> indicate the flow table direction property: from wire, from vf
> >>>>>>> or bi-direction(default).
> >>>>>>
> >>>>>> AFAIK, 'rte_flow_attr' serves both traditional flow rule
> >>>>>> insertion and asynchronous (table) approach. The patch adds the
> >>>>>> attributes to generic 'rte_flow_attr' but, for some reason, ignores non-
> table rules.
> >>>>>>
> >>>>>>>
> >>>>> Sync API uses one rule to contain everything. It's hard for the PMD
> >>>>> to determine if this rule has a direction preference or not.
> >>>>> Imagine a situation, just as an example:
> >>>>> 1. Vport 1 VxLAN do decap send to vport 2.     1 million scale
> >>>>> 2. Vport 0 (wire) VxLAN do decap send to vport 3.   1 hundred scale.
> >>>>> 1 and 2 share the same matching conditions (eth / ipv4 / udp /
> >>>>> vxlan /...), so the sync API considers them to share the matching
> >>>>> determination logic.
> >>>>> It means "2" has 1M scale capability too. Obviously, it wastes a
> >>>>> lot of resources.
> >>>>
> >>>> Strictly speaking, they do not share the same match pattern.
> >>>> Your example clearly shows that, in (1), the pattern should request
> >>>> packets coming from "vport 1" and, in (2), packets coming from "vport 0".
> >>>>
> >>>> My point is simple: the "vport" from which packets enter the
> >>>> embedded switch is ALSO a match criterion. If you accept this,
> >>>> you'll see: the matching conditions differ.
> >>>>
> >>> See above.
> >>> In this case, I think the matching fields are both "port_id +
> >>> ipv4_vxlan". They are the same.
> >>> They differ only in values, like vni 100 versus 200.
> >>
> >> Not quite. Look closer: you use *different* port IDs for (1) and (2).
> >> The value of "ethdev_id" field in item REPRESENTED_PORT differs.
> >>
> >>>>>
> >>>>> The async API introduces pattern_template. We can mark "1" to use
> >>>>> pattern_template id 1 and "2" to use pattern_template id 2.
> >>>>> They will be separated from each other and won't share anymore.
> >>>>
> >>>> Consider an example. "Wire" is a physical port represented by PF0
> >>>> which, in turn, is attached to DPDK via ethdev 0. "VF" (vport?) is
> >>>> attached to guest and is represented by a representor ethdev 1 in DPDK.
> >>>>
> >>>> So, some rules (template 1) are needed to deliver packets from "wire"
> >>>> to "VF" and also decapsulate them. And some rules (template 2) are
> >>>> needed to deliver packets in the opposite direction, from "VF"
> >>>> to "wire" and also encapsulate them.
> >>>>
> >>>> My question is, what prevents you from adding match item
> >>>> REPRESENTED_PORT[ethdev_id=0] to the pattern template 1 and
> >>>> REPRESENTED_PORT[ethdev_id=1] to the pattern template 2?
> >>>>
> >>>> As I said previously, if you insert such item before eth / ipv4 /
> >>>> etc to your match pattern, doing so defines an *exact* direction / source.
> >>>>
> >>> Could you check the async API guidance? I think the pattern template
> >>> focuses on the matching field (mask).
> >>> "REPRESENTED_PORT[ethdev_id=0]" and "REPRESENTED_PORT[ethdev_id=1]"
> >>> are the same.
> >>> 1. pattern template: REPRESENTED_PORT mask 0xffff ...
> >>> 2. action template: action1 / actions2.
> >>> 3. table create with pattern_template plus action template..
> >>> REPRESENTED_PORT[ethdev_id=0] will be rule1:
> >>>   rule create REPRESENTED_PORT port_id is 0 / actions ....
> >>> REPRESENTED_PORT[ethdev_id=1] will be rule2:
> >>>   rule create REPRESENTED_PORT port_id is 1 / actions ....
> >>
> >> OK, so, based on this explanation, it appears that you might be
> >> looking to refer
> >> to:
> >> a) a *set* of any physical (wire) ports
> >> b) a *set* of any guest ports (VFs)
> >>
> > Great, it looks like we are getting closer and closer to an agreement.
> 
> Looks so.
> 
> >> You chose to achieve this using an attribute, but:
> >>
> >> 1) as I explained above, the use of term "direction" is wrong;
> >>     please hear me out: I'm not saying that your use case and
> >>     your optimisation is wrong: I'm saying that naming for it
> >>     is wrong: it has nothing to do with "direction";
> >>
> > Do you have any better naming proposal?
> 
> As I said, what you are trying to achieve using a new attribute would be way
> better to achieve using new pattern items which can be easily told one from
> another in PMD when pre-allocating resources for different async flow tables.
> 
> So, I don't have any proposal for *attribute* naming.
> What I propose is to consider new items instead.
> 
> >> 2) while naming a *set* of wire ports as "wire_orig" might be OK,
> >>     sticking with term "vf_orig" for a *set* of guest ports is
> >>     clearly not, simply because the user may pass another PF
> >>     to a guest instead of passing a VF; in other words,
> >>     a better term is needed here;
> >>
> > Like you said, vport may contain VF, SF etc. vport_origin is from the logic switch's
> > perspective.
> > Any proposal is welcome.
> 
> The problem is, vport can be easily confused with a slightly more generic
> "lport" (embedded switch's "logical port"), and, logical ports, in turn, are not
> confined to just VFs or PFs. For example, physical (wire) ports are ALSO logical
> ports of the switch.
> 
> >> 3) since it is possible to plug multiple NICs to a DPDK application,
> >>     even from different vendors, the user may end up having multiple
> >>     physical ports belonging to different physical NICs attached to
> >>     the application; if this is the case, then referring to a *set*
> >>     of wire ports using the new attribute is ambiguous in the
> >>     sense that it's unclear whether this applies only to
> >>     wire ports of some specific physical NIC or to the
> >>     physical ports of *all* NICs managed by the app;
> >>
> > No matter how many NICs have been probed by DPDK, there is always the
> switch/PF/VF/SF.. concept.
> 
> Correct.
> 
> > Each switch must have an owner identified by transfer_proxy(). Vport (VF/SF)
> can't cross switch in normal case.
> 
> No. That is not correct. This is tricky, but please hear me out: an individual NIC
> board (that is, a given *switch*) is identified only by its switch domain ID. As I
> explained above, "transfer proxy" is just a technical hint for the application to
> indicate an ethdev through which "transfer" rules must be managed. Not all
> vendors support this concept (and they are not obliged to support it).
> 
> > The traffic comes from one NIC can't be offloaded by other NICs unless
> forwarded by the application.
> 
> Right, but forwarding in software (inside DPDK application) is out of scope with
> regard to the problem that we're discussing.
> 
> > If the user uses the new attribute to cut one side's resources, I think the user is smart
> enough to manage the rules in different NICs.
> 
> As I explained above, I do not deny the existence of the problem that your
> patch is trying to solve. Now it looks like we're on the same page with regard
> to understanding the fact that what you're trying to do is to introduce a match
> criterion that would refer to a GROUP of similar ports. In my opinion, this is
> not an *attribute*, it's a *match criterion*, and it should be implemented as
> two new items.
> 
> Having two different item types would perfectly fit the need to know the
> difference between such "directions" (as per your terminology) early enough,
> when parsing templates.
> 
> > No default behavior changed with this update.
> >
> >> 4) adding an attribute instead of yet another pattern item type
> >>     is not quite good because PMDs need to be updated separately
> >>     to detect this attribute and throw an error if it's not
> >>     supported, whilst with a new item type, the PMDs do not
> >>     need to be updated = if a PMD sees an unsupported item
> >>     while traversing the item with switch () { case }, it
> >>     will anyway throw an error;
> >>
> > PMD also need to check if it supports new matching item or not, right?
> > We can't assume NIC vendor' PMD implementation, right?
> 
> No-no-no. Imagine a PMD which does not support "transfer" rules.
> In such PMD, in the flow parsing function one would have:
> 
> if (!!attr->transfer) {
>      print_error("Transfer is not supported");
>      return EINVAL;
> }
> 
> If you add a new attribute, then PMDs which are NOT going to support it need
> to be updated to add similar check.
> Otherwise, they will simply ignore presence / absence of the attribute in the
> rule, and validation result will be unreliable.
> 
> Yes, if this attribute is 0x0, then indeed behaviour does not change. But what if
> it's 0x1 or 0x2?
> PMDs that do not support these values must somehow reject such rules on
> parsing.
> 
> However, this problem does not manifest itself when parsing items. Typically, in
> a PMD, one would have:
> 
> switch (item->type) {
>      case RTE_FLOW_ITEM_TYPE_VOID:
>          break;
> 
>      case RTE_FLOW_ITEM_TYPE_ETH:
>          /* blah-blah-blah */
>          break;
> 
>      default:
>          return ENOTSUP;
> }
Are you assuming all PMDs will be implemented in the style shown above?
This new field targets the async API, which was added recently. There is no impact on the sync API.
I don't expect this to require any effort in existing PMDs.
But I agree with you: we should emphasize that it's only for async mode.

> 
> So, if you introduce two new item types to solve your problem, then you won't
> have to update existing PMDs. If the vendor wants to support the new items
> (say, MLX or SFC), they'll update their code to accept the items. But other
> vendors will not do anything. If the user tries to pass such an item to a vendor
> which doesn't support the feature, the "default" case will just throw an error.
> 
> This is what I mean when pointing out such difference between adding an
> attribute VS adding new item types.
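
The difference argued above can be made concrete with a minimal sketch in plain C. Everything here is a hypothetical stand-in (sketch_attr, sketch_item_type, the legacy_* helpers); none of it is the rte_flow definitions or any vendor's PMD code. It assumes a PMD written before the proposed field existed, one that accepts "transfer" but rejects "egress":

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the flow attributes; NOT struct rte_flow_attr. */
struct sketch_attr {
	uint32_t egress:1;
	uint32_t transfer:1;
	uint32_t transfer_mode:2; /* the field proposed by the patch */
};

/* Hypothetical stand-in for the pattern item types. */
enum sketch_item_type {
	SKETCH_ITEM_END,
	SKETCH_ITEM_VOID,
	SKETCH_ITEM_ETH,
	SKETCH_ITEM_ANY_PHY_PORTS, /* hypothetical new item */
};

/* Attribute check of a PMD that predates transfer_mode. */
static int
legacy_check_attr(const struct sketch_attr *attr)
{
	if (attr->egress)
		return -ENOTSUP;
	/* transfer_mode is never inspected, so 0x1 and 0x2 pass silently. */
	return 0;
}

/* Pattern walk of the same PMD; the array ends with SKETCH_ITEM_END. */
static int
legacy_check_pattern(const enum sketch_item_type *items)
{
	for (; *items != SKETCH_ITEM_END; items++) {
		switch (*items) {
		case SKETCH_ITEM_VOID:
		case SKETCH_ITEM_ETH:
			break;
		default:
			/* An unknown item is rejected with no PMD update. */
			return -ENOTSUP;
		}
	}
	return 0;
}
```

With these helpers, an attribute set of { transfer = 1, transfer_mode = 2 } is accepted by legacy_check_attr() even though the PMD knows nothing about the mode, whereas a pattern containing SKETCH_ITEM_ANY_PHY_PORTS is rejected by legacy_check_pattern().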
> 
> >> 5) as in (4), a new attribute is not good from documentation
> >>     standpoint; please search for "represented_port = Y" in
> >>     documentation = this way, all supported items are
> >>     easily defined for various NIC vendors, but the
> >>     same isn't true for attributes = there is no
> >>     way to indicate supported attributes in doc.
> >>
> >> If points (1 - 5) make sense to you, then, if I may be so bold, I'd
> >> like to suggest that the idea of adding a new attribute be abandoned.
> >> Instead, I'd like to suggest adding new items:
> >>
> >> (the names are just sketch, for sure, it should be discussed)
> >>
> >> ANY_PHY_PORTS { switch_domain_id }
> >>   = match packets entering the embedded switch from *whatever*
> >>     physical ports belonging to the given switch domain
> >>
> > How many PHY_PORTS can one switch have, per your thought? Can I treat
> the PHY_PORTS as the { switch_domain_id } owner as transfer_proxy()?
> 
> A single physical NIC board is supposed to have a single embedded switch
> engine. Hence, if the NIC board has, in example, two or four physical ports,
> these will be the physical ports of the switch. That's it.
> 
> As for the transfer proxy, please see my explanations above.
> It's not *always* reliable to tell whether two given ethdevs belong to the same
> physical NIC board or not.
> 
> Switch domain ID is the right criterion (for applications).
> 
> >> ANY_GUEST_PORTS { switch_domain_id }
> >>   = match packets entering the embedded switch from *whatever*
> >>     guest ports (VFs, PFs, etc.) belonging to the given
> >>     switch domain
> >>
> >> The field "switch_domain_id" is required to tell one physical board /
> >> vendor from another (as I explained in point (3)).
> >> The application can query this parameter from ethdev's switch info:
> >> please see "struct rte_eth_switch_info".
> >>
> >> What's your opinion?
> >>
> > How can we handle ANY_PHY_PORTS/ ANY_GUEST_PORTS ' relationship
> with REPRESENTED_PORT if conflicts?
> > Need future tuning.
> 
> And if you carry on with "vf_orig" / "wire_orig" approach, you will inevitably
> have the very same problem: possible conflict with items like
> REPRESENTED_PORT. So does it matter? Yes, checks need to be done by PMDs
> when parsing patterns.
> 
> > Like I said before,  offloaded rules can't cross different NIC vendor'
> "switch_domain_id".
> > If user probes multiple NICs in one application, application should take care
> of packet forwarding.
> > Also application should be aware which ports belong to which NICs.
> 
> Yes, perhaps, domain ID is not needed in the new items.
> But the application still must keep track of switch domain IDs itself so it knows
> which rules to manage via which ethdevs.
> 
> Any other opinions?
ANY_PHY_PORTS / ANY_GUEST_PORTS look like a superset of ports.
This raises another challenge: "why can't we use REPRESENTED_PORT with a mask?" or "why not combine several REPRESENTED_PORT items together?"
> 
> >>>
> >>>>>
> >>>>>> For example, the diff below adds the attributes to "table"
> >>>>>> commands in testpmd but does not add them to regular (non-table)
> >>>>>> commands like "flow create". Why?
> >>>>>>
> >>>>>>>
> >>>>>
> >>>>> "table" command limits pattern_template to single direction or
> >>>>> bidirection
> >>>> per user specified attribute.
> >>>>
> >>>> As I say above, the same effect can be achieved by adding item
> >>>> REPRESENTED_PORT to the corresponding pattern template.
> >>> See above.
> >>>>
> >>>>> "rule" command must tight with one "table_id", so the rule will
> >>>>> inherit the
> >>>> "table" direction property, no need to specify again.
> >>>>
> >>>> You might've misunderstood. I do not talk about "rule" command
> >>>> coupled with some "table". What I talk about is regular, NON-async
> >>>> flow insertion commands.
> >>>>
> >>>> Please take a look at section "/* Validate/create attributes. */"
> >>>> in file "app/test-pmd/cmdline_flow.c". When one adds a new flow
> >>>> attribute, they should reflect it the same way as VC_INGRESS,
> >> VC_TRANSFER, etc.
> >>>>
> >>>> That's it.
> >>> We don't intend to pass this to sync API. The above code example is
> >>> for sync
> >> API.
> >>
> >> So I understand. But there's one slight problem: in your patch, you
> >> add the new attributes to the structure which is *shared* between
> >> sync and async use case scenarios. If one adds an attribute to this
> >> structure, they have to provide accessors for it in all sync-related
> >> commands in testpmd, but your patch does not do that.
> >>
> > Like the title said, "creating transfer table" is the ASYNC operation.
> > We have limited the scope of this patch. Sync API will be another story.
> > Maybe we can add one more sentence to emphasize async API again.
> 
> No-no-no. There might be slight misunderstanding. I understand that you are
> limiting the scope of your patch by saying this and this.
> That's OK. What I'm trying to point out is the fact that your patch nevertheless
> touches the COMMON part of the flow API which is shared between two
> approaches (sync and async).
Yeah, you are right: we should emphasize in the code and comments that it is for the async API, not the sync one.
> 
> Imagine a reader that does not know anything about the async approach.
> He just opens the file in vim and goes directly to struct rte_flow_attr.
> And, over there, he sees the new attribute "wire_orig". He then immediately
> assumes that these attributes can be used in testpmd. Now the reader opens
> testpmd and tries to insert a flow rule using the sync approach:
> 
> flow create priority 0 transfer vf_orig pattern / ... / end actions drop
> 

This statement is wrong.
If the user has no idea about the cmdline usage, he should rely on the "tab" hints rather than on guessing.

The "flow" command prefix is now split between sync and async, so the user may try any keyword combination.
He will get an "argument error" if the combination is invalid, unless he knows what he is doing.
Again: we should emphasize that it's for the async API only.

> And doing so will be a failure, because your patch does not add the new
> attribute keyword to sync flow rule syntax parser. That's it.
> 
> Once again, I should emphasize: the reader MAY know nothing about the async
> approach. But if the attribute is present in "struct rte_flow_attr", it
> immediately means that it is available everywhere. Both sync and async.
> 
> So, with this in mind, your attempt to limit the scope of the patch to async-only
> rules looks a little bit artificial. It's not correct from the *formal* standpoint.
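
The "shared structure" point can be sketched in a few lines of plain C. The types and functions below are hypothetical stand-ins, not the DPDK definitions; the only thing taken from the patch is that the table attributes embed the flow attributes as a "flow_attr" member (the testpmd diff writes to table.attr.flow_attr):

```c
#include <stdint.h>

/* Hypothetical stand-in for the common flow attributes. */
struct sketch_flow_attr {
	uint32_t transfer:1;
	uint32_t transfer_mode:2; /* the field proposed by the patch */
};

/* Hypothetical stand-in for the async table attributes. */
struct sketch_table_attr {
	struct sketch_flow_attr flow_attr; /* the very same type is embedded */
	uint32_t nb_flows;
};

/* Sync-style entry point: receives the common attributes directly. */
static uint32_t
sketch_sync_flow_create(const struct sketch_flow_attr *attr)
{
	/* Returns the mode this path was handed, to show it is reachable. */
	return attr->transfer_mode;
}

/* Async-style entry point: receives them wrapped in the table attributes. */
static uint32_t
sketch_async_table_create(const struct sketch_table_attr *table_attr)
{
	return table_attr->flow_attr.transfer_mode;
}
```

Both entry points can be handed a non-zero transfer_mode, because the field lives in the type they share. Limiting it to the async path is therefore a matter of documentation and validation, not something the structure itself enforces.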
> 
> >
> >> In other words, it is wrong to assume that "struct rte_flow_attr"
> >> only applies to async approach. It had been introduced long before
> >> the async flow design was added to DPDK. That's it.
> >>
> >>>>
> >>>> But, as I say, I still believe that the new attributes aren't needed.
> >>> I think we are not on the same page for now. Can we reach agreement
> >>> on the same matching criteria first?
> >>>>>
> >>>>>>> It helps to save underlayer memory also on insertion rate.
> >>>>>>
> >>>>>> Which memory? Host memory? NIC memory? Term "underlayer" is
> >> vague.
> >>>>>> I suggest that the commit message be revised to first explain how
> >>>>>> such memory is spent currently, then explain why this is not
> >>>>>> optimal and, finally, which way the patch is supposed to improve
> >>>>>> that. I.e. be more
> >>>> specific.
> >>>>>>
> >>>>>>>
> >>>>>
> >>>>> For large scalable rules, HW (depends on implementation) always
> >>>>> needs
> >>>> memory to hold the rules' patterns and actions, either from NIC or
> >>>> from
> >> host.
> >>>>> The memory footprint highly depends on "user rules' complexity",
> >>>>> also diff
> >>>> between NICs.
> >>>>> ~50% memory saving is expected if one-direction is cut.
> >>>>
> >>>> Regardless of this talk, this explanation should probably be
> >>>> present in the commit description.
> >>>>
> >>> This number may differ with different NICs or implementation. We
> >>> can't say
> >> it for sure.
> >>
> >> Not an exact number, of course, but a brief explanation of:
> >> a) what is wrong / not optimal in the current design;
> > Please check the commit log, transfer have the capability to match bi-
> direction traffic no matter what ports.
> >> b) how it is observed in customer deployments;
> > Customer have the requirements to save resources and their offloaded rules
> is direction aware.
> >> c) why the proposed patch is a good solution.
> > New attributes provide the way to remove one direction and save underlayer
> resource.
> > All of the above can be found in the commit log.
> 
> I understand all of that, but my point is, the existing commit message is way
> too brief. Yes, it mentions that SOME customers have SOME deployments, but
> it does not shed light on which specifics these deployments have. For example,
> back in the day, when items PORT_REPRESENTOR and REPRESENTED_PORT
> were added, the cover letter for that patch series provided details of
> deployment specifics (application: OvS, scenario: full offload rules).
> 
> So, it's always better to expand on such specifics so that the reader has full
> picture in their head and doesn't need to look elsewhere.
> Not all readers of the commit message will be happy to delve into our
> discussions on the mailing list to get the gist.
> 
Our approaches diverge. The pattern item approach will start another discussion thread, right?
We should reach a conclusion and reflect it in the commit changes and logs, so that it is easy for others to absorb.
> >
> >>
> >
> >>>>>
> >>>>>>> By default, the transfer domain is bi-direction, and no behavior
> changes.
> >>>>>>>
> >>>>>>> 1. Match wire origin only
> >>>>>>>  flow template_table 0 create group 0 priority 0 transfer wire_orig...
> >>>>>>> 2. Match vf origin only
> >>>>>>>  flow template_table 0 create group 0 priority 0 transfer vf_orig...
> >>>>>>>
> >>>>>>> Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
> >>>>>>> ---
> >>>>>>> app/test-pmd/cmdline_flow.c                 | 26
> +++++++++++++++++++++
> >>>>>>> doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
> >>>>>>> lib/ethdev/rte_flow.h                       |  9 ++++++-
> >>>>>>> 3 files changed, 36 insertions(+), 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/app/test-pmd/cmdline_flow.c
> >>>>>>> b/app/test-pmd/cmdline_flow.c index 7f50028eb7..b25b595e82
> >>>>>>> 100644
> >>>>>>> --- a/app/test-pmd/cmdline_flow.c
> >>>>>>> +++ b/app/test-pmd/cmdline_flow.c
> >>>>>>> @@ -177,6 +177,8 @@ enum index {
> >>>>>>>       TABLE_INGRESS,
> >>>>>>>       TABLE_EGRESS,
> >>>>>>>       TABLE_TRANSFER,
> >>>>>>> +     TABLE_TRANSFER_WIRE_ORIG,
> >>>>>>> +     TABLE_TRANSFER_VF_ORIG,
> >>>>>>>       TABLE_RULES_NUMBER,
> >>>>>>>       TABLE_PATTERN_TEMPLATE,
> >>>>>>>       TABLE_ACTIONS_TEMPLATE,
> >>>>>>> @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] =
> {
> >>>>>>>       TABLE_INGRESS,
> >>>>>>>       TABLE_EGRESS,
> >>>>>>>       TABLE_TRANSFER,
> >>>>>>> +     TABLE_TRANSFER_WIRE_ORIG,
> >>>>>>> +     TABLE_TRANSFER_VF_ORIG,
> >>>>>>>       TABLE_RULES_NUMBER,
> >>>>>>>       TABLE_PATTERN_TEMPLATE,
> >>>>>>>       TABLE_ACTIONS_TEMPLATE,
> >>>>>>> @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
> >>>>>>>               .next = NEXT(next_table_attr),
> >>>>>>>               .call = parse_table,
> >>>>>>>       },
> >>>>>>> +     [TABLE_TRANSFER_WIRE_ORIG] = {
> >>>>>>> +             .name = "wire_orig",
> >>>>>>> +             .help = "affect rule direction to transfer",
> >>>>>>
> >>>>>> This does not explain the "wire" aspect. It's too broad.
> >>>>>>
> >>>>>>> +             .next = NEXT(next_table_attr),
> >>>>>>> +             .call = parse_table,
> >>>>>>> +     },
> >>>>>>> +     [TABLE_TRANSFER_VF_ORIG] = {
> >>>>>>> +             .name = "vf_orig",
> >>>>>>> +             .help = "affect rule direction to transfer",
> >>>>>>
> >>>>>> This explanation simply duplicates such of the "wire_orig".
> >>>>>> It does not explain the "vf" part. Should be more specific.
> >>>>>>
> >>>>>>> +             .next = NEXT(next_table_attr),
> >>>>>>> +             .call = parse_table,
> >>>>>>> +     },
> >>>>>>>       [TABLE_RULES_NUMBER] = {
> >>>>>>>               .name = "rules_number",
> >>>>>>>               .help = "number of rules in table", @@ -8894,6
> >>>>>>> +8910,16 @@ parse_table(struct context *ctx, const struct token
> >>>>>>> +*token,
> >>>>>>>       case TABLE_TRANSFER:
> >>>>>>>               out->args.table.attr.flow_attr.transfer = 1;
> >>>>>>>               return len;
> >>>>>>> +     case TABLE_TRANSFER_WIRE_ORIG:
> >>>>>>> +             if (!out->args.table.attr.flow_attr.transfer)
> >>>>>>> +                     return -1;
> >>>>>>> +             out->args.table.attr.flow_attr.transfer_mode = 1;
> >>>>>>> +             return len;
> >>>>>>> +     case TABLE_TRANSFER_VF_ORIG:
> >>>>>>> +             if (!out->args.table.attr.flow_attr.transfer)
> >>>>>>> +                     return -1;
> >>>>>>> +             out->args.table.attr.flow_attr.transfer_mode = 2;
> >>>>>>> +             return len;
> >>>>>>>       default:
> >>>>>>>               return -1;
> >>>>>>>       }
> >>>>>>> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>>>>>> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>>>>>> index 330e34427d..603b7988dd 100644
> >>>>>>> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>>>>>> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >>>>>>> @@ -3332,7 +3332,8 @@ It is bound to
> >>>>>> ``rte_flow_template_table_create()``::
> >>>>>>>
> >>>>>>>   flow template_table {port_id} create
> >>>>>>>       [table_id {id}] [group {group_id}]
> >>>>>>> -       [priority {level}] [ingress] [egress] [transfer]
> >>>>>>> +       [priority {level}] [ingress] [egress]
> >>>>>>> +       [transfer [vf_orig] [wire_orig]]
> >>>>>>
> >>>>>> Is it correct? Shouldn't it rather be [transfer] [vf_orig]
> >>>>>> [wire_orig] ?
> >>>>>>
> >>>>>>>       rules_number {number}
> >>>>>>>       pattern_template {pattern_template_id}
> >>>>>>>       actions_template {actions_template_id} diff --git
> >>>>>>> a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> >>>>>>> a79f1e7ef0..512b08d817 100644
> >>>>>>> --- a/lib/ethdev/rte_flow.h
> >>>>>>> +++ b/lib/ethdev/rte_flow.h
> >>>>>>> @@ -130,7 +130,14 @@ struct rte_flow_attr {
> >>>>>>>        * through a suitable port. @see rte_flow_pick_transfer_proxy().
> >>>>>>>        */
> >>>>>>>       uint32_t transfer:1;
> >>>>>>> -     uint32_t reserved:29; /**< Reserved, must be zero. */
> >>>>>>> +     /**
> >>>>>>> +      * 0 means bidirection,
> >>>>>>> +      * 0x1 origin uplink,
> >>>>>>
> >>>>>> What does "uplink" mean? It's too vague. Hardly a good term.
> >>
> >> I believe this comment should be reworked, in case the idea of having
> >> an extra attribute persists.
> >>
> >>>>>>
> >>>>>>> +      * 0x2 origin vport,
> >>>>>>
> >>>>>> What does "origin vport" mean? Hardly a good term as well.
> >>
> >> I still believe this explanation is way too brief and needs to be
> >> reworked to provide more details, to define the use case for the attribute
> more specifically.
> >>
> >>>>>>
> >>>>>>> +      * N/A both set.
> >>>>>>
> >>>>>> What's this?
> >>
> >> The question stands.
> >>
> >>>>>>
> >>>>>>> +      */
> >>>>>>> +     uint32_t transfer_mode:2;
> >>>>>>> +     uint32_t reserved:27; /**< Reserved, must be zero. */
> >>>>>>> };
> >>>>>>>
> >>>>>>> /**
> >>>>>>> --
> >>>>>>> 2.27.0
> >>>>>>>
> >>>>>>
> >>>>>> Since the attributes are added to generic 'struct rte_flow_attr',
> >>>>>> non-table
> >>>>>> (synchronous) flow rules are supposed to support them, too. If
> >>>>>> that is indeed the case, then I'm afraid such proposal does not
> >>>>>> agree with the existing items PORT_REPRESENTOR and
> REPRESENTED_PORT.
> >> They
> >>>>>> do exactly the same thing, but they are designed to be way more
> >>>>>> generic. Why
> >>>> not use them?
> >>>>
> >>>> The question stands.
> >>>>
> >>>>>>
> >>>>>> Ivan
> >>>>>
> >>>>
> >>>> Ivan
> >>>
> >
> 
> Thank you.
  
Ivan Malov Sept. 15, 2022, 7:47 a.m. UTC | #11
Hi Rongwei,

On Thu, 15 Sep 2022, Rongwei Liu wrote:

> HI Ivan:
>
> BR
> Rongwei
>
>> -----Original Message-----
>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Sent: Wednesday, September 14, 2022 23:18
>> To: Rongwei Liu <rongweil@nvidia.com>
>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan
>> Darawsheh <rasland@nvidia.com>
>> Subject: RE: [PATCH v1] ethdev: add direction info when creating the transfer
>> table
>>
>> External email: Use caution opening links or attachments
>>
>>
>> Hi Rongwei,
>>
>> On Wed, 14 Sep 2022, Rongwei Liu wrote:
>>
>>> HI
>>>
>>> BR
>>> Rongwei
>>>
>>>> -----Original Message-----
>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>> Sent: Wednesday, September 14, 2022 15:32
>>>> To: Rongwei Liu <rongweil@nvidia.com>
>>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>>>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>>>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
>>>> Raslan Darawsheh <rasland@nvidia.com>
>>>> Subject: RE: [PATCH v1] ethdev: add direction info when creating the
>>>> transfer table
>>>>
>>>> Hi,
>>>>
>>>> On Wed, 14 Sep 2022, Rongwei Liu wrote:
>>>>
>>>>> HI
>>>>>
>>>>> BR
>>>>> Rongwei
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>>>> Sent: Tuesday, September 13, 2022 22:33
>>>>>> To: Rongwei Liu <rongweil@nvidia.com>
>>>>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>>>>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>>>>>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>>>>>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>>>>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
>>>>>> Raslan Darawsheh <rasland@nvidia.com>
>>>>>> Subject: RE: [PATCH v1] ethdev: add direction info when creating
>>>>>> the transfer table
>>>>>>
>>>>>> Hi Rongwei,
>>>>>>
>>>>>> PSB
>>>>>>
>>>>>> On Tue, 13 Sep 2022, Rongwei Liu wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> BR
>>>>>>> Rongwei
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>>>>>> Sent: Tuesday, September 13, 2022 00:57
>>>>>>>> To: Rongwei Liu <rongweil@nvidia.com>
>>>>>>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>>>>>>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
>>>>>>>> NBU-Contact- Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
>>>>>>>> Aman Singh <aman.deep.singh@intel.com>; Yuying Zhang
>>>>>>>> <yuying.zhang@intel.com>; Andrew Rybchenko
>>>>>>>> <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
>>>>>>>> <rasland@nvidia.com>
>>>>>>>> Subject: Re: [PATCH v1] ethdev: add direction info when creating
>>>>>>>> the transfer table
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Wed, 7 Sep 2022, Rongwei Liu wrote:
>>>>>>>>
>>>>>>>>> The transfer domain rule is able to match traffic wire/vf origin
>>>>>>>>> and it means two directions' underlayer resource.
>>>>>>>>
>>>>>>>> The point of fact is that matching traffic coming from some
>>>>>>>> entity like wire / VF has been long generalised in the form of
>> representors.
>>>>>>>> So, a flow rule with attribute "transfer" is able to match
>>>>>>>> traffic coming from either a REPRESENTED_PORT or from a
>>>> PORT_REPRESENTOR
>>>>>> (please find these items).
>>>>>>>>
>>>>>>>>>
>>>>>>>>> In customer deployments, they usually match only one direction
>>>>>>>>> traffic in single flow table: either from wire or from vf.
>>>>>>>>
>>>>>>>> Which customer deployments? Could you please provide detailed
>>>> examples?
>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> We saw a lot of customers' deployment like:
>>>>>>> 1. Match overlay traffic from wire and do decap, then send to
>>>>>>> specific
>>>> vport.
>>>>>>> 2. Match specific 5-tuples and do encap, then send to wire.
>>>>>>> The matching criteria has obvious direction preference.
>>>>>>
>>>>>> Thank you. My questions are as follows:
>>>>>>
>>>>>> In (1), when you say "from wire", do you mean the need to match
>>>>>> packets arriving via whatever physical ports rather than matching
>>>>>> packets arriving from some specific phys. port?
>>>>
>>>> ^^
>>>>
>>>> Could you please find my question above? Based on your understanding
>>>> of templates in async flow approach, an answer to this question may
>>>> help us find the common ground.
>>> It means traffic arrived from physical ports (transfer_proxy role) or south
>> band per your concept.
>>
>> Transfer proxy has nothing to do with physical ports. And I should stress
>> that "south band" and the likes are NOT my concepts. Instead, I think that
>> direction designations like "south" or "north" aren't applicable when talking
>> about the embedded switch and its flow (transfer) rules.
>>
>>> Traffic from vport (not transfer_proxy) or north band per your concept won't
>> hit even if same packets.
>>
>> Please see above. Transfer proxy is a completely different concept.
>> And I never used "north band" concept.
>>
>>>>
>>>> --
>>>>
>>>>>>
>>>>>> If, however, matching traffic "from wire" in fact means matching
>>>>>> packets arriving from a *specific* physical port, then for sure
>>>>>> item REPRESENTED_PORT should perfectly do the job, and the proposed
>>>>>> attribute is unneeded.
>>>>>>
>>>>>> (BTW, in DPDK, it is customary to use term "physical port", not
>>>>>> "wire")
>>>>>>
>>>>>> In (1), what are "vport"s? Please explain. Once again, I should
>>>>>> remind that, in DPDK, folks prefer terms "represented entity" /
>>>> "representor"
>>>>>> over vendor-specific terms like "vport", etc.
>>>>>>
>>>>> Vport is virtual port for short such as VF.
>>>>
>>>> Thanks. As I say, term "vport" might be confusing to some readers, so
>>>> it'd be better to provide this explanation (about VF) in the commit
>>>> description next time.
>>> Ack. Will add VF as an example.
>>>>
>>>>>> As for (2), imagine matching 5-tuple traffic emitted by a VF / guest.
>>>>>> Could you please explain, why not just add a match item
>>>>>> REPRESENTED_PORT pointing to that VF via its representor? Doing so
>>>>>> should perfectly define the exact direction / traffic source. Isn't
>>>>>> that
>>>> sufficient?
>>>>>>
>>>>> Per my view, there is matching field and matching value difference.
>>>>> Like IPv4 src_addr 1.1.1.1, 1.1.1.2. 1.1.1.3, will you treat it as
>>>>> same or
>>>> different matching criteria?
>>>>> I would like to call them same since it can be summarized like
>>>>> 1.1.1.0/30 REPRESENTED_PORT is just another matching item, no
>>>>> essential
>>>> differences and it can't stand for direction info.
>>>>
>>>> It looks like we're starting to run into disagreement here.
>>>> There's no "direction" at all. There's an embedded switch inside the
>>>> NIC, and there're (logical) switch ports that packets enter the switch from.
>>>>
>>>> When the user submits a "transfer" rule and provides neither
>>>> REPRESENTED_PORT nor PORT_REPRESENTOR in the pattern, the
>> embedded
>>>> switch is supposed to match packets coming from ANY ports, be it VFs
>>>> or physical (wire) ports.
>>>>
>>>> But when the user provides, in example, item REPRESENTED_PORT to
>>>> point to the physical (wire) port, the embedded switch knows exactly
>>>> which port the packets should enter it from.
>>>> In this case, it is supposed to match only packets coming from that
>>>> physical port. And this should be sufficient.
>>>> This in fact replaces the need to know a "direction".
>>>> It's just an exact specification of packet's origin.
>>>>
>>> There is traffic arriving or leaving the switch, so there is always direction,
>> implicit or explicit.
>>
>> This does not contradict my thoughts above. "Direction" is *defined* by two
>> points (like in geometry): an initial point (the switch port through which a
>> packet enters the switch) and the terminal point (the match engine inside the
>> switch). If one knows these two points, no extra hints are required to specify
>> some "direction". Because direction is already represented by this "vector" of
>> sorts. That's why presence of the port match item in the pattern is absolutely
>> sufficient.
> Good to see this. Thanks for the information.

You're very welcome.

> This update leverages the concept exactly defined by you: "an initial point (the switch port through which a
> packet enters the switch)"

No, it doesn't seem so. Based on your explanations, it appears that
this update tries to refer to a "super set" of ports which have
something in common. For example, with attribute "wire_orig"
you seem to be trying to request that the rule match packets
arriving from wire through ANY of the phys.ports. So my point
is: why express an obvious match item as an attribute?

For example, nobody tries to replace match item IPv4 with
an attribute "is_ipv4". That would be strange, to say the
least. Why should the "vf_orig" case be an exception then?

> If you think "direction" is not good, we can change to other words like "initial port"/"origin port" etc.

As I explained multiple times, "direction" is rather obscure from the
viewpoint located inside the embedded switch. Yes, on non-transfer (VNIC)
level, there are *exactly* two directions: ingress and egress.
But, inside of the embedded switch (transfer rules), there can
be *multiple* various "directions", which are not even
directions, = they're traffic PATHs in fact.

Renaming to "initial port" and "origin port" won't be helpful either
because, for users, it will be hard to figure out the difference
between the attribute and items PORT_REPRESENTOR / REPRESENTED_PORT.

If, however, you add new items instead of the attribute, the user
will likely see that the new items and the existing ones are
just alternative options = representor-based items help
to address exact ports (one rule - one port), whilst
your new items help to address super sets of ports
like "all wire ports" or "all guest ports".

So, the short of it:
1) these "wire_orig" / "vf_orig" are in fact yet another match criteria;
2) because of that, they should go to match items and not to attributes.
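
To illustrate why items are convenient here, below is a minimal sketch in plain C of how a PMD could tell the two port sets apart while parsing a pattern template, early enough to pre-allocate resources for only one side. All names are hypothetical: the SKETCH_ITEM_ANY_* values mirror the draft item names floated earlier in this thread, and nothing here is an existing rte_flow item or a vendor's PMD code. The sketch deliberately does not model how such items interact with REPRESENTED_PORT.

```c
/* Hypothetical stand-in for pattern item types; not the rte_flow enum. */
enum sketch_item_type {
	SKETCH_ITEM_END,
	SKETCH_ITEM_ETH,
	SKETCH_ITEM_IPV4,
	SKETCH_ITEM_ANY_PHY_PORTS,   /* draft name from this thread */
	SKETCH_ITEM_ANY_GUEST_PORTS, /* draft name from this thread */
};

/* Which set of switch ports a template may receive packets from. */
enum sketch_origin {
	SKETCH_ORIGIN_ANY,        /* no port-set item: all ports */
	SKETCH_ORIGIN_PHY_ONLY,   /* only physical (wire) ports */
	SKETCH_ORIGIN_GUEST_ONLY, /* only guest ports (VFs, PFs, etc.) */
	SKETCH_ORIGIN_INVALID,    /* both port-set items in one template */
};

/* Classify a template; the array must end with SKETCH_ITEM_END. */
static enum sketch_origin
sketch_classify_template(const enum sketch_item_type *items)
{
	int phy = 0;
	int guest = 0;

	for (; *items != SKETCH_ITEM_END; items++) {
		if (*items == SKETCH_ITEM_ANY_PHY_PORTS)
			phy = 1;
		else if (*items == SKETCH_ITEM_ANY_GUEST_PORTS)
			guest = 1;
	}
	if (phy && guest)
		return SKETCH_ORIGIN_INVALID;
	if (phy)
		return SKETCH_ORIGIN_PHY_ONLY;
	if (guest)
		return SKETCH_ORIGIN_GUEST_ONLY;
	return SKETCH_ORIGIN_ANY;
}
```

A template such as { ANY_PHY_PORTS, ETH, IPV4, END } classifies as SKETCH_ORIGIN_PHY_ONLY, so only the wire-side resources would need to be reserved for the table, while a template without either item stays SKETCH_ORIGIN_ANY and keeps today's behaviour.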

>>
>> However, based on your later explanations, the use of precise port item is
>> simply inconvenient in your use case because you are trying to match traffic
>> from *multiple* ports that have something in common (i.e. all VFs or all wire
>> ports).
>>
>> And, instead of adding a new item type which would serve exactly your needs,
>> you for some reason try to add an attribute, which has multiple drawbacks
>> which I described in my previous letter.
>>
>>> For transfer rules, there is a concept transfer_proxy.
>>> It takes the switch ownership; all switch rules should be configured via
>> transfer_proxy.
>>
>> Yes, such concept exists, but it's a don't care with regard to the problem that
>> we're discussing, sorry.
>> Furthermore, unlike "switch domain ID" (which is the same for all ethdevs
>> belonging to a given physical NIC board), nobody guarantees that it's only one
>> transfer proxy port. Some NIC vendors allow transfer rules to be added via
>> any ethdev port.
>>
> Does any flow rule leverage switchid already. Is it too obscure for end-user?

No, I'm not saying about flow rules. I'm explaining the logic which
application may use to identify which ethdevs are on which NICs.

Imagine a DPDK application which has two ethdevs instantiated:
one ethdev sits on top of the admin. PF (ethdev 0), the other
one sits on top of a low-privilege PF (ethdev 1).
In the latter case, it can also be a VF.

Both ethdev 0 and ethdev 1 belong to the same physical NIC board.

Now, what I'm trying to explain is the fact that "proxy"
behaviour may differ between various vendors:

- some vendors say that they can support managing "transfer" rules via
   any PFs / VFs. They do not require that some specific PF ethdev be
   used to do that. With such vendors, if the application makes a
   query "What's the proxy port ID for the ethdev 1?", it will
   get "The proxy port ID for ethdev 1 is 1" response.

- but other vendors cannot support the above workflow and they require
   that "transfer" rules be managed using some specific (admin) ethdev.
   If the application makes the same query here, it will get the
   following response: "The proxy port ID for ethdev 1 is 0".

So, given these explanations, it is incorrect to assume that
the proxy port ID for all ethdevs belonging to the same NIC
board will be the same. They simply may not be like this.

However, *regardless* of the two above scenarios and regardless
of vendor, for NICs which have embedded switch feature, when the
user tries to check the "switch domain ID" for ethdev 0 and
ethdev 1, they will get the same value. So, this should be
the right criterion for the application (not for flow
rules themselves) to decide which ethdev belongs to
which physical NIC board.
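
To make the difference concrete, here is a minimal sketch. The structure
and both helpers are hypothetical stand-ins, not DPDK types: in a real
application, the domain ID would come from the ethdev's switch info and
the proxy port ID from the transfer proxy query. The sketch only shows why
comparing proxy port IDs can give the wrong answer for the first kind of
vendor, whilst comparing switch domain IDs cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-ethdev record; NOT a DPDK structure. */
struct ethdev_view {
    uint16_t port_id;          /* ethdev port ID */
    uint16_t switch_domain_id; /* same for all ethdevs on one NIC board */
    uint16_t proxy_port_id;    /* vendor-specific answer to the proxy query */
};

/* Reliable criterion: ethdevs on the same board share the domain ID. */
bool
same_nic_by_domain(const struct ethdev_view *a, const struct ethdev_view *b)
{
    assert(a != NULL && b != NULL);
    return a->switch_domain_id == b->switch_domain_id;
}

/* Unreliable criterion: proxy port IDs may differ on the same board. */
bool
same_nic_by_proxy(const struct ethdev_view *a, const struct ethdev_view *b)
{
    assert(a != NULL && b != NULL);
    return a->proxy_port_id == b->proxy_port_id;
}
```

With a vendor that lets every ethdev manage "transfer" rules, ethdev 0
and ethdev 1 report proxy IDs 0 and 1, so the second helper wrongly
concludes that they sit on different boards.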

>>>
>>> Image a logic switch with one PF and two VFs.
>>> PF is the transfer proxy and VF belongs to the PF logically.
>>> When receiving traffic from PF, we can say it comes into the logic switch.
>>
>> That's correct.
>>
>>> When packet sent from VF (VF belongs to PF), so we can say traffic leaves
>> the switch.
>>
>> That's not correct. Traffic sent from VF (for example, a guest VM is sending
>> packets) also *enters* the switch. PFs and VFs are in fact *separate* logical
>> ports of the embedded switch.
>>
>>>
>>> Item REPRESENTED_PORT indicates switch to match traffic sent from which
>> port, comes into, or leave switch.
>>
>> That is not correct either. Item REPRESENTED_PORT tells the switch to match
>> packets which come into the switch FROM the logical port which is
>> represented by the given DPDK ethdev.
>>
>> For example, if ethdev="E" is the *main* PF which is bound to physical port "P",
>> then item REPRESENTED_PORT with ethdev ID being set to "E" tells the switch
>> that only packets coming to NIC from *wire* via physical port "P" should match.
>>
>>> We can say it as one kind of packet metadata.
>>
>> Kind of yes, but might be vendor-specific. No need to delve into this.
>>
>>> Like you said, DPDK always treat transfer to match any PORTs traffic.
>>
>> Slight correction: it treats it this way until it sees an exact port item.
>> If the user provides REPRESENTED_PORT (or PORT_REPRESENTOR), it's no
>> longer *any* ports traffic, it's an exact port traffic. That's it.
>>
>>> When REPRESENTED_PORT is specified, the rules are limited to some
>> dedicated PORTs.
>>
>> These rules match only packets arriving TO the embedded switch FROM the
>> said dedicated ports.
>>
>>> Other PORTs are ignored because metadata mismatching.
>>
>> Kind of yes, correct.
>>
>>> Rules still have the capability to match ANY PORTS if metadata matched.
>>
>> This statement is only correct for the cases when the user uses
>> neither item REPRESENTED_PORT nor item PORT_REPRESENTOR.
>>
>>>
>>> This update will allow user to cut the other PORTs matching capabilities.
>>
>> As I explained, this is exactly what items PORT_REPRESENTOR and
>> REPRESENTED_PORT do. No need to have an extra attribute.
>>
>> If the user adds item REPRESENTED_PORT with ethdev_id="E", like in the
>> above example, to match packets entering NIC via the physical port "P", then
>> this rule will NOT match packets entering NIC from other points. For example,
>> packets transmitted by a virtual machine via a VF will not match in this case.
>>
>>>>> Port id depends on the attach sequence.
>>>>
>>>> Unfortunately, this is hardly a good argument because flow rules are
>>>> supposed to be inserted based on the run-time packet learning. Attach
>>>> sequence is a don't care here.
>>>>
>>>>>> Also please mind that, although I appreciate your explanations
>>>>>> here, on the mailing list, they should finally be added to the
>>>>>> commit message, so that readers do not have to look for them elsewhere.
>>>>>>
>>>>> We have explained the high possibility of single-direction matching, right?
>>>>
>>>> Not quite. As I said, it is not correct to assume any "direction",
>>>> like in geographical sense ("north", "south", etc.). Application has
>>>> ethdevs, and they are representors of some "virtual ports" (in your
>>>> terminology) belonging to the switch, for example, VFs, SFs or physical
>> ports.
>>>>
>>>> The user adds an appropriate item to the pattern (REPRESENTED_PORT),
>>>> and doing so specifies the path by which the packet enters the switch.
>>>>
>>>>> It' hard to list all the possibilities of traffic matching preferences.
>>>>
>>>> And let's say more: one need never do this. That's exactly the reason
>>>> why DPDK has abandoned the concept of "direction" in *transfer* rules
>>>> and switched to the use of precise criteria (REPRESENTED_PORT, etc.).
>>>>
>>> As far as I know, DPDK changes "transfer ingress" to "transfer", so it' more
>> clear that transfer can match both directions (both ingress and egress).
>>
>> Not quite. DPDK has abandoned the use of "ingress / egress" in "transfer"
>> rules because "ingress" and "egress" are only applicable on the VNIC level. For
>> example, there is a PF attached to DPDK application:
>> packets that the application receives through this ethdev, are ingress, and
>> packets that it transmits (tx_burst) are egress.
>>
>> I can explain in other words. Imagine yourself standing *inside* a room which
>> only has one door. When someone enters the room, it's "ingress", when
>> someone leaves, it's "egress". It's relative to your viewpoint.
>> In this example, such a room represents a VNIC / ethdev.
>>
>> And now imagine yourself standing *outside* of another room / auditorium
>> which has multiple doors / exits. You're standing near some particular exit "A"
>> (VNIC / ethdev), but people may enter this room via another door "B" and then
>> leave it via yet another door "C". In this case, from your viewpoint, this traffic
>> cannot be considered either ingress or egress. Because these people do not
>> approach you.
>>
>> Like in this example, embedded switch is like a large auditorium with many-
>> many doors / exits. And there can be many-many
>> directions: packet can enter the switch via phys. port "P1"
>> and then leave it via another phys. port "P2". Or it can enter the switch via
>> phys. port and then leave it via VF's logical port (to be delivered to a guest
>> machine), or a packet can travel from one VF to another one.
>>
>> There's no PRE-DEFINED direction like "north to south" or "east to west".
>> And this explains why it's very undesirable to use term "direction".
>>
>>> REPRESENTED_PORT is the evolution of "port_id", I think, it' only one kind of
>> matching items.
>>
>> Yes. But nobody prevents you from defining yet another match item which will
>> be able to refer to a *group* of ports which have something in common (i.e.
>> "all guest ports of this switch"
>> pointing to all logical ports currently attached to virtual machines / guests, or
>> "all wire ports of this switch").
>>
>>>
>>> For large scale deployment like 10M rules, if we can save resources
>> significantly by introducing direction, why not?
>>
>> I do not deny the fact that you have a use case where resources can be saved
>> significantly if you give the PMD some extra knowledge when creating a flow
>> table / pattern template. That's totally OK. What I object is the very
>> implementation and the use of term "direction". If you add new item types
>> (like above), then, when you create an async table 1 pattern template, you will
>> have item ANY_WIRE_PORTS, and, for table 2 pattern template, you'll have
>> item ANY_GUEST_PORTS.
>> As you see, the two pattern templates now differ because the match criteria
>> use different items.
>>
>>>
>>> Again, async API:
>>> 1. pattern template A
>>> 2. action template B
>>> 3. table C with pattern template A + action template B.
>>> 4. rule D, E, F...
>>> The specified REPRESENTED_PORT is provided in rules (D, E, F...) not pattern
>> template A or action template B or table C.
>>> Resources may be allocated early at step 3 since table' rule_nums property.
>>
>> No, item REPRESENTED_PORT *can* be provided inside pattern template A,
>> but, as you pointed out earlier, the problem is that you can't distinguish
>> different pattern templates which have this item, because pattern templates
>> know nothing about *exact* port IDs and only know item MASKS. Yes, I agree
>> that in your case such problem exists, but, as I say above, it can be solved by
>> adding new item types: one for referring to all phys. ports of a given NIC and
>> another one for pointing to a group of current guest users (VFs).
>>
>>>>> The underlay is the one we have met for now.
>>>>>>>
>>>>>>>>> Introduce one new member transfer_mode into rte_flow_attr to
>>>>>>>>> indicate the flow table direction property: from wire, from vf
>>>>>>>>> or bi-direction(default).
>>>>>>>>
>>>>>>>> AFAIK, 'rte_flow_attr' serves both traditional flow rule
>>>>>>>> insertion and asynchronous (table) approach. The patch adds the
>>>>>>>> attributes to generic 'rte_flow_attr' but, for some reason, ignores non-
>> table rules.
>>>>>>>>
>>>>>>>>>
>>>>>>> Sync API uses one rule to contain everything. It' hard for PMD to
>>>>>>> determine
>>>>>> if this rule has direction preference or not.
>>>>>>> Image a situation, just for an example:
>>>>>>> 1. Vport 1 VxLAN do decap send to vport 2.     1 million scale
>>>>>>> 2. Vport 0 (wire) VxLAN do decap send to vport 3.   1 hundred scale.
>>>>>>> 1 and 2 share the same matching conditions (eth / ipv4 / udp /
>>>>>>> vxlan /...), so
>>>>>> sync API consider them share matching determination logic.
>>>>>>> It means "2" have 1M scale capability too. Obviously, it wastes a
>>>>>>> lot of
>>>>>> resources.
>>>>>>
>>>>>> Strictly speaking, they do not share the same match pattern.
>>>>>> Your example clearly shows that, in (1), the pattern should request
>>>>>> packets coming from "vport 1" and, in (2), packets coming from "vport 0".
>>>>>>
>>>>>> My point is simple: the "vport" from which packets enter the
>>>>>> embedded switch is ALSO a match criterion. If you accept this,
>>>>>> you'll see: the matching conditions differ.
>>>>>>
>>>>> See above.
>>>>> In this case, I think the matching fields are both "port_id +
>>>>> ipv4_vxlan". They
>>>> are same.
>>>>> Only differs with values like vni 100 or 200 vice versa.
>>>>
>>>> Not quite. Look closer: you use *different* port IDs for (1) and (2).
>>>> The value of "ethdev_id" field in item REPRESENTED_PORT differs.
>>>>
>>>>>>>
>>>>>>> In async API, there is pattern_template introduced. We can mark "1"
>>>>>>> to use
>>>>>> pattern_tempate id 1 and "2" to use pattern_template 2.
>>>>>>> They will be separated from each other, don't share anymore.
>>>>>>
>>>>>> Consider an example. "Wire" is a physical port represented by PF0
>>>>>> which, in turn, is attached to DPDK via ethdev 0. "VF" (vport?) is
>>>>>> attached to guest and is represented by a representor ethdev 1 in DPDK.
>>>>>>
>>>>>> So, some rules (template 1) are needed to deliver packets from "wire"
>>>>>> to "VF" and also decapsulate them. And some rules (template 2) are
>>>>>> needed to deliver packets in the opposite direction, from "VF"
>>>>>> to "wire" and also encapsulate them.
>>>>>>
>>>>>> My question is, what prevents you from adding match item
>>>>>> REPRESENTED_PORT[ethdev_id=0] to the pattern template 1 and
>>>>>> REPRESENTED_PORT[ethdev_id=1] to the pattern template 2?
>>>>>>
>>>>>> As I said previously, if you insert such item before eth / ipv4 /
>>>>>> etc to your match pattern, doing so defines an *exact* direction / source.
>>>>>>
>>>>> Could you check the async API guidance? I think pattern template
>>>>> focusing
>>>> on the matching field (mask).
>>>>> "REPRESENTED_PORT[ethdev_id=0] " and
>>>> "REPRESENTED_PORT[ethdev_id=1] "are the same.
>>>>> 1. pattern  template:  REPRESENTED_PORT mask 0xffff ...
>>>>> 2. action template: action1 / actions2. / 3. table create with
>>>>> pattern_template plus action template..
>>>>> REPRESENTED_PORT[ethdev_id=0]  will be rule1:  rule create
>>>> REPRESENTED_PORT port_id is 0 / actions ....
>>>>> REPRESENTED_PORT[ethdev_id=1]  will be rule2:  rule create
>>>> REPRESENTED_PORT port_id is 1 / actions ....
>>>>
>>>> OK, so, based on this explanation, it appears that you might be
>>>> looking to refer
>>>> to:
>>>> a) a *set* of any physical (wire) ports
>>>> b) a *set* of any guest ports (VFs)
>>>>
>>> Great, looks we are more and more closer to the agreement.
>>
>> Looks so.
>>
>>>> You chose to achieve this using an attribute, but:
>>>>
>>>> 1) as I explained above, the use of term "direction" is wrong;
>>>>     please hear me out: I'm not saying that your use case and
>>>>     your optimisation is wrong: I'm saying that naming for it
>>>>     is wrong: it has nothing to do with "direction";
>>>>
>>> Do you have any better naming proposal?
>>
>> As I said, what you are trying to achieve using a new attribute would be way
>> better to achieve using new pattern items which can be easily told one from
>> another in PMD when pre-allocating resources for different async flow tables.
>>
>> So, I don't have any proposal for *attribute* naming.
>> What I propose is to consider new items instead.
>>
>>>> 2) while naming a *set* of wire ports as "wire_orig" might be OK,
>>>>     sticking with term "vf_orig" for a *set* of guest ports is
>>>>     clearly not, simply because the user may pass another PF
>>>>     to a guest instead of passing a VF; in other words,
>>>>     a better term is needed here;
>>>>
>>> Like you said, vport may contain VF, SF etc. vport_orgin is on the logic switch
>> perspective.
>>> Any proposal is welcome.
>>
>> The problem is, vport can be easily confused with a slightly more generic
>> "lport" (embedded switch's "logical port"), and, logical ports, in turn, are not
>> confined to just VFs or PFs. For example, physical (wire) ports are ALSO logical
>> ports of the switch.
>>
>>>> 3) since it is possible to plug multiple NICs to a DPDK application,
>>>>     even from different vendors, the user may end up having multiple
>>>>     physical ports belonging to different physical NICs attached to
>>>>     the application; if this is the case, then referring to a *set*
>>>>     of wire ports using the new attribute is ambiguous in the
>>>>     sense that it's unclear whether this applies only to
>>>>     wire ports of some specific physical NIC or to the
>>>>     physical ports of *all* NICs managed by the app;
>>>>
>>> Not matter how many NICs has been probed by the DPDK, there is always
>> switch/PF/VF/SF.. concept.
>>
>> Correct.
>>
>>> Each switch must have an owner identified by transfer_proxy(). Vport (VF/SF)
>> can't cross switch in normal case.
>>
>> No. That is not correct. This is tricky, but please hear me out: an individual NIC
>> board (that is, a given *switch*) is identified only by its switch domain ID. As I
>> explained above, "transfer proxy" is just a technical hint for the application to
>> indicate an ethdev through which "transfer" rules must be managed. Not all
>> vendors support this concept (and they are not obliged to support it).
>>
>>> The traffic comes from one NIC can't be offloaded by other NICs unless
>> forwarded by the application.
>>
>> Right, but forwarding in software (inside DPDK application) is out of scope with
>> regard to the problem that we're discussing.
>>
>>> If user use new attribute to cut one side resource, I think user is smart
>> enough to management the rules in different NICs.
>>
>> As I explained above, I do not deny the existence of the problem that your
>> patch is trying to solve. Now it looks like we're on the same page with regard
>> to understanding the fact that what you're trying to do is to introduce a match
>> criterion that would refer to a GROUP of similar ports. In my opinion, this is
>> not an *attribute*, it's a *match criterion*, and it should be implemented as
>> two new items.
>>
>> Having two different item types would perfectly fit the need to know the
>> difference between such "directions" (as per your terminology) early enough,
>> when parsing templates.
>>
>>> No default behavior changed with this update.
>>>
>>>> 4) adding an attribute instead of yet another pattern item type
>>>>     is not quite good because PMDs need to be updated separately
>>>>     to detect this attribute and throw an error if it's not
>>>>     supported, whilst with a new item type, the PMDs do not
>>>>     need to be updated = if a PMD sees an unsupported item
>>>>     while traversing the item with switch () { case }, it
>>>>     will anyway throw an error;
>>>>
>>> PMD also need to check if it supports new matching item or not, right?
>>> We can't assume NIC vendor' PMD implementation, right?
>>
>> No-no-no. Imagine a PMD which does not support "transfer" rules.
>> In such PMD, in the flow parsing function one would have:
>>
>> if (!!attr->transfer) {
>>      print_error("Transfer is not supported");
>>      return EINVAL;
>> }
>>
>> If you add a new attribute, then PMDs which are NOT going to support it need
>> to be updated to add similar check.
>> Otherwise, they will simply ignore presence / absence of the attribute in the
>> rule, and validation result will be unreliable.
>>
>> Yes, if this attribute is 0x0, then indeed behaviour does not change. But what if
>> it's 0x1 or 0x2?
>> PMDs that do not support these values must somehow reject such rules on
>> parsing.
>>
>> However, this problem does not manifest itself when parsing items. Typically, in
>> a PMD, one would have:
>>
>> switch (item->type) {
>>      case RTE_FLOW_ITEM_TYPE_VOID:
>>          break;
>>
>>      case RTE_FLOW_ITEM_TYPE_ETH:
>>          /* blah-blah-blah */
>>          break;
>>
>>      default:
>>          return ENOTSUP;
>> }
> Are you assuming all PMDs will be implemented in the upper style?

One may take a look at the existing PMDs. It's open source after all.

When one has an array of items of unknown count which is
END-terminated, then, obviously, the PMD has to traverse
it one way or another. If it stumbles upon an unknown
item, it will have nothing to do but to throw an error.
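
As a rough illustration of that traversal, here is a self-contained
sketch. The enum and the struct are hypothetical stand-ins for the flow
item types (they are not the DPDK definitions), and ITEM_ANY_GUEST_PORTS
is only the sketch name used in this discussion. The point is that the
"default" branch rejects the new item without the parser having been
touched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical item types; stand-ins, NOT the DPDK item enum. */
enum item_type {
    ITEM_END = 0,
    ITEM_VOID,
    ITEM_ETH,
    ITEM_IPV4,
    ITEM_ANY_GUEST_PORTS, /* a new item this PMD knows nothing about */
};

struct item {
    enum item_type type;
};

/* Walk an END-terminated pattern; 0 if supported, -ENOTSUP otherwise. */
int
pmd_parse_pattern(const struct item *pattern)
{
    assert(pattern != NULL);

    for (; pattern->type != ITEM_END; pattern++) {
        switch (pattern->type) {
        case ITEM_VOID:
            break;
        case ITEM_ETH:
        case ITEM_IPV4:
            /* A real PMD would translate the item to HW fields here. */
            break;
        default:
            /* Unknown item: nothing to do but to throw an error. */
            return -ENOTSUP;
        }
    }
    return 0;
}
```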

> This new field targets async API which was added recently. No impact on sync API.

Rongwei, I see your point. The problem with it, however, is that even
if you describe it in comments, the code won't prevent the non-async API
from seeing this attribute in "struct rte_flow_attr".

As I say, "struct rte_flow_attr" has been here for ages.
When one adds a flow rule in a sync way, they fill out
the very same structure. And the user may set this new
argument to non-zero by mistake. Yes, you may argue
that the app developer should be smart enough to
read your comment before the struct member which
says that this field is for async only. Right.
But that's not the only scenario. The field may
become non-zero because of some other mistake in
the program which, for example, leads to the
struct memory being corrupted in one way or
another. That's why the PMD has to validate flow rules...

So, the PMD must detect this inconsistency somehow and throw an error.
With your approach (attribute), the PMDs have to be updated to have
these checks. With the item approach that I suggest, updating the
PMDs is obviously not needed. Am I missing something? Let's discuss.
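
To show what I mean by "these checks", here is a minimal sketch. The
structure is a hypothetical, cut-down model of the flow attributes with a
"transfer_mode" member as in the patch; it is not the real
"struct rte_flow_attr", and the function names are made up. The first
validator stands for a PMD written before the new field existed; the
second one is what each such PMD would have to gain.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the attributes; NOT the real struct rte_flow_attr. */
struct flow_attr {
    uint32_t group;
    uint32_t transfer:1;
    uint32_t transfer_mode:2; /* 0 = bi-direction (default), 1 / 2 = one origin */
};

/* Validator written before the new field existed: never reads transfer_mode. */
int
legacy_validate_attr(const struct flow_attr *attr)
{
    assert(attr != NULL);
    if (attr->transfer)
        return -ENOTSUP; /* this PMD does not support transfer rules */
    return 0;
}

/* The extra check that every non-supporting PMD would have to add. */
int
updated_validate_attr(const struct flow_attr *attr)
{
    assert(attr != NULL);
    if (attr->transfer_mode != 0)
        return -ENOTSUP;
    return legacy_validate_attr(attr);
}
```

If the field becomes non-zero by mistake in a rule that does not even
request "transfer", the legacy validator returns success, which is
exactly the unreliable validation result I am talking about.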

> I don't predict any effort on the existing PMD behavior.

I see your point. But how is this expressed in code?
As I explain above, consistency checks are what
flow validate API is for. New argument means
new checks. That's it.

> But agree with you: we should emphasize it's only for async mode.

It's better to express this in code. So that the problem (if any)
can be detected programmatically and not just from reading comments.
From my point of view, the easiest way to have this done is to
add items instead of attributes, = no need to update PMDs.

>
>>
>> So, if you introduce two new item types to solve your problem, then you won't
>> have to update existing PMDs. If the vendor wants to support the new items
>> (say, MLX or SFC), they'll update their code to accept the items. But other
>> vendors will not do anything. If the user tries to pass such an item to a vendor
>> which doesn't support the feature, the "default" case will just throw an error.
>>
>> This is what I mean when pointing out such difference between adding an
>> attribute VS adding new item types.
>>
>>>> 5) as in (4), a new attribute is not good from documentation
>>>>     standpoint; please search for "represented_port = Y" in
>>>>     documentation = this way, all supported items are
>>>>     easily defined for various NIC vendors, but the
>>>>     same isn't true for attributes = there is no
>>>>     way to indicate supported attributes in doc.
>>>>
>>>> If points (1 - 5) make sense to you, then, if I may be so bold, I'd
>>>> like to suggest that the idea of adding a new attribute be abandoned.
>>>> Instead, I'd like to suggest adding new items:
>>>>
>>>> (the names are just sketch, for sure, it should be discussed)
>>>>
>>>> ANY_PHY_PORTS { switch_domain_id }
>>>>   = match packets entering the embedded switch from *whatever*
>>>>     physical ports belonging to the given switch domain
>>>>
>>> How many PHY_PORTS can one switch have, per your thought? Can I treat
>> the PHY_PORTS as the { switch_domain_id } owner as transfer_proxy()?
>>
>> A single physical NIC board is supposed to have a single embedded switch
>> engine. Hence, if the NIC board has, in example, two or four physical ports,
>> these will be the physical ports of the switch. That's it.
>>
>> As for the transfer proxy, please see my explanations above.
>> It's not *always* reliable to tell whether two given ethdevs belong to the same
>> physical NIC board or not.
>>
>> Switch domain ID is the right criterion (for applications).
>>
>>>> ANY_GUEST_PORTS { switch_domain_id }
>>>>   = match packets entering the embedded switch from *whatever*
>>>>     guest ports (VFs, PFs, etc.) belonging to the given
>>>>     switch domain
>>>>
>>>> The field "switch_domain_id" is required to tell one physical board /
>>>> vendor from another (as I explained in point (3)).
>>>> The application can query this parameter from ethdev's switch info:
>>>> please see "struct rte_eth_switch_info".
>>>>
>>>> What's your opinion?
>>>>
>>> How can we handle ANY_PHY_PORTS/ ANY_GUEST_PORTS ' relationship
>> with REPRESENTED_PORT if conflicts?
>>> Need future tuning.
>>
>> And if you carry on with "vf_orig" / "wire_orig" approach, you will inevitably
>> have the very same problem: possible conflict with items like
>> REPRESENTED_PORT. So does it matter? Yes, checks need to be done by PMDs
>> when parsing patterns.
>>
>>> Like I said before,  offloaded rules can't cross different NIC vendor'
>> "switch_domain_id".
>>> If user probes multiple NICs in one application, application should take care
>> of packet forwarding.
>>> Also application should be aware which ports belong to which NICs.
>>
>> Yes, perhaps, domain ID is not needed in the new items.
>> But the application still must keep track of switch domain IDs itself so it knows
>> which rules to manage via which ethdevs.
>>
>> Any other opinions?
> ANY_PHY_PORTS/ ANY_GUEST_PORTS looks like a super set of ports.

So does the new attribute, doesn't it?

> This will come another challenge: "why can't we use REPRESENTED_PORT  with mask" or "combine several REPRESENTED_PORT together"?

This problem has been here for many other items, including now deprecated
items PF, VF and PHY_PORT. Yes, theoretically, when the PMD looks through
the pattern, it has to check that its items do not overlap / contradict.
That's kind of OK, isn't it? The PMD has to check things after all...

For example, no one prevents user from submitting a pattern
with several adjacent items ETH in it. The PMD is supposed
to turn such request down.
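
Such a check could look roughly like the sketch below. Again, the enum
and the struct are hypothetical stand-ins rather than DPDK definitions,
and the function name is made up; a real PMD would normally fold this
into its main pattern parser. The same kind of check would cover a
conflict between a "group of ports" item and item REPRESENTED_PORT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical item types; stand-ins, NOT the DPDK item enum. */
enum item_type {
    ITEM_END = 0,
    ITEM_VOID,
    ITEM_ETH,
    ITEM_IPV4,
};

struct item {
    enum item_type type;
};

/* Reject a pattern in which two ETH items follow one another. */
int
pmd_reject_adjacent_eth(const struct item *pattern)
{
    enum item_type prev = ITEM_END; /* nothing seen yet */

    assert(pattern != NULL);

    for (; pattern->type != ITEM_END; pattern++) {
        if (pattern->type == ITEM_VOID)
            continue; /* VOID items carry no match criteria */
        if (pattern->type == ITEM_ETH && prev == ITEM_ETH)
            return -EINVAL;
        prev = pattern->type;
    }
    return 0;
}
```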

>>
>>>>>
>>>>>>>
>>>>>>>> For example, the diff below adds the attributes to "table"
>>>>>>>> commands in testpmd but does not add them to regular (non-table)
>>>>>>>> commands like "flow create". Why?
>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> "table" command limits pattern_template to single direction or
>>>>>>> bidirection
>>>>>> per user specified attribute.
>>>>>>
>>>>>> As I say above, the same effect can be achieved by adding item
>>>>>> REPRESENTED_PORT to the corresponding pattern template.
>>>>> See above.
>>>>>>
>>>>>>> "rule" command must tight with one "table_id", so the rule will
>>>>>>> inherit the
>>>>>> "table" direction property, no need to specify again.
>>>>>>
>>>>>> You might've misunderstood. I do not talk about "rule" command
>>>>>> coupled with some "table". What I talk about is regular, NON-async
>>>>>> flow insertion commands.
>>>>>>
>>>>>> Please take a look at section "/* Validate/create attributes. */"
>>>>>> in file "app/test-pmd/cmdline_flow.c". When one adds a new flow
>>>>>> attribute, they should reflect it the same way as VC_INGRESS,
>>>> VC_TRANSFER, etc.
>>>>>>
>>>>>> That's it.
>>>>> We don't intend to pass this to sync API. The above code example is
>>>>> for sync
>>>> API.
>>>>
>>>> So I understand. But there's one slight problem: in your patch, you
>>>> add the new attributes to the structure which is *shared* between
>>>> sync and async use case scenarios. If one adds an attribute to this
>>>> structure, they have to provide accessors for it in all sync-related
>>>> commands in testpmd, but your patch does not do that.
>>>>
>>> Like the title said, "creating transfer table" is the ASYNC operation.
>>> We have limited the scope of this patch. Sync API will be another story.
>>> Maybe we can add one more sentence to emphasize async API again.
>>
>> No-no-no. There might be slight misunderstanding. I understand that you are
>> limiting the scope of your patch by saying this and this.
>> That's OK. What I'm trying to point out is the fact that your patch nevertheless
>> touches the COMMON part of the flow API which is shared between two
>> approaches (sync and async).
> Yeah, you are right, we should emphasize it for async API not sync in the code and comments.
>>
>> Imagine a reader that does not know anything about the async approach.
>> He just opens the file in vim and goes directly to struct rte_flow_attr.
>> And, over there, he sees the new attribute "wire_orig". He then immediately
>> assumes that these attributes can be used in testpmd. Now the reader opens
>> testpmd and tries to insert a flow rule using the sync approach:
>>
>> flow create priority 0 transfer vf_orig pattern / ... / end actions drop
>>
>
> This is wrong statement.
> If user has no idea with cmdline usage, he should rely on "tab indication' not something by guessing.
>
> The command prefix "flow" bifurcated now to sync and async now, user may use any keyword combinations.
> He will get "argument error" if it's not good unless he knows what' he is doing.
> Again:  we should emphasize it's only for async API only.

OK, even if this example is not good enough, I still believe that
it is not right to introduce new match criteria in the form of
rule attributes. Match criteria belong in the pattern.

>
>> And doing so will be a failure, because your patch does not add the new
>> attribute keyword to sync flow rule syntax parser. That's it.
>>
>> Once again, I should emphasize: the reader MAY know nothing about the async
>> approach. But if the attribute is present in "struct rte_flow_attr", it
>> immediately means that it is available everywhere. Both sync and async.
>>
>> So, with this in mind, your attempt to limit the scope of the patch to async-only
>> rules looks a little bit artificial. It's not correct from the *formal* standpoint.
>>
>>>
>>>> In other words, it is wrong to assume that "struct rte_flow_attr"
>>>> only applies to async approach. It had been introduced long before
>>>> the async flow design was added to DPDK. That's it.
>>>>
>>>>>>
>>>>>> But, as I say, I still believe that the new attributes aren't needed.
>>>>> I think we are not at the same page for now. Can we reach agreement
>>>>> on the same matching criteria first?
>>>>>>>
>>>>>>>>> It helps to save underlayer memory also on insertion rate.
>>>>>>>>
>>>>>>>> Which memory? Host memory? NIC memory? Term "underlayer" is
>>>> vague.
>>>>>>>> I suggest that the commit message be revised to first explain how
>>>>>>>> such memory is spent currently, then explain why this is not
>>>>>>>> optimal and, finally, which way the patch is supposed to improve
>>>>>>>> that. I.e. be more
>>>>>> specific.
>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> For large scalable rules, HW (depends on implementation) always
>>>>>>> needs
>>>>>> memory to hold the rules' patterns and actions, either from NIC or
>>>>>> from
>>>> host.
>>>>>>> The memory footprint highly depends on "user rules' complexity",
>>>>>>> also diff
>>>>>> between NICs.
>>>>>>> ~50% memory saving is expected if one-direction is cut.
>>>>>>
>>>>>> Regardless of this talk, this explanation should probably be
>>>>>> present in the commit description.
>>>>>>
>>>>> This number may differ with different NICs or implementation. We
>>>>> can't say
>>>> it for sure.
>>>>
>>>> Not an exact number, of course, but a brief explanation of:
>>>> a) what is wrong / not optimal in the current design;
>>> Please check the commit log, transfer have the capability to match bi-
>> direction traffic no matter what ports.
>>>> b) how it is observed in customer deployments;
>>> Customer have the requirements to save resources and their offloaded rules
>> is direction aware.
>>>> c) why the proposed patch is a good solution.
>>> New attributes provide the way to remove one direction and save underlayer
>> resource.
>>> All of the above can be found in the commit log.
>>
>> I understand all of that, but my point is, the existing commit message is way
>> too brief. Yes, it mentions that SOME customers have SOME deployments, but
>> it does not shed light on which specifics these deployments have. For example,
>> back in the day, when items PORT_REPRESENTOR and REPRESENTED_PORT
>> were added, the cover letter for that patch series provided details of
>> deployment specifics (application: OvS, scenario: full offload rules).
>>
>> So, it's always better to expand on such specifics so that the reader has full
>> picture in their head and doesn't need to look elsewhere.
>> Not all readers of the commit message will be happy to delve into our
>> discussions on the mailing list to get the gist.
>>
> It' approach diverse. Pattern item approach will attract another discussion thread, right?

As I said, match criteria belong in flow pattern. I recognise the
importance of the problem that you're looking to solve. It's very
good that you care to address it, but what this patch tries to do
is to add more match criteria in the form of new attributes with
rather questionable names... There's room for improvement.

When I say that new features should not confuse readers, I mean
a very basic thing: readers know that match criteria all sit
in the pattern. And they refer to the pattern item enum in
the code and in documentation to learn about criteria,
while "struct rte_flow_attr" is an unusual place from
which to learn about match criteria.
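
For reference, the attribute under discussion can be modelled as a standalone struct. This is only a sketch: the bit-field layout and the values 0 / 1 / 2 are taken from the quoted diff, while the type and function names (`flow_attr_v1_sketch`, `transfer_mode_is_valid`, the `TRANSFER_MODE_*` constants) are hypothetical, and treating the undocumented value 3 ("N/A both set") as invalid is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the comment in the quoted diff; names are hypothetical. */
enum transfer_mode_sketch {
	TRANSFER_MODE_BIDIR = 0,     /* "0 means bidirection" (default) */
	TRANSFER_MODE_WIRE_ORIG = 1, /* "0x1 origin uplink" */
	TRANSFER_MODE_VF_ORIG = 2,   /* "0x2 origin vport" */
	/* 3 is "N/A both set" in the diff; assumed invalid in this sketch. */
};

/* Stand-in for the post-patch bit-fields of struct rte_flow_attr. */
struct flow_attr_v1_sketch {
	uint32_t group;
	uint32_t priority;
	uint32_t ingress:1;
	uint32_t egress:1;
	uint32_t transfer:1;
	uint32_t transfer_mode:2;
	uint32_t reserved:27; /* Reserved, must be zero. */
};

/*
 * A mode other than bi-direction is accepted only together with
 * "transfer", mirroring the check in the patch's parse_table().
 */
static inline bool
transfer_mode_is_valid(const struct flow_attr_v1_sketch *attr)
{
	if (attr->transfer_mode == TRANSFER_MODE_BIDIR)
		return true;
	if (!attr->transfer)
		return false;
	return attr->transfer_mode <= TRANSFER_MODE_VF_ORIG;
}
```

Note that nothing in this struct appears in the pattern, which is the point being made above: a reader looking at the pattern items alone would not see this criterion.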

> We should reach a conclusion and reflect it in the commit changes and logs, so that it's easy for others to absorb.

Yes, but before we get to that, perhaps it pays to hear
more feedback from other reviewers. Thomas? Ori? Andrew?

>>>
>>>>
>>>
>>>>>>>
>>>>>>>>> By default, the transfer domain is bi-direction, and no behavior
>> changes.
>>>>>>>>>
>>>>>>>>> 1. Match wire origin only
>>>>>>>>>  flow template_table 0 create group 0 priority 0 transfer wire_orig...
>>>>>>>>> 2. Match vf origin only
>>>>>>>>>  flow template_table 0 create group 0 priority 0 transfer vf_orig...
>>>>>>>>>
>>>>>>>>> Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
>>>>>>>>> ---
>>>>>>>>> app/test-pmd/cmdline_flow.c                 | 26
>> +++++++++++++++++++++
>>>>>>>>> doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
>>>>>>>>> lib/ethdev/rte_flow.h                       |  9 ++++++-
>>>>>>>>> 3 files changed, 36 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/app/test-pmd/cmdline_flow.c
>>>>>>>>> b/app/test-pmd/cmdline_flow.c index 7f50028eb7..b25b595e82
>>>>>>>>> 100644
>>>>>>>>> --- a/app/test-pmd/cmdline_flow.c
>>>>>>>>> +++ b/app/test-pmd/cmdline_flow.c
>>>>>>>>> @@ -177,6 +177,8 @@ enum index {
>>>>>>>>>       TABLE_INGRESS,
>>>>>>>>>       TABLE_EGRESS,
>>>>>>>>>       TABLE_TRANSFER,
>>>>>>>>> +     TABLE_TRANSFER_WIRE_ORIG,
>>>>>>>>> +     TABLE_TRANSFER_VF_ORIG,
>>>>>>>>>       TABLE_RULES_NUMBER,
>>>>>>>>>       TABLE_PATTERN_TEMPLATE,
>>>>>>>>>       TABLE_ACTIONS_TEMPLATE,
>>>>>>>>> @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] =
>> {
>>>>>>>>>       TABLE_INGRESS,
>>>>>>>>>       TABLE_EGRESS,
>>>>>>>>>       TABLE_TRANSFER,
>>>>>>>>> +     TABLE_TRANSFER_WIRE_ORIG,
>>>>>>>>> +     TABLE_TRANSFER_VF_ORIG,
>>>>>>>>>       TABLE_RULES_NUMBER,
>>>>>>>>>       TABLE_PATTERN_TEMPLATE,
>>>>>>>>>       TABLE_ACTIONS_TEMPLATE,
>>>>>>>>> @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
>>>>>>>>>               .next = NEXT(next_table_attr),
>>>>>>>>>               .call = parse_table,
>>>>>>>>>       },
>>>>>>>>> +     [TABLE_TRANSFER_WIRE_ORIG] = {
>>>>>>>>> +             .name = "wire_orig",
>>>>>>>>> +             .help = "affect rule direction to transfer",
>>>>>>>>
>>>>>>>> This does not explain the "wire" aspect. It's too broad.
>>>>>>>>
>>>>>>>>> +             .next = NEXT(next_table_attr),
>>>>>>>>> +             .call = parse_table,
>>>>>>>>> +     },
>>>>>>>>> +     [TABLE_TRANSFER_VF_ORIG] = {
>>>>>>>>> +             .name = "vf_orig",
>>>>>>>>> +             .help = "affect rule direction to transfer",
>>>>>>>>
>>>>>>>> This explanation simply duplicates that of the "wire_orig".
>>>>>>>> It does not explain the "vf" part. Should be more specific.
>>>>>>>>
>>>>>>>>> +             .next = NEXT(next_table_attr),
>>>>>>>>> +             .call = parse_table,
>>>>>>>>> +     },
>>>>>>>>>       [TABLE_RULES_NUMBER] = {
>>>>>>>>>               .name = "rules_number",
>>>>>>>>>               .help = "number of rules in table", @@ -8894,6
>>>>>>>>> +8910,16 @@ parse_table(struct context *ctx, const struct token
>>>>>>>>> +*token,
>>>>>>>>>       case TABLE_TRANSFER:
>>>>>>>>>               out->args.table.attr.flow_attr.transfer = 1;
>>>>>>>>>               return len;
>>>>>>>>> +     case TABLE_TRANSFER_WIRE_ORIG:
>>>>>>>>> +             if (!out->args.table.attr.flow_attr.transfer)
>>>>>>>>> +                     return -1;
>>>>>>>>> +             out->args.table.attr.flow_attr.transfer_mode = 1;
>>>>>>>>> +             return len;
>>>>>>>>> +     case TABLE_TRANSFER_VF_ORIG:
>>>>>>>>> +             if (!out->args.table.attr.flow_attr.transfer)
>>>>>>>>> +                     return -1;
>>>>>>>>> +             out->args.table.attr.flow_attr.transfer_mode = 2;
>>>>>>>>> +             return len;
>>>>>>>>>       default:
>>>>>>>>>               return -1;
>>>>>>>>>       }
>>>>>>>>> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>>>> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>>>> index 330e34427d..603b7988dd 100644
>>>>>>>>> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>>>> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>>>> @@ -3332,7 +3332,8 @@ It is bound to
>>>>>>>> ``rte_flow_template_table_create()``::
>>>>>>>>>
>>>>>>>>>   flow template_table {port_id} create
>>>>>>>>>       [table_id {id}] [group {group_id}]
>>>>>>>>> -       [priority {level}] [ingress] [egress] [transfer]
>>>>>>>>> +       [priority {level}] [ingress] [egress]
>>>>>>>>> +       [transfer [vf_orig] [wire_orig]]
>>>>>>>>
>>>>>>>> Is it correct? Shouldn't it rather be [transfer] [vf_orig]
>>>>>>>> [wire_orig] ?
>>>>>>>>
>>>>>>>>>       rules_number {number}
>>>>>>>>>       pattern_template {pattern_template_id}
>>>>>>>>>       actions_template {actions_template_id} diff --git
>>>>>>>>> a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
>>>>>>>>> a79f1e7ef0..512b08d817 100644
>>>>>>>>> --- a/lib/ethdev/rte_flow.h
>>>>>>>>> +++ b/lib/ethdev/rte_flow.h
>>>>>>>>> @@ -130,7 +130,14 @@ struct rte_flow_attr {
>>>>>>>>>        * through a suitable port. @see rte_flow_pick_transfer_proxy().
>>>>>>>>>        */
>>>>>>>>>       uint32_t transfer:1;
>>>>>>>>> -     uint32_t reserved:29; /**< Reserved, must be zero. */
>>>>>>>>> +     /**
>>>>>>>>> +      * 0 means bidirection,
>>>>>>>>> +      * 0x1 origin uplink,
>>>>>>>>
>>>>>>>> What does "uplink" mean? It's too vague. Hardly a good term.
>>>>
>>>> I believe this comment should be reworked, in case the idea of having
>>>> an extra attribute persists.
>>>>
>>>>>>>>
>>>>>>>>> +      * 0x2 origin vport,
>>>>>>>>
>>>>>>>> What does "origin vport" mean? Hardly a good term as well.
>>>>
>>>> I still believe this explanation is way too brief and needs to be
>>>> reworked to provide more details, to define the use case for the attribute
>> more specifically.
>>>>
>>>>>>>>
>>>>>>>>> +      * N/A both set.
>>>>>>>>
>>>>>>>> What's this?
>>>>
>>>> The question stands.
>>>>
>>>>>>>>
>>>>>>>>> +      */
>>>>>>>>> +     uint32_t transfer_mode:2;
>>>>>>>>> +     uint32_t reserved:27; /**< Reserved, must be zero. */
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>> --
>>>>>>>>> 2.27.0
>>>>>>>>>
>>>>>>>>
>>>>>>>> Since the attributes are added to generic 'struct rte_flow_attr',
>>>>>>>> non-table
>>>>>>>> (synchronous) flow rules are supposed to support them, too. If
>>>>>>>> that is indeed the case, then I'm afraid such proposal does not
>>>>>>>> agree with the existing items PORT_REPRESENTOR and
>> REPRESENTED_PORT.
>>>> They
>>>>>>>> do exactly the same thing, but they are designed to be way more
>>>>>>>> generic. Why
>>>>>> not use them?
>>>>>>
>>>>>> The question stands.
>>>>>>
>>>>>>>>
>>>>>>>> Ivan
>>>>>>>
>>>>>>
>>>>>> Ivan
>>>>>
>>>
>>
>> Thank you.
>

Thanks,
Ivan
  
Thomas Monjalon Sept. 15, 2022, 8:18 a.m. UTC | #12
15/09/2022 09:47, Ivan Malov:
> As I said, match criteria belong in flow pattern. I recognise the
> importance of the problem that you're looking to solve. It's very
> good that you care to address it, but what this patch tries to do
> is to add more match criteria in the form of new attributes with
> rather questionable names... There's room for improvement.
> 
> When I say that new features should not confuse readers, I mean
> a very basic thing: readers know that match criteria all sit
> in the pattern. And they refer to the pattern item enum in
> the code and in documentation to learn about criteria,
> while "struct rte_flow_attr" is an unusual place from
> which to learn about match criteria.
> 
> > We should reach a conclusion and reflect it in the commit changes and logs, so that it's easy for others to absorb.
> 
> Yes, but before we get to that, perhaps it pays to hear
> more feedback from other reviewers. Thomas? Ori? Andrew?

Sorry I did not read all.
I think the main question is about the use of attributes.
I refer to this commit of Ivan last year which was agreed:

    ethdev: deprecate direction attributes in transfer flows
    
    Attributes "ingress" and "egress" can only apply unambiguously
    to non-"transfer" flows. In "transfer" flows, the standpoint
    is effectively shifted to the embedded switch. There can be
    many different endpoints connected to the switch, so the
    use of "ingress" / "egress" does not shed light on which
    endpoints precisely can be considered as traffic sources.
    
    Add relevant deprecation notices and suggest the use of precise
    traffic source items (PORT_REPRESENTOR and REPRESENTED_PORT).
    
    Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
    Acked-by: Ori Kam <orika@nvidia.com>
    Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
    Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

So +1 for using only pattern items as matching criteria.
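
The rule agreed in that commit can be sketched as a small check. The struct below is a stand-in that mirrors the pre-patch bit-fields of struct rte_flow_attr; `flow_attr_sketch` and `transfer_attr_is_ambiguous()` are hypothetical names used for illustration, not DPDK API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in mirroring the pre-patch bit-fields of struct rte_flow_attr. */
struct flow_attr_sketch {
	uint32_t group;
	uint32_t priority;
	uint32_t ingress:1;
	uint32_t egress:1;
	uint32_t transfer:1;
	uint32_t reserved:29; /* Reserved, must be zero. */
};

/*
 * Hypothetical helper: in a "transfer" rule, "ingress" / "egress" do not
 * say which switch port the packet came from, so flag them as ambiguous.
 */
static inline bool
transfer_attr_is_ambiguous(const struct flow_attr_sketch *attr)
{
	return attr->transfer && (attr->ingress || attr->egress);
}
```

An application following the deprecation notice would leave both bits clear in transfer rules and name the traffic source with a pattern item instead.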
  
Rongwei Liu Sept. 15, 2022, 8:48 a.m. UTC | #13
HI Ivan:

BR
Rongwei

-----Original Message-----
From: Ivan Malov <ivan.malov@oktetlabs.ru> 
Sent: Thursday, September 15, 2022 15:47
To: Rongwei Liu <rongweil@nvidia.com>
Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
Subject: RE: [PATCH v1] ethdev: add direction info when creating the transfer table

External email: Use caution opening links or attachments


Hi Rongwei,

On Thu, 15 Sep 2022, Rongwei Liu wrote:

> HI Ivan:
>
> BR
> Rongwei
>
>> -----Original Message-----
>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Sent: Wednesday, September 14, 2022 23:18
>> To: Rongwei Liu <rongweil@nvidia.com>
>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan
>> Darawsheh <rasland@nvidia.com>
>> Subject: RE: [PATCH v1] ethdev: add direction info when creating the transfer
>> table
>>
>>
>> Hi Rongwei,
>>
>> On Wed, 14 Sep 2022, Rongwei Liu wrote:
>>
>>> HI
>>>
>>> BR
>>> Rongwei
>>>
>>>> -----Original Message-----
>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>> Sent: Wednesday, September 14, 2022 15:32
>>>> To: Rongwei Liu <rongweil@nvidia.com>
>>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>>>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>>>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
>>>> Raslan Darawsheh <rasland@nvidia.com>
>>>> Subject: RE: [PATCH v1] ethdev: add direction info when creating the
>>>> transfer table
>>>>
>>>>
>>>> Hi,
>>>>
>>>> On Wed, 14 Sep 2022, Rongwei Liu wrote:
>>>>
>>>>> HI
>>>>>
>>>>> BR
>>>>> Rongwei
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>>>> Sent: Tuesday, September 13, 2022 22:33
>>>>>> To: Rongwei Liu <rongweil@nvidia.com>
>>>>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>>>>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>>>>>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman Singh
>>>>>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>>>>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
>>>>>> Raslan Darawsheh <rasland@nvidia.com>
>>>>>> Subject: RE: [PATCH v1] ethdev: add direction info when creating
>>>>>> the transfer table
>>>>>>
>>>>>>
>>>>>> Hi Rongwei,
>>>>>>
>>>>>> PSB
>>>>>>
>>>>>> On Tue, 13 Sep 2022, Rongwei Liu wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> BR
>>>>>>> Rongwei
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>>>>>> Sent: Tuesday, September 13, 2022 00:57
>>>>>>>> To: Rongwei Liu <rongweil@nvidia.com>
>>>>>>>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
>>>>>>>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
>>>>>>>> NBU-Contact- Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
>>>>>>>> Aman Singh <aman.deep.singh@intel.com>; Yuying Zhang
>>>>>>>> <yuying.zhang@intel.com>; Andrew Rybchenko
>>>>>>>> <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
>>>>>>>> <rasland@nvidia.com>
>>>>>>>> Subject: Re: [PATCH v1] ethdev: add direction info when creating
>>>>>>>> the transfer table
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Wed, 7 Sep 2022, Rongwei Liu wrote:
>>>>>>>>
>>>>>>>>> The transfer domain rule is able to match traffic wire/vf origin
>>>>>>>>> and it means two directions' underlayer resource.
>>>>>>>>
>>>>>>>> The point of fact is that matching traffic coming from some
>>>>>>>> entity like wire / VF has been long generalised in the form of
>> representors.
>>>>>>>> So, a flow rule with attribute "transfer" is able to match
>>>>>>>> traffic coming from either a REPRESENTED_PORT or from a
>>>> PORT_REPRESENTOR
>>>>>> (please find these items).
>>>>>>>>
>>>>>>>>>
>>>>>>>>> In customer deployments, they usually match only one direction
>>>>>>>>> traffic in single flow table: either from wire or from vf.
>>>>>>>>
>>>>>>>> Which customer deployments? Could you please provide detailed
>>>> examples?
>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> We saw a lot of customers' deployment like:
>>>>>>> 1. Match overlay traffic from wire and do decap, then send to
>>>>>>> specific
>>>> vport.
>>>>>>> 2. Match specific 5-tuples and do encap, then send to wire.
>>>>>>> The matching criteria has obvious direction preference.
>>>>>>
>>>>>> Thank you. My questions are as follows:
>>>>>>
>>>>>> In (1), when you say "from wire", do you mean the need to match
>>>>>> packets arriving via whatever physical ports rather than matching
>>>>>> packets arriving from some specific phys. port?
>>>>
>>>> ^^
>>>>
>>>> Could you please find my question above? Based on your understanding
>>>> of templates in async flow approach, an answer to this question may
>>>> help us find the common ground.
>>> It means traffic arrived from physical ports (transfer_proxy role) or south
>>> band per your concept.
>>
>> Transfer proxy has nothing to do with physical ports. And I should stress
>> that "south band" and the likes are NOT my concepts. Instead, I think that
>> direction designations like "south" or "north" aren't applicable when talking
>> about the embedded switch and its flow (transfer) rules.
>>
>>> Traffic from vport (not transfer_proxy) or north band per your concept won't
>> hit even if same packets.
>>
>> Please see above. Transfer proxy is a completely different concept.
>> And I never used "north band" concept.
>>
>>>>
>>>> --
>>>>
>>>>>>
>>>>>> If, however, matching traffic "from wire" in fact means matching
>>>>>> packets arriving from a *specific* physical port, then for sure
>>>>>> item REPRESENTED_PORT should perfectly do the job, and the proposed
>>>>>> attribute is unneeded.
>>>>>>
>>>>>> (BTW, in DPDK, it is customary to use term "physical port", not
>>>>>> "wire")
>>>>>>
>>>>>> In (1), what are "vport"s? Please explain. Once again, I should
>>>>>> remind that, in DPDK, folks prefer terms "represented entity" /
>>>> "representor"
>>>>>> over vendor-specific terms like "vport", etc.
>>>>>>
>>>>> Vport is virtual port for short such as VF.
>>>>
>>>> Thanks. As I say, term "vport" might be confusing to some readers, so
>>>> it'd be better to provide this explanation (about VF) in the commit
>>>> description next time.
>>> Ack. Will add VF as an example.
>>>>
>>>>>> As for (2), imagine matching 5-tuple traffic emitted by a VF / guest.
>>>>>> Could you please explain, why not just add a match item
>>>>>> REPRESENTED_PORT pointing to that VF via its representor? Doing so
>>>>>> should perfectly define the exact direction / traffic source. Isn't
>>>>>> that
>>>> sufficient?
>>>>>>
>>>>> Per my view, there is matching field and matching value difference.
>>>>> Like IPv4 src_addr 1.1.1.1, 1.1.1.2. 1.1.1.3, will you treat it as
>>>>> same or
>>>> different matching criteria?
>>>>> I would like to call them same since it can be summarized like
>>>>> 1.1.1.0/30 REPRESENTED_PORT is just another matching item, no
>>>>> essential
>>>> differences and it can't stand for direction info.
>>>>
>>>> It looks like we're starting to run into disagreement here.
>>>> There's no "direction" at all. There's an embedded switch inside the
>>>> NIC, and there're (logical) switch ports that packets enter the switch from.
>>>>
>>>> When the user submits a "transfer" rule and provides neither
>>>> REPRESENTED_PORT nor PORT_REPRESENTOR in the pattern, the embedded
>>>> switch is supposed to match packets coming from ANY ports, be it VFs
>>>> or physical (wire) ports.
>>>>
>>>> But when the user provides, for example, item REPRESENTED_PORT to
>>>> point to the physical (wire) port, the embedded switch knows exactly
>>>> which port the packets should enter it from.
>>>> In this case, it is supposed to match only packets coming from that
>>>> physical port. And this should be sufficient.
>>>> This in fact replaces the need to know a "direction".
>>>> It's just an exact specification of packet's origin.
>>>>
>>> There is traffic arriving or leaving the switch, so there is always direction,
>> implicit or explicit.
>>
>> This does not contradict my thoughts above. "Direction" is *defined* by two
>> points (like in geometry): an initial point (the switch port through which a
>> packet enters the switch) and the terminal point (the match engine inside the
>> switch). If one knows these two points, no extra hints are required to specify
>> some "direction". Because direction is already represented by this "vector" of
>> sorts. That's why presence of the port match item in the pattern is absolutely
>> sufficient.
> Good to see this. Thanks for the information.

You're very welcome.

> This update leverages the concept exactly defined by you: "an initial point (the switch port through which a
> packet enters the switch)"

No, it doesn't seem so. Based on your explanations, it appears that
this update tries to refer to a "super set" of ports which have
something in common. For example, with attribute "wire_orig"
you seem to be trying to request that the rule match packets
arriving from wire through ANY of the phys.ports. So my point
is: why express an obvious match item as an attribute?

Let me explain more based on your point and sentences:
"Direction" is *defined* by two points (like in geometry): an initial point (the switch port through which a
packet enters the switch) and the terminal point (the match engine inside the switch).

Wire_orig: the initial point is the uplink or Internet wire port and the terminal point is the switch (physical or logical); the switch handles the packets eventually.
Vport_orig: the initial point is a virtual port and the terminal point is the switch (physical or logical).
It looks like they match perfectly.

I think you have some misunderstanding about match items: an item should contain a matching field (rte_item->mask) and a matching value (rte_item->spec).
What you proposed, "ANY_GUEST_PORT/ANY_PHY_PORT", seems to mix both together.
ANY_GUEST_PORT: port belongs to specific switch domain and it's virtual
ANY_PHY_PORT: port belongs to specific switch domain and it's physical.
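
The field/value split mentioned here (mask and spec, as carried by struct rte_flow_item) can be illustrated with a minimal sketch. The struct and function below are simplified stand-ins with hypothetical names, limited to a single 32-bit field, and they omit the range ("last") member that real items also carry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for one match item on a 32-bit field (e.g. IPv4 src). */
struct item_sketch {
	uint32_t spec; /* value to compare against */
	uint32_t mask; /* which bits of the field take part in the match */
};

/* A packet field matches when it equals spec on every bit set in mask. */
static inline bool
item_matches(const struct item_sketch *item, uint32_t field)
{
	return (field & item->mask) == (item->spec & item->mask);
}
```

With spec 1.1.1.0 and a /30 mask (0x01010100 and 0xfffffffc in host order), the addresses 1.1.1.1, 1.1.1.2 and 1.1.1.3 from the earlier example all match, while 1.1.1.4 does not.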


For example, nobody tries to replace match item IPv4 with
an attribute "is_ipv4". That would be strange, to say the
least. Why should the "vf_orig" case be an exception then?

It's a good point. There is already "port_id/represented_port", so why do you want to add an "IS_***_PORTS" matching item?
Take IPv4: it matches rx_only, tx_only and rx_tx for the INGRESS, EGRESS and TRANSFER domains; eventually it follows the domain principle.
A matching item should be generic. It stands for what the users care about and what they want.

"vf_orig"/"wire_orig" is resource-sensitive and goes beyond matching items; matching items should always follow it.
With it, there is no possibility of matching the cut-off path. It's a very advanced feature.


> If you think "direction" is not good, we can change to other words like "initial port"/"origin port" etc.

As I explained multiple times, "direction" is rather obscure from the
viewpoint located inside the embedded switch. Yes, on non-transfer (VNIC)
level, there are *exactly* two directions: ingress and egress.
But, inside of the embedded switch (transfer rules), there can
be *multiple* various "directions", which are not even
directions, = they're traffic PATHs in fact.

Renaming to "initial port" and "origin port" won't be helpful either
because, for users, it will be hard to figure out the difference
between the attribute and items PORT_REPRESENTOR / REPRESENTED_PORT.

If, however, you add new items instead of the attribute, the user
will likely see that the new items and the existing ones are
just alternative options = representor-based items help
to address exact ports (one rule - one port), whilst
your new items help to address super sets of ports
like "all wire ports" or "all guest ports".

You forgot rte_item->mask here. 

So, the short of it:
1) these "wire_orig" / "vf_orig" are in fact yet another match criteria;
2) because of that, they should go to match items and not to attributes.
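
The difference between exact-port items and "super set" items can be sketched as follows. Everything here is illustrative: `ORIGIN_ANY_WIRE_PORTS` and `ORIGIN_ANY_GUEST_PORTS` stand for the new item types floated in this thread and do not exist in DPDK, and `ORIGIN_EXACT_PORT` is a simplified stand-in for REPRESENTED_PORT / PORT_REPRESENTOR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Kind of switch port a packet entered through (illustrative only). */
enum port_kind { PORT_KIND_WIRE, PORT_KIND_GUEST };

/* Hypothetical origin criteria that a pattern could carry. */
enum origin_item {
	ORIGIN_NONE,            /* no port item: any port matches */
	ORIGIN_EXACT_PORT,      /* one rule - one port */
	ORIGIN_ANY_WIRE_PORTS,  /* hypothetical: all physical ports */
	ORIGIN_ANY_GUEST_PORTS, /* hypothetical: all VF / guest ports */
};

struct origin_match {
	enum origin_item type;
	uint16_t port_id; /* used by ORIGIN_EXACT_PORT only */
};

struct packet_origin {
	uint16_t port_id;
	enum port_kind kind;
};

/* Decide whether a packet's entry port satisfies the origin criterion. */
static inline bool
origin_matches(const struct origin_match *m, const struct packet_origin *p)
{
	switch (m->type) {
	case ORIGIN_NONE:
		return true;
	case ORIGIN_EXACT_PORT:
		return p->port_id == m->port_id;
	case ORIGIN_ANY_WIRE_PORTS:
		return p->kind == PORT_KIND_WIRE;
	case ORIGIN_ANY_GUEST_PORTS:
		return p->kind == PORT_KIND_GUEST;
	}
	return false;
}
```

In this model the origin is visible in the pattern itself, so two pattern templates that differ only in origin are distinguishable without any extra attribute.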

>>
>> However, based on your later explanations, the use of precise port item is
>> simply inconvenient in your use case because you are trying to match traffic
>> from *multiple* ports that have something in common (i.e. all VFs or all wire
>> ports).
>>
>> And, instead of adding a new item type which would serve exactly your needs,
>> you for some reason try to add an attribute, which has multiple drawbacks
>> which I described in my previous letter.
>>
>>> For transfer rules, there is a concept transfer_proxy.
>>> It takes the switch ownership; all switch rules should be configured via
>> transfer_proxy.
>>
>> Yes, such concept exists, but it's a don't care with regard to the problem that
>> we're discussing, sorry.
>> Furthermore, unlike "switch domain ID" (which is the same for all ethdevs
>> belonging to a given physical NIC board), nobody guarantees that it's only one
>> transfer proxy port. Some NIC vendors allow transfer rules to be added via
>> any ethdev port.
>>
> Does any flow rule already leverage the switch ID? Is it too obscure for the end user?

No, I'm not talking about flow rules. I'm explaining the logic which
application may use to identify which ethdevs are on which NICs.

Imagine a DPDK application which has two ethdevs instantiated:
one ethdev sits on top of the admin. PF (ethdev 0), the other
one sits on top of a low-privilege PF (ethdev 1).
In the latter case, it can also be a VF.

Both ethdev 0 and ethdev 1 belong to the same physical NIC board.

Now, what I'm trying to explain is the fact that "proxy"
behaviour may differ between various vendors:

- some vendors say that they can support managing "transfer" rules via
   any PFs / VFs. They do not require that some specific PF ethdev be
   used to do that. With such vendors, if the application makes a
   query "What's the proxy port ID for the ethdev 1?", it will
   get "The proxy port ID for ethdev 1 is 1" response.

- but other vendors cannot support the above workflow and they require
   that "transfer" rules be managed using some specific (admin) ethdev.
   If the application makes the same query here, it will get the
   following response: "The proxy port ID for ethdev 1 is 0".

So, given these explanations, it is incorrect to assume that
the proxy port ID for all ethdevs belonging to the same NIC
board will be the same. They simply may not be like this.

However, *regardless* of the two above scenarios and regardless
of vendor, for NICs which have embedded switch feature, when the
user tries to check the "switch domain ID" for ethdev 0 and
ethdev 1, they will get the same value. So, this should be
the right criterion for the application (not for flow
rules themselves) to decide which ethdev belongs to
which physical NIC board.
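
The two vendor behaviours described above can be captured in a small model. This is a sketch under assumptions: `ethdev_sketch` and `same_nic_board()` are hypothetical names, and the fields are assumed to have been filled in beforehand by the application (in DPDK the switch domain ID is reported in the device info and the proxy port ID by rte_flow_pick_transfer_proxy()).

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-ethdev facts an application could have collected (stand-in struct). */
struct ethdev_sketch {
	uint16_t port_id;
	uint16_t switch_domain_id; /* same for all ethdevs of one NIC board */
	uint16_t proxy_port_id;    /* vendor-dependent, may differ per ethdev */
};

/* Board membership is decided by switch domain ID, never by proxy port ID. */
static inline bool
same_nic_board(const struct ethdev_sketch *a, const struct ethdev_sketch *b)
{
	return a->switch_domain_id == b->switch_domain_id;
}
```

With the first kind of vendor, ethdev 0 and ethdev 1 report proxy port IDs 0 and 1; with the second, both report 0. In either case the comparison above gives the same answer, because only the switch domain ID is consulted.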

Why do you say the user is capable of checking the switch domain ID to learn which ports belong together,
but not capable of knowing basic DPDK rte_flow API usage?
There are too many assumptions.

Using a VF as an example, they are different from the beginning; see the SR-IOV commands:

echo $num > /sysfs/ .... /PF_BDF/sriov_num
echo VF_BDF > /sysfs/.../bind or unbind

>>>
>>> Imagine a logical switch with one PF and two VFs.
>>> PF is the transfer proxy and VF belongs to the PF logically.
>>> When receiving traffic from PF, we can say it comes into the logical switch.
>>
>> That's correct.
>>
>>> When packet sent from VF (VF belongs to PF), so we can say traffic leaves
>> the switch.
>>
>> That's not correct. Traffic sent from VF (for example, a guest VM is sending
>> packets) also *enters* the switch. PFs and VFs are in fact *separate* logical
>> ports of the embedded switch.
>>
>>>
>>> Item REPRESENTED_PORT indicates switch to match traffic sent from which
>> port, comes into, or leave switch.
>>
>> That is not correct either. Item REPRESENTED_PORT tells the switch to match
>> packets which come into the switch FROM the logical port which is
>> represented by the given DPDK ethdev.
>>
>> For example, if ethdev="E" is the *main* PF which is bound to physical port "P",
>> then item REPRESENTED_PORT with ethdev ID being set to "E" tells the switch
>> that only packets coming to the NIC from *wire* via physical port "P" should match.
>>
>>> We can say it as one kind of packet metadata.
>>
>> Kind of yes, but might be vendor-specific. No need to delve into this.
>>
>>> Like you said, DPDK always treat transfer to match any PORTs traffic.
>>
>> Slight correction: it treats it this way until it sees an exact port item.
>> If the user provides REPRESENTED_PORT (or PORT_REPRESENTOR), it's no
>> longer *any* ports traffic, it's an exact port traffic. That's it.
>>
>>> When REPRESENTED_PORT is specified, the rules are limited to some
>> dedicated PORTs.
>>
>> These rules match only packets arriving TO the embedded switch FROM the
>> said dedicated ports.
>>
>>> Other PORTs are ignored because metadata mismatching.
>>
>> Kind of yes, correct.
>>
>>> Rules still have the capability to match ANY PORTS if metadata matched.
>>
>> This statement is only correct for the cases when the user uses
>> neither item REPRESENTED_PORT nor item PORT_REPRESENTOR.
>>
>>>
>>> This update will allow user to cut the other PORTs matching capabilities.
>>
>> As I explained, this is exactly what items PORT_REPRESENTOR and
>> REPRESENTED_PORT do. No need to have an extra attribute.
>>
>> If the user adds item REPRESENTED_PORT with ethdev_id="E", like in the
>> above example, to match packets entering NIC via the physical port "P", then
>> this rule will NOT match packets entering NIC from other points. For example,
>> packets transmitted by a virtual machine via a VF will not match in this case.
>>
>>>>> Port id depends on the attach sequence.
>>>>
>>>> Unfortunately, this is hardly a good argument because flow rules are
>>>> supposed to be inserted based on the run-time packet learning. Attach
>>>> sequence is a don't care here.
>>>>
>>>>>> Also please mind that, although I appreciate your explanations
>>>>>> here, on the mailing list, they should finally be added to the
>>>>>> commit message, so that readers do not have to look for them elsewhere.
>>>>>>
>>>>> We have explained the high possibility of single-direction matching, right?
>>>>
>>>> Not quite. As I said, it is not correct to assume any "direction",
>>>> like in geographical sense ("north", "south", etc.). Application has
>>>> ethdevs, and they are representors of some "virtual ports" (in your
>>>> terminology) belonging to the switch, for example, VFs, SFs or physical
>> ports.
>>>>
>>>> The user adds an appropriate item to the pattern (REPRESENTED_PORT),
>>>> and doing so specifies the path by which the packet enters the switch.
>>>>
>>>>> It's hard to list all the possibilities of traffic matching preferences.
>>>>
>>>> And let's say more: one need never do this. That's exactly the reason
>>>> why DPDK has abandoned the concept of "direction" in *transfer* rules
>>>> and switched to the use of precise criteria (REPRESENTED_PORT, etc.).
>>>>
>>> As far as I know, DPDK changed "transfer ingress" to "transfer", so it's clearer
>>> that transfer can match both directions (both ingress and egress).
>>
>> Not quite. DPDK has abandoned the use of "ingress / egress" in "transfer"
>> rules because "ingress" and "egress" are only applicable on the VNIC level. For
>> example, there is a PF attached to DPDK application:
>> packets that the application receives through this ethdev, are ingress, and
>> packets that it transmits (tx_burst) are egress.
>>
>> I can explain in other words. Imagine yourself standing *inside* a room which
>> only has one door. When someone enters the room, it's "ingress", when
>> someone leaves, it's "egress". It's relative to your viewpoint.
>> In this example, such a room represents a VNIC / ethdev.
>>
>> And now imagine yourself standing *outside* of another room / auditorium
>> which has multiple doors / exits. You're standing near some particular exit "A"
>> (VNIC / ethdev), but people may enter this room via another door "B" and then
>> leave it via yet another door "C". In this case, from your viewpoint, this traffic
>> can be considered neither ingress nor egress, because these people do not
>> approach you.
>>
>> Like in this example, embedded switch is like a large auditorium with many-
>> many doors / exits. And there can be many-many
>> directions: packet can enter the switch via phys. port "P1"
>> and then leave it via another phys. port "P2". Or it can enter the switch via
>> phys. port and then leave it via VF's logical port (to be delivered to a guest
>> machine), or a packet can travel from one VF to another one.
>>
>> There's no PRE-DEFINED direction like "north to south" or "east to west".
>> And this explains why it's very undesirable to use term "direction".
>>
>>> REPRESENTED_PORT is the evolution of "port_id"; I think it's only one kind of
>>> matching item.
>>
>> Yes. But nobody prevents you from defining yet another match item which will
>> be able to refer to a *group* of ports which have something in common (i.e.
>> "all guest ports of this switch"
>> pointing to all logical ports currently attached to virtual machines / guests, or
>> "all wire ports of this switch").
>>
>>>
>>> For large scale deployment like 10M rules, if we can save resources
>> significantly by introducing direction, why not?
>>
>> I do not deny the fact that you have a use case where resources can be saved
>> significantly if you give the PMD some extra knowledge when creating a flow
>> table / pattern template. That's totally OK. What I object is the very
>> implementation and the use of term "direction". If you add new item types
>> (like above), then, when you create an async table 1 pattern template, you will
>> have item ANY_WIRE_PORTS, and, for table 2 pattern template, you'll have
>> item ANY_GUEST_PORTS.
>> As you see, the two pattern templates now differ because the match criteria
>> use different items.
>>
>>>
>>> Again, async API:
>>> 1. pattern template A
>>> 2. action template B
>>> 3. table C with pattern template A + action template B.
>>> 4. rule D, E, F...
>>> The specified REPRESENTED_PORT is provided in rules (D, E, F...) not pattern
>> template A or action template B or table C.
>>> Resources may be allocated early at step 3 because of the table's rule_nums property.
>>
>> No, item REPRESENTED_PORT *can* be provided inside pattern template A,
>> but, as you pointed out earlier, the problem is that you can't distinguish
>> different pattern templates which have this item, because pattern templates
>> know nothing about *exact* port IDs and only know item MASKS. Yes, I agree
>> that in your case such problem exists, but, as I say above, it can be solved by
>> adding new item types: one for referring to all phys. ports of a given NIC and
>> another one for pointing to a group of current guest users (VFs).
>>
>>>>> The underlay is the one we have met for now.
>>>>>>>
>>>>>>>>> Introduce one new member transfer_mode into rte_flow_attr to
>>>>>>>>> indicate the flow table direction property: from wire, from vf
>>>>>>>>> or bi-direction(default).
>>>>>>>>
>>>>>>>> AFAIK, 'rte_flow_attr' serves both traditional flow rule
>>>>>>>> insertion and asynchronous (table) approach. The patch adds the
>>>>>>>> attributes to generic 'rte_flow_attr' but, for some reason, ignores non-
>> table rules.
>>>>>>>>
>>>>>>>>>
>>>>>>> Sync API uses one rule to contain everything. It's hard for PMD to
>>>>>>> determine
>>>>>> if this rule has direction preference or not.
>>>>>>> Imagine a situation, just for an example:
>>>>>>> 1. Vport 1 VxLAN do decap send to vport 2.     1 million scale
>>>>>>> 2. Vport 0 (wire) VxLAN do decap send to vport 3.   1 hundred scale.
>>>>>>> 1 and 2 share the same matching conditions (eth / ipv4 / udp /
>>>>>>> vxlan /...), so
>>>>>> sync API consider them share matching determination logic.
>>>>>>> It means "2" have 1M scale capability too. Obviously, it wastes a
>>>>>>> lot of
>>>>>> resources.
>>>>>>
>>>>>> Strictly speaking, they do not share the same match pattern.
>>>>>> Your example clearly shows that, in (1), the pattern should request
>>>>>> packets coming from "vport 1" and, in (2), packets coming from "vport 0".
>>>>>>
>>>>>> My point is simple: the "vport" from which packets enter the
>>>>>> embedded switch is ALSO a match criterion. If you accept this,
>>>>>> you'll see: the matching conditions differ.
>>>>>>
>>>>> See above.
>>>>> In this case, I think the matching fields are both "port_id +
>>>>> ipv4_vxlan". They
>>>> are same.
>>>>> Only differs with values like vni 100 or 200 vice versa.
>>>>
>>>> Not quite. Look closer: you use *different* port IDs for (1) and (2).
>>>> The value of "ethdev_id" field in item REPRESENTED_PORT differs.
>>>>
>>>>>>>
>>>>>>> In async API, there is pattern_template introduced. We can mark "1"
>>>>>>> to use
>>>>>> pattern_template id 1 and "2" to use pattern_template 2.
>>>>>>> They will be separated from each other, don't share anymore.
>>>>>>
>>>>>> Consider an example. "Wire" is a physical port represented by PF0
>>>>>> which, in turn, is attached to DPDK via ethdev 0. "VF" (vport?) is
>>>>>> attached to guest and is represented by a representor ethdev 1 in DPDK.
>>>>>>
>>>>>> So, some rules (template 1) are needed to deliver packets from "wire"
>>>>>> to "VF" and also decapsulate them. And some rules (template 2) are
>>>>>> needed to deliver packets in the opposite direction, from "VF"
>>>>>> to "wire" and also encapsulate them.
>>>>>>
>>>>>> My question is, what prevents you from adding match item
>>>>>> REPRESENTED_PORT[ethdev_id=0] to the pattern template 1 and
>>>>>> REPRESENTED_PORT[ethdev_id=1] to the pattern template 2?
>>>>>>
>>>>>> As I said previously, if you insert such item before eth / ipv4 /
>>>>>> etc to your match pattern, doing so defines an *exact* direction / source.
>>>>>>
>>>>> Could you check the async API guidance? I think pattern template
>>>>> focusing
>>>> on the matching field (mask).
>>>>> "REPRESENTED_PORT[ethdev_id=0] " and
>>>> "REPRESENTED_PORT[ethdev_id=1] "are the same.
>>>>> 1. pattern  template:  REPRESENTED_PORT mask 0xffff ...
>>>>> 2. action template: action1 / actions2. / 3. table create with
>>>>> pattern_template plus action template..
>>>>> REPRESENTED_PORT[ethdev_id=0]  will be rule1:  rule create
>>>> REPRESENTED_PORT port_id is 0 / actions ....
>>>>> REPRESENTED_PORT[ethdev_id=1]  will be rule2:  rule create
>>>> REPRESENTED_PORT port_id is 1 / actions ....
>>>>
>>>> OK, so, based on this explanation, it appears that you might be
>>>> looking to refer
>>>> to:
>>>> a) a *set* of any physical (wire) ports
>>>> b) a *set* of any guest ports (VFs)
>>>>
>>> Great, looks like we are getting closer and closer to an agreement.
>>
>> Looks so.
>>
>>>> You chose to achieve this using an attribute, but:
>>>>
>>>> 1) as I explained above, the use of term "direction" is wrong;
>>>>     please hear me out: I'm not saying that your use case and
>>>>     your optimisation is wrong: I'm saying that naming for it
>>>>     is wrong: it has nothing to do with "direction";
>>>>
>>> Do you have any better naming proposal?
>>
>> As I said, what you are trying to achieve using a new attribute would be way
>> better to achieve using new pattern items which can be easily told one from
>> another in PMD when pre-allocating resources for different async flow tables.
>>
>> So, I don't have any proposal for *attribute* naming.
>> What I propose is to consider new items instead.
>>
>>>> 2) while naming a *set* of wire ports as "wire_orig" might be OK,
>>>>     sticking with term "vf_orig" for a *set* of guest ports is
>>>>     clearly not, simply because the user may pass another PF
>>>>     to a guest instead of passing a VF; in other words,
>>>>     a better term is needed here;
>>>>
>>> Like you said, vport may contain VF, SF etc. vport_origin is on the logic switch
>> perspective.
>>> Any proposal is welcome.
>>
>> The problem is, vport can be easily confused with a slightly more generic
>> "lport" (embedded switch's "logical port"), and, logical ports, in turn, are not
>> confined to just VFs or PFs. For example, physical (wire) ports are ALSO logical
>> ports of the switch.
>>
>>>> 3) since it is possible to plug multiple NICs to a DPDK application,
>>>>     even from different vendors, the user may end up having multiple
>>>>     physical ports belonging to different physical NICs attached to
>>>>     the application; if this is the case, then referring to a *set*
>>>>     of wire ports using the new attribute is ambiguous in the
>>>>     sense that it's unclear whether this applies only to
>>>>     wire ports of some specific physical NIC or to the
>>>>     physical ports of *all* NICs managed by the app;
>>>>
>>> No matter how many NICs have been probed by DPDK, there is always a
>> switch/PF/VF/SF.. concept.
>>
>> Correct.
>>
>>> Each switch must have an owner identified by transfer_proxy(). Vport (VF/SF)
>> can't cross switch in normal case.
>>
>> No. That is not correct. This is tricky, but please hear me out: an individual NIC
>> board (that is, a given *switch*) is identified only by its switch domain ID. As I
>> explained above, "transfer proxy" is just a technical hint for the application to
>> indicate an ethdev through which "transfer" rules must be managed. Not all
>> vendors support this concept (and they are not obliged to support it).
>>
>>> The traffic comes from one NIC can't be offloaded by other NICs unless
>> forwarded by the application.
>>
>> Right, but forwarding in software (inside DPDK application) is out of scope with
>> regard to the problem that we're discussing.
>>
>>> If user use new attribute to cut one side resource, I think user is smart
>> enough to manage the rules in different NICs.
>>
>> As I explained above, I do not deny the existence of the problem that your
>> patch is trying to solve. Now it looks like we're on the same page with regard
>> to understanding the fact that what you're trying to do is to introduce a match
>> criterion that would refer to a GROUP of similar ports. In my opinion, this is
>> not an *attribute*, it's a *match criterion*, and it should be implemented as
>> two new items.
>>
>> Having two different item types would perfectly fit the need to know the
>> difference between such "directions" (as per your terminology) early enough,
>> when parsing templates.
>>
>>> No default behavior changed with this update.
>>>
>>>> 4) adding an attribute instead of yet another pattern item type
>>>>     is not quite good because PMDs need to be updated separately
>>>>     to detect this attribute and throw an error if it's not
>>>>     supported, whilst with a new item type, the PMDs do not
>>>>     need to be updated = if a PMD sees an unsupported item
>>>>     while traversing the item with switch () { case }, it
>>>>     will anyway throw an error;
>>>>
>>> PMD also need to check if it supports new matching item or not, right?
>>> We can't assume NIC vendor' PMD implementation, right?
>>
>> No-no-no. Imagine a PMD which does not support "transfer" rules.
>> In such PMD, in the flow parsing function one would have:
>>
>> if (!!attr->transfer) {
>>      print_error("Transfer is not supported");
>>      return EINVAL;
>> }
>>
>> If you add a new attribute, then PMDs which are NOT going to support it need
>> to be updated to add similar check.
>> Otherwise, they will simply ignore presence / absence of the attribute in the
>> rule, and validation result will be unreliable.
>>
>> Yes, if this attribute is 0x0, then indeed behaviour does not change. But what if
>> it's 0x1 or 0x2?
>> PMDs that do not support these values must somehow reject such rules on
>> parsing.
>>
>> However, this problem does not manifest itself when parsing items. Typically, in
>> a PMD, one would have:
>>
>> switch (item->type) {
>>      case RTE_FLOW_ITEM_TYPE_VOID:
>>          break;
>>
>>      case RTE_FLOW_ITEM_TYPE_ETH:
>>          /* blah-blah-blah */
>>          break;
>>
>>      default:
>>          return ENOTSUP;
>> }
> Are you assuming all PMDs will be implemented in the upper style?

One may take a look at the existing PMDs. It's open source after all.

When one has an array of items of unknown count which is
END-terminated, then, obviously, the PMD has to traverse
it one way or another. If it stumbles upon an unknown
item, it will have nothing to do but to throw an error.
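
To make the traversal argument concrete, here is a minimal sketch. It does not use the real DPDK headers: the enum, the struct and the function name are hypothetical stand-ins for the rte_flow item types, and ANY_GUEST_PORTS is only the sketch name discussed in this thread, not an existing item.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for rte_flow item types; NOT the real DPDK enum. */
enum item_type {
	ITEM_TYPE_END = 0,
	ITEM_TYPE_VOID,
	ITEM_TYPE_ETH,
	ITEM_TYPE_ANY_GUEST_PORTS, /* a new item this PMD was never taught */
};

struct item {
	enum item_type type;
};

/* Walk the END-terminated pattern and reject whatever the PMD does not know. */
static int
parse_pattern(const struct item *pattern)
{
	const struct item *item;

	for (item = pattern; item->type != ITEM_TYPE_END; ++item) {
		switch (item->type) {
		case ITEM_TYPE_VOID:
			break;
		case ITEM_TYPE_ETH:
			/* A real PMD would parse spec / mask here. */
			break;
		default:
			/* Unknown item: no PMD update was needed to get here. */
			return -ENOTSUP;
		}
	}

	return 0;
}
```

A PMD written this way rejects the new item without a single line being added for it, which is the property I keep referring to.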

> This new field targets async API which was added recently. No impact on sync API.

Rongwei, I see your point. The problem with it, however, is that even
if you describe it in comments, the code won't prevent non-sync API
from seeing this attribute in "struct rte_flow_attr".

As I say, "struct rte_flow_attr" has been here for ages.
When one adds a flow rule in a sync way, they fill out
the very same structure. And the user may set this new
argument to non-zero by mistake. Yes, you may argue
that the app developer should be smart enough to
read your comment before the struct member which
says that this field is for a-sync only. Right.
But that's not the only scenario. The field may
become non-zero because of some other mistake in
the program which, for example, leads to the
struct memory being corrupted in one way or
another. That's why the PMD has to validate flow rules...

I am very confused.
If the memory is corrupted or set mistakenly, we should fix it.
Memory corruption leads to unpredictable errors, especially with multiple threads.

So, the PMD must detect this inconsistency somehow and throw an error.
With your approach (attribute), the PMDs have to be updated to have
these checks. With the item approach that I suggest, updating the
PMDs is obviously not needed. Am I missing something? Let's discuss.
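
For comparison, a minimal sketch of the extra check that every PMD lacking support would have to gain with the attribute approach. The struct below is a hypothetical mirror of the bit layout proposed in the patch (it omits group and priority), and validate_attr() is an illustrative name, not an existing DPDK function.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the attribute bits proposed in the patch. */
struct flow_attr {
	uint32_t ingress:1;
	uint32_t egress:1;
	uint32_t transfer:1;
	uint32_t transfer_mode:2; /* 0 = bidirectional, 1 = wire origin, 2 = vport origin */
	uint32_t reserved:27;     /* must be zero */
};

/* The check a PMD WITHOUT support for the new attribute would need to add. */
static int
validate_attr(const struct flow_attr *attr)
{
	if (attr->reserved != 0)
		return -EINVAL;

	/* Without this branch, a non-zero mode would be silently ignored. */
	if (attr->transfer_mode != 0)
		return -ENOTSUP;

	return 0;
}
```

Unlike the "default" case of an item switch, the transfer_mode branch does not exist in any PMD today; it has to be written by hand in each of them.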
 
I am afraid the PMD still needs to check for conflicts between "IS_ANY_***PORTS" and "port_id"/"represented_port".
May I know if the attribute bits are fully occupied on your side for some special purpose?
 

> I don't predict any effort on the existing PMD behavior.

I see your point. But how is this expressed in code?
As I explain above, consistency checks are what
flow validate API is for. New argument means
new checks. That's it.

As stated in the commit log and as I have mentioned multiple times:
if users choose to use an advanced feature, they should read the manual carefully and take responsibility.

> But agree with you: we should emphasize it's only for async mode.

It's better to express this in code. So that the problem (if any)
can be detected programmatically and not just from reading comments.
From my point of view, the easiest way to have this done is to
add items instead of attributes, = no need to update PMDs.

We have a readme and a code snippet already.
Even if the user sets the attribute in the sync API, nothing should happen since there is no underlayer support (it still behaves like the current TRANSFER domain).
Unless the PMD has a bug, but it is always good to fix bugs, right?

>
>>
>> So, if you introduce two new item types to solve your problem, then you won't
>> have to update existing PMDs. If the vendor wants to support the new items
>> (say, MLX or SFC), they'll update their code to accept the items. But other
>> vendors will not do anything. If the user tries to pass such an item to a vendor
>> which doesn't support the feature, the "default" case will just throw an error.
>>
>> This is what I mean when pointing out such difference between adding an
>> attribute VS adding new item types.
>>
>>>> 5) as in (4), a new attribute is not good from documentation
>>>>     standpoint; please search for "represented_port = Y" in
>>>>     documentation = this way, all supported items are
>>>>     easily defined for various NIC vendors, but the
>>>>     same isn't true for attributes = there is no
>>>>     way to indicate supported attributes in doc.
>>>>
>>>> If points (1 - 5) make sense to you, then, if I may be so bold, I'd
>>>> like to suggest that the idea of adding a new attribute be abandoned.
>>>> Instead, I'd like to suggest adding new items:
>>>>
>>>> (the names are just sketch, for sure, it should be discussed)
>>>>
>>>> ANY_PHY_PORTS { switch_domain_id }
>>>>   = match packets entering the embedded switch from *whatever*
>>>>     physical ports belonging to the given switch domain
>>>>
>>> How many PHY_PORTS can one switch have, per your thought? Can I treat
>> the PHY_PORTS as the { switch_domain_id } owner as transfer_proxy()?
>>
>> A single physical NIC board is supposed to have a single embedded switch
>> engine. Hence, if the NIC board has, for example, two or four physical ports,
>> these will be the physical ports of the switch. That's it.
>>
>> As for the transfer proxy, please see my explanations above.
>> It's not *always* reliable to tell whether two given ethdevs belong to the same
>> physical NIC board or not.
>>
>> Switch domain ID is the right criterion (for applications).
>>
>>>> ANY_GUEST_PORTS { switch_domain_id }
>>>>   = match packets entering the embedded switch from *whatever*
>>>>     guest ports (VFs, PFs, etc.) belonging to the given
>>>>     switch domain
>>>>
>>>> The field "switch_domain_id" is required to tell one physical board /
>>>> vendor from another (as I explained in point (3)).
>>>> The application can query this parameter from ethdev's switch info:
>>>> please see "struct rte_eth_switch_info".
>>>>
>>>> What's your opinion?
>>>>
>>> How can we handle ANY_PHY_PORTS/ ANY_GUEST_PORTS ' relationship
>> with REPRESENTED_PORT if conflicts?
>>> Need future tuning.
>>
>> And if you carry on with "vf_orig" / "wire_orig" approach, you will inevitably
>> have the very same problem: possible conflict with items like
>> REPRESENTED_PORT. So does it matter? Yes, checks need to be done by PMDs
>> when parsing patterns.
>>
>>> Like I said before,  offloaded rules can't cross different NIC vendor'
>> "switch_domain_id".
>>> If user probes multiple NICs in one application, application should take care
>> of packet forwarding.
>>> Also application should be aware which ports belong to which NICs.
>>
>> Yes, perhaps, domain ID is not needed in the new items.
>> But the application still must keep track of switch domain IDs itself so it knows
>> which rules to manage via which ethdevs.
>>
>> Any other opinions?
> ANY_PHY_PORTS/ ANY_GUEST_PORTS looks like a super set of ports.

So does the new attribute, doesn't it?

> This will bring another challenge: "why can't we use REPRESENTED_PORT  with mask" or "combine several REPRESENTED_PORT together"?

This problem has been here for many other items, including now deprecated
items PF, VF and PHY_PORT. Yes, theoretically, when the PMD looks through
the pattern, it has to check that its items do not overlap / contradict.
That's kind of OK, isn't it? The PMD has to check things after all...

For example, no one prevents user from submitting a pattern
with several adjacent items ETH in it. The PMD is supposed
to turn such request down.
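
A minimal sketch of such a consistency check, again with hypothetical stand-in types rather than the real rte_flow definitions. The assumption made here is that VOID items are placeholders and therefore do not separate two ETH items.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for rte_flow item types; NOT the real DPDK enum. */
enum pattern_item_type {
	PATTERN_ITEM_END = 0,
	PATTERN_ITEM_VOID,
	PATTERN_ITEM_ETH,
	PATTERN_ITEM_IPV4,
};

struct pattern_item {
	enum pattern_item_type type;
};

/* Reject a pattern in which two ETH items directly follow each other. */
static int
reject_adjacent_eth(const struct pattern_item *pattern)
{
	bool prev_is_eth = false;

	for (; pattern->type != PATTERN_ITEM_END; ++pattern) {
		if (pattern->type == PATTERN_ITEM_VOID)
			continue; /* assumed not to separate neighbouring items */

		if (pattern->type == PATTERN_ITEM_ETH && prev_is_eth)
			return -EINVAL;

		prev_is_eth = (pattern->type == PATTERN_ITEM_ETH);
	}

	return 0;
}
```

The same kind of check would be needed for a "set of ports" item that contradicts REPRESENTED_PORT, regardless of whether the set is expressed as an item or as an attribute.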

>>
>>>>>
>>>>>>>
>>>>>>>> For example, the diff below adds the attributes to "table"
>>>>>>>> commands in testpmd but does not add them to regular (non-table)
>>>>>>>> commands like "flow create". Why?
>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> "table" command limits pattern_template to single direction or
>>>>>>> bidirection
>>>>>> per user specified attribute.
>>>>>>
>>>>>> As I say above, the same effect can be achieved by adding item
>>>>>> REPRESENTED_PORT to the corresponding pattern template.
>>>>> See above.
>>>>>>
>>>>>>> "rule" command must tight with one "table_id", so the rule will
>>>>>>> inherit the
>>>>>> "table" direction property, no need to specify again.
>>>>>>
>>>>>> You might've misunderstood. I do not talk about "rule" command
>>>>>> coupled with some "table". What I talk about is regular, NON-async
>>>>>> flow insertion commands.
>>>>>>
>>>>>> Please take a look at section "/* Validate/create attributes. */"
>>>>>> in file "app/test-pmd/cmdline_flow.c". When one adds a new flow
>>>>>> attribute, they should reflect it the same way as VC_INGRESS,
>>>> VC_TRANSFER, etc.
>>>>>>
>>>>>> That's it.
>>>>> We don't intend to pass this to sync API. The above code example is
>>>>> for sync
>>>> API.
>>>>
>>>> So I understand. But there's one slight problem: in your patch, you
>>>> add the new attributes to the structure which is *shared* between
>>>> sync and async use case scenarios. If one adds an attribute to this
>>>> structure, they have to provide accessors for it in all sync-related
>>>> commands in testpmd, but your patch does not do that.
>>>>
>>> Like the title said, "creating transfer table" is the ASYNC operation.
>>> We have limited the scope of this patch. Sync API will be another story.
>>> Maybe we can add one more sentence to emphasize async API again.
>>
>> No-no-no. There might be slight misunderstanding. I understand that you are
>> limiting the scope of your patch by saying this and this.
>> That's OK. What I'm trying to point out is the fact that your patch nevertheless
>> touches the COMMON part of the flow API which is shared between two
>> approaches (sync and async).
> Yeah, you are right, we should emphasize it for async API not sync in the code and comments.
>>
>> Imagine a reader that does not know anything about the async approach.
>> He just opens the file in vim and goes directly to struct rte_flow_attr.
>> And, over there, he sees the new attribute "wire_orig". He then immediately
>> assumes that these attributes can be used in testpmd. Now the reader opens
>> testpmd and tries to insert a flow rule using the sync approach:
>>
>> flow create priority 0 transfer vf_orig pattern / ... / end actions drop
>>
>
> This is a wrong statement.
> If the user has no idea about cmdline usage, he should rely on "tab indication", not on guessing.
>
> The command prefix "flow" is now bifurcated into sync and async; the user may use any keyword combinations.
> He will get "argument error" if it's not good unless he knows what he is doing.
> Again: we should emphasize it's for async API only.

OK, even if this example is not good enough, I still believe that
it is not right to introduce new match criteria in the form of
rule attributes. Match criteria belong in the pattern.

>
>> And doing so will be a failure, because your patch does not add the new
>> attribute keyword to sync flow rule syntax parser. That's it.
>>
>> Once again, I should emphasize: the reader MAY know nothing about the async
>> approach. But if the attribute is present in "struct rte_flow_attr", it
>> immediately means that it is available everywhere. Both sync and async.
>>
>> So, with this in mind, your attempt to limit the scope of the patch to async-only
>> rules looks a little bit artificial. It's not correct from the *formal* standpoint.
>>
>>>
>>>> In other words, it is wrong to assume that "struct rte_flow_attr"
>>>> only applies to async approach. It had been introduced long before
>>>> the async flow design was added to DPDK. That's it.
>>>>
>>>>>>
>>>>>> But, as I say, I still believe that the new attributes aren't needed.
>>>>> I think we are not on the same page for now. Can we reach agreement
>>>>> on the same matching criteria first?
>>>>>>>
>>>>>>>>> It helps to save underlayer memory also on insertion rate.
>>>>>>>>
>>>>>>>> Which memory? Host memory? NIC memory? Term "underlayer" is
>>>> vague.
>>>>>>>> I suggest that the commit message be revised to first explain how
>>>>>>>> such memory is spent currently, then explain why this is not
>>>>>>>> optimal and, finally, which way the patch is supposed to improve
>>>>>>>> that. I.e. be more
>>>>>> specific.
>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> For large scalable rules, HW (depends on implementation) always
>>>>>>> needs
>>>>>> memory to hold the rules' patterns and actions, either from NIC or
>>>>>> from
>>>> host.
>>>>>>> The memory footprint highly depends on "user rules' complexity",
>>>>>>> also diff
>>>>>> between NICs.
>>>>>>> ~50% memory saving is expected if one-direction is cut.
>>>>>>
>>>>>> Regardless of this talk, this explanation should probably be
>>>>>> present in the commit description.
>>>>>>
>>>>> This number may differ with different NICs or implementation. We
>>>>> can't say
>>>> it for sure.
>>>>
>>>> Not an exact number, of course, but a brief explanation of:
>>>> a) what is wrong / not optimal in the current design;
>>> Please check the commit log, transfer have the capability to match bi-
>> direction traffic no matter what ports.
>>>> b) how it is observed in customer deployments;
>>> Customer have the requirements to save resources and their offloaded rules
>> is direction aware.
>>>> c) why the proposed patch is a good solution.
>>> New attributes provide the way to remove one direction and save underlayer
>> resource.
>>> All of the above can be found in the commit log.
>>
>> I understand all of that, but my point is, the existing commit message is way
>> too brief. Yes, it mentions that SOME customers have SOME deployments, but
>> it does not shed light on which specifics these deployments have. For example,
>> back in the day, when items PORT_REPRESENTOR and REPRESENTED_PORT
>> were added, the cover letter for that patch series provided details of
>> deployment specifics (application: OvS, scenario: full offload rules).
>>
>> So, it's always better to expand on such specifics so that the reader has full
>> picture in their head and doesn't need to look elsewhere.
>> Not all readers of the commit message will be happy to delve into our
>> discussions on the mailing list to get the gist.
>>
> The approaches diverge. The pattern item approach will attract another discussion thread, right?

As I said, match criteria belong in flow pattern. I recognise the
importance of the problem that you're looking to solve. It's very
good that you care to address it, but what this patch tries to do
is to add more match criteria in the form of new attributes with
rather questionable names... There's room for improvement.

When I say that new features should not confuse readers, I mean
a very basic thing: readers know that match criteria all sit
in the pattern. And they refer to the pattern item enum in
the code and in documentation to learn about criteria,
while "struct rte_flow_attr" is an unusual place from
which to learn about match criteria.

> We should get a conclusion and reflect in the commit changes&logs, and it's easy for others to absorb.

Yes, but before we get to that, perhaps it pays to hear
more feedback from other reviewers. Thomas? Ori? Andrew?

>>>
>>>>
>>>
>>>>>>>
>>>>>>>>> By default, the transfer domain is bi-direction, and no behavior
>> changes.
>>>>>>>>>
>>>>>>>>> 1. Match wire origin only
>>>>>>>>>  flow template_table 0 create group 0 priority 0 transfer wire_orig...
>>>>>>>>> 2. Match vf origin only
>>>>>>>>>  flow template_table 0 create group 0 priority 0 transfer vf_orig...
>>>>>>>>>
>>>>>>>>> Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
>>>>>>>>> ---
>>>>>>>>> app/test-pmd/cmdline_flow.c                 | 26
>> +++++++++++++++++++++
>>>>>>>>> doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 ++-
>>>>>>>>> lib/ethdev/rte_flow.h                       |  9 ++++++-
>>>>>>>>> 3 files changed, 36 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/app/test-pmd/cmdline_flow.c
>>>>>>>>> b/app/test-pmd/cmdline_flow.c index 7f50028eb7..b25b595e82
>>>>>>>>> 100644
>>>>>>>>> --- a/app/test-pmd/cmdline_flow.c
>>>>>>>>> +++ b/app/test-pmd/cmdline_flow.c
>>>>>>>>> @@ -177,6 +177,8 @@ enum index {
>>>>>>>>>       TABLE_INGRESS,
>>>>>>>>>       TABLE_EGRESS,
>>>>>>>>>       TABLE_TRANSFER,
>>>>>>>>> +     TABLE_TRANSFER_WIRE_ORIG,
>>>>>>>>> +     TABLE_TRANSFER_VF_ORIG,
>>>>>>>>>       TABLE_RULES_NUMBER,
>>>>>>>>>       TABLE_PATTERN_TEMPLATE,
>>>>>>>>>       TABLE_ACTIONS_TEMPLATE,
>>>>>>>>> @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] =
>> {
>>>>>>>>>       TABLE_INGRESS,
>>>>>>>>>       TABLE_EGRESS,
>>>>>>>>>       TABLE_TRANSFER,
>>>>>>>>> +     TABLE_TRANSFER_WIRE_ORIG,
>>>>>>>>> +     TABLE_TRANSFER_VF_ORIG,
>>>>>>>>>       TABLE_RULES_NUMBER,
>>>>>>>>>       TABLE_PATTERN_TEMPLATE,
>>>>>>>>>       TABLE_ACTIONS_TEMPLATE,
>>>>>>>>> @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
>>>>>>>>>               .next = NEXT(next_table_attr),
>>>>>>>>>               .call = parse_table,
>>>>>>>>>       },
>>>>>>>>> +     [TABLE_TRANSFER_WIRE_ORIG] = {
>>>>>>>>> +             .name = "wire_orig",
>>>>>>>>> +             .help = "affect rule direction to transfer",
>>>>>>>>
>>>>>>>> This does not explain the "wire" aspect. It's too broad.
>>>>>>>>
>>>>>>>>> +             .next = NEXT(next_table_attr),
>>>>>>>>> +             .call = parse_table,
>>>>>>>>> +     },
>>>>>>>>> +     [TABLE_TRANSFER_VF_ORIG] = {
>>>>>>>>> +             .name = "vf_orig",
>>>>>>>>> +             .help = "affect rule direction to transfer",
>>>>>>>>
>>>>>>>> This explanation simply duplicates that of "wire_orig".
>>>>>>>> It does not explain the "vf" part. Should be more specific.
>>>>>>>>
>>>>>>>>> +             .next = NEXT(next_table_attr),
>>>>>>>>> +             .call = parse_table,
>>>>>>>>> +     },
>>>>>>>>>       [TABLE_RULES_NUMBER] = {
>>>>>>>>>               .name = "rules_number",
>>>>>>>>>               .help = "number of rules in table", @@ -8894,6
>>>>>>>>> +8910,16 @@ parse_table(struct context *ctx, const struct token
>>>>>>>>> +*token,
>>>>>>>>>       case TABLE_TRANSFER:
>>>>>>>>>               out->args.table.attr.flow_attr.transfer = 1;
>>>>>>>>>               return len;
>>>>>>>>> +     case TABLE_TRANSFER_WIRE_ORIG:
>>>>>>>>> +             if (!out->args.table.attr.flow_attr.transfer)
>>>>>>>>> +                     return -1;
>>>>>>>>> +             out->args.table.attr.flow_attr.transfer_mode = 1;
>>>>>>>>> +             return len;
>>>>>>>>> +     case TABLE_TRANSFER_VF_ORIG:
>>>>>>>>> +             if (!out->args.table.attr.flow_attr.transfer)
>>>>>>>>> +                     return -1;
>>>>>>>>> +             out->args.table.attr.flow_attr.transfer_mode = 2;
>>>>>>>>> +             return len;
>>>>>>>>>       default:
>>>>>>>>>               return -1;
>>>>>>>>>       }
>>>>>>>>> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>>>> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>>>> index 330e34427d..603b7988dd 100644
>>>>>>>>> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>>>> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>>>> @@ -3332,7 +3332,8 @@ It is bound to
>>>>>>>> ``rte_flow_template_table_create()``::
>>>>>>>>>
>>>>>>>>>   flow template_table {port_id} create
>>>>>>>>>       [table_id {id}] [group {group_id}]
>>>>>>>>> -       [priority {level}] [ingress] [egress] [transfer]
>>>>>>>>> +       [priority {level}] [ingress] [egress]
>>>>>>>>> +       [transfer [vf_orig] [wire_orig]]
>>>>>>>>
>>>>>>>> Is it correct? Shouldn't it rather be [transfer] [vf_orig]
>>>>>>>> [wire_orig] ?
>>>>>>>>
>>>>>>>>>       rules_number {number}
>>>>>>>>>       pattern_template {pattern_template_id}
>>>>>>>>>       actions_template {actions_template_id} diff --git
>>>>>>>>> a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
>>>>>>>>> a79f1e7ef0..512b08d817 100644
>>>>>>>>> --- a/lib/ethdev/rte_flow.h
>>>>>>>>> +++ b/lib/ethdev/rte_flow.h
>>>>>>>>> @@ -130,7 +130,14 @@ struct rte_flow_attr {
>>>>>>>>>        * through a suitable port. @see rte_flow_pick_transfer_proxy().
>>>>>>>>>        */
>>>>>>>>>       uint32_t transfer:1;
>>>>>>>>> -     uint32_t reserved:29; /**< Reserved, must be zero. */
>>>>>>>>> +     /**
>>>>>>>>> +      * 0 means bidirection,
>>>>>>>>> +      * 0x1 origin uplink,
>>>>>>>>
>>>>>>>> What does "uplink" mean? It's too vague. Hardly a good term.
>>>>
>>>> I believe this comment should be reworked, in case the idea of having
>>>> an extra attribute persists.
>>>>
>>>>>>>>
>>>>>>>>> +      * 0x2 origin vport,
>>>>>>>>
>>>>>>>> What does "origin vport" mean? Hardly a good term as well.
>>>>
>>>> I still believe this explanation is way too brief and needs to be
>>>> reworked to provide more details, to define the use case for the attribute
>> more specifically.
>>>>
>>>>>>>>
>>>>>>>>> +      * N/A both set.
>>>>>>>>
>>>>>>>> What's this?
>>>>
>>>> The question stands.
>>>>
>>>>>>>>
>>>>>>>>> +      */
>>>>>>>>> +     uint32_t transfer_mode:2;
>>>>>>>>> +     uint32_t reserved:27; /**< Reserved, must be zero. */
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>> --
>>>>>>>>> 2.27.0
>>>>>>>>>
>>>>>>>>
>>>>>>>> Since the attributes are added to generic 'struct rte_flow_attr',
>>>>>>>> non-table
>>>>>>>> (synchronous) flow rules are supposed to support them, too. If
>>>>>>>> that is indeed the case, then I'm afraid such proposal does not
>>>>>>>> agree with the existing items PORT_REPRESENTOR and
>> REPRESENTED_PORT.
>>>> They
>>>>>>>> do exactly the same thing, but they are designed to be way more
>>>>>>>> generic. Why
>>>>>> not use them?
>>>>>>
>>>>>> The question stands.
>>>>>>
>>>>>>>>
>>>>>>>> Ivan
>>>>>>>
>>>>>>
>>>>>> Ivan
>>>>>
>>>
>>
>> Thank you.
>

Thanks,
Ivan
  
Ivan Malov Sept. 15, 2022, 9:42 a.m. UTC | #14
Hi Thomas,

On Thu, 15 Sep 2022, Thomas Monjalon wrote:

> 15/09/2022 09:47, Ivan Malov:
>> As I said, match criteria belong in flow pattern. I recognise the
>> importance of the problem that you're looking to solve. It's very
>> good that you care to address it, but what this patch tries to do
>> is to add more match criteria in the form of new attributes with
>> rather questionable names... There's room for improvement.
>>
>> When I say that new features should not confuse readers, I mean
>> a very basic thing: readers know that match criteria all sit
>> in the pattern. And they refer to the pattern item enum in
>> the code and in documentation to learn about criteria,
>> while "struct rte_flow_attr" is an unusual place from
>> which to learn about match criteria.
>>
>>> We should get a conclusion and reflect in the commit changes&logs, and it's easy for others to absorb.
>>
>> Yes, but before we get to that, perhaps it pays to hear
>> more feedback from other reviewers. Thomas? Ori? Andrew?
>
> Sorry I did not read all.

OK, I will attempt to summarise it to some extent
in my next response to Rongwei which is to follow.

> I think the main question is about the use of attributes.
> I refer to this commit of Ivan last year which was agreed:
>
>    ethdev: deprecate direction attributes in transfer flows
>
>    Attributes "ingress" and "egress" can only apply unambiguously
>    to non-"transfer" flows. In "transfer" flows, the standpoint
>    is effectively shifted to the embedded switch. There can be
>    many different endpoints connected to the switch, so the
>    use of "ingress" / "egress" does not shed light on which
>    endpoints precisely can be considered as traffic sources.
>
>    Add relevant deprecation notices and suggest the use of precise
>    traffic source items (PORT_REPRESENTOR and REPRESENTED_PORT).
>
>    Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
>    Acked-by: Ori Kam <orika@nvidia.com>
>    Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>    Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
>
> So +1 for using only pattern items as matching criteria.

Thank you.

>
>
>

Ivan
  
Ivan Malov Sept. 15, 2022, 10:59 a.m. UTC | #15
Hi Rongwei,

In this reply, I do not include the previous mail because the amount
of inline commentary has gone haywire over the past couple of days.
Let's re-iterate.

But before I get to that, I'd like to offer a fresh perspective:

Perhaps, if we all agree that term "vport" means an endpoint which
can stand for any "port" except for physical one, then it should
be possible to use term ANY_VPORTS rather than ANY_GUEST_PORTS.

But that's tricky, of course. I don't have a way with naming,
so more opinions are welcome and very-very desirable here.

So:

1) Do you agree that, in your proposal, the new "wire_orig" / "vf_orig"
    primitives are in fact yet another match criteria?

    ..

    To me, it looks so. If they are match criteria, then they belong
    in match pattern, that is, they should be expressed as new items.

    For "transfer" rules, the *existing* attributes are: "group"
    and "priority". As you may note, these are clearly not match
    criteria. They control the look-up order. So, to this day,
    there're no match criteria in DPDK expressed as attributes.

    If these "wire_orig" / "vf_orig" are going to be introduced
    as attributes, that should be backed with strong motivation.

2) From your viewpoint, why items "ANY_PHYS_PORTS" and "ANY_VPORTS"
    won't do? Or, which problems do you think they may inflict?

    ..

    Previously, you explained why REPRESENTED_PORT would not
    fit your needs. And I understand your point: to async API,
    two pattern templates which both have item REPRESENTED_PORT
    in them cannot be clearly distinguished and are in fact the
    same set of criteria (provided that all other items are also
    the same and have the same masks). Templates are, well,
    templates (or shapes) of the rules to come later and
    do not include exact "spec" for the "ethdev_id".
    Got it.

    But that's not going to be the case with items ANY_PHYS_PORTS and
    ANY_VPORTS, is it? In one async table template, the user submits
    item ANY_PHYS_PORTS (instead of table attribute "wire_orig").
    In another template, the user submits item ANY_VPORTS to
    state that they want to match only traffic transmitted by
    software endpoints (DPDK ethdevs, guest VFs, etc.)
    connected to the switch.

    In this example, the PMD will clearly see that the two templates
    differ. So it will be able to allocate separate resources, each
    one "cutting one half of traffic" (as per your concept).

3) In your most recent response, you suggested that one might have
    had the attributes occupied for some other purposes. To me,
    they're not. Neither I nor my closest colleagues have
    any plans on them. When I advocate using item approach
    over the attribute approach, I do this to ensure
    a) clarity of the API contract and b) robustness.

4) Also, in your response, you suggested that I might have
    confused item mask and spec. That is not the case.
    If we agree that switch domain ID is unneeded in
    the new items, then these items will have no
    fields in them (just as item PF had none
    before it was deprecated).

    No fields in new items => no field masks.
    So what's the problem then?

5) With regard to our talk about identifying the relationship
    between ethdevs and switch domains, you said that the user
    could know the difference from the very beginning:
    /sysfs/ .... /PF_BDF/sriov_num

    That is true for the user who starts the application, but
    this knowledge is hard to obtain from the application
    perspective = it's hard to automate.

    This is why ethdevs are able to advertise their domain IDs.
    And, as I explained, looking at domain ID to understand
    port relationship is valid, whilst looking at proxy IDs
    to achieve the same goal is not. Proxy port IDs only
    serve the purpose of finding an entry point for
    managing flows. That has slightly different
    meaning, but this subtle difference is important.

6) As for the confusion over the difference between fixing
    bugs and making the code robust by extra checks:

    Yes, I agree that the programmer who writes the
    application must be intelligent enough to use
    flow primitives the proper way. Yes, the user
    who starts the application also should tread
    carefully. But that does not prevent some
    mistakes in other parts of code from
    corrupting various chunks of memory,
    including, for example, flow attrs.

    You say that such mistakes have to be "just fixed"
    as any other bugs. Right. But how much time will
    the programmer spend to identify the bugs?

    If the PMDs do all the checks (as with attributes),
    the hypothetical bug will manifest itself much
    earlier. That will simplify debugging by a lot...

    So, my point is that it's still better to ensure
    that new flow primitives have all necessary
    checks in place. For attributes, it is
    required to add them separately.

    For items, as I explained, it might not be necessary
    in the majority of cases simply because of the
    switch (item->type) { case } structure.
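
Point 2 above can be sketched in a few lines of C. This is only a toy
model under stated assumptions: the enum, the struct and the helper
tmpl_same_shape() are hypothetical and are not part of rte_flow, and
ANY_PHYS_PORTS / ANY_VPORTS are just the names proposed in this thread.
A template item carries a type and a mask but no spec, so two templates
that both start with REPRESENTED_PORT compare equal, while templates
starting with the two proposed items do not:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical item types. ANY_PHYS_PORTS and ANY_VPORTS are only the
 * names proposed in this thread, not existing rte_flow items. */
enum item_type {
	ITEM_END,
	ITEM_REPRESENTED_PORT,
	ITEM_ANY_PHYS_PORTS,
	ITEM_ANY_VPORTS,
	ITEM_ETH,
	ITEM_IPV4,
};

/* A template item has a type and a mask, but no spec. */
struct tmpl_item {
	enum item_type type;
	unsigned int mask; /* simplified: one word of mask bits */
};

/* Two templates have the same shape if their item types and masks
 * match one by one, up to and including the END item. */
static bool
tmpl_same_shape(const struct tmpl_item *a, const struct tmpl_item *b)
{
	assert(a != NULL && b != NULL);
	for (;; a++, b++) {
		if (a->type != b->type)
			return false;
		if (a->type == ITEM_END)
			return true;
		if (a->mask != b->mask)
			return false;
	}
}
```

With such a comparison, a driver would have no way to tell a "wire"
template from a "vport" template when both use REPRESENTED_PORT, but
it could tell them apart as soon as the leading item type differs.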

So, these are some of my points to explain why the
attribute approach is untenable. To me, attributes
are something global, which demands checks in all
flow-capable PMDs. Items seem better because they
are don't cares to all PMDs which are unaware of
the async concept. So, even if someone does not
implement the async concept or does not like
the new item names, they can turn a blind
eye to this - with attributes, they can't.
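
To illustrate the difference, here is a minimal sketch, assuming a
driver that was written before the new primitive existed. All names
below (the enum, struct flow_attr, MAX_PRIORITY, the two parse helpers)
are hypothetical and do not come from any real PMD. An unknown item
falls into the "default" branch and is rejected without any extra code,
whereas an unknown attribute field is simply never looked at:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical item types; ITEM_ANY_VPORTS is the newly proposed one. */
enum item_type { ITEM_END, ITEM_ETH, ITEM_IPV4, ITEM_ANY_VPORTS };

/* Hypothetical attributes; transfer_mode models the proposed field. */
struct flow_attr {
	unsigned int group;
	unsigned int priority;
	unsigned int transfer_mode;
};

#define MAX_PRIORITY 7u /* arbitrary limit for this sketch */

/* A driver that predates ITEM_ANY_VPORTS: any item it does not
 * know ends up in "default" and is reported as unsupported. */
static int
old_pmd_parse_pattern(const enum item_type *items)
{
	assert(items != NULL);
	for (; *items != ITEM_END; items++) {
		switch (*items) {
		case ITEM_ETH:
		case ITEM_IPV4:
			break;
		default:
			return -ENOTSUP;
		}
	}
	return 0;
}

/* The same driver checks only the attributes it knows about, so a
 * non-zero transfer_mode passes unnoticed until a check is added. */
static int
old_pmd_parse_attr(const struct flow_attr *attr)
{
	assert(attr != NULL);
	if (attr->priority > MAX_PRIORITY)
		return -EINVAL;
	return 0;
}
```

In this model, a pattern containing ITEM_ANY_VPORTS is refused by the
old driver, while an attribute set with transfer_mode = 1 is accepted
silently. That is the gap which a separate check would have to close
in every flow-capable PMD.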

Thank you.
  
Thomas Monjalon Sept. 15, 2022, 11:16 a.m. UTC | #16
15/09/2022 12:59, Ivan Malov:
> Hi Rongwei,
> 
> In this reply, I do not include the previous mail because the amount
> of inline commentary has gone haywire over the past couple of days.
> Let's re-iterate.
> 
> But before I get to that, I'd like to offer a fresh perspective:
> 
> Perhaps, if we all agree that term "vport" means an endpoint which
> can stand for any "port" except for physical one, then it should
> be possible to use term ANY_VPORTS rather than ANY_GUEST_PORTS.

The opposite of "physical" is "virtual" indeed.

> But that's tricky, of course. I don't have a way with naming,
> so more opinions are welcome and very-very desirable here.
> 
> So:
> 
> 1) Do you agree that, in your proposal, the new "wire_orig" / "vf_orig"
>     primitives are in fact yet another match criteria?
> 
>     ..
> 
>     To me, it looks so. If they are match criteria, then they belong
>     in match pattern, that is, they should be expressed as new items.
> 
>     For "transfer" rules, the *existing* attributes are: "group"
>     and "priority". As you may note, these are clearly not match
>     criteria. They control the look-up order. So, to this day,
>     there're no match criteria in DPDK expressed as attributes.
> 
>     If these "wire_orig" / "vf_orig" are going to be introduced
>     as attributes, that should be backed with strong motivation.

I prefer we keep matching in a single place, not in attributes.


> 2) From your viewpoint, why items "ANY_PHYS_PORTS" and "ANY_VPORTS"
>     won't do? Or, which problems do you think they may inflict?
> 
>     ..
> 
>     Previously, you explained why REPRESENTED_PORT would not
>     fit your needs. And I understand your point: to async API,
>     two pattern templates which both have item REPRESENTED_PORT
>     in them cannot be clearly distinguished and are in fact the
>     same set of criteria (provided that all other items are also
>     the same and have the same masks). Templates are, well,
>     templates (or shapes) of the rules to come later and
>     do not include exact "spec" for the "ethdev_id".
>     Got it.
> 
>     But that's not going to be the case with items ANY_PHYS_PORTS and
>     ANY_VPORTS, is it? In one async table template, the user submits
>     item ANY_PHYS_PORTS (instead of table attribute "wire_orig").
>     In another template, the user submits item ANY_VPORTS to
>     state that they want to match only traffic transmitted by
>     software endpoints (DPDK ethdevs, guest VFs, etc.)
>     connected to the switch.
> 
>     In this example, the PMD will clearly see that the two templates
>     differ. So it will be able to allocate separate resources, each
>     one "cutting one half of traffic" (as per your concept).
> 
> 3) In your most recent response, you suggested that one might have
>     had the attributes occupied for some other purposes. To me,
>     they're not. Neither me nor my closest colleagues have
>     any plans on them. When I advocate using item approach
>     over the attribute approach, I do this to ensure
>     a) clarity of the API contract and b) robustness.
> 
> 4) Also, in your response, you suggested that I might have
>     confused item mask and spec. That is not the case.
>     If we agree, that switch domain ID is unneeded in
>     the new items, then these items will have no
>     fields in them (like item PF had not had any
>     before it was deprecated).
> 
>     No fields in new items => no field masks.
>     So what's the problem then?
> 
> 5) With regard to our talk about identifying the relationship
>     between ethdevs and switch domains, you said that the user
>     could know the difference from the very beginning:
>     /sysfs/ .... /PF_BDF/sriov_num
> 
>     That is true for the user who starts the application, but
>     this knowledge is hard to obtain from the application
>     perspective = it's hard to automate.
> 
>     This is why ethdevs are able to advertise their domain IDs.
>     And, as I explained, looking at domain ID to understand

namely rte_eth_dev_info.switch_info.domain_id

>     port relationship is valid, whilst looking at proxy IDs
>     to achieve the same goal is not. Proxy port IDs only
>     serve the purpose of finding an entry point for
>     managing flows. That has slightly different
>     meaning, but this subtle difference is important.

There is also a concept of sibling ports
to get all ports belonging to the same hardware.
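
A rough sketch of the domain ID comparison, assuming the application
has already copied the domain ID of each port (as reported in
rte_eth_dev_info.switch_info.domain_id) into a small local structure.
struct port_info, DOMAIN_ID_INVALID and ports_share_switch() are
hypothetical names for this example only; the invalid value stands in
for whatever the library uses to say "no switch domain":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the "port has no switch domain" value. */
#define DOMAIN_ID_INVALID UINT16_MAX

/* Hypothetical local copy of what the ethdev advertises. */
struct port_info {
	uint16_t port_id;
	uint16_t domain_id;
};

/* Two ports sit on the same embedded switch only if both report
 * the same valid domain ID. */
static bool
ports_share_switch(const struct port_info *a, const struct port_info *b)
{
	assert(a != NULL && b != NULL);
	return a->domain_id != DOMAIN_ID_INVALID &&
	       a->domain_id == b->domain_id;
}
```

The proxy port ID is deliberately absent from this structure: it tells
where to manage flows, not which ports are related.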


> 6) As for the confusion over the difference between fixing
>     bugs and making the code robust by extra checks:
> 
>     Yes, I agree that the programmer who writes the
>     application must be intelligent enough to use
>     flow primitives the proper way. Yes, the user
>     who starts the application also should tread
>     carefully. But that does not prevent some
>     mistakes in other parts of code from
>     corrupting various chunks of memory,
>     including, for example, flow attrs.
> 
>     You say that such mistakes have to be "just fixed"
>     as any other bugs. Right. But how much time will
>     the programmer spend to identify the bugs?
> 
>     If the PMDs do all the checks (as with attributes),
>     the hypothetical bug will manifest itself much
>     earlier. That will simplify debugging by a lot...
> 
>     So, my point is that it's still better to ensure
>     that new flow primitives have all necessary
>     checks in place. For attributes, it is
>     required to add them separately.

If flow insertion is done in a fast path,
such checks may be skipped.

>     For items, as I explained, it might not be necessary
>     in the majority of cases simply because of the
>     switch (item->type) { case } structure.
> 
> So, these are some of my points to explain why the
> attribute approach is untenable. To me, attributes
> are something global, which demands checks in all
> flow-capable PMDs. Items seem better because they
> are don't cares to all PMDs which are unaware of
> the async concept. So, even if someone does not
> implement the async concept or does not like
> the new item names, they can turn a blind
> eye to this - with attributes, they can't.
> 
> Thank you.
  
Ori Kam Sept. 20, 2022, 9:41 a.m. UTC | #17
Hi Ivan, Thomas and Rongwei

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, 15 September 2022 14:16
> 
> 15/09/2022 12:59, Ivan Malov:
> > Hi Rongwei,
> >
> > In this reply, I do not include the previous mail because the amount
> > of inline commentary has gone haywire over the past couple of days.
> > Let's re-iterate.
> >
> > But before I get to that, I'd like to offer a fresh perspective:
> >
> > Perhaps, if we all agree that term "vport" means an endpoint which
> > can stand for any "port" except for physical one, then it should
> > be possible to use term ANY_VPORTS rather than ANY_GUEST_PORTS.
> 
> The opposite of "physical" is "virtual" indeed.
> 
> > But that's tricky, of course. I don't have a way with naming,
> > so more opinions are welcome and very-very desirable here.
> >
> > So:
> >
> > 1) Do you agree that, in your proposal, the new "wire_orig" / "vf_orig"
> >     primitives are in fact yet another match criteria?
> >
> >     ..
> >
> >     To me, it looks so. If they are match criteria, then they belong
> >     in match pattern, that is, they should be expressed as new items.
> >
> >     For "transfer" rules, the *existing* attributes are: "group"
> >     and "priority". As you may note, these are clearly not match
> >     criteria. They control the look-up order. So, to this day,
> >     there're no match criteria in DPDK expressed as attributes.
> >
> >     If these "wire_orig" / "vf_orig" are going to be introduced
> >     as attributes, that should be backed with strong motivation.
> 
> I prefer we keep matching in a single place, not in attributes.
> 

I think we are talking about two different features.
Feature 1:
Allow matching on all vports that are not wire
Feature 2:
Save allocation space and allow fast insertion.
In this case, the matching is not on all vports; it can be on just a subset of the vports,
but it will never be on the wire port.
For example:
port 0 - wire
ports 1,2,3,4,5  - vports
the application wants to insert only these rules:
represented_port(port_id=2) / eth / ipv4 (src==xx)
represented_port(port_id=4) / eth / ipv4 (src==xx)
represented_port(port_id=4) / eth / ipv4 (src==yy)

For feature 1 I fully agree with you Ivan, this should be added as an item.
For feature 2 I think Rongwei's suggestion is the better option.
If I understand correctly, the idea is to give a hint to the PMD on where to allocate memory
and how to insert the rules most optimally. Since this is shared for all rules it makes more sense
to add it as an attribute, just like we don’t have an ingress item (maybe we should?)

Ivan, we have the items RTE_FLOW_ITEM_TYPE_PF and RTE_FLOW_ITEM_TYPE_VF, which are deprecated.
Do you want to un-deprecate them?

To summarize, if the PMD can use such a hint during rule creation and save memory, I vote
to allow it.
If the idea is to match on all vports, then it should be an item.
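
To make feature 2 concrete, here is a minimal sketch, assuming the hint
is stored once per table and that port 0 is the wire port as in the
example above. The enum, struct table and table_insert() are
hypothetical names for illustration, not an existing API. Every rule
still names its own represented port; the table only records which
side the rules may come from:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WIRE_PORT 0 /* port 0 is the wire port in this example */

/* Hypothetical table-level hint, set once at table creation. */
enum origin_hint { ORIGIN_ANY, ORIGIN_WIRE_ONLY, ORIGIN_VPORT_ONLY };

struct table {
	enum origin_hint hint;
	unsigned int nb_rules;
};

/* Accept a rule only if its represented port agrees with the hint. */
static int
table_insert(struct table *t, uint16_t represented_port)
{
	bool from_wire = (represented_port == WIRE_PORT);

	assert(t != NULL);
	if (t->hint == ORIGIN_VPORT_ONLY && from_wire)
		return -EINVAL;
	if (t->hint == ORIGIN_WIRE_ONLY && !from_wire)
		return -EINVAL;
	t->nb_rules++;
	return 0;
}
```

With a table created as ORIGIN_VPORT_ONLY, the three rules above
(ports 2, 4 and 4) are accepted and a rule for port 0 is refused, so
the resources for the wire direction never need to be allocated.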

> 
> > 2) From your viewpoint, why items "ANY_PHYS_PORTS" and "ANY_VPORTS"
> >     won't do? Or, which problems do you think they may inflict?
> >
> >     ..
> >
> >     Previously, you explained why REPRESENTED_PORT would not
> >     fit your needs. And I understand your point: to async API,
> >     two pattern templates which both have item REPRESENTED_PORT
> >     in them cannot be clearly distinguished and are in fact the
> >     same set of criteria (provided that all other items are also
> >     the same and have the same masks). Templates are, well,
> >     templates (or shapes) of the rules to come later and
> >     do not include exact "spec" for the "ethdev_id".
> >     Got it.
> >
> >     But that's not going to be the case with items ANY_PHYS_PORTS and
> >     ANY_VPORTS, is it? In one async table template, the user submits
> >     item ANY_PHYS_PORTS (instead of table attribute "wire_orig").
> >     In another template, the user submits item ANY_VPORTS to
> >     state that they want to match only traffic transmitted by
> >     software endpoints (DPDK ethdevs, guest VFs, etc.)
> >     connected to the switch.
> >
> >     In this example, the PMD will clearly see that the two templates
> >     differ. So it will be able to allocate separate resources, each
> >     one "cutting one half of traffic" (as per your concept).
> >
> > 3) In your most recent response, you suggested that one might have
> >     had the attributes occupied for some other purposes. To me,
> >     they're not. Neither me nor my closest colleagues have
> >     any plans on them. When I advocate using item approach
> >     over the attribute approach, I do this to ensure
> >     a) clarity of the API contract and b) robustness.

If something is shared for all rules in the same table, it should be a table
property.

> >
> > 4) Also, in your response, you suggested that I might have
> >     confused item mask and spec. That is not the case.
> >     If we agree, that switch domain ID is unneeded in
> >     the new items, then these items will have no
> >     fields in them (like item PF had not had any
> >     before it was deprecated).
> >
> >     No fields in new items => no field masks.
> >     So what's the problem then?
> >
> > 5) With regard to our talk about identifying the relationship
> >     between ethdevs and switch domains, you said that the user
> >     could know the difference from the very beginning:
> >     /sysfs/ .... /PF_BDF/sriov_num
> >
> >     That is true for the user who starts the application, but
> >     this knowledge is hard to obtain from the application
> >     perspective = it's hard to automate.
> >
> >     This is why ethdevs are able to advertise their domain IDs.
> >     And, as I explained, looking at domain ID to understand
> 
> namely rte_eth_dev_info.switch_info.domain_id
> 
> >     port relationship is valid, whilst looking at proxy IDs
> >     to achieve the same goal is not. Proxy port IDs only
> >     serve the purpose of finding an entry point for
> >     managing flows. That has slightly different
> >     meaning, but this subtle difference is important.
> 
> There is also a concept of sibling ports
> to get all ports belonging to the same hardware.
> 
> 
> > 6) As for the confusion over the difference between fixing
> >     bugs and making the code robust by extra checks:
> >
> >     Yes, I agree that the programmer who writes the
> >     application must be intelligent enough to use
> >     flow primitives the proper way. Yes, the user
> >     who starts the application also should tread
> >     carefully. But that does not prevent some
> >     mistakes in other parts of code from
> >     corrupting various chunks of memory,
> >     including, for example, flow attrs.
> >
> >     You say that such mistakes have to be "just fixed"
> >     as any other bugs. Right. But how much time will
> >     the programmer spend to identify the bugs?
> >
> >     If the PMDs do all the checks (as with attributes),
> >     the hypothetical bug will manifest itself much
> >     earlier. That will simplify debugging by a lot...
> >
> >     So, my point is that it's still better to ensure
> >     that new flow primitives have all necessary
> >     checks in place. For attributes, it is
> >     required to add them separately.
> 
> If flow insertion is done in a fast path,
> such checks may be skipped.

The idea is that all rules in this table will share the same configuration,
there is no reason to say everything again for each rule. This is why
the rule attributes were moved to the table struct and not per rule.

> 
> >     For items, as I explained, it might not be necessary
> >     in the majority of cases simply because of the
> >     switch (item->type) { case } structure.
> >
> > So, these are some of my points to explain why the
> > attribute approach is untenable. To me, attributes
> > are something global, which demands checks in all
> > flow-capable PMDs. Items seem better because they
> > are don't cares to all PMDs which are unaware of
> > the async concept. So, even if someone does not
> > implement the async concept or does not like
> > the new item names, they can turn a blind
> > eye to this - with attributes, they can't.
> >

Good point,
Maybe we should add hints in the attribute,
for example, hint_only_wire. In this case it will be clear that
the PMD may ignore this, and it should be fully documented that this is not a mandatory field.
What do you think?

> > Thank you.
> 
>
  
Ivan Malov Sept. 20, 2022, 12:45 p.m. UTC | #18
Hi Ori,

On Tue, 20 Sep 2022, Ori Kam wrote:

> Hi Ivan, Thomas and Rongwei
>
>> -----Original Message-----
>> From: Thomas Monjalon <thomas@monjalon.net>
>> Sent: Thursday, 15 September 2022 14:16
>>
>> 15/09/2022 12:59, Ivan Malov:
>>> Hi Rongwei,
>>>
>>> In this reply, I do not include the previous mail because the amount
>>> of inline commentary has gone haywire over the past couple of days.
>>> Let's re-iterate.
>>>
>>> But before I get to that, I'd like to offer a fresh perspective:
>>>
>>> Perhaps, if we all agree that term "vport" means an endpoint which
>>> can stand for any "port" except for physical one, then it should
>>> be possible to use term ANY_VPORTS rather than ANY_GUEST_PORTS.
>>
>> The opposite of "physical" is "virtual" indeed.
>>
>>> But that's tricky, of course. I don't have a way with naming,
>>> so more opinions are welcome and very-very desirable here.
>>>
>>> So:
>>>
>>> 1) Do you agree that, in your proposal, the new "wire_orig" / "vf_orig"
>>>     primitives are in fact yet another match criteria?
>>>
>>>     ..
>>>
>>>     To me, it looks so. If they are match criteria, then they belong
>>>     in match pattern, that is, they should be expressed as new items.
>>>
>>>     For "transfer" rules, the *existing* attributes are: "group"
>>>     and "priority". As you may note, these are clearly not match
>>>     criteria. They control the look-up order. So, to this day,
>>>     there're no match criteria in DPDK expressed as attributes.
>>>
>>>     If these "wire_orig" / "vf_orig" are going to be introduced
>>>     as attributes, that should be backed with strong motivation.
>>
>> I prefer we keep matching in a single place, not in attributes.
>>
>
> I think we are talking about two different features.
> Feature 1:
> Allow matching on all vports that are not wire
> Feature 2:
> Save allocation space and allow fast insertion.
> In this case, the matching is not on all vports it can be just part of the vports
> but it will never be the wire port.
> For example:
> port 0 - wire
> ports 1,2,3,4,5  - vports
> the application wants to insert only these rules:
> represented_port(port_id=2) / eth / ipv4 (src==xx)
> represented_port(port_id=4) / eth / ipv4 (src==xx)
> represented_port(port_id=4) / eth / ipv4 (src==yy)
>
> For feature 1 I fully agree with you Ivan, this should be added as an item.

Thank you.

> For feature 2 I think Rongwei's suggestion is the better option.
> If I understand correctly the idea is to give hint to the PMD on where to allocate memory
> and how to insert the rules most optimally. Since this is shared for all rules it makes more sense
> to add it as an attribute, just like we don’t have an ingress item (maybe we should?)

But isn't pattern template also supposed to be shared for all rules
in the table? I.e., the user creates an async flow table and submits
a flow "shape" (which consists of attrs, pattern template and action
template). So why should "giving a hint" via an item template be
considered worse than doing so via an attribute?

As for an "ingress" item: no, one should not add one. We have had
many discussions concerning this bit in the past. Ingress/egress
are non-transfer terms. They belong in the scope of vNIC / ethdev
filtering, not to embedded switch rules.

In my opinion, in the embedded switch, one should either point to
some precise switch ports (using REPRESENTOR / REPRESENTED items)
or use another kind of item to refer to a "super set" of ports
which have something in common ("all wire ports", "all NON-wire ports").

>
> Ivan we have the item RTE_FLOW_ITEM_TYPE_PF and RTE_FLOW_ITEM_TYPE_VF which are deprecated,
> So do you want to un-deprecate them?

No. These items are deprecated because:

a) their names suggest that application knows whether an ethdev
    sits on top of a PF or that the application has some
    knowledge of existence of particular VFs, but in
    reality applications should not be worried about
    the underlying function type = to them, all
    ethdevs are just representors of something,
    and if the application needs to refer to
    VFs (or other PFs, - doesn't matter), it
    should do that via REPRESENTOR items;

b) such items would duplicate REPRESENTOR / REPRESENTED.

>
> To summarize, if the PMD can use such a hint during rule creation and save memory, I vote
> to allow it.
> if the idea is to match on all vports then it should be an item.

But such a hint would effectively be a match criterion, too, right?
So, in fact it's a combined use case: a match criterion which is
flexible enough to be a "hint" = i.e. the PMD can see it when
processing the pattern *template* and treat it as a hint.

>
>>
>>> 2) From your viewpoint, why items "ANY_PHYS_PORTS" and "ANY_VPORTS"
>>>     won't do? Or, which problems do you think they may inflict?
>>>
>>>     ..
>>>
>>>     Previously, you explained why REPRESENTED_PORT would not
>>>     fit your needs. And I understand your point: to async API,
>>>     two pattern templates which both have item REPRESENTED_PORT
>>>     in them cannot be clearly distinguished and are in fact the
>>>     same set of criteria (provided that all other items are also
>>>     the same and have the same masks). Templates are, well,
>>>     templates (or shapes) of the rules to come later and
>>>     do not include exact "spec" for the "ethdev_id".
>>>     Got it.
>>>
>>>     But that's not going to be the case with items ANY_PHYS_PORTS and
>>>     ANY_VPORTS, is it? In one async table template, the user submits
>>>     item ANY_PHYS_PORTS (instead of table attribute "wire_orig").
>>>     In another template, the user submits item ANY_VPORTS to
>>>     state that they want to match only traffic transmitted by
>>>     software endpoints (DPDK ethdevs, guest VFs, etc.)
>>>     connected to the switch.
>>>
>>>     In this example, the PMD will clearly see that the two templates
>>>     differ. So it will be able to allocate separate resources, each
>>>     one "cutting one half of traffic" (as per your concept).
>>>
>>> 3) In your most recent response, you suggested that one might have
>>>     had the attributes occupied for some other purposes. To me,
>>>     they're not. Neither me nor my closest colleagues have
>>>     any plans on them. When I advocate using item approach
>>>     over the attribute approach, I do this to ensure
>>>     a) clarity of the API contract and b) robustness.
>
> If something is shared for all rules in the same table, it should be a table
> property.

But the whole pattern *template* is also a table property, isn't it?

>
>>>
>>> 4) Also, in your response, you suggested that I might have
>>>     confused item mask and spec. That is not the case.
>>>     If we agree, that switch domain ID is unneeded in
>>>     the new items, then these items will have no
>>>     fields in them (like item PF had not had any
>>>     before it was deprecated).
>>>
>>>     No fields in new items => no field masks.
>>>     So what's the problem then?
>>>
>>> 5) With regard to our talk about identifying the relationship
>>>     between ethdevs and switch domains, you said that the user
>>>     could know the difference from the very beginning:
>>>     /sysfs/ .... /PF_BDF/sriov_num
>>>
>>>     That is true for the user who starts the application, but
>>>     this knowledge is hard to obtain from the application
>>>     perspective = it's hard to automate.
>>>
>>>     This is why ethdevs are able to advertise their domain IDs.
>>>     And, as I explained, looking at domain ID to understand
>>
>> namely rte_eth_dev_info.switch_info.domain_id
>>
>>>     port relationship is valid, whilst looking at proxy IDs
>>>     to achieve the same goal is not. Proxy port IDs only
>>>     serve the purpose of finding an entry point for
>>>     managing flows. That has slightly different
>>>     meaning, but this subtle difference is important.
>>
>> There is also a concept of sibling ports
>> to get all ports belonging to the same hardware.
>>
>>
>>> 6) As for the confusion over the difference between fixing
>>>     bugs and making the code robust by extra checks:
>>>
>>>     Yes, I agree that the programmer who writes the
>>>     application must be intelligent enough to use
>>>     flow primitives the proper way. Yes, the user
>>>     who starts the application also should tread
>>>     carefully. But that does not prevent some
>>>     mistakes in other parts of code from
>>>     corrupting various chunks of memory,
>>>     including, for example, flow attrs.
>>>
>>>     You say that such mistakes have to be "just fixed"
>>>     as any other bugs. Right. But how much time will
>>>     the programmer spend to identify the bugs?
>>>
>>>     If the PMDs do all the checks (as with attributes),
>>>     the hypothetical bug will manifest itself much
>>>     earlier. That will simplify debugging by a lot...
>>>
>>>     So, my point is that it's still better to ensure
>>>     that new flow primitives have all necessary
>>>     checks in place. For attributes, it is
>>>     required to add them separately.
>>
>> If flow insertion is done in a fast path,
>> such checks may be skipped.
>
> The idea is that all rules in this table will share the same configuration,
> there is no reason to say everything again for each rule. This is why
> the rule attributes were moved to the table struct and not per rule.
>
>>
>>>     For items, as I explained, it might not be necessary
>>>     in the majority of cases simply because of the
>>>     switch (item->type) { case } structure.
>>>
>>> So, these are some of my points to explain why the
>>> attribute approach is untenable. To me, attributes
>>> are something global, which demands checks in all
>>> flow-capable PMDs. Items seem better because they
>>> are don't cares to all PMDs which are unaware of
>>> the async concept. So, even if someone does not
>>> implement the async concept or does not like
>>> the new item names, they can turn a blind
>>> eye to this - with attributes, they can't.
>>>
>
> Good point,
> Maybe we should add hints in the attribute,
> for example, hint_only_wire in this case it will be clear that
> PMD may ignore this, and it should be fully documented that this is not a mandatory field.
> What do you think?

Theoretically, making terminology softer (like with the word "hint")
could make things easier for vendors who may find the new feature
confusing or something like that. But if, in reality, this hint
is indeed another match criterion (see my comments above), then
in no event shall the prefix "hint" be an excuse for this
criterion not being expressed as a pattern item.

Please hear me out: I don't mean to sound arrogant, - just trying
to understand why expressing the new bit as an item can't be
efficient enough for the async flow approach.

>
>>> Thank you.
>>
>>
>
>

Ivan
  
Ori Kam Sept. 20, 2022, 1:59 p.m. UTC | #19
Hi Ivan,

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Sent: Tuesday, 20 September 2022 15:46
> 
> Hi Ori,
> 
> On Tue, 20 Sep 2022, Ori Kam wrote:
> 
> > Hi Ivan, Thomas and Rongwei
> >
> >> -----Original Message-----
> >> From: Thomas Monjalon <thomas@monjalon.net>
> >> Sent: Thursday, 15 September 2022 14:16
> >>
> >> 15/09/2022 12:59, Ivan Malov:
> >>> Hi Rongwei,
> >>>
> >>> In this reply, I do not include the previous mail because the amount
> >>> of inline commentary has gone haywire over the past couple of days.
> >>> Let's re-iterate.
> >>>
> >>> But before I get to that, I'd like to offer a fresh perspective:
> >>>
> >>> Perhaps, if we all agree that term "vport" means an endpoint which
> >>> can stand for any "port" except for physical one, then it should
> >>> be possible to use term ANY_VPORTS rather than ANY_GUEST_PORTS.
> >>
> >> The opposite of "physical" is "virtual" indeed.
> >>
> >>> But that's tricky, of course. I don't have a way with naming,
> >>> so more opinions are welcome and very-very desirable here.
> >>>
> >>> So:
> >>>
> >>> 1) Do you agree that, in your proposal, the new "wire_orig" / "vf_orig"
> >>>     primitives are in fact yet another match criteria?
> >>>
> >>>     ..
> >>>
> >>>     To me, it looks so. If they are match criteria, then they belong
> >>>     in match pattern, that is, they should be expressed as new items.
> >>>
> >>>     For "transfer" rules, the *existing* attributes are: "group"
> >>>     and "priority". As you may note, these are clearly not match
> >>>     criteria. They control the look-up order. So, to this day,
> >>>     there're no match criteria in DPDK expressed as attributes.
> >>>
> >>>     If these "wire_orig" / "vf_orig" are going to be introduced
> >>>     as attributes, that should be backed with strong motivation.
> >>
> >> I prefer we keep matching in a single place, not in attributes.
> >>
> >
> > I think we are talking about two different features.
> > Feature 1:
> > Allow matching on all vports that are not wire
> > Feature 2:
> > Save allocation space and allow fast insertion.
> > In this case, the matching is not on all vports it can be just part of the vports
> > but it will never be the wire port.
> > For example:
> > port 0 - wire
> > ports 1,2,3,4,5  - vports
> > the application want to inset only those rules:
> > represented_port(port_id=2) / eth / ipv4 (src==xx)
> > represented_port(port_id=4) / eth / ipv4 (src==xx)
> > represented_port(port_id=4) / eth / ipv4 (src==yy)
> >
> > For feature 1 I fully agree with you Ivan, this should be added as an item.
> 
> Thank you.
> 
> > For feature 2 I think Rongwei's suggestion is the better option.
> > If I understand correctly the idea is to give hint to the PMD on where to allocate memory
> > and how to insert the rules most optimally. Since this is shared for all rules it makes more sense
> > to add it as an attribute, just like we don’t have an ingress item (maybe we should?)
> 
> But isn't pattern template also supposed to be shared for all rules
> in the table? I.e., the user creates an async flow table and submits
> a flow "shape" (which consists of attrs, pattern template and action
> template). So why should "giving a hint" via an item template be
> considered worse than doig so via an attribute?
> 

The same item template may be used elsewhere. For example, the pattern
eth / ipv4(src, dst) / udp(sport, dport) can be used in a number of different
tables.
I think the main difference between us is that, from my point of view, this value only says
where to allocate resources and how to better insert the rule. It is not related to matching.
From the Nvidia viewpoint, we need this information so we can allocate the resource in the correct
place and avoid inserting duplicate rules.
I agree that by using the item we can get the same results, but it is incorrect since we are not matching on it.
Part of the idea of the template API is to give as many hints as possible to the PMD so that insertion can be optimized.
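
To illustrate the distinction, here is a minimal, self-contained C sketch (it is not DPDK code). All names in it (`xfer_hint`, `pattern_tmpl`, `flow_table`, `table_hw_entries`) are hypothetical, and the cost model of "two hardware entries per rule for a bi-directional table" is only an assumption used to show why a table-level hint could save memory while the pattern template stays shareable:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical direction hint, fixed at table creation (not a DPDK type). */
enum xfer_hint {
	XFER_HINT_BIDIR = 0,   /* default: traffic from wire and from vports */
	XFER_HINT_WIRE_ORIG,   /* table only ever holds wire-origin rules */
	XFER_HINT_VF_ORIG      /* table only ever holds vport-origin rules */
};

/* Hypothetical pattern template: only the "shape", no direction information. */
struct pattern_tmpl {
	const char *shape;
};

/* Hypothetical table: references a shared template and carries its own hint. */
struct flow_table {
	const struct pattern_tmpl *tmpl;
	enum xfer_hint hint;
	uint32_t nb_flows;
};

/*
 * Assumed cost model: a bi-directional table needs one hardware entry per
 * rule per direction, a single-direction table needs one entry per rule.
 */
static uint32_t table_hw_entries(const struct flow_table *t)
{
	assert(t != NULL);
	return t->hint == XFER_HINT_BIDIR ? 2 * t->nb_flows : t->nb_flows;
}
```

Two tables can point at the same `pattern_tmpl` and still differ in `hint`; under the assumed model, a wire-only table sized for 1024 rules needs 1024 entries, while a bi-directional table of the same size needs 2048.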


> As for "ingress" item, - no, one should not add such. We have had
> many discussions concerning this bit in the past. Ingress/egress
> are non-transfer terms. They belong in the scope of vNIC / ethdev
> filtering, not to embedded switch rules.
> 
> In my opinion, in the embedded switch, one should either point to
> some precise switch ports (using REPRESENTOR / REPRESENTED items)
> or use another kind of item to refer to a "super set" of ports
> which have something in common ("all wire ports", "all NON-wire ports").
> 

But this is my point: we don't want all wire ports or all NON-wire ports; we just know that this table
will have only non-wire (or only wire) ports.

> >
> > Ivan we have the item RTE_FLOW_ITEM_TYPE_PF and RTE_FLOW_ITEM_TYPE_VF which are deprecated,
> > So do you want to un-deprecate them?
> 
> No. These items are deprecated because:
> 
> a) their names suggest that application knows whether an ethdev
>     sits on top of a PF or that the application has some
>     knowledge of existence of particular VFs, but in
>     reality applications should not be worried of
>     the underlying function type = to them, all
>     ethdevs are just representors of something,
>     and if the application needs to refer to
>     VFs (or other PFs, - doesn't matter), it
>     should do that via REPRESENTOR items;
> 
> b) such items would duplicate REPRESENTOR / REPRESENTED.
> 
Agree with everything you say. 

> >
> > To summarize, if PMD can use such an hint during rule creation and save memory, I vote
> > to allow it.
> > if the idea is to match on all vports then it should be an item.
> 
> But such a hint would effectively be a match criterion, too, right?
> So, in fact it's a combined use case: a match criterion which is
> flexible enough to be a "hint" = i.e. the PMD can see it when
> processing the pattern *template* and treat it as a hint.
>

Yes, but it is an implicit match, just like saying ingress or egress: it has a meaning beyond the
matching. In addition, there is no reason to add an extra item to each rule we create just
to enable something that is fixed at table creation.
An extra item in the pattern template means an extra item for each rule.
I know we can avoid this and optimize the code, but why add something that no one needs
after table creation?

 
> >
> >>
> >>> 2) From your viewpoint, why items "ANY_PHYS_PORTS" and "ANY_VPORTS"
> >>>     won't do? Or, which problems do you think they may inflict?
> >>>
> >>>     ..
> >>>
> >>>     Previously, you explained why REPRESENTED_PORT would not
> >>>     fit your needs. And I understand your point: to async API,
> >>>     two pattern templates which both have item REPRESENTED_PORT
> >>>     in them cannot be clearly distinguished and are in fact the
> >>>     same set of criteria (provided that all other items are also
> >>>     the same and have the same masks). Templates are, well,
> >>>     templates (or shapes) of the rules to come later and
> >>>     do not include exact "spec" for the "ethdev_id".
> >>>     Got it.
> >>>
> >>>     But that's not going to be the case with items ANY_PHYS_PORTS and
> >>>     ANY_VPORTS, is it? In one async table template, the user submits
> >>>     item ANY_PHYS_PORTS (instead of table attribute "wire_orig").
> >>>     In another template, the user submits item ANY_VPORTS to
> >>>     state that they want to match only traffic transmitted
> >>>     software endpoints (DPDK ethdevs, guest VFs, etc.)
> >>>     connected to the switch.
> >>>
> >>>     In this example, the PMD will clearly see that the two templates
> >>>     differ. So it will be able to allocate separate resources, each
> >>>     one "cutting one half of traffic" (as per your concept).
> >>>
> >>> 3) In your most recent response, you suggested that one might have
> >>>     had the attributes occupied for some other purposes. To me,
> >>>     they're not. Neither me nor my closest colleagues have
> >>>     any plans on them. When I advocate using item approach
> >>>     over the attribute approach, I do this to ensure
> >>>     a) clarity of the API contract and b) robustness.
> >
> > If something is shared for all rules in the same table, it should be a table
> > property.
> 
> But the whole pattern *template* is also a table property, isn't it?
> 

As I said above, the pattern template can be used in all domains; that is why
there is a split between table and pattern. In addition, each table may have
a number of pattern templates.

> >
> >>>
> >>> 4) Also, in your response, you suggested that I might have
> >>>     confused item mask and spec. That is not the case.
> >>>     If we agree, that switch domain ID is unneeded in
> >>>     the new items, then these items will have no
> >>>     fields in them (like item PF had not had any
> >>>     before it was deprecated).
> >>>
> >>>     No fields in new items => no field masks.
> >>>     So what's the problem then?
> >>>
> >>> 5) With regard to our talk about identifying the relationship
> >>>     between ethdevs and switch domains, you said that the user
> >>>     could know the difference from the very beginning:
> >>>     /sysfs/ .... /PF_BDF/sriov_num
> >>>
> >>>     That is true for the user who starts the application, but
> >>>     this knowledge is hard to obtain from the application
> >>>     perspective = it's hard to automate.
> >>>
> >>>     This is why ethdevs are able to advertise their domain IDs.
> >>>     And, as I explained, looking at domain ID to understand
> >>
> >> namely rte_eth_dev_info.switch_info.domain_id
> >>
> >>>     port relationship is valid, whilst looking at proxy IDs
> >>>     to achieve the same goal is not. Proxy port IDs only
> >>>     serve the purpose of finding an entry point for
> >>>     managing flows. That has slightly different
> >>>     meaning, but this subtle difference is important.
> >>
> >> There is also a concept of sibling ports
> >> to get all ports belonging to the same hardware.
> >>
> >>
> >>> 6) As for the confusion over the difference between fixing
> >>>     bugs and making the code robust by extra checks:
> >>>
> >>>     Yes, I agree that the programmer who writes the
> >>>     application must be intelligent enough to use
> >>>     flow primitives the proper way. Yes, the user
> >>>     who starts the application also should thread
> >>>     carefully. But that does not prevent some
> >>>     mistakes in other parts of code from
> >>>     corrupting various chunks of memory,
> >>>     including, for example, flow attrs.
> >>>
> >>>     You say that such mistakes have to be "just fixed"
> >>>     as any other bugs. Right. But how much time will
> >>>     the programmer spend to identify the bugs?
> >>>
> >>>     If the PMDs do all the checks (as with attributes),
> >>>     the hypothetical bug will manifest itself much
> >>>     earlier. That will simplify debugging by a lot...
> >>>
> >>>     So, my point is that it's still better to ensure
> >>>     that new flow primitives have all necessary
> >>>     checks in place. For attributes, it is
> >>>     required to add them separately.
> >>
> >> If flow insertion is done in a fast path,
> >> such checks may be skipped.
> >
> > The idea is that all rules in this table will share the same configuration,
> > there is no reason to say everything again for each rule. This is why
> > the rule attributes were moved to the table struct and not per rule.
> >
> >>
> >>>     For items, as I explained, it might not be necessary
> >>>     in the majority of cases simply because of the
> >>>     switch (item->type) { case } structure.
> >>>
> >>> So, these are some of my points to explain why the
> >>> attribute approach is untenable. To me, attributes
> >>> are something global, which demands checks in all
> >>> flow-capable PMDs. Items seem better because they
> >>> are don't cares to all PMDs which are unaware of
> >>> the async concept. So, even if someone does not
> >>> implement the async concept or does not like
> >>> the new item names, they can turn a blind
> >>> eye to this - with attributes, thay can't.
> >>>
> >
> > Good point,
> > Maybe we should add hints in the attribute,
> > for example, hint_only_wire in this case it will be clear that
> > PMD may ignore this, and it should be fully documented that this is not a mandatory field.
> > What do you think?
> 
> Theoretically, making terminology softer (like with the word "hint")
> could make things easier for vendors who may find the new feature
> confusing or something like that. But if, in reality, this hint
> is indeed another match criterion (see my comments above), then
> in no event shall the prefix "hint" be an excuse for this
> criterion not being expressed as a pattern item.
> 

Please see my response above. This is the point: it is much more than matching.

> Please hear me out: I don't mean to sound arrogant, - just trying
> to understand why expressing the new bit as an item can't be
> efficient enough for the async flow approach.
>

I don't think you are arrogant, and I hope that you see that I do understand your comments.
That said, I hope I have explained why I think it is better to have it as a table attribute and not as an
item. (We are not matching on it; it helps the PMD allocate the table at the best location and avoid
duplication of rules.)

If you wish, we can have a short phone call and discuss this.

Best,
Ori

 
> >
> >>> Thank you.
> >>
> >>
> >
> >
> 
> Ivan
  
Ivan Malov Sept. 20, 2022, 3:28 p.m. UTC | #20
Hi Ori,

On Tue, 20 Sep 2022, Ori Kam wrote:

> Hi Ivan,
>
>> -----Original Message-----
>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Sent: Tuesday, 20 September 2022 15:46
>>
>> Hi Ori,
>>
>> On Tue, 20 Sep 2022, Ori Kam wrote:
>>
>>> Hi Ivan, Thomas and Rongwei
>>>
>>>> -----Original Message-----
>>>> From: Thomas Monjalon <thomas@monjalon.net>
>>>> Sent: Thursday, 15 September 2022 14:16
>>>>
>>>> 15/09/2022 12:59, Ivan Malov:
>>>>> Hi Rongwei,
>>>>>
>>>>> In this reply, I do not include the previous mail because the amount
>>>>> of inline commentary has gone haywire over the past couple of days.
>>>>> Let's re-iterate.
>>>>>
>>>>> But before I get to that, I'd like to offer a fresh perspective:
>>>>>
>>>>> Perhaps, if we all agree that term "vport" means an endpoint which
>>>>> can stand for any "port" except for physical one, then it should
>>>>> be possible to use term ANY_VPORTS rather than ANY_GUEST_PORTS.
>>>>
>>>> The opposite of "physical" is "virtual" indeed.
>>>>
>>>>> But that's tricky, of course. I don't have a way with naming,
>>>>> so more opinions are welcome and very-very desirable here.
>>>>>
>>>>> So:
>>>>>
>>>>> 1) Do you agree that, in your proposal, the new "wire_orig" / "vf_orig"
>>>>>     primitives are in fact yet another match criteria?
>>>>>
>>>>>     ..
>>>>>
>>>>>     To me, it looks so. If they are match criteria, then they belong
>>>>>     in match pattern, that is, they should be expressed as new items.
>>>>>
>>>>>     For "transfer" rules, the *existing* attributes are: "group"
>>>>>     and "priority". As you may note, these are clearly not match
>>>>>     criteria. They control the look-up order. So, to this day,
>>>>>     there're no match criteria in DPDK expressed as attributes.
>>>>>
>>>>>     If these "wire_orig" / "vf_orig" are going to be introduced
>>>>>     as attributes, that should be backed with strong motivation.
>>>>
>>>> I prefer we keep matching in a single place, not in attributes.
>>>>
>>>
>>> I think we are talking about two different features.
>>> Feature 1:
>>> Allow matching on all vports that are not wire
>>> Feature 2:
>>> Save allocation space and allow fast insertion.
>>> In this case, the matching is not on all vports it can be just part of the vports
>>> but it will never be the wire port.
>>> For example:
>>> port 0 - wire
>>> ports 1,2,3,4,5  - vports
>>> the application want to inset only those rules:
>>> represented_port(port_id=2) / eth / ipv4 (src==xx)
>>> represented_port(port_id=4) / eth / ipv4 (src==xx)
>>> represented_port(port_id=4) / eth / ipv4 (src==yy)
>>>
>>> For feature 1 I fully agree with you Ivan, this should be added as an item.
>>
>> Thank you.
>>
>>> For feature 2 I think Rongwei's suggestion is the better option.
>>> If I understand correctly the idea is to give hint to the PMD on where to allocate memory
>>> and how to insert the rules most optimally. Since this is shared for all rules it makes more sense
>>> to add it as an attribute, just like we don’t have an ingress item (maybe we should?)
>>
>> But isn't pattern template also supposed to be shared for all rules
>> in the table? I.e., the user creates an async flow table and submits
>> a flow "shape" (which consists of attrs, pattern template and action
>> template). So why should "giving a hint" via an item template be
>> considered worse than doig so via an attribute?
>>
>
> The same item template maybe used elsewhere, for example, the following
> pattern  eth / ipv4(src, dst) / udp(sport, dport), can be used on number of different
> tables.

In my understanding, the user may want to create flow table A
and use pattern template A' for it, which is as follows:

any_vports / eth / ipv4 / udp

The PMD can see this item and treat it exactly the same
way as it could treat such attribute ("where to allocate
resources, etc.").

Then the user may want to create flow table B and
use pattern template B' for it:

any_phy_ports / eth / ipv4 / udp

Once again, the PMD can clearly see the difference between
the A' and B' templates and, this time, allocate resources
the other way (as per efficiency requirements).
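
A minimal sketch of what I mean, in plain C with no DPDK headers. The item types `ITEM_ANY_VPORTS` and `ITEM_ANY_PHY_PORTS`, the `alloc_domain` enum and the helper `tmpl_alloc_domain()` are all hypothetical names made up for this illustration; the point is only that a driver could derive the allocation decision from the pattern template itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item types; the two ANY_* items carry no fields. */
enum item_type {
	ITEM_END = 0,
	ITEM_ETH,
	ITEM_IPV4,
	ITEM_UDP,
	ITEM_ANY_VPORTS,
	ITEM_ANY_PHY_PORTS
};

/* Hypothetical result: where the driver would allocate table resources. */
enum alloc_domain {
	ALLOC_BOTH = 0,     /* no marker item: serve both directions */
	ALLOC_VPORT_SIDE,   /* template A': traffic from vports only */
	ALLOC_WIRE_SIDE     /* template B': traffic from wire only */
};

/* Scan an ITEM_END-terminated pattern template for a marker item. */
static enum alloc_domain tmpl_alloc_domain(const enum item_type *items)
{
	assert(items != NULL);
	for (; *items != ITEM_END; items++) {
		switch (*items) {
		case ITEM_ANY_VPORTS:
			return ALLOC_VPORT_SIDE;
		case ITEM_ANY_PHY_PORTS:
			return ALLOC_WIRE_SIDE;
		default:
			break; /* network items do not affect the decision */
		}
	}
	return ALLOC_BOTH;
}
```

With this sketch, a template of `any_vports / eth / ipv4 / udp` yields `ALLOC_VPORT_SIDE`, `any_phy_ports / eth / ipv4 / udp` yields `ALLOC_WIRE_SIDE`, and a template without either marker falls back to `ALLOC_BOTH`.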

By saying "can be used on number of different tables", do you mean
that it is important to make the *network* part of the pattern
shareable between flow tables? I.e. are you saying that
templates A' and B' cause resource duplication just
because of the same *network* part in your case?

> I think that the main difference between us is that from my point of view this value is just
> where to allocate resources / how to better insert the rule. It is not related to matching.

To me, it *is* the match criterion which, at the same time, serves
as a value indicating how resources should be allocated.
But above all, it is a match criterion.

If it refers to a group of ports in order to ditch "the other half"
of traffic from consideration (as Rongwei explained), then it
looks like a match criterion.

> From Nvidia viewpoint we need this information so we can allocate the resource at the correct
> place and avoid inserting duplication of rules.

I see.

> I agree that by using the item we can get the same results, but it is incorrect since we are not matching on it.

If one provides item UDP in the pattern and does not match on any UDP
fields, doing so nevertheless *is* matching on a particular packet type.

The same seemingly goes for the new attribute / item. If it is
provided, then the user doesn't want the rule to affect
packets coming from certain ports (i.e. from wire).

So it still sounds like matching.

> Part of the idea of template API is to give as many hints as possible to the PMD so the insertion will be optimized.

I see.

>
>
>> As for "ingress" item, - no, one should not add such. We have had
>> many discussions concerning this bit in the past. Ingress/egress
>> are non-transfer terms. They belong in the scope of vNIC / ethdev
>> filtering, not to embedded switch rules.
>>
>> In my opinion, in the embedded switch, one should either point to
>> some precise switch ports (using REPRESENTOR / REPRESENTED items)
>> or use another kind of item to refer to a "super set" of ports
>> which have something in common ("all wire ports", "all NON-wire ports").
>>
>
> But this is my point we don't want all wire ports or all NON-wire ports, we just know that in this table
> we will have only non-wire / wire ports.

But how do these two viewpoints contradict each other?

>
>>>
>>> Ivan we have the item RTE_FLOW_ITEM_TYPE_PF and RTE_FLOW_ITEM_TYPE_VF which are deprecated,
>>> So do you want to un-deprecate them?
>>
>> No. These items are deprecated because:
>>
>> a) their names suggest that application knows whether an ethdev
>>     sits on top of a PF or that the application has some
>>     knowledge of existence of particular VFs, but in
>>     reality applications should not be worried of
>>     the underlying function type = to them, all
>>     ethdevs are just representors of something,
>>     and if the application needs to refer to
>>     VFs (or other PFs, - doesn't matter), it
>>     should do that via REPRESENTOR items;
>>
>> b) such items would duplicate REPRESENTOR / REPRESENTED.
>>
> Agree with everything you say.

Great, we're on the same page regarding this bit.

>
>>>
>>> To summarize, if PMD can use such an hint during rule creation and save memory, I vote
>>> to allow it.
>>> if the idea is to match on all vports then it should be an item.
>>
>> But such a hint would effectively be a match criterion, too, right?
>> So, in fact it's a combined use case: a match criterion which is
>> flexible enough to be a "hint" = i.e. the PMD can see it when
>> processing the pattern *template* and treat it as a hint.
>>
>
> Yes, but it is an implicit match, just like saying ingress. Egress it has meaning above the
> matching. In addition, there is no reason to add extra item for each rule we create, just
> to enable something that is fixed during the table creation.
> Extra item in pattern template means extra item for each rule.
> I know we can avoid this and optimize the code but why add something that no one needs
> after table creation?

Good question. But if there is a way to make such an optimisation
concise enough to avoid confusion, then there should be no problem
in preferring the pattern approach over the attribute approach.

>
>
>>>
>>>>
>>>>> 2) From your viewpoint, why items "ANY_PHYS_PORTS" and "ANY_VPORTS"
>>>>>     won't do? Or, which problems do you think they may inflict?
>>>>>
>>>>>     ..
>>>>>
>>>>>     Previously, you explained why REPRESENTED_PORT would not
>>>>>     fit your needs. And I understand your point: to async API,
>>>>>     two pattern templates which both have item REPRESENTED_PORT
>>>>>     in them cannot be clearly distinguished and are in fact the
>>>>>     same set of criteria (provided that all other items are also
>>>>>     the same and have the same masks). Templates are, well,
>>>>>     templates (or shapes) of the rules to come later and
>>>>>     do not include exact "spec" for the "ethdev_id".
>>>>>     Got it.
>>>>>
>>>>>     But that's not going to be the case with items ANY_PHYS_PORTS and
>>>>>     ANY_VPORTS, is it? In one async table template, the user submits
>>>>>     item ANY_PHYS_PORTS (instead of table attribute "wire_orig").
>>>>>     In another template, the user submits item ANY_VPORTS to
>>>>>     state that they want to match only traffic transmitted
>>>>>     software endpoints (DPDK ethdevs, guest VFs, etc.)
>>>>>     connected to the switch.
>>>>>
>>>>>     In this example, the PMD will clearly see that the two templates
>>>>>     differ. So it will be able to allocate separate resources, each
>>>>>     one "cutting one half of traffic" (as per your concept).
>>>>>
>>>>> 3) In your most recent response, you suggested that one might have
>>>>>     had the attributes occupied for some other purposes. To me,
>>>>>     they're not. Neither me nor my closest colleagues have
>>>>>     any plans on them. When I advocate using item approach
>>>>>     over the attribute approach, I do this to ensure
>>>>>     a) clarity of the API contract and b) robustness.
>>>
>>> If something is shared for all rules in the same table, it should be a table
>>> property.
>>
>> But the whole pattern *template* is also a table property, isn't it?
>>
>
> Like I said above the pattern template can be used in all domains that is why
> there is a split between table and patter, in addition to that each table may have
> number of pattern templates.

This is a valuable clarification. However, even if the attribute way
may seem OK after this explanation, I still don't understand
why it is required to add this attribute to the generic "struct
rte_flow_attr" and not just to the *table* attr.

Generic "struct rte_flow_attr" is used both for the async and the
sync (regular) approach. So why add something to the generic
struct which is never going to make sense for sync flows?
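
Roughly, the alternative I have in mind looks like the sketch below. It is a simplified, self-contained model, not the real DPDK definitions: `generic_flow_attr` is a cut-down stand-in for the generic attributes, and `table_attr`, `table_dir_hint` and `table_attr_hint_ok()` are hypothetical names. The rule that the hint only makes sense for transfer tables is likewise an assumption taken from the scope of this patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the generic attrs shared by sync and async flows. */
struct generic_flow_attr {
	uint32_t group;
	uint32_t priority;
	unsigned int transfer:1;
};

/* Hypothetical direction hint values. */
enum table_dir_hint {
	DIR_HINT_NONE = 0,    /* default: bi-directional, no behaviour change */
	DIR_HINT_WIRE_ORIG,
	DIR_HINT_VF_ORIG
};

/* Hypothetical table-only attrs: the hint lives here, so sync rules never see it. */
struct table_attr {
	struct generic_flow_attr flow_attr;
	uint32_t nb_flows;
	enum table_dir_hint dir_hint;
};

/* Return 1 if the hint is acceptable; assumed to be valid only for transfer tables. */
static int table_attr_hint_ok(const struct table_attr *attr)
{
	assert(attr != NULL);
	if (attr->dir_hint == DIR_HINT_NONE)
		return 1;
	return attr->flow_attr.transfer ? 1 : 0;
}
```

With the hint kept in the table attributes, code paths that only handle the generic attributes (for example, sync flow validation) need no new checks at all.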

>
>>>
>>>>>
>>>>> 4) Also, in your response, you suggested that I might have
>>>>>     confused item mask and spec. That is not the case.
>>>>>     If we agree, that switch domain ID is unneeded in
>>>>>     the new items, then these items will have no
>>>>>     fields in them (like item PF had not had any
>>>>>     before it was deprecated).
>>>>>
>>>>>     No fields in new items => no field masks.
>>>>>     So what's the problem then?
>>>>>
>>>>> 5) With regard to our talk about identifying the relationship
>>>>>     between ethdevs and switch domains, you said that the user
>>>>>     could know the difference from the very beginning:
>>>>>     /sysfs/ .... /PF_BDF/sriov_num
>>>>>
>>>>>     That is true for the user who starts the application, but
>>>>>     this knowledge is hard to obtain from the application
>>>>>     perspective = it's hard to automate.
>>>>>
>>>>>     This is why ethdevs are able to advertise their domain IDs.
>>>>>     And, as I explained, looking at domain ID to understand
>>>>
>>>> namely rte_eth_dev_info.switch_info.domain_id
>>>>
>>>>>     port relationship is valid, whilst looking at proxy IDs
>>>>>     to achieve the same goal is not. Proxy port IDs only
>>>>>     serve the purpose of finding an entry point for
>>>>>     managing flows. That has slightly different
>>>>>     meaning, but this subtle difference is important.
>>>>
>>>> There is also a concept of sibling ports
>>>> to get all ports belonging to the same hardware.
>>>>
>>>>
>>>>> 6) As for the confusion over the difference between fixing
>>>>>     bugs and making the code robust by extra checks:
>>>>>
>>>>>     Yes, I agree that the programmer who writes the
>>>>>     application must be intelligent enough to use
>>>>>     flow primitives the proper way. Yes, the user
>>>>>     who starts the application also should thread
>>>>>     carefully. But that does not prevent some
>>>>>     mistakes in other parts of code from
>>>>>     corrupting various chunks of memory,
>>>>>     including, for example, flow attrs.
>>>>>
>>>>>     You say that such mistakes have to be "just fixed"
>>>>>     as any other bugs. Right. But how much time will
>>>>>     the programmer spend to identify the bugs?
>>>>>
>>>>>     If the PMDs do all the checks (as with attributes),
>>>>>     the hypothetical bug will manifest itself much
>>>>>     earlier. That will simplify debugging by a lot...
>>>>>
>>>>>     So, my point is that it's still better to ensure
>>>>>     that new flow primitives have all necessary
>>>>>     checks in place. For attributes, it is
>>>>>     required to add them separately.
>>>>
>>>> If flow insertion is done in a fast path,
>>>> such checks may be skipped.
>>>
>>> The idea is that all rules in this table will share the same configuration,
>>> there is no reason to say everything again for each rule. This is why
>>> the rule attributes were moved to the table struct and not per rule.
>>>
>>>>
>>>>>     For items, as I explained, it might not be necessary
>>>>>     in the majority of cases simply because of the
>>>>>     switch (item->type) { case } structure.
>>>>>
>>>>> So, these are some of my points to explain why the
>>>>> attribute approach is untenable. To me, attributes
>>>>> are something global, which demands checks in all
>>>>> flow-capable PMDs. Items seem better because they
>>>>> are don't cares to all PMDs which are unaware of
>>>>> the async concept. So, even if someone does not
>>>>> implement the async concept or does not like
>>>>> the new item names, they can turn a blind
>>>>> eye to this - with attributes, thay can't.
>>>>>
>>>
>>> Good point,
>>> Maybe we should add hints in the attribute,
>>> for example, hint_only_wire in this case it will be clear that
>>> PMD may ignore this, and it should be fully documented that this is not a mandatory field.
>>> What do you think?
>>
>> Theoretically, making terminology softer (like with the word "hint")
>> could make things easier for vendors who may find the new feature
>> confusing or something like that. But if, in reality, this hint
>> is indeed another match criterion (see my comments above), then
>> in no event shall the prefix "hint" be an excuse for this
>> criterion not being expressed as a pattern item.
>>
>
> Please see my response above. This is the point it is much more than matching.
>
>> Please hear me out: I don't mean to sound arrogant, - just trying
>> to understand why expressing the new bit as an item can't be
>> efficient enough for the async flow approach.
>>
>
> I don't think you are arrogant, and I hope that you see that I do understand your comments.
> saying that I hope I explained why I think it is better to have it as a table attribute and not as an
> item. (We are not matching on it, this helps the PMD allocate the table at the best location and avoid
> duplication of rules)
>
> If you wish, we can have a short phone call and discuss this.
>
> Best,
> Ori
>
>
>>>
>>>>> Thank you.
>>>>
>>>>
>>>
>>>
>>
>> Ivan
>

Thanks,
Ivan
  
Ori Kam Sept. 21, 2022, 7:34 a.m. UTC | #21
Hi Ivan,

Please see my comments below.

In any case, I'm afraid we are in a deadlock.
I understand your viewpoint, but I don't think it is the correct
one for the feature suggested here, for all the reasons I listed.

So from my viewpoint, the patch is Acked.
If you wish, as I suggested before, we can have a meeting with
Rongwei and anyone else who is interested and close this subject.

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Sent: Tuesday, 20 September 2022 18:28
> RE: [PATCH v1] ethdev: add direction info when creating the transfer
> table
> 
> Hi Ori,
> 
> On Tue, 20 Sep 2022, Ori Kam wrote:
> 
> > Hi Ivan,
> >
> >> -----Original Message-----
> >> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> >> Sent: Tuesday, 20 September 2022 15:46
> >>
> >> Hi Ori,
> >>
> >> On Tue, 20 Sep 2022, Ori Kam wrote:
> >>
> >>> Hi Ivan, Thomas and Rongwei
> >>>
> >>>> -----Original Message-----
> >>>> From: Thomas Monjalon <thomas@monjalon.net>
> >>>> Sent: Thursday, 15 September 2022 14:16
> >>>>
> >>>> 15/09/2022 12:59, Ivan Malov:
> >>>>> Hi Rongwei,
> >>>>>
> >>>>> In this reply, I do not include the previous mail because the amount
> >>>>> of inline commentary has gone haywire over the past couple of days.
> >>>>> Let's re-iterate.
> >>>>>
> >>>>> But before I get to that, I'd like to offer a fresh perspective:
> >>>>>
> >>>>> Perhaps, if we all agree that term "vport" means an endpoint which
> >>>>> can stand for any "port" except for physical one, then it should
> >>>>> be possible to use term ANY_VPORTS rather than ANY_GUEST_PORTS.
> >>>>
> >>>> The opposite of "physical" is "virtual" indeed.
> >>>>
> >>>>> But that's tricky, of course. I don't have a way with naming,
> >>>>> so more opinions are welcome and very-very desirable here.
> >>>>>
> >>>>> So:
> >>>>>
> >>>>> 1) Do you agree that, in your proposal, the new "wire_orig" / "vf_orig"
> >>>>>     primitives are in fact yet another match criteria?
> >>>>>
> >>>>>     ..
> >>>>>
> >>>>>     To me, it looks so. If they are match criteria, then they belong
> >>>>>     in match pattern, that is, they should be expressed as new items.
> >>>>>
> >>>>>     For "transfer" rules, the *existing* attributes are: "group"
> >>>>>     and "priority". As you may note, these are clearly not match
> >>>>>     criteria. They control the look-up order. So, to this day,
> >>>>>     there're no match criteria in DPDK expressed as attributes.
> >>>>>
> >>>>>     If these "wire_orig" / "vf_orig" are going to be introduced
> >>>>>     as attributes, that should be backed with strong motivation.
> >>>>
> >>>> I prefer we keep matching in a single place, not in attributes.
> >>>>
> >>>
> >>> I think we are talking about two different features.
> >>> Feature 1:
> >>> Allow matching on all vports that are not wire
> >>> Feature 2:
> >>> Save allocation space and allow fast insertion.
> >>> In this case, the matching is not on all vports it can be just part of the
> vports
> >>> but it will never be the wire port.
> >>> For example:
> >>> port 0 - wire
> >>> ports 1,2,3,4,5  - vports
> >>> the application wants to insert only those rules:
> >>> represented_port(port_id=2) / eth / ipv4 (src==xx)
> >>> represented_port(port_id=4) / eth / ipv4 (src==xx)
> >>> represented_port(port_id=4) / eth / ipv4 (src==yy)
> >>>
> >>> For feature 1 I fully agree with you Ivan, this should be added as an item.
> >>
> >> Thank you.
> >>
> >>> For feature 2 I think Rongwei's suggestion is the better option.
> >>> If I understand correctly the idea is to give hint to the PMD on where to
> >> allocate memory
> >>> and how to insert the rules most optimally. Since this is shared for all
> rules it
> >> makes more sense
> >>> to add it as an attribute, just like we don’t have an ingress item (maybe
> we
> >> should?)
> >>
> >> But isn't pattern template also supposed to be shared for all rules
> >> in the table? I.e., the user creates an async flow table and submits
> >> a flow "shape" (which consists of attrs, pattern template and action
> >> template). So why should "giving a hint" via an item template be
> >> considered worse than doing so via an attribute?
> >>
> >
> > The same item template maybe used elsewhere, for example, the
> following
> > pattern  eth / ipv4(src, dst) / udp(sport, dport), can be used on number of
> different
> > tables.
> 
> In my understanding, the user may want to create flow table A
> and use pattern template A' for it, which is as follows:
> 
> any_vports / eth / ipv4 / udp
> 
> The PMD can see this item and treat it exactly the same
> way as it could treat such attribute ("where to allocate
> resources, etc.").
> 
> Then the user may want to create flow table B and
> use pattern template B' for it:
> 
> any_phy_ports / eth / ipv4 / udp
> 
> Once again, the PMD can clearly see the difference between
> the A' and B' templates and, this time, allocate resources
> the other way (as per efficiency requirements).
> 

Yes, but again you select all vports, and this is not what the application wants.
The application wants to insert the following rules,
assuming port 0 is wire and DPDK ports 1,2,3,4,5 are vports:
Represented_port(id=2) / eth / ipv4/ udp
Represented_port(id=5) / eth / ipv4/ udp

As you can see, the application doesn’t want all ports, just some vports, but for sure not the
wire port.

I agree that we can go with your approach, but it isn't correct: why should the application
insert:
any_phy_ports / Represented_port(id=2) / eth / ipv4/ udp
any_phy_ports / Represented_port(id=2) / eth / ipv4/ udp

> By saying "can be used on number of different tables", do you mean
> that it is important to make the *network* part of the pattern
> shareable between flow tables? I.e. are you saying that
> templates A' and B' cause resource duplication just
> because of the same *network* part in your case?
> 

I'm saying that adding the item will mean that the application can't reuse the pattern template it created.
If the application created a 5-tuple template,
it can reuse it in ingress tables, egress tables and FDB tables; there is no need to create
extra pattern templates.
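
A minimal sketch of the reuse argument above, assuming hypothetical stand-in
types (pattern_template, table, transfer_hint are illustration names, not the
real rte_flow structures). One 5-tuple template is attached to several tables,
and the direction is carried by the table, not by the pattern:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real rte_flow types. */
enum item_type { ITEM_ETH, ITEM_IPV4, ITEM_UDP, ITEM_END };

enum table_domain { DOMAIN_INGRESS, DOMAIN_EGRESS, DOMAIN_TRANSFER };

/* Direction hint in the spirit of the patch; bi-direction is the default. */
enum transfer_hint { HINT_BIDIR = 0, HINT_WIRE_ORIG, HINT_VF_ORIG };

struct pattern_template {
	const enum item_type *items; /* terminated by ITEM_END */
};

struct table {
	enum table_domain domain;
	enum transfer_hint hint;             /* table property, not a pattern item */
	const struct pattern_template *tmpl; /* shared by reference, not copied */
};

/* Count the items of a template, excluding the terminator. */
static size_t
template_len(const struct pattern_template *t)
{
	size_t n = 0;

	assert(t != NULL && t->items != NULL);
	while (t->items[n] != ITEM_END)
		n++;
	return n;
}

/* Attach an existing template to a new table.
 * The hint is kept only for transfer tables; other domains ignore it. */
static struct table
table_create(enum table_domain domain, enum transfer_hint hint,
	     const struct pattern_template *tmpl)
{
	struct table tbl = { domain, HINT_BIDIR, tmpl };

	if (domain == DOMAIN_TRANSFER)
		tbl.hint = hint;
	return tbl;
}

/* The "5 tuple" shape: eth / ipv4(src, dst) / udp(sport, dport). */
static const enum item_type five_tuple_items[] = {
	ITEM_ETH, ITEM_IPV4, ITEM_UDP, ITEM_END
};
static const struct pattern_template five_tuple = { five_tuple_items };
```

With this shape, calling table_create() for an ingress table and for a
transfer table with HINT_VF_ORIG leaves both pointing at the same five_tuple
template; an extra direction item would instead require a second template.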

> > I think that the main difference between us is that from my point of view
> this value is just
> > where to allocate resources / how to better insert the rule. It is not related
> to matching.
> 
> To me, it *is* the match criterion which, at the same time, serves
> as a value indicating the way how resources should be allocated.
> But before all, it is a match criterion.
> 

Depends on how you define matching, but just like ingress / egress is not matching,
the same goes here.

> If it refers to a group of ports = in order to ditch "the other half"
> of traffic from consideration (like Rongwei explained), then it
> looks like a match criterion.
> 

See the above comment; also, in this case it relates to allocation and insertion,
and the matching is a side product.

> > From Nvidia viewpoint we need this information so we can allocate the
> resource at the correct
> > place and avoid inserting duplication of rules.
> 
> I see.
> 
> > I agree that by using the item we can get the same results, but it is incorrect
> since we are not matching on it.
> 
> If one provides item UDP in the pattern and does not match on any UDP
> fields, doing so nevertheless *is* matching on particular packet type.
> 
Yes, it matches all UDP, but again, in this case the idea is not to match all vports
but just to tell the PMD that only vport traffic will arrive at this table.
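
A minimal sketch of that distinction, assuming a hypothetical model (origin,
pkt and both helpers are illustration names, not DPDK API). An item such as
any_vports on its own hits every vport packet, while a hinted table only
promises where traffic comes from and still relies on per-port rules:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model for illustration; not DPDK API. */
enum origin { FROM_WIRE, FROM_VPORT };

struct pkt {
	enum origin origin;
	unsigned int port_id;
};

/* Item semantics: a lone "any_vports" item hits every vport packet. */
static bool
item_any_vports_hits(const struct pkt *p)
{
	return p->origin == FROM_VPORT;
}

/* Hint semantics: the table is declared to see only vport traffic,
 * but a packet hits only if a rule names its exact port. */
static bool
hinted_table_hits(const unsigned int *rule_ports, size_t n,
		  const struct pkt *p)
{
	size_t i;

	/* Promise made at table creation, not a per-rule criterion. */
	assert(p->origin == FROM_VPORT);
	for (i = 0; i < n; i++)
		if (rule_ports[i] == p->port_id)
			return true;
	return false;
}
```

With rules for ports 2 and 5 only, a packet from vport 3 satisfies
item_any_vports_hits() but misses in hinted_table_hits().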

> The same seemingly goes for the new attribute / item. If it is
> provided, then the user doesn't want the rule to affect
> packets coming from certain ports (i.e. from wire).
> 
> So still sounds like matching.
> 
> > Part of the idea of template API is to give as many hints as possible to the
> PMD so the insertion will be optimized.
> 
> I see.
> 
> >
> >
> >> As for "ingress" item, - no, one should not add such. We have had
> >> many discussions concerning this bit in the past. Ingress/egress
> >> are non-transfer terms. They belong in the scope of vNIC / ethdev
> >> filtering, not to embedded switch rules.
> >>
> >> In my opinion, in the embedded switch, one should either point to
> >> some precise switch ports (using REPRESENTOR / REPRESENTED items)
> >> or use another kind of item to refer to a "super set" of ports
> >> which have something in common ("all wire ports", "all NON-wire ports").
> >>
> >
> > But this is my point we don't want all wire ports or all NON-wire ports, we
> just know that in this table
> > we will have only non-wire / wire ports.
> 
> But how do these two viewpoints contradict each other?
> 

Which viewpoints?

> >
> >>>
> >>> Ivan we have the item RTE_FLOW_ITEM_TYPE_PF and
> >> RTE_FLOW_ITEM_TYPE_VF which are deprecated,
> >>> So do you want to un-deprecate them?
> >>
> >> No. These items are deprecated because:
> >>
> >> a) their names suggest that application knows whether an ethdev
> >>     sits on top of a PF or that the application has some
> >>     knowledge of existence of particular VFs, but in
> >>     reality applications should not be worried of
> >>     the underlying function type = to them, all
> >>     ethdevs are just representors of something,
> >>     and if the application needs to refer to
> >>     VFs (or other PFs, - doesn't matter), it
> >>     should do that via REPRESENTOR items;
> >>
> >> b) such items would duplicate REPRESENTOR / REPRESENTED.
> >>
> > Agree with everything you say.
> 
> Great we're on the same page regarding this bit.
> 
> >
> >>>
> >>> To summarize, if PMD can use such an hint during rule creation and save
> >> memory, I vote
> >>> to allow it.
> >>> if the idea is to match on all vports then it should be an item.
> >>
> >> But such a hint would effectively be a match criterion, too, right?
> >> So, in fact it's a combined use case: a match criterion which is
> >> flexible enough to be a "hint" = i.e. the PMD can see it when
> >> processing the pattern *template* and treat it as a hint.
> >>
> >
> > Yes, but it is an implicit match, just like saying ingress. Egress it has meaning
> above the
> > matching. In addition, there is no reason to add extra item for each rule we
> create, just
> > to enable something that is fixed during the table creation.
> > Extra item in pattern template means extra item for each rule.
> > I know we can avoid this and optimize the code but why add something
> that no one needs
> > after table creation?
> 
> Good question. But, in case some way exists to make such optimisation
> laconic enough to avoid confusion etc., then it should be no problem
> in preferring the pattern approach over attribute approach.
> 
> >
> >
> >>>
> >>>>
> >>>>> 2) From your viewpoint, why items "ANY_PHYS_PORTS" and
> >>>> "ANY_VPORTS"
> >>>>>     won't do? Or, which problems do you think they may inflict?
> >>>>>
> >>>>>     ..
> >>>>>
> >>>>>     Previously, you explained why REPRESENTED_PORT would not
> >>>>>     fit your needs. And I understand your point: to async API,
> >>>>>     two pattern templates which both have item REPRESENTED_PORT
> >>>>>     in them cannot be clearly distinguished and are in fact the
> >>>>>     same set of criteria (provided that all other items are also
> >>>>>     the same and have the same masks). Templates are, well,
> >>>>>     templates (or shapes) of the rules to come later and
> >>>>>     do not include exact "spec" for the "ethdev_id".
> >>>>>     Got it.
> >>>>>
> >>>>>     But that's not going to be the case with items ANY_PHYS_PORTS
> and
> >>>>>     ANY_VPORTS, is it? In one async table template, the user submits
> >>>>>     item ANY_PHYS_PORTS (instead of table attribute "wire_orig").
> >>>>>     In another template, the user submits item ANY_VPORTS to
> >>>>>     state that they want to match only traffic transmitted
> >>>>>     software endpoints (DPDK ethdevs, guest VFs, etc.)
> >>>>>     connected to the switch.
> >>>>>
> >>>>>     In this example, the PMD will clearly see that the two templates
> >>>>>     differ. So it will be able to allocate separate resources, each
> >>>>>     one "cutting one half of traffic" (as per your concept).
> >>>>>
> >>>>> 3) In your most recent response, you suggested that one might have
> >>>>>     had the attributes occupied for some other purposes. To me,
> >>>>>     they're not. Neither me nor my closest colleagues have
> >>>>>     any plans on them. When I advocate using item approach
> >>>>>     over the attribute approach, I do this to ensure
> >>>>>     a) clarity of the API contract and b) robustness.
> >>>
> >>> If something is shared for all rules in the same table, it should be a table
> >>> property.
> >>
> >> But the whole pattern *template* is also a table property, isn't it?
> >>
> >
> > Like I said above the pattern template can be used in all domains that is
> why
> > there is a split between table and pattern, in addition to that each table may
> have
> > number of pattern templates.
> 
> This is a valuable clarification. However, even if the attribute way
> may seem OK after this explanation, then I still don't understand
> why it is required to add this attribute to the generic "struct
> rte_flow_attr" and not just to the *table* attr.
> 
> Generic "struct rte_flow_attr" is used both for async and
> sync (regular) approach. So why add something to generic
> struct which is never going to make sense to sync flows?
> 

I guess we can move it to the table attribute, but I think that even
in the standard rte_flow API this can save duplicate insertion.

> >
> >>>
> >>>>>
> >>>>> 4) Also, in your response, you suggested that I might have
> >>>>>     confused item mask and spec. That is not the case.
> >>>>>     If we agree, that switch domain ID is unneeded in
> >>>>>     the new items, then these items will have no
> >>>>>     fields in them (like item PF had not had any
> >>>>>     before it was deprecated).
> >>>>>
> >>>>>     No fields in new items => no field masks.
> >>>>>     So what's the problem then?
> >>>>>
> >>>>> 5) With regard to our talk about identifying the relationship
> >>>>>     between ethdevs and switch domains, you said that the user
> >>>>>     could know the difference from the very beginning:
> >>>>>     /sysfs/ .... /PF_BDF/sriov_num
> >>>>>
> >>>>>     That is true for the user who starts the application, but
> >>>>>     this knowledge is hard to obtain from the application
> >>>>>     perspective = it's hard to automate.
> >>>>>
> >>>>>     This is why ethdevs are able to advertise their domain IDs.
> >>>>>     And, as I explained, looking at domain ID to understand
> >>>>
> >>>> namely rte_eth_dev_info.switch_info.domain_id
> >>>>
> >>>>>     port relationship is valid, whilst looking at proxy IDs
> >>>>>     to achieve the same goal is not. Proxy port IDs only
> >>>>>     serve the purpose of finding an entry point for
> >>>>>     managing flows. That has slightly different
> >>>>>     meaning, but this subtle difference is important.
> >>>>
> >>>> There is also a concept of sibling ports
> >>>> to get all ports belonging to the same hardware.
> >>>>
> >>>>
> >>>>> 6) As for the confusion over the difference between fixing
> >>>>>     bugs and making the code robust by extra checks:
> >>>>>
> >>>>>     Yes, I agree that the programmer who writes the
> >>>>>     application must be intelligent enough to use
> >>>>>     flow primitives the proper way. Yes, the user
> >>>>>     who starts the application also should tread
> >>>>>     carefully. But that does not prevent some
> >>>>>     mistakes in other parts of code from
> >>>>>     corrupting various chunks of memory,
> >>>>>     including, for example, flow attrs.
> >>>>>
> >>>>>     You say that such mistakes have to be "just fixed"
> >>>>>     as any other bugs. Right. But how much time will
> >>>>>     the programmer spend to identify the bugs?
> >>>>>
> >>>>>     If the PMDs do all the checks (as with attributes),
> >>>>>     the hypothetical bug will manifest itself much
> >>>>>     earlier. That will simplify debugging by a lot...
> >>>>>
> >>>>>     So, my point is that it's still better to ensure
> >>>>>     that new flow primitives have all necessary
> >>>>>     checks in place. For attributes, it is
> >>>>>     required to add them separately.
> >>>>
> >>>> If flow insertion is done in a fast path,
> >>>> such checks may be skipped.
> >>>
> >>> The idea is that all rules in this table will share the same configuration,
> >>> there is no reason to say everything again for each rule. This is why
> >>> the rule attributes were moved to the table struct and not per rule.
> >>>
> >>>>
> >>>>>     For items, as I explained, it might not be necessary
> >>>>>     in the majority of cases simply because of the
> >>>>>     switch (item->type) { case } structure.
> >>>>>
> >>>>> So, these are some of my points to explain why the
> >>>>> attribute approach is untenable. To me, attributes
> >>>>> are something global, which demands checks in all
> >>>>> flow-capable PMDs. Items seem better because they
> >>>>> are don't cares to all PMDs which are unaware of
> >>>>> the async concept. So, even if someone does not
> >>>>> implement the async concept or does not like
> >>>>> the new item names, they can turn a blind
> >>>>> eye to this - with attributes, they can't.
> >>>>>
> >>>
> >>> Good point,
> >>> Maybe we should add hints in the attribute,
> >>> for example, hint_only_wire in this case it will be clear that
> >>> PMD may ignore this, and it should be fully documented that this is not a
> >> mandatory field.
> >>> What do you think?
> >>
> >> Theoretically, making terminology softer (like with the word "hint")
> >> could make things easier for vendors who may find the new feature
> >> confusing or something like that. But if, in reality, this hint
> >> is indeed another match criterion (see my comments above), then
> >> in no event shall the prefix "hint" be an excuse for this
> >> criterion not being expressed as a pattern item.
> >>
> >
> > Please see my response above. This is the point it is much more than
> matching.
> >
> >> Please hear me out: I don't mean to sound arrogant, - just trying
> >> to understand why expressing the new bit as an item can't be
> >> efficient enough for the async flow approach.
> >>
> >
> > I don't think you are arrogant, and I hope that you see that I do understand
> your comments.
> > saying that I hope I explained why I think it is better to have it as a table
> attribute and not as an
> > item. (We are not matching on it, this helps the PMD allocate the table at
> the best location and avoid
> > duplication of rules)
> >
> > If you wish, we can have a short phone call and discuss this.
> >
> > Best,
> > Ori
> >
> >
> >>>
> >>>>> Thank you.
> >>>>
> >>>>
> >>>
> >>>
> >>
> >> Ivan
> >
> 
> Thanks,
> Ivan
  
Andrew Rybchenko Sept. 21, 2022, 8:39 a.m. UTC | #22
Hi Ori,

On 9/21/22 10:34, Ori Kam wrote:
> Hi Ivan,
> 
> PSB my comments.
> 
> In any case, I'm afraid we are in a deadlock.
> I understand your viewpoint, but I don't think it is the correct
> one for the feature suggested here,
> for all the reasons I listed.
> 
> So from my viewpoint, the patch is Acked.
> If you wish, as I suggested before, we can have a meeting with
> Rongwei and anyone else who is interested and close this subject.
> 
>> -----Original Message-----
>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Sent: Tuesday, 20 September 2022 18:28
>> Subject: RE: [PATCH v1] ethdev: add direction info when creating the transfer
>> table
>>
>> Hi Ori,
>>
>> On Tue, 20 Sep 2022, Ori Kam wrote:
>>
>>> Hi Ivan,
>>>
>>>> -----Original Message-----
>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>> Sent: Tuesday, 20 September 2022 15:46
>>>>
>>>> Hi Ori,
>>>>
>>>> On Tue, 20 Sep 2022, Ori Kam wrote:
>>>>
>>>>> Hi Ivan, Thomas and Rongwei
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Thomas Monjalon <thomas@monjalon.net>
>>>>>> Sent: Thursday, 15 September 2022 14:16
>>>>>>
>>>>>> 15/09/2022 12:59, Ivan Malov:
>>>>>>> Hi Rongwei,
>>>>>>>
>>>>>>> In this reply, I do not include the previous mail because the amount
>>>>>>> of inline commentary has gone haywire over the past couple of days.
>>>>>>> Let's re-iterate.
>>>>>>>
>>>>>>> But before I get to that, I'd like to offer a fresh perspective:
>>>>>>>
>>>>>>> Perhaps, if we all agree that term "vport" means an endpoint which
>>>>>>> can stand for any "port" except for physical one, then it should
>>>>>>> be possible to use term ANY_VPORTS rather than
>> ANY_GUEST_PORTS.
>>>>>>
>>>>>> The opposite of "physical" is "virtual" indeed.
>>>>>>
>>>>>>> But that's tricky, of course. I don't have a way with naming,
>>>>>>> so more opinions are welcome and very-very desirable here.
>>>>>>>
>>>>>>> So:
>>>>>>>
>>>>>>> 1) Do you agree that, in your proposal, the new "wire_orig" / "vf_orig"
>>>>>>>      primitives are in fact yet another match criteria?
>>>>>>>
>>>>>>>      ..
>>>>>>>
>>>>>>>      To me, it looks so. If they are match criteria, then they belong
>>>>>>>      in match pattern, that is, they should be expressed as new items.
>>>>>>>
>>>>>>>      For "transfer" rules, the *existing* attributes are: "group"
>>>>>>>      and "priority". As you may note, these are clearly not match
>>>>>>>      criteria. They control the look-up order. So, to this day,
>>>>>>>      there're no match criteria in DPDK expressed as attributes.
>>>>>>>
>>>>>>>      If these "wire_orig" / "vf_orig" are going to be introduced
>>>>>>>      as attributes, that should be backed with strong motivation.
>>>>>>
>>>>>> I prefer we keep matching in a single place, not in attributes.
>>>>>>
>>>>>
>>>>> I think we are talking about two different features.
>>>>> Feature 1:
>>>>> Allow matching on all vports that are not wire

It is good that we share understanding here.
I.e. the feature is about matching.

>>>>> Feature 2:
>>>>> Save allocation space and allow fast insertion.
>>>>> In this case, the matching is not on all vports it can be just part of the
>> vports
>>>>> but it will never be the wire port.
>>>>> For example:
>>>>> port 0 - wire
>>>>> ports 1,2,3,4,5  - vports
>>>>> the application wants to insert only those rules:
>>>>> represented_port(port_id=2) / eth / ipv4 (src==xx)
>>>>> represented_port(port_id=4) / eth / ipv4 (src==xx)
>>>>> represented_port(port_id=4) / eth / ipv4 (src==yy)
>>>>>
>>>>> For feature 1 I fully agree with you Ivan, this should be added as an item.
>>>>
>>>> Thank you.
>>>>
>>>>> For feature 2 I think Rongwei's suggestion is the better option.
>>>>> If I understand correctly the idea is to give hint to the PMD on where to
>>>> allocate memory
>>>>> and how to insert the rules most optimally. Since this is shared for all
>> rules it
>>>> makes more sense
>>>>> to add it as an attribute, 

Hm, if I want to match on IPv4-UDP source port only in the
table, may I add an attribute for it? What makes the direction
matching criterion so special as to add an attribute for it?

Jokes aside. I perfectly realize that addition of a new
attribute is simple. It is simple from an implementation point
of view. But that does not make it right from an overall design
point of view. IMHO pattern is responsible for matching in
RTE flow API and all matching criteria should be there.

As for optimizations - I believe it is doable in a different
way. Just create a table and use a flow rule that matches on
the direction and jumps to the table. I guess you have everything
you need in that case.
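
A minimal sketch of that alternative, assuming a hypothetical model (the
types, group numbers and helper names are illustration only, not DPDK API).
The direction is matched once in the root group and the packet jumps to a
dedicated group, so rules in that group never see wire traffic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model for illustration; not DPDK API. */
enum origin { ORIGIN_WIRE, ORIGIN_VPORT };

struct packet {
	enum origin origin;
	unsigned int port_id;
};

enum { GROUP_ROOT = 0, GROUP_WIRE = 1, GROUP_VPORT = 2 };

/* Root group: match on the direction (a pattern criterion), action is a jump. */
static int
root_classify(const struct packet *pkt)
{
	assert(pkt != NULL);
	return pkt->origin == ORIGIN_WIRE ? GROUP_WIRE : GROUP_VPORT;
}

/* A rule in GROUP_VPORT names one exact represented port. */
struct vport_rule {
	unsigned int port_id;
};

static bool
vport_table_lookup(const struct vport_rule *rules, size_t n,
		   const struct packet *pkt)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (rules[i].port_id == pkt->port_id)
			return true;
	return false;
}

/* Full path: root jump first, then the per-port rules of the vport group. */
static bool
process(const struct vport_rule *rules, size_t n, const struct packet *pkt)
{
	if (root_classify(pkt) != GROUP_VPORT)
		return false; /* wire traffic never reaches the vport table */
	return vport_table_lookup(rules, n, pkt);
}
```

Here the direction stays in the pattern of a single root rule, and the
per-port rules in GROUP_VPORT need no extra item.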

>>>>> just like we don’t have an ingress item (maybe
>> we
>>>> should?)
>>>>
>>>> But isn't pattern template also supposed to be shared for all rules
>>>> in the table? I.e., the user creates an async flow table and submits
>>>> a flow "shape" (which consists of attrs, pattern template and action
>>>> template). So why should "giving a hint" via an item template be
>>>> considered worse than doing so via an attribute?
>>>>
>>>
>>> The same item template maybe used elsewhere, for example, the
>> following
>>> pattern  eth / ipv4(src, dst) / udp(sport, dport), can be used on number of
>> different
>>> tables.
>>
>> In my understanding, the user may want to create flow table A
>> and use pattern template A' for it, which is as follows:
>>
>> any_vports / eth / ipv4 / udp
>>
>> The PMD can see this item and treat it exactly the same
>> way as it could treat such attribute ("where to allocate
>> resources, etc.").
>>
>> Then the user may want to create flow table B and
>> use pattern template B' for it:
>>
>> any_phy_ports / eth / ipv4 / udp
>>
>> Once again, the PMD can clearly see the difference between
>> the A' and B' templates and, this time, allocate resources
>> the other way (as per efficiency requirements).
>>
> 
> Yes, but again you select all vports, and this is not what the application wants.
> The application wants to insert the following rules,
> assuming port 0 is wire and DPDK ports 1,2,3,4,5 are vports:
> Represented_port(id=2) / eth / ipv4/ udp
> Represented_port(id=5) / eth / ipv4/ udp
> 
> As you can see, the application doesn’t want all ports, just some vports, but for sure not the
> wire port.
> 
> I agree that we can go with your approach, but it isn't correct: why should the application
> insert:
> any_phy_ports / Represented_port(id=2) / eth / ipv4/ udp
> any_phy_ports / Represented_port(id=2) / eth / ipv4/ udp
> 
>> By saying "can be used on number of different tables", do you mean
>> that it is important to make the *network* part of the pattern
>> shareable between flow tables? I.e. are you saying that
>> templates A' and B' cause resource duplication just
>> because of the same *network* part in your case?
>>
> 
> I'm saying that adding the item will mean that the application can't reuse the pattern template it created.
> If the application created a 5-tuple template,
> it can reuse it in ingress tables, egress tables and FDB tables; there is no need to create
> extra pattern templates.
> 
>>> I think that the main difference between us is that from my point of view
>> this value is just
>>> where to allocate resources / how to better insert the rule. It is not related
>> to matching.
>>
>> To me, it *is* the match criterion which, at the same time, serves
>> as a value indicating the way how resources should be allocated.
>> But before all, it is a match criterion.
>>
> 
> Depends on how you define matching, but just like ingress / egress is not matching
> the same goes here.
> 
>> If it refers to a group of ports = in order to ditch "the other half"
>> of traffic from consideration (like Rongwei explained), then it
>> looks like a match criterion.
>>
> 
> See above comment, also this case it relates to allocation and insertion,
> the matching is side product.
> 
>>>  From Nvidia viewpoint we need this information so we can allocate the
>> resource at the correct
>>> place and avoid inserting duplication of rules.
>>
>> I see.
>>
>>> I agree that by using the item we can get the same results, but it is incorrect
>> since we are not matching on it.
>>
>> If one provides item UDP in the pattern and does not match on any UDP
>> fields, doing so nevertheless *is* matching on particular packet type.
>>
> Yes, it matches all UDP, but again, in this case the idea is not to match all vports
> but just to tell the PMD that only vport traffic will arrive at this table.
> 
>> The same seemingly goes for the new attribute / item. If it is
>> provided, then the user doesn't want the rule to affect
>> packets coming from certain ports (i.e. from wire).
>>
>> So still sounds like matching.
>>
>>> Part of the idea of template API is to give as many hints as possible to the
>> PMD so the insertion will be optimized.
>>
>> I see.
>>
>>>
>>>
>>>> As for "ingress" item, - no, one should not add such. We have had
>>>> many discussions concerning this bit in the past. Ingress/egress
>>>> are non-transfer terms. They belong in the scope of vNIC / ethdev
>>>> filtering, not to embedded switch rules.
>>>>
>>>> In my opinion, in the embedded switch, one should either point to
>>>> some precise switch ports (using REPRESENTOR / REPRESENTED items)
>>>> or use another kind of item to refer to a "super set" of ports
>>>> which have something in common ("all wire ports", "all NON-wire ports").
>>>>
>>>
>>> But this is my point we don't want all wire ports or all NON-wire ports, we
>> just know that in this table
>>> we will have only non-wire / wire ports.
>>
>> But how do these two viewpoints contradict each other?
>>
> 
> Which viewpoints?
> 
>>>
>>>>>
>>>>> Ivan we have the item RTE_FLOW_ITEM_TYPE_PF and
>>>> RTE_FLOW_ITEM_TYPE_VF which are deprecated,
>>>>> So do you want to un-deprecate them?
>>>>
>>>> No. These items are deprecated because:
>>>>
>>>> a) their names suggest that application knows whether an ethdev
>>>>      sits on top of a PF or that the application has some
>>>>      knowledge of existence of particular VFs, but in
>>>>      reality applications should not be worried of
>>>>      the underlying function type = to them, all
>>>>      ethdevs are just representors of something,
>>>>      and if the application needs to refer to
>>>>      VFs (or other PFs, - doesn't matter), it
>>>>      should do that via REPRESENTOR items;
>>>>
>>>> b) such items would duplicate REPRESENTOR / REPRESENTED.
>>>>
>>> Agree with everything you say.
>>
>> Great we're on the same page regarding this bit.
>>
>>>
>>>>>
>>>>> To summarize, if PMD can use such an hint during rule creation and save
>>>> memory, I vote
>>>>> to allow it.
>>>>> if the idea is to match on all vports then it should be an item.
>>>>
>>>> But such a hint would effectively be a match criterion, too, right?
>>>> So, in fact it's a combined use case: a match criterion which is
>>>> flexible enough to be a "hint" = i.e. the PMD can see it when
>>>> processing the pattern *template* and treat it as a hint.
>>>>
>>>
>>> Yes, but it is an implicit match, just like saying ingress. Egress it has meaning
>> above the
>>> matching. In addition, there is no reason to add extra item for each rule we
>> create, just
>>> to enable something that is fixed during the table creation.
>>> Extra item in pattern template means extra item for each rule.
>>> I know we can avoid this and optimize the code but why add something
>> that no one needs
>>> after table creation?
>>
>> Good question. But, in case some way exists to make such optimisation
>> laconic enough to avoid confusion etc., then it should be no problem
>> in preferring the pattern approach over attribute approach.
>>
>>>
>>>
>>>>>
>>>>>>
>>>>>>> 2) From your viewpoint, why items "ANY_PHYS_PORTS" and
>>>>>> "ANY_VPORTS"
>>>>>>>      won't do? Or, which problems do you think they may inflict?
>>>>>>>
>>>>>>>      ..
>>>>>>>
>>>>>>>      Previously, you explained why REPRESENTED_PORT would not
>>>>>>>      fit your needs. And I understand your point: to async API,
>>>>>>>      two pattern templates which both have item REPRESENTED_PORT
>>>>>>>      in them cannot be clearly distinguished and are in fact the
>>>>>>>      same set of criteria (provided that all other items are also
>>>>>>>      the same and have the same masks). Templates are, well,
>>>>>>>      templates (or shapes) of the rules to come later and
>>>>>>>      do not include exact "spec" for the "ethdev_id".
>>>>>>>      Got it.
>>>>>>>
>>>>>>>      But that's not going to be the case with items ANY_PHYS_PORTS
>> and
>>>>>>>      ANY_VPORTS, is it? In one async table template, the user submits
>>>>>>>      item ANY_PHYS_PORTS (instead of table attribute "wire_orig").
>>>>>>>      In another template, the user submits item ANY_VPORTS to
>>>>>>>      state that they want to match only traffic transmitted
>>>>>>>      software endpoints (DPDK ethdevs, guest VFs, etc.)
>>>>>>>      connected to the switch.
>>>>>>>
>>>>>>>      In this example, the PMD will clearly see that the two templates
>>>>>>>      differ. So it will be able to allocate separate resources, each
>>>>>>>      one "cutting one half of traffic" (as per your concept).
>>>>>>>
>>>>>>> 3) In your most recent response, you suggested that one might have
>>>>>>>      had the attributes occupied for some other purposes. To me,
>>>>>>>      they're not. Neither me nor my closest colleagues have
>>>>>>>      any plans on them. When I advocate using item approach
>>>>>>>      over the attribute approach, I do this to ensure
>>>>>>>      a) clarity of the API contract and b) robustness.
>>>>>
>>>>> If something is shared for all rules in the same table, it should be a table
>>>>> property.
>>>>
>>>> But the whole pattern *template* is also a table property, isn't it?
>>>>
>>>
>>> Like I said above the pattern template can be used in all domains that is
>> why
>>> there is a split between table and pattern, in addition to that each table may
>> have
>>> number of pattern templates.
>>
>> This is a valuable clarification. However, even if the attribute way
>> may seem OK after this explanation, then I still don't understand
>> why it is required to add this attribute to the generic "struct
>> rte_flow_attr" and not just to the *table* attr.
>>
>> Generic "struct rte_flow_attr" is used both for async and
>> sync (regular) approach. So why add something to generic
>> struct which is never going to make sense to sync flows?
>>
> 
> I guess we can move it to the table attribute, but I think that even
> in standard rte_flow API this can save duplicate insertion.
> 
>>>
>>>>>
>>>>>>>
>>>>>>> 4) Also, in your response, you suggested that I might have
>>>>>>>      confused item mask and spec. That is not the case.
>>>>>>>      If we agree, that switch domain ID is unneeded in
>>>>>>>      the new items, then these items will have no
>>>>>>>      fields in them (like item PF had not had any
>>>>>>>      before it was deprecated).
>>>>>>>
>>>>>>>      No fields in new items => no field masks.
>>>>>>>      So what's the problem then?
>>>>>>>
>>>>>>> 5) With regard to our talk about identifying the relationship
>>>>>>>      between ethdevs and switch domains, you said that the user
>>>>>>>      could know the difference from the very beginning:
>>>>>>>      /sysfs/ .... /PF_BDF/sriov_num
>>>>>>>
>>>>>>>      That is true for the user who starts the application, but
>>>>>>>      this knowledge is hard to obtain from the application
>>>>>>>      perspective = it's hard to automate.
>>>>>>>
>>>>>>>      This is why ethdevs are able to advertise their domain IDs.
>>>>>>>      And, as I explained, looking at domain ID to understand
>>>>>>
>>>>>> namely rte_eth_dev_info.switch_info.domain_id
>>>>>>
>>>>>>>      port relationship is valid, whilst looking at proxy IDs
>>>>>>>      to achieve the same goal is not. Proxy port IDs only
>>>>>>>      serve the purpose of finding an entry point for
>>>>>>>      managing flows. That has slightly different
>>>>>>>      meaning, but this subtle difference is important.
>>>>>>
>>>>>> There is also a concept of sibling ports
>>>>>> to get all ports belonging to the same hardware.
>>>>>>
>>>>>>
>>>>>>> 6) As for the confusion over the difference between fixing
>>>>>>>      bugs and making the code robust by extra checks:
>>>>>>>
>>>>>>>      Yes, I agree that the programmer who writes the
>>>>>>>      application must be intelligent enough to use
>>>>>>>      flow primitives the proper way. Yes, the user
>>>>>>>      who starts the application also should tread
>>>>>>>      carefully. But that does not prevent some
>>>>>>>      mistakes in other parts of code from
>>>>>>>      corrupting various chunks of memory,
>>>>>>>      including, for example, flow attrs.
>>>>>>>
>>>>>>>      You say that such mistakes have to be "just fixed"
>>>>>>>      as any other bugs. Right. But how much time will
>>>>>>>      the programmer spend to identify the bugs?
>>>>>>>
>>>>>>>      If the PMDs do all the checks (as with attributes),
>>>>>>>      the hypothetical bug will manifest itself much
>>>>>>>      earlier. That will simplify debugging by a lot...
>>>>>>>
>>>>>>>      So, my point is that it's still better to ensure
>>>>>>>      that new flow primitives have all necessary
>>>>>>>      checks in place. For attributes, it is
>>>>>>>      required to add them separately.
>>>>>>
>>>>>> If flow insertion is done in a fast path,
>>>>>> such checks may be skipped.
>>>>>
>>>>> The idea is that all rules in this table will share the same configuration,
>>>>> there is no reason to say everything again for each rule. This is why
>>>>> the rule attributes were moved to the table struct and not per rule.
>>>>>
>>>>>>
>>>>>>>      For items, as I explained, it might not be necessary
>>>>>>>      in the majority of cases simply because of the
>>>>>>>      switch (item->type) { case } structure.
>>>>>>>
>>>>>>> So, these are some of my points to explain why the
>>>>>>> attribute approach is untenable. To me, attributes
>>>>>>> are something global, which demands checks in all
>>>>>>> flow-capable PMDs. Items seem better because they
>>>>>>> are don't cares to all PMDs which are unaware of
>>>>>>> the async concept. So, even if someone does not
>>>>>>> implement the async concept or does not like
>>>>>>> the new item names, they can turn a blind
>>>>>>> eye to this - with attributes, they can't.
>>>>>>>
>>>>>
>>>>> Good point,
>>>>> Maybe we should add hints in the attribute,
>>>>> for example, hint_only_wire in this case it will be clear that
>>>>> PMD may ignore this, and it should be fully documented that this is not a
>>>> mandatory field.
>>>>> What do you think?
>>>>
>>>> Theoretically, making terminology softer (like with the word "hint")
>>>> could make things easier for vendors who may find the new feature
>>>> confusing or something like that. But if, in reality, this hint
>>>> is indeed another match criterion (see my comments above), then
>>>> in no event shall the prefix "hint" be an excuse for this
>>>> criterion not being expressed as a pattern item.
>>>>
>>>
>>> Please see my response above. This is the point it is much more than
>> matching.
>>>
>>>> Please hear me out: I don't mean to sound arrogant, - just trying
>>>> to understand why expressing the new bit as an item can't be
>>>> efficient enough for the async flow approach.
>>>>
>>>
>>> I don't think you are arrogant, and I hope that you see that I do understand
>> your comments.
>>> saying that I hope I explained why I think it is better to have it as a table
>> attribute and not as an
>>> item. (We are not matching on it, this helps the PMD allocate the table at
>> the best location and avoid
>>> duplication of rules)
>>>
>>> If you wish, we can have a short phone call and discuss this.
>>>
>>> Best,
>>> Ori
>>>
>>>
>>>>>
>>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> Ivan
>>>
>>
>> Thanks,
>> Ivan
  
Ivan Malov Sept. 21, 2022, 9:04 a.m. UTC | #23
Hi Ori,

On Wed, 21 Sep 2022, Ori Kam wrote:

> Hi Ivan,
>
> PSB my comments.
>
> In any case, I'm afraid we are in a deadlock.

Hope we're not in fact.

> I understand your viewpoint, I don't think it is the correct
> one for the feature suggested here.
> For all the reasons I listed.

Ori, your two most recent replies are indeed valuable clarifications.
Now it's clear to me that your intention is to match on exact ports,
as usual, but this time with a hint for the flow table. Got it.

In your response, you say that matching on ALL vports is not what
the use case needs. OK, I understood. But please note that the
item name does not say "ALL", it says "ANY".

OK. Say, "ANY" is also confusing. Let's then name it "VPORTS_ONLY"
and "PHY_PORTS_ONLY". This way, if user provides item VPORTS_ONLY
and then provides item REPRESENTED_PORT, these two items do not
contradict each other. Item VPORTS_ONLY defines the scope of some
kind, then the following item, REPRESENTED_PORT, makes it narrower.

And, in documentation, one can say clearly that the user *may*
omit item VPORTS_ONLY in the exact rule pattern provided that
they have already submitted this item as part of the template.

It's like with match items IPV4 / UDP. Item IPV4 does not
contradict item UDP. They supplement each other. Same way,
VPORTS_ONLY says one thing (PHYS_PORTS do NOT match),
and then REPRESENTED_PORT clarifies it and specifies
which exact VPORT shall match. Isn't that acceptable?
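
The scope-then-narrow reading above can be sketched in plain C. This is
only an illustration under stated assumptions: the item names VPORTS_ONLY
and PHY_PORTS_ONLY are the ones proposed in this thread and do not exist
in rte_flow, and every sketch_* type and function below is hypothetical
rather than DPDK API. Items are ANDed, so the broad item and the exact
one narrow the match together instead of contradicting each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item types; nothing here is taken from rte_flow.h. */
enum sketch_item_type {
	SKETCH_ITEM_END,
	SKETCH_ITEM_VPORTS_ONLY,      /* proposed in this thread */
	SKETCH_ITEM_PHY_PORTS_ONLY,   /* proposed in this thread */
	SKETCH_ITEM_REPRESENTED_PORT, /* stands in for the exact-port item */
};

struct sketch_item {
	enum sketch_item_type type;
	uint16_t port_id; /* meaningful for REPRESENTED_PORT only */
};

/* Where a packet entered the embedded switch (assumed model). */
struct sketch_origin {
	uint16_t port_id;
	bool is_wire;
};

/* Every item must hold: the scope item filters first, the port item narrows. */
bool
sketch_pattern_matches(const struct sketch_item *pattern,
		       const struct sketch_origin *origin)
{
	assert(pattern != NULL && origin != NULL);

	for (; pattern->type != SKETCH_ITEM_END; pattern++) {
		switch (pattern->type) {
		case SKETCH_ITEM_VPORTS_ONLY:
			if (origin->is_wire)
				return false;
			break;
		case SKETCH_ITEM_PHY_PORTS_ONLY:
			if (!origin->is_wire)
				return false;
			break;
		case SKETCH_ITEM_REPRESENTED_PORT:
			if (origin->port_id != pattern->port_id)
				return false;
			break;
		default:
			return false; /* unknown item: reject */
		}
	}
	return true;
}
```

With the pattern VPORTS_ONLY / REPRESENTED_PORT(2), traffic from vport 2
passes both checks, while traffic from vport 3 or from the wire port fails
one of them.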

>
> So from my viewpoint, the patch is Acked.
> If you wish as I suggested before, we can have a meeting with
> Rongwei and anyone else who is interested and close this subject.
>
>> -----Original Message-----
>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>> Sent: Tuesday, 20 September 2022 18:28
> RE: [PATCH v1] ethdev: add direction info when creating the transfer
>> table
>>
>> Hi Ori,
>>
>> On Tue, 20 Sep 2022, Ori Kam wrote:
>>
>>> Hi Ivan,
>>>
>>>> -----Original Message-----
>>>> From: Ivan Malov <ivan.malov@oktetlabs.ru>
>>>> Sent: Tuesday, 20 September 2022 15:46
>>>>
>>>> Hi Ori,
>>>>
>>>> On Tue, 20 Sep 2022, Ori Kam wrote:
>>>>
>>>>> Hi Ivan, Thomas and Rongwei
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Thomas Monjalon <thomas@monjalon.net>
>>>>>> Sent: Thursday, 15 September 2022 14:16
>>>>>>
>>>>>> 15/09/2022 12:59, Ivan Malov:
>>>>>>> Hi Rongwei,
>>>>>>>
>>>>>>> In this reply, I do not include the previous mail because the amount
>>>>>>> of inline commentary has gone haywire over the past couple of days.
>>>>>>> Let's re-iterate.
>>>>>>>
>>>>>>> But before I get to that, I'd like to offer a fresh perspective:
>>>>>>>
>>>>>>> Perhaps, if we all agree that term "vport" means an endpoint which
>>>>>>> can stand for any "port" except for physical one, then it should
>>>>>>> be possible to use term ANY_VPORTS rather than
>> ANY_GUEST_PORTS.
>>>>>>
>>>>>> The opposite of "physical" is "virtual" indeed.
>>>>>>
>>>>>>> But that's tricky, of course. I don't have a way with naming,
>>>>>>> so more opinions are welcome and very-very desirable here.
>>>>>>>
>>>>>>> So:
>>>>>>>
>>>>>>> 1) Do you agree that, in your proposal, the new "wire_orig" / "vf_orig"
>>>>>>>     primitives are in fact yet another match criteria?
>>>>>>>
>>>>>>>     ..
>>>>>>>
>>>>>>>     To me, it looks so. If they are match criteria, then they belong
>>>>>>>     in match pattern, that is, they should be expressed as new items.
>>>>>>>
>>>>>>>     For "transfer" rules, the *existing* attributes are: "group"
>>>>>>>     and "priority". As you may note, these are clearly not match
>>>>>>>     criteria. They control the look-up order. So, to this day,
>>>>>>>     there're no match criteria in DPDK expressed as attributes.
>>>>>>>
>>>>>>>     If these "wire_orig" / "vf_orig" are going to be introduced
>>>>>>>     as attributes, that should be backed with strong motivation.
>>>>>>
>>>>>> I prefer we keep matching in a single place, not in attributes.
>>>>>>
>>>>>
>>>>> I think we are talking about two different features.
>>>>> Feature 1:
>>>>> Allow matching on all vports that are not wire
>>>>> Feature 2:
>>>>> Save allocation space and allow fast insertion.
>>>>> In this case, the matching is not on all vports it can be just part of the
>> vports
>>>>> but it will never be the wire port.
>>>>> For example:
>>>>> port 0 - wire
>>>>> ports 1,2,3,4,5  - vports
>>>>> the application wants to insert only those rules:
>>>>> represented_port(port_id=2) / eth / ipv4 (src==xx)
>>>>> represented_port(port_id=4) / eth / ipv4 (src==xx)
>>>>> represented_port(port_id=4) / eth / ipv4 (src==yy)
>>>>>
>>>>> For feature 1 I fully agree with you Ivan, this should be added as an item.
>>>>
>>>> Thank you.
>>>>
>>>>> For feature 2 I think Rongwei's suggestion is the better option.
>>>>> If I understand correctly the idea is to give hint to the PMD on where to
>>>> allocate memory
>>>>> and how to insert the rules most optimally. Since this is shared for all
>> rules it
>>>> makes more sense
>>>>> to add it as an attribute, just like we don’t have an ingress item (maybe
>> we
>>>> should?)
>>>>
>>>> But isn't pattern template also supposed to be shared for all rules
>>>> in the table? I.e., the user creates an async flow table and submits
>>>> a flow "shape" (which consists of attrs, pattern template and action
>>>> template). So why should "giving a hint" via an item template be
>>>> considered worse than doing so via an attribute?
>>>>
>>>
>>> The same item template maybe used elsewhere, for example, the
>> following
>>> pattern  eth / ipv4(src, dst) / udp(sport, dport), can be used on number of
>> different
>>> tables.
>>
>> In my understanding, the user may want to create flow table A
>> and use pattern template A' for it, which is as follows:
>>
>> any_vports / eth / ipv4 / udp
>>
>> The PMD can see this item and treat it exactly the same
>> way as it could treat such attribute ("where to allocate
>> resources, etc.").
>>
>> Then the user may want to create flow table B and
>> use pattern template B' for it:
>>
>> any_phy_ports / eth / ipv4 / udp
>>
>> Once again, the PMD can clearly see the difference between
>> the A' and B' templates and, this time, allocate resources
>> the other way (as per efficiency requirements).
>>
>
> Yes, but again you select all vports, this is not what the application wants
> the application wants to insert the following rules:
> Assuming port 0 is wire and DPDK ports 1,2,3,4,5 are vports.
> Represented_port(id=2) / eth / ipv4/ udp
> Represented_port(id=5) / eth / ipv4/ udp
>
> As you can see the application doesn’t want all ports just some vports but for sure not the
> wire port.
>
> I agree that we can go with your approach, but it isn't correct since why application should
> insert:
> any_phy_ports / Represented_port(id=2) / eth / ipv4/ udp
> any_phy_ports / Represented_port(id=2) / eth / ipv4/ udp

Thanks for the explanation. Now I get the idea, yes.
But the rules which you list and say they're incorrect
are in fact correct. Please see my thoughts above.

>
>> By saying "can be used on number of different tables", do you mean
>> that it is important to make the *network* part of the pattern
>> shareable between flow tables? I.e. are you saying that
>> templates A' and B' cause resource duplication just
>> because of the same *network* part in your case?
>>
>
> I'm saying that adding the item will mean that the application can't reuse the pattern template it created.
> If the application created a 5-tuple template,
> it can reuse it in ingress tables, egress tables, FDB tables; there is no need to create
> extra pattern templates.

I'd say, if the application has to create two separate templates which
only differ in the first item (VPORTS / PHYS_PORTS), then it should be
pretty much acceptable. Yes, theoretically, if something can be shared,
then why not indeed share it, but, on the other hand, sharing logic
can be error prone. Keeping templates with this kind of item in
them separate could potentially make code more robust, I think.
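
A rough C sketch of the "two templates that differ only in the first item"
idea follows. The names are made up for illustration (tmpl_* is not DPDK
API, and VPORTS_ONLY / PHY_PORTS_ONLY are only the item names floated in
this thread). The network part is kept in one place and each table gets
its own template by prepending a scope item.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical item identifiers, not rte_flow item types. */
enum tmpl_item {
	TMPL_END,
	TMPL_VPORTS_ONLY,
	TMPL_PHY_PORTS_ONLY,
	TMPL_ETH,
	TMPL_IPV4,
	TMPL_UDP,
};

/* The shared network part: eth / ipv4 / udp / end. */
static const enum tmpl_item tmpl_net_part[] = {
	TMPL_ETH, TMPL_IPV4, TMPL_UDP, TMPL_END,
};

/*
 * Write "scope / eth / ipv4 / udp / end" into out.
 * Returns the number of items written (including END),
 * or 0 if out cannot hold them.
 */
size_t
tmpl_build(enum tmpl_item scope, enum tmpl_item *out, size_t cap)
{
	const size_t n = sizeof(tmpl_net_part) / sizeof(tmpl_net_part[0]);

	assert(out != NULL);
	if (cap < n + 1)
		return 0;
	out[0] = scope;
	memcpy(&out[1], tmpl_net_part, sizeof(tmpl_net_part));
	return n + 1;
}
```

Calling tmpl_build() once with TMPL_VPORTS_ONLY and once with
TMPL_PHY_PORTS_ONLY yields two separate templates, which is the cost being
weighed here against reusing a single 5-tuple template across tables.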

No strong opinion here. Andrew? Thomas?

>
>>> I think that the main difference between us is that from my point of view
>> this value is just
>>> where to allocate resources / how to better insert the rule. It is not related
>> to matching.
>>
>> To me, it *is* the match criterion which, at the same time, serves
>> as a value indicating the way how resources should be allocated.
>> But before all, it is a match criterion.
>>
>
> Depends on how you define matching, but just like ingress / egress is not matching
> the same goes here.
>
>> If it refers to a group of ports = in order to ditch "the other half"
>> of traffic from consideration (like Rongwei explained), then it
>> looks like a match criterion.
>>
>
> See above comment, also this case it relates to allocation and insertion,
> the matching is side product.
>
>>> From Nvidia viewpoint we need this information so we can allocate the
>> resource at the correct
>>> place and avoid inserting duplication of rules.
>>
>> I see.
>>
>>> I agree that by using the item we can get the same results, but it is incorrect
>> since we are not matching on it.
>>
>> If one provides item UDP in the pattern and does not match on any UDP
>> fields, doing so nevertheless *is* matching on particular packet type.
>>
> Yes, it matches all UDP. Well, again, in this case the idea is not to match all vports
> but just to tell the PMD that there will be only vports arriving to this table.

Please see above. This time, I do understand the idea. But I do not
propose to say that "ALL" vports should match. I never suggested
to use word "ALL". Only "ANY". Now I see it can rather be "ONLY"
or something like that. I say that this item defines the scope
(broad match), and the following one, REPRESENTED_PORT, will
define an exact port to match that belongs in this scope.

>
>> The same seemingly goes for the new attribute / item. If it is
>> provided, then the user doesn't want the rule to affect
>> packets coming from certain ports (i.e. from wire).
>>
>> So still sounds like matching.
>>
>>> Part of the idea of template API is to give as many hints as possible to the
>> PMD so the insertion will be optimized.
>>
>> I see.
>>
>>>
>>>
>>>> As for "ingress" item, - no, one should not add such. We have had
>>>> many discussions concerning this bit in the past. Ingress/egress
>>>> are non-transfer terms. They belong in the scope of vNIC / ethdev
>>>> filtering, not to embedded switch rules.
>>>>
>>>> In my opinion, in the embedded switch, one should either point to
>>>> some precise switch ports (using REPRESENTOR / REPRESENTED items)
>>>> or use another kind of item to refer to a "super set" of ports
>>>> which have something in common ("all wire ports", "all NON-wire ports").
>>>>
>>>
>>> But this is my point we don't want all wire ports or all NON-wire ports, we
>> just know that in this table
>>> we will have only non-wire / wire ports.
>>
>> But how do these two viewpoints contradict each other?
>>
>
> Which viewpoints?
>
>>>
>>>>>
>>>>> Ivan we have the item RTE_FLOW_ITEM_TYPE_PF and
>>>> RTE_FLOW_ITEM_TYPE_VF which are deprecated,
>>>>> So do you want to un-deprecate them?
>>>>
>>>> No. These items are deprecated because:
>>>>
>>>> a) their names suggest that application knows whether an ethdev
>>>>     sits on top of a PF or that the application has some
>>>>     knowledge of existence of particular VFs, but in
>>>>     reality applications should not be worried of
>>>>     the underlying function type = to them, all
>>>>     ethdevs are just representors of something,
>>>>     and if the application needs to refer to
>>>>     VFs (or other PFs, - doesn't matter), it
>>>>     should do that via REPRESENTOR items;
>>>>
>>>> b) such items would duplicate REPRESENTOR / REPRESENTED.
>>>>
>>> Agree with everything you say.
>>
>> Great we're on the same page regarding this bit.
>>
>>>
>>>>>
>>>>> To summarize, if PMD can use such an hint during rule creation and save
>>>> memory, I vote
>>>>> to allow it.
>>>>> if the idea is to match on all vports then it should be an item.
>>>>
>>>> But such a hint would effectively be a match criterion, too, right?
>>>> So, in fact it's a combined use case: a match criterion which is
>>>> flexible enough to be a "hint" = i.e. the PMD can see it when
>>>> processing the pattern *template* and treat it as a hint.
>>>>
>>>
>>> Yes, but it is an implicit match, just like saying ingress. Egress it has meaning
>> above the
>>> matching. In addition, there is no reason to add extra item for each rule we
>> create, just
>>> to enable something that is fixed during the table creation.
>>> Extra item in pattern template means extra item for each rule.
>>> I know we can avoid this and optimize the code but why add something
>> that no one needs
>>> after table creation?
>>
>> Good question. But, in case some way exists to make such optimisation
>> laconic enough to avoid confusion etc., then it should be no problem
>> in preferring the pattern approach over attribute approach.
>>
>>>
>>>
>>>>>
>>>>>>
>>>>>>> 2) From your viewpoint, why items "ANY_PHYS_PORTS" and
>>>>>> "ANY_VPORTS"
>>>>>>>     won't do? Or, which problems do you think they may inflict?
>>>>>>>
>>>>>>>     ..
>>>>>>>
>>>>>>>     Previously, you explained why REPRESENTED_PORT would not
>>>>>>>     fit your needs. And I understand your point: to async API,
>>>>>>>     two pattern templates which both have item REPRESENTED_PORT
>>>>>>>     in them cannot be clearly distinguished and are in fact the
>>>>>>>     same set of criteria (provided that all other items are also
>>>>>>>     the same and have the same masks). Templates are, well,
>>>>>>>     templates (or shapes) of the rules to come later and
>>>>>>>     do not include exact "spec" for the "ethdev_id".
>>>>>>>     Got it.
>>>>>>>
>>>>>>>     But that's not going to be the case with items ANY_PHYS_PORTS
>> and
>>>>>>>     ANY_VPORTS, is it? In one async table template, the user submits
>>>>>>>     item ANY_PHYS_PORTS (instead of table attribute "wire_orig").
>>>>>>>     In another template, the user submits item ANY_VPORTS to
>>>>>>>     state that they want to match only traffic transmitted
>>>>>>>     software endpoints (DPDK ethdevs, guest VFs, etc.)
>>>>>>>     connected to the switch.
>>>>>>>
>>>>>>>     In this example, the PMD will clearly see that the two templates
>>>>>>>     differ. So it will be able to allocate separate resources, each
>>>>>>>     one "cutting one half of traffic" (as per your concept).
>>>>>>>
>>>>>>> 3) In your most recent response, you suggested that one might have
>>>>>>>     had the attributes occupied for some other purposes. To me,
>>>>>>>     they're not. Neither me nor my closest colleagues have
>>>>>>>     any plans on them. When I advocate using item approach
>>>>>>>     over the attribute approach, I do this to ensure
>>>>>>>     a) clarity of the API contract and b) robustness.
>>>>>
>>>>> If something is shared for all rules in the same table, it should be a table
>>>>> property.
>>>>
>>>> But the whole pattern *template* is also a table property, isn't it?
>>>>
>>>
>>> Like I said above the pattern template can be used in all domains that is
>> why
>>> there is a split between table and pattern, in addition to that each table may
>> have
>>> number of pattern templates.
>>
>> This is a valuable clarification. However, even if the attribute way
>> may seem OK after this explanation, then I still don't understand
>> why it is required to add this attribute to the generic "struct
>> rte_flow_attr" and not just to the *table* attr.
>>
>> Generic "struct rte_flow_attr" is used both for async and
>> sync (regular) approach. So why add something to generic
>> struct which is never going to make sense to sync flows?
>>
>
> I guess we can move it to the table attribute, but I think that even
> in standard rte_flow API this can save duplicate insertion.

Could you please expand on the standard (non-Async) rule insertion?
In which way can it save resources? Just an example, to get the idea.

Also, please note that, during our talk with Rongwei, I failed
to explain that, if this new attribute can indeed be used not
only for Async flows, but also for standard (sync) ones,
then testpmd diff should also extend testpmd parser for
regular flows, i.e. the user should be able to write

flow create 0 transfer new_attr_here pattern ... ...

(or validate).

I'm afraid the current code only allows specifying the attribute
in commands which work exclusively for async tables.

That does not seem right.

>
>>>
>>>>>
>>>>>>>
>>>>>>> 4) Also, in your response, you suggested that I might have
>>>>>>>     confused item mask and spec. That is not the case.
>>>>>>>     If we agree, that switch domain ID is unneeded in
>>>>>>>     the new items, then these items will have no
>>>>>>>     fields in them (like item PF had not had any
>>>>>>>     before it was deprecated).
>>>>>>>
>>>>>>>     No fields in new items => no field masks.
>>>>>>>     So what's the problem then?
>>>>>>>
>>>>>>> 5) With regard to our talk about identifying the relationship
>>>>>>>     between ethdevs and switch domains, you said that the user
>>>>>>>     could know the difference from the very beginning:
>>>>>>>     /sysfs/ .... /PF_BDF/sriov_num
>>>>>>>
>>>>>>>     That is true for the user who starts the application, but
>>>>>>>     this knowledge is hard to obtain from the application
>>>>>>>     perspective = it's hard to automate.
>>>>>>>
>>>>>>>     This is why ethdevs are able to advertise their domain IDs.
>>>>>>>     And, as I explained, looking at domain ID to understand
>>>>>>
>>>>>> namely rte_eth_dev_info.switch_info.domain_id
>>>>>>
>>>>>>>     port relationship is valid, whilst looking at proxy IDs
>>>>>>>     to achieve the same goal is not. Proxy port IDs only
>>>>>>>     serve the purpose of finding an entry point for
>>>>>>>     managing flows. That has slightly different
>>>>>>>     meaning, but this subtle difference is important.
>>>>>>
>>>>>> There is also a concept of sibling ports
>>>>>> to get all ports belonging to the same hardware.
>>>>>>
>>>>>>
>>>>>>> 6) As for the confusion over the difference between fixing
>>>>>>>     bugs and making the code robust by extra checks:
>>>>>>>
>>>>>>>     Yes, I agree that the programmer who writes the
>>>>>>>     application must be intelligent enough to use
>>>>>>>     flow primitives the proper way. Yes, the user
>>>>>>>     who starts the application also should tread
>>>>>>>     carefully. But that does not prevent some
>>>>>>>     mistakes in other parts of code from
>>>>>>>     corrupting various chunks of memory,
>>>>>>>     including, for example, flow attrs.
>>>>>>>
>>>>>>>     You say that such mistakes have to be "just fixed"
>>>>>>>     as any other bugs. Right. But how much time will
>>>>>>>     the programmer spend to identify the bugs?
>>>>>>>
>>>>>>>     If the PMDs do all the checks (as with attributes),
>>>>>>>     the hypothetical bug will manifest itself much
>>>>>>>     earlier. That will simplify debugging by a lot...
>>>>>>>
>>>>>>>     So, my point is that it's still better to ensure
>>>>>>>     that new flow primitives have all necessary
>>>>>>>     checks in place. For attributes, it is
>>>>>>>     required to add them separately.
>>>>>>
>>>>>> If flow insertion is done in a fast path,
>>>>>> such checks may be skipped.
>>>>>
>>>>> The idea is that all rules in this table will share the same configuration,
>>>>> there is no reason to say everything again for each rule. This is why
>>>>> the rule attributes were moved to the table struct and not per rule.
>>>>>
>>>>>>
>>>>>>>     For items, as I explained, it might not be necessary
>>>>>>>     in the majority of cases simply because of the
>>>>>>>     switch (item->type) { case } structure.
>>>>>>>
>>>>>>> So, these are some of my points to explain why the
>>>>>>> attribute approach is untenable. To me, attributes
>>>>>>> are something global, which demands checks in all
>>>>>>> flow-capable PMDs. Items seem better because they
>>>>>>> are don't cares to all PMDs which are unaware of
>>>>>>> the async concept. So, even if someone does not
>>>>>>> implement the async concept or does not like
>>>>>>> the new item names, they can turn a blind
>>>>>>> eye to this - with attributes, they can't.
>>>>>>>
>>>>>
>>>>> Good point,
>>>>> Maybe we should add hints in the attribute,
>>>>> for example, hint_only_wire in this case it will be clear that
>>>>> PMD may ignore this, and it should be fully documented that this is not a
>>>> mandatory field.
>>>>> What do you think?
>>>>
>>>> Theoretically, making terminology softer (like with the word "hint")
>>>> could make things easier for vendors who may find the new feature
>>>> confusing or something like that. But if, in reality, this hint
>>>> is indeed another match criterion (see my comments above), then
>>>> in no event shall the prefix "hint" be an excuse for this
>>>> criterion not being expressed as a pattern item.
>>>>
>>>
>>> Please see my response above. This is the point it is much more than
>> matching.
>>>
>>>> Please hear me out: I don't mean to sound arrogant, - just trying
>>>> to understand why expressing the new bit as an item can't be
>>>> efficient enough for the async flow approach.
>>>>
>>>
>>> I don't think you are arrogant, and I hope that you see that I do understand
>> your comments.
>>> saying that I hope I explained why I think it is better to have it as a table
>> attribute and not as an
>>> item. (We are not matching on it, this helps the PMD allocate the table at
>> the best location and avoid
>>> duplication of rules)
>>>
>>> If you wish, we can have a short phone call and discuss this.
>>>
>>> Best,
>>> Ori
>>>
>>>
>>>>>
>>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> Ivan
>>>
>>
>> Thanks,
>> Ivan
>

Ivan
  
Thomas Monjalon Sept. 21, 2022, 9:40 a.m. UTC | #24
21/09/2022 11:04, Ivan Malov:
> Now it's clear to me that your intention is to match on exact ports,
> as usual, but this time with a hint for the flow table. Got it.
> 
> In your response, you say that matching on ALL vports is not what
> the use case needs. OK, I understood. But please note that the
> item name does not say "ALL", it says "ANY".
> 
> OK. Say, "ANY" is also confusing. Let's then name it "VPORTS_ONLY"
> and "PHY_PORTS_ONLY". This way, if user provides item VPORTS_ONLY
> and then  provides item REPRESENTED_PORT, these two items do not
> contradict each other. Item VPORTS_ONLY defines the scope of some
> kind, then the following item, REPRESENTED_PORT, makes it narrower.
> 
> And, in documentation, one can say clearly that the user *may*
> omit item VPORTS_ONLY in the exact rule pattern provided that
> they have already submitted this item as part of the template.

I think the problem that Rongwei & Ori are trying to solve
is to allocate resources for the templates table in the right place.
A table can have multiple templates.
If all rules/templates for this table are dedicated to virtual ports,
then the table will be allocated in a place managing only virtual ports.
This allocation decision must be taken at table creation,
whereas rules will be created later.
In order to do this specific table allocation for vports,
we need to restrict all templates of the table to be "vports only".

I hope it makes things clearer.
Now the question is how to achieve this? Solutions are:

1/ give a hint to the table allocation
2/ insert a pattern item in all templates of the table

I don't see any other solution. Please propose if there are more options.
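
For illustration, the two options can be put side by side in a small C
sketch. Everything named tbl_* is hypothetical and not part of rte_flow;
the only point is where the allocation decision gets its input at table
creation time: from a table-level hint (option 1) or from the scope item
carried by every template of the table (option 2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Where the table may be allocated (assumed model). */
enum tbl_dir {
	TBL_DIR_BOTH,        /* default: serves wire and vport traffic */
	TBL_DIR_WIRE_ONLY,
	TBL_DIR_VPORTS_ONLY,
};

/* Option 1: the hint is a table attribute. */
struct tbl_attr {
	uint32_t group;
	uint32_t nb_flows;
	enum tbl_dir hint;
};

/* Option 2: each pattern template carries a scope item. */
struct tbl_template {
	enum tbl_dir scope; /* TBL_DIR_BOTH means no scope item present */
};

enum tbl_dir
tbl_dir_from_hint(const struct tbl_attr *attr)
{
	assert(attr != NULL);
	return attr->hint;
}

/* Only a scope shared by ALL templates allows a specialised allocation. */
enum tbl_dir
tbl_dir_from_templates(const struct tbl_template *tmpl, size_t n)
{
	enum tbl_dir dir;
	size_t i;

	if (n == 0)
		return TBL_DIR_BOTH;
	assert(tmpl != NULL);
	dir = tmpl[0].scope;
	for (i = 1; i < n; i++) {
		if (tmpl[i].scope != dir)
			return TBL_DIR_BOTH;
	}
	return dir;
}
```

In both variants the decision is available when the table is created.
With option 2 it has to be derived from all templates and falls back to
the bidirectional default as soon as one of them disagrees.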
  
Andrew Rybchenko Sept. 21, 2022, 10:04 a.m. UTC | #25
On 9/21/22 12:40, Thomas Monjalon wrote:
> 21/09/2022 11:04, Ivan Malov:
>> Now it's clear to me that your intention is to match on exact ports,
>> as usual, but this time with a hint for the flow table. Got it.
>>
>> In your response, you say that matching on ALL vports is not what
>> the use case needs. OK, I understood. But please note that the
>> item name does not say "ALL", it says "ANY".
>>
>> OK. Say, "ANY" is also confusing. Let's then name it "VPORTS_ONLY"
>> and "PHY_PORTS_ONLY". This way, if user provides item VPORTS_ONLY
>> and then  provides item REPRESENTED_PORT, these two items do not
>> contradict each other. Item VPORTS_ONLY defines the scope of some
>> kind, then the following item, REPRESENTED_PORT, makes it narrower.
>>
>> And, in documentation, one can say clearly that the user *may*
>> omit item VPORTS_ONLY in the exact rule pattern provided that
>> they have already submitted this item as part of the template.
> 
> I think the problem that Rongwei & Ori are trying to solve
> is to allocate resources for the templates table in the right place.
> A table can have multiple templates.
> If all rules/templates for this table are dedicated to virtual ports,
> then the table will be allocated in a place managing only virtual ports.
> This allocation decision must be taken at table creation,
> whereas rules will be created later.
> In order to do this specific table allocation for vports,
> we need to restrict all templates of the table to be "vports only".
> 
> I hope it makes things clearer.
> Now the question is how to achieve this? Solutions are:
> 
> 1/ give a hint to the table allocation
> 2/ insert a pattern item in all templates of the table
> 
> I don't see any other solution. Please propose if there are more options.
> 
> 

See my mail

3/ use jump rule which ensures that all traffic meets our
    expectations

It means that the table creation could be postponed. Or the
table could be pre-configured at the point of creation and
finalized when we know that all traffic will be from wires
or from vports. Yes, it complicates internals to achieve
the optimization.
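
One possible reading of "pre-configured, then finalized" is sketched below
in C. It is a guess at the bookkeeping only, with hypothetical names
(defer_* is not DPDK API): the table records the origin of each jump rule
that targets it and picks its final placement once finalization is
requested. When "we know" actually happens is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Origin of the traffic that jump rules send to the table (assumed model). */
enum defer_origin {
	DEFER_ORIGIN_UNKNOWN, /* no jump rule seen yet */
	DEFER_ORIGIN_WIRE,
	DEFER_ORIGIN_VPORT,
	DEFER_ORIGIN_MIXED,   /* both directions: no specialisation possible */
};

struct defer_table {
	enum defer_origin seen;
	bool finalized;
};

void
defer_table_init(struct defer_table *t)
{
	assert(t != NULL);
	t->seen = DEFER_ORIGIN_UNKNOWN;
	t->finalized = false;
}

/* Record a jump rule into this table; ignored once the table is finalized. */
void
defer_table_note_jump(struct defer_table *t, enum defer_origin from)
{
	assert(t != NULL);
	if (t->finalized)
		return;
	if (t->seen == DEFER_ORIGIN_UNKNOWN)
		t->seen = from;
	else if (t->seen != from)
		t->seen = DEFER_ORIGIN_MIXED;
}

/* Freeze the placement; with no information, fall back to the mixed case. */
enum defer_origin
defer_table_finalize(struct defer_table *t)
{
	assert(t != NULL);
	t->finalized = true;
	if (t->seen == DEFER_ORIGIN_UNKNOWN)
		t->seen = DEFER_ORIGIN_MIXED;
	return t->seen;
}
```

The extra state (seen, finalized) is the "complicates internals" part: the
final placement is only as good as the jump rules observed before
defer_table_finalize() is called.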
  
Ori Kam Sept. 21, 2022, 12:41 p.m. UTC | #26
Hi All,
To avoid multiple threads, I will only answer in this thread, since I assume
everyone is clear about the issue.

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> 
> On 9/21/22 12:40, Thomas Monjalon wrote:
> > 21/09/2022 11:04, Ivan Malov:
> >> Now it's clear to me that your intention is to match on exact ports,
> >> as usual, but this time with a hint for the flow table. Got it.
> >>
> >> In your response, you say that matching on ALL vports is not what
> >> the use case needs. OK, I understood. But please note that the
> >> item name does not say "ALL", it says "ANY".
> >>
> >> OK. Say, "ANY" is also confusing. Let's then name it "VPORTS_ONLY"
> >> and "PHY_PORTS_ONLY". This way, if user provides item VPORTS_ONLY
> >> and then  provides item REPRESENTED_PORT, these two items do not
> >> contradict each other. Item VPORTS_ONLY defines the scope of some
> >> kind, then the following item, REPRESENTED_PORT, makes it narrower.
> >>
> >> And, in documentation, one can say clearly that the user *may*
> >> omit item VPORTS_ONLY in the exact rule pattern provided that
> >> they have already submitted this item as part of the template.
> >
> > I think the problem that Rongwei & Ori are trying to solve
> > is to allocate resources for the templates table in the right place.
> > A table can have multiple templates.
> > If all rules/templates for this table are dedicated to virtual ports,
> > then the table will be allocated in a place managing only virtual ports.
> > This allocation decision must be taken at table creation,
> > whereas rules will be created later.
> > In order to do this specific table allocation for vports,
> > we need to restrict all templates of the table to be "vports only".
> >
> > I hope it makes things clearer.
> > Now the question is how to achieve this? Solutions are:
> >
> > 1/ give a hint to the table allocation
> > 2/ insert a pattern item in all templates of the table
> >
> > I don't see any other solution. Please propose if there are more options.
> >
> >
> 
> See my mail
> 
> 3/ use jump rule which ensures that all traffic meets out
>     expectations
> 
> It means that the table creation could be postponed. Or the
> table could be per-configured at the point of creation and
> finalized when we know that all traffic will be from wires
> or from vports. Yes, it complicates internals to achieve
> the optimization.

Sorry Andrew, your suggestion is not a valid one, for the following reasons:
1. Table creation can't be postponed; this is a key idea of the rte_flow template API.
2. We can never know what rules will be inserted if the application doesn't tell us.
     How can we know this is the last rule? What do we do with the first rule?
3. I don't see how jumping helps, since it worsens the issue: when you jump to a table,
    how does the PMD know if this table should have only wire or only vports?

I agree with Thomas, there are two valid options. I vote for the hint, since the whole
idea of the feature is to tell the PMD where this resource should be allocated.
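
A minimal sketch of the hint option (option 1/ above), only to illustrate how a table-level hint could steer allocation before any rule exists. All names here are hypothetical and are not DPDK API; the real patch adds a member to rte_flow_attr instead.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not the DPDK API. */
enum table_origin_hint {
	TABLE_ORIGIN_ANY = 0,	/* default: traffic from wire and from vports */
	TABLE_ORIGIN_WIRE,	/* application promises wire-origin traffic only */
	TABLE_ORIGIN_VPORT,	/* application promises vport-origin traffic only */
};

struct table_resources {
	bool wire_side;		/* resources backing wire-origin traffic */
	bool vport_side;	/* resources backing vport-origin traffic */
};

/* Decide at table creation, before any rule exists, which sides to back. */
static struct table_resources
table_plan_resources(enum table_origin_hint hint)
{
	struct table_resources res = { .wire_side = true, .vport_side = true };

	if (hint == TABLE_ORIGIN_WIRE)
		res.vport_side = false;
	else if (hint == TABLE_ORIGIN_VPORT)
		res.wire_side = false;
	return res;
}
```

With the default value both sides are allocated, so an application that never sets the hint sees no behavior change.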

Best,
Ori
  
Morten Brørup Sept. 21, 2022, 12:51 p.m. UTC | #27
> From: Ori Kam [mailto:orika@nvidia.com]
> Sent: Wednesday, 21 September 2022 14.41
> 
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >
> > On 9/21/22 12:40, Thomas Monjalon wrote:
> > > 21/09/2022 11:04, Ivan Malov:
> > >> Now it's clear to me that your intention is to match on exact
> ports,
> > >> as usual, but this time with a hint for the flow table. Got it.
> > >>
> > >> In your response, you say that matching on ALL vports is not what
> > >> the use case needs. OK, I understood. But please note that the
> > >> item name does not say "ALL", it says "ANY".
> > >>
> > >> OK. Say, "ANY" is also confusing. Let's then name it "VPORTS_ONLY"
> > >> and "PHY_PORTS_ONLY". This way, if user provides item VPORTS_ONLY
> > >> and then  provides item REPRESENTED_PORT, these two items do not
> > >> contradict each other. Item VPORTS_ONLY defines the scope of some
> > >> kind, then the following item, REPRESENTED_PORT, makes it
> narrower.
> > >>
> > >> And, in documentation, one can say clearly that the user *may*
> > >> omit item VPORTS_ONLY in the exact rule pattern provided that
> > >> they have already submitted this item as part of the template.
> > >
> > > I think the problem that Rongwei & Ori are trying to solve
> > > is to allocate resources for the templates table in the right
> place.
> > > A table can have multiple templates.
> > > If all rules/templates for this table are dedicated to virtual
> ports,
> > > then the table will be allocated in a place managing only virtual
> ports.
> > > This allocation decision must be taken at table creation,
> > > whereas rules will be created later.
> > > In order to do this specific table allocation for vports,
> > > we need to restrict all templates of the table to be "vports only".
> > >
> > > I hope it makes things clearer.
> > > Now the question is how to achieve this? Solutions are:
> > >
> > > 1/ give a hint to the table allocation
> > > 2/ insert a pattern item in all templates of the table
> > >
> > > I don't see any other solution. Please propose if there are more
> options.
> > >
> > >
> >
> > See my mail
> >
> > 3/ use jump rule which ensures that all traffic meets out
> >     expectations
> >
> > It means that the table creation could be postponed. Or the
> > table could be per-configured at the point of creation and
> > finalized when we know that all traffic will be from wires
> > or from vports. Yes, it complicates internals to achieve
> > the optimization.
> 
> Sorry Andrew your suggestion is not a valid one for the following
> reasons:
> 1. table creation can't be postponed this is a key idea of the rte_flow
> template API.
> 2. we can never know what rules will be inserted if the application
> doesn't tell us.
>      how can we know this is the last rule? What do we do with the
> first rule?
> 3. I don't see how jumping helps since it worsens the issue when you
> jump to a table,
>     how does the PMD know if this table should have only wire or only
> vports?
> 
> I agree with Thomas, there are two valid options, I vote for the hint
> since this is the
> feature idea to tell the PMD where this resource should be allocated.

This is an optimization; I agree with Ori that a hint is appropriate, like the MBUF_FAST_FREE hint on TX queues.

No need to add more complexity by requiring the driver to recognize that the pattern is present in all templates. (And perhaps also remove that pattern when applying the templates.)

> 
> Best,
> Ori
  
Andrew Rybchenko Sept. 22, 2022, 7:39 a.m. UTC | #28
On 9/21/22 15:51, Morten Brørup wrote:
>> From: Ori Kam [mailto:orika@nvidia.com]
>> Sent: Wednesday, 21 September 2022 14.41
>>
>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>
>>> On 9/21/22 12:40, Thomas Monjalon wrote:
>>>> 21/09/2022 11:04, Ivan Malov:
>>>>> Now it's clear to me that your intention is to match on exact
>> ports,
>>>>> as usual, but this time with a hint for the flow table. Got it.
>>>>>
>>>>> In your response, you say that matching on ALL vports is not what
>>>>> the use case needs. OK, I understood. But please note that the
>>>>> item name does not say "ALL", it says "ANY".
>>>>>
>>>>> OK. Say, "ANY" is also confusing. Let's then name it "VPORTS_ONLY"
>>>>> and "PHY_PORTS_ONLY". This way, if user provides item VPORTS_ONLY
>>>>> and then  provides item REPRESENTED_PORT, these two items do not
>>>>> contradict each other. Item VPORTS_ONLY defines the scope of some
>>>>> kind, then the following item, REPRESENTED_PORT, makes it
>> narrower.
>>>>>
>>>>> And, in documentation, one can say clearly that the user *may*
>>>>> omit item VPORTS_ONLY in the exact rule pattern provided that
>>>>> they have already submitted this item as part of the template.
>>>>
>>>> I think the problem that Rongwei & Ori are trying to solve
>>>> is to allocate resources for the templates table in the right
>> place.
>>>> A table can have multiple templates.
>>>> If all rules/templates for this table are dedicated to virtual
>> ports,
>>>> then the table will be allocated in a place managing only virtual
>> ports.
>>>> This allocation decision must be taken at table creation,
>>>> whereas rules will be created later.
>>>> In order to do this specific table allocation for vports,
>>>> we need to restrict all templates of the table to be "vports only".
>>>>
>>>> I hope it makes things clearer.
>>>> Now the question is how to achieve this? Solutions are:
>>>>
>>>> 1/ give a hint to the table allocation
>>>> 2/ insert a pattern item in all templates of the table
>>>>
>>>> I don't see any other solution. Please propose if there are more
>> options.
>>>>
>>>>
>>>
>>> See my mail
>>>
>>> 3/ use jump rule which ensures that all traffic meets out
>>>      expectations
>>>
>>> It means that the table creation could be postponed. Or the
>>> table could be per-configured at the point of creation and
>>> finalized when we know that all traffic will be from wires
>>> or from vports. Yes, it complicates internals to achieve
>>> the optimization.
>>
>> Sorry Andrew your suggestion is not a valid one for the following
>> reasons:
>> 1. table creation can't be postponed this is a key idea of the rte_flow
>> template API.

I guess nobody cares if it delays insertion on the first rule
only. Anyway, see below.

>> 2. we can never know what rules will be inserted if the application
>> doesn't tell us.
>>       how can we know this is the last rule? What do we do with the
>> first rule?
>> 3. I don't see how jumping helps since it worsens the issue when you
>> jump to a table,
>>      how does the PMD know if this table should have only wire or only
>> vports?

Jump rules say so. PMD can analyze these rules.
Maybe we just need an attribute saying that all jump rules
to the table are configured and further attempts to reconfigure
will be rejected?

>>
>> I agree with Thomas, there are two valid options, I vote for the hint
>> since this is the
>> feature idea to tell the PMD where this resource should be allocated.
> 
> This is an optimization; I agree with Ori that a hint is appropriate, like the MBUF_FAST_FREE hint on TX queues.
> 
> No need to add more complexity by requiring the driver to recognize that the pattern is present in all templates. (And perhaps also remove that pattern when applying the templates.)

What makes this part of the matching criteria so special
that it is allowed to have a dedicated hint attribute?

Maybe we can have a really generic solution where any
part of the matching criteria could provide such hints?
  
Ori Kam Sept. 22, 2022, 10:06 a.m. UTC | #29
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Thursday, 22 September 2022 10:39
> 
> On 9/21/22 15:51, Morten Brørup wrote:
> >> From: Ori Kam [mailto:orika@nvidia.com]
> >> Sent: Wednesday, 21 September 2022 14.41
> >>
> >>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>
> >>> On 9/21/22 12:40, Thomas Monjalon wrote:
> >>>> 21/09/2022 11:04, Ivan Malov:
> >>>>> Now it's clear to me that your intention is to match on exact
> >> ports,
> >>>>> as usual, but this time with a hint for the flow table. Got it.
> >>>>>
> >>>>> In your response, you say that matching on ALL vports is not what
> >>>>> the use case needs. OK, I understood. But please note that the
> >>>>> item name does not say "ALL", it says "ANY".
> >>>>>
> >>>>> OK. Say, "ANY" is also confusing. Let's then name it "VPORTS_ONLY"
> >>>>> and "PHY_PORTS_ONLY". This way, if user provides item
> VPORTS_ONLY
> >>>>> and then  provides item REPRESENTED_PORT, these two items do not
> >>>>> contradict each other. Item VPORTS_ONLY defines the scope of some
> >>>>> kind, then the following item, REPRESENTED_PORT, makes it
> >> narrower.
> >>>>>
> >>>>> And, in documentation, one can say clearly that the user *may*
> >>>>> omit item VPORTS_ONLY in the exact rule pattern provided that
> >>>>> they have already submitted this item as part of the template.
> >>>>
> >>>> I think the problem that Rongwei & Ori are trying to solve
> >>>> is to allocate resources for the templates table in the right
> >> place.
> >>>> A table can have multiple templates.
> >>>> If all rules/templates for this table are dedicated to virtual
> >> ports,
> >>>> then the table will be allocated in a place managing only virtual
> >> ports.
> >>>> This allocation decision must be taken at table creation,
> >>>> whereas rules will be created later.
> >>>> In order to do this specific table allocation for vports,
> >>>> we need to restrict all templates of the table to be "vports only".
> >>>>
> >>>> I hope it makes things clearer.
> >>>> Now the question is how to achieve this? Solutions are:
> >>>>
> >>>> 1/ give a hint to the table allocation
> >>>> 2/ insert a pattern item in all templates of the table
> >>>>
> >>>> I don't see any other solution. Please propose if there are more
> >> options.
> >>>>
> >>>>
> >>>
> >>> See my mail
> >>>
> >>> 3/ use jump rule which ensures that all traffic meets out
> >>>      expectations
> >>>
> >>> It means that the table creation could be postponed. Or the
> >>> table could be per-configured at the point of creation and
> >>> finalized when we know that all traffic will be from wires
> >>> or from vports. Yes, it complicates internals to achieve
> >>> the optimization.
> >>
> >> Sorry Andrew your suggestion is not a valid one for the following
> >> reasons:
> >> 1. table creation can't be postponed this is a key idea of the rte_flow
> >> template API.
> 
> I guess nobody cares if it delays insertion on the first rule
> only. Anyway, see below.
> 
> >> 2. we can never know what rules will be inserted if the application
> >> doesn't tell us.
> >>       how can we know this is the last rule? What do we do with the
> >> first rule?
> >> 3. I don't see how jumping helps since it worsens the issue when you
> >> jump to a table,
> >>      how does the PMD know if this table should have only wire or only
> >> vports?
> 
> Jump rules say so. PMD can analyze there rules.
> May be just need an attribute saying that all jump rules
> to the table are configured and further attempts to reconfigure
> will be rejected?
> 

The idea is that the PMD will not analyze rules. That is why we have the table
and template.
Sorry, I don't understand what attribute can be in a jump. The jump is just
to a table. It can't say anything about the destination table.
That is all this patch does: it adds an attribute to a table to say where this
table should be located.

> >>
> >> I agree with Thomas, there are two valid options, I vote for the hint
> >> since this is the
> >> feature idea to tell the PMD where this resource should be allocated.
> >
> > This is an optimization; I agree with Ori that a hint is appropriate, like the
> MBUF_FAST_FREE hint on TX queues.
> >
> > No need to add more complexity by requiring the driver to recognize that
> the pattern is present in all templates. (And perhaps also remove that
> pattern when applying the templates.)
> 
> What does the part of the matching criteria so special
> that it is allowed to have dedicated hint attribute?
> 
> May be we can have really generic solution when any
> part of the matching criteria could provide such hints?

That is the point I keep returning to: it is not matching!
It is about which HW resource the table should be allocated on.
Think about ingress/egress/transfer: why are they not in the pattern?
They say where rules should be offloaded; they are different domains.
We have the same elsewhere: for example, in action create we can state in which
domain the action should be created. If the application selects a number of domains,
it may mean that extra resources will be allocated.
  
Andrew Rybchenko Sept. 22, 2022, 10:31 a.m. UTC | #30
On 9/22/22 13:06, Ori Kam wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Thursday, 22 September 2022 10:39
>>
>> On 9/21/22 15:51, Morten Brørup wrote:
>>>> From: Ori Kam [mailto:orika@nvidia.com]
>>>> Sent: Wednesday, 21 September 2022 14.41
>>>>
>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>
>>>>> On 9/21/22 12:40, Thomas Monjalon wrote:
>>>>>> 21/09/2022 11:04, Ivan Malov:
>>>>>>> Now it's clear to me that your intention is to match on exact
>>>> ports,
>>>>>>> as usual, but this time with a hint for the flow table. Got it.
>>>>>>>
>>>>>>> In your response, you say that matching on ALL vports is not what
>>>>>>> the use case needs. OK, I understood. But please note that the
>>>>>>> item name does not say "ALL", it says "ANY".
>>>>>>>
>>>>>>> OK. Say, "ANY" is also confusing. Let's then name it "VPORTS_ONLY"
>>>>>>> and "PHY_PORTS_ONLY". This way, if user provides item
>> VPORTS_ONLY
>>>>>>> and then  provides item REPRESENTED_PORT, these two items do not
>>>>>>> contradict each other. Item VPORTS_ONLY defines the scope of some
>>>>>>> kind, then the following item, REPRESENTED_PORT, makes it
>>>> narrower.
>>>>>>>
>>>>>>> And, in documentation, one can say clearly that the user *may*
>>>>>>> omit item VPORTS_ONLY in the exact rule pattern provided that
>>>>>>> they have already submitted this item as part of the template.
>>>>>>
>>>>>> I think the problem that Rongwei & Ori are trying to solve
>>>>>> is to allocate resources for the templates table in the right
>>>> place.
>>>>>> A table can have multiple templates.
>>>>>> If all rules/templates for this table are dedicated to virtual
>>>> ports,
>>>>>> then the table will be allocated in a place managing only virtual
>>>> ports.
>>>>>> This allocation decision must be taken at table creation,
>>>>>> whereas rules will be created later.
>>>>>> In order to do this specific table allocation for vports,
>>>>>> we need to restrict all templates of the table to be "vports only".
>>>>>>
>>>>>> I hope it makes things clearer.
>>>>>> Now the question is how to achieve this? Solutions are:
>>>>>>
>>>>>> 1/ give a hint to the table allocation
>>>>>> 2/ insert a pattern item in all templates of the table
>>>>>>
>>>>>> I don't see any other solution. Please propose if there are more
>>>> options.
>>>>>>
>>>>>>
>>>>>
>>>>> See my mail
>>>>>
>>>>> 3/ use jump rule which ensures that all traffic meets out
>>>>>       expectations
>>>>>
>>>>> It means that the table creation could be postponed. Or the
>>>>> table could be per-configured at the point of creation and
>>>>> finalized when we know that all traffic will be from wires
>>>>> or from vports. Yes, it complicates internals to achieve
>>>>> the optimization.
>>>>
>>>> Sorry Andrew your suggestion is not a valid one for the following
>>>> reasons:
>>>> 1. table creation can't be postponed this is a key idea of the rte_flow
>>>> template API.
>>
>> I guess nobody cares if it delays insertion on the first rule
>> only. Anyway, see below.
>>
>>>> 2. we can never know what rules will be inserted if the application
>>>> doesn't tell us.
>>>>        how can we know this is the last rule? What do we do with the
>>>> first rule?
>>>> 3. I don't see how jumping helps since it worsens the issue when you
>>>> jump to a table,
>>>>       how does the PMD know if this table should have only wire or only
>>>> vports?
>>
>> Jump rules say so. PMD can analyze there rules.
>> May be just need an attribute saying that all jump rules
>> to the table are configured and further attempts to reconfigure
>> will be rejected?
>>
> 
> The idea is the PMD will not analyze rules. That is why we have the table
> and template.
> Sorry, I don't understand what attribute can be in jump? The jump is just
> to table. It can't say anything about the table destination table.
> This is all this patch adds the attribute to a table to say where this
> table should be located.
> 
>>>>
>>>> I agree with Thomas, there are two valid options, I vote for the hint
>>>> since this is the
>>>> feature idea to tell the PMD where this resource should be allocated.
>>>
>>> This is an optimization; I agree with Ori that a hint is appropriate, like the
>> MBUF_FAST_FREE hint on TX queues.
>>>
>>> No need to add more complexity by requiring the driver to recognize that
>> the pattern is present in all templates. (And perhaps also remove that
>> pattern when applying the templates.)
>>
>> What does the part of the matching criteria so special
>> that it is allowed to have dedicated hint attribute?
>>
>> May be we can have really generic solution when any
>> part of the matching criteria could provide such hints?
> 
> That is the point I keep returning to, it is not matching!
> This is on which HW resource the table should be allocated.

Sorry, but it is just your HW details that you have different
location/resources for rules which apply on packets coming
from wire and coming from host (vports).

> Think about ingress/egress/transfer why are they not in the  pattern?

We have no ingress/egress in transfer domain any more because
it is ambiguous.

Transfer itself is really a different domain, both logically and
from a privileges point of view. That's why it is important to
distinguish it.

Ingress and egress in the non-transfer case are natively bound
to the two main functions of the driver: transmit (egress rules)
and receive (ingress rules). In general, it is a matching
criterion as well, but because of its nature (explained
above) it is simply handy to distinguish it from the very
beginning.

> They are where rules should be offloaded, they are different domain.

We have just two domains: transfer and non-transfer.

> Like we have elsewhere for example in action create we can state on which
> domain the action should be created. If the application selects a number of domains
> it may mean that extra resources will be allocated.

Three more points:

1/ If it is just a hint, it is optional for PMD to
   support/handle it. It means that it MUST NOT impose any
   limitations on matching. If so, if you want a rule to
   be applied on packets coming from wire, you still MUST
   specify it in the pattern.
   So, it does not sound like a hint in your case.

2/ struct rte_flow_attr is used for all rules.
    How should a new attribute be interpreted in non-transfer
    rules? Similar to ingress/egress? Duplication?
    Or even harder (if it is NOT a hint): should it really
    enforce matching of packets coming from wire (i.e. not
    from a different vport)? Not sure that it is doable or even
    makes sense.
    We can say that the attribute may be used for transfer
    rules only. If so, it MUST be checked on ethdev level
    since it is a generic rule.

3/ struct rte_flow_attr is used for sync and async rules.
    As I understand you're using it for async rules only.
    Does it make sense for sync rules?
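
To make point 2/ concrete, a minimal sketch of what a generic, driver-independent check could look like if the attribute were declared valid for transfer rules only. The struct and field names are hypothetical stand-ins, not the real struct rte_flow_attr layout, and the function is not an existing ethdev call.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical attribute layout; field names are illustrative only. */
struct flow_attr_sketch {
	bool ingress;
	bool egress;
	bool transfer;
	unsigned int origin_hint;	/* 0 = no hint, non-zero = wire or vport origin */
};

/* Generic check done before calling the driver: hint requires transfer. */
static int
flow_attr_check_origin_hint(const struct flow_attr_sketch *attr)
{
	if (attr->origin_hint != 0 && !attr->transfer)
		return -EINVAL;
	return 0;
}
```

Doing this once in the generic layer would keep every PMD from having to repeat (or forget) the same validation.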
  
Ivan Malov Sept. 22, 2022, 12:43 p.m. UTC | #31
Hi Ori,

On Thu, 22 Sep 2022, Ori Kam wrote:

> Hi Andrew,
>
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Thursday, 22 September 2022 10:39
>>
>> On 9/21/22 15:51, Morten Brørup wrote:
>>>> From: Ori Kam [mailto:orika@nvidia.com]
>>>> Sent: Wednesday, 21 September 2022 14.41
>>>>
>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>
>>>>> On 9/21/22 12:40, Thomas Monjalon wrote:
>>>>>> 21/09/2022 11:04, Ivan Malov:
>>>>>>> Now it's clear to me that your intention is to match on exact
>>>> ports,
>>>>>>> as usual, but this time with a hint for the flow table. Got it.
>>>>>>>
>>>>>>> In your response, you say that matching on ALL vports is not what
>>>>>>> the use case needs. OK, I understood. But please note that the
>>>>>>> item name does not say "ALL", it says "ANY".
>>>>>>>
>>>>>>> OK. Say, "ANY" is also confusing. Let's then name it "VPORTS_ONLY"
>>>>>>> and "PHY_PORTS_ONLY". This way, if user provides item
>> VPORTS_ONLY
>>>>>>> and then  provides item REPRESENTED_PORT, these two items do not
>>>>>>> contradict each other. Item VPORTS_ONLY defines the scope of some
>>>>>>> kind, then the following item, REPRESENTED_PORT, makes it
>>>> narrower.
>>>>>>>
>>>>>>> And, in documentation, one can say clearly that the user *may*
>>>>>>> omit item VPORTS_ONLY in the exact rule pattern provided that
>>>>>>> they have already submitted this item as part of the template.
>>>>>>
>>>>>> I think the problem that Rongwei & Ori are trying to solve
>>>>>> is to allocate resources for the templates table in the right
>>>> place.
>>>>>> A table can have multiple templates.
>>>>>> If all rules/templates for this table are dedicated to virtual
>>>> ports,
>>>>>> then the table will be allocated in a place managing only virtual
>>>> ports.
>>>>>> This allocation decision must be taken at table creation,
>>>>>> whereas rules will be created later.
>>>>>> In order to do this specific table allocation for vports,
>>>>>> we need to restrict all templates of the table to be "vports only".
>>>>>>
>>>>>> I hope it makes things clearer.
>>>>>> Now the question is how to achieve this? Solutions are:
>>>>>>
>>>>>> 1/ give a hint to the table allocation
>>>>>> 2/ insert a pattern item in all templates of the table
>>>>>>
>>>>>> I don't see any other solution. Please propose if there are more
>>>> options.
>>>>>>
>>>>>>
>>>>>
>>>>> See my mail
>>>>>
>>>>> 3/ use jump rule which ensures that all traffic meets out
>>>>>      expectations
>>>>>
>>>>> It means that the table creation could be postponed. Or the
>>>>> table could be per-configured at the point of creation and
>>>>> finalized when we know that all traffic will be from wires
>>>>> or from vports. Yes, it complicates internals to achieve
>>>>> the optimization.
>>>>
>>>> Sorry Andrew your suggestion is not a valid one for the following
>>>> reasons:
>>>> 1. table creation can't be postponed this is a key idea of the rte_flow
>>>> template API.
>>
>> I guess nobody cares if it delays insertion on the first rule
>> only. Anyway, see below.
>>
>>>> 2. we can never know what rules will be inserted if the application
>>>> doesn't tell us.
>>>>       how can we know this is the last rule? What do we do with the
>>>> first rule?
>>>> 3. I don't see how jumping helps since it worsens the issue when you
>>>> jump to a table,
>>>>      how does the PMD know if this table should have only wire or only
>>>> vports?
>>
>> Jump rules say so. PMD can analyze there rules.
>> May be just need an attribute saying that all jump rules
>> to the table are configured and further attempts to reconfigure
>> will be rejected?
>>
>
> The idea is the PMD will not analyze rules. That is why we have the table
> and template.

PMDs will not analyze **rules**, yes. But that does not dismiss the
need to analyze **tables** and **templates** when they are created.
I.e. table/template creation is some sort of "cold"/"slow" path.
The PMD sees the item in the pattern and translates it to the
internal representation of the table. Just like it **would**
do in case of the attribute approach. But when the rules
are inserted (**hot** async path), the PMD should just
collect exact "spec" values from the pattern without
analyzing it, as per the previously learned template.

From the HW resource usage perspective (in your case),
why isn't such design good enough?
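
The cold-path/hot-path split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not PMD code: the item enum, the structs and both functions are hypothetical, VPORTS_ONLY is only the item name proposed in this thread, and the rule pattern is assumed to have the same item layout as the template.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item types; VPORTS_ONLY is the name proposed in this thread. */
enum item_type { ITEM_END = 0, ITEM_VPORTS_ONLY, ITEM_REPRESENTED_PORT };

struct item {
	enum item_type type;
	uint32_t spec;
};

struct table_sketch {
	bool vports_only;	/* learned once, on the cold path */
	bool has_port_item;	/* template contains a port item */
	size_t port_item_idx;	/* where the hot path finds the port spec */
};

/* Cold path: analyze the template once, at table creation. */
static void
table_learn_template(struct table_sketch *t, const struct item *tmpl)
{
	t->vports_only = false;
	t->has_port_item = false;
	t->port_item_idx = 0;
	for (size_t i = 0; tmpl[i].type != ITEM_END; i++) {
		if (tmpl[i].type == ITEM_VPORTS_ONLY) {
			t->vports_only = true;
		} else if (tmpl[i].type == ITEM_REPRESENTED_PORT) {
			t->has_port_item = true;
			t->port_item_idx = i;
		}
	}
}

/* Hot path: no pattern analysis, only fetch the spec at the learned index. */
static uint32_t
rule_port_spec(const struct table_sketch *t, const struct item *pattern)
{
	return t->has_port_item ? pattern[t->port_item_idx].spec : UINT32_MAX;
}
```

The scope flag is decided entirely in table_learn_template(), so rule insertion never has to look at item types again.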

> Sorry, I don't understand what attribute can be in jump? The jump is just
> to table. It can't say anything about the table destination table.
> This is all this patch adds the attribute to a table to say where this
> table should be located.
>
>>>>
>>>> I agree with Thomas, there are two valid options, I vote for the hint
>>>> since this is the
>>>> feature idea to tell the PMD where this resource should be allocated.
>>>
>>> This is an optimization; I agree with Ori that a hint is appropriate, like the
>> MBUF_FAST_FREE hint on TX queues.
>>>
>>> No need to add more complexity by requiring the driver to recognize that
>> the pattern is present in all templates. (And perhaps also remove that
>> pattern when applying the templates.)
>>
>> What does the part of the matching criteria so special
>> that it is allowed to have dedicated hint attribute?
>>
>> May be we can have really generic solution when any
>> part of the matching criteria could provide such hints?
>
> That is the point I keep returning to, it is not matching!

Let's face it: these attributes are in fact matching, which,
in the case of MLX5, is translated into resource properties.
I.e., to MLX5 (internally!), these attributes are indeed
not matching but separate resource allocation. Got it.

But what about other vendors? I guess, hardly can someone
say for sure that others' internals work the same way...

> This is on which HW resource the table should be allocated.
> Think about ingress/egress/transfer why are they not in the  pattern?

- ingress/egress only applies to non-transfer rules
   and serves to catch either incoming or outgoing
   traffic of the single "door" (ethdev)

   (furthermore, these attributes had been defined
    long before the transfer concept was added, so
    even if we NOW realise these attributes **could**
    have been expressed in the form of items, I'm
    afraid it's no use crying over spilt milk)

- transfer is not in pattern because it is not
   a match criterion; it is in fact the indication
   of which **match engine** to use: either the
   one of the embedded switch or the one of
   the vNIC / ethdev

> They are where rules should be offloaded, they are different domain.

It's OK to say that generic concept of "embedded switch level",
or "transfer domain", in the case of MLX5, is in turn split
into two different HW domains, - it's vendor-specific
internals, - but it's not OK to assume that the same
separation is also valid for other vendors.

> Like we have elsewhere for example in action create we can state on which
> domain the action should be created. If the application selects a number of domains
> it may mean that extra resources will be allocated.

Could you please expand on this / give an example?
Just for me to check whether my point of view
could be wrong based on the example or not.

Ivan
  
Ori Kam Sept. 22, 2022, 1 p.m. UTC | #32
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Thursday, 22 September 2022 13:31
> 
> On 9/22/22 13:06, Ori Kam wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Thursday, 22 September 2022 10:39
> >>
> >> On 9/21/22 15:51, Morten Brørup wrote:
> >>>> From: Ori Kam [mailto:orika@nvidia.com]
> >>>> Sent: Wednesday, 21 September 2022 14.41
> >>>>
> >>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>
> >>>>> On 9/21/22 12:40, Thomas Monjalon wrote:
> >>>>>> 21/09/2022 11:04, Ivan Malov:
> >>>>>>> Now it's clear to me that your intention is to match on exact
> >>>> ports,
> >>>>>>> as usual, but this time with a hint for the flow table. Got it.
> >>>>>>>
> >>>>>>> In your response, you say that matching on ALL vports is not what
> >>>>>>> the use case needs. OK, I understood. But please note that the
> >>>>>>> item name does not say "ALL", it says "ANY".
> >>>>>>>
> >>>>>>> OK. Say, "ANY" is also confusing. Let's then name it
> "VPORTS_ONLY"
> >>>>>>> and "PHY_PORTS_ONLY". This way, if user provides item
> >> VPORTS_ONLY
> >>>>>>> and then  provides item REPRESENTED_PORT, these two items do
> not
> >>>>>>> contradict each other. Item VPORTS_ONLY defines the scope of
> some
> >>>>>>> kind, then the following item, REPRESENTED_PORT, makes it
> >>>> narrower.
> >>>>>>>
> >>>>>>> And, in documentation, one can say clearly that the user *may*
> >>>>>>> omit item VPORTS_ONLY in the exact rule pattern provided that
> >>>>>>> they have already submitted this item as part of the template.
> >>>>>>
> >>>>>> I think the problem that Rongwei & Ori are trying to solve
> >>>>>> is to allocate resources for the templates table in the right
> >>>> place.
> >>>>>> A table can have multiple templates.
> >>>>>> If all rules/templates for this table are dedicated to virtual
> >>>> ports,
> >>>>>> then the table will be allocated in a place managing only virtual
> >>>> ports.
> >>>>>> This allocation decision must be taken at table creation,
> >>>>>> whereas rules will be created later.
> >>>>>> In order to do this specific table allocation for vports,
> >>>>>> we need to restrict all templates of the table to be "vports only".
> >>>>>>
> >>>>>> I hope it makes things clearer.
> >>>>>> Now the question is how to achieve this? Solutions are:
> >>>>>>
> >>>>>> 1/ give a hint to the table allocation
> >>>>>> 2/ insert a pattern item in all templates of the table
> >>>>>>
> >>>>>> I don't see any other solution. Please propose if there are more
> >>>> options.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> See my mail
> >>>>>
> >>>>> 3/ use jump rule which ensures that all traffic meets out
> >>>>>       expectations
> >>>>>
> >>>>> It means that the table creation could be postponed. Or the
> >>>>> table could be per-configured at the point of creation and
> >>>>> finalized when we know that all traffic will be from wires
> >>>>> or from vports. Yes, it complicates internals to achieve
> >>>>> the optimization.
> >>>>
> >>>> Sorry Andrew your suggestion is not a valid one for the following
> >>>> reasons:
> >>>> 1. table creation can't be postponed this is a key idea of the rte_flow
> >>>> template API.
> >>
> >> I guess nobody cares if it delays insertion on the first rule
> >> only. Anyway, see below.
> >>
> >>>> 2. we can never know what rules will be inserted if the application
> >>>> doesn't tell us.
> >>>>        how can we know this is the last rule? What do we do with the
> >>>> first rule?
> >>>> 3. I don't see how jumping helps since it worsens the issue when you
> >>>> jump to a table,
> >>>>       how does the PMD know if this table should have only wire or only
> >>>> vports?
> >>
> >> Jump rules say so. PMD can analyze there rules.
> >> May be just need an attribute saying that all jump rules
> >> to the table are configured and further attempts to reconfigure
> >> will be rejected?
> >>
> >
> > The idea is the PMD will not analyze rules. That is why we have the table
> > and template.
> > Sorry, I don't understand what attribute can be in jump? The jump is just
> > to table. It can't say anything about the table destination table.
> > This is all this patch adds the attribute to a table to say where this
> > table should be located.
> >
> >>>>
> >>>> I agree with Thomas, there are two valid options, I vote for the hint
> >>>> since this is the
> >>>> feature idea to tell the PMD where this resource should be allocated.
> >>>
> >>> This is an optimization; I agree with Ori that a hint is appropriate, like the
> >> MBUF_FAST_FREE hint on TX queues.
> >>>
> >>> No need to add more complexity by requiring the driver to recognize
> that
> >> the pattern is present in all templates. (And perhaps also remove that
> >> pattern when applying the templates.)
> >>
> >> What does the part of the matching criteria so special
> >> that it is allowed to have dedicated hint attribute?
> >>
> >> May be we can have really generic solution when any
> >> part of the matching criteria could provide such hints?
> >
> > That is the point I keep returning to, it is not matching!
> > This is on which HW resource the table should be allocated.
> 
> Sorry, but it is just your HW details that you have different
> location/resources for rules which apply on packets coming
> from wire and coming from host (vports).
> 


Right, other HW may have this issue too, and this patch can help
there, but currently it solves a problem in Nvidia HW.
The template API is all about giving hints; some of the hints can be
used only by some PMDs.
I promise to support any vendor that has some way to optimize its PMD,
maybe under a different name or in a different place. Not all PMDs
are equal; each one needs its own hints.


> > Think about ingress/egress/transfer why are they not in the  pattern?
> 
> We have no ingress/egress in transfer domain any more because
> it is ambiguous.
> 

Yes, that is why the names are wire and non-wire.

> Transfer itself is really a different domain. Logically and
> from privileges point of view. That's why it is important to
> distinguish it.
> 
> Ingress and egress in non-transfer case are natively bound
> to two main functions of the driver: transmit (egress rules)
> and receive (ingress rules). In general, it is a matching
> criteria as well, but because of its nature (explained
> above) it is simply handy to distinguish it from the very
> beginning.
> 

I agree with you, but this just shows that even if something can be treated
as matching, that is not necessarily the best way to look at it.

> > They are where rules should be offloaded, they are different domain.
> 
> We have just two domains: transfer and non-transfer.
> 
> > Like we have elsewhere for example in action create we can state on which
> > domain the action should be created. If the application selects a number of
> domains
> > it may mean that extra resources will be allocated.>
> 
> Two more points:
> 
> 1/ If it is just a hint, it is optional for PMD to
>    support/handle it. It means that it MUST NOT impose any
>    limitations on matching. If so, if you want a rule to
>    be applied on packets coming from wire, you still MUST
>    specify it in the pattern.
>    So, it does not sound like a hint in your case.

Right, it is optional. If the application doesn't give this hint,
the PMD will create tables for both wire and non-wire, just like
it does now.

> 
> 2/ struct rte_flow_attr is used for really all rules.
>     How a new attribute should be interpreted in non-transfer
>     rules? Similar to ingress/egress? Duplication?
>     Or even harder (if it is NOT a hint): should it really
>     enforce matching of packets coming from wire (i.e. not
>     a different vport)? Not sure that it is doable or even
>     make sense.
>     We can say that the attribute may be used for the transfer
>     rules only. If so, it MUST be checked on ethdev level
>     since it is a generic rule.
> 

From my point of view, it should be considered only in the transfer case;
I think the original commit message also states it this way.
Why should we validate it? We don't validate whether the application
sets transfer + ingress/egress, just ingress + egress, or none.


> 3/ struct rte_flow_attr is used for sync and async rules.
>     As I understand you're using it for async rules only.
>     Does it make sense for sync rules?

Yes, it can save on insertion: without this bit, even the sync API
duplicates the rule.
But if pressed, we can move it to the table attribute. Do you think that would be better?
  
Ori Kam Sept. 22, 2022, 2:46 p.m. UTC | #33
Hi Ivan,

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Sent: Thursday, 22 September 2022 15:43
> 
> Hi Ori,
> 
> On Thu, 22 Sep 2022, Ori Kam wrote:
> 
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Thursday, 22 September 2022 10:39
> >>
> >> On 9/21/22 15:51, Morten Brørup wrote:
> >>>> From: Ori Kam [mailto:orika@nvidia.com]
> >>>> Sent: Wednesday, 21 September 2022 14.41
> >>>>
> >>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>
> >>>>> On 9/21/22 12:40, Thomas Monjalon wrote:
> >>>>>> 21/09/2022 11:04, Ivan Malov:
> >>>>>>> Now it's clear to me that your intention is to match on exact
> >>>> ports,
> >>>>>>> as usual, but this time with a hint for the flow table. Got it.
> >>>>>>>
> >>>>>>> In your response, you say that matching on ALL vports is not what
> >>>>>>> the use case needs. OK, I understood. But please note that the
> >>>>>>> item name does not say "ALL", it says "ANY".
> >>>>>>>
> >>>>>>> OK. Say, "ANY" is also confusing. Let's then name it
> "VPORTS_ONLY"
> >>>>>>> and "PHY_PORTS_ONLY". This way, if user provides item
> >> VPORTS_ONLY
> >>>>>>> and then  provides item REPRESENTED_PORT, these two items do
> not
> >>>>>>> contradict each other. Item VPORTS_ONLY defines the scope of
> some
> >>>>>>> kind, then the following item, REPRESENTED_PORT, makes it
> >>>> narrower.
> >>>>>>>
> >>>>>>> And, in documentation, one can say clearly that the user *may*
> >>>>>>> omit item VPORTS_ONLY in the exact rule pattern provided that
> >>>>>>> they have already submitted this item as part of the template.
> >>>>>>
> >>>>>> I think the problem that Rongwei & Ori are trying to solve
> >>>>>> is to allocate resources for the templates table in the right
> >>>> place.
> >>>>>> A table can have multiple templates.
> >>>>>> If all rules/templates for this table are dedicated to virtual
> >>>> ports,
> >>>>>> then the table will be allocated in a place managing only virtual
> >>>> ports.
> >>>>>> This allocation decision must be taken at table creation,
> >>>>>> whereas rules will be created later.
> >>>>>> In order to do this specific table allocation for vports,
> >>>>>> we need to restrict all templates of the table to be "vports only".
> >>>>>>
> >>>>>> I hope it makes things clearer.
> >>>>>> Now the question is how to achieve this? Solutions are:
> >>>>>>
> >>>>>> 1/ give a hint to the table allocation
> >>>>>> 2/ insert a pattern item in all templates of the table
> >>>>>>
> >>>>>> I don't see any other solution. Please propose if there are more
> >>>> options.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> See my mail
> >>>>>
> >>>>> 3/ use jump rule which ensures that all traffic meets out
> >>>>>      expectations
> >>>>>
> >>>>> It means that the table creation could be postponed. Or the
> >>>>> table could be per-configured at the point of creation and
> >>>>> finalized when we know that all traffic will be from wires
> >>>>> or from vports. Yes, it complicates internals to achieve
> >>>>> the optimization.
> >>>>
> >>>> Sorry Andrew your suggestion is not a valid one for the following
> >>>> reasons:
> >>>> 1. table creation can't be postponed this is a key idea of the rte_flow
> >>>> template API.
> >>
> >> I guess nobody cares if it delays insertion on the first rule
> >> only. Anyway, see below.
> >>
> >>>> 2. we can never know what rules will be inserted if the application
> >>>> doesn't tell us.
> >>>>       how can we know this is the last rule? What do we do with the
> >>>> first rule?
> >>>> 3. I don't see how jumping helps since it worsens the issue when you
> >>>> jump to a table,
> >>>>      how does the PMD know if this table should have only wire or only
> >>>> vports?
> >>
> >> Jump rules say so. PMD can analyze there rules.
> >> May be just need an attribute saying that all jump rules
> >> to the table are configured and further attempts to reconfigure
> >> will be rejected?
> >>
> >
> > The idea is the PMD will not analyze rules. That is why we have the table
> > and template.
> 
> PMDs will not analyze **rules**, yes. But that does not dismiss the
> need to analyze **tables** and **templates** when they are created.
> I.e. table/template creation is some sort of "cold"/"slow" path.
> The PMD sees the item in the pattern and translates it to the
> internal representation of the table. Just like it **would**
> do in case of the attribute approach. But when the rules
> are inserted (**hot** async path), the PMD should just
> collect exact "spec" values from the pattern without
> analyzing it, as per the previously learned template.
> 

Right, so why should we force the application to give us, for each rule,
what we both agree is just a hint? The PMD will not use it,
so why give it?

> From the HW resource usage perspective (in your case),
> why isn't such design good enough?
> 

In my first reply I told you that both ways can work.
Speaking as a SW developer (not as an Nvidia guy): since this is an attribute
of the table and, just like you said, the only place we use it is table creation,
the correct place for it is in the table and not in the pattern.
Pure SW design.

> > Sorry, I don't understand what attribute can be in jump? The jump is just
> > to table. It can't say anything about the table destination table.
> > This is all this patch adds the attribute to a table to say where this
> > table should be located.
> >
> >>>>
> >>>> I agree with Thomas, there are two valid options, I vote for the hint
> >>>> since this is the
> >>>> feature idea to tell the PMD where this resource should be allocated.
> >>>
> >>> This is an optimization; I agree with Ori that a hint is appropriate, like the
> >> MBUF_FAST_FREE hint on TX queues.
> >>>
> >>> No need to add more complexity by requiring the driver to recognize
> that
> >> the pattern is present in all templates. (And perhaps also remove that
> >> pattern when applying the templates.)
> >>
> >> What does the part of the matching criteria so special
> >> that it is allowed to have dedicated hint attribute?
> >>
> >> May be we can have really generic solution when any
> >> part of the matching criteria could provide such hints?
> >
> > That is the point I keep returning to, it is not matching!
> 
> Let's face it: these attributes are in fact matching, which,
> in the case of MLX5, is translated into resource properties.
> I.e., to MLX5 (internally!), these attributes are indeed
> not matching but separate resource allocation. Got it.
> 

Happy to hear.

> But what about other vendors? I guess, hardly can someone
> say for sure that others' internals work the same way...
> 

Maybe some do; please see my answer to Andrew.
The idea is that we in DPDK want the best insertion rate for all vendors.
Any vendor that thinks it can get a perf boost from a hint given by
the application will get my support. This is the idea of the template API and
fast insertion.

> > This is on which HW resource the table should be allocated.
> > Think about ingress/egress/transfer why are they not in the  pattern?
> 
> - ingres/egress only applies to non-transfer rules
>    and serves to catch either incoming or outcoming
>    traffic of the single "door" (ethdev)
> 
>    (furthermore, these attributes had been defined
>     long before the transfer concept was added, so
>     even if we NOW realise these attributes **could**
>     have been expressed in the form of items, I'm
>     afraid it's no use crying over spilt milk)
> 
> - transfer is not in pattern because it is not
>    a match criterion; it is in fact the indication
>    of which **match engine** to use: either the
>    one of the embedded switch or the one of
>    the vNIC / ethdev
> 

This was just to make the point that there are cases where,
even if something could be expressed as an item, that may not be the best
way.
In any case, just like in my reply above: from a pure SW point of view,
I think it is more correct to have it as an attribute.

> > They are where rules should be offloaded, they are different domain.
> 
> It's OK to say that generic concept of "embedded switch level",
> or "transfer domain", in the case of MLX5, is in turn split
> into two different HW domains, - it's vendor-specific
> internals, - but it's not OK to assume that the same
> separation is also valid for other vendors.
> 

I never said it is true for all vendors; I guess for some, but for sure not all of them.
Just like I'm sure not all vendors will use the other hints.

> > Like we have elsewhere for example in action create we can state on which
> > domain the action should be created. If the application selects a number of
> domains
> > it may mean that extra resources will be allocated.
> 
> Could you please expand on this / give an example?
> Just for me to check whether my point of view
> could be wrong based on the example or not.
>

Let's look at rte_flow_action_handle_create(): its conf parameter carries
ingress/egress/transfer. The application may mark an action to be used only
in ingress, or in ingress + egress. If the application selects ingress + egress,
the insertion rate and PPS may be lower.
I hope this makes it clearer.
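
A minimal sketch of the point about action domains, assuming a simplified
stand-in for the action configuration. The struct and helper names below
are illustrative only, not the DPDK definitions; the sketch just counts
how many domains an action would have to be instantiated in, which is
where the extra resources come from.

```c
#include <stdint.h>

/* Hypothetical stand-in for the direction bits of an action conf. */
struct action_conf_sketch {
	uint32_t ingress:1;
	uint32_t egress:1;
	uint32_t transfer:1;
};

/*
 * Number of domains the action is requested for. Each selected domain
 * may cost the driver a separate copy of the action.
 */
static unsigned int
action_conf_domain_count(const struct action_conf_sketch *conf)
{
	return conf->ingress + conf->egress + conf->transfer;
}
```

An action marked ingress-only yields 1 here, while ingress + egress
yields 2, matching the "extra resources" remark above.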

Best,
Ori
 
> >
> >
> >
> >
> >
> 
> Ivan
  
Andrew Rybchenko Sept. 23, 2022, 7:25 a.m. UTC | #34
Hi Ori,

On 9/22/22 16:00, Ori Kam wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Thursday, 22 September 2022 13:31
>>
>> On 9/22/22 13:06, Ori Kam wrote:
>>> Hi Andrew,
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Thursday, 22 September 2022 10:39
>>>>
>>>> On 9/21/22 15:51, Morten Brørup wrote:
>>>>>> From: Ori Kam [mailto:orika@nvidia.com]
>>>>>> Sent: Wednesday, 21 September 2022 14.41
>>>>>>
>>>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>>>>
>>>>>>> On 9/21/22 12:40, Thomas Monjalon wrote:
>>>>>>>> 21/09/2022 11:04, Ivan Malov:
>>>>>>>>> Now it's clear to me that your intention is to match on exact
>>>>>> ports,
>>>>>>>>> as usual, but this time with a hint for the flow table. Got it.
>>>>>>>>>
>>>>>>>>> In your response, you say that matching on ALL vports is not what
>>>>>>>>> the use case needs. OK, I understood. But please note that the
>>>>>>>>> item name does not say "ALL", it says "ANY".
>>>>>>>>>
>>>>>>>>> OK. Say, "ANY" is also confusing. Let's then name it
>> "VPORTS_ONLY"
>>>>>>>>> and "PHY_PORTS_ONLY". This way, if user provides item
>>>> VPORTS_ONLY
>>>>>>>>> and then  provides item REPRESENTED_PORT, these two items do
>> not
>>>>>>>>> contradict each other. Item VPORTS_ONLY defines the scope of
>> some
>>>>>>>>> kind, then the following item, REPRESENTED_PORT, makes it
>>>>>> narrower.
>>>>>>>>>
>>>>>>>>> And, in documentation, one can say clearly that the user *may*
>>>>>>>>> omit item VPORTS_ONLY in the exact rule pattern provided that
>>>>>>>>> they have already submitted this item as part of the template.
>>>>>>>>
>>>>>>>> I think the problem that Rongwei & Ori are trying to solve
>>>>>>>> is to allocate resources for the templates table in the right
>>>>>> place.
>>>>>>>> A table can have multiple templates.
>>>>>>>> If all rules/templates for this table are dedicated to virtual
>>>>>> ports,
>>>>>>>> then the table will be allocated in a place managing only virtual
>>>>>> ports.
>>>>>>>> This allocation decision must be taken at table creation,
>>>>>>>> whereas rules will be created later.
>>>>>>>> In order to do this specific table allocation for vports,
>>>>>>>> we need to restrict all templates of the table to be "vports only".
>>>>>>>>
>>>>>>>> I hope it makes things clearer.
>>>>>>>> Now the question is how to achieve this? Solutions are:
>>>>>>>>
>>>>>>>> 1/ give a hint to the table allocation
>>>>>>>> 2/ insert a pattern item in all templates of the table
>>>>>>>>
>>>>>>>> I don't see any other solution. Please propose if there are more
>>>>>> options.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> See my mail
>>>>>>>
>>>>>>> 3/ use jump rule which ensures that all traffic meets out
>>>>>>>        expectations
>>>>>>>
>>>>>>> It means that the table creation could be postponed. Or the
>>>>>>> table could be per-configured at the point of creation and
>>>>>>> finalized when we know that all traffic will be from wires
>>>>>>> or from vports. Yes, it complicates internals to achieve
>>>>>>> the optimization.
>>>>>>
>>>>>> Sorry Andrew your suggestion is not a valid one for the following
>>>>>> reasons:
>>>>>> 1. table creation can't be postponed this is a key idea of the rte_flow
>>>>>> template API.
>>>>
>>>> I guess nobody cares if it delays insertion on the first rule
>>>> only. Anyway, see below.
>>>>
>>>>>> 2. we can never know what rules will be inserted if the application
>>>>>> doesn't tell us.
>>>>>>         how can we know this is the last rule? What do we do with the
>>>>>> first rule?
>>>>>> 3. I don't see how jumping helps since it worsens the issue when you
>>>>>> jump to a table,
>>>>>>        how does the PMD know if this table should have only wire or only
>>>>>> vports?
>>>>
>>>> Jump rules say so. PMD can analyze there rules.
>>>> May be just need an attribute saying that all jump rules
>>>> to the table are configured and further attempts to reconfigure
>>>> will be rejected?
>>>>
>>>
>>> The idea is the PMD will not analyze rules. That is why we have the table
>>> and template.
>>> Sorry, I don't understand what attribute can be in jump? The jump is just
>>> to table. It can't say anything about the table destination table.
>>> This is all this patch adds the attribute to a table to say where this
>>> table should be located.
>>>
>>>>>>
>>>>>> I agree with Thomas, there are two valid options, I vote for the hint
>>>>>> since this is the
>>>>>> feature idea to tell the PMD where this resource should be allocated.
>>>>>
>>>>> This is an optimization; I agree with Ori that a hint is appropriate, like the
>>>> MBUF_FAST_FREE hint on TX queues.
>>>>>
>>>>> No need to add more complexity by requiring the driver to recognize
>> that
>>>> the pattern is present in all templates. (And perhaps also remove that
>>>> pattern when applying the templates.)
>>>>
>>>> What does the part of the matching criteria so special
>>>> that it is allowed to have dedicated hint attribute?
>>>>
>>>> May be we can have really generic solution when any
>>>> part of the matching criteria could provide such hints?
>>>
>>> That is the point I keep returning to, it is not matching!
>>> This is on which HW resource the table should be allocated.
>>
>> Sorry, but it is just your HW details that you have different
>> location/resources for rules which apply on packets coming
>> from wire and coming from host (vports).
>>
> 
> 
> Right, maybe other HW may have this issue, and this patch
> can help them but currently, this patch solves something in Nvidia HW.
> Template API is all about giving hints, some of the hints can be used
> only buy some PMDs.
> I promise that any vendor that has some way to optimize its PMD
> I will support, may differently name or different place but not all PMD
> are equal, each one needs its hints.
> 
> 
>>> Think about ingress/egress/transfer why are they not in the  pattern?
>>
>> We have no ingress/egress in transfer domain any more because
>> it is ambiguous.
>>
> 
> Yes that is why the name is wire and non wire,

My question here is why the application really needs to know it.
Why does it make a difference?
IMHO, for a VM which uses some function, everything coming to it
is from the logical wire.
Of course, since it is the transfer layer, we are talking about an
application like OvS. Maybe OvS knows the difference...

> 
>> Transfer itself is really a different domain. Logically and
>> from privileges point of view. That's why it is important to
>> distinguish it.
>>
>> Ingress and egress in non-transfer case are natively bound
>> to two main functions of the driver: transmit (egress rules)
>> and receive (ingress rules). In general, it is a matching
>> criteria as well, but because of its nature (explained
>> above) it is simply handy to distinguish it from the very
>> beginning.
>>
> 
> I agree with you, but this is just to show that even if something can be treated
> as matching it is not the best way to look at it that way.
> 
>>> They are where rules should be offloaded, they are different domain.
>>
>> We have just two domains: transfer and non-transfer.
>>
>>> Like we have elsewhere for example in action create we can state on which
>>> domain the action should be created. If the application selects a number of
>> domains
>>> it may mean that extra resources will be allocated.>
>>
>> Two more points:
>>
>> 1/ If it is just a hint, it is optional for PMD to
>>     support/handle it. It means that it MUST NOT impose any
>>     limitations on matching. If so, if you want a rule to
>>     be applied on packets coming from wire, you still MUST
>>     specify it in the pattern.
>>     So, it does not sound like a hint in your case.
> 
> Right it is optional if the application doesn't give this hint
> the PMD will create just like it does now tables for both wire and
> non wire.

Let's make it clear here and in the attribute documentation.
You're talking about one side, but there is another one.
If it is just a hint which is optional to specify/interpret,
it does not impose any matching criteria. So, if a rule does
not specify a source, the rule must be applied to traffic
coming from both wire and non-wire. In order to limit
sources, we still need matching criteria anyway, i.e. a pattern
item. Moreover, if a PMD supports the hint and the matching
criteria contradict it, flow rule insertion must fail.

Could you please confirm that we share our understanding here?
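
To pin down the last point, here is a minimal sketch of the contradiction
check, assuming hypothetical enums for the table hint and for the source
named by the rule pattern (none of these names exist in DPDK). A rule
with no source in its pattern is never rejected by this check.

```c
#include <stdbool.h>

/* Hypothetical table-level hint (not a DPDK type). */
enum table_hint {
	TABLE_HINT_NONE,        /* default: both directions */
	TABLE_HINT_WIRE_ORIG,   /* table holds wire-origin rules only */
	TABLE_HINT_VPORT_ORIG,  /* table holds vport-origin rules only */
};

/* Hypothetical summary of the traffic source named by a rule's pattern. */
enum pattern_source {
	PATTERN_SRC_UNSPECIFIED,
	PATTERN_SRC_WIRE,
	PATTERN_SRC_VPORT,
};

/*
 * Return true when the pattern explicitly names a source that the table
 * hint excludes; in that case rule insertion is expected to fail.
 */
static bool
pattern_contradicts_hint(enum table_hint hint, enum pattern_source src)
{
	if (hint == TABLE_HINT_WIRE_ORIG)
		return src == PATTERN_SRC_VPORT;
	if (hint == TABLE_HINT_VPORT_ORIG)
		return src == PATTERN_SRC_WIRE;
	return false; /* no hint: nothing to contradict */
}
```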

> 
>>
>> 2/ struct rte_flow_attr is used for really all rules.
>>      How a new attribute should be interpreted in non-transfer
>>      rules? Similar to ingress/egress? Duplication?
>>      Or even harder (if it is NOT a hint): should it really
>>      enforce matching of packets coming from wire (i.e. not
>>      a different vport)? Not sure that it is doable or even
>>      make sense.
>>      We can say that the attribute may be used for the transfer
>>      rules only. If so, it MUST be checked on ethdev level
>>      since it is a generic rule.
>>
> 
>  From my point of view, it should be treated only in case of transfer,
> I think it also stated in the original commit this way.
> Why should we validate it?

Because it is a generic limitation. Otherwise each PMD must
check it. The check is required to avoid misuse.
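
A minimal sketch of such a generic check, assuming a simplified attribute
struct. The member name transfer_mode comes from the commit message; its
width, its values and the function name are assumptions made for
illustration only.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-in for the flow attributes discussed in this thread. */
struct flow_attr_sketch {
	uint32_t ingress:1;
	uint32_t egress:1;
	uint32_t transfer:1;
	/* Assumed encoding: 0 = bi-directional (default), 1 = wire, 2 = vport. */
	uint32_t transfer_mode:2;
};

/*
 * Generic validation done once, before the request reaches any PMD:
 * a direction hint is meaningful for transfer rules only.
 */
static int
flow_attr_sketch_check(const struct flow_attr_sketch *attr)
{
	if (attr->transfer_mode > 2)
		return -EINVAL; /* unknown mode value */
	if (attr->transfer_mode != 0 && !attr->transfer)
		return -EINVAL; /* hint given on a non-transfer rule */
	return 0;
}
```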

> We don't validate if the application
> set transfer + ingress/egress or just ingress+egress or non.

We're going to add corresponding checks on ethdev when we
finalize deprecation of ingress/egress in transfer rules.

> 
> 
>> 3/ struct rte_flow_attr is used for sync and async rules.
>>      As I understand you're using it for async rules only.
>>      Does it make sense for sync rules?
> 
> Yes, it can save insertion. Since even the sync API since it doesn't have this bit
> duplicate the rule.

Sorry, I don't understand the above.

> But if pressed we can move it to the table attribute, do you think it will be better?

If it is not a generic thing, IMHO it is better to put it in
the table attributes.

Andrew.
  
Ori Kam Sept. 23, 2022, 4:11 p.m. UTC | #35
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Friday, 23 September 2022 10:26
> 
> Hi Ori,
> 
> On 9/22/22 16:00, Ori Kam wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Thursday, 22 September 2022 13:31
> >>
> >> On 9/22/22 13:06, Ori Kam wrote:
> >>> Hi Andrew,
> >>>
> >>>> -----Original Message-----
> >>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> Sent: Thursday, 22 September 2022 10:39
> >>>>
> >>>> On 9/21/22 15:51, Morten Brørup wrote:
> >>>>>> From: Ori Kam [mailto:orika@nvidia.com]
> >>>>>> Sent: Wednesday, 21 September 2022 14.41
> >>>>>>
> >>>>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>>>>>
> >>>>>>> On 9/21/22 12:40, Thomas Monjalon wrote:
> >>>>>>>> 21/09/2022 11:04, Ivan Malov:
> >>>>>>>>> Now it's clear to me that your intention is to match on exact
> >>>>>> ports,
> >>>>>>>>> as usual, but this time with a hint for the flow table. Got it.
> >>>>>>>>>
> >>>>>>>>> In your response, you say that matching on ALL vports is not
> what
> >>>>>>>>> the use case needs. OK, I understood. But please note that the
> >>>>>>>>> item name does not say "ALL", it says "ANY".
> >>>>>>>>>
> >>>>>>>>> OK. Say, "ANY" is also confusing. Let's then name it
> >> "VPORTS_ONLY"
> >>>>>>>>> and "PHY_PORTS_ONLY". This way, if user provides item
> >>>> VPORTS_ONLY
> >>>>>>>>> and then  provides item REPRESENTED_PORT, these two items
> do
> >> not
> >>>>>>>>> contradict each other. Item VPORTS_ONLY defines the scope of
> >> some
> >>>>>>>>> kind, then the following item, REPRESENTED_PORT, makes it
> >>>>>> narrower.
> >>>>>>>>>
> >>>>>>>>> And, in documentation, one can say clearly that the user *may*
> >>>>>>>>> omit item VPORTS_ONLY in the exact rule pattern provided that
> >>>>>>>>> they have already submitted this item as part of the template.
> >>>>>>>>
> >>>>>>>> I think the problem that Rongwei & Ori are trying to solve
> >>>>>>>> is to allocate resources for the templates table in the right
> >>>>>> place.
> >>>>>>>> A table can have multiple templates.
> >>>>>>>> If all rules/templates for this table are dedicated to virtual
> >>>>>> ports,
> >>>>>>>> then the table will be allocated in a place managing only virtual
> >>>>>> ports.
> >>>>>>>> This allocation decision must be taken at table creation,
> >>>>>>>> whereas rules will be created later.
> >>>>>>>> In order to do this specific table allocation for vports,
> >>>>>>>> we need to restrict all templates of the table to be "vports only".
> >>>>>>>>
> >>>>>>>> I hope it makes things clearer.
> >>>>>>>> Now the question is how to achieve this? Solutions are:
> >>>>>>>>
> >>>>>>>> 1/ give a hint to the table allocation
> >>>>>>>> 2/ insert a pattern item in all templates of the table
> >>>>>>>>
> >>>>>>>> I don't see any other solution. Please propose if there are more
> >>>>>> options.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> See my mail
> >>>>>>>
> >>>>>>> 3/ use jump rule which ensures that all traffic meets out
> >>>>>>>        expectations
> >>>>>>>
> >>>>>>> It means that the table creation could be postponed. Or the
> >>>>>>> table could be per-configured at the point of creation and
> >>>>>>> finalized when we know that all traffic will be from wires
> >>>>>>> or from vports. Yes, it complicates internals to achieve
> >>>>>>> the optimization.
> >>>>>>
> >>>>>> Sorry Andrew your suggestion is not a valid one for the following
> >>>>>> reasons:
> >>>>>> 1. table creation can't be postponed this is a key idea of the rte_flow
> >>>>>> template API.
> >>>>
> >>>> I guess nobody cares if it delays insertion on the first rule
> >>>> only. Anyway, see below.
> >>>>
> >>>>>> 2. we can never know what rules will be inserted if the application
> >>>>>> doesn't tell us.
> >>>>>>         how can we know this is the last rule? What do we do with the
> >>>>>> first rule?
> >>>>>> 3. I don't see how jumping helps since it worsens the issue when you
> >>>>>> jump to a table,
> >>>>>>        how does the PMD know if this table should have only wire or
> only
> >>>>>> vports?
> >>>>
> >>>> Jump rules say so. PMD can analyze there rules.
> >>>> May be just need an attribute saying that all jump rules
> >>>> to the table are configured and further attempts to reconfigure
> >>>> will be rejected?
> >>>>
> >>>
> >>> The idea is the PMD will not analyze rules. That is why we have the table
> >>> and template.
> >>> Sorry, I don't understand what attribute can be in jump? The jump is just
> >>> to table. It can't say anything about the table destination table.
> >>> This is all this patch adds the attribute to a table to say where this
> >>> table should be located.
> >>>
> >>>>>>
> >>>>>> I agree with Thomas, there are two valid options, I vote for the hint
> >>>>>> since this is the
> >>>>>> feature idea to tell the PMD where this resource should be
> allocated.
> >>>>>
> >>>>> This is an optimization; I agree with Ori that a hint is appropriate, like
> the
> >>>> MBUF_FAST_FREE hint on TX queues.
> >>>>>
> >>>>> No need to add more complexity by requiring the driver to recognize
> >> that
> >>>> the pattern is present in all templates. (And perhaps also remove that
> >>>> pattern when applying the templates.)
> >>>>
> >>>> What does the part of the matching criteria so special
> >>>> that it is allowed to have dedicated hint attribute?
> >>>>
> >>>> May be we can have really generic solution when any
> >>>> part of the matching criteria could provide such hints?
> >>>
> >>> That is the point I keep returning to, it is not matching!
> >>> This is on which HW resource the table should be allocated.
> >>
> >> Sorry, but it is just your HW details that you have different
> >> location/resources for rules which apply on packets coming
> >> from wire and coming from host (vports).
> >>
> >
> >
> > Right, maybe other HW may have this issue, and this patch
> > can help them but currently, this patch solves something in Nvidia HW.
> > Template API is all about giving hints, some of the hints can be used
> > only buy some PMDs.
> > I promise that any vendor that has some way to optimize its PMD
> > I will support, may differently name or different place but not all PMD
> > are equal, each one needs its hints.
> >
> >
> >>> Think about ingress/egress/transfer why are they not in the  pattern?
> >>
> >> We have no ingress/egress in transfer domain any more because
> >> it is ambiguous.
> >>
> >
> > Yes that is why the name is wire and non wire,
> 
> My question here is why application really needs to know it.
> Why does it make the difference?
> IMHO for a VM which uses some function everything coming to it
> is from the logical wire.
> Of course since it is a transfer layer, we are talking about an
> application like OvS. May be OvS knows the difference...
> 

We are talking about the app that controls the switch,
for example OVS; we are not talking about an application that can
control only the NIC (ingress/egress).
In any case, the idea of the template API is to optimize applications
that have prior knowledge and can give hints to the PMD, so
they get better insertion rates and resource allocation.
If the application doesn't know, that is also OK, but it will not
get the full performance boost.

> >
> >> Transfer itself is really a different domain. Logically and
> >> from privileges point of view. That's why it is important to
> >> distinguish it.
> >>
> >> Ingress and egress in non-transfer case are natively bound
> >> to two main functions of the driver: transmit (egress rules)
> >> and receive (ingress rules). In general, it is a matching
> >> criteria as well, but because of its nature (explained
> >> above) it is simply handy to distinguish it from the very
> >> beginning.
> >>
> >
> > I agree with you, but this is just to show that even if something can be
> > treated as matching it is not the best way to look at it that way.
> >
> >>> They are where rules should be offloaded, they are different domain.
> >>
> >> We have just two domains: transfer and non-transfer.
> >>
> >>> Like we have elsewhere for example in action create we can state on which
> >>> domain the action should be created. If the application selects a number of
> >>> domains it may mean that extra resources will be allocated.
> >>
> >> Two more points:
> >>
> >> 1/ If it is just a hint, it is optional for PMD to
> >>     support/handle it. It means that it MUST NOT impose any
> >>     limitations on matching. If so, if you want a rule to
> >>     be applied on packets coming from wire, you still MUST
> >>     specify it in the pattern.
> >>     So, it does not sound like a hint in your case.
> >
> > Right it is optional if the application doesn't give this hint
> > the PMD will create just like it does now tables for both wire and
> > non wire.
> 
> Let's make it clear here and in the attribute documentation.
> You're talking about one side, but there is another one.
> If it is just a hint which is optional to specify/interpret,
> it does not impose any matching criteria. So, if a rule does
> not specify source, the rule must be applied on traffic
> coming from both wire and not-wire. In order to limit
> sources, we still need matching criteria anyway - i.e. pattern
> item. Moreover, if a PMD supports the hint and matching
> criteria contradicts it, flow rule insertion must fail.
> 
> Could you please confirm that we share our understanding here.
> 

I fully agree that it should be clearly stated that this is only a hint.
I'm not sure I agree with the second part, which says that a rule must be applied
on both sides if there is no source.
If the application gives this hint, then it is bound by it.
For example, it can match only on IPv4 in group X,
while in group X-1 it sets a rule that moves traffic from the wire to group X,
so the application knows that group X holds only traffic that arrived from
the wire, without matching on it.

Compare this with the relaxed_matching attribute, which states that
the PMD should match only on fields that have a non-zero mask
and not on the pattern order.

It is the same as when the application sets relaxed matching and matches on UDP dport = 100:
the PMD/HW will not verify that the packet is really a UDP packet and will just check the
dport field.

As with everything else, an application that gives a hint should know that the hint binds
it. It can't say "I give a hint" and then do the reverse of the given hint.
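
A rough sketch of the relaxed matching comparison, for illustration only. The
packet and match structs and the helper below are hypothetical and much simpler
than what any PMD does; the point is just that only fields with a non-zero mask
are compared, so a dport-only match never looks at the IP protocol.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified packet view. */
struct sketch_pkt {
	uint8_t ip_proto;  /* 17 = UDP, 6 = TCP */
	uint16_t l4_dport; /* destination port, host byte order */
};

/* Hypothetical match: a zero mask means "field not checked". */
struct sketch_match {
	uint8_t ip_proto_spec;
	uint8_t ip_proto_mask;
	uint16_t dport_spec;
	uint16_t dport_mask;
};

/* Compare only the bits selected by each mask. */
static bool
sketch_relaxed_match(const struct sketch_match *m, const struct sketch_pkt *p)
{
	if ((p->ip_proto & m->ip_proto_mask) !=
	    (m->ip_proto_spec & m->ip_proto_mask))
		return false;
	if ((p->l4_dport & m->dport_mask) != (m->dport_spec & m->dport_mask))
		return false;
	return true;
}
```

With dport_spec = 100, dport_mask = 0xffff and a zero ip_proto_mask, a TCP packet
with dport 100 matches too. Keeping non-UDP traffic away from such a rule is up
to the application, the same way keeping VF traffic out of a wire_orig table would be.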


> >
> >>
> >> 2/ struct rte_flow_attr is used for really all rules.
> >>      How a new attribute should be interpreted in non-transfer
> >>      rules? Similar to ingress/egress? Duplication?
> >>      Or even harder (if it is NOT a hint): should it really
> >>      enforce matching of packets coming from wire (i.e. not
> >>      a different vport)? Not sure that it is doable or even
> >>      make sense.
> >>      We can say that the attribute may be used for the transfer
> >>      rules only. If so, it MUST be checked on ethdev level
> >>      since it is a generic rule.
> >>
> >
> >  From my point of view, it should be treated only in case of transfer,
> > I think it also stated in the original commit this way.
> > Why should we validate it?
> 
> Because it is a generic limitation. Otherwise each PMD must
> check it. The check is required to avoid misusage.
> 
> > We don't validate if the application
> > set transfer + ingress/egress or just ingress+egress or none.
> 
> We're going to add corresponding checks on ethdev when we
> finalize deprecation of ingress/egress in transfer rules.
> 

I will nack this patch,
since it adds extra checks that will result in perf degradation.
If the application does something against the documentation
that doesn't cause a system crash, DPDK should not validate it.

Just like you don't validate anything in the data path.
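
For reference, the kind of generic check being discussed would look roughly like
the sketch below. This is only an illustration: the struct is a cut-down local
mirror of the attribute bits, the function name is hypothetical, and rejecting
the value 3 is my reading of the "N/A both set" comment in the patch, not
something the patch implements.

```c
#include <errno.h>
#include <stdint.h>

/* Cut-down local mirror of the relevant rte_flow_attr bits. */
struct sketch_attr_bits {
	uint32_t transfer:1;
	uint32_t transfer_mode:2;
};

/* Hypothetical ethdev-level validation of the direction hint. */
static int
sketch_validate_attr(const struct sketch_attr_bits *attr)
{
	/* The hint is meaningful only for transfer rules. */
	if (!attr->transfer && attr->transfer_mode != 0)
		return -EINVAL;
	/* Assumption: both origin bits set at once is not a valid value. */
	if (attr->transfer_mode == 3)
		return -EINVAL;
	return 0;
}
```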

> >
> >
> >> 3/ struct rte_flow_attr is used for sync and async rules.
> >>      As I understand you're using it for async rules only.
> >>      Does it make sense for sync rules?
> >
> > Yes, it can save insertion. Since even the sync API since it doesn't have this bit
> > duplicate the rule.
> 
> Sorry I don't understand above.
> 

Sorry, something broke in my sentence.
Even with the standard (sync) API this hint can help, since with it the PMD can insert
the rule on only one HW resource and doesn't need to duplicate it.
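
As a sketch of what "duplicate" means here, assuming a device that keeps separate
resources for wire-origin and vport-origin traffic. The resource flags and
helpers are hypothetical, and treating an unexpected mode value like the default
is my assumption.

```c
/* Hypothetical per-direction HW resources. */
enum {
	SKETCH_RES_WIRE = 1 << 0,
	SKETCH_RES_VPORT = 1 << 1,
};

/* Map the transfer_mode value from the patch to the resources a rule needs. */
static unsigned int
sketch_resources_for_mode(unsigned int transfer_mode)
{
	switch (transfer_mode) {
	case 1: /* origin uplink (wire) */
		return SKETCH_RES_WIRE;
	case 2: /* origin vport */
		return SKETCH_RES_VPORT;
	default: /* 0 = bidirectional; other values handled the same here */
		return SKETCH_RES_WIRE | SKETCH_RES_VPORT;
	}
}

/* Number of HW copies of a rule the PMD has to insert. */
static unsigned int
sketch_rule_copies(unsigned int transfer_mode)
{
	unsigned int res = sketch_resources_for_mode(transfer_mode);

	return ((res & SKETCH_RES_WIRE) != 0) + ((res & SKETCH_RES_VPORT) != 0);
}
```

Without the hint (mode 0) every rule costs two insertions; with wire_orig or
vf_orig it costs one.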

> > But if pressed we can move it to the table attribute, do you think it will be
> > better?
> 
> If it is not a generic thing, IMHO it is better to put in
> table attributes.
> 

So, just to make sure I understand correctly: you prefer that everything
non-generic be only in the template API?

I'm O.K with that.

> Andrew.

Thanks,
Ori
  

Patch

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 7f50028eb7..b25b595e82 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -177,6 +177,8 @@  enum index {
 	TABLE_INGRESS,
 	TABLE_EGRESS,
 	TABLE_TRANSFER,
+	TABLE_TRANSFER_WIRE_ORIG,
+	TABLE_TRANSFER_VF_ORIG,
 	TABLE_RULES_NUMBER,
 	TABLE_PATTERN_TEMPLATE,
 	TABLE_ACTIONS_TEMPLATE,
@@ -1141,6 +1143,8 @@  static const enum index next_table_attr[] = {
 	TABLE_INGRESS,
 	TABLE_EGRESS,
 	TABLE_TRANSFER,
+	TABLE_TRANSFER_WIRE_ORIG,
+	TABLE_TRANSFER_VF_ORIG,
 	TABLE_RULES_NUMBER,
 	TABLE_PATTERN_TEMPLATE,
 	TABLE_ACTIONS_TEMPLATE,
@@ -2881,6 +2885,18 @@  static const struct token token_list[] = {
 		.next = NEXT(next_table_attr),
 		.call = parse_table,
 	},
+	[TABLE_TRANSFER_WIRE_ORIG] = {
+		.name = "wire_orig",
+		.help = "affect rule direction to transfer",
+		.next = NEXT(next_table_attr),
+		.call = parse_table,
+	},
+	[TABLE_TRANSFER_VF_ORIG] = {
+		.name = "vf_orig",
+		.help = "affect rule direction to transfer",
+		.next = NEXT(next_table_attr),
+		.call = parse_table,
+	},
 	[TABLE_RULES_NUMBER] = {
 		.name = "rules_number",
 		.help = "number of rules in table",
@@ -8894,6 +8910,16 @@  parse_table(struct context *ctx, const struct token *token,
 	case TABLE_TRANSFER:
 		out->args.table.attr.flow_attr.transfer = 1;
 		return len;
+	case TABLE_TRANSFER_WIRE_ORIG:
+		if (!out->args.table.attr.flow_attr.transfer)
+			return -1;
+		out->args.table.attr.flow_attr.transfer_mode = 1;
+		return len;
+	case TABLE_TRANSFER_VF_ORIG:
+		if (!out->args.table.attr.flow_attr.transfer)
+			return -1;
+		out->args.table.attr.flow_attr.transfer_mode = 2;
+		return len;
 	default:
 		return -1;
 	}
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 330e34427d..603b7988dd 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3332,7 +3332,8 @@  It is bound to ``rte_flow_template_table_create()``::
 
    flow template_table {port_id} create
        [table_id {id}] [group {group_id}]
-       [priority {level}] [ingress] [egress] [transfer]
+       [priority {level}] [ingress] [egress]
+       [transfer [vf_orig] [wire_orig]]
        rules_number {number}
        pattern_template {pattern_template_id}
        actions_template {actions_template_id}
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index a79f1e7ef0..512b08d817 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -130,7 +130,14 @@  struct rte_flow_attr {
 	 * through a suitable port. @see rte_flow_pick_transfer_proxy().
 	 */
 	uint32_t transfer:1;
-	uint32_t reserved:29; /**< Reserved, must be zero. */
+	/**
+	 * 0 means bidirection,
+	 * 0x1 origin uplink,
+	 * 0x2 origin vport,
+	 * N/A both set.
+	 */
+	uint32_t transfer_mode:2;
+	uint32_t reserved:27; /**< Reserved, must be zero. */
 };
 
 /**