[RFC,1/3] ethdev: extend flow metadata

Message ID 20190603213231.27020-1-yskoh@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series [RFC,1/3] ethdev: extend flow metadata

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Yongseok Koh June 3, 2019, 9:32 p.m. UTC
Currently, metadata can be set on the egress path via the mbuf tx_metadata field
with the PKT_TX_METADATA flag, and the RTE_FLOW_ITEM_TYPE_META item matches that metadata.

This patch extends the usability.

1) RTE_FLOW_ACTION_TYPE_SET_META

When supporting multiple tables, Tx metadata can also be set by a rule and
matched by another rule. This new action allows metadata to be set as a
result of a flow match.
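
As an illustration only (not part of this patch), here is a minimal sketch of
how the proposed action could be used by an application, assuming a
hypothetical port_id, an ingress rule in group 0 and a second table in group 1
reached via a JUMP action:

	/* Sketch: set metadata in group 0 so a rule in group 1 can match it. */
	uint16_t port_id = 0;	/* hypothetical port */
	struct rte_flow_attr attr = { .ingress = 1, .group = 0 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_set_meta set_meta = {
		.data = rte_cpu_to_be_32(0xcafe),     /* example value */
		.mask = rte_cpu_to_be_32(UINT32_MAX), /* set all bits */
	};
	struct rte_flow_action_jump jump = { .group = 1 };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_SET_META, .conf = &set_meta },
		{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error error;
	struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern,
						actions, &error);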

2) Metadata on ingress

There's also a need to support metadata on packet Rx. Metadata can be set by the
SET_META action and matched by the META item, just like on Tx. The final value set
by the action will be delivered to the application via the mbuf metadata field with
the PKT_RX_METADATA ol_flag.
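
A rough sketch of the intended Rx-side usage follows (PKT_RX_METADATA and the
mbuf 'metadata' field are names proposed by this RFC and do not exist yet; the
same hypothetical port_id as above is assumed):

	struct rte_mbuf *mbufs[32];
	uint16_t i, n;

	n = rte_eth_rx_burst(port_id, 0, mbufs, RTE_DIM(mbufs));
	for (i = 0; i < n; i++) {
		if (mbufs[i]->ol_flags & PKT_RX_METADATA) {
			/* Final metadata value set by SET_META rules. */
			uint32_t meta = mbufs[i]->metadata;
			/* ... dispatch the packet based on meta ... */
		}
	}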

For this purpose, mbuf->tx_metadata is moved to a separate new field and
renamed to 'metadata' so that it can carry both Rx and Tx metadata.

For loopback/hairpin packets, metadata set on Rx/Tx may or may not be
propagated to the other path, depending on HW capability.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
 app/test-pmd/util.c                   |  2 +-
 doc/guides/prog_guide/rte_flow.rst    | 73 +++++++++++++++++++--------
 drivers/net/mlx5/mlx5_rxtx.c          | 12 ++---
 drivers/net/mlx5/mlx5_rxtx_vec.c      |  4 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h |  2 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h  |  2 +-
 lib/librte_ethdev/rte_flow.h          | 43 ++++++++++++++--
 lib/librte_mbuf/rte_mbuf.h            | 21 ++++----
 8 files changed, 111 insertions(+), 48 deletions(-)
  

Comments

Andrew Rybchenko June 9, 2019, 2:23 p.m. UTC | #1
On 6/4/19 12:32 AM, Yongseok Koh wrote:
> Currently, metadata can be set on egress path via mbuf tx_meatadata field
> with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
>
> This patch extends the usability.
>
> 1) RTE_FLOW_ACTION_TYPE_SET_META
>
> When supporting multiple tables, Tx metadata can also be set by a rule and
> matched by another rule. This new action allows metadata to be set as a
> result of flow match.
>
> 2) Metadata on ingress
>
> There's also need to support metadata on packet Rx. Metadata can be set by
> SET_META action and matched by META item like Tx. The final value set by
> the action will be delivered to application via mbuf metadata field with
> PKT_RX_METADATA ol_flag.
>
> For this purpose, mbuf->tx_metadata is moved as a separate new field and
> renamed to 'metadata' to support both Rx and Tx metadata.
>
> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> propagated to the other path depending on HW capability.
>
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

There is a mark on Rx which is delivered to the application in hash.fdir.hi.
Why do we need one more 32-bit value set by the NIC and delivered to the
application?
What is the difference between MARK and META on Rx?
When should an application use MARK and when META?
Are there cases when both could be necessary?

Moreover, the third patch adds 32-bit tags which are not delivered to the
application. Maybe META/MARK should simply be a kind of TAG (e.g. with
index 0 or marked using an additional attribute) which is delivered to the
application?

(It is either an API breakage (if tx_metadata is removed) or an ABI breakage
if metadata and tx_metadata share a new location after shinfo.)

Andrew.
  
Wang, Haiyue June 10, 2019, 3:19 a.m. UTC | #2
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Andrew Rybchenko
> Sent: Sunday, June 9, 2019 22:24
> To: Yongseok Koh <yskoh@mellanox.com>; shahafs@mellanox.com; thomas@monjalon.net; Yigit, Ferruh
> <ferruh.yigit@intel.com>; adrien.mazarguil@6wind.com; olivier.matz@6wind.com
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC 1/3] ethdev: extend flow metadata
> 
> On 6/4/19 12:32 AM, Yongseok Koh wrote:
> > Currently, metadata can be set on egress path via mbuf tx_meatadata field
> > with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
> >
> > This patch extends the usability.
> >
> > 1) RTE_FLOW_ACTION_TYPE_SET_META
> >
> > When supporting multiple tables, Tx metadata can also be set by a rule and
> > matched by another rule. This new action allows metadata to be set as a
> > result of flow match.
> >
> > 2) Metadata on ingress
> >
> > There's also need to support metadata on packet Rx. Metadata can be set by
> > SET_META action and matched by META item like Tx. The final value set by
> > the action will be delivered to application via mbuf metadata field with
> > PKT_RX_METADATA ol_flag.
> >
> > For this purpose, mbuf->tx_metadata is moved as a separate new field and
> > renamed to 'metadata' to support both Rx and Tx metadata.
> >
> > For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> > propagated to the other path depending on HW capability.
> >
> > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> 
> There is a mark on Rx which is delivered to application in hash.fdir.hi.
> Why do we need one more 32-bit value set by NIC and delivered to
> application?
> What is the difference between MARK and META on Rx?
> When application should use MARK and when META?
> Is there cases when both could be necessary?
> 
In my understanding, MARK is an FDIR-related thing, while META seems to be NIC
specific. And we also need this kind of specific data field to export the
NIC's data to the application.

> Moreover, the third patch adds 32-bit tags which are not delivered to
> application. May be META/MARK should be simply a kind of TAG (e.g. with
> index 0 or marked using additional attribute) which is delivered to
> application?
> 
> (It is either API breakage (if tx_metadata is removed) or ABI breakage
> if metadata and tx_metadata will share new location after shinfo).
> 
Make use of udata64 to export NIC metadata to the application?
	RTE_STD_C11
	union {
		void *userdata;   /**< Can be used for external metadata */
		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
		uint64_t rx_metadata;
	};
> Andrew.
  
Andrew Rybchenko June 10, 2019, 7:20 a.m. UTC | #3
On 6/10/19 6:19 AM, Wang, Haiyue wrote:
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Andrew Rybchenko
>> Sent: Sunday, June 9, 2019 22:24
>> To: Yongseok Koh <yskoh@mellanox.com>; shahafs@mellanox.com; thomas@monjalon.net; Yigit, Ferruh
>> <ferruh.yigit@intel.com>; adrien.mazarguil@6wind.com; olivier.matz@6wind.com
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [RFC 1/3] ethdev: extend flow metadata
>>
>> On 6/4/19 12:32 AM, Yongseok Koh wrote:
>>> Currently, metadata can be set on egress path via mbuf tx_meatadata field
>>> with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
>>>
>>> This patch extends the usability.
>>>
>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
>>>
>>> When supporting multiple tables, Tx metadata can also be set by a rule and
>>> matched by another rule. This new action allows metadata to be set as a
>>> result of flow match.
>>>
>>> 2) Metadata on ingress
>>>
>>> There's also need to support metadata on packet Rx. Metadata can be set by
>>> SET_META action and matched by META item like Tx. The final value set by
>>> the action will be delivered to application via mbuf metadata field with
>>> PKT_RX_METADATA ol_flag.
>>>
>>> For this purpose, mbuf->tx_metadata is moved as a separate new field and
>>> renamed to 'metadata' to support both Rx and Tx metadata.
>>>
>>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
>>> propagated to the other path depending on HW capability.
>>>
>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>> There is a mark on Rx which is delivered to application in hash.fdir.hi.
>> Why do we need one more 32-bit value set by NIC and delivered to
>> application?
>> What is the difference between MARK and META on Rx?
>> When application should use MARK and when META?
>> Is there cases when both could be necessary?
>>
> In my understanding, MARK is FDIR related thing, META seems to be NIC
> specific. And we also need this kind of specific data field to export
> NIC's data to application.

I think it is better to avoid NIC vendor-specifics in the motivation. I
understand that it exists for you, but I think it is better to look at it from
the RTE flow API definition point of view: both are 32-bit (except endianness,
and I'm not sure that I understand why meta is defined as big-endian since it
is not a value coming from or going to the network in a packet; I'm sorry that
I missed it on review at the time), both may be set using an action on Rx, and
both may be matched using a pattern item.

>> Moreover, the third patch adds 32-bit tags which are not delivered to
>> application. May be META/MARK should be simply a kind of TAG (e.g. with
>> index 0 or marked using additional attribute) which is delivered to
>> application?
>>
>> (It is either API breakage (if tx_metadata is removed) or ABI breakage
>> if metadata and tx_metadata will share new location after shinfo).
>>
> Make use of udata64 to export NIC metadata to application ?
> 	RTE_STD_C11
> 	union {
> 		void *userdata;   /**< Can be used for external metadata */
> 		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
> 		uint64_t rx_metadata;
> 	};

As I understand it, this does not work for Tx, and I'm not sure that it is
a good idea to have different locations for Tx and Rx.

The RFC adds it at the end of the mbuf, but that was rejected before since
it eats space in the mbuf structure (CC Konstantin).

There are long prior discussions on this topic: [1], [2], [3] and [4].

Andrew.

[1] http://mails.dpdk.org/archives/dev/2018-August/109660.html
[2] http://mails.dpdk.org/archives/dev/2018-September/111771.html
[3] http://mails.dpdk.org/archives/dev/2018-October/114559.html
[4] http://mails.dpdk.org/archives/dev/2018-October/115469.html
  
Yongseok Koh June 11, 2019, 12:06 a.m. UTC | #4
On Mon, Jun 10, 2019 at 10:20:28AM +0300, Andrew Rybchenko wrote:
> On 6/10/19 6:19 AM, Wang, Haiyue wrote:
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Andrew Rybchenko
> > > Sent: Sunday, June 9, 2019 22:24
> > > To: Yongseok Koh <yskoh@mellanox.com>; shahafs@mellanox.com; thomas@monjalon.net; Yigit, Ferruh
> > > <ferruh.yigit@intel.com>; adrien.mazarguil@6wind.com; olivier.matz@6wind.com
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC 1/3] ethdev: extend flow metadata
> > > 
> > > On 6/4/19 12:32 AM, Yongseok Koh wrote:
> > > > Currently, metadata can be set on egress path via mbuf tx_meatadata field
> > > > with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
> > > > 
> > > > This patch extends the usability.
> > > > 
> > > > 1) RTE_FLOW_ACTION_TYPE_SET_META
> > > > 
> > > > When supporting multiple tables, Tx metadata can also be set by a rule and
> > > > matched by another rule. This new action allows metadata to be set as a
> > > > result of flow match.
> > > > 
> > > > 2) Metadata on ingress
> > > > 
> > > > There's also need to support metadata on packet Rx. Metadata can be set by
> > > > SET_META action and matched by META item like Tx. The final value set by
> > > > the action will be delivered to application via mbuf metadata field with
> > > > PKT_RX_METADATA ol_flag.
> > > > 
> > > > For this purpose, mbuf->tx_metadata is moved as a separate new field and
> > > > renamed to 'metadata' to support both Rx and Tx metadata.
> > > > 
> > > > For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> > > > propagated to the other path depending on HW capability.
> > > > 
> > > > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > > There is a mark on Rx which is delivered to application in hash.fdir.hi.
> > > Why do we need one more 32-bit value set by NIC and delivered to
> > > application?
> > > What is the difference between MARK and META on Rx?
> > > When application should use MARK and when META?
> > > Is there cases when both could be necessary?
> > > 
> > In my understanding, MARK is FDIR related thing, META seems to be NIC
> > specific. And we also need this kind of specific data field to export
> > NIC's data to application.
> 
> I think it is better to avoid NIC vendor-specifics in motivation. I
> understand
> that it exists for you, but I think it is better to look at it from RTE flow
> API
> definition point of view: both are 32-bit (except endianess and I'm not sure
> that I understand why meta is defined as big-endian since it is not a value
> coming from or going to network in a packet, I'm sorry that I've missed it
> on review that time), both may be set using action on Rx, both may be
> matched using pattern item.

Yes, MARK and META have the same characteristics on the Rx path. Let me clarify why I
picked this way.

What if the device has more bits to deliver to the host? Currently, only 32 bits of
data can be delivered to the user via the MARK ID. Now we have more requests from
users (OVS connection tracking) who want to see more information generated during
flow match by the device. Let's say it is 64 bits, and it may contain intermediate
match results to keep track of a multi-table match, the address of a callback
function to call, and so on. I thought about extending the current MARK to 64 bits,
but I knew that we couldn't make more room in the first cacheline of the mbuf, where
every vendor has their critical interest. And the FDIR mark has been there for a long
time and has lots of use-cases in DPDK (not easy to break). This is why I'm
suggesting to obtain another 32 bits in the second cacheline of the structure.

Also, I thought about another scenario as well. Even though the MARK item was
introduced lately, it isn't used by any PMD at all for now, meaning it might not
be match-able on a certain device. What if there are two types of registers on Rx
and one is match-able and the other isn't? The PMD can use META for the match-able
register while MARK is used for the non-match-able register without supporting
item match. If MARK simply becomes 64-bit just because it has the same
characteristic in terms of rte_flow, only one of such registers can be used, as
we can't say that only part of the bits are match-able on the item. Instead of
extending MARK to 64 bits, I thought it would be better to give more flexibility
by bundling it with Tx metadata, which can be set via the mbuf.

The actual issue we have may be how to make this scalable. What if there's a
need to carry even more data from the device? Well, IIRC, Olivier once suggested
putting a pointer (like mbuf->userdata) to extend the mbuf struct beyond two
cachelines. But we still have some space left at the end.

> > > Moreover, the third patch adds 32-bit tags which are not delivered to
> > > application. May be META/MARK should be simply a kind of TAG (e.g. with
> > > index 0 or marked using additional attribute) which is delivered to
> > > application?

Yes, TAG is a kind of transient device-internal data which isn't delivered to the
host. It is a design choice. I could define all these kinds as an array of
MARK IDs having different attributes (some exportable/match-able, others not),
which sounds quite complex. As rte_flow doesn't have a direct way to check device
capability (the user has to call a series of validate functions instead), I
thought defining TAG would be better.

> > > (It is either API breakage (if tx_metadata is removed) or ABI breakage
> > > if metadata and tx_metadata will share new location after shinfo).

Fortunately, mlx5 is the only entity which uses tx_metadata so far.

> > Make use of udata64 to export NIC metadata to application ?
> > 	RTE_STD_C11
> > 	union {
> > 		void *userdata;   /**< Can be used for external metadata */
> > 		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
> > 		uint64_t rx_metadata;
> > 	};
> 
> As I understand it does not work for Tx and I'm not sure that it is
> a good idea to have different locations for Tx and Rx.
> 
> RFC adds it at the end of mbuf, but it was rejected before since
> it eats space in mbuf structure (CC Konstantin).

Yep, I was in that discussion. IIRC, the reason wasn't that it ate space, but that
unused space on the Tx path could be recycled instead. We still have 16B after
shinfo and I'm not sure how many bytes we should reserve. I think reserving space
for one pointer would be fine.

Thanks,
Yongseok

> There is a long discussion on the topic before [1], [2], [3] and [4].
> 
> Andrew.
> 
> [1] http://mails.dpdk.org/archives/dev/2018-August/109660.html
> [2] http://mails.dpdk.org/archives/dev/2018-September/111771.html
> [3] http://mails.dpdk.org/archives/dev/2018-October/114559.html
> [4] http://mails.dpdk.org/archives/dev/2018-October/115469.html
  
Andrew Rybchenko June 19, 2019, 9:05 a.m. UTC | #5
On 11.06.2019 3:06, Yongseok Koh wrote:
> On Mon, Jun 10, 2019 at 10:20:28AM +0300, Andrew Rybchenko wrote:
>> On 6/10/19 6:19 AM, Wang, Haiyue wrote:
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Andrew Rybchenko
>>>> Sent: Sunday, June 9, 2019 22:24
>>>> To: Yongseok Koh <yskoh@mellanox.com>; shahafs@mellanox.com; thomas@monjalon.net; Yigit, Ferruh
>>>> <ferruh.yigit@intel.com>; adrien.mazarguil@6wind.com; olivier.matz@6wind.com
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [RFC 1/3] ethdev: extend flow metadata
>>>>
>>>> On 6/4/19 12:32 AM, Yongseok Koh wrote:
>>>>> Currently, metadata can be set on egress path via mbuf tx_meatadata field
>>>>> with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
>>>>>
>>>>> This patch extends the usability.
>>>>>
>>>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
>>>>>
>>>>> When supporting multiple tables, Tx metadata can also be set by a rule and
>>>>> matched by another rule. This new action allows metadata to be set as a
>>>>> result of flow match.
>>>>>
>>>>> 2) Metadata on ingress
>>>>>
>>>>> There's also need to support metadata on packet Rx. Metadata can be set by
>>>>> SET_META action and matched by META item like Tx. The final value set by
>>>>> the action will be delivered to application via mbuf metadata field with
>>>>> PKT_RX_METADATA ol_flag.
>>>>>
>>>>> For this purpose, mbuf->tx_metadata is moved as a separate new field and
>>>>> renamed to 'metadata' to support both Rx and Tx metadata.
>>>>>
>>>>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
>>>>> propagated to the other path depending on HW capability.
>>>>>
>>>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>>>> There is a mark on Rx which is delivered to application in hash.fdir.hi.
>>>> Why do we need one more 32-bit value set by NIC and delivered to
>>>> application?
>>>> What is the difference between MARK and META on Rx?
>>>> When application should use MARK and when META?
>>>> Is there cases when both could be necessary?
>>>>
>>> In my understanding, MARK is FDIR related thing, META seems to be NIC
>>> specific. And we also need this kind of specific data field to export
>>> NIC's data to application.
>> I think it is better to avoid NIC vendor-specifics in motivation. I
>> understand
>> that it exists for you, but I think it is better to look at it from RTE flow
>> API
>> definition point of view: both are 32-bit (except endianess and I'm not sure
>> that I understand why meta is defined as big-endian since it is not a value
>> coming from or going to network in a packet, I'm sorry that I've missed it
>> on review that time), both may be set using action on Rx, both may be
>> matched using pattern item.
> Yes, MARK and META has the same characteristic on Rx path. Let me clarify why I
> picked this way.
>
> What if device has more bits to deliver to host? Currently, only 32-bit data can
> be delivered to user via MARK ID. Now we have more requests from users (OVS
> connection tracking) that want to see more information generated during flow
> match from the device. Let's say it is 64 bits and it may contain intermediate
> match results to keep track of multi-table match, to keep address of callback
> function to call, or so. I thought about extending the current MARK to 64-bit
> but I knew that we couldn't make more room in the first cacheline of mbuf where
> every vendor has their critical interest. And the FDIR has been there for a long
> time and has lots of use-cases in DPDK (not easy to break). This is why I'm
> suggesting to obtain another 32 bits in the second cacheline of the structure.
>
> Also, I thought about other scenario as well. Even though we have MARK item
> introduced lately, it isn't used by any PMD at all for now, meaning it might not
> be match-able on a certain device. What if there are two types registers on Rx
> and one is match-able and the other isn't? PMD can use META for match-able
> register while MARK is used for non-match-able register without supporting
> item match. If MARK simply becomes 64-bit just because it has the same
> characteristic in terms of rte_flow, only one of such registers can be used as
> we can't say only part of bits are match-able on the item. Instead of extending
> the MARK to 64 bits, I thought it would be better to give more flexibility by
> bundling it with Tx metadata, which can set by mbuf.

Thanks a lot for the explanations. If this way is finally approved, the priority
between META and MARK should be defined. I.e. if only one is supported,
or only one may be matched, it must be MARK. Otherwise, it will be too
complicated for applications to find out which one to use.
Are there any limitations on the usage of MARK or META in transfer rules?
There is a lot of documentation work to do in this area to make it usable.


> The actual issue we have may be how we can make it scalable? What if there's
> more need to carry more data from device? Well, IIRC, Olivier once suggested to
> put a pointer (like mbuf->userdata) to extend mbuf struct beyond two cachelines.
> But we still have some space left at the end.
>
>>>> Moreover, the third patch adds 32-bit tags which are not delivered to
>>>> application. May be META/MARK should be simply a kind of TAG (e.g. with
>>>> index 0 or marked using additional attribute) which is delivered to
>>>> application?
> Yes, TAG is a kind of transient device-internal data which isn't delivered to
> host. It would be a design choice. I could define all these kinds as an array of
> MARK IDs having different attributes - some are exportable/match-able and others
> are not, which sounds quite complex. As rte_flow doesn't have a direct way to
> check device capability (user has to call a series of validate functions
> instead), I thought defining TAG would be better.
>
>>>> (It is either API breakage (if tx_metadata is removed) or ABI breakage
>>>> if metadata and tx_metadata will share new location after shinfo).
> Fortunately, mlx5 is the only entity which uses tx_metadata so far.

As I understand it, this is still a breakage.

>>> Make use of udata64 to export NIC metadata to application ?
>>> 	RTE_STD_C11
>>> 	union {
>>> 		void *userdata;   /**< Can be used for external metadata */
>>> 		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
>>> 		uint64_t rx_metadata;
>>> 	};
>> As I understand it does not work for Tx and I'm not sure that it is
>> a good idea to have different locations for Tx and Rx.
>>
>> RFC adds it at the end of mbuf, but it was rejected before since
>> it eats space in mbuf structure (CC Konstantin).
> Yep, I was in the discussion. IIRC, the reason wasn't because it ate space but
> because it could recycle unused space on Tx path. We still have 16B after shinfo
> and I'm not sure how many bytes we should reserve. I think reserving space for
> one pointer would be fine.

I have no strong opinion.

Thanks,
Andrew.

> Thanks,
> Yongseok
>
>> There is a long discussion on the topic before [1], [2], [3] and [4].
>>
>> Andrew.
>>
>> [1] http://mails.dpdk.org/archives/dev/2018-August/109660.html
>> [2] http://mails.dpdk.org/archives/dev/2018-September/111771.html
>> [3] http://mails.dpdk.org/archives/dev/2018-October/114559.html
>> [4] http://mails.dpdk.org/archives/dev/2018-October/115469.html
  

Patch

diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index a1164b7053..6ecc97351f 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -182,7 +182,7 @@  tx_pkt_set_md(uint16_t port_id, __rte_unused uint16_t queue,
 	 * and set ol_flags accordingly.
 	 */
 	for (i = 0; i < nb_pkts; i++) {
-		pkts[i]->tx_metadata = ports[port_id].tx_metadata;
+		pkts[i]->metadata = ports[port_id].tx_metadata;
 		pkts[i]->ol_flags |= PKT_TX_METADATA;
 	}
 	return nb_pkts;
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index a34d012e55..016cd90e52 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -658,6 +658,32 @@  the physical device, with virtual groups in the PMD or not at all.
    | ``mask`` | ``id``   | zeroed to match any value |
    +----------+----------+---------------------------+
 
+Item: ``META``
+^^^^^^^^^^^^^^^^^
+
+Matches 32 bit metadata item set.
+
+On egress, metadata can be set either by mbuf metadata field with
+PKT_TX_METADATA flag or ``SET_META`` action. On ingress, ``SET_META`` action
+sets metadata for a packet and the metadata will be reported via mbuf metadata
+field with PKT_RX_METADATA flag.
+
+- Default ``mask`` matches the specified Rx metadata value.
+
+.. _table_rte_flow_item_meta:
+
+.. table:: META
+
+   +----------+----------+---------------------------------------+
+   | Field    | Subfield | Value                                 |
+   +==========+==========+=======================================+
+   | ``spec`` | ``data`` | 32 bit metadata value                 |
+   +----------+----------+---------------------------------------+
+   | ``last`` | ``data`` | upper range value                     |
+   +----------+----------+---------------------------------------+
+   | ``mask`` | ``data`` | bit-mask applies to "spec" and "last" |
+   +----------+----------+---------------------------------------+
+
 Data matching item types
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -1189,27 +1215,6 @@  Normally preceded by any of:
 - `Item: ICMP6_ND_NS`_
 - `Item: ICMP6_ND_OPT`_
 
-Item: ``META``
-^^^^^^^^^^^^^^
-
-Matches an application specific 32 bit metadata item.
-
-- Default ``mask`` matches the specified metadata value.
-
-.. _table_rte_flow_item_meta:
-
-.. table:: META
-
-   +----------+----------+---------------------------------------+
-   | Field    | Subfield | Value                                 |
-   +==========+==========+=======================================+
-   | ``spec`` | ``data`` | 32 bit metadata value                 |
-   +----------+--------------------------------------------------+
-   | ``last`` | ``data`` | upper range value                     |
-   +----------+----------+---------------------------------------+
-   | ``mask`` | ``data`` | bit-mask applies to "spec" and "last" |
-   +----------+----------+---------------------------------------+
-
 Actions
 ~~~~~~~
 
@@ -2345,6 +2350,32 @@  Otherwise, RTE_FLOW_ERROR_TYPE_ACTION error will be returned.
    | ``mac_addr`` | MAC address   |
    +--------------+---------------+
 
+Action: ``SET_META``
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Set metadata. Item ``META`` matches metadata.
+
+Metadata set by mbuf metadata field with PKT_TX_METADATA flag on egress will be
+overridden by this action. On ingress, the metadata will be carried by mbuf
+metadata field with PKT_RX_METADATA flag if set.
+
+Altering partial bits is supported with ``mask``. For bits which have never been
+set, unpredictable value will be seen depending on driver implementation. For
+loopback/hairpin packet, metadata set on Rx/Tx may or may not be propagated to
+the other path depending on HW capability.
+
+.. _table_rte_flow_action_set_meta:
+
+.. table:: SET_META
+
+   +----------+----------------------------+
+   | Field    | Value                      |
+   +==========+============================+
+   | ``data`` | 32 bit metadata value      |
+   +----------+----------------------------+
+   | ``mask`` | bit-mask applies to "data" |
+   +----------+----------------------------+
+
 Negative types
 ~~~~~~~~~~~~~~
 
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 7174ffc91c..19b4a2567b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -626,8 +626,7 @@  mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		txq_mbuf_to_swp(txq, buf, (uint8_t *)&swp_offsets, &swp_types);
 		raw = ((uint8_t *)(uintptr_t)wqe) + 2 * MLX5_WQE_DWORD_SIZE;
 		/* Copy metadata from mbuf if valid */
-		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata :
-							     0;
+		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->metadata : 0;
 		/* Replace the Ethernet type by the VLAN if necessary. */
 		if (buf->ol_flags & PKT_TX_VLAN_PKT) {
 			uint32_t vlan = rte_cpu_to_be_32(0x81000000 |
@@ -1029,8 +1028,7 @@  mlx5_tx_burst_mpw(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		--pkts_n;
 		cs_flags = txq_ol_cksum_to_cs(buf);
 		/* Copy metadata from mbuf if valid */
-		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata :
-							     0;
+		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->metadata : 0;
 		/* Retrieve packet information. */
 		length = PKT_LEN(buf);
 		assert(length);
@@ -1264,8 +1262,7 @@  mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts,
 		max_wqe = (1u << txq->wqe_n) - (txq->wqe_ci - txq->wqe_pi);
 		cs_flags = txq_ol_cksum_to_cs(buf);
 		/* Copy metadata from mbuf if valid */
-		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata :
-							     0;
+		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->metadata : 0;
 		/* Retrieve packet information. */
 		length = PKT_LEN(buf);
 		/* Start new session if packet differs. */
@@ -1547,8 +1544,7 @@  txq_burst_empw(struct mlx5_txq_data *txq, struct rte_mbuf **pkts,
 			break;
 		cs_flags = txq_ol_cksum_to_cs(buf);
 		/* Copy metadata from mbuf if valid */
-		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata :
-							     0;
+		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->metadata : 0;
 		/* Retrieve packet information. */
 		length = PKT_LEN(buf);
 		/* Start new session if:
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
index 9a3a5ae437..9f99c8cb03 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
@@ -71,7 +71,7 @@  txq_calc_offload(struct rte_mbuf **pkts, uint16_t pkts_n, uint8_t *cs_flags,
 	if (!pkts_n)
 		return 0;
 	p0_metadata = pkts[0]->ol_flags & PKT_TX_METADATA ?
-			pkts[0]->tx_metadata : 0;
+		      pkts[0]->metadata : 0;
 	/* Count the number of packets having same offload parameters. */
 	for (pos = 1; pos < pkts_n; ++pos) {
 		/* Check if packet has same checksum flags. */
@@ -81,7 +81,7 @@  txq_calc_offload(struct rte_mbuf **pkts, uint16_t pkts_n, uint8_t *cs_flags,
 		/* Check if packet has same metadata. */
 		if (txq_offloads & DEV_TX_OFFLOAD_MATCH_METADATA) {
 			pn_metadata = pkts[pos]->ol_flags & PKT_TX_METADATA ?
-					pkts[pos]->tx_metadata : 0;
+				      pkts[pos]->metadata : 0;
 			if (pn_metadata != p0_metadata)
 				break;
 		}
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index b2cc710887..c54914e97a 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -131,7 +131,7 @@  txq_scatter_v(struct mlx5_txq_data *txq, struct rte_mbuf **pkts,
 		uint8x16_t ctrl;
 		rte_be32_t metadata =
 			metadata_ol && (buf->ol_flags & PKT_TX_METADATA) ?
-			buf->tx_metadata : 0;
+			buf->metadata : 0;
 
 		assert(segs_n);
 		max_elts = elts_n - (elts_head - txq->elts_tail);
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index dce3ee4b40..3de640a2fd 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -129,7 +129,7 @@  txq_scatter_v(struct mlx5_txq_data *txq, struct rte_mbuf **pkts,
 		__m128i ctrl;
 		rte_be32_t metadata =
 			metadata_ol && (buf->ol_flags & PKT_TX_METADATA) ?
-			buf->tx_metadata : 0;
+			buf->metadata : 0;
 
 		assert(segs_n);
 		max_elts = elts_n - (elts_head - txq->elts_tail);
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index f3a8fb103f..cda8628183 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -417,7 +417,8 @@  enum rte_flow_item_type {
 	/**
 	 * [META]
 	 *
-	 * Matches a metadata value specified in mbuf metadata field.
+	 * Matches a metadata value.
+	 *
 	 * See struct rte_flow_item_meta.
 	 */
 	RTE_FLOW_ITEM_TYPE_META,
@@ -1164,9 +1165,16 @@  rte_flow_item_icmp6_nd_opt_tla_eth_mask = {
 #endif
 
 /**
- * RTE_FLOW_ITEM_TYPE_META.
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
  *
- * Matches a specified metadata value.
+ * RTE_FLOW_ITEM_TYPE_META
+ *
+ * Matches a specified metadata value. On egress, metadata can be set either by
+ * mbuf metadata field with PKT_TX_METADATA flag or
+ * RTE_FLOW_ACTION_TYPE_SET_META. On ingress, RTE_FLOW_ACTION_TYPE_SET_META sets
+ * metadata for a packet and the metadata will be reported via mbuf metadata
+ * field with PKT_RX_METADATA flag.
  */
 struct rte_flow_item_meta {
 	rte_be32_t data;
@@ -1650,6 +1658,13 @@  enum rte_flow_action_type {
 	 * See struct rte_flow_action_set_mac.
 	 */
 	RTE_FLOW_ACTION_TYPE_SET_MAC_DST,
+
+	/**
+	 * Set metadata on ingress or egress path.
+	 *
+	 * See struct rte_flow_action_set_meta.
+	 */
+	RTE_FLOW_ACTION_TYPE_SET_META,
 };
 
 /**
@@ -2131,6 +2146,28 @@  struct rte_flow_action_set_mac {
 	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ACTION_TYPE_SET_META
+ *
+ * Set metadata. Metadata set by mbuf metadata field with PKT_TX_METADATA flag
+ * on egress will be overridden by this action. On ingress, the metadata will be
+ * carried by mbuf metadata field with PKT_RX_METADATA flag if set.
+ *
+ * Altering partial bits is supported with mask. For bits which have never been
+ * set, unpredictable value will be seen depending on driver implementation. For
+ * loopback/hairpin packet, metadata set on Rx/Tx may or may not be propagated
+ * to the other path depending on HW capability.
+ *
+ * RTE_FLOW_ITEM_TYPE_META matches metadata.
+ */
+struct rte_flow_action_set_meta {
+	rte_be32_t data;
+	rte_be32_t mask;
+};
+
 /*
  * Definition of a single action.
  *
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index e4c2da6ee6..60f2b553e6 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -200,6 +200,11 @@  extern "C" {
 
 /* add new RX flags here */
 
+/**
+ * Indicate that mbuf has metadata from device.
+ */
+#define PKT_RX_METADATA	(1ULL << 23)
+
 /* add new TX flags here */
 
 /**
@@ -648,17 +653,6 @@  struct rte_mbuf {
 			/**< User defined tags. See rte_distributor_process() */
 			uint32_t usr;
 		} hash;                   /**< hash information */
-		struct {
-			/**
-			 * Application specific metadata value
-			 * for egress flow rule match.
-			 * Valid if PKT_TX_METADATA is set.
-			 * Located here to allow conjunct use
-			 * with hash.sched.hi.
-			 */
-			uint32_t tx_metadata;
-			uint32_t reserved;
-		};
 	};
 
 	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ is set. */
@@ -725,6 +719,11 @@  struct rte_mbuf {
 	 */
 	struct rte_mbuf_ext_shared_info *shinfo;
 
+	/** Application specific metadata value for flow rule match.
+	 * Valid if PKT_RX_METADATA or PKT_TX_METADATA is set.
+	 */
+	uint32_t metadata;
+
 } __rte_cache_aligned;
 
 /**