[v9,2/4] ethdev: introduce protocol hdr based buffer split

Message ID: 20220613102550.241759-3-wenxuanx.wu@intel.com
State: Superseded, archived
Delegated to: Andrew Rybchenko
Series: add an api to support proto based buffer split

Checks

Context          Check    Description
ci/checkpatch    success  coding style OK

Commit Message

Wu, WenxuanX June 13, 2022, 10:25 a.m. UTC
  From: Wenxuan Wu <wenxuanx.wu@intel.com>

Currently, Rx buffer split supports only length-based split. With the Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segments
configured, the PMD is able to split received packets into multiple
segments.
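To make the length-based mode concrete, here is a simplified standalone C model (not PMD code) of how a packet is distributed over the configured segment lengths. The helper name `model_length_split` is hypothetical; beyond the commit text, it only assumes the documented rule that data which does not fit goes to extra buffers from the last pool, which the sketch merely counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of length-based Rx buffer split (not DPDK code).
 * seg_len[i] is the configured maximal data length of segment i and
 * out[i] receives the number of packet bytes placed in segment i.
 * Returns the number of bytes left over after the configured segments;
 * a real PMD places that remainder in extra buffers from the last pool. */
static uint32_t model_length_split(uint32_t pkt_len, const uint16_t *seg_len,
                                   size_t nseg, uint32_t *out)
{
    uint32_t remaining = pkt_len;

    assert(seg_len != NULL && out != NULL);
    for (size_t i = 0; i < nseg; i++) {
        uint32_t take = remaining < seg_len[i] ? remaining : seg_len[i];

        out[i] = take;        /* the split point is a fixed byte count */
        remaining -= take;
    }
    return remaining;
}
```

For example, segment lengths {14, 20, 2048} place a 100-byte untagged IPv4 frame as 14/20/66 bytes. A single VLAN tag makes the Ethernet header 18 bytes long, so the fixed 14-byte boundary no longer matches any header, which is the limitation described in the next paragraph.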

However, length-based buffer split is not suitable for NICs that split
based on protocol headers. Because header lengths vary from packet to
packet, a fixed length in the Rx packet segment can hardly describe a
protocol header boundary to the driver. Tunneling makes the situation even
worse, since it makes the composition of packets much more varied.

This patch extends the current buffer split to support protocol header based
buffer split. A new proto_hdr field takes the place of the reserved field of
the rte_eth_rxseg_split structure to specify the protocol header. The
proto_hdr field defines the split position of the packet: splitting always
happens after the protocol header defined in the Rx packet segment. When the
Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and the
corresponding protocol header is configured, the driver splits ingress
packets into multiple segments.
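A rough model of the protocol-based split point: given the headers a parser found in a packet, the split offset is the end of the header named by proto_hdr. The `model_hdr` representation and the helper are hypothetical; the ptype values mirror rte_mbuf_ptype.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standalone stand-ins; the values mirror rte_mbuf_ptype.h. */
#define MODEL_PTYPE_L2_ETHER 0x00000001u
#define MODEL_PTYPE_L3_IPV4  0x00000010u
#define MODEL_PTYPE_L4_UDP   0x00000200u

/* One parsed header of a received packet: its ptype and the byte
 * offset just past its end (hypothetical representation). */
struct model_hdr {
    uint32_t ptype;
    uint32_t end;
};

/* Return the split offset for proto_hdr, i.e. the offset right after
 * that header, or -1 if the packet does not contain it. */
static long model_proto_split_offset(const struct model_hdr *hdrs, size_t n,
                                     uint32_t proto_hdr)
{
    assert(hdrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (hdrs[i].ptype == proto_hdr)
            return (long)hdrs[i].end;
    return -1;
}
```

For ETH (14 B) + IPv4 (20 B) + UDP (8 B), a proto_hdr of IPv4 splits at offset 34 and UDP at offset 42. If the IPv4 header carried options, the split point would move with it, unlike a fixed length.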

struct rte_eth_rxseg_split {
    struct rte_mempool *mp; /* memory pool to allocate segment from */
    uint16_t length;        /* segment maximal data length,
                               configures "split point" */
    uint16_t offset;        /* data offset from beginning
                               of mbuf data buffer */
    uint32_t proto_hdr;     /* inner/outer L2/L3/L4 protocol header,
                               configures "split point" */
};

If a PMD supports both inner and outer L2/L3/L4 protocol header split, the
corresponding protocol header capabilities are RTE_PTYPE_L2_ETHER,
RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP and
RTE_PTYPE_INNER_L4_SCTP.
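One way to read this list is as a table of exact-match capabilities. The sketch below (hypothetical names; ptype values copied from rte_mbuf_ptype.h) checks a requested proto_hdr against that table. Note that RTE_PTYPE_L4_FRAG (0x300) is not in the list, even though it equals the OR of the TCP and UDP values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Capability list from the commit message; values mirror rte_mbuf_ptype.h. */
static const uint32_t model_split_capa[] = {
    0x00000001u, /* RTE_PTYPE_L2_ETHER */
    0x00000010u, /* RTE_PTYPE_L3_IPV4 */
    0x00000040u, /* RTE_PTYPE_L3_IPV6 */
    0x00000100u, /* RTE_PTYPE_L4_TCP */
    0x00000200u, /* RTE_PTYPE_L4_UDP */
    0x00000400u, /* RTE_PTYPE_L4_SCTP */
    0x00010000u, /* RTE_PTYPE_INNER_L2_ETHER */
    0x00100000u, /* RTE_PTYPE_INNER_L3_IPV4 */
    0x00300000u, /* RTE_PTYPE_INNER_L3_IPV6 */
    0x01000000u, /* RTE_PTYPE_INNER_L4_TCP */
    0x02000000u, /* RTE_PTYPE_INNER_L4_UDP */
    0x04000000u, /* RTE_PTYPE_INNER_L4_SCTP */
};

/* Exact-match lookup: is proto_hdr one of the split points that a
 * (hypothetical) driver reported? */
static bool model_split_hdr_supported(uint32_t proto_hdr)
{
    size_t n = sizeof(model_split_capa) / sizeof(model_split_capa[0]);

    for (size_t i = 0; i < n; i++)
        if (model_split_capa[i] == proto_hdr)
            return true;
    return false;
}
```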

For example, suppose the Rx queue is configured with the following
segments:
    seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
    seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
    seg2 - pool2, off2=0B

A packet consisting of MAC_IPV4_UDP_PAYLOAD will be split as follows:
    seg0 - MAC and IPv4 headers @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - UDP header @ 128 in mbuf from pool1
    seg2 - payload @ 0 in mbuf from pool2
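The layout in this example can be reproduced with a small standalone model. It assumes an untagged frame without IPv4 options (Ethernet 14 B, IPv4 20 B, UDP 8 B), a 100-byte payload and the common default RTE_PKTMBUF_HEADROOM of 128; the struct and function names are hypothetical. As the ethdev documentation states, only the first segment's offset is added to the headroom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PKTMBUF_HEADROOM 128u /* common default of RTE_PKTMBUF_HEADROOM */

struct model_seg {
    uint16_t offset;   /* configured data offset (off0, off1, ...) */
    uint32_t data_off; /* resulting start of data in the mbuf buffer */
    uint32_t data_len; /* resulting number of packet bytes */
};

/* Fill nseg segments given nseg - 1 ascending split offsets and the
 * packet length; the last segment receives the rest of the packet. */
static void model_place(struct model_seg *seg, size_t nseg,
                        const uint32_t *split, uint32_t pkt_len)
{
    uint32_t start = 0;

    assert(seg != NULL && nseg > 0);
    for (size_t i = 0; i < nseg; i++) {
        uint32_t end = (i + 1 < nseg) ? split[i] : pkt_len;

        /* Only the first segment's offset is relative to the headroom. */
        seg[i].data_off = seg[i].offset + (i == 0 ? MODEL_PKTMBUF_HEADROOM : 0);
        seg[i].data_len = end - start;
        start = end;
    }
}
```

With split offsets {34, 42} (after IPv4, after UDP) and a 142-byte packet, seg0 holds 34 bytes at 130, seg1 holds 8 bytes at 128, and seg2 holds the 100-byte payload at 0.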

Buffer split can now be configured in two modes. For length-based buffer
split, the mp, length and offset fields of the Rx packet segment should be
configured, while the proto_hdr field should not. For protocol header based
buffer split, the mp, offset and proto_hdr fields should be configured,
while the length field should not.
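The "should not be configured" rule suggests that the mode can be inferred from which fields are left at zero. A hypothetical classifier under that assumption, using a local copy of the proposed struct rather than the real header, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rte_mempool; /* opaque, as in DPDK */

/* Local copy of the proposed layout (not the real header). */
struct model_rxseg_split {
    struct rte_mempool *mp;
    uint16_t length;
    uint16_t offset;
    uint32_t proto_hdr;
};

enum model_split_mode {
    MODEL_SPLIT_INVALID = -1,
    MODEL_SPLIT_LENGTH,
    MODEL_SPLIT_PROTO,
};

/* Classify a segment array by which "split point" field is set.
 * Setting both fields, or mixing the styles across segments, is
 * rejected. A segment with both fields 0 is accepted in either mode
 * (e.g. a last segment taking the remaining data). This is one
 * possible rule inferred from the commit message, not the final API. */
static enum model_split_mode
model_classify(const struct model_rxseg_split *seg, size_t nseg)
{
    int len_used = 0, proto_used = 0;

    assert(seg != NULL && nseg > 0);
    for (size_t i = 0; i < nseg; i++) {
        if (seg[i].length != 0 && seg[i].proto_hdr != 0)
            return MODEL_SPLIT_INVALID;
        len_used |= seg[i].length != 0;
        proto_used |= seg[i].proto_hdr != 0;
    }
    if (len_used && proto_used)
        return MODEL_SPLIT_INVALID;
    return proto_used ? MODEL_SPLIT_PROTO : MODEL_SPLIT_LENGTH;
}
```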

The split limitations imposed by the underlying driver are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes of the split
parts may also differ, e.g. DPDK memory and external memory, respectively.
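A minimal sketch of validating a segment configuration against the reported limits. The capability fields mirror struct rte_eth_rxseg_capa (multi_pools, offset_allowed, offset_align_log2, max_nseg); the check itself is a simplified illustration, not the code in rte_ethdev.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the fields of struct rte_eth_rxseg_capa. */
struct model_rxseg_capa {
    uint32_t multi_pools : 1;
    uint32_t offset_allowed : 1;
    uint32_t offset_align_log2 : 4;
    uint16_t max_nseg;
};

struct model_seg_cfg {
    const void *mp; /* stand-in for struct rte_mempool * */
    uint16_t offset;
};

/* Simplified capability check; returns 0 or a negative errno. */
static int model_check_capa(const struct model_rxseg_capa *capa,
                            const struct model_seg_cfg *seg, uint16_t nseg)
{
    uint32_t align_mask;

    assert(capa != NULL && seg != NULL);
    align_mask = (1u << capa->offset_align_log2) - 1;
    if (nseg > capa->max_nseg)
        return -EINVAL;
    for (uint16_t i = 0; i < nseg; i++) {
        /* Different pools per segment need multi_pools support. */
        if (!capa->multi_pools && seg[i].mp != seg[0].mp)
            return -ENOTSUP;
        if (seg[i].offset != 0) {
            if (!capa->offset_allowed)
                return -ENOTSUP;
            if (seg[i].offset & align_mask)
                return -EINVAL;
        }
    }
    return 0;
}
```

For instance, a device reporting max_nseg = 2 would reject the three-segment example given in this message.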

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
---
 lib/ethdev/rte_ethdev.c | 32 +++++++++++++++++++++++++++++++-
 lib/ethdev/rte_ethdev.h | 14 +++++++++++++-
 2 files changed, 44 insertions(+), 2 deletions(-)
  

Comments

Thomas Monjalon July 7, 2022, 9:07 a.m. UTC | #1
13/06/2022 12:25, wenxuanx.wu@intel.com:
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1176,6 +1176,9 @@ struct rte_eth_txmode {
>   *   specified in the first array element, the second buffer, from the
>   *   pool in the second element, and so on.
>   *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>   * - The offsets from the segment description elements specify
>   *   the data offset from the buffer beginning except the first mbuf.
>   *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
>   *     - pool from the last valid element
>   *     - the buffer size from this pool
>   *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field should not be configured.
> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field should not be configured.
>   */
>  struct rte_eth_rxseg_split {
>  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>  	uint16_t length; /**< Segment data length, configures split point. */
>  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**< Supported ptypes mask of a specific pmd, configures split point. */

The doxygen syntax is wrong: remove the "<" which is for post-comment.

> +	uint32_t proto_hdr;
>  };
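For reference, a Doxygen comment placed before a member uses the plain /** marker, while /**< is meant for comments that follow the member. The standalone sketch below shows the corrected form, and also checks (assuming a typical ABI) that swapping the 32-bit reserved field for a 32-bit proto_hdr leaves the struct layout unchanged. Struct names are local stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rte_mempool; /* opaque for this sketch */

/* Before: layout with the reserved field. */
struct model_rxseg_split_old {
    struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
    uint16_t length; /**< Segment data length, configures split point. */
    uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
    uint32_t reserved; /**< Reserved field. */
};

/* After: a doc comment placed before its member drops the '<'. */
struct model_rxseg_split_new {
    struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
    uint16_t length; /**< Segment data length, configures split point. */
    uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
    /** Supported ptypes mask of a specific PMD, configures split point. */
    uint32_t proto_hdr;
};

/* Replacing a 32-bit reserved field with a 32-bit field keeps the layout. */
_Static_assert(sizeof(struct model_rxseg_split_old) ==
               sizeof(struct model_rxseg_split_new), "size changed");
_Static_assert(offsetof(struct model_rxseg_split_old, reserved) ==
               offsetof(struct model_rxseg_split_new, proto_hdr),
               "offset changed");
```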

How do we know it is a length or buffer split?
Is it based on checking some 0 value?
  
Andrew Rybchenko July 8, 2022, 3 p.m. UTC | #2
On 6/13/22 13:25, wenxuanx.wu@intel.com wrote:
> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
> 
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given an arbitrarily variable length in Rx
> packet segment, it is almost impossible to pass a fixed protocol header to
> driver. Besides, the existence of tunneling results in the composition of
> a packet is various, which makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happens
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, driver will split the ingress packets into
> multiple segments.
> 
> struct rte_eth_rxseg_split {
> 
>          struct rte_mempool *mp; /* memory pools to allocate segment from */
>          uint16_t length; /* segment maximal data length,
>                              configures "split point" */
>          uint16_t offset; /* data offset from beginning
>                              of mbuf data buffer */
>          uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> 			       configures "split point" */

There is a big problem here: using RTE_PTYPE_* defines, I can't
request a split after either a TCP or a UDP header.

>      };
> 
> If both inner and outer L2/L3/L4 level protocol header split can be
> supported by a PMD. Corresponding protocol header capability is
> RTE_PTYPE_L2_ETHER, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP,
> RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER,
> RTE_PTYPE_INNER_L3_IPV4, RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP,
> RTE_PTYPE_INNER_L4_UDP, RTE_PTYPE_INNER_L4_SCTP.

I think there is no point in listing the above defines here if they are
not the only supported ones.

> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>      seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>      seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>      seg2 - pool2, off1=0B
> 
> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> following:
>      seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>      seg1 - udp header @ 128 in mbuf from pool1
>      seg2 - payload @ 0 in mbuf from pool2

Sorry, but I still see no definition of what should happen with, for
example, an ARP packet with the above config.

> 
> Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field should not be configured.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field should
> not be configured.
> 
> The split limitations imposed by underlying driver is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> ---
>   lib/ethdev/rte_ethdev.c | 32 +++++++++++++++++++++++++++++++-
>   lib/ethdev/rte_ethdev.h | 14 +++++++++++++-
>   2 files changed, 44 insertions(+), 2 deletions(-)

Do we need a dedicated feature in doc/guides/nics/features.rst?
Or should we just update buffer split to refer to a new supported
header split API and callback?

Also the feature definitely deserves entry in the release notes.

[snip]
  
Ding, Xuan July 11, 2022, 9:54 a.m. UTC | #3
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, July 7, 2022 5:08 PM
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>
> Cc: andrew.rybchenko@oktetlabs.ru; Li, Xiaoyun <xiaoyun.li@intel.com>;
> ferruh.yigit@xilinx.com; Singh, Aman Deep <aman.deep.singh@intel.com>;
> dev@dpdk.org; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> stephen@networkplumber.org; Wu, WenxuanX <wenxuanx.wu@intel.com>;
> Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX <yuanx.wang@intel.com>;
> Ray Kinsella <mdr@ashroe.eu>
> Subject: Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
> 
> 13/06/2022 12:25, wenxuanx.wu@intel.com:
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1176,6 +1176,9 @@ struct rte_eth_txmode {
> >   *   specified in the first array element, the second buffer, from the
> >   *   pool in the second element, and so on.
> >   *
> > + * - The proto_hdrs in the elements define the split position of
> > + *   received packets.
> > + *
> >   * - The offsets from the segment description elements specify
> >   *   the data offset from the buffer beginning except the first mbuf.
> >   *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> > @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
> >   *     - pool from the last valid element
> >   *     - the buffer size from this pool
> >   *     - zero offset
> > + *
> > + * - Length based buffer split:
> > + *     - mp, length, offset should be configured.
> > + *     - The proto_hdr field should not be configured.
> > + *
> > + * - Protocol header based buffer split:
> > + *     - mp, offset, proto_hdr should be configured.
> > + *     - The length field should not be configured.
> >   */
> >  struct rte_eth_rxseg_split {
> >  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> >  	uint16_t length; /**< Segment data length, configures split point. */
> >  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	/**< Supported ptypes mask of a specific pmd, configures split point. */
> 
> The doxygen syntax is wrong: remove the "<" which is for post-comment.

Thanks for your catch.

> 
> > +	uint32_t proto_hdr;
> >  };
> 
> How do we know it is a length or buffer split?
> Is it based on checking some 0 value?

Yes, as Andrew suggested, we introduced the API rte_eth_supported_hdrs_get() in v9.
It reports the protocol headers that the driver supports splitting on.
If the API returns ENOTSUP, the driver supports only length-based buffer split.

Of course, no matter what kind of buffer split it is, we need to check
RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT first.
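The selection order described here can be sketched as follows. The offload stand-in, the enum and the convention that the header query returns a negative errno (or a positive count of supported headers) are assumptions for illustration; rte_eth_supported_hdrs_get() from v9 is not called, only its return value is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT; the real value lives
 * in rte_ethdev.h. */
#define MODEL_RX_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 20)

enum model_mode { MODEL_NO_SPLIT, MODEL_LENGTH_SPLIT, MODEL_PROTO_SPLIT };

/* rx_offload_capa comes from device info; hdrs_ret is the (assumed)
 * return value of the supported-headers query: -ENOTSUP when only
 * length-based split is available, otherwise a count of headers. */
static enum model_mode model_pick_mode(uint64_t rx_offload_capa, int hdrs_ret)
{
    /* The offload flag is checked first, whatever the split kind. */
    if (!(rx_offload_capa & MODEL_RX_OFFLOAD_BUFFER_SPLIT))
        return MODEL_NO_SPLIT;
    if (hdrs_ret == -ENOTSUP)
        return MODEL_LENGTH_SPLIT;
    /* Any positive count means protocol-based split is possible. */
    return hdrs_ret > 0 ? MODEL_PROTO_SPLIT : MODEL_LENGTH_SPLIT;
}
```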

Thanks,
Xuan

>
  
Thomas Monjalon July 11, 2022, 10:12 a.m. UTC | #4
11/07/2022 11:54, Ding, Xuan:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 13/06/2022 12:25, wenxuanx.wu@intel.com:
> > > --- a/lib/ethdev/rte_ethdev.h
> > > +++ b/lib/ethdev/rte_ethdev.h
> > > @@ -1176,6 +1176,9 @@ struct rte_eth_txmode {
> > >   *   specified in the first array element, the second buffer, from the
> > >   *   pool in the second element, and so on.
> > >   *
> > > + * - The proto_hdrs in the elements define the split position of
> > > + *   received packets.
> > > + *
> > >   * - The offsets from the segment description elements specify
> > >   *   the data offset from the buffer beginning except the first mbuf.
> > >   *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> > > @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
> > >   *     - pool from the last valid element
> > >   *     - the buffer size from this pool
> > >   *     - zero offset
> > > + *
> > > + * - Length based buffer split:
> > > + *     - mp, length, offset should be configured.
> > > + *     - The proto_hdr field should not be configured.
> > > + *
> > > + * - Protocol header based buffer split:
> > > + *     - mp, offset, proto_hdr should be configured.
> > > + *     - The length field should not be configured.
> > >   */
> > >  struct rte_eth_rxseg_split {
> > >  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> > >  	uint16_t length; /**< Segment data length, configures split point. */
> > >  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> > > -	uint32_t reserved; /**< Reserved field. */
> > > +	/**< Supported ptypes mask of a specific pmd, configures split point. */
> > 
> > The doxygen syntax is wrong: remove the "<" which is for post-comment.
> 
> Thanks for your catch.
> 
> > 
> > > +	uint32_t proto_hdr;
> > >  };
> > 
> > How do we know it is a length or buffer split?
> > Is it based on checking some 0 value?
> 
> Yes, as Andrew suggests, we introduced the API rte_eth_supported_hdrs_get() in v9.
> It will report the driver supported protocol headers to be split.
> If the API returns ENOTSUP, it means driver supports length based buffer split.
> 
> Of course, no matter what kind of buffer split it is, we need to check
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT first.

So you need to talk about RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in the comment of this struct.
  
Ding, Xuan July 21, 2022, 3:24 a.m. UTC | #5
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: 2022年7月8日 23:01
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> jerinjacobk@gmail.com
> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>; Wang,
> YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
> Subject: Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 6/13/22 13:25, wenxuanx.wu@intel.com wrote:
> > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >
> > Currently, Rx buffer split supports length based split. With Rx queue
> > offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> > configured, PMD will be able to split the received packets into
> > multiple segments.
> >
> > However, length based buffer split is not suitable for NICs that do
> > split based on protocol headers. Given an arbitrarily variable length
> > in Rx packet segment, it is almost impossible to pass a fixed protocol
> > header to driver. Besides, the existence of tunneling results in the
> > composition of a packet is various, which makes the situation even worse.
> >
> > This patch extends current buffer split to support protocol header
> > based buffer split. A new proto_hdr field is introduced in the
> > reserved field of rte_eth_rxseg_split structure to specify protocol
> > header. The proto_hdr field defines the split position of packet,
> > splitting will always happens after the protocol header defined in the
> > Rx packet segment. When Rx queue offload
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding protocol
> > header is configured, driver will split the ingress packets into multiple segments.
> >
> > struct rte_eth_rxseg_split {
> >
> >          struct rte_mempool *mp; /* memory pools to allocate segment from */
> >          uint16_t length; /* segment maximal data length,
> >                              configures "split point" */
> >          uint16_t offset; /* data offset from beginning
> >                              of mbuf data buffer */
> >          uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> > 			       configures "split point" */
> 
> There is a big problem here that using RTE_PTYPE_* defines I can't request split
> after either TCP or UDP header.

Sorry, for some reason I missed your reply.

The current RTE_PTYPE_* defines list all tunnel and L2/L3/L4 protocol headers (both outer and inner).
Do you mean that we should support higher-layer protocols after L4?

I think tunnel and L2/L3/L4 protocol headers are enough.
DPDK does not parse higher-layer protocols after L4,
and there are too many higher-layer protocols to list them all.
What do you think?

> 
> >      };
> >
> > If both inner and outer L2/L3/L4 level protocol header split can be
> > supported by a PMD. Corresponding protocol header capability is
> > RTE_PTYPE_L2_ETHER, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6,
> > RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP,
> > RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
> > RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP,
> > RTE_PTYPE_INNER_L4_UDP, RTE_PTYPE_INNER_L4_SCTP.
> 
> I think there is no point to list above defines here if it is not the only supported
> defines.

Yes, since we use an API to return the protocol headers the driver supports splitting on,
there is no need to list an incomplete set of RTE_PTYPE_* here. Please see the next version.

> 
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >      seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >      seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >      seg2 - pool2, off1=0B
> >
> > The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> > following:
> >      seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >      seg1 - udp header @ 128 in mbuf from pool1
> >      seg2 - payload @ 0 in mbuf from pool2
> 
> Sorry, but I still see no definition what should happen with, for example, ARP
> packet with above config.

Thanks. Because the following reply was not answered in v8,
the definition has not been added in v9 yet.

"
Our NIC only supports splitting packets into two segments,
so there will be an exact match for the single configured protocol header. Back to this
question, for a configured set of proto_hdrs, there are two possible behaviors:
1. The aggressive way is to split on the longest match, as you mentioned. E.g. if we configure
a split on ETH-IPV4-TCP and receive ETH-IPV4-UDP or ETH-IPV6, it can still split on ETH-IPV4
or ETH, respectively.
2. The more conservative way is to split only when the packet matches all protocol headers
in the Rx packet segments. In the above situation, it will not split ETH-IPV4-UDP
or ETH-IPV6.

I prefer the second behavior, because the split is usually for the innermost header and
payload; if they do not match, the remaining headers have no actual value.
"

Hope to get your insights.
And we will update the doc to define the behavior in next version.
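The two candidate behaviors can be compared with a small model: the split points are the configured proto_hdrs in segment order, and the result is how many of them are applied to a packet. The names and the prefix-matching reading of "longest match" are assumptions; ptype values mirror rte_mbuf_ptype.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Does the packet's parsed header list contain ptype? */
static bool model_has(const uint32_t *pkt, size_t npkt, uint32_t ptype)
{
    for (size_t i = 0; i < npkt; i++)
        if (pkt[i] == ptype)
            return true;
    return false;
}

/* Number of configured split points (cfg, in segment order) applied to
 * a packet whose parsed headers are pkt.
 * aggressive:   longest matching prefix of cfg (behavior 1);
 * conservative: all of cfg or nothing (behavior 2). */
static size_t model_splits(const uint32_t *cfg, size_t ncfg,
                           const uint32_t *pkt, size_t npkt, bool aggressive)
{
    size_t matched = 0;

    assert(cfg != NULL || ncfg == 0);
    while (matched < ncfg && model_has(pkt, npkt, cfg[matched]))
        matched++;
    if (aggressive)
        return matched;
    return matched == ncfg ? ncfg : 0;
}
```

With ETH-IPV4-TCP configured, ETH-IPV4-UDP gets 2 split points under behavior 1 and none under behavior 2. With the commit-message configuration (IPv4, then UDP), an ARP frame whose only parsed header is Ethernet matches no split point under either policy; where such an unsplit packet is placed is what still needs to be documented.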

> 
> >
> > Now buffer split can be configured in two modes. For length based
> > buffer split, the mp, length, offset field in Rx packet segment should
> > be configured, while the proto_hdr field should not be configured.
> > For protocol header based buffer split, the mp, offset, proto_hdr
> > field in Rx packet segment should be configured, while the length
> > field should not be configured.
> >
> > The split limitations imposed by underlying driver is reported in the
> > rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> > split parts may differ either, dpdk memory and external memory, respectively.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> > Acked-by: Ray Kinsella <mdr@ashroe.eu>
> > ---
> >   lib/ethdev/rte_ethdev.c | 32 +++++++++++++++++++++++++++++++-
> >   lib/ethdev/rte_ethdev.h | 14 +++++++++++++-
> >   2 files changed, 44 insertions(+), 2 deletions(-)
> 
> Do we need a dedicated feature in doc/guides/nics/features.rst?
> Or should be just update buffer split to refer to a new supported header split API
> and callback?
> 
> Also the feature definitely deserves entry in the release notes.
 
The newly introduced protocol based buffer split definitely deserves a doc update.
We didn't do it before because the design was still under discussion.

Before we send a new version, we will put more effort into cleaning up the current patch to make
the doc more comprehensive and easier to understand.

Thanks,
Xuan

> 
> [snip]
  
Andrew Rybchenko Aug. 1, 2022, 2:28 p.m. UTC | #6
On 7/21/22 06:24, Ding, Xuan wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: 2022年7月8日 23:01
>> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
>> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
>> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>> jerinjacobk@gmail.com
>> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>; Wang,
>> YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
>> Subject: Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
>>
>> On 6/13/22 13:25, wenxuanx.wu@intel.com wrote:
>>> From: Wenxuan Wu <wenxuanx.wu@intel.com>
>>>
>>> Currently, Rx buffer split supports length based split. With Rx queue
>>> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
>>> configured, PMD will be able to split the received packets into
>>> multiple segments.
>>>
>>> However, length based buffer split is not suitable for NICs that do
>>> split based on protocol headers. Given an arbitrarily variable length
>>> in Rx packet segment, it is almost impossible to pass a fixed protocol
>>> header to driver. Besides, the existence of tunneling results in the
>>> composition of a packet is various, which makes the situation even worse.
>>>
>>> This patch extends current buffer split to support protocol header
>>> based buffer split. A new proto_hdr field is introduced in the
>>> reserved field of rte_eth_rxseg_split structure to specify protocol
>>> header. The proto_hdr field defines the split position of packet,
>>> splitting will always happens after the protocol header defined in the
>>> Rx packet segment. When Rx queue offload
>>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding protocol
>>> header is configured, driver will split the ingress packets into multiple segments.
>>>
>>> struct rte_eth_rxseg_split {
>>>
>>>           struct rte_mempool *mp; /* memory pools to allocate segment from */
>>>           uint16_t length; /* segment maximal data length,
>>>                               configures "split point" */
>>>           uint16_t offset; /* data offset from beginning
>>>                               of mbuf data buffer */
>>>           uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
>>> 			       configures "split point" */
>>
>> There is a big problem here that using RTE_PTYPE_* defines I can't request split
>> after either TCP or UDP header.
> 
> Sorry, for some reason I missed your reply.
> 
> Current RTE_PTYPE_* list all the tunnel and L2/L3/L4 protocol headers (both outer and inner).
> Do you mean that we should support higher layer protocols after L4?
> 
> I think tunnel and L2/L3/L4 protocol headers are enough.
> In DPDK, we don't parse higher level protocols after L4.
> And the higher layer protocols are richer, we can't list all of them.
> What do you think?

It looks like you don't get my point. You simply cannot say
RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP, since it is numerically equal to
RTE_PTYPE_L4_FRAG. Maybe the design limitation is acceptable.
I have no strong opinion, but it must be clear to everyone that the
limitation exists.
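The collision is easy to check: L4 ptypes are enumerated values inside a 4-bit field rather than independent bit flags, so OR-ing two of them yields a third value. The constants below are copied from rte_mbuf_ptype.h; the decoding helper is only illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror rte_mbuf_ptype.h. */
#define MODEL_PTYPE_L4_TCP        0x00000100u
#define MODEL_PTYPE_L4_UDP        0x00000200u
#define MODEL_PTYPE_L4_FRAG       0x00000300u
#define MODEL_PTYPE_L4_MASK       0x00000f00u
#define MODEL_PTYPE_INNER_L4_TCP  0x01000000u
#define MODEL_PTYPE_INNER_L4_UDP  0x02000000u
#define MODEL_PTYPE_INNER_L4_FRAG 0x03000000u

/* Decode the outer L4 field the way a ptype consumer would: the value
 * is compared as a whole, so TCP|UDP can only be read back as FRAG. */
static const char *model_l4_name(uint32_t ptype)
{
    switch (ptype & MODEL_PTYPE_L4_MASK) {
    case MODEL_PTYPE_L4_TCP:  return "TCP";
    case MODEL_PTYPE_L4_UDP:  return "UDP";
    case MODEL_PTYPE_L4_FRAG: return "FRAG";
    default:                  return "other";
    }
}
```

The same collision exists for the inner L4 field, where INNER_L4_TCP | INNER_L4_UDP equals INNER_L4_FRAG.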

>>
>>>       };
>>>
>>> If both inner and outer L2/L3/L4 level protocol header split can be
>>> supported by a PMD. Corresponding protocol header capability is
>>> RTE_PTYPE_L2_ETHER, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6,
>>> RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP,
>>> RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
>>> RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP,
>>> RTE_PTYPE_INNER_L4_UDP, RTE_PTYPE_INNER_L4_SCTP.
>>
>> I think there is no point to list above defines here if it is not the only supported
>> defines.
> 
> Yes, since we use a API to return the protocol header driver supported to split,
> there is no need to list the incomplete RTE_PTYPE* here. Please see next version.
> 
>>
>>>
>>> For example, let's suppose we configured the Rx queue with the
>>> following segments:
>>>       seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>>>       seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>>>       seg2 - pool2, off1=0B
>>>
>>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
>>> following:
>>>       seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>>>       seg1 - udp header @ 128 in mbuf from pool1
>>>       seg2 - payload @ 0 in mbuf from pool2
>>
>> Sorry, but I still see no definition what should happen with, for example, ARP
>> packet with above config.
> 
> Thanks, because the following reply was not answered in v8,
> the definition has not been added in v9 yet.
> 
> "
> Our NIC only supports to split the packets into two segments,
> so there will be an exact match for the only one protocol header configured. Back to this
> question, for the set of proto_hdrs configured, it can have two behaviors:
> 1. The aggressive way is to split on longest match you mentioned, E.g. we configure split
> on ETH-IPV4-TCP, when receives ETH-IPV4-UDP or ETH-IPV6, it can also split on ETH-IPV4
> or ETH.
> 2. A more conservative way is to split only when the packets meet the all protocol headers
> in the Rx packet segment. In the above situation, it will not do split for ETH-IPV4-UDP
> and ETH-IPV6.
> 
> I prefer the second behavior, because the split is usually for the inner most header and
> payload, if it does not meet, the rest of the headers have no actual value.
> "
> 
> Hope to get your insights.
> And we will update the doc to define the behavior in next version.

I'm OK with (2) as well. Please define it in the documentation. It
must also be clear which segment/mempool is used if a packet is not split.
  
Ding, Xuan Aug. 2, 2022, 7:22 a.m. UTC | #7
Hi,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, August 1, 2022 10:28 PM
> To: Ding, Xuan <xuan.ding@intel.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org; Wang, YuanX
> <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Wu, WenxuanX
> <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li, Xiaoyun
> <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> viacheslavo@nvidia.com
> Subject: Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 7/21/22 06:24, Ding, Xuan wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: 2022年7月8日 23:01
> >> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net;
> Li,
> >> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman
> >> Deep <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >> jerinjacobk@gmail.com
> >> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> >> Wang, YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
> >> Subject: Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based
> >> buffer split
> >>
> >> On 6/13/22 13:25, wenxuanx.wu@intel.com wrote:
> >>> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >>>
> >>> Currently, Rx buffer split supports length based split. With Rx
> >>> queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> >>> configured, PMD will be able to split the received packets into
> >>> multiple segments.
> >>>
> >>> However, length based buffer split is not suitable for NICs that do
> >>> split based on protocol headers. Given an arbitrarily variable
> >>> length in Rx packet segment, it is almost impossible to pass a fixed
> >>> protocol header to driver. Besides, the existence of tunneling
> >>> results in the composition of a packet is various, which makes the situation even worse.
> >>>
> >>> This patch extends current buffer split to support protocol header
> >>> based buffer split. A new proto_hdr field is introduced in the
> >>> reserved field of rte_eth_rxseg_split structure to specify protocol
> >>> header. The proto_hdr field defines the split position of packet,
> >>> splitting will always happens after the protocol header defined in
> >>> the Rx packet segment. When Rx queue offload
> >>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> >>> protocol header is configured, driver will split the ingress packets
> >>> into multiple segments.
> >>>
> >>> struct rte_eth_rxseg_split {
> >>>
> >>>           struct rte_mempool *mp; /* memory pools to allocate segment from */
> >>>           uint16_t length; /* segment maximal data length,
> >>>                               configures "split point" */
> >>>           uint16_t offset; /* data offset from beginning
> >>>                               of mbuf data buffer */
> >>>           uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> >>> 			       configures "split point" */
> >>
> >> There is a big problem here that using RTE_PTYPE_* defines I can't
> >> request split after either TCP or UDP header.
> >
> > Sorry, for some reason I missed your reply.
> >
> > Current RTE_PTYPE_* list all the tunnel and L2/L3/L4 protocol headers (both outer and inner).
> > Do you mean that we should support higher layer protocols after L4?
> >
> > I think tunnel and L2/L3/L4 protocol headers are enough.
> > In DPDK, we don't parse higher level protocols after L4.
> > And the higher layer protocols are richer, we can't list all of them.
> > What do you think?
> 
> It looks like you don't get my point. You simply cannot say:
> RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP since it is numerically equal to
> RTE_PTYPE_L4_FRAG. May be the design limitation is acceptable.
> I have no strong opinion, but it must be clear for all that the limitation exists.

Thanks for your correction.
RTE_PTYPE_INNER_L4_TCP | RTE_PTYPE_INNER_L4_UDP has the same problem,
since it is numerically equal to RTE_PTYPE_INNER_L4_FRAG.

I will try to solve this limitation by following the ptypes_get approach.
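A minimal sketch of the collision discussed above, assuming the constant values from DPDK's `rte_mbuf_ptype.h` (copied under hypothetical `DEMO_` names so it builds without DPDK). L4 ptypes are enumerated values inside one 4-bit field rather than independent flag bits, so OR-ing two of them yields a third, unrelated ptype. A list of complete ptype values, in the spirit of the driver-side supported-ptypes arrays terminated by `RTE_PTYPE_UNKNOWN` (0), avoids the ambiguity; `demo_ptype_in_list()` is a hypothetical helper, not a DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in DPDK's rte_mbuf_ptype.h, renamed to avoid clashes. */
#define DEMO_PTYPE_UNKNOWN       0x00000000u
#define DEMO_PTYPE_L4_TCP        0x00000100u
#define DEMO_PTYPE_L4_UDP        0x00000200u
#define DEMO_PTYPE_L4_FRAG       0x00000300u
#define DEMO_PTYPE_L4_SCTP       0x00000400u
#define DEMO_PTYPE_INNER_L4_TCP  0x01000000u
#define DEMO_PTYPE_INNER_L4_UDP  0x02000000u
#define DEMO_PTYPE_INNER_L4_FRAG 0x03000000u

/*
 * Hypothetical helper: look up one complete ptype value in a list that is
 * terminated by DEMO_PTYPE_UNKNOWN, instead of testing bits in a mask.
 */
static int demo_ptype_in_list(const uint32_t *list, uint32_t ptype)
{
	assert(list != NULL);
	for (; *list != DEMO_PTYPE_UNKNOWN; list++)
		if (*list == ptype)
			return 1;
	return 0;
}
```

With `{TCP, UDP, UNKNOWN}` as the list, TCP and UDP are both found while FRAG is not, whereas the mask `TCP | UDP` is numerically identical to FRAG.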

> 
> >>
> >>>       };
> >>>
> >>> If a PMD supports both inner and outer L2/L3/L4 level protocol header
> >>> split, the corresponding protocol header capabilities are
> >>> RTE_PTYPE_L2_ETHER, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6,
> >>> RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP,
> >>> RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
> >>> RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP,
> >>> RTE_PTYPE_INNER_L4_UDP, RTE_PTYPE_INNER_L4_SCTP.
> >>
> >> I think there is no point in listing the above defines here if they are
> >> not the only supported ones.
> >
> > Yes. Since we use an API to return the protocol headers the driver supports
> > splitting on, there is no need to list an incomplete set of RTE_PTYPE_* here.
> > Please see the next version.
> >
> >>
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>       seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >>>       seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >>>       seg2 - pool2, off2=0B
> >>>
> >>> The packet consisting of MAC_IPV4_UDP_PAYLOAD will be split as
> >>> follows:
> >>>       seg0 - MAC + ipv4 headers @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >>>       seg1 - udp header @ 128 in mbuf from pool1
> >>>       seg2 - payload @ 0 in mbuf from pool2
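The byte counts per segment in the example above can be sketched as follows. Because each split happens after the named header, the first segment holds everything up to the end of the IPv4 header, including the Ethernet header. This assumes an untagged Ethernet frame and an IPv4 header without options; `demo_split_lengths()` and the `DEMO_` sizes are hypothetical illustrations, not ethdev code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed header sizes: untagged Ethernet, IPv4 without options, UDP. */
#define DEMO_ETH_HDR_LEN  14u
#define DEMO_IPV4_HDR_LEN 20u
#define DEMO_UDP_HDR_LEN   8u

/*
 * Hypothetical helper: given the packet length and the ascending byte
 * positions at which the packet is split (each one "right after header X"),
 * compute how many bytes land in each of the nsplit + 1 segments.
 */
static void demo_split_lengths(uint32_t pkt_len, const uint32_t *split_at,
			       int nsplit, uint32_t *seg_len)
{
	uint32_t prev = 0;

	for (int i = 0; i < nsplit; i++) {
		assert(split_at[i] >= prev && split_at[i] <= pkt_len);
		seg_len[i] = split_at[i] - prev;
		prev = split_at[i];
	}
	/* The last segment (no proto_hdr) receives the remaining payload. */
	seg_len[nsplit] = pkt_len - prev;
}
```

For a 142-byte frame split after IPv4 (byte 34) and after UDP (byte 42), this yields 34, 8 and 100 bytes for seg0, seg1 and seg2. The configured offsets (RTE_PKTMBUF_HEADROOM + 2, 128, 0) only change where each piece is placed inside its mbuf, not how many bytes it receives.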
> >>
> >> Sorry, but I still see no definition what should happen with, for
> >> example, ARP packet with above config.
> >
> > Thanks. Because the following reply was not answered in v8, the
> > definition has not been added to v9 yet.
> >
> > "
> > Our NIC only supports splitting packets into two segments, so there
> > will be an exact match for the single protocol header configured.
> > Back to this question: for the set of proto_hdrs configured, there are two
> > possible behaviors:
> > 1. The aggressive way is to split on the longest match, as you mentioned.
> > E.g. if we configure split on ETH-IPV4-TCP, when receiving ETH-IPV4-UDP or
> > ETH-IPV6, it can still split on ETH-IPV4 or ETH.
> > 2. A more conservative way is to split only when the packet matches
> > all protocol headers in the Rx packet segments. In the above situation,
> > no split is done for ETH-IPV4-UDP and ETH-IPV6.
> >
> > I prefer the second behavior, because the split is usually for the
> > innermost header and payload; if it does not match, the rest of the headers
> > have no actual value.
> > "
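Behavior (2) can be sketched as a check against the packet type a NIC reports, assuming the layered field layout of DPDK's `rte_mbuf_ptype.h` (values copied under hypothetical `DEMO_` names) and assuming each configured `proto_hdr` names exactly one outer layer. `demo_should_split()` is illustrative only. Which segment or mempool receives a packet that is not split is deliberately left out, since the thread has not defined it yet.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in DPDK's rte_mbuf_ptype.h, renamed to avoid clashes. */
#define DEMO_PTYPE_L2_ETHER     0x00000001u
#define DEMO_PTYPE_L2_ETHER_ARP 0x00000003u
#define DEMO_PTYPE_L2_MASK      0x0000000fu
#define DEMO_PTYPE_L3_IPV4      0x00000010u
#define DEMO_PTYPE_L3_IPV6      0x00000040u
#define DEMO_PTYPE_L3_MASK      0x000000f0u
#define DEMO_PTYPE_L4_TCP       0x00000100u
#define DEMO_PTYPE_L4_UDP       0x00000200u
#define DEMO_PTYPE_L4_MASK      0x00000f00u

/* Return the mask of the outer layer field a single-layer ptype uses. */
static uint32_t demo_layer_mask(uint32_t ptype)
{
	if (ptype & DEMO_PTYPE_L2_MASK)
		return DEMO_PTYPE_L2_MASK;
	if (ptype & DEMO_PTYPE_L3_MASK)
		return DEMO_PTYPE_L3_MASK;
	return DEMO_PTYPE_L4_MASK;
}

/*
 * Conservative rule (2): split only if the packet matches every configured
 * protocol header exactly; any mismatch means no split at all.
 */
static int demo_should_split(uint32_t pkt_type, const uint32_t *proto_hdr,
			     int n)
{
	for (int i = 0; i < n; i++) {
		uint32_t mask = demo_layer_mask(proto_hdr[i]);

		/* Each entry must be a non-zero value within one outer layer. */
		assert(proto_hdr[i] != 0 && (proto_hdr[i] & ~mask) == 0);
		if ((pkt_type & mask) != proto_hdr[i])
			return 0;
	}
	return 1;
}
```

With the example configuration (IPv4, then UDP), an ETH-IPV4-UDP packet is split, while ETH-IPV4-TCP, ETH-IPV6-UDP and an ARP frame are not, which covers the ARP case raised above. The exact-value comparison is a simplification: NICs often report variants such as RTE_PTYPE_L3_IPV4_EXT_UNKNOWN, and a real implementation would have to decide whether those match RTE_PTYPE_L3_IPV4.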
> >
> > We hope to get your insights.
> > We will update the doc to define the behavior in the next version.
> 
> I'm OK with (2) as well. Please define it in the documentation. It must also
> be clear which segment/mempool is used if a packet is not split.

Got your point. Will fix it in the next version.

Thanks,
Xuan
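The per-segment checks that the patch below adds to rte_eth_rx_queue_check_split() can be modelled in isolation. This is a simplified sketch, not the ethdev code: `demo_check_seg()` is hypothetical, `supported` stands in for the ptypes mask returned by the driver (0 meaning the driver only does length-based split), and `offset` is assumed to already include RTE_PKTMBUF_HEADROOM for the first segment.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical standalone model of the proposed per-segment validation.
 * Returns 0 if the segment description is acceptable, -EINVAL otherwise.
 */
static int demo_check_seg(uint32_t supported, uint32_t buf_size,
			  uint32_t length, uint32_t offset, uint32_t proto_hdr)
{
	if (supported == 0) {
		/* Length-based split: a zero length means "whole buffer". */
		length = length != 0 ? length : buf_size;
		return buf_size < length + offset ? -EINVAL : 0;
	}
	/* Protocol-based split: every requested bit must be supported... */
	if (proto_hdr & ~supported)
		return -EINVAL;
	/* ...and the data offset must still fit in the buffer. */
	return buf_size < offset ? -EINVAL : 0;
}
```

The last check in the tests shows the mask test inheriting the limitation discussed above: with TCP (0x100) and UDP (0x200) both supported, a request for 0x300, the RTE_PTYPE_L4_FRAG value, is accepted.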
  

Patch

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index e1f2a0ffe3..b89e30296f 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1662,6 +1662,7 @@  rte_eth_rx_queue_check_split(uint16_t port_id,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1695,7 +1696,36 @@  rte_eth_rx_queue_check_split(uint16_t port_id,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-
+		uint32_t ptypes_mask = RTE_PTYPE_UNKNOWN;
+		int ret = rte_eth_supported_hdrs_get(port_id, &ptypes_mask);
+
+		if (ret == -ENOTSUP || (ret == 0 && ptypes_mask == RTE_PTYPE_UNKNOWN)) {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else if (ret == 0) {
+			/* Split after specified protocol header. */
+			if (proto_hdr & ~ptypes_mask) {
+				RTE_ETHDEV_LOG(ERR,
+					"Protocol header 0x%x is not supported.\n",
+					proto_hdr & ~ptypes_mask);
+				return -EINVAL;
+			}
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment offset)\n",
+					mpl->name, *mbp_buf_size,
+					offset);
+				return -EINVAL;
+			}
+		} else {
+			return ret;
 		}
 	}
 	return 0;
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 72cac1518e..7df40f9f9b 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1176,6 +1176,9 @@  struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdr fields in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1197,12 +1200,21 @@  struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field should not be configured.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field should not be configured.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/** Protocol header (RTE_PTYPE_* value) to split after, configures split point. */
+	uint32_t proto_hdr;
 };
 
 /**