[dpdk-dev,01/17] mbuf: add definitions of unified packet types

Message ID 1422501365-12643-2-git-send-email-helin.zhang@intel.com (mailing list archive)
State Superseded, archived
Headers

Commit Message

Zhang, Helin Jan. 29, 2015, 3:15 a.m. UTC
  As there are only 6 bit flags in ol_flags for indicating packet types,
which is not enough to describe all the possible packet types hardware
can recognize. For example, i40e hardware can recognize more than 150
packet types. Unified packet type is composed of tunnel type, L3 type,
L4 type and inner L3 type fields, and can be stored in 16 bits mbuf
field of 'packet_type'.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 lib/librte_mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)
  

Comments

Olivier Matz Jan. 30, 2015, 1:56 p.m. UTC | #1
Hi Helin,

On 01/29/2015 04:15 AM, Helin Zhang wrote:
> As there are only 6 bit flags in ol_flags for indicating packet types,
> which is not enough to describe all the possible packet types hardware
> can recognize. For example, i40e hardware can recognize more than 150
> packet types. Unified packet type is composed of tunnel type, L3 type,
> L4 type and inner L3 type fields, and can be stored in 16 bits mbuf
> field of 'packet_type'.
> 
> Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> ---
>  lib/librte_mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 74 insertions(+)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 16059c6..94ae344 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -165,6 +165,80 @@ extern "C" {
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
>  
> +/*
> + * Sixteen bits are divided into several fields to mark packet types. Note that
> + * each field is indexical.
> + * - Bit 3:0 is for tunnel types.
> + * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
> + * - Bit 10:8 is for L4 types. It can also be used for inner L4 types for
> + *   tunneling packets.
> + * - Bit 13:11 is for inner L3 types.
> + * - Bit 15:14 is reserved.

Is there a reason why using this specific order?

Also, there are 4 bits for outer L3 types and 3 bits for inner L3
types, but both of them have 6 different supported types. Is it
intentional?

> + *
> + * To be compitable with Vector PMD, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT,

compitable -> compatible

> + * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP
> + * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
> + *
> + * Note that L3 types values are selected for checking IPV4/IPV6 header from
> + * performance point of view. Reading annotations of RTE_ETH_IS_IPV4_HDR and
> + * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type values.
> + */
> +#define RTE_PTYPE_UNKNOWN                   0x0000 /* 0b0000000000000000 */
> +/* bit 3:0 for tunnel types */
> +#define RTE_PTYPE_TUNNEL_IP                 0x0001 /* 0b0000000000000001 */
> +#define RTE_PTYPE_TUNNEL_TCP                0x0002 /* 0b0000000000000010 */
> +#define RTE_PTYPE_TUNNEL_UDP                0x0003 /* 0b0000000000000011 */
> +#define RTE_PTYPE_TUNNEL_GRE                0x0004 /* 0b0000000000000100 */
> +#define RTE_PTYPE_TUNNEL_VXLAN              0x0005 /* 0b0000000000000101 */
> +#define RTE_PTYPE_TUNNEL_NVGRE              0x0006 /* 0b0000000000000110 */
> +#define RTE_PTYPE_TUNNEL_GENEVE             0x0007 /* 0b0000000000000111 */
> +#define RTE_PTYPE_TUNNEL_GRENAT             0x0008 /* 0b0000000000001000 */
> +#define RTE_PTYPE_TUNNEL_GRENAT_MAC         0x0009 /* 0b0000000000001001 */
> +#define RTE_PTYPE_TUNNEL_GRENAT_MACVLAN     0x000a /* 0b0000000000001010 */
> +#define RTE_PTYPE_TUNNEL_MASK               0x000f /* 0b0000000000001111 */
> +/* bit 7:4 for L3 types */
> +#define RTE_PTYPE_L3_IPV4                   0x0010 /* 0b0000000000010000 */
> +#define RTE_PTYPE_L3_IPV4_EXT               0x0030 /* 0b0000000000110000 */
> +#define RTE_PTYPE_L3_IPV6                   0x0040 /* 0b0000000001000000 */
> +#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN       0x0090 /* 0b0000000010010000 */
> +#define RTE_PTYPE_L3_IPV6_EXT               0x00c0 /* 0b0000000011000000 */
> +#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN       0x00e0 /* 0b0000000011100000 */
> +#define RTE_PTYPE_L3_MASK                   0x00f0 /* 0b0000000011110000 */

can we expect that when RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT or
RTE_PTYPE_L3_IPV4_EXT_UNKNOWN is set, the hardware also verified the
L3 checksum?

My understanding is:

- if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 0
   -> checksum was checked by hw and is good
- if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 1
   -> checksum was checked by hw and is bad
- if packet_type is not IPv4*
   -> checksum was not checked by hw

I think it would solve the problem asked by Stephen
http://dpdk.org/ml/archives/dev/2015-January/011550.html

> +/* bit 10:8 for L4 types */
> +#define RTE_PTYPE_L4_TCP                    0x0100 /* 0b0000000100000000 */
> +#define RTE_PTYPE_L4_UDP                    0x0200 /* 0b0000001000000000 */
> +#define RTE_PTYPE_L4_FRAG                   0x0300 /* 0b0000001100000000 */
> +#define RTE_PTYPE_L4_SCTP                   0x0400 /* 0b0000010000000000 */
> +#define RTE_PTYPE_L4_ICMP                   0x0500 /* 0b0000010100000000 */
> +#define RTE_PTYPE_L4_NONFRAG                0x0600 /* 0b0000011000000000 */
> +#define RTE_PTYPE_L4_MASK                   0x0700 /* 0b0000011100000000 */

Same question for L4.

Note: it would means that if a hardware is able to recognize a TCP
packet but not to verify the checksum, it has to set RTE_PTYPE_L4 to
unknown.

> +/* bit 13:11 for inner L3 types */
> +#define RTE_PTYPE_INNER_L3_IPV4             0x0800 /* 0b0000100000000000 */
> +#define RTE_PTYPE_INNER_L3_IPV4_EXT         0x1000 /* 0b0001000000000000 */
> +#define RTE_PTYPE_INNER_L3_IPV6             0x1800 /* 0b0001100000000000 */
> +#define RTE_PTYPE_INNER_L3_IPV6_EXT         0x2000 /* 0b0010000000000000 */
> +#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x2800 /* 0b0010100000000000 */
> +#define RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x3000 /* 0b0011000000000000 */
> +#define RTE_PTYPE_INNER_L3_MASK             0x3800 /* 0b0011100000000000 */
> +/* bit 15:14 reserved */
> +
> +/**
> + * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4 types one by
> + * one, bit 4 is selected to be used for IPv4 only. Then checking bit 4 can
> + * determin if it is an IPV4 packet.
> + */
> +#define  RTE_ETH_IS_IPV4_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV4)
> +
> +/**
> + * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4 types one by
> + * one, bit 6 is selected to be used for IPv4 only. Then checking bit 6 can
> + * determin if it is an IPV4 packet.
> + */
> +#define  RTE_ETH_IS_IPV6_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV6)
> +
> +/* Check if it is a tunneling packet */
> +#define RTE_ETH_IS_TUNNEL_PKT(ptype) ((ptype) & RTE_PTYPE_TUNNEL_MASK)
> +
>  /**
>   * Get the name of a RX offload flag
>   *
> 

Thanks,
Olivier
  
Zhang, Helin Feb. 2, 2015, 1:43 a.m. UTC | #2
Hi Olivier

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Friday, January 30, 2015 9:56 PM
> To: Zhang, Helin; dev@dpdk.org
> Cc: Stephen Hemminger
> Subject: Re: [dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified packet
> types
> 
> Hi Helin,
> 
> On 01/29/2015 04:15 AM, Helin Zhang wrote:
> > As there are only 6 bit flags in ol_flags for indicating packet types,
> > which is not enough to describe all the possible packet types hardware
> > can recognize. For example, i40e hardware can recognize more than 150
> > packet types. Unified packet type is composed of tunnel type, L3 type,
> > L4 type and inner L3 type fields, and can be stored in 16 bits mbuf
> > field of 'packet_type'.
> >
> > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> > ---
> >  lib/librte_mbuf/rte_mbuf.h | 74
> > ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 74 insertions(+)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 16059c6..94ae344 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -165,6 +165,80 @@ extern "C" {
> >  /* Use final bit of flags to indicate a control mbuf */
> >  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control
> data */
> >
> > +/*
> > + * Sixteen bits are divided into several fields to mark packet types.
> > +Note that
> > + * each field is indexical.
> > + * - Bit 3:0 is for tunnel types.
> > + * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
> > + * - Bit 10:8 is for L4 types. It can also be used for inner L4 types for
> > + *   tunneling packets.
> > + * - Bit 13:11 is for inner L3 types.
> > + * - Bit 15:14 is reserved.
> 
> Is there a reason why using this specific order?
Yes, to support ixgbe Vector PMD, outer L3 types and L4 types need to be contiguous
and in this order.

> 
> Also, there are 4 bits for outer L3 types and 3 bits for inner L3 types, but both of
> them have 6 different supported types. Is it intentional?
Yes, it is to support ixgbe Vector PMD. Contiguous 7 bits are needed, though 1 bit wasted.

> 
> > + *
> > + * To be compitable with Vector PMD, RTE_PTYPE_L3_IPV4,
> > + RTE_PTYPE_L3_IPV4_EXT,
> 
> compitable -> compatible
Good catch! It will be fixed in next version. Thanks!

> 
> > + * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP,
> > +RTE_PTYPE_L4_UDP
> > + * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
> > + *
> > + * Note that L3 types values are selected for checking IPV4/IPV6
> > +header from
> > + * performance point of view. Reading annotations of
> > +RTE_ETH_IS_IPV4_HDR and
> > + * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type
> values.
> > + */
> > +#define RTE_PTYPE_UNKNOWN                   0x0000 /*
> 0b0000000000000000 */
> > +/* bit 3:0 for tunnel types */
> > +#define RTE_PTYPE_TUNNEL_IP                 0x0001 /*
> 0b0000000000000001 */
> > +#define RTE_PTYPE_TUNNEL_TCP                0x0002 /*
> 0b0000000000000010 */
> > +#define RTE_PTYPE_TUNNEL_UDP                0x0003 /*
> 0b0000000000000011 */
> > +#define RTE_PTYPE_TUNNEL_GRE                0x0004 /*
> 0b0000000000000100 */
> > +#define RTE_PTYPE_TUNNEL_VXLAN              0x0005 /*
> 0b0000000000000101 */
> > +#define RTE_PTYPE_TUNNEL_NVGRE              0x0006 /*
> 0b0000000000000110 */
> > +#define RTE_PTYPE_TUNNEL_GENEVE             0x0007 /*
> 0b0000000000000111 */
> > +#define RTE_PTYPE_TUNNEL_GRENAT             0x0008 /*
> 0b0000000000001000 */
> > +#define RTE_PTYPE_TUNNEL_GRENAT_MAC         0x0009 /*
> 0b0000000000001001 */
> > +#define RTE_PTYPE_TUNNEL_GRENAT_MACVLAN     0x000a /*
> 0b0000000000001010 */
> > +#define RTE_PTYPE_TUNNEL_MASK               0x000f /*
> 0b0000000000001111 */
> > +/* bit 7:4 for L3 types */
> > +#define RTE_PTYPE_L3_IPV4                   0x0010 /*
> 0b0000000000010000 */
> > +#define RTE_PTYPE_L3_IPV4_EXT               0x0030 /*
> 0b0000000000110000 */
> > +#define RTE_PTYPE_L3_IPV6                   0x0040 /*
> 0b0000000001000000 */
> > +#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN       0x0090 /*
> 0b0000000010010000 */
> > +#define RTE_PTYPE_L3_IPV6_EXT               0x00c0 /*
> 0b0000000011000000 */
> > +#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN       0x00e0 /*
> 0b0000000011100000 */
> > +#define RTE_PTYPE_L3_MASK                   0x00f0 /*
> 0b0000000011110000 */
> 
> can we expect that when RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT or
> RTE_PTYPE_L3_IPV4_EXT_UNKNOWN is set, the hardware also verified the
> L3 checksum?
RTE_PTYPE_L3_IPV4 means there is NONE-EXT. Each time only one of above 3 can be set.
These bits don't indicate any checksum, checksum should be indicated by other flags.
They are just for packet types hardware can recognized.

> 
> My understanding is:
> 
> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 0
>    -> checksum was checked by hw and is good
> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 1
>    -> checksum was checked by hw and is bad
> - if packet_type is not IPv4*
>    -> checksum was not checked by hw
> 
> I think it would solve the problem asked by Stephen
> http://dpdk.org/ml/archives/dev/2015-January/011550.html
> 
> > +/* bit 10:8 for L4 types */
> > +#define RTE_PTYPE_L4_TCP                    0x0100 /*
> 0b0000000100000000 */
> > +#define RTE_PTYPE_L4_UDP                    0x0200 /*
> 0b0000001000000000 */
> > +#define RTE_PTYPE_L4_FRAG                   0x0300 /*
> 0b0000001100000000 */
> > +#define RTE_PTYPE_L4_SCTP                   0x0400 /*
> 0b0000010000000000 */
> > +#define RTE_PTYPE_L4_ICMP                   0x0500 /*
> 0b0000010100000000 */
> > +#define RTE_PTYPE_L4_NONFRAG                0x0600 /*
> 0b0000011000000000 */
> > +#define RTE_PTYPE_L4_MASK                   0x0700 /*
> 0b0000011100000000 */
> 
> Same question for L4.
> 
> Note: it would means that if a hardware is able to recognize a TCP packet but
> not to verify the checksum, it has to set RTE_PTYPE_L4 to unknown.
> 
> > +/* bit 13:11 for inner L3 types */
> > +#define RTE_PTYPE_INNER_L3_IPV4             0x0800 /*
> 0b0000100000000000 */
> > +#define RTE_PTYPE_INNER_L3_IPV4_EXT         0x1000 /*
> 0b0001000000000000 */
> > +#define RTE_PTYPE_INNER_L3_IPV6             0x1800 /*
> 0b0001100000000000 */
> > +#define RTE_PTYPE_INNER_L3_IPV6_EXT         0x2000 /*
> 0b0010000000000000 */
> > +#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x2800 /*
> > +0b0010100000000000 */ #define
> RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x3000 /* 0b0011000000000000
> */
We cannot define the hardware behaviors, it just reports the hardware recognized
packet information directly to the mbuf.
Based on my experiment on i40e hardware, if a IPV4 packet with wrong checksum,
by default, the PMD driver cannot see the packet at all. So we don't need to care
about it too much!
Thanks for the good comments!

> > +#define RTE_PTYPE_INNER_L3_MASK             0x3800 /*
> 0b0011100000000000 */
> > +/* bit 15:14 reserved */
> > +
> > +/**
> > + * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4
> > +types one by
> > + * one, bit 4 is selected to be used for IPv4 only. Then checking bit
> > +4 can
> > + * determin if it is an IPV4 packet.
> > + */
> > +#define  RTE_ETH_IS_IPV4_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV4)
> > +
> > +/**
> > + * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4
> > +types one by
> > + * one, bit 6 is selected to be used for IPv4 only. Then checking bit
> > +6 can
> > + * determin if it is an IPV4 packet.
> > + */
> > +#define  RTE_ETH_IS_IPV6_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV6)
> > +
> > +/* Check if it is a tunneling packet */ #define
> > +RTE_ETH_IS_TUNNEL_PKT(ptype) ((ptype) & RTE_PTYPE_TUNNEL_MASK)
> > +
> >  /**
> >   * Get the name of a RX offload flag
> >   *
> >
> 
> Thanks,
> Olivier
  
Olivier Matz Feb. 2, 2015, 11:18 a.m. UTC | #3
Hi Helin,

On 02/02/2015 02:43 AM, Zhang, Helin wrote:
>>> +/*
>>> + * Sixteen bits are divided into several fields to mark packet types.
>>> +Note that
>>> + * each field is indexical.
>>> + * - Bit 3:0 is for tunnel types.
>>> + * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
>>> + * - Bit 10:8 is for L4 types. It can also be used for inner L4 types for
>>> + *   tunneling packets.
>>> + * - Bit 13:11 is for inner L3 types.
>>> + * - Bit 15:14 is reserved.
>>
>> Is there a reason why using this specific order?
> Yes, to support ixgbe Vector PMD, outer L3 types and L4 types need to be contiguous
> and in this order.

When you say "need to be", do you mean it's impossible to do in another
manner or just that it would be slower?

>> Also, there are 4 bits for outer L3 types and 3 bits for inner L3 types, but both of
>> them have 6 different supported types. Is it intentional?
> Yes, it is to support ixgbe Vector PMD. Contiguous 7 bits are needed, though 1 bit wasted.

To be honnest, I'm always a surprised that in dpdk we prefer having
a strange API just because it's faster or easier to do on one
specific driver (usually i40e or ixgbe). Unfortunately, trying to
optimize the API for one driver may result in making the rest of the
code (application and other drivers) slower and more complex.

In your proposition, there is no inner l4_type. I consider it's as
useful as the other fields. From what I see, there are only 2 bits
left. What do you think about changing the packet type to 64 bits now?

From an API point of view, I think it would be good to have the
same structure for inner and outer types. For instance (this is
just an example):

union layer_pkt_type {
	struct {
		uint16_t l2_type:4;
		uint16_t l3_type:4;
		uint16_t l4_type:4;
		uint16_t tun_type:4;
	};
	uint16_t u16;
};

struct pkt_type {
	union layer_pkt_type outer;
	union layer_pkt_type inner;
};

When your application decapsulates tunnels, you can just do
outer = inner and enter into the same code.


>>> + * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP,
>>> +RTE_PTYPE_L4_UDP
>>> + * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
>>> + *
>>> + * Note that L3 types values are selected for checking IPV4/IPV6
>>> +header from
>>> + * performance point of view. Reading annotations of
>>> +RTE_ETH_IS_IPV4_HDR and
>>> + * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type
>> values.
>>> + */
>>> +#define RTE_PTYPE_UNKNOWN                   0x0000 /*
>> 0b0000000000000000 */
>>> +/* bit 3:0 for tunnel types */
>>> +#define RTE_PTYPE_TUNNEL_IP                 0x0001 /*
>> 0b0000000000000001 */
>>> +#define RTE_PTYPE_TUNNEL_TCP                0x0002 /*
>> 0b0000000000000010 */
>>> +#define RTE_PTYPE_TUNNEL_UDP                0x0003 /*
>> 0b0000000000000011 */
>>> +#define RTE_PTYPE_TUNNEL_GRE                0x0004 /*
>> 0b0000000000000100 */
>>> +#define RTE_PTYPE_TUNNEL_VXLAN              0x0005 /*
>> 0b0000000000000101 */
>>> +#define RTE_PTYPE_TUNNEL_NVGRE              0x0006 /*
>> 0b0000000000000110 */
>>> +#define RTE_PTYPE_TUNNEL_GENEVE             0x0007 /*
>> 0b0000000000000111 */
>>> +#define RTE_PTYPE_TUNNEL_GRENAT             0x0008 /*
>> 0b0000000000001000 */
>>> +#define RTE_PTYPE_TUNNEL_GRENAT_MAC         0x0009 /*
>> 0b0000000000001001 */
>>> +#define RTE_PTYPE_TUNNEL_GRENAT_MACVLAN     0x000a /*
>> 0b0000000000001010 */
>>> +#define RTE_PTYPE_TUNNEL_MASK               0x000f /*
>> 0b0000000000001111 */
>>> +/* bit 7:4 for L3 types */
>>> +#define RTE_PTYPE_L3_IPV4                   0x0010 /*
>> 0b0000000000010000 */
>>> +#define RTE_PTYPE_L3_IPV4_EXT               0x0030 /*
>> 0b0000000000110000 */
>>> +#define RTE_PTYPE_L3_IPV6                   0x0040 /*
>> 0b0000000001000000 */
>>> +#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN       0x0090 /*
>> 0b0000000010010000 */
>>> +#define RTE_PTYPE_L3_IPV6_EXT               0x00c0 /*
>> 0b0000000011000000 */
>>> +#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN       0x00e0 /*
>> 0b0000000011100000 */
>>> +#define RTE_PTYPE_L3_MASK                   0x00f0 /*
>> 0b0000000011110000 */
>>
>> can we expect that when RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT or
>> RTE_PTYPE_L3_IPV4_EXT_UNKNOWN is set, the hardware also verified the
>> L3 checksum?
> RTE_PTYPE_L3_IPV4 means there is NONE-EXT. Each time only one of above 3 can be set.
> These bits don't indicate any checksum, checksum should be indicated by other flags.
> They are just for packet types hardware can recognized.

I think these 2 information are linked:

- if the hardware cannot recognize packet, it cannot calculate the
  checksum because it does not know the packet type
- if the hardware can recognize the packet, it can verify that the
  checksum is good or wrong

Today, we have:

- PKT_RX_IPV4_HDR and PKT_RX_IPV4_HDR_EXT to tell if the packet is
  seen as IPv4 by the hw.

- We can suppose that:

  - PKT_RX_IPV4_HDR(_EXT)=0 -> no hw checksum information
  - PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=0 -> checksum
    is correct
  - PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=1 -> checksum
    is not correct

- We cannot do the same with L4 because we have no L4 type info,
  but it would be good to be able to do the same.

With your patch, you are removing the PKT_RX_IPV4_HDR and
PKT_RX_IPV4_HDR_EXT flags, but I think the above assumption
about checksum should be kept. As you are adding a L4 type
info, the same method could be applied to L4 checksums.

I think this would definitely solve the problem described by Stephen.


>> My understanding is:
>>
>> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 0
>>    -> checksum was checked by hw and is good
>> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 1
>>    -> checksum was checked by hw and is bad
>> - if packet_type is not IPv4*
>>    -> checksum was not checked by hw
>>
>> I think it would solve the problem asked by Stephen
>> http://dpdk.org/ml/archives/dev/2015-January/011550.html
>>
>>> +/* bit 10:8 for L4 types */
>>> +#define RTE_PTYPE_L4_TCP                    0x0100 /*
>> 0b0000000100000000 */
>>> +#define RTE_PTYPE_L4_UDP                    0x0200 /*
>> 0b0000001000000000 */
>>> +#define RTE_PTYPE_L4_FRAG                   0x0300 /*
>> 0b0000001100000000 */
>>> +#define RTE_PTYPE_L4_SCTP                   0x0400 /*
>> 0b0000010000000000 */
>>> +#define RTE_PTYPE_L4_ICMP                   0x0500 /*
>> 0b0000010100000000 */
>>> +#define RTE_PTYPE_L4_NONFRAG                0x0600 /*
>> 0b0000011000000000 */
>>> +#define RTE_PTYPE_L4_MASK                   0x0700 /*
>> 0b0000011100000000 */
>>
>> Same question for L4.
>>
>> Note: it would means that if a hardware is able to recognize a TCP packet but
>> not to verify the checksum, it has to set RTE_PTYPE_L4 to unknown.
>>
>>> +/* bit 13:11 for inner L3 types */
>>> +#define RTE_PTYPE_INNER_L3_IPV4             0x0800 /*
>> 0b0000100000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT         0x1000 /*
>> 0b0001000000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV6             0x1800 /*
>> 0b0001100000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV6_EXT         0x2000 /*
>> 0b0010000000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x2800 /*
>>> +0b0010100000000000 */ #define
>> RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x3000 /* 0b0011000000000000
>> */
> We cannot define the hardware behaviors, it just reports the hardware recognized
> packet information directly to the mbuf.
> Based on my experiment on i40e hardware, if a IPV4 packet with wrong checksum,
> by default, the PMD driver cannot see the packet at all. So we don't need to care
> about it too much!

I agree that the hardware reports some info that can be different
depending on the hw. But the role of the driver is to convert these info
into a common API with a well-defined behavior.

Regards,
Olivier
  
Zhang, Helin Feb. 3, 2015, 3:18 a.m. UTC | #4
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Monday, February 2, 2015 7:18 PM
> To: Zhang, Helin; dev@dpdk.org
> Cc: Stephen Hemminger
> Subject: Re: [dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified packet
> types
> 
> Hi Helin,
> 
> On 02/02/2015 02:43 AM, Zhang, Helin wrote:
> >>> +/*
> >>> + * Sixteen bits are divided into several fields to mark packet types.
> >>> +Note that
> >>> + * each field is indexical.
> >>> + * - Bit 3:0 is for tunnel types.
> >>> + * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
> >>> + * - Bit 10:8 is for L4 types. It can also be used for inner L4 types for
> >>> + *   tunneling packets.
> >>> + * - Bit 13:11 is for inner L3 types.
> >>> + * - Bit 15:14 is reserved.
> >>
> >> Is there a reason why using this specific order?
> > Yes, to support ixgbe Vector PMD, outer L3 types and L4 types need to
> > be contiguous and in this order.
> 
> When you say "need to be", do you mean it's impossible to do in another
> manner or just that it would be slower?
It was designed to be like this, otherwise, performance drop must be expected.

> 
> >> Also, there are 4 bits for outer L3 types and 3 bits for inner L3
> >> types, but both of them have 6 different supported types. Is it intentional?
> > Yes, it is to support ixgbe Vector PMD. Contiguous 7 bits are needed, though
> 1 bit wasted.
> 
> To be honnest, I'm always a surprised that in dpdk we prefer having a strange
> API just because it's faster or easier to do on one specific driver (usually i40e or
> ixgbe). Unfortunately, trying to optimize the API for one driver may result in
> making the rest of the code (application and other drivers) slower and more
> complex.
Based on my understanding, 'faster' is most of DPDK customers wanted. Otherwise,
they don't need DPDK. Different hardware must have different capabilities, I am trying
to unify at least packet types to get things easier.

> 
> In your proposition, there is no inner l4_type. I consider it's as useful as the
> other fields. From what I see, there are only 2 bits left. What do you think about
> changing the packet type to 64 bits now?
For tunneling cases, L4_type is for inner L4 type, outer L4 type is not needed, as it
can be in tunnel type.
I can expect 64 bits are needed in the future. But for now, I don't see any strong
demand on that for currently supported hardware.
In addition, there is no free bit in the first cache line of mbuf header, mbuf changes
are needed to expand it. I'd prefer to do it later to make things easier.

> 
> From an API point of view, I think it would be good to have the same structure
> for inner and outer types. For instance (this is just an example):
> 
> union layer_pkt_type {
> 	struct {
> 		uint16_t l2_type:4;
> 		uint16_t l3_type:4;
> 		uint16_t l4_type:4;
> 		uint16_t tun_type:4;
> 	};
> 	uint16_t u16;
> };
> 
> struct pkt_type {
> 	union layer_pkt_type outer;
> 	union layer_pkt_type inner;
> };
> 
> When your application decapsulates tunnels, you can just do outer = inner and
> enter into the same code.
Expanding packet_type is not easy, as there is no free bits in the first cache line.
Is there any tunnel type in inner packet? Is it a waste?
Is L2 type really needed? I don't know.

> 
> 
> >>> + * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP,
> >>> +RTE_PTYPE_L4_UDP
> >>> + * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7
> bits.
> >>> + *
> >>> + * Note that L3 types values are selected for checking IPV4/IPV6
> >>> +header from
> >>> + * performance point of view. Reading annotations of
> >>> +RTE_ETH_IS_IPV4_HDR and
> >>> + * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type
> >> values.
> >>> + */
> >>> +#define RTE_PTYPE_UNKNOWN                   0x0000 /*
> >> 0b0000000000000000 */
> >>> +/* bit 3:0 for tunnel types */
> >>> +#define RTE_PTYPE_TUNNEL_IP                 0x0001 /*
> >> 0b0000000000000001 */
> >>> +#define RTE_PTYPE_TUNNEL_TCP                0x0002 /*
> >> 0b0000000000000010 */
> >>> +#define RTE_PTYPE_TUNNEL_UDP                0x0003 /*
> >> 0b0000000000000011 */
> >>> +#define RTE_PTYPE_TUNNEL_GRE                0x0004 /*
> >> 0b0000000000000100 */
> >>> +#define RTE_PTYPE_TUNNEL_VXLAN              0x0005 /*
> >> 0b0000000000000101 */
> >>> +#define RTE_PTYPE_TUNNEL_NVGRE              0x0006 /*
> >> 0b0000000000000110 */
> >>> +#define RTE_PTYPE_TUNNEL_GENEVE             0x0007 /*
> >> 0b0000000000000111 */
> >>> +#define RTE_PTYPE_TUNNEL_GRENAT             0x0008 /*
> >> 0b0000000000001000 */
> >>> +#define RTE_PTYPE_TUNNEL_GRENAT_MAC         0x0009 /*
> >> 0b0000000000001001 */
> >>> +#define RTE_PTYPE_TUNNEL_GRENAT_MACVLAN     0x000a /*
> >> 0b0000000000001010 */
> >>> +#define RTE_PTYPE_TUNNEL_MASK               0x000f /*
> >> 0b0000000000001111 */
> >>> +/* bit 7:4 for L3 types */
> >>> +#define RTE_PTYPE_L3_IPV4                   0x0010 /*
> >> 0b0000000000010000 */
> >>> +#define RTE_PTYPE_L3_IPV4_EXT               0x0030 /*
> >> 0b0000000000110000 */
> >>> +#define RTE_PTYPE_L3_IPV6                   0x0040 /*
> >> 0b0000000001000000 */
> >>> +#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN       0x0090 /*
> >> 0b0000000010010000 */
> >>> +#define RTE_PTYPE_L3_IPV6_EXT               0x00c0 /*
> >> 0b0000000011000000 */
> >>> +#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN       0x00e0 /*
> >> 0b0000000011100000 */
> >>> +#define RTE_PTYPE_L3_MASK                   0x00f0 /*
> >> 0b0000000011110000 */
> >>
> >> can we expect that when RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT or
> >> RTE_PTYPE_L3_IPV4_EXT_UNKNOWN is set, the hardware also verified the
> >> L3 checksum?
> > RTE_PTYPE_L3_IPV4 means there is NONE-EXT. Each time only one of above 3
> can be set.
> > These bits don't indicate any checksum, checksum should be indicated by
> other flags.
> > They are just for packet types hardware can recognized.
> 
> I think these 2 information are linked:
> 
> - if the hardware cannot recognize packet, it cannot calculate the
>   checksum because it does not know the packet type
> - if the hardware can recognize the packet, it can verify that the
>   checksum is good or wrong
We cannot know how hardware works, we care about what hardware can report.

> 
> Today, we have:
> 
> - PKT_RX_IPV4_HDR and PKT_RX_IPV4_HDR_EXT to tell if the packet is
>   seen as IPv4 by the hw.
> 
> - We can suppose that:
> 
>   - PKT_RX_IPV4_HDR(_EXT)=0 -> no hw checksum information
>   - PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=0 -> checksum
>     is correct
>   - PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=1 -> checksum
>     is not correct
> 
> - We cannot do the same with L4 because we have no L4 type info,
>   but it would be good to be able to do the same.
> 
> With your patch, you are removing the PKT_RX_IPV4_HDR and
> PKT_RX_IPV4_HDR_EXT flags, but I think the above assumption about
> checksum should be kept. As you are adding a L4 type info, the same method
> could be applied to L4 checksums.
> 
> I think this would definitely solve the problem described by Stephen.
I think packet type and checksum are different things. They are reported by different fields.
PKT_RX_IPV4_HDR and PKT_RX_IPV4_HDR_EXT mean packet type only,
nothing about checksum. Checksum GOOD/BAD can be reported by other flags in ol_flags.

> 
> 
> >> My understanding is:
> >>
> >> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 0
> >>    -> checksum was checked by hw and is good
> >> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 1
> >>    -> checksum was checked by hw and is bad
> >> - if packet_type is not IPv4*
> >>    -> checksum was not checked by hw
> >>
> >> I think it would solve the problem asked by Stephen
> >> http://dpdk.org/ml/archives/dev/2015-January/011550.html
> >>
> >>> +/* bit 10:8 for L4 types */
> >>> +#define RTE_PTYPE_L4_TCP                    0x0100 /*
> >> 0b0000000100000000 */
> >>> +#define RTE_PTYPE_L4_UDP                    0x0200 /*
> >> 0b0000001000000000 */
> >>> +#define RTE_PTYPE_L4_FRAG                   0x0300 /*
> >> 0b0000001100000000 */
> >>> +#define RTE_PTYPE_L4_SCTP                   0x0400 /*
> >> 0b0000010000000000 */
> >>> +#define RTE_PTYPE_L4_ICMP                   0x0500 /*
> >> 0b0000010100000000 */
> >>> +#define RTE_PTYPE_L4_NONFRAG                0x0600 /*
> >> 0b0000011000000000 */
> >>> +#define RTE_PTYPE_L4_MASK                   0x0700 /*
> >> 0b0000011100000000 */
> >>
> >> Same question for L4.
> >>
> >> Note: it would means that if a hardware is able to recognize a TCP
> >> packet but not to verify the checksum, it has to set RTE_PTYPE_L4 to
> unknown.
> >>
> >>> +/* bit 13:11 for inner L3 types */
> >>> +#define RTE_PTYPE_INNER_L3_IPV4             0x0800 /*
> >> 0b0000100000000000 */
> >>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT         0x1000 /*
> >> 0b0001000000000000 */
> >>> +#define RTE_PTYPE_INNER_L3_IPV6             0x1800 /*
> >> 0b0001100000000000 */
> >>> +#define RTE_PTYPE_INNER_L3_IPV6_EXT         0x2000 /*
> >> 0b0010000000000000 */
> >>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x2800 /*
> >>> +0b0010100000000000 */ #define
> >> RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x3000 /*
> 0b0011000000000000 */
> > We cannot define the hardware behaviors, it just reports the hardware
> > recognized packet information directly to the mbuf.
> > Based on my experiment on i40e hardware, if a IPV4 packet with wrong
> > checksum, by default, the PMD driver cannot see the packet at all. So
> > we don't need to care about it too much!
> 
> I agree that the hardware reports some info that can be different depending on
> the hw. But the role of the driver is to convert these info into a common API
> with a well-defined behavior.
Yes, driver should report the received packet information to a well-defined behavior,
but not the same behavior, even for the same packet.
Capability can be queried for each port, and then the application can know the port
capability well, and know what the hardware can report, and what the hardware
cannot report.
Driver should enable the hardware with its advanced capabilities as most as possible.

> 
> Regards,
> Olivier
  
Zhang, Helin Feb. 3, 2015, 6:37 a.m. UTC | #5
> -----Original Message-----
> From: Zhang, Helin
> Sent: Tuesday, February 3, 2015 11:19 AM
> To: Olivier MATZ; dev@dpdk.org
> Cc: Stephen Hemminger
> Subject: RE: [dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified packet
> types
> 
> 
> 
> > -----Original Message-----
> > From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> > Sent: Monday, February 2, 2015 7:18 PM
> > To: Zhang, Helin; dev@dpdk.org
> > Cc: Stephen Hemminger
> > Subject: Re: [dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified
> > packet types
> >
> > Hi Helin,
> >
> > On 02/02/2015 02:43 AM, Zhang, Helin wrote:
> > >>> +/*
> > >>> + * Sixteen bits are divided into several fields to mark packet types.
> > >>> +Note that
> > >>> + * each field is indexical.
> > >>> + * - Bit 3:0 is for tunnel types.
> > >>> + * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
> > >>> + * - Bit 10:8 is for L4 types. It can also be used for inner L4 types for
> > >>> + *   tunneling packets.
> > >>> + * - Bit 13:11 is for inner L3 types.
> > >>> + * - Bit 15:14 is reserved.
> > >>
> > >> Is there a reason why using this specific order?
> > > Yes, to support ixgbe Vector PMD, outer L3 types and L4 types need
> > > to be contiguous and in this order.
> >
> > When you say "need to be", do you mean it's impossible to do in
> > another manner or just that it would be slower?
> It was designed to be like this, otherwise, performance drop must be expected.
> 
> >
> > >> Also, there are 4 bits for outer L3 types and 3 bits for inner L3
> > >> types, but both of them have 6 different supported types. Is it intentional?
> > > Yes, it is to support ixgbe Vector PMD. Contiguous 7 bits are
> > > needed, though
> > 1 bit wasted.
> >
> > To be honnest, I'm always a surprised that in dpdk we prefer having a
> > strange API just because it's faster or easier to do on one specific
> > driver (usually i40e or ixgbe). Unfortunately, trying to optimize the
> > API for one driver may result in making the rest of the code
> > (application and other drivers) slower and more complex.
> Based on my understanding, 'faster' is most of DPDK customers wanted.
> Otherwise, they don't need DPDK. Different hardware must have different
> capabilities, I am trying to unify at least packet types to get things easier.
> 
> >
> > In your proposition, there is no inner l4_type. I consider it's as
> > useful as the other fields. From what I see, there are only 2 bits
> > left. What do you think about changing the packet type to 64 bits now?
> For tunneling cases, L4_type is for inner L4 type, outer L4 type is not needed, as
> it can be in tunnel type.
> I can expect 64 bits are needed in the future. But for now, I don't see any
> strong demand on that for currently supported hardware.
> In addition, there is no free bit in the first cache line of mbuf header, mbuf
> changes are needed to expand it. I'd prefer to do it later to make things easier.
Sorry, I misremember the usage of the first cache line of mbuf. It still has some
free space. Based on this, enlarging (to 32 or 64 bits) the packet type might be good.

> 
> >
> > From an API point of view, I think it would be good to have the same
> > structure for inner and outer types. For instance (this is just an example):
> >
> > union layer_pkt_type {
> > 	struct {
> > 		uint16_t l2_type:4;
> > 		uint16_t l3_type:4;
> > 		uint16_t l4_type:4;
> > 		uint16_t tun_type:4;
> > 	};
> > 	uint16_t u16;
> > };
> >
> > struct pkt_type {
> > 	union layer_pkt_type outer;
> > 	union layer_pkt_type inner;
> > };
> >
> > When your application decapsulates tunnels, you can just do outer =
> > inner and enter into the same code.
> Expanding packet_type is not easy, as there is no free bits in the first cache
> line.
> Is there any tunnel type in inner packet? Is it a waste?
> Is L2 type really needed? I don't know.
If it is now not short of space in mbuf, the definition as yours might be good.
But tun_type is not required for inner packet, I'd prefer to define it as needed
with taking into account the Vector PMD support. It seems 32 bits might be enough,
like below,
struct pkt_type {
	uint32_t l2_type:4;
	uint32_t l3_type:4;
	uint32_t l4_type:4;
	uint32_t tun_type:4;
	uint32_t inner_l2_type:4;
	uint32_t inner_l3_type:4;
	uint32_t inner_l4_type:4;
}

Regards,
Helin

> 
> >
> >
> > >>> + * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP,
> > >>> +RTE_PTYPE_L4_UDP
> > >>> + * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous
> > >>> +7
> > bits.
> > >>> + *
> > >>> + * Note that L3 types values are selected for checking IPV4/IPV6
> > >>> +header from
> > >>> + * performance point of view. Reading annotations of
> > >>> +RTE_ETH_IS_IPV4_HDR and
> > >>> + * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3
> > >>> +type
> > >> values.
> > >>> + */
> > >>> +#define RTE_PTYPE_UNKNOWN                   0x0000 /*
> > >> 0b0000000000000000 */
> > >>> +/* bit 3:0 for tunnel types */
> > >>> +#define RTE_PTYPE_TUNNEL_IP                 0x0001 /*
> > >> 0b0000000000000001 */
> > >>> +#define RTE_PTYPE_TUNNEL_TCP                0x0002 /*
> > >> 0b0000000000000010 */
> > >>> +#define RTE_PTYPE_TUNNEL_UDP                0x0003 /*
> > >> 0b0000000000000011 */
> > >>> +#define RTE_PTYPE_TUNNEL_GRE                0x0004 /*
> > >> 0b0000000000000100 */
> > >>> +#define RTE_PTYPE_TUNNEL_VXLAN              0x0005 /*
> > >> 0b0000000000000101 */
> > >>> +#define RTE_PTYPE_TUNNEL_NVGRE              0x0006 /*
> > >> 0b0000000000000110 */
> > >>> +#define RTE_PTYPE_TUNNEL_GENEVE             0x0007 /*
> > >> 0b0000000000000111 */
> > >>> +#define RTE_PTYPE_TUNNEL_GRENAT             0x0008 /*
> > >> 0b0000000000001000 */
> > >>> +#define RTE_PTYPE_TUNNEL_GRENAT_MAC         0x0009 /*
> > >> 0b0000000000001001 */
> > >>> +#define RTE_PTYPE_TUNNEL_GRENAT_MACVLAN     0x000a /*
> > >> 0b0000000000001010 */
> > >>> +#define RTE_PTYPE_TUNNEL_MASK               0x000f /*
> > >> 0b0000000000001111 */
> > >>> +/* bit 7:4 for L3 types */
> > >>> +#define RTE_PTYPE_L3_IPV4                   0x0010 /*
> > >> 0b0000000000010000 */
> > >>> +#define RTE_PTYPE_L3_IPV4_EXT               0x0030 /*
> > >> 0b0000000000110000 */
> > >>> +#define RTE_PTYPE_L3_IPV6                   0x0040 /*
> > >> 0b0000000001000000 */
> > >>> +#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN       0x0090 /*
> > >> 0b0000000010010000 */
> > >>> +#define RTE_PTYPE_L3_IPV6_EXT               0x00c0 /*
> > >> 0b0000000011000000 */
> > >>> +#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN       0x00e0 /*
> > >> 0b0000000011100000 */
> > >>> +#define RTE_PTYPE_L3_MASK                   0x00f0 /*
> > >> 0b0000000011110000 */
> > >>
> > >> can we expect that when RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT
> or
> > >> RTE_PTYPE_L3_IPV4_EXT_UNKNOWN is set, the hardware also verified
> > >> the
> > >> L3 checksum?
> > > RTE_PTYPE_L3_IPV4 means there is NONE-EXT. Each time only one of
> > > above 3
> > can be set.
> > > These bits don't indicate any checksum, checksum should be indicated
> > > by
> > other flags.
> > > They are just for packet types hardware can recognized.
> >
> > I think these 2 information are linked:
> >
> > - if the hardware cannot recognize packet, it cannot calculate the
> >   checksum because it does not know the packet type
> > - if the hardware can recognize the packet, it can verify that the
> >   checksum is good or wrong
> We cannot know how hardware works, we care about what hardware can
> report.
> 
> >
> > Today, we have:
> >
> > - PKT_RX_IPV4_HDR and PKT_RX_IPV4_HDR_EXT to tell if the packet is
> >   seen as IPv4 by the hw.
> >
> > - We can suppose that:
> >
> >   - PKT_RX_IPV4_HDR(_EXT)=0 -> no hw checksum information
> >   - PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=0 ->
> checksum
> >     is correct
> >   - PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=1 ->
> checksum
> >     is not correct
> >
> > - We cannot do the same with L4 because we have no L4 type info,
> >   but it would be good to be able to do the same.
> >
> > With your patch, you are removing the PKT_RX_IPV4_HDR and
> > PKT_RX_IPV4_HDR_EXT flags, but I think the above assumption about
> > checksum should be kept. As you are adding a L4 type info, the same
> > method could be applied to L4 checksums.
> >
> > I think this would definitely solve the problem described by Stephen.
> I think packet type and checksum are different things. They are reported by
> different fields.
> PKT_RX_IPV4_HDR and PKT_RX_IPV4_HDR_EXT mean packet type only,
> nothing about checksum. Checksum GOOD/BAD can be reported by other flags
> in ol_flags.
> 
> >
> >
> > >> My understanding is:
> > >>
> > >> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 0
> > >>    -> checksum was checked by hw and is good
> > >> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 1
> > >>    -> checksum was checked by hw and is bad
> > >> - if packet_type is not IPv4*
> > >>    -> checksum was not checked by hw
> > >>
> > >> I think it would solve the problem asked by Stephen
> > >> http://dpdk.org/ml/archives/dev/2015-January/011550.html
> > >>
> > >>> +/* bit 10:8 for L4 types */
> > >>> +#define RTE_PTYPE_L4_TCP                    0x0100 /*
> > >> 0b0000000100000000 */
> > >>> +#define RTE_PTYPE_L4_UDP                    0x0200 /*
> > >> 0b0000001000000000 */
> > >>> +#define RTE_PTYPE_L4_FRAG                   0x0300 /*
> > >> 0b0000001100000000 */
> > >>> +#define RTE_PTYPE_L4_SCTP                   0x0400 /*
> > >> 0b0000010000000000 */
> > >>> +#define RTE_PTYPE_L4_ICMP                   0x0500 /*
> > >> 0b0000010100000000 */
> > >>> +#define RTE_PTYPE_L4_NONFRAG                0x0600 /*
> > >> 0b0000011000000000 */
> > >>> +#define RTE_PTYPE_L4_MASK                   0x0700 /*
> > >> 0b0000011100000000 */
> > >>
> > >> Same question for L4.
> > >>
> > >> Note: it would means that if a hardware is able to recognize a TCP
> > >> packet but not to verify the checksum, it has to set RTE_PTYPE_L4
> > >> to
> > unknown.
> > >>
> > >>> +/* bit 13:11 for inner L3 types */
> > >>> +#define RTE_PTYPE_INNER_L3_IPV4             0x0800 /*
> > >> 0b0000100000000000 */
> > >>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT         0x1000 /*
> > >> 0b0001000000000000 */
> > >>> +#define RTE_PTYPE_INNER_L3_IPV6             0x1800 /*
> > >> 0b0001100000000000 */
> > >>> +#define RTE_PTYPE_INNER_L3_IPV6_EXT         0x2000 /*
> > >> 0b0010000000000000 */
> > >>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x2800 /*
> > >>> +0b0010100000000000 */ #define
> > >> RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x3000 /*
> > 0b0011000000000000 */
> > > We cannot define the hardware behaviors, it just reports the
> > > hardware recognized packet information directly to the mbuf.
> > > Based on my experiment on i40e hardware, if a IPV4 packet with wrong
> > > checksum, by default, the PMD driver cannot see the packet at all.
> > > So we don't need to care about it too much!
> >
> > I agree that the hardware reports some info that can be different
> > depending on the hw. But the role of the driver is to convert these
> > info into a common API with a well-defined behavior.
> Yes, driver should report the received packet information to a well-defined
> behavior, but not the same behavior, even for the same packet.
> Capability can be queried for each port, and then the application can know the
> port capability well, and know what the hardware can report, and what the
> hardware cannot report.
> Driver should enable the hardware with its advanced capabilities as most as
> possible.
> 
> >
> > Regards,
> > Olivier
  
Olivier Matz Feb. 3, 2015, 9:12 a.m. UTC | #6
Hi Helin,

On 02/03/2015 07:37 AM, Zhang, Helin wrote:
>>> When your application decapsulates tunnels, you can just do outer =
>>> inner and enter into the same code.
>> Expanding packet_type is not easy, as there is no free bits in the first cache
>> line.
>> Is there any tunnel type in inner packet? Is it a waste?
>> Is L2 type really needed? I don't know.
> If it is now not short of space in mbuf, the definition as yours might be good.
> But tun_type is not required for inner packet, I'd prefer to define it as needed
> with taking into account the Vector PMD support. It seems 32 bits might be enough,
> like below,
> struct pkt_type {
> 	uint32_t l2_type:4;
> 	uint32_t l3_type:4;
> 	uint32_t l4_type:4;
> 	uint32_t tun_type:4;
> 	uint32_t inner_l2_type:4;
> 	uint32_t inner_l3_type:4;
> 	uint32_t inner_l4_type:4;
> }

Yes, I think a structure like this would be much better!
Maybe a union with a u32 could also help to assign the value
in one operation.

Thanks,
Olivier
  

Patch

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 16059c6..94ae344 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -165,6 +165,80 @@  extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
+/*
+ * Sixteen bits are divided into several fields to mark packet types. Note that
+ * each field is indexical.
+ * - Bit 3:0 is for tunnel types.
+ * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
+ * - Bit 10:8 is for L4 types. It can also be used for inner L4 types for
+ *   tunneling packets.
+ * - Bit 13:11 is for inner L3 types.
+ * - Bit 15:14 is reserved.
+ *
+ * To be compitable with Vector PMD, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT,
+ * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP
+ * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
+ *
+ * Note that L3 types values are selected for checking IPV4/IPV6 header from
+ * performance point of view. Reading annotations of RTE_ETH_IS_IPV4_HDR and
+ * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type values.
+ */
+#define RTE_PTYPE_UNKNOWN                   0x0000 /* 0b0000000000000000 */
+/* bit 3:0 for tunnel types */
+#define RTE_PTYPE_TUNNEL_IP                 0x0001 /* 0b0000000000000001 */
+#define RTE_PTYPE_TUNNEL_TCP                0x0002 /* 0b0000000000000010 */
+#define RTE_PTYPE_TUNNEL_UDP                0x0003 /* 0b0000000000000011 */
+#define RTE_PTYPE_TUNNEL_GRE                0x0004 /* 0b0000000000000100 */
+#define RTE_PTYPE_TUNNEL_VXLAN              0x0005 /* 0b0000000000000101 */
+#define RTE_PTYPE_TUNNEL_NVGRE              0x0006 /* 0b0000000000000110 */
+#define RTE_PTYPE_TUNNEL_GENEVE             0x0007 /* 0b0000000000000111 */
+#define RTE_PTYPE_TUNNEL_GRENAT             0x0008 /* 0b0000000000001000 */
+#define RTE_PTYPE_TUNNEL_GRENAT_MAC         0x0009 /* 0b0000000000001001 */
+#define RTE_PTYPE_TUNNEL_GRENAT_MACVLAN     0x000a /* 0b0000000000001010 */
+#define RTE_PTYPE_TUNNEL_MASK               0x000f /* 0b0000000000001111 */
+/* bit 7:4 for L3 types */
+#define RTE_PTYPE_L3_IPV4                   0x0010 /* 0b0000000000010000 */
+#define RTE_PTYPE_L3_IPV4_EXT               0x0030 /* 0b0000000000110000 */
+#define RTE_PTYPE_L3_IPV6                   0x0040 /* 0b0000000001000000 */
+#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN       0x0090 /* 0b0000000010010000 */
+#define RTE_PTYPE_L3_IPV6_EXT               0x00c0 /* 0b0000000011000000 */
+#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN       0x00e0 /* 0b0000000011100000 */
+#define RTE_PTYPE_L3_MASK                   0x00f0 /* 0b0000000011110000 */
+/* bit 10:8 for L4 types */
+#define RTE_PTYPE_L4_TCP                    0x0100 /* 0b0000000100000000 */
+#define RTE_PTYPE_L4_UDP                    0x0200 /* 0b0000001000000000 */
+#define RTE_PTYPE_L4_FRAG                   0x0300 /* 0b0000001100000000 */
+#define RTE_PTYPE_L4_SCTP                   0x0400 /* 0b0000010000000000 */
+#define RTE_PTYPE_L4_ICMP                   0x0500 /* 0b0000010100000000 */
+#define RTE_PTYPE_L4_NONFRAG                0x0600 /* 0b0000011000000000 */
+#define RTE_PTYPE_L4_MASK                   0x0700 /* 0b0000011100000000 */
+/* bit 13:11 for inner L3 types */
+#define RTE_PTYPE_INNER_L3_IPV4             0x0800 /* 0b0000100000000000 */
+#define RTE_PTYPE_INNER_L3_IPV4_EXT         0x1000 /* 0b0001000000000000 */
+#define RTE_PTYPE_INNER_L3_IPV6             0x1800 /* 0b0001100000000000 */
+#define RTE_PTYPE_INNER_L3_IPV6_EXT         0x2000 /* 0b0010000000000000 */
+#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x2800 /* 0b0010100000000000 */
+#define RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x3000 /* 0b0011000000000000 */
+#define RTE_PTYPE_INNER_L3_MASK             0x3800 /* 0b0011100000000000 */
+/* bit 15:14 reserved */
+
+/**
+ * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4 types one by
+ * one, bit 4 is selected to be used for IPv4 only. Then checking bit 4 can
+ * determin if it is an IPV4 packet.
+ */
+#define  RTE_ETH_IS_IPV4_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV4)
+
+/**
+ * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4 types one by
+ * one, bit 6 is selected to be used for IPv4 only. Then checking bit 6 can
+ * determin if it is an IPV4 packet.
+ */
+#define  RTE_ETH_IS_IPV6_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV6)
+
+/* Check if it is a tunneling packet */
+#define RTE_ETH_IS_TUNNEL_PKT(ptype) ((ptype) & RTE_PTYPE_TUNNEL_MASK)
+
 /**
  * Get the name of a RX offload flag
  *