[dpdk-dev,v4,3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields

Message ID 1417503172-18642-4-git-send-email-jijiang.liu@intel.com (mailing list archive)
State Superseded, archived
Headers

Commit Message

Jijiang Liu Dec. 2, 2014, 6:52 a.m. UTC
  Replace the inner_l2_len and the inner_l3_len fields with the outer_l2_len and outer_l3_len fields, and rework the csum forward engine and i40e PMD accordingly.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 app/test-pmd/csumonly.c         |   60 +++++++++++++++++++++-----------------
 lib/librte_mbuf/rte_mbuf.h      |    4 +-
 lib/librte_pmd_i40e/i40e_rxtx.c |   38 +++++++++++++-----------
 3 files changed, 55 insertions(+), 47 deletions(-)
  

Comments

Didier Pallard Dec. 2, 2014, 2:53 p.m. UTC | #1
Hello,

On 12/02/2014 07:52 AM, Jijiang Liu wrote:
> Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field, and rework csum forward engine and i40e PMD due to  these changes.
>
> Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
[...]
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -276,8 +276,8 @@ struct rte_mbuf {
>   			uint64_t tso_segsz:16; /**< TCP TSO segment size */
>   
>   			/* fields for TX offloading of tunnels */
> -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> +			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
> +			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
>   
>   			/* uint64_t unused:8; */
>   		};

Sorry for entering this discussion late, but I'm not convinced by the 
choice of outer_lx_len rather than inner_lx_len for the new fields.
I agree with Olivier that new flags should only relate to the use of 
the new fields, to maintain coherency with older implementations.
But from a stack point of view, I think it is better to have the lx_len 
fields target the outer layers, and to name the new fields inner_lx_len.

Let's discuss the two possibilities.

1) outer_lx_len fields are introduced.
In this case, the stack must know that it is processing tunneled 
packets in order to use outer_lx_len rather than lx_len, or it must 
always fill the outer_lx_len fields and move those values into lx_len 
when no tunneling occurs...
I think this induces extra processing that does not really seem to be 
needed.

2) inner_lx_len fields are introduced.
In this case, the stack first uses the lx_len fields. When a packet is 
to be tunneled, the lx_len values are moved to the inner_lx_len fields.
The stack can then process the outer layers while still using the lx_len fields.

For example, a forged Ether/IP/UDP packet will look like this:

Ether/IP/UDP/xxx
   m->flags = IP_CKSUM
   m->l2_len = sizeof(ether)
   m->l3_len = sizeof(ip)
   m->l4_len = sizeof(udp)
   m->inner_l2_len = 0
   m->inner_l3_len = 0

When the packet enters a tunnel, for example a VXLAN interface, the 
lx_len values will be moved to inner_lx_len:

Ether/IP/UDP/xxx
   m->flags = INNER_IP_CKSUM
   m->l2_len = 0
   m->l3_len = 0
   m->l4_len = 0
   m->inner_l2_len = sizeof(ether)
   m->inner_l3_len = sizeof(ip)
  

Once the complete encapsulation has been processed by the stack, the 
packet will look like:

Ether/IP/UDP/VXLAN/Ether/IP/UDP/xxx
   m->flags = IP_CKSUM | INNER_IP_CKSUM
   m->l2_len = sizeof(ether)
   m->l3_len = sizeof(ip)
   m->l4_len = sizeof(udp)
   m->inner_l2_len = sizeof(ether) + sizeof(vxlan)
   m->inner_l3_len = sizeof(ip)
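The field move at tunnel entry that this option implies can be sketched in C. This is a minimal illustration only: the struct and flag names below are hypothetical stand-ins for the real rte_mbuf fields and PKT_* flags, not the actual DPDK API.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the relevant rte_mbuf metadata (illustrative only). */
struct mbuf_meta {
	uint64_t flags;
	uint16_t l2_len, l3_len, l4_len;
	uint16_t inner_l2_len, inner_l3_len;
};

#define F_IP_CKSUM       (1ULL << 0) /* placeholder flag bits, not real PKT_* values */
#define F_INNER_IP_CKSUM (1ULL << 1)

/* With option 2), entering a tunnel shifts the current lx_len values
 * (and the matching flag) into the inner_* fields, freeing lx_len for
 * the outer headers the stack is about to write. */
static void enter_tunnel(struct mbuf_meta *m)
{
	m->inner_l2_len = m->l2_len;
	m->inner_l3_len = m->l3_len;
	m->l2_len = 0;
	m->l3_len = 0;
	m->l4_len = 0;
	if (m->flags & F_IP_CKSUM) {
		m->flags &= ~F_IP_CKSUM;
		m->flags |= F_INNER_IP_CKSUM;
	}
}
```

The point of the sketch is that the functions writing the outer headers afterwards keep using the plain lx_len fields, unaware of the tunnel.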


didier
  
Ananyev, Konstantin Dec. 2, 2014, 3:36 p.m. UTC | #2
Hi Didier

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of didier.pallard
> Sent: Tuesday, December 02, 2014 2:53 PM
> To: Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
> 
> Hello,
> 
> On 12/02/2014 07:52 AM, Jijiang Liu wrote:
> > Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field, and rework csum forward engine
> and i40e PMD due to  these changes.
> >
> > Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> [...]
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -276,8 +276,8 @@ struct rte_mbuf {
> >   			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> >
> >   			/* fields for TX offloading of tunnels */
> > -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> > -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> > +			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
> > +			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
> >
> >   			/* uint64_t unused:8; */
> >   		};
> 
> Sorry for entering lately this discussion, but i'm not convinced by the
> choice of outer_lx_len rather than inner_lx_len for new fields.
> I agree with Olivier that new flags should only be related to the use of
> new fields, to maintain coherency with oldest implementations.
> But from a stack point of view, i think it is better to have lx_len
> fields that target the outer layers, and to name new fields inner_lx_len.
> 
> Let's discuss the two possibilities.
> 
> 1) outer_lx_len fields are introduced.
> In this case, the stack should have knowledge that it is processing
> tunneled packets to use outer_lx_len rather than lx_len,
> or stack should always use outer_lx_len packet and move those fields to
> lx_len packets if no tunneling occurs...
> I think it will induce extra processing that does not seem to be really
> needed.
> 
> 2) inner_lx_len fields are introduced.
> In this case, the stack first uses lx_len fields. When packet should be
> tunneled, lx_len fields are moved to inner_lx_len fields.
> Then the stack can process the outer layer and still use the lx_len fields.

I'm not sure I understood why 2) is better than 1).
Let's say you have a 'normal' (non-tunnelled) packet: ether/IP/TCP.
In that case you still use the mbuf's l2_len/l3_len/l4_len fields and set up ol_flags as usual.
Then later, you decide to 'tunnel' that packet.
So you just fill the mbuf's outer_l2_len/outer_l3_len, set the TX_OUTER_* and TX_TUNNEL_* bits in ol_flags, and probably update l2_len.
l3_len/l4_len and the ol_flags bits set for them remain intact.
That's with 1)
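The additive flow Konstantin describes for option 1) can be sketched in C. Again this is only an illustration: the struct, helper, and flag names are hypothetical stand-ins, not the real rte_mbuf layout or PKT_TX_* constants.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the relevant rte_mbuf metadata (illustrative only). */
struct mbuf_meta {
	uint64_t ol_flags;
	uint16_t l2_len, l3_len, l4_len;
	uint16_t outer_l2_len, outer_l3_len;
};

#define F_TX_IP_CKSUM       (1ULL << 0) /* placeholder bits, not real PKT_TX_* values */
#define F_TX_OUTER_IP_CKSUM (1ULL << 1)
#define F_TX_UDP_TUNNEL     (1ULL << 2)

/* With option 1), tunnelling only *adds* outer metadata: the existing
 * l3_len/l4_len and their checksum flags stay untouched, and l2_len is
 * extended to cover the tunnel header in front of the inner L2 header. */
static void add_tunnel(struct mbuf_meta *m, uint16_t outer_l2,
		       uint16_t outer_l3, uint16_t tun_hdr_len)
{
	m->outer_l2_len = outer_l2;
	m->outer_l3_len = outer_l3;
	m->l2_len += tun_hdr_len;
	m->ol_flags |= F_TX_OUTER_IP_CKSUM | F_TX_UDP_TUNNEL;
}
```

No existing field or flag is rewritten, which is the "remain intact" property argued for here.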

With 2) you would have to move l3_len/l4_len to the inner_lx_len fields,
and I suppose the ol_flags values too:
ol_flags &= ~PKT_IP_CKSUM;
ol_flags |= PKT_INNER_IP_CKSUM;
? And the same for the L4_CKSUM flags?

Konstantin

> 
> For  example:
> an eth/IP/TCP forged packet will look like this:
> 
> Ether/IP/UDP/xxx
>    m->flags = IP_CKSUM
>    m->l2_len = sizeof(ether)
>    m->l3_len = sizeof(ip)
>    m->l4_len = sizeof(udp)
>    m->inner_l2_len = 0
>    m->inner_l3_len = 0
> 
> When entering tunnel for example a VXLAN interface, lx_len will be moved
> to inner_lx_len
> 
> Ether/IP/UDP/xxx
>    m->flags = INNER_IP_CKSUM
>    m->l2_len = 0
>    m->l3_len = 0
>    m->l4_len = 0
>    m->inner_l2_len = sizeof(ether)
>    m->inner_l3_len = sizeof(ip)
> 
> 
> once complete encapsulation is processed by the stack, the packet will
> look like
> 
> Ether/IP/UDP/VXLAN/Ether/IP/UDP/xxx
>    m->flags = IP_CKSUM | INNER_IP_CKSUM
>    m->l2_len = sizeof(ether)
>    m->l3_len = sizeof(ip)
>    m->l4_len = sizeof(udp)
>    m->inner_l2_len = sizeof(ether) + sizeof (vxlan)
>    m->inner_l3_len = sizeof(ip)
> 
> 
> didier
>
  
Olivier Matz Dec. 3, 2014, 8:57 a.m. UTC | #3
Hi Didier, Konstantin, Jijiang,

On 12/02/2014 04:36 PM, Ananyev, Konstantin wrote:
> Hi Didier
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of didier.pallard
>> Sent: Tuesday, December 02, 2014 2:53 PM
>> To: Liu, Jijiang; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
>>
>> Hello,
>>
>> On 12/02/2014 07:52 AM, Jijiang Liu wrote:
>>> Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field, and rework csum forward engine
>> and i40e PMD due to  these changes.
>>>
>>> Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
>> [...]
>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>> @@ -276,8 +276,8 @@ struct rte_mbuf {
>>>    			uint64_t tso_segsz:16; /**< TCP TSO segment size */
>>>
>>>    			/* fields for TX offloading of tunnels */
>>> -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
>>> -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
>>> +			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
>>> +			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
>>>
>>>    			/* uint64_t unused:8; */
>>>    		};
>>
>> Sorry for entering lately this discussion, but i'm not convinced by the
>> choice of outer_lx_len rather than inner_lx_len for new fields.
>> I agree with Olivier that new flags should only be related to the use of
>> new fields, to maintain coherency with oldest implementations.
>> But from a stack point of view, i think it is better to have lx_len
>> fields that target the outer layers, and to name new fields inner_lx_len.
>>
>> Let's discuss the two possibilities.
>>
>> 1) outer_lx_len fields are introduced.
>> In this case, the stack should have knowledge that it is processing
>> tunneled packets to use outer_lx_len rather than lx_len,
>> or stack should always use outer_lx_len packet and move those fields to
>> lx_len packets if no tunneling occurs...
>> I think it will induce extra processing that does not seem to be really
>> needed.
>>
>> 2) inner_lx_len fields are introduced.
>> In this case, the stack first uses lx_len fields. When packet should be
>> tunneled, lx_len fields are moved to inner_lx_len fields.
>> Then the stack can process the outer layer and still use the lx_len fields.
>
> Not sure, that I understood why 2) is better than 1).
> Let say,  you have a 'normal' (non-tunnelling) packet: ether/IP/TCP
> In that case you still use mbuf's l2_len/l3_len/l4_len fields and setup ol_flags as usual.
> Then later, you decided to 'tunnel' that packet.
> So you just fill mbuf's outer_l2_len/outer_l3_len, setup TX_OUTER_* and TX_TUNNEL_* bits in ol_flags and probably update l2_len.
> l3_len/l4_len and ol_flags bits set for them remain intact.
> That's with 1)
>
> With 2) - you'll have to move l3_len/l4_len to inner_lx_len.
> And I suppose ol_flags values too:
> ol_flags &= ~PKT_ IP_CKSUM;
> ol_flgas  |=  PKT_INNER_IP_CKSUM
> ?
> And same for L4_CKSUM flags too?

After reading Didier's mail, I am starting to be convinced that keeping
inner may be better than outer. From a network stack architecture point
of view, 2) looks better:

- the functions in the network stack that write the Ether/IP headers
   always deal with l2_len/l3_len, whether it's a tunnel or not.

- the function that adds the tunnel header moves l2_len/l3_len and
   the flags to inner_l2_len/inner_l3_len and the inner flags.

Although it was my first idea, I cannot now find a better argument in
favor of outer_lX_len. The initial argument was that the correspondence
between a flag and an lX_len field should always remain the same, but it
is still possible with Didier's approach:
   - PKT_IP_CKSUM uses l2_len and l3_len
   - PKT_INNER_CKSUM uses inner_l2_len and inner_l3_len

Jijiang, I'm sorry to change my mind about this. If you want (and if
Konstantin is also ok with that), I can try to rebase your patches to
match this. Or do you prefer to do it yourself?

Regards,
Olivier
  
Ananyev, Konstantin Dec. 3, 2014, 11:11 a.m. UTC | #4
Hi Oliver,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Wednesday, December 03, 2014 8:57 AM
> To: Ananyev, Konstantin; didier.pallard; Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
> 
> Hi Didier, Konstantin, Jijiang,
> 
> On 12/02/2014 04:36 PM, Ananyev, Konstantin wrote:
> > Hi Didier
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of didier.pallard
> >> Sent: Tuesday, December 02, 2014 2:53 PM
> >> To: Liu, Jijiang; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
> >>
> >> Hello,
> >>
> >> On 12/02/2014 07:52 AM, Jijiang Liu wrote:
> >>> Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field, and rework csum forward
> engine
> >> and i40e PMD due to  these changes.
> >>>
> >>> Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> >> [...]
> >>> --- a/lib/librte_mbuf/rte_mbuf.h
> >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> >>> @@ -276,8 +276,8 @@ struct rte_mbuf {
> >>>    			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> >>>
> >>>    			/* fields for TX offloading of tunnels */
> >>> -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> >>> -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> >>> +			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
> >>> +			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
> >>>
> >>>    			/* uint64_t unused:8; */
> >>>    		};
> >>
> >> Sorry for entering lately this discussion, but i'm not convinced by the
> >> choice of outer_lx_len rather than inner_lx_len for new fields.
> >> I agree with Olivier that new flags should only be related to the use of
> >> new fields, to maintain coherency with oldest implementations.
> >> But from a stack point of view, i think it is better to have lx_len
> >> fields that target the outer layers, and to name new fields inner_lx_len.
> >>
> >> Let's discuss the two possibilities.
> >>
> >> 1) outer_lx_len fields are introduced.
> >> In this case, the stack should have knowledge that it is processing
> >> tunneled packets to use outer_lx_len rather than lx_len,
> >> or stack should always use outer_lx_len packet and move those fields to
> >> lx_len packets if no tunneling occurs...
> >> I think it will induce extra processing that does not seem to be really
> >> needed.
> >>
> >> 2) inner_lx_len fields are introduced.
> >> In this case, the stack first uses lx_len fields. When packet should be
> >> tunneled, lx_len fields are moved to inner_lx_len fields.
> >> Then the stack can process the outer layer and still use the lx_len fields.
> >
> > Not sure, that I understood why 2) is better than 1).
> > Let say,  you have a 'normal' (non-tunnelling) packet: ether/IP/TCP
> > In that case you still use mbuf's l2_len/l3_len/l4_len fields and setup ol_flags as usual.
> > Then later, you decided to 'tunnel' that packet.
> > So you just fill mbuf's outer_l2_len/outer_l3_len, setup TX_OUTER_* and TX_TUNNEL_* bits in ol_flags and probably update l2_len.
> > l3_len/l4_len and ol_flags bits set for them remain intact.
> > That's with 1)
> >
> > With 2) - you'll have to move l3_len/l4_len to inner_lx_len.
> > And I suppose ol_flags values too:
> > ol_flags &= ~PKT_ IP_CKSUM;
> > ol_flgas  |=  PKT_INNER_IP_CKSUM
> > ?
> > And same for L4_CKSUM flags too?
> 
> After reading Didier's mail, I start to be convinced that keeping inner
> may be better than outer. From a network stack architecture point of
> view, 2) looks better:
> 
> - the functions in the network stack that write the Ether/IP header
>    always deal with l2_len/l3_len, whatever it's a tunnel or not.
> 
> - the function that adds the tunnel header moves the l2_len/l3_len and
>    the flags to inner_l2_len/inner_l3_len and inner_flags.

Hmm, I still don't understand you here.
Why has everything you mentioned above suddenly become 'better'?
Here is your original suggestion about introducing 'outer_lx_len':
http://dpdk.org/ml/archives/dev/2014-November/008268.html
As you pointed out, it is a clean and straightforward way to extend the DPDK HW TX offload API with tunnelling support.
And we agreed with you there.

On the other hand, the 2) approach looks like a mess:
if a packet is going to be tunnelled, the upper layer has to:
1. move the lx_len values to inner_lx_len;
2. move the PKT_TX_*_CKSUM bits to PKT_TX_INNER_*_CKSUM bits in ol_flags.
Plus, in the mbuf we would have to introduce PKT_TX_INNER_(TCP|UDP|SCTP)_CKSUM flags
(otherwise we would have the weird situation where PKT_TX_IP_CKSUM refers to the outer IP header but
PKT_TX_TCP_CKSUM refers to the inner one).

So, from a DPDK perspective, 2) looks like a waste of bits in ol_flags and an unnecessary complication.
At least to me.
You are talking about 'the network stack', but as far as I know, dpdk.org does not host any open-sourced L3/L4 stack.
On the other hand, in DPDK we are just adding fields to the mbuf for TX tunnelling.
So if any of the existing commercial stacks already support tunnelling, they will have their own fields for that in their own packet metadata structure.
Which means they somehow have to copy information from their packet structure into the mbuf anyway.
If they don't support tunnelling yet and plan to use the mbuf directly (without copying info into their own packet metadata structure),
I suppose they can adopt the DPDK approach.
So, from my point of view, let's implement it in the way that makes most sense from a DPDK perspective: 1).
Konstantin

> 
> Althought it was my first idea, now I cannot find a better argument in
> favor of outer_lX_len. The initial argument was that the correspondance
> between a flag and a lX_len should always remain the same, but it is
> still possible with Didier's approach:
>    - PKT_IP_CKSUM uses l2_len and l3_len
>    - PKT_INNER_CKSUM uses inner_l2_len and inner_l3_len
> 
> Jijiang, I'm sorry to change my mind about this. If you want (and if
> Konstantin is also ok with that), I can try to rebase your patches to
> match this. Or do you prefer to do it by yourself?
> 
> Regards,
> Olivier
>
  
Olivier Matz Dec. 3, 2014, 11:27 a.m. UTC | #5
Hi Konstantin,

On 12/03/2014 12:11 PM, Ananyev, Konstantin wrote:
>>>> Let's discuss the two possibilities.
>>>>
>>>> 1) outer_lx_len fields are introduced.
>>>> In this case, the stack should have knowledge that it is processing
>>>> tunneled packets to use outer_lx_len rather than lx_len,
>>>> or stack should always use outer_lx_len packet and move those fields to
>>>> lx_len packets if no tunneling occurs...
>>>> I think it will induce extra processing that does not seem to be really
>>>> needed.
>>>>
>>>> 2) inner_lx_len fields are introduced.
>>>> In this case, the stack first uses lx_len fields. When packet should be
>>>> tunneled, lx_len fields are moved to inner_lx_len fields.
>>>> Then the stack can process the outer layer and still use the lx_len fields.
>>>
>>> Not sure, that I understood why 2) is better than 1).
>>> Let say,  you have a 'normal' (non-tunnelling) packet: ether/IP/TCP
>>> In that case you still use mbuf's l2_len/l3_len/l4_len fields and setup ol_flags as usual.
>>> Then later, you decided to 'tunnel' that packet.
>>> So you just fill mbuf's outer_l2_len/outer_l3_len, setup TX_OUTER_* and TX_TUNNEL_* bits in ol_flags and probably update l2_len.
>>> l3_len/l4_len and ol_flags bits set for them remain intact.
>>> That's with 1)
>>>
>>> With 2) - you'll have to move l3_len/l4_len to inner_lx_len.
>>> And I suppose ol_flags values too:
>>> ol_flags &= ~PKT_ IP_CKSUM;
>>> ol_flgas  |=  PKT_INNER_IP_CKSUM
>>> ?
>>> And same for L4_CKSUM flags too?
>>
>> After reading Didier's mail, I start to be convinced that keeping inner
>> may be better than outer. From a network stack architecture point of
>> view, 2) looks better:
>>
>> - the functions in the network stack that write the Ether/IP header
>>     always deal with l2_len/l3_len, whatever it's a tunnel or not.
>>
>> - the function that adds the tunnel header moves the l2_len/l3_len and
>>     the flags to inner_l2_len/inner_l3_len and inner_flags.
>
> Hmm, still don't understand you here.
> Why all that you mentioned above suddenly become 'better'?
> Here is you original suggestion about introducing 'outer_lx_len':
> http://dpdk.org/ml/archives/dev/2014-November/008268.html
> As you pointed, it is a clean and straightforward way to extend DPDK HW TX offload API with tunnelling support.
> And we agreed with you here.
>
>  From other side, 2) approach looks like a mess:
> If packet is going to be tunnelled, the upper layer has to:
> 1.  move lx_len values to inner_lx_len.
> 2. move PKT_TX_*_CKSUM bit to PKT_TX_INNER_*_CKSUM bits in ol_flags.
> Plus in the mbuf we'll either have to introduce PKT_TX_INNER_(TCP|UDP|SCTP)_CKSUM flags
> (otherwise we'll have a weird situation when PKT_TX_IP_CKSUM we'll be for outer IP header, but
> PKT_TX_TCP_CKSUM will be for inner).
>
> So, from DPDK perspective, 2) looks like a waste of bits in ol_flags and unnecessary complication.
> At least to me.
> You are talking about 'the network stack', but as I know at dpdk.org we don't have any open sourced L3/L4 stack supported.
>  From other side - in DPDK we just adding fields into mbuf for TX tunnelling.
> So if any of the existing commercial stacks already support tunnelling - they should have their own fields for that in their own packet metadata structure.
> Which means - they somehow have to copy information from their packet structure into mbuf anyway.
> If they don't support tunnelling yet and plan to use mbuf directly (without copying info into their own packet metadata structure),
> I suppose they can adopt  the DPDK approach.
> So, from my point - let's implement it in a way that makes most sense from DPDK perspective: 1).

OK, your argumentation makes sense, even though I think DPDK aims to be
an SDK for building network applications or stacks and should ease the
work done in the application. But this is something we can handle in the
stack if the API is properly defined in DPDK.

Anyway, we need to find a way to conclude on this :)
And it does not prevent us from thinking about it again for 2.0, if a
dev_prep_tx() function is introduced.

I still have a few comments on the v5 patch from Jijiang, but I think we
can converge soon.

Regards,
Olivier
  

Patch

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 9094967..e874ac5 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -189,11 +189,12 @@  process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 		} else {
 			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
 				ol_flags |= PKT_TX_IP_CKSUM;
-			else
+			else {
 				ipv4_hdr->hdr_checksum =
 					rte_ipv4_cksum(ipv4_hdr);
+				ol_flags |= PKT_TX_IPV4;
+			}
 		}
-		ol_flags |= PKT_TX_IPV4;
 	} else if (ethertype == _htons(ETHER_TYPE_IPv6))
 		ol_flags |= PKT_TX_IPV6;
 	else
@@ -262,22 +263,25 @@  process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
 	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
 		ipv4_hdr->hdr_checksum = 0;
 
-		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
+		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
+			ol_flags |= PKT_TX_OUTER_IP_CKSUM;
+		else {
+			ol_flags |= PKT_TX_OUTER_IPV4;
 			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
-	}
+		}
+	} else
+		ol_flags |= PKT_TX_OUTER_IPV6;
 
 	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
 	/* do not recalculate udp cksum if it was 0 */
 	if (udp_hdr->dgram_cksum != 0) {
 		udp_hdr->dgram_cksum = 0;
-		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
-			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
-				udp_hdr->dgram_cksum =
-					rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
-			else
-				udp_hdr->dgram_cksum =
-					rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);
-		}
+		if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
+			udp_hdr->dgram_cksum =
+				rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
+		else
+			udp_hdr->dgram_cksum =
+				rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);
 	}
 
 	return ol_flags;
@@ -303,8 +307,7 @@  process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
  * TESTPMD_TX_OFFLOAD_* in ports[tx_port].tx_ol_flags. They control
  * wether a checksum must be calculated in software or in hardware. The
  * IP, UDP, TCP and SCTP flags always concern the inner layer.  The
- * VxLAN flag concerns the outer IP and UDP layer (if packet is
- * recognized as a vxlan packet).
+ * VxLAN flag concerns the outer IP(if packet is recognized as a vxlan packet).
  */
 static void
 pkt_burst_checksum_forward(struct fwd_stream *fs)
@@ -320,7 +323,7 @@  pkt_burst_checksum_forward(struct fwd_stream *fs)
 	uint16_t i;
 	uint64_t ol_flags;
 	uint16_t testpmd_ol_flags;
-	uint8_t l4_proto;
+	uint8_t l4_proto, l4_tun_len = 0;
 	uint16_t ethertype = 0, outer_ethertype = 0;
 	uint16_t l2_len = 0, l3_len = 0, l4_len = 0;
 	uint16_t outer_l2_len = 0, outer_l3_len = 0;
@@ -360,6 +363,7 @@  pkt_burst_checksum_forward(struct fwd_stream *fs)
 
 		ol_flags = 0;
 		tunnel = 0;
+		l4_tun_len = 0;
 		m = pkts_burst[i];
 
 		/* Update the L3/L4 checksum error packet statistics */
@@ -378,14 +382,16 @@  pkt_burst_checksum_forward(struct fwd_stream *fs)
 		if (l4_proto == IPPROTO_UDP) {
 			udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
 
+			/* check udp destination port, 4789 is the default
+			 * vxlan port (rfc7348) */
+			if (udp_hdr->dst_port == _htons(4789)) {
+				l4_tun_len = ETHER_VXLAN_HLEN;
+				tunnel = 1;
+
 			/* currently, this flag is set by i40e only if the
 			 * packet is vxlan */
-			if (((m->ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ||
-					(m->ol_flags & PKT_RX_TUNNEL_IPV6_HDR)))
-				tunnel = 1;
-			/* else check udp destination port, 4789 is the default
-			 * vxlan port (rfc7348) */
-			else if (udp_hdr->dst_port == _htons(4789))
+			} else if (m->ol_flags & (PKT_RX_TUNNEL_IPV4_HDR |
+					PKT_RX_TUNNEL_IPV6_HDR))
 				tunnel = 1;
 
 			if (tunnel == 1) {
@@ -432,10 +438,10 @@  pkt_burst_checksum_forward(struct fwd_stream *fs)
 
 		if (tunnel == 1) {
 			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) {
-				m->l2_len = outer_l2_len;
-				m->l3_len = outer_l3_len;
-				m->inner_l2_len = l2_len;
-				m->inner_l3_len = l3_len;
+				m->outer_l2_len = outer_l2_len;
+				m->outer_l3_len = outer_l3_len;
+				m->l2_len = l4_tun_len + l2_len;
+				m->l3_len = l3_len;
 			}
 			else {
 				/* if we don't do vxlan cksum in hw,
@@ -503,8 +509,8 @@  pkt_burst_checksum_forward(struct fwd_stream *fs)
 					m->l2_len, m->l3_len, m->l4_len);
 			if ((tunnel == 1) &&
 				(testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM))
-				printf("tx: m->inner_l2_len=%d m->inner_l3_len=%d\n",
-					m->inner_l2_len, m->inner_l3_len);
+				printf("tx: m->outer_l2_len=%d m->outer_l3_len=%d\n",
+					m->outer_l2_len, m->outer_l3_len);
 			if (tso_segsz != 0)
 				printf("tx: m->tso_segsz=%d\n", m->tso_segsz);
 			printf("tx: flags=");
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 6eb898f..0404261 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -276,8 +276,8 @@  struct rte_mbuf {
 			uint64_t tso_segsz:16; /**< TCP TSO segment size */
 
 			/* fields for TX offloading of tunnels */
-			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
-			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
+			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
+			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
 
 			/* uint64_t unused:8; */
 		};
diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
index 078e973..9f0d1eb 100644
--- a/lib/librte_pmd_i40e/i40e_rxtx.c
+++ b/lib/librte_pmd_i40e/i40e_rxtx.c
@@ -462,41 +462,43 @@  i40e_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_offset,
 			uint8_t l2_len,
 			uint16_t l3_len,
-			uint8_t inner_l2_len,
-			uint16_t inner_l3_len,
+			uint8_t outer_l2_len,
+			uint16_t outer_l3_len,
 			uint32_t *cd_tunneling)
 {
 	if (!l2_len) {
 		PMD_DRV_LOG(DEBUG, "L2 length set to 0");
 		return;
 	}
-	*td_offset |= (l2_len >> 1) << I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
 
 	if (!l3_len) {
 		PMD_DRV_LOG(DEBUG, "L3 length set to 0");
 		return;
 	}
 
-	/* VXLAN packet TX checksum offload */
+	/* UDP tunneling packet TX checksum offload */
 	if (unlikely(ol_flags & PKT_TX_UDP_TUNNEL_PKT)) {
-		uint8_t l4tun_len;
 
-		l4tun_len = ETHER_VXLAN_HLEN + inner_l2_len;
+		*td_offset |= (outer_l2_len >> 1)
+				<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
 
-		if (ol_flags & PKT_TX_IPV4_CSUM)
+		if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
 			*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV4;
-		else if (ol_flags & PKT_TX_IPV6)
+		else if (ol_flags & PKT_TX_OUTER_IPV4)
+			*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV4_NO_CSUM;
+		else if (ol_flags & PKT_TX_OUTER_IPV6)
 			*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
 
 		/* Now set the ctx descriptor fields */
-		*cd_tunneling |= (l3_len >> 2) <<
+		*cd_tunneling |= (outer_l3_len >> 2) <<
 				I40E_TXD_CTX_QW0_EXT_IPLEN_SHIFT |
 				I40E_TXD_CTX_UDP_TUNNELING |
-				(l4tun_len >> 1) <<
+				(l2_len >> 1) <<
 				I40E_TXD_CTX_QW0_NATLEN_SHIFT;
 
-		l3_len = inner_l3_len;
-	}
+	} else
+		*td_offset |= (l2_len >> 1)
+			<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & PKT_TX_IPV4_CSUM) {
@@ -1190,8 +1192,8 @@  i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint64_t ol_flags;
 	uint8_t l2_len;
 	uint16_t l3_len;
-	uint8_t inner_l2_len;
-	uint16_t inner_l3_len;
+	uint8_t outer_l2_len;
+	uint16_t outer_l3_len;
 	uint16_t nb_used;
 	uint16_t nb_ctx;
 	uint16_t tx_last;
@@ -1219,9 +1221,9 @@  i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		ol_flags = tx_pkt->ol_flags;
 		l2_len = tx_pkt->l2_len;
-		inner_l2_len = tx_pkt->inner_l2_len;
 		l3_len = tx_pkt->l3_len;
-		inner_l3_len = tx_pkt->inner_l3_len;
+		outer_l2_len = tx_pkt->outer_l2_len;
+		outer_l3_len = tx_pkt->outer_l3_len;
 
 		/* Calculate the number of context descriptors needed. */
 		nb_ctx = i40e_calc_context_desc(ol_flags);
@@ -1271,8 +1273,8 @@  i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		/* Enable checksum offloading */
 		cd_tunneling_params = 0;
 		i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
-						l2_len, l3_len, inner_l2_len,
-						inner_l3_len,
+						l2_len, l3_len, outer_l2_len,
+						outer_l3_len,
 						&cd_tunneling_params);
 
 		if (unlikely(nb_ctx)) {